Lecture Notes in Control and Information Sciences 345
Editors: M. Thoma, M. Morari
De-Shuang Huang, Kang Li, George William Irwin (Eds.)
Intelligent Computing in Signal Processing and Pattern Recognition
International Conference on Intelligent Computing, ICIC 2006
Kunming, China, August 16–19, 2006
Series Advisory Board F. Allgöwer, P. Fleming, P. Kokotovic, A.B. Kurzhanski, H. Kwakernaak, A. Rantzer, J.N. Tsitsiklis
Editors

De-Shuang Huang
Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, Anhui, China
E-mail: [email protected]

Kang Li
Queen's University Belfast, UK
E-mail: [email protected]

George William Irwin
Queen's University Belfast, UK
E-mail: [email protected]
Library of Congress Control Number: 2006930912
ISSN print edition: 0170-8643
ISSN electronic edition: 1610-7411
ISBN-10 3-540-37257-1 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-37257-8 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable for prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media
springer.com

© Springer-Verlag Berlin Heidelberg 2006

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Cover design: design & production GmbH, Heidelberg
Printed on acid-free paper
Preface
The International Conference on Intelligent Computing (ICIC) was formed to provide an annual forum dedicated to the emerging and challenging topics in artificial intelligence, machine learning, bioinformatics, and computational biology. It aims to bring together researchers and practitioners from both academia and industry to share ideas, problems and solutions related to the multifaceted aspects of intelligent computing.

ICIC 2006, held in Kunming, Yunnan, China, August 16-19, 2006, was the second International Conference on Intelligent Computing, built upon the success of ICIC 2005 held in Hefei, China. This year, the conference concentrated mainly on the theories and methodologies as well as the emerging applications of intelligent computing. It intended to unify the contemporary intelligent computing techniques within an integral framework that highlights the trends in advanced computational intelligence and bridges theoretical research with applications. In particular, bio-inspired computing has emerged in recent years as playing a key role in the pursuit of novel technology, and the resulting techniques vitalize life science engineering and daily life applications. In light of this trend, the theme for this conference was "Emerging Intelligent Computing Technology and Applications". Papers related to this theme were especially solicited, including theories, methodologies, and applications in science and technology.

ICIC 2006 received over 3000 submissions from 36 countries and regions. All papers went through a rigorous peer review procedure, and each paper received at least three review reports. Based on the review reports, the Program Committee finally selected 703 high-quality papers for presentation at ICIC 2006. These papers cover 29 topics and 16 special sessions and are included in five volumes of proceedings published by Springer: one volume of Lecture Notes in Computer Science (LNCS), one volume of Lecture Notes in Artificial Intelligence (LNAI), one volume of Lecture Notes in Bioinformatics (LNBI), and two volumes of Lecture Notes in Control and Information Sciences (LNCIS).

This volume of Lecture Notes in Control and Information Sciences (LNCIS) includes 149 papers covering the topic of Intelligent Computing in Signal Processing and Pattern Recognition, together with the Special Session on Computing for Searching Strategies to Control Dynamic Processes.

The organizers of ICIC 2006, including Yunnan University, the Institute of Intelligent Machines of the Chinese Academy of Sciences, and Queen's University Belfast, have made enormous efforts to ensure the success of ICIC 2006. We hereby would like to thank the members of the ICIC 2006 Advisory Committee for their guidance and advice, the members of the Program Committee and the referees for their collective effort in reviewing and soliciting the papers, and the members of the Publication Committee for their significant editorial work. We would like to thank
Alfred Hofmann, executive editor at Springer, for his frank and helpful advice and guidance throughout, and for his support in publishing the proceedings in the Lecture Notes series. In particular, we would like to thank all the authors for contributing their papers. Without the high-quality submissions from the authors, the success of the conference would not have been possible. Finally, we are especially grateful to the IEEE Computational Intelligence Society, the International Neural Network Society and the National Science Foundation of China for their sponsorship.
June 2006
De-Shuang Huang
Institute of Intelligent Machines, Chinese Academy of Sciences, China

Kang Li
Queen's University Belfast, UK

George William Irwin
Queen's University Belfast, UK
ICIC 2006 Organization
General Chairs
De-Shuang Huang, China
Song Wu, China
George W. Irwin, UK
International Advisory Committee Aike Guo, China Alfred Hofmann, Germany DeLiang Wang, USA Erke Mao, China Fuchu He, China George W. Irwin, UK Guangjun Yang, China Guanrong Chen, Hong Kong Guoliang Chen, China Harold Szu, USA John L. Casti, USA
Marios M. Polycarpou, USA Mengchu Zhou, USA Michael R. Lyu, Hong Kong MuDer Jeng, Taiwan Nanning Zheng, China Okyay Kaynak, Turkey Paul Werbos, USA Qingshi Zhu, China Ruwei Dai, China Shuzhi Sam Ge, Singapore Sheng Zhang, China Shoujue Wang, China
Songde Ma, China Stephen Thompson, UK Tom Heskes, Netherlands Xiangfan He, China Xingui He, China Xueren Wang, China Yanda Li, China Yixin Zhong, China Youshou Wu, China Yuanyan Tang, Hong Kong Yunyu Shi, China Zheng Bao, China
Program Committee Chairs
Kang Li, UK
Prashan Premaratne, Australia
Steering Committee Chairs:
Sheng Chen, UK
Xiaoyi Jiang, Germany
Xiao-Ping Zhang, Canada
Organizing Committee Chairs:
Yongkun Li, China
Hanchun Yang, China
Guanghua Hu, China
Special Session Chair:
Wen Yu, Mexico
Tutorial Chair:
Sudharman K. Jayaweera, USA
Publication Chair:
Xiaoou Li, Mexico
International Liaison Chair:
Liyanage C. De Silva, New Zealand
Publicity Chairs:
Simon X. Yang, Canada
Jun Zhang, Sun Yat-Sen University, China
Cheng Peng, China
Exhibition Chair:
Program Committee:

Aili Han, China Arit Thammano, Thailand Baogang Hu, China Bin Luo, China Bin Zhu, China Bing Wang, China Bo Yan, USA Byoung-Tak Zhang, Korea Caoan Wang, Canada Chao Hai Zhang, Japan Chao-Xue Wang, China Cheng-Xiang Wang, UK Cheol-Hong Moon, Korea Chi-Cheng Cheng, Taiwan Clement Leung, Australia Daniel Coca, UK Daqi Zhu, China David Stirling, Australia Dechang Chen, USA Derong Liu, USA Dewen Hu, China Dianhui Wang, Australia Dimitri Androutsos, Canada Donald C. Wunsch, USA Dong Chun Lee, Korea Du-Wu Cui, China Fengling Han, Australia Fuchun Sun, China
Guang-Bin Huang, Singapore Guangrong Ji, China Hairong Qi, USA Hong Qiao, China Hong Wang, China Hongtao Lu, China Hongyong Zhao, China Huaguang Zhang, China Hui Wang, China Vitoantonio Bevilacqua, Italy Jiangtao Xi, Australia Jianguo Zhu, Australia Jianhua Xu, China Jiankun Hu, Australia Jian-Xun Peng, UK Jiatao Song, China Jie Tian, China Jie Yang, China Jin Li, UK Jin Wu, UK Jinde Cao, China Jinwen Ma, China Jochen Till, Germany John Q. Gan, UK Ju Liu, China K. R. McMenemy, UK Key-Sun Choi, Korea
Luigi Piroddi, Italy Maolin Tang, Australia Marko Hočevar, Slovenia Mehdi Shafiei, Canada Mei-Ching Chen, Taiwan Mian Muhammad Awais, Pakistan Michael Granitzer, Austria Michael J. Watts, New Zealand Michiharu Maeda, Japan Minrui Fei, China Muhammad Jamil Anwas, Pakistan Muhammad Khurram Khan, China Naiqin Feng, China Nuanwan Soonthornphisaj, Thailand Paolo Lino, Italy Peihua Li, China Ping Guo, China Qianchuan Zhao, China Qiangfu Zhao, Japan Qing Zhao, Canada Roberto Tagliaferri, Italy Rong-Chang Chen, Taiwan RuiXiang Sun, China
Girijesh Prasad, UK Sanjay Sharma, UK Seán McLoone, Ireland Seong G. Kong, USA Shaoning Pang, New Zealand Shaoyuan Li, China Shuang-Hua Yang, UK Shunren Xia, China Stefanie Lindstaedt, Austria Sylvia Encheva, Norway Tai-hoon Kim, Korea Tai-Wen Yue, Taiwan Takashi Kuremoto, Japan Tarık Veli Mumcu, Turkey Tian Xiang Mei, UK
Liangmin Li, UK Tim. B. Littler, UK Tommy W. S. Chow, Hong Kong Uwe Kruger, UK Wei Dong Chen, China Wenming Cao, China Wensheng Chen, China Willi Richert, Germany Worapoj Kreesuradej, Thailand Xiao Zhi Gao, Finland Xiaoguang Zhao, China Xiaojun Wu, China Xiaolong Shi, China Xiaoou Li, Mexico Xinge You, Hong Kong Xiwen Zhang, China
Saeed Hashemi, Canada Xiyuan Chen, China Xun Wang, UK Yanhong Zhou, China Yi Shen, China Yong Dong Wu, Singapore Yuhua Peng, China Zengguang Hou, China Zhao-Hui Jiang, Japan Zhen Liu, Japan Zhi Wang, China Zhi-Cheng Chen, China Zhi-Cheng Ji, China Zhigang Zeng, China Ziping Chiang, Taiwan
Reviewers Xiaodan Wang, Lei Wang, Arjun Chandra, Angelo Ciaramella, Adam Kalam, Arun Sathish, Ali Gunes, Jin Tang, Aiguo He, Arpad Kelemen, Andreas Koschan, Anis Koubaa, Alan Gupta, Alice Wang, Ali Ozen, Hong Fang, Muhammad Amir Yousuf , An-Min Zou, Andre Döring, Andreas Juffinger, Angel Sappa, Angelica Li, Anhua Wan, Bing Wang, Rong Fei, Antonio Pedone, Zhengqiang Liang , Qiusheng An, Alon Shalev Housfater, Siu-Yeung Cho, Atif Gulzar, Armin Ulbrich, Awhan Patnaik, Muhammad Babar, Costin Badica, Peng Bai, Banu Diri, Bin Cao, Riccardo Attimonelli, Baohua Wang, Guangguo Bi, Bin Zhu, Brendon Woodford, Haoran Feng, Bo Ma, Bojian Liang, Boris Bacic, Brane Sirok, Binrong Jin, Bin Tian, Christian Sonntag, Galip Cansever, Chun-Chi Lo, ErKui Chen, Chengguo Lv, Changwon Kim, Chaojin Fu, Anping Chen, Chen Chun , C.C. Cheng, Qiming Cheng, Guobin Chen, Chengxiang Wang, Hao Chen, Qiushuang Chen, Tianding Chen, Tierui Chen, Ying Chen, Mo-Yuen Chow, Christian Ritz, Chunmei Liu, Zhongyi Chu, Feipeng Da, Cigdem Turhan, Cihan Karakuzu, Chandana Jayasooriya, Nini Rao, Chuan-Min Zhai, Ching-Nung Yang, Quang Anh Nguyen, Roberto Cordone, Changqing Xu, Christian Schindler, Qijun Zhao, Wei Lu, Zhihua Cui, Changwen Zheng, David Antory, Dirk Lieftucht, Dedy Loebis, Kouichi Sakamoto, Lu Chuanfeng, Jun-Heng Yeh, Dacheng Tao, Shiang-Chun Liou, Ju Dai , Dan Yu, Jianwu Dang, Dayeh Tan, Yang Xiao, Dondong Cao, Denis Stajnko, Liya De Silva, Damien Coyle, Dian-Hui Wang, Dahai Zhang, Di Huang, Dikai Liu, D. Kumar, Dipak Lal Shrestha, Dan Lin, DongMyung Shin, Ning Ding, DongFeng Wang, Li Dong, Dou Wanchun, Dongqing Feng, Dingsheng Wan, Yongwen Du, Weiwei Du, Wei Deng, Dun-wei Gong, DaYong Xu, Dar-Ying Jan, Zhen Duan, Daniela Zaharie,
ZhongQiang Wu, Esther Koller-Meier, Anding Zhu, Feng Pan, Neil Eklund, Kezhi Mao, HaiYan Zhang, Sim-Heng Ong, Antonio Eleuteri, Bang Wang, Vincent Emanuele, Michael Emmerich, Hong Fu, Eduardo Hruschka, Erika Lino, Estevam Rafael Hruschka Jr, D.W. Cui, Fang Liu, Alessandro Farinelli, Fausto Acernese, Bin Fang, Chen Feng, Huimin Guo, Qing Hua, Fei Zhang, Fei Ge, Arnon Rungsawang, Feng Jing, Min Feng, Feiyi Wang, Fengfeng Zhou, Fuhai Li, Filippo Menolascina, Fengli Ren, Mei Guo, Andrés Ferreyra, Francesco Pappalardo, Chuleerat Charasskulchai, Siyao Fu, Wenpeng Ding, Fuzhen Huang, Amal Punchihewa, Geoffrey Macintyre, Xue Feng He, Gang Leng, Lijuan Gao, Ray Gao, Andrey Gaynulin, Gabriella Dellino, D.W. Ggenetic, Geoffrey Wang, YuRong Ge, Guohui He, Gwang Hyun Kim, Gianluca Cena, Giancarlo Raiconi, Ashutosh Goyal, Guan Luo, Guido Maione, Guido Maione, Grigorios Dimitriadis, Haijing Wang, Kayhan Gulez, Tiantai Guo, Chun-Hung Hsieh, Xuan Guo, Yuantao Gu, Huanhuan Chen, Hongwei Zhang, Jurgen Hahn, Qing Han, Aili Han, Dianfei Han, Fei Hao, Qing-Hua Ling, Hang-kon Kim, Han-Lin He, Yunjun Han, Li Zhang, Hathai Tanta-ngai, HangBong Kang, Hsin-Chang Yang, Hongtao Du, Hazem Elbakry, Hao Mei, Zhao L, Yang Yun, Michael Hild, Heajo Kang, Hongjie Xing, Hailli Wang, Hoh In, Peng Bai, Hong-Ming Wang, Hongxing Bai, Hongyu Liu, Weiyan Hou, Huaping Liu, H.Q. Wang, Hyungsuck Cho, Hsun-Li Chang, Hua Zhang, Xia Huang, Hui Chen, Huiqing Liu, Heeun Park, Hong-Wei Ji, Haixian Wang, Hoyeal Kwon, H.Y. Shen, Jonghyuk Park, Turgay Ibrikci, Mary Martin, Pei-Chann Chang, Shouyi Yang, Xiaomin Mu, Melanie Ashley, Ismail Altas, Muhammad Usman Ilyas, Indrani Kar, Jinghui Zhong, Ian Mack, Il-Young Moon, J.X. Peng , Jochen Till, Jian Wang, Quan Xue, James Govindhasamy, José Andrés Moreno Pérez, Jorge Tavares, S. K. Jayaweera, Su Jay, Jeanne Chen, Jim Harkin, Yongji Jia, Li Jia, Zhao-Hui Jiang, Gangyi Jiang, Zhenran Jiang, Jianjun Ran, Jiankun Hu, Qing-Shan Jia, Hong Guo, Jin Liu, Jinling Liang, Jin Wu, Jing Jie, Jinkyung Ryeu, Jing Liu, Jiming Chen, Jiann-Ming Wu, James Niblock, Jianguo Zhu, Joel Pitt, Joe Zhu, John Thompson, Mingguang Shi, Joaquin Peralta, Si Bao Chen, Tinglong Pan, Juan Ramón González González, JingRu Zhang, Jianliang Tang, Joaquin Torres, Junaid Akhtar, Ratthachat Chatpatanasiri, Junpeng Yuan, Jun Zhang, Jianyong Sun, Junying Gan, Jyh-Tyng Yau, Junying Zhang, Jiayin Zhou, Karen Rosemary McMenemy, Kai Yu, Akimoto Kamiya, Xin Kang, Ya-Li Ji, GuoShiang Lin, Muhammad Khurram, Kevin Curran, Karl Neuhold, Kyongnam Jeon, Kunikazu Kobayashi, Nagahisa Kogawa, Fanwei Kong, Kyu-Sik Park, Lily D. Li, Lara Giordano, Laxmidhar Behera, Luca Cernuzzi, Luis Almeida, Agostino Lecci, Yan Zuo, Lei Li, Alberto Leva, Feng Liang, Bin Li, Jinmei Liao, Liang Tang, Bo Lee, Chuandong Li, Lidija Janezic, Jian Li, Jiang-Hai Li, Jianxun Li, Limei Song, Ping Li, Jie Liu, Fei Liu, Jianfeng Liu, Jianwei Liu, Jihong Liu, Lin Liu, Manxi Liu, Yi Liu, Xiaoou Li, Zhu Li, Kun-hong Liu, Li Min Cui, Lidan Miao, Long Cheng , Huaizhong Zhang, Marco Lovera, Liam Maguire, Liping Liu, Liping Zhang, Feng Lu, Luo Xiaobin, Xin-ping Xie, Wanlong Li, Liwei Yang, Xinrui Liu, Xiao Wei Li, Ying Li, Yongquan Liang, Yang Bai, Margherita Bresco, Mingxing Hu, Ming Li, Runnian Ma, Meta-Montero Manrique, Zheng Gao, Mingyi Mao, Mario Vigliar, Marios Savvides, Masahiro Takatsuka, Matevz Dular, Mathias Lux, Mutlu Avci, Zhifeng Hao, Zhifeng Hao, Ming-Bin Li, Tao Mei, Carlo Meloni, Gennaro Miele, Mike Watts, Ming Yang,
Jia Ma, Myong K. Jeong, Michael Watts, Markus Koch, Markus Koch, Mario Koeppen, Mark Kröll, Hui Wang, Haigeng Luo, Malrey Lee, Tiedong Ma, Mingqiang Yang, Yang Ming, Rick Chang, Nihat Adar, Natalie Schellenberg, Naveed Iqbal, Nur Bekiroglu, Jinsong Hu, Nesan Aluha, Nesan K Aluha, Natascha Esau, Yanhong Luo, N.H. Siddique, Rui Nian, Kai Nickel, Nihat Adar, Ben Niu, Yifeng Niu, Nizar Tayem, Nanlin Jin, Hong-Wei Ji, Dongjun Yu, Norton Abrew, Ronghua Yao, Marco Moreno-Armendariz, Osman Kaan Erol, Oh Kyu Kwon, Ahmet Onat, Pawel Herman, Peter Hung, Ping Sun, Parag Kulkarni, Patrick Connally, Paul Gillard, Yehu Shen, Paul Conilione, Pi-Chung Wang, Panfeng Huang, Peter Hung, Massimo Pica Ciamarra, Ping Fang, Pingkang Li, Peiming Bao, Pedro Melo-Pinto, Maria Prandini, Serguei Primak, Peter Scheir, Shaoning Pang, Qian Chen, Qinghao Rong, QingXiang Wu, Quanbing Zhang, Qifu Fan, Qian Liu, Qinglai Wei, Shiqun Yin, Jianlong Qiu, Qingshan Liu, Quang Ha, SangWoon Lee , Huaijing Qu, Quanxiong Zhou , Qingxian Gong, Qingyuan He, M.K.M. Rahman, Fengyuan Ren, Guang Ren, Qingsheng Ren, Wei Zhang, Rasoul Milasi, Rasoul Milasi, Roberto Amato, Roberto Marmo, P. Chen, Roderick Bloem, Hai-Jun Rong, Ron Von Schyndel, Robin Ferguson, Runhe Huang, Rui Zhang, Robin Ferguson, Simon Johnston, Sina Rezvani, Siang Yew Chong, Cristiano Cucco, Dar-Ying Jan, Sonya Coleman, Samuel Rodman, Sancho SalcedoSanz, Sangyiel Baik, Sangmin Lee, Savitri Bevinakoppa, Chengyi Sun, Hua Li, Seamus McLoone, Sean McLoone, Shafayat Abrar, Aamir Shahzad, Shangmin Luan, Xiaowei Shao, Shen Yanxia, Zhen Shen, Seung Ho Hong, Hayaru Shouno, Shujuan Li, Si Eng Ling, Anonymous, Shiliang Guo, Guiyu Feng, Serafin Martinez Jaramillo, Sangwoo Moon, Xuefeng Liu, Yinglei Song, Songul Albayrak, Shwu-Ping Guo, Chunyan Zhang, Sheng Chen, Qiankun Song, Seok-soo Kim, Antonino Staiano, Steven Su, Sitao Wu, Lei Huang, Feng Su, Jie Su, Sukree Sinthupinyo, Sulan Zhai, Jin Sun, Limin Sun, Zengshun Zhao, Tao Sun, Wenhong Sun, Yonghui Sun, Supakpong Jinarat, Srinivas Rao Vadali, Sven Meyer zu Eissen, Xiaohong Su , Xinghua Sun, Zongying Shi, Tony Abou-Assaleh, Youngsu Park, Tai Yang, Yeongtak Jo, Chunming Tang, Jiufei Tang, Taizhe Tan, Tao Xu, Liang Tao, Xiaofeng Tao, Weidong Xu, Yueh-Tsun Chang, Fang Wang, Timo Lindemann, Tina Yu, Ting Hu, Tung-Kuan Liu, Tianming Liu, Tin Lay Nwe, Thomas Neidhart, Tony Chan, Toon Calders, Yi Wang, Thao Tran, Kyungjin Hong, Tariq Qureshi, Tung-Shou Chen, Tsz Kin Tsui, Tiantian Sun, Guoyu Tu, Tulay Yildirim, Dandan Zhang, Xuqing Tang, Yuangang Tang, Uday Chakraborty, Luciana Cariello, Vasily Aristarkhov, Jose-Luis Verdegay, Vijanth Sagayan Asirvadam, Vincent Lee, Markus Vincze, Duo Chen, Viktoria Pammer, Vedran Sabol, Wajeeha Akram, Cao Wang , Xutao Wang, Winlen Wang, Zhuang Znuang, Feng Wang, Haifeng Wang, Le Wang, Wang Linkun, Meng Wang, Rongbo Wang, Xin Wang, Xue Wang, Yan-Feng Wang, Yong Wang, Yongcai Wang, Yongquan Wang, Xu-Qin Li, Wenbin Liu, Wudai Liao, Weidong Zhou, Wei Li, Wei Zhang, Wei Liang, Weiwei Zhang, Wen Xu, Wenbing Yao, Xiaojun Ban, Fengge Wu, Weihua Mao, Shaoming Li, Qing Wu, Jie Wang, Wei Jiang, W Jiang, Wolfgang Kienreich, Linshan Wang, Wasif Naeem, Worasait Suwannik, Wolfgang Slany, Shijun Wang , Wooyoung Soh, Teng Wang, Takashi Kuremoto, Hanguang Wu, Licheng Wu, Xugang Wang, Xiaopei Wu, ZhengDao Zhang, Wei Yen, Yan-Guo Wang, Daoud Ait-Kadi, Xiaolin Hu, Xiaoli Li, Xun
Wang, Xingqi Wang, Yong Feng, Xiucui Guan, Xiao-Dong Li, Xingfa Shen, Xuemin Hong, Xiaodi Huang, Xi Yang, Li Xia, Zhiyu Xiang, Xiaodong Li, Xiaoguang Zhao, Xiaoling Wang, Min Xiao, Xiaonan Wu, Xiaosi Zhan, Lei Xie, Guangming Xie, Xiuqing Wang, Xiwen Zhang, XueJun Li, Xiaojun Zong, Xie Linbo, Xiaolin Li, Xin Ma, Xiangqian Wu, Xiangrong Liu, Fei Xing, Xu Shuzheng, Xudong Xie, Bindang Xue, Xuelong Li, Zhanao Xue, Xun Kruger, Xunxian Wang, Xusheng Wei, Yi Xu, Xiaowei Yang, Xiaoying Wang, Xiaoyan Sun, YingLiang Ma, Yong Xu, Jongpil Yang, Lei Yang, Yang Tian, Zhi Yang, Yao Qian, Chao-bo Yan, Shiren Ye, Yong Fang, Yanfei Wang, Young-Gun Jang, Yuehui Chen, Yuh-Jyh Hu, Yingsong Hu, Zuoyou Yin, Yipan Deng, Yugang Jiang, Jianwei Yang, Yujie Zheng, Ykung Chen, Yan-Kwang Chen, Ye Mei, Yongki Min, Yongqing Yang, Yong Wu, Yongzheng Zhang, Yiping Cheng, Yongpan Liu, Yanqiu Bi, Shengbao Yao, Yongsheng Ding, Haodi Yuan, Liang Yuan, Qingyuan He, Mei Yu, Yunchu Zhang, Yu Shi, Wenwu Yu, Yu Wen, Younghwan Lee, Ming Kong, Yingyue Xu, Xin Yuan, Xing Yang, Yan Zhou, Yizhong Wang, Zanchao Zhang, Ji Zhicheng, Zheng Du, Hai Ying Zhang, An Zhang, Qiang Zhang, Shanwen Zhang, Shanwen Zhang, Zhang Tao, Yue Zhao, R.J. Zhao, Li Zhao, Ming Zhao, Yan Zhao, Bojin Zheng, Haiyong Zheng, Hong Zheng, Zhengyou Wang, Zhongjie Zhu, Shangping Zhong, Xiaobo Zhou, Lijian Zhou, Lei Zhu, Lin Zhu, Weihua Zhu, Wumei Zhu, Zhihong Yao, Yumin Zhang, Ziyuan Huang, Chengqing Li, Z. Liu, Zaiqing Nie, Jiebin Zong, Zunshui Cheng, Zhongsheng Wang, Yin Zhixiang, Zhenyu He, Yisheng Zhong, Tso-Chung Lee, Takashi Kuremoto Tao Jianhua, Liu Wenjue, Pan Cunhong, Li Shi, Xing Hongjie, Yang Shuanghong, Wang Yong, Zhang Hua, Ma Jianchun, Li Xiaocui, Peng Changping, Qi Rui, Guozheng Li, Hui Liu, Yongsheng Ding, Xiaojun Liu, Qinhua Huang
Table of Contents
Intelligent Computing in Signal Processing and Pattern Recognition

An 802.11-Based Location Determination Approach for Context-Aware System
Chun-Dong Wang, Ming Gao, Xiu-Feng Wang . . . 1
A Face Recognition System on Distributed Evolutionary Computing Using On-Line GA
Nam Mi Young, Md. Rezaul Bashar, Phill Kyu Rhee . . . 9
A Fuzzy Kohonen's Competitive Learning Algorithm for 3D MRI Image Segmentation
Jun Kong, Jianzhong Wang, Yinghua Lu, Jingdan Zhang, Jingbo Zhang . . . 19
A Hybrid Genetic Algorithm for Two Types of Polygonal Approximation Problems
Bin Wang, Chaojian Shi . . . 30
A Hybrid Model for Nondestructive Measurement of Internal Quality of Peach
Yongni Shao, Yong He . . . 42
A Novel Approach in Sports Image Classification
Wonil Kim, Sangyoon Oh, Sanggil Kang, Kyungro Yoon . . . 54
A Novel Biometric Identification Approach Based on Human Hand
Jun Kong, Miao Qi, Yinghua Lu, Shuhua Wang, Yuru Wang . . . 62
A Novel Color Image Watermarking Method Based on Genetic Algorithm
Yinghua Lu, Jialing Han, Jun Kong, Gang Hou, Wei Wang . . . 72
A Novel Emitter Signal Recognition Model Based on Rough Set
Guan Xin, Yi Xiao, He You . . . 81
A Novel Model for Independent Radial Basis Function Neural Networks with Multiresolution Analysis
GaoYun An, QiuQi Ruan . . . 90
A Novelty Automatic Fingerprint Matching System
Tianding Chen . . . 100
Abnormal Pattern Parameters Estimation of Control Chart Based on Wavelet Transform and Probabilistic Neural Network
Shaoxiong Wu . . . 112
An Error Concealment Technique Based on JPEG-2000 and Projections onto Convex Sets
Tianding Chen . . . 120
An Extended Learning Vector Quantization Algorithm Aiming at Recognition-Based Character Segmentation
Lei Xu, Bai-Hua Xiao, Chun-Heng Wang, Ru-Wei Dai . . . 131
Improved Decision Tree Algorithm: ID3+
Min Xu, Jian-Li Wang, Tao Chen . . . 141
Application of Support Vector Machines with Binary Tree Architecture to Advanced Radar Emitter Signal Recognition
Gexiang Zhang, Haina Rong, Weidong Jin . . . 150
Automatic Target Recognition in High Resolution SAR Image Based on Electromagnetic Characteristics
Wen-Ming Zhou, Jian-She Song, Jun Xu, Yong-An Zheng . . . 162
Boosting in Random Subspace for Face Recognition
Yong Gao, Yangsheng Wang . . . 172
Component-Based Human Body Tracking for Posture Estimation
Kyoung-Mi Lee . . . 182
Computation of the Probability on the Number of Solution for the P3P Problem
Jianliang Tang, Xiao-Shan Gao, Wensheng Chen . . . 191
Context-Awareness Based Adaptive Classifier Combination for Object Recognition
Mi Young Nam, Battulga Bayarsaikhan, Suman Sedai, Phill Kyu Rhee . . . 201
Detecting All-Zero Coefficient Blocks Before Transformation and Quantization in H.264/AVC
Zhengyou Wang, Quan Xue, Jiatao Song, Weiming Zeng, Guobin Chen, Zhijun Fang, Shiqian Wu . . . 211
Efficient KPCA-Based Feature Extraction: A Novel Algorithm and Experiments
Yong Xu, David Zhang, Jing-Yu Yang, Zhong Jing, Miao Li . . . 220
Embedded System Implementation for an Object Detection Using Stereo Image
Cheol-Hong Moon, Dong-Young Jang . . . 230
Graphic Editing Tools in Bioluminescent Imaging Simulation
Hui Li, Jie Tian, Jie Luo, Yujie Lv . . . 241
Harmonics Real Time Identification Based on ANN, GPS and Distributed Ethernet
Zhijian Hu, Chengxue Zhang . . . 251
The Synthesis of Chinese Fine-Brushwork Painting for Flower
Tianding Chen . . . 263
Hybrid Bayesian Super Resolution Image Reconstruction
Tao Wang, Yan Zhang, Yong Sheng Zhang . . . 275
Image Hiding Based Upon Vector Quantization Using AES Cryptosystem
Yanquan Chen, Tianding Chen . . . 285
Image Ownership Verification Via Unitary Transform of Conjugate Quadrature Filter
Jianwei Yang, Xinxiang Zhang, Wen-Sheng Chen, Bin Fang . . . 294
Inter Layer Intra Prediction Using Lower Layer Information for Spatial Scalability
Zhang Wang, Jian Liu, Yihua Tan, Jinwen Tian . . . 303
Matching Case History Patterns in Case-Based Reasoning
Guoxing Zhao, Bin Luo, Jixin Ma . . . 312
Moment Invariant Based Control System Using Hand Gestures
P. Premaratne, F. Safaei, Q. Nguyen . . . 322
Multiple-ROI Image Coding Method Using Maxshift over Low-Bandwidth
Kang Soo You, Han Jeong Lee, Hoon Sung Kwak . . . 334
Multi-resolution Image Fusion Using AMOPSO-II
Yifeng Niu, Lincheng Shen . . . 343
Multiscale Linear Feature Extraction Based on Beamlet Transform
Ming Yang, Yuhua Peng, Xinhong Zhou . . . 353
Multisensor Information Fusion Application to SAR Data Classification
Hai-Hui Wang, Yan-Sheng Lu, Min-Jiang Chen . . . 364
NDFT-Based Audio Watermarking Scheme with High Robustness Against Malicious Attack
Ling Xie, Jiashu Zhang, Hongjie He . . . 374
New Multiple Regions of Interest Coding Using Partial Bitplanes Scaling for Medical Image Compression
Li-bao Zhang, Ming-quan Zhou . . . 382
Particle Swarm Optimization for Road Extraction in SAR Images
Ge Xu, Hong Sun, Wen Yang . . . 392
Pattern Recognition Without Feature Extraction Using Probabilistic Neural Network
Övünç Polat, Tülay Yıldırım . . . 402
Power Transmission Towers Extraction in Polarimetric SAR Imagery Based on Genetic Algorithm
Wen Yang, Ge Xu, Jiayu Chen, Hong Sun . . . 410
Synthesis Texture by Tiling s-Tiles
Feng Xue, Yousheng Zhang, Julang Jiang, Min Hu, Tao Jiang . . . 421
Relaxation Labeling Using an Improved Hopfield Neural Network
Long Cheng, Zeng-Guang Hou, Min Tan . . . 430
Adaptive Rank Indexing Scheme with Arithmetic Coding in Color-Indexed Images
Kang Soo You, Hyung Moo Kim, Duck Won Seo, Hoon Sung Kwak . . . 440
Revisit to the Problem of Generalized Low Rank Approximation of Matrices
Chong Lu, Wanquan Liu, Senjian An . . . 450
Robust Face Recognition of Images Captured by Different Devices
Guangda Su, Yan Shang, Baixing Zhang . . . 461
Robust Feature Extraction for Mobile-Based Speech Emotion Recognition System
Kang-Kue Lee, Youn-Ho Cho, Kyu-Sik Park . . . 470
Robust Segmentation of Characters Marked on Surface
Jong-Eun Ha, Dong-Joong Kang, Mun-Ho Jeong, Wang-Heon Lee . . . 478
Screening of Basal Cell Carcinoma by Automatic Classifiers with an Ambiguous Category
Seong-Joon Baek, Aaron Park, Daejin Kim, Sung-Hoon Hong, Dong Kook Kim, Bae-Ho Lee . . . 488
Segmentation of Mixed Chinese/English Documents Based on Chinese Radicals Recognition and Complexity Analysis in Local Segment Pattern
Yong Xia, Bai-Hua Xiao, Chun-Heng Wang, Yao-Dong Li . . . 497
Sigmoid Function Activated Blocking Artifacts Reduction Algorithm
Zhi-Heng Zhou, Sheng-Li Xie . . . 507
Simulation of Aging Effects in Face Images
Junyan Wang, Yan Shang, Guangda Su, Xinggang Lin . . . 517
Synthetic Aperture Radar Image Segmentation Using Edge Entropy Constrained Stochastic Relaxation
Yongfeng Cao, Hong Sun, Xin Xu . . . 528
The Influence of Channel Coding on Information Hiding Bounds and Detection Error Rate
Fan Zhang, Xinhong Zhang . . . 538
Wavelet Thinning Algorithm Based Similarity Evaluation for Offline Signature Verification
Bin Fang, Wen-Sheng Chen, Xinge You, Tai-Ping Zhang, Jing Wen, Yuan Yan Tang . . . 547
When Uncorrelated Linear Discriminant Analysis Are Combined with Wavelets
Xue Cao, Jing-Yu Yang . . . 556
2D Direct LDA for Efficient Face Recognition
Un-Dong Chang, Young-Gil Kim, Dong-Woo Kim, Young-Jun Song, Jae-Hyeong Ahn . . . 566
3-D Curve Moment Invariants for Curve Recognition
Dong Xu, Hua Li . . . 572
3D Ear Reconstruction Attempts: Using Multi-view
Heng Liu, Jingqi Yan, David Zhang . . . 578
A Class of Multi-scale Models for Image Denoising in Negative Hilbert-Sobolev Spaces
Jun Zhang, Zhihui Wei . . . 584
A Detection Algorithm of Singular Points in Fingerprint Images Combining Curvature and Orientation Field
Xiaolong Zheng, Yangsheng Wang, Xuying Zhao . . . 593
A Mathematical Framework for Optical Flow Computation
Xiaoxin Guo, Zhiwen Xu, Yueping Feng, Yunxiao Wang, Zhengxuan Wang . . . 600
A Method for Camera Pose Estimation from Object of a Known Shape
Dong-Joong Kang, Jong-Eun Ha, Mun-Ho Jeong . . . 606
A Method of Radar Target Recognition Basing on Wavelet Packets and Rough Set
Hong Wang, Shanwen Zhang . . . 614
A Multi-resolution Image Segmentation Method Based on Evolution of Local Variance
Yan Tian, Yubo Xie, Fuyuan Peng, Jian Liu, Guobo Xing . . . 620
A New Denoising Method with Contourlet Transform
Gangyi Jiang, Mei Yu, Wenjuan Yi, Fucui Li, Yong-Deak Kim . . . 626
A Novel Authentication System Based on Chaos Modulated Facial Expression Recognition
Xiaobin Luo, Jiashu Zhang, Zutao Zhang, Hui Chen . . . 631
A Novel Computer-Aided Diagnosis System of the Mammograms
Weidong Xu, Shunren Xia, Huilong Duan . . . 639
A Partial Curve Matching Method for Automatic Reassembly of 2D Fragments
Liangjia Zhu, Zongtan Zhou, Jingwei Zhang, Dewen Hu . . . 645
A Split/Merge Method with Ranking Selection for Polygonal Approximation of Digital Curve
Chaojian Shi, Bin Wang . . . 651
A Training Strategy of Class-Modular Neural Network Classifier for Handwritten Chinese Character Recognition
Xue Gao . . . 657
Active Set Iteration Method for New L2 Soft Margin Support Vector Machine
Liang Tao, Juan-juan Gu . . . 663
Adaptive Eigenbackground for Dynamic Background Modeling
Lei Wang, Lu Wang, Qing Zhuo, Huan Xiao, Wenyuan Wang . . . 670
Adaptive Content-Based Image Retrieval Using Optimum Fuzzy Weight Value
Dong-Woo Kim, Young-Jun Song, Un-Dong Chang, Jae-Hyeong Ahn . . . 676
An Adaptive MRF-MAP Motion Vector Recovery Algorithm for Video Error Concealment
Zheng-fang Li, Zhi-liang Xu, De-lu Zeng . . . 683
An Efficient Segmentation Algorithm Based on Mathematical Morphology and Improved Watershed
Ge Guo, Xijian Ping, Dongchuan Hu, Juanqi Yang . . . 689
An Error Concealment Based on Inter-frame Information for Video Transmission
Youjun Xiang, Zhengfang Li, Zhiliang Xu . . . 696
An Integration of Topographic Scheme and Nonlinear Diffusion Filtering Scheme for Fingerprint Binarization
Xuying Zhao, Yangsheng Wang, Zhongchao Shi, Xiaolong Zheng . . . 702
An Intrusion Detection Model Based on the Maximum Likelihood Short System Call Sequence
Chunfu Jia, Anming Zhong . . . 709
Analysis of Shell Texture Feature of Coscinodiscus Based on Fractal Feature
Guangrong Ji, Chen Feng, Shugang Dong, Lijian Zhou, Rui Nian . . . 715
Associative Classification Approach for Diagnosing Cardiovascular Disease
Kiyong Noh, Heon Gyu Lee, Ho-Sun Shon, Bum Ju Lee, Keun Ho Ryu . . . 721
Attentive Person Selection for Human-Robot Interaction
Diane Rurangirwa Uwamahoro, Mun-Ho Jeong, Bum-Jae You, Jong-Eun Ha, Dong-Joong Kang . . . 728
Basal Cell Carcinoma Detection by Classification of Confocal Raman Spectra
Seong-Joon Baek, Aaron Park . . . 735
Blind Signal-to-Noise Ratio Estimation Algorithm with Small Samples for Wireless Digital Communications
Dan Wu, Xuemai Gu, Qing Guo . . . 741
Bootstrapping Stochastic Annealing EM Algorithm for Multiscale Segmentation of SAR Imagery
Xian-Bin Wen, Zheng Tian, Hua Zhang . . . 749
BP Neural Network Based SubPixel Mapping Method
Liguo Wang, Ye Zhang, Jiao Li . . . 755
Cellular Recognition for Species of Phytoplankton Via Statistical Spatial Analysis
Guangrong Ji, Rui Nian, Shiming Yang, Lijian Zhou, Chen Feng . . . 761
Combination of Linear Support Vector Machines and Linear Spectral Mixed Model for Spectral Unmixing
Liguo Wang, Ye Zhang, Chunhui Zhao . . . 767
Combining Speech Enhancement with Feature Post-processing for Robust Speech Recognition
Jianjun Lei, Jun Guo, Gang Liu, Jian Wang, Xiangfei Nie, Zhen Yang . . . 773
Conic Section Function Neural Networks for Sonar Target Classification and Performance Evaluation Using ROC Analysis
Burcu Erkmen, Tulay Yildirim . . . 779
3D Map Building for Mobile Robots Using a 3D Laser Range Finder
Zhiyu Xiang, Wenhui Zhou . . . 785
Construction of Fast and Robust N-FINDR Algorithm
Liguo Wang, Xiuping Jia, Ye Zhang . . . 791
Dental Plaque Quantification Using Cellular Neural Network-Based Image Segmentation
Jiayin Kang, Xiao Li, Qingxian Luan, Jinzhu Liu, Lequan Min . . . 797
Detection of Microcalcifications Using Wavelet-Based Thresholding and Filling Dilation
Weidong Xu, Zanchao Zhang, Shunren Xia, Huilong Duan . . . 803
ECG Compression by Optimized Quantization of Wavelet Coefficients
Jianhua Chen, Miao Yang, Yufeng Zhang, Xinling Shi . . . 809
Effects on Density Resolution of CT Image Caused by Nonstationary Axis of Rotation
Yunxiao Wang, Xin Wang, Xiaoxin Guo, Yunjie Pang . . . 815
Embedded Linux Remote Control System to Achieve the Stereo Image
Cheol-Hong Moon, Kap-Sung Kim . . . 821
Estimation of Omnidirectional Camera Model with One Parametric Projection
Yongho Hwang, Hyunki Hong . . . 827
Expert Knowledge Guided Genetic Algorithm for Beam Angle Optimization Problem in Intensity-Modulated Radiotherapy Planning
Yongjie Li, Dezhong Yao . . . 834
Extracting Structural Damage Features: Comparison Between PCA and ICA
Luo Zhong, Huazhu Song, Bo Han . . . 840
Face Alignment Using an Improved Active Shape Model
Zhenhai Ji, Wenming Zheng, Ning Sun, Cairong Zou, Li Zhao . . . 846
Face Detection with an Adaptive Skin Color Segmentation and Eye Features
Hang-Bong Kang . . . 852
Fall Detection by Wearable Sensor and One-Class SVM Algorithm
Tong Zhang, Jue Wang, Liang Xu, Ping Liu . . . 858
Feature Extraction and Pattern Classification on Mining Electroencephalography Data for Brain-Computer Interface
Qingbao Liu, Zongtan Zhou, Yang Liu, Dewen Hu . . . 864
Feature Extraction of Hand-Vein Patterns Based on Ridgelet Transform and Local Interconnection Structure Neural Network
Yu Zhang, Xiao Han, Si-liang Ma . . . 870
Fuzzy Support Vector Machines for Automatic Infant Cry Recognition
Sandra E. Barajas-Montiel, Carlos A. Reyes-García . . . 876
Geodesic Gabriel Graph Based Supervised Nonlinear Manifold Learning
Huajie Chen, Wei Wei . . . 882
Grouping Sampling Reduction-Based Linear Discriminant Analysis
Yan Wu, Li Dai . . . 888
Hierarchical Adult Image Rating System
Wonil Kim, Han-Ku Lee, Kyoungro Yoon . . . 894
Shape Representation Based on Polar-Graph Spectra
Haifeng Zhao, Min Kong, Bin Luo . . . 900
Hybrid Model Method for Automatic Segmentation of Mandarin TTS Corpus
Xiaoliang Yuan, Yuan Dong, Dezhi Huang, Jun Guo, Haila Wang . . . 906
ICIS: A Novel Coin Identification System
Adnan Khashman, Boran Sekeroglu, Kamil Dimililer . . . 913
Image Enhancement Method for Crystal Identification in Crystal Size Distribution Measurement
Wei Liu, YuHong Zhao . . . 919
Image Magnification Using Geometric Structure Reconstruction
Wenze Shao, Zhihui Wei . . . 925
Image-Based Classification for Automating Protein Crystal Identification
Xi Yang, Weidong Chen, Yuan F. Zheng, Tao Jiang . . . 932
Inherit-Based Adaptive Frame Selection for Fast Multi-frame Motion Estimation in H.264
Liangbao Jiao, De Zhang, Houjie Bi . . . 938
Intelligent Analysis of Anatomical Shape Using Multi-sensory Interface
Jeong-Sik Kim, Hyun-Joong Kim, Soo-Mi Choi . . . 945
Modeling Expressive Music Performance in Bassoon Audio Recordings
Rafael Ramirez, Emilia Gomez, Veronica Vicente, Montserrat Puiggros, Amaury Hazan, Esteban Maestre . . . 951
Modeling MPEG-4 VBR Video Traffic by Using ANFIS
Zhijun Fang, Shenghua Xu, Changxuan Wan, Zhengyou Wang, Shiqian Wu, Weiming Zeng . . . 958
Multiple Textural Features Based Palmprint Authentication
Xiangqian Wu, Kuanquan Wang, David Zhang . . . 964
Neural Network Deinterlacing Using Multiple Fields
Hyunsoo Choi, Eunjae Lee, Chulhee Lee . . . 970
Non-stationary Movement Analysis Using Wavelet Transform
Cheol-Ki Kim, Hwa-Sei Lee, DoHoon Lee . . . 976
Novel Fault Class Detection Based on Novelty Detection Methods
Jiafan Zhang, Qinghua Yan, Yonglin Zhang, Zhichu Huang . . . 982
Novel Scheme for Automatic Video Object Segmentation and Tracking in MPEG-2 Compressed Domain
Zhong-Jie Zhu, Yu-Er Wang, Zeng-Nian Zhang, Gang-Yi Jiang . . . 988
Offline Chinese Signature Verification Based on Segmentation and RBFNN Classifier
Zhenhua Wu, Xiaosu Chen, Daoju Xiao . . . 995
On-Line Signature Verification Based on Wavelet Transform to Extract Characteristic Points
LiPing Zhang, ZhongCheng Wu . . . 1002
Parameter Estimation of Multicomponent Polynomial Phase Signals
Han-ling Zhang, Qing-yun Liu, Zhi-shun Li . . . 1008
Parameters Estimation of Multi-sine Signals Based on Genetic Algorithms
Changzhe Song, Guixi Liu, Di Zhao . . . 1013
Fast Vision-Based Camera Tracking for Augmented Environments
Bum-Jong Lee, Jong-Seung Park . . . 1018
Recognition of 3D Objects from a Sequence of Images
Daesik Jang . . . 1024
Reconstruction of Rectangular Plane in 3D Space Using Determination of Non-vertical Lines from Hyperboloidal Projection
Hyun-Deok Kang, Kang-Hyun Jo . . . 1030
Region-Based Fuzzy Shock Filter with Anisotropic Diffusion for Adaptive Image Enhancement
Shujun Fu, Qiuqi Ruan, Wenqia Wang, Jingnian Chen . . . 1036
Robust Feature Detection Using 2D Wavelet Transform Under Low Light Environment
Jihoon Lee, Youngouk Kim, Changwoo Park, Changhan Park, Joonki Paik . . . 1042
Robust Music Information Retrieval in Mobile Environment
Won-Jung Yoon, Kyu-Sik Park . . . 1051
Robust Speech Feature Extraction Based on Dynamic Minimum Subband Spectral Subtraction
Xin Ma, Weidong Zhou, Fang Ju . . . 1056
Searching Algorithm for Shadow Areas Using Correlation in Fourier Domain and Its Application
Choong Ho Lee . . . 1062
Shadow Detection Based on rgb Color Model
Baisheng Chen, Duansheng Chen . . . 1068
Shape Analysis for Planar Barefoot Impression
Li Tong, Lei Li, Xijian Ping . . . 1075
Statistical Neural Network Based Classifiers for Letter Recognition
Burcu Erkmen, Tulay Yildirim . . . 1081
The Study of Character Recognition Based on Fuzzy Support Vector Machine
Yongjun Ma . . . 1087
Tracking, Record, and Analysis System of Animal's Motion for the Clinic Experiment
Jae-Hyuk Han, Young-Jun Song, Dong-Jin Kwon, Jae-Hyeong Ahn . . . 1093
VEP Estimation with Feature Enhancement by Whiten Filter for Brain Computer Interface
Jin-an Guan . . . 1101
Weight Estimation for Audio-Visual Multi-level Fusion in Bimodal Speaker Identification
Zhiyong Wu, Lianhong Cai, Helen M. Meng . . . 1107
Special Session on Computing for Searching Strategies to Control Dynamic Processes

A Study on Optimal Configuration for the Mobile Manipulator Considering the Minimal Movement
Jin-Gu Kang, Kwan-Houng Lee, Jane-Jin Kim . . . 1113
Multi-objective Flow Shop Scheduling Using Differential Evolution
Bin Qian, Ling Wang, De-Xian Huang, Xiong Wang . . . 1125
A Genetic Algorithm for the Batch Scheduling with Sequence-Dependent Setup Times
TsiuShuang Chen, Lei Long, Richard Y.K. Fung . . . 1137
A Study on the Configuration Control of a Mobile Manipulator Base Upon the Optimal Cost Function
Kwan-Houng Lee . . . 1145
An Effective PSO-Based Memetic Algorithm for TSP
Bo Liu, Ling Wang, Yi-hui Jin, De-xian Huang . . . 1151
Dual-Mode Control Algorithm for Wiener-Typed Nonlinear Systems
Haitao Zhang, Yongji Wang . . . 1157
NDP Methods for Multi-chain MDPs
Hao Tang, Lei Zhou, Arai Tamio . . . 1163
Research of an Omniberaing Sun Locating Method with Fisheye Picture Based on Transform Domain Algorithm
Xi-hui Wang, Jian-ping Wang, Chong-wei Zhang . . . 1169
Author Index . . . 1175
An 802.11-Based Location Determination Approach for Context-Aware System

Chun-Dong Wang1,2, Ming Gao2, and Xiu-Feng Wang1

1 College of Information Technical Science, NanKai University, Tianjin 300071, China
[email protected], [email protected]
2 Department of Computer Science & Engineering, Tianjin University of Technology, Tianjin 300191, China
{Michael3769, Ten_minutes}@163.com
Abstract. WLAN location determination systems are gaining increasing attention due to the value they add to wireless networks. This paper focuses on how to determine a mobile device's indoor location from signal strength (SS) in an 802.11-based system. We propose an 802.11-based location determination technique, Nearest Neighbor in Signal Space (NNSS), which locates mobile objects from the sensed power strengths. Based on NNSS, we present a modification, Modified Nearest Neighbor in Signal Space (MNNSS), which enhances location determination accuracy by taking the signal strength of more reference points into account when estimating the location of a mobile object. In NNSS, we compare the measured SS (signal strength) with the SS of each reference point recorded in the database to find the best match; in MNNSS, we compare the measured SS not only with that of each reference point but also with that of the reference points around it, which increases the location determination precision. The experimental results show that the location information provided by MNNSS is more accurate than that of NNSS. Implementation of this technique in a WLAN location determination system shows that the average system accuracy is improved by more than 0.5 meters. This significant enhancement in the accuracy of WLAN location determination systems helps broaden the set of context-aware applications implemented on top of these systems.
1 Introduction

With the development of wireless networks, many techniques and applications for location determination [1-3], and especially context-aware applications [4], have been put forward. According to current research on location determination approaches, indoor location determination is less accurate and more complex because of the influence of barriers and other factors, and is therefore more difficult. A communication system [5] or the Global Positioning System (GPS) [6,7] is usually used to provide location information outdoors. GPS is a widely used technique in which several satellites are used to position objects. For indoor location determination, however, GPS is not an appropriate technique because of its larger standard error, and indoor barriers may block its signal.
Another approach for outdoor location determination is the cellular system, which has the similar disadvantage that it is less accurate and easily influenced by barriers. It is thus difficult to position objects indoors using GPS, and it is necessary to develop an indoor positioning system with higher accuracy. Recently many indoor location determination techniques have emerged, for example, the Received Signal Strength (RSS) method [8-10]. We adopt the RSS method, and this paper is also about it. The remainder of this paper is organized as follows. In Section 2, we classify location determination approaches into three categories and introduce the main idea of each category. In Section 3, we propose the MNNSS algorithm. The experiments and comparisons are described in Section 4, and the conclusion is drawn in Section 5.
2 Related Work

2.1 Location Determination Approach

The location determination approaches used in mobile computing systems can be classified into three categories. The first category applies the Time of Arrival (TOA) or Time Difference of Arrival (TDOA) schemes to locate mobile terminals. The principle of TOA and TDOA is to estimate the distance between the receiver and each sender from the traveling time of the signal from the senders to the receivers, and then to calculate the position of the receiver with the help of the known positions of three senders. The second category applies the Angle of Arrival (AOA) scheme to locate mobile terminals. The principle of AOA is to estimate the angle of the arriving signal and then calculate the position of the sender from the known positions of the receivers and the angles of the arriving signals detected by each receiver. The last category utilizes the attenuation of the Received Signal Strength (RSS) of nearby senders to locate mobile terminals.

Each category of approaches has its advantages and disadvantages. Although TOA and TDOA can produce more accurate location determination results, these technologies often require the senders to be equipped with extremely accurate synchronized timers. Besides, the distance between the senders should be large enough to ensure that the differences in the arrival times of the location determination signals are distinguishable. These constraints make the TOA and TDOA approaches inappropriate for indoor location determination. The AOA approach, on the other hand, requires the receiver to be able to detect the direction of arriving signals, which requires the access point (AP) to be equipped with extra components such as smart antennas. Besides, the reflection caused by indoor structures such as walls and pillars often leads to inaccurate location determination results.

2.2 WLAN Location Determination Systems

As 802.11-based wireless LANs become more ubiquitous, the importance of WLAN location determination systems [11-14,16-19] increases. Such systems are purely software based and therefore add to the value of the wireless network. A large class of applications [15], including location-sensitive content delivery, direction finding, asset tracking, and emergency notification, can be built on top of such systems. This set of applications can be broadened as the accuracy of WLAN location determination systems increases.

WLAN location determination systems usually work in two phases: an offline training phase and an online location determination phase. During the offline phase, the signal strength received from the access points (APs) at selected locations in the area of interest is tabulated, resulting in a so-called radio map. During the location determination phase, the signal strength samples received from the access points are used to "search" the radio map to estimate the user location. Radio-map based techniques fall into two broad categories: deterministic techniques and probabilistic techniques. Deterministic techniques [11,12,17] represent the signal strength of an access point at a location by a scalar value, for example the mean value, and use non-probabilistic approaches to estimate the user location. For example, in the Radar system [11,12] the authors use nearest-neighborhood techniques to infer the user location. Probabilistic techniques [13,14,18,19], on the other hand, store information about the signal strength distributions from the access points in the radio map and use probabilistic methods to estimate the user location. For example, the Horus system uses a Bayesian-based approach, and Youssef et al. (2005) use a multivariate probabilistic analysis to estimate the user location. WLAN location determination systems need to deal with the noisy characteristics of the wireless channel to achieve higher accuracy.

In this paper, we use the RSS approach. Its advantage is that it can be easily applied: we can obtain the signal strength of the access points at the mobile terminal in any network that supports the 802.11 protocol. If we can locate objects with this information, this approach is surely the most cost-efficient one. The disadvantage of the RSS approach is that the environment can easily influence the signal strength, which is more serious indoors.
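To make the two-phase workflow concrete, the sketch below shows one way the offline training phase of a deterministic radio map might look in code. It is only an illustration under our own assumptions, not code from any of the cited systems; the class name `RadioMap`, the method names, and the dBm figures are all hypothetical.

```python
from collections import defaultdict

class RadioMap:
    """Deterministic radio map: reference point -> mean SS per access point."""

    def __init__(self):
        # Raw survey samples: position -> AP id -> list of SS readings (dBm).
        self._samples = defaultdict(lambda: defaultdict(list))

    def record_sample(self, position, ss_by_ap):
        """Offline phase: tabulate one SS reading taken at a known position."""
        for ap, ss in ss_by_ap.items():
            self._samples[position][ap].append(ss)

    def signature(self, position):
        """Power signature (s_i(a_1), ..., s_i(a_n)) as mean SS per AP."""
        return {ap: sum(v) / len(v) for ap, v in self._samples[position].items()}

# Offline survey: two readings taken at the reference point (x=0, y=0).
rm = RadioMap()
rm.record_sample((0, 0), {"ap1": -48, "ap2": -61, "ap3": -77})
rm.record_sample((0, 0), {"ap1": -50, "ap2": -59, "ap3": -75})
print(rm.signature((0, 0)))  # {'ap1': -49.0, 'ap2': -60.0, 'ap3': -76.0}
```

Averaging repeated readings per reference point is the simplest deterministic choice; a probabilistic technique would instead keep the full per-AP distribution.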
3 Modified Nearest Neighbor(s) in Signal Space Algorithm 3.1 Nearest Neighbor(s) in Signal Space (NNSS) Algorithm NNSS is a kind of RSS approach [8]. We first maintain the power signature of a set of positions. For position i, we define (si(a1), si (a2), … , si (an)) as the power signature, si (aj) denotes the signal strength (SS) received from access point aj at position i, and n is the count of the APs. The position whose power signature is maintained in the database is called a reference point. We define (si’(a1), si’(a2), … , si’(an)) as the measured power signature actually, si’(aj) denotes one mobile terminal receive the SS from access point aj at position i currently. Then we compare the power signature measured by the mobile terminal with the data recorded in the database, and then estimate the position of the mobile terminal. When we estimate the mobile terminal’s position, we determine the location that best matches the observed SS of the mobile terminal. We need a metric and a search methodology to compare multiple locations and pick the one that best matches the observed signal strength. The idea is to compute the distance (in signal space)
between the observed set of SS measurements and the recorded SS at the reference points, and then pick the location that minimizes the distance. We can use the Euclidean distance measure, i.e.,
$$E_d = \sum_{j=1}^{n} \left( s_i'(a_j) - s_i(a_j) \right)^2 . \qquad (1)$$
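To make the search concrete, the following minimal Python sketch (our illustration; the radio-map data and function names are hypothetical, not from the paper) evaluates the signal-space distance of Equation (1) against every reference point and returns the closest one.

```python
import numpy as np

def nnss_locate(radio_map, measured):
    """Return the reference point whose recorded power signature
    minimizes the signal-space distance of Equation (1)."""
    best_point, best_dist = None, float("inf")
    for point, signature in radio_map.items():
        dist = float(np.sum((measured - signature) ** 2))  # Equation (1)
        if dist < best_dist:
            best_point, best_dist = point, dist
    return best_point

# Toy radio map: two reference points, SS from n = 3 APs (in dBm).
radio_map = {(0, 0): np.array([-40.0, -55.0, -70.0]),
             (0, 1): np.array([-45.0, -50.0, -68.0])}
print(nnss_locate(radio_map, np.array([-44.0, -52.0, -69.0])))  # -> (0, 1)
```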
3.2 The Principle of MNNSS
To illustrate our method, we assume the situation shown in Fig. 1.
Fig. 1. The 8 reference points around reference point m; s1–s8 denote the SS of the 8 neighbors of m. When we want to estimate whether a position is the nearest neighbor of the mobile terminal, we should consider not only the SS of the origin point m but also the SS of the reference points around that position.
In MNNSS, we define l layers around each reference point. We calculate the Euclidean distance of the reference points of each layer respectively, average these results with weight values, and then use the new average value to evaluate the position. When we evaluate reference point i, we must calculate the Euclidean distance of each layer around it. For reference point i, i(u,v) denotes neighbor v in layer u of reference point i. In layer 1, there is only one reference point (i.e., i itself), and we calculate the Euclidean distance S1(i) of layer 1 according to the approach described in the NNSS algorithm:
$$S_1(i) = \sum_{j=1}^{n} \left( s_i'(a_j) - s_i(a_j) \right)^2 . \qquad (2)$$
Analogously, in layer u (u ≥ 2),
$$S_u(i) = \frac{1}{8(u-1)} \sum_{v=1}^{8(u-1)} \sum_{j=1}^{n} \left( s'(a_j) - s_{i(u,v)}(a_j) \right)^2 . \qquad (3)$$
Su(i) is the average Euclidean distance in layer u around reference point i. As mentioned before, sometimes we cannot measure the signal strength at a particular position, so the actual number of reference points in layer u may be less than 8(u−1). Therefore, we should replace 8(u−1) with the actual number in formula (3).
Fig. 2. The layers around reference point O. We define 3 layers. Layer 1 is the position O itself. In layer 2, there are 8 reference points around the position O, and there are 16 reference points in layer 3. Analogously, layer u has 8(u−1) reference points. But sometimes we cannot measure the signal strength at a particular position, so the actual number of reference points in layer u may be less than 8(u−1). Thus, we must replace 8(u−1) with the actual number in the following formula.
Then we define:
$$S(i) = \frac{1}{n} \sum_{u=1}^{n} w_u S_u(i) , \qquad (4)$$
in which $\sum_{u=1}^{n} w_u = 1$. Here wu is the weight value; it denotes how important layer u is in estimating the result, and n denotes the number of layers. We can use different sequences of wu in different applications, but obviously wu must be a decreasing sequence, because the layers near the center should play a more important role in the calculation. We then choose the position where we obtain the minimum of S(i) as the nearest neighbor of the mobile terminal.
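As a concrete illustration of Equations (2)-(4), the sketch below (ours; the data layout is an assumption) computes S(i) for one reference point given the recorded signatures of its layers, skipping missing neighbors as the text prescribes.

```python
import numpy as np

def mnnss_score(measured, layer_signatures, weights):
    """Weighted layer score S(i) of Equation (4).

    layer_signatures[u] holds the recorded signatures of the reference
    points in layer u+1 (layer 1 contains only point i itself); missing
    neighbors are simply left out, so each layer averages over the
    reference points actually available, as required by the text.
    """
    score = 0.0
    for w_u, sigs in zip(weights, layer_signatures):
        # Average Euclidean distance of this layer, Equations (2)-(3).
        s_u = np.mean([np.sum((measured - s) ** 2) for s in sigs])
        score += w_u * s_u
    return score / len(layer_signatures)

# The estimated position is then the reference point i minimizing S(i).
```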
4 Experiments and Results
The experiments were carried out on the sixth floor of the Department of Computer Science, Tianjin University of Technology. The client is implemented on an Intel-processor laptop with a wireless network card. The OS of the laptop is Linux Red Flag 4.0.
We placed 2 APs on the fifth floor and one AP on the sixth floor. We divided the whole area into 11 regions: rooms 601–609, the corridor and the lobby. Fig. 3 shows the layout of the sixth floor of the Department of Computer Science, Tianjin University of Technology.
Fig. 3. The layout of the sixth floor of the Department of Computer Science, Tianjin University of Technology
The performance of the location determination algorithm was measured using the metrics of error distance and correctness of region classification. The error distance was defined as the spatial distance between the actual position and the position calculated by the location determination system.
Table 1. The error distance in each position (NNSS)
Testing point       P1    P2    P3    P4    P5    P6    P7    P8    P9    P10   P11
Error distance (m)  2.45  2.13  1.89  1.78  1.57  3.65  3.22  2.33  0.78  2.45  1.78
Testing point       P12   P13   P14   P15   P16   P17   P18   P19   P20   P21   P22
Error distance (m)  0.98  1.14  2.51  2.21  3.20  1.29  2.43  1.34  1.56  2.34  3.21
In our experiment, we defined 178 reference points in the whole area. At each reference point, we measured the power signature 100 times and stored the average. We then selected 22 testing positions over the whole region (2 points in each region) and recorded their actual locations and received signal strengths. We first used NNSS to position the mobile terminals. NNSS classified all the testing positions into their regions correctly except 2 positions, and the mean error distance was 2.10 meters. Table 1 shows the error distance at each position. We then used MNNSS to position the mobile terminals. For each comparison, we used 2 layers and set the weight values w1 = 1, w2 = 0.5. MNNSS corrected the two errors and classified all the testing positions correctly. The mean error distance was 1.47 meters. Table 2 shows the error distance at each position.
Table 2. The error distance in each position (MNNSS)
Testing point       P1    P2    P3    P4    P5    P6    P7    P8    P9    P10   P11
Error distance (m)  1.25  2.13  1.43  1.78  1.57  0.59  1.35  2.33  0.98  1.23  1.78
Testing point       P12   P13   P14   P15   P16   P17   P18   P19   P20   P21   P22
Error distance (m)  0.98  1.14  2.51  1.38  0.75  1.29  2.43  1.34  1.56  2.34  0.26
5 Conclusions
This paper discussed how to use MNNSS to determine object locations in an 802.11-based location determination system. We assumed that all data needed for location determination had been acquired beforehand, and we did not discuss how to set up the APs or how to select the reference points; we laid emphasis on the location determination algorithm. NNSS can position objects using only the received signal strength measured by each mobile terminal. This algorithm is cost-efficient, and we do not need to apply many modifications to the communication devices of the mobile terminals or the service provider. MNNSS is a modification of NNSS. In MNNSS, the disturbance of incidental events on the location determination results can be reduced by using more reference points in each comparison. Further research can focus on the following aspects. Using more layers in the calculation can provide more location determination accuracy, but the cost of calculation will increase; how to balance them is a direction of further research. We are sure that the weight values used in averaging the Euclidean distances must be decreasing, but how to define them is still an important problem. If the sequence decreases too sharply, the advantage of MNNSS will be weakened, and MNNSS will not be very different from NNSS. If it decreases too slowly, the outer layers will be as important as the central layers, which is not reasonable.
Acknowledgements
This work was supported by the Tianjin Municipal Education Commission, “the Higher Education Institution Science and Technology Development Fund” (No. 20041615).
References
1. Bahl, P., Balachandran, A., Padmanabhan, V.N.: Enhancements to the RADAR User Location and Tracking System. Microsoft Research Technical Report, February (2000)
2. Robert, J.O., Abowd, G.D.: The Smart Floor: A Mechanism for Natural User Identification and Tracking. Proceedings of the 2000 Conference on Human Factors in Computing Systems (CHI 2000), The Hague, Netherlands (2000) 1-6
3. Priyantha, N.B., Chakraborty, A., Balakrishnan, H.: The Cricket Location-Support System. Proc. 6th ACM MOBICOM, Boston, MA (2000) 32-43
4. Mitchell, S., et al.: Context-Aware Multimedia Computing in the Intelligent Hospital. (2000) 13–18
5. Liu, T., Bahl, P.: Mobility Modeling, Location Tracking, and Trajectory Prediction in Wireless ATM Networks. IEEE JSAC, Vol. 16 (1998) 922–936
6. Enge, P., Misra, P.: Special Issue on GPS: The Global Positioning System. Proceedings of the IEEE, 87 (1999) 3–15
7. Garmin Corp.: About GPS. Website, 2001, http://www.garmin.com/aboutGPS/
8. Bahl, P., Padmanabhan, V.N.: RADAR: An RF-Based In-Building User Location and Tracking System. Proc. IEEE Infocom (2000) 236-241
9. Jin, M.H., Wu, E.H.K., Liao, Y.B., Liao, H.C.: 802.11-based Positioning System for Context Aware Applications. Proceedings of Communication Systems and Applications (2004) 236-239
10. Lionel, M.N., Liu, Y.H., Lau, Y.C., Abhishek, P.P.: LANDMARC: Indoor Location Sensing Using Active RFID. Proceedings of the First IEEE International Conference on Pervasive Computing and Communications (PerCom’03) (2003) 239-249
11. Bahl, P., Padmanabhan, V.N.: RADAR: An In-Building RF-based User Location and Tracking System. In IEEE Infocom 2000, Vol. 2 (2000) 775–784
12. Bahl, P., Padmanabhan, V.N., Balachandran, A.: Enhancements to the RADAR User Location and Tracking System. Technical Report MSR-TR-00-12, Microsoft Research (2000)
13. Castro, P., Chiu, P., Kremenek, T., Muntz, R.: A Probabilistic Location Service for Wireless Network Environments. Ubiquitous Computing 2001, September (2001)
14. Castro, P., Muntz, R.: Managing Context for Smart Spaces. IEEE Personal Communications (2000) 412-421
15. Chen, G., Kotz, D.: A Survey of Context-Aware Mobile Computing Research. Dartmouth Computer Science Technical Report TR2000-381 (2000)
16. Ganu, S., Krishnakumar, A.S., Krishnan, P.: Infrastructure-based Location Estimation in WLAN Networks. In IEEE Wireless Communications and Networking Conference, March (2004) 236-243
17. Krishnan, P., Krishnakumar, A., Ju, W.H., Mallows, C., Ganu, S.: A System for LEASE: Location Estimation Assisted by Stationary Emitters for Indoor RF Wireless Networks. In IEEE Infocom, March (2004) 39-42
18. Ladd, A.M., Bekris, K., Rudys, A., Marceau, G., Kavraki, L.E., Wallach, D.S.: Robotics-Based Location Sensing Using Wireless Ethernet. In 8th ACM MOBICOM, Atlanta, GA, September (2002) 69-72
19. Roos, T., Myllymaki, P., Tirri, H.: A Statistical Modeling Approach to Location Estimation. IEEE Transactions on Mobile Computing, Vol. 1 (2002) 59–69
A Face Recognition System on Distributed Evolutionary Computing Using On-Line GA Nam Mi Young, Md. Rezaul Bashar, and Phill Kyu Rhee Dept. of Computer Science & Engineering, Inha University 253, Yong-Hyun Dong, Nam-Gu Incheon, South Korea {rera, bashar}@im.inha.ac.kr,
[email protected]
Abstract. Although there is much research on face recognition, some limitations still remain, especially regarding illumination and pose. This paper addresses a novel framework to overcome the illumination barrier and build a robust vision system. The key ideas of this paper are distributed evolutionary computing and on-line GA, which combine the concepts of context-awareness and genetic algorithms. This research implements Fuzzy ART, which carries out context-awareness, modeling, and identification of the context environment, so that the system can also distinguish changing environments. On-line GA stores experiences to build context knowledge that is used for on-line adaptation. Finally, supervised learning is applied to carry out the recognition experiments. Experimental results on the FERET data set show that On-line GA based face recognition performs significantly better than the application of existing GA classification.
1 Introduction
For high-security purposes, biometric technologies are being developed, and face recognition is the basic and elementary step in deploying these technologies. To build a high-security application, the accuracy and efficiency of the system must have robust, tolerant, and error-free characteristics. The increasing use of biometric technologies in high-security applications and beyond has stressed the requirement for highly dependable face recognition systems. Face recognition specialists are concentrating on making recognition systems more error free. A survey in [1] shows that the accuracy of state-of-the-art algorithms is fairly high under constrained conditions, but degrades significantly for images exhibiting pose, illumination and facial expression variations. Current research efforts on face recognition strive to achieve insensitivity to such variations following three main directions [2]: (a) introduction of new classification techniques and similarity measurement analysis, (b) compensation of appearance variations, and (c) reinforcement of existing systems with additional modalities that are insensitive to these variations. Knowledge or experience plays a vital role in the accuracy and efficiency of a recognition system. A robust system is expected to recognize an object within a minimum time period. Tremendous current research interest focuses on the optimization of
execution time together with the optimization of functions having a large number of variables, which is referred to as evolutionary computation [3]. Genetic Algorithms (GA) [4, 5, 6], with their strong capability for learning and encoding knowledge in terms of biological chromosomes, create a human-like brain for recognizing objects. However, it takes a lot of time to produce this brain. To overcome this limitation, this paper proposes the concepts of distributed evolutionary computing and on-line GA (OLGA). The main focuses of this paper are to exploit an on-line genetic algorithm for on-line evolution under illumination variations, and distributed evolutionary computing to build a robust recognition scheme with very low time cost, especially in new environments. The efficiency and robustness of the proposed system are demonstrated on a standard face dataset (FERET) of significant size and compared with state-of-the-art compensation techniques. In the following sections we review the related work in this field by different researchers at different times, highlighting the novelties of the proposed work. Previous work related to the illumination barrier is discussed in Section 2, while evolutionary computing is briefly described in Section 3. Section 4 describes the experimental environments and Section 5 illustrates a design example. The performance of the algorithms is evaluated with extensive experiments in Section 6, and Section 7 presents the concluding remarks.
2 Previous Work
An image-capturing device captures a high-quality image in the presence of light: if there is an adequate amount of light, there are pleasant pictures and a vigorous recognition system. Much research [1, 7] has found that varying illumination seriously affects the performance of face recognition systems. At the same time, face recognition experts are showing more interest in the problem of coping with illumination variations, and significant progress has been achieved. Several techniques have been proposed in this area, which may be roughly classified into two main categories [2]. The first category contains techniques looking for illumination-insensitive representations of face images. In this category, different preprocessing and filtering techniques are used to eradicate the illumination variation. For example, Hong Liu et al. [8] proposed a multi-method integration (NMI) scheme using grey, log, and normal histogram techniques to compensate for variations of illumination. Jinho Lee et al. [9] generated an illumination subspace for arbitrary 3D faces based on the statistics of measured illuminations under variable lighting circumstances. In their experiment, bilinear illumination model and shape-specific illumination subspace techniques were employed and applied to the FRGC dataset. Marios Savvides et al. [10] presented an illumination-tolerant face recognition system using minimum average correlation energy (MACE) combined with PCA. Laiyun et al. [6] modeled a face recognition system with a relighting process and applied it to the CMU-PIE database. Phill Kyu Rhee et al. [11] developed a context-aware evolvable system with the concept of a basic genetic algorithm under dynamic and uneven environments. The second approach is based on the development of generative appearance models, like the active shape model and active appearance model, which are able to reconstruct novel gallery images similar to the illumination in the probe images.
In parallel to these efforts, computer graphics scientists have achieved significant progress in realistic image-based rendering and relighting of faces and in the estimation of the reflectance properties of faces [6]. This research inspired computer vision work on illumination compensation. We follow the first approach in this paper: since we try to relight the probe image so that it resembles the illumination in the gallery images, we propose preprocessing and retinex [11, 12] filtering to generate a convenient image. Fuzzy Adaptive Resonance Theory [12] is exploited to categorize objects under variant illumination.
3 Distributed Evolutionary Computing (DEC)
Over the last decade, there has been tremendous interest in the development of the theory and applications of evolutionary computing [3, 5] techniques, both in industry and in the laboratory. Evolutionary computing (EC) is the collection of algorithms based on the evolution of a population towards a solution of a certain problem. These algorithms have been exploited successfully in many applications requiring the optimization of a certain multidimensional function. The population of possible solutions evolves from one generation to the next, ultimately arriving at a satisfactory solution to the specified problem. These algorithms differ in the way a new population is generated from the existing one and in the way the members are represented within the algorithm. Three types [5] of evolutionary computing techniques have been widely reported recently: genetic algorithms, genetic programming, and evolutionary algorithms (EA). The EAs can be divided into evolutionary strategies (ES) and evolutionary programming. A Genetic Algorithm (GA) is a search technique used to find approximate solutions to optimization and search problems that relies on a linear representation of genetic material: genes, or genotypes [4, 5]. In GA, a candidate solution for a specific problem is called an individual or a chromosome, made up of genes and represented by a binary string. To manipulate the genetic composition of a chromosome, GAs use three types of operators: selection, crossover and mutation (a generic sketch is given below). The term DEC refers to the technique where chromosomes reside at a distant place, not in the executing system. DEC makes the system more convenient and faster. The new OLGA technique related to GA is also introduced to extend the existing GA and make it more efficient.
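For reference, the sketch below (our generic illustration, not the paper's OLGA) shows the three operators on bit-string chromosomes: binary tournament selection, one-point crossover, and bit-flip mutation.

```python
import random

def genetic_algorithm(fitness, n_bits, pop_size=20, generations=50,
                      cx_rate=0.8, mut_rate=0.01):
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: binary tournament on fitness.
        def select():
            a, b = random.sample(pop, 2)
            return list(a) if fitness(a) >= fitness(b) else list(b)
        children = []
        while len(children) < pop_size:
            p1, p2 = select(), select()
            if random.random() < cx_rate:           # one-point crossover
                cut = random.randrange(1, n_bits)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):                  # bit-flip mutation
                children.append([b ^ (random.random() < mut_rate) for b in child])
        pop = children[:pop_size]
    return max(pop, key=fitness)

# Example: maximize the number of ones in a 16-bit chromosome.
print(genetic_algorithm(sum, n_bits=16))
```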
4 Proposed Experimental Environment
The proposed scheme works in two phases: categorizing the environmental context into clusters and recognizing the individual objects within a cluster.
4.1 Environmental Context-Awareness
Environmental context-awareness is achieved by means of environmental context data, which is defined as any observable and relevant attributes and their interaction with other entities and/or the surrounding environment at an instant of time [11].
For identifying and categorizing environmental context data, Fuzzy Adaptive Resonance Theory (FART), a variation of the first-generation ART [12] algorithm, is adopted. The first ART, named ART1, works with binary inputs, while FART is a synthesis of the ART algorithm and fuzzy operators that allows both binary and continuous input patterns. The image space of object instances with varying illuminations must be clustered properly so that the location error can be minimized. Thus, the FART method, which shows robustness in subjective and ambiguous applications, is preferred for adaptation in order to achieve optimal illumination context clustering. The performance of clustering is improved by observing previously clustered data repeatedly. For example, if a dynamic environment has Obj = {O1, O2, …, On} individual objects, then the FART system produces clusters CLS1, CLS2, …, CLSm, where CLSi = {O1, O2, …, Oj}, j < n and CLSi ⊆ Obj.
4.2 On-Line Genetic Algorithm (OLGA)
The designed OLGA operates in two modes: the evolutionary mode and the action mode. In the evolutionary mode, it accumulates knowledge by exploring its application environments, while in action mode it performs its designated task using the accumulated knowledge. For example, a system requires time t for the evolutionary mode and starts the action mode after time t. The evolutionary mode can use online or offline adaptation. For offline adaptation, the environmental context is categorized or identified according to some predefined characteristics (here, illumination), and a genetic algorithm is employed for learning. For online adaptation, when a new context is encountered, it directly interacts with the action mode. Whenever an application environment changes, the system accumulates and stores environmental context knowledge in terms of context categories and their corresponding actions. FART has the capability of on-line learning, which introduces clustering for an on-line system in a dynamic environment. For on-line learning, along with the usual work of separating environmental contexts, FART looks for unknown types of clusters; if it finds one, it makes a new cluster. In Fig. 1, the context category module (CCM) performs these operations. Initially, the system accumulates knowledge and stores it in the context knowledge (CK), which guarantees optimal performance for each identified context. The CK stores the expressions of identifiable contexts and their matched actions, which are performed by the adaptation module (AM) and consist of one or more action primitives, i.e., preprocessing, feature representation, etc. This knowledge is also stored at a server to build an extensive knowledge database, so that when a new context arrives for recognition, the executing system can share knowledge from the server. The matched or provided action can be decided by either experimental trial-and-error or some automated procedure. At operation time, the context expression is determined from the derived context representation, where the derived context is decided from the context data. The evolution action module (EAM) searches for the best combining structure of action primitives for an identified context. These action primitives are stored in the CK with the corresponding context expression.
Fig. 1. On-line learning
OLGA works in two phases. First it performs off-line evolution to accumulate environmental context knowledge and store it in the CK, and then it performs on-line evolution. During off-line evolution, the CCM categorizes the environmental context into clusters, and the EAM searches for the best population and, if one is found, updates the CK as shown in Fig. 2.
Fig. 2. Off-line evolution
The adaptive task is carried out using the knowledge of the CK evolved in the evolutionary mode, and then the action mode is performed. For on-line evolution, when new context data is found, the system creates a new cluster and collects data, searches for a matching existing cluster, selects the action primitives if a match is found and otherwise sends a request to the server to provide primitives, performs those action primitives with the help of the EAM and CK, and finally updates the CK as shown in Fig. 3 (see the sketch below).
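The on-line evolution flow of Fig. 3 can be summarized by the following sketch (entirely our pseudocode-style illustration; ccm, ck, eam and server are hypothetical stand-ins for the paper's CCM, CK, EAM and knowledge server).

```python
def online_evolution(context_data, ccm, ck, eam, server):
    """One on-line evolution step, following Fig. 3."""
    cluster = ccm.match(context_data)            # search existing clusters
    if cluster is None:                          # new context encountered
        cluster = ccm.create_cluster(context_data)
        actions = server.request_primitives(cluster)  # ask the server
    else:
        actions = ck.lookup(cluster)             # reuse stored knowledge
    result = eam.evolve_and_run(cluster, actions)     # best action combination
    ck.update(cluster, result)                   # accumulate context knowledge
    return result
```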
Fig. 3. On-line evolution
5 Design Example
The AM of OLGA consists of three stages: preprocessing, feature extraction and classification. The action primitives in the preprocessing stage are histogram equalization, contrast stretching and retinex [12]. The action primitives in the feature extraction stage are PCA and Gabor representation [12], and finally cosine distance measurement is used for classification. The proposed framework is applied in the field of visual information processing, i.e., face recognition. Face images with different illuminations are preferred for this experiment due to their spatial boundaries, which make it easy to distinguish among the
Fig. 4. Example of face images clustered on different illumination
Fig. 5. The visualization of face image clusters
environmental contexts. In this research, input images with 128 × 128 spatial resolution and 256 gray levels are considered, together with a hybrid vectorization technique [12]. FART constructs clusters according to the variation of illumination using the hybrid vectorization technique, as shown in Fig. 4, and Fig. 5 shows the visualization of the clustering result.
6 Experimental Results
We have conducted several experiments to evaluate the effectiveness of the proposed OLGA in the area of face recognition. Two variants of GA, off-line GA (the usual GA) and on-line GA, are examined in the experiments with respect to clusters and time. An extensive number of FERET face images, with normal illumination (fafb) and bad illumination (fafc), are employed to create artificial environmental contexts and an artificial DEC. First, experiments on off-line GA are presented in Fig. 6, where FART has constructed 9 types of clusters.
Fig. 6. Performance of face recognition with off-line GA
The recognition rate of the proposed OLGA-based face recognition for a real-time system is shown in Table 1. In the first span of time, for gathering information from environmental data, 6 clusters are encountered for off-line evolution, while 9 and 13 clusters are encountered for the real-time system. Fig. 7 describes the recognition rate for off-line GA. Initially the system has accumulated knowledge from the environmental context through offline evolution and it
Table 1. Face recognition ratio and cluster number according to time for the proposed OLGA method
Illumination context   Time 0    Time 1    Time 2
Cluster 0              96.09%    94.41%    96.42%
Cluster 1              97.45%    97.65%    100.00%
Cluster 2              96.94%    97.33%    99.05%
Cluster 3              95.27%    96.45%    95.65%
Cluster 4              99.36%    94.64%    97.96%
Cluster 5              92.62%    95.96%    97.53%
Cluster 6              -         96.46%    97.78%
Cluster 7              -         100.00%   91.48%
Cluster 8              -         97.22%    98.99%
Cluster 9              -         -         96.19%
Cluster 10             -         -         97.85%
Cluster 11             -         -         96.70%
Cluster 12             -         -         -
Average                96.29%    96.68%    97.13%
Fig. 7. Face recognition rate over time for off-line GA
produces more than 96% accuracy; however, when a lot of context categories are present, it takes comparatively more time for evolution, and as a result the recognition rate decreases. Fig. 8 describes the recognition rate for OLGA. Having gathered knowledge from offline evolution for clusters 0 to 5, the on-line evolution starts, and for some time it achieves better performance than the previous offline system. Later, as the number of contexts increases, the recognition rate decreases; when the evolution is finished, it reaches its highest recognition rate. Finally, Fig. 9 shows the comparison between the on-line and off-line GA based face recognition systems, where OLGA shows better performance than off-line GA.
Fig. 8. Face recognition rate for on-line GA
Fig. 9. Comparison between on-line GA and off-line GA
7 Conclusion
This paper contributes to the effort towards robust face recognition systems by describing the new concept of OLGA in the area of dynamic environmental objects and
the concept of DEC to make the system efficient. The proposed system not only produces a highly robust, real-time face recognition system on images categorized by different illuminations, but also establishes a new concept of OLGA that reduces the execution time of the traditional genetic algorithm while achieving higher performance. As demonstrated by extensive experimental evaluation, the proposed OLGA leads to superior face recognition rates.
References
1. Bowyer, K.W., Chang, K., Flynn, P.: A Survey of Approaches and Challenges in 3D and Multi-modal 3D + 2D Face Recognition. Computer Vision and Image Understanding, Vol. 101, Issue 1, January (2006) 1-15
2. Malassiotis, S., Strintzis, M.G.: Robust Face Recognition Using 2D and 3D Data: Pose and Illumination Compensation. Pattern Recognition, Vol. 32, Issue 2, December (2005) 28-39
3. Tang, K.W., Jarvis, R.A.: An Evolutionary Computing Approach to Generating Useful and Robust Robot Team Behaviors. IEEE International Conference on Intelligent Robots and Systems, Sendai, Japan, September 28 - October 2 (2004)
4. Juang, C.-F.: Combination of Online Clustering and Q-Value Based GA for Reinforcement Fuzzy System Design. IEEE Transactions on Fuzzy Systems, Vol. 13, No. 3, June (2005) 6-124
5. Vonk, E., Jain, L.C., Hibbs, R.: Integrating Evolutionary Computation with Neural Networks. IEEE Conference 0-8186-7085 (1995) 1-95
6. Qing, Laiyun, Shan, Shiguang, Gao, Wen, Du, Bo: Face Recognition Under Generic Illumination Based on Harmonic Relighting. International Journal of Pattern Recognition and Artificial Intelligence, Vol. 19, No. 4 (2005) 513-531
7. Phillips, P.J., Moon, H., Rauss, P.J., Rizvi, S.: The FERET Evaluation Methodology for Face Recognition Algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 22 (2000) 1090–1100
8. Lee, Jinho, et al.: A Bilinear Illumination Model for Robust Face Recognition. 10th IEEE International Conference on Computer Vision (ICCV’05) (2005)
9. Liu, H., et al.: Illumination Compensation and Feedback of Illumination Feature in Face Detection. Proc. International Conferences on Information-technology and Information-net, Beijing, Vol. 3 (2001) 444-449
10. Savvides, M., et al.: Corefaces - Robust Shift Invariant PCA based Correlation Filter for Illumination Tolerant Face Recognition. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’04) (2004)
11. Rhee, Phill Kyu, et al.: Context-Aware Evolvable System Framework for Environment Identifying Systems. KES 2005 (2005) 270-283
12. Nam, Mi Young, et al.: Hybrid Filter Fusion for Robust Visual Information Processing. KES 2005 (2005) 186-194
A Fuzzy Kohonen’s Competitive Learning Algorithm for 3D MRI Image Segmentation Jun Kong1, 2, ∗, Jianzhong Wang1, 2, Yinghua Lu1, Jingdan Zhang1, and Jingbo Zhang1 1
Computer School, Northeast Normal University, Changchun, Jilin Province, China 2 Key Laboratory for Applied Statistics of MOE, China {kongjun, wangjz019, luyh, zhangjd358}@nenu.edu.cn
Abstract. Kohonen's self-organizing feature map (SOFM) is a two-layer feedforward competitive learning network that has been used as a competitive learning clustering algorithm in brain MRI image segmentation. However, most brain MRI images present overlapping gray-scale intensities for different tissues. In this paper, fuzzy methods are integrated with Kohonen's competitive algorithm to overcome this problem (we name the algorithm F_KCL). The F_KCL algorithm fuses competitive learning with the fuzzy c-means (FCM) cluster characteristic and can improve the segmentation result effectively. Moreover, in order to enhance the robustness to noise and outliers, a kernel-induced method is exploited in our study to measure the distance between the input vector and the weights (KF_KCL). The efficacy of our approach is validated by extensive experiments using both simulated and real MRI images.
1 Introduction
In recent years, various imaging modalities have become available for acquiring complementary information on different aspects of anatomy. Examples are MRI (Magnetic Resonance Imaging), ultrasound, and X-ray imaging including CT (Computed Tomography). Moreover, with the increasing size and number of medical images, the use of computers to facilitate their processing and analysis has become necessary [1]. Many issues inherent to medical images make segmentation a difficult task. The objects to be segmented from a medical image are true (rather than approximate) anatomical structures, which are often non-rigid and complex in shape, and exhibit considerable variability from person to person. Moreover, there are no explicit shape models yet available for fully capturing the deformations in anatomy. MRI produces high contrast between soft tissues and is therefore useful for detecting anatomy in the brain. Segmentation of brain tissues in MRI images plays a crucial role in three-dimensional (3-D) volume visualization, quantitative morphometric analysis and structure-function mapping for both scientific and clinical investigations.
Corresponding author. This work is supported by the Science Foundation for Young Teachers of Northeast Normal University, No. 20061002, China.
Because of the advantages of MRI over other diagnostic imaging [2], the majority of research in medical image segmentation pertains to its use for MR images, and there are many methods available for MRI image segmentation [1]. Image segmentation is a way to partition image pixels into similar regions. Clustering methods are tools for partitioning a data set into groups of similar characteristics; thus, clustering algorithms are naturally applied to image segmentation [4] [5]. However, uncertainty is widely present in MRI image data because of the noise and blur in acquisition and the partial volume effects originating from the low sensor resolution. In particular, the transitional regions between tissues are not clearly defined and their membership is intrinsically vague. Therefore, fuzzy clustering methods such as Fuzzy C-Means (FCM) are particularly suitable for MRI segmentation [6] [7]. However, these FCM-based algorithms are sensitive to noise and depend on the weighting exponent parameter without a learning scheme [8] [9]. Conversely, neural-network-based segmentation can be used to overcome these adversities [10] [11] [12]. Among these neural network techniques, Kohonen's self-organizing map (SOM) is used most in MRI segmentation [13] [14] [15]. In this paper we address the segmentation problem in the context of isolating the brain tissues in MRI images. Kohonen's self-organizing feature map (SOFM) is exploited as a competitive learning clustering algorithm in our work. However, the transitional regions between tissues in MRI images are not clearly defined, and the noise in the image leads to further degradation of the segmentation. Therefore, fuzzy methods and kernel methods are integrated with Kohonen's competitive algorithm in this study to overcome the above problems. The rest of this paper is organized as follows. Section 2 presents the fuzzy Kohonen's competitive algorithm (F_KCL). The kernel-induced distance measure is incorporated into the F_KCL algorithm by replacing the Euclidean distance (KF_KCL) in Section 3. Experimental results are presented in Section 4 and we conclude this paper in Section 5.
2 Fuzzy Kohonen's Competitive Algorithm
2.1 Conventional Kohonen's Competitive Learning Algorithm
The SOFM consists of an input layer and a single output layer of nodes which usually form a two-dimensional array. The training of SOFM is usually performed using Kohonen's competitive learning (KCL) algorithm [16]. There are two phases of operation: the similarity matching phase and the weight adaptation phase. Initially, the weights are set to small random values and a vector is presented to the input nodes of the network. During the similarity matching phase, the distances dj between the inputs and the weights are computed as follows:
$$d_j = \| x_i - w_{ij} \|^2 , \quad j = 1, 2, \ldots, M . \qquad (1)$$
where xi is the ith input vector of X = (x1, x2, …, xN), N is the number of the input vectors, M is the number of output nodes, and wij is the weight from input node i to output node j. Next, the output node g having the minimum distance dg is chosen and is declared as the “winner” node. In the weight adaptation phase, the weights from the
inputs to the “winner” node are adapted. The weight changes are based on the following rule:
$$w_{ij}(t+1) = w_{ij}(t) + a(t)\,h_{ij}(t)\,(x_i - w_{ij}(t)) , \qquad (2)$$
with
$$h_{ij}(t) = \begin{cases} 1 & \text{if } \|x_i - w_{ij}(t)\| = \min_{1 \le n \le M} \|x_i - w_{in}(t)\| \\ 0 & \text{otherwise.} \end{cases} \qquad (3)$$
The parameter a(t) is the learning rate of the algorithm and hij(t) denotes the degree of neuron excitation. It can be seen from Equation (3) that only the weight of the “winner” node is updated during a training iteration. Generally, the learning rate a(t) is a monotonically decreasing function of time [16]. A typical choice for a(t) is
$$a(t) = a_0 \left( 1 - \frac{t}{T} \right) . \qquad (4)$$
The training procedure is repeated for the number of steps T, which is specified a priori.
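A compact sketch of the whole procedure (our Python illustration of Equations (1)-(4); data shapes and defaults are assumptions):

```python
import numpy as np

def kcl_train(X, M, T, a0=0.5, seed=0):
    """Winner-take-all KCL: distances (1), update (2) with h = 1 for the
    winner only (3), and linearly decaying learning rate (4)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(M, X.shape[1]))  # small random weights
    for t in range(T):
        a = a0 * (1.0 - t / T)                        # learning rate, Eq. (4)
        for x in X:
            d = np.sum((x - W) ** 2, axis=1)          # distances, Eq. (1)
            g = int(np.argmin(d))                     # "winner" node
            W[g] += a * (x - W[g])                    # update rule, Eq. (2)
    return W
```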
2.2 Fuzzy Kohonen's Competitive Learning Algorithm
Though the conventional Kohonen's competitive algorithm possesses some very useful properties, it is still a hard partitioning method. As mentioned above, most brain MRI images present overlapping gray-scale intensities for different tissues, particularly in the transitional regions between gray matter and white matter, or cerebrospinal fluid and gray matter. Therefore, fuzzy methods are more suitable for brain MRI image segmentation because they retain more information from the original image. The most widely used fuzzy method for image segmentation is the fuzzy c-means (FCM) algorithm. The FCM clustering algorithm assigns a fuzzy membership value to each data point based on its proximity to the cluster centroids in the feature space. The standard FCM objective function for partitioning a dataset X = (x1, x2, …, xn) into c clusters is
$$J_m(\mu, v) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \, \| x_j - v_i \|^2 \quad \text{subject to} \quad \sum_{i=1}^{c} \mu_{ij} = 1 , \qquad (5)$$
where ||·|| stands for the Euclidean norm, vi is the ith fuzzy cluster centroid, μij gives the membership of the jth data point in the ith cluster ci, and m is the index of fuzziness. The objective function is minimized when pixels close to the centroid of their cluster are assigned high membership values, and low membership values are assigned to pixels far from the centroid. The membership function and cluster centers are updated as follows:
$$\mu_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{\| x_j - v_i \|}{\| x_j - v_k \|} \right)^{2/(m-1)} \right]^{-1} , \qquad (6)$$
and
$$v_i = \frac{\sum_{j=1}^{n} \mu_{ij}^m \, x_j}{\sum_{j=1}^{n} \mu_{ij}^m} . \qquad (7)$$
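One alternating update of (6) and (7) can be written compactly as follows (our sketch; the small eps guarding zero distances is our addition):

```python
import numpy as np

def fcm_step(X, V, m=2.0, eps=1e-9):
    """One FCM update: memberships via (6), centroids via (7).
    X is (n, d), V is (c, d); eps (our addition) avoids division by zero."""
    d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + eps  # (c, n)
    ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
    U = 1.0 / ratio.sum(axis=1)                                      # Eq. (6)
    Um = U ** m
    V_new = (Um @ X) / Um.sum(axis=1, keepdims=True)                 # Eq. (7)
    return U, V_new
```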
Based on Equation (6), Karen et al. proposed a generalized Kohonen's competitive learning algorithm [20]. In their method, the degree of neuron excitation h(t) and the learning rate a(t) in Equation (2) are approximated using the FCM membership functions μij as follows:
$$h_{ij}(t) = \left[ \frac{\mu_{ij}}{\min_{1 \le i \le c} \mu_{ij}} \right]^{1 + f(t)/c} , \quad i = 1, 2, \ldots, c , \qquad (8)$$
and
$$a_i(t) = \frac{a_0}{\dfrac{a_0}{a_i(t-1)} + h_{ij}(t)} , \qquad (9)$$
where μij is the FCM membership in (6) and f(t) is a positive strictly monotone increasing function of t which controls the degree of neuron excitation. In general, f(t) = √t is chosen. Although the experimental results in [20] show that their method is valid, there are still some problems. Firstly, the two functions in Equations (8) and (9) are very complicated and time consuming. Secondly, the degree of neuron excitation hij(t) in Equation (8) becomes extremely large as the time t increases. This is because, as the iterations proceed, the network tends towards convergence: for each input data point whose value is close to one of the centroids, its membership of the class it belongs to will be very high, and the memberships of the other classes will be low, sometimes even zero. Thus, the quotient obtained in (8) will be large, the neuron excitation will be huge after the exponential operation, and the computational complexity increases evidently.
2.3 Our Proposed F_KCL Algorithm
With the aim of overcoming the problems of Equations (8) and (9), in this section we present a new low-complexity method to approximate the neuron excitation and the learning rate as follows:
$$h_{ij}(t) = \exp\!\left( t \left( \mu_{ij} - \frac{1}{c} \right) \right) , \quad i = 1, 2, \ldots, c , \qquad (10)$$
and
$$a_i(t) = \frac{a_i(t-1)}{a_0 + h_{ij}(t)} . \qquad (11)$$
Clearly, in our proposed method the neuron excitation and the learning rate are also determined by the membership function, but the hij(t) in our method does not become too large as the time t increases. It is clear that the learning rate ai(t) in (11) monotonically decreases to zero as time t increases.
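In code, the proposed excitation and learning rate are one line each (our sketch; mu_i is the vector of FCM memberships of the current sample to the c clusters):

```python
import numpy as np

def fkcl_rates(mu_i, t, a_prev, c, a0=1.0):
    """Low-complexity neuron excitation (10) and learning rate (11)."""
    h = np.exp(t * (mu_i - 1.0 / c))   # Equation (10)
    a = a_prev / (a0 + h)              # Equation (11)
    return h, a
```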
3 F_KCL Based on Kernel-Induced Distance
In Section 2, we described the fuzzy Kohonen's competitive learning algorithm (F_KCL). By integrating FCM clustering with Kohonen's competitive learning algorithm, the F_KCL algorithm can successfully deal with overlapping grayscale intensities and the unclearly defined borders between tissues. However, the FCM algorithm suffers from sensitivity to noise and outliers [20]; thus, the F_KCL segmentation result will degrade when it is applied to noise-corrupted images. Another drawback of the standard FCM is that it is not suitable for revealing non-Euclidean structure of the input data, due to the use of the Euclidean distance (L2 norm). In order to avoid these disadvantages, Chen and Zhang proposed a kernel-induced distance measure method in [21] and [22]. Kernel methods are one of the most researched subjects within the machine learning community in recent years and have been widely applied to pattern recognition and function approximation. In Chen's study, kernel functions are used to substitute the inner products so as to realize an implicit mapping into feature space, so that the corresponding kernelized versions of the algorithms are constructed. The major characteristic of their approach is that they do not adopt a dual representation for the data centroids, but directly transform all centroids in the original space, together with the given data samples, into a high-dimensional feature space via a mapping. Through the kernel substitution, a new class of non-Euclidean distance measures in the original data space is obtained as
j
) − Φ (v )
2
i
=
= (Φ (x
) − Φ (v )) (Φ (x ) − Φ (v )) Φ (x ) Φ (x ) − Φ (v ) Φ (x ) − Φ (x ) T
j
j
T
j
+ Φ (v i
=
i
i
T
j
T
i
)T Φ (v i ) K (x j , x j ) + K (v i , v i ) −
j
j
2 K (x j , v i
Φ (v i ) .
(12)
)
and the kernel function K(x, y) is taken as a radial basis function (RBF) to simplify (12). The typical RBF kernel is:
$$K(x, y) = \exp\!\left( - \frac{\left( \sum_{i=1}^{d} | x_i - y_i |^a \right)^b}{\delta^2} \right) , \qquad (13)$$
where d is the dimension of vector x. Obviously, for all x and RBF kernels, we can get K(x, x) = 1. With the above formulations, the kernel version of the FCM algorithm and its membership function are:
$$J_m^{\Phi}(\mu, v) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \, \| \Phi(x_j) - \Phi(v_i) \|^2 = 2 \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \left( 1 - K(x_j, v_i) \right) , \qquad (14)$$
and
$$\mu_{ij} = \frac{\left( 1 - K(x_j, v_i) \right)^{-1/(m-1)}}{\sum_{k=1}^{c} \left( 1 - K(x_j, v_k) \right)^{-1/(m-1)}} . \qquad (15)$$
In our study, for the sake of overcoming the sensitivity to noise and outliers, we also incorporate the kernel distance measure into the F_KCL algorithm (we name the algorithm KF_KCL). KF_KCL can be summarized in the following steps:
KF_KCL Algorithm
Step 1) Fix the number of clusters c and the training time T;
Step 2) Initialize the weights and the learning rate a(0) = ai(0) = 1;
Step 3) For t = 1, 2, …, T:
  For j = 1, 2, …, n:
    Set vi = wij(t), i = 1, 2, …, c;
    Calculate μij using (15);
    Calculate hij(t) using (10);
    Calculate ai(t) using (11);
    Update all nodes using the following equation:
    $$w_{ij}(t+1) = w_{ij}(t) + a(t)\,h_{ij}(t)\,K(x_i, w_{ij}(t))\,(x_i - w_{ij}(t)) ; \qquad (16)$$
Step 4) End.
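A minimal Python rendering of these steps (our sketch under the choices a = 2, b = 1 in the RBF kernel (13); the tiny constant guarding 1 − K = 0 is our addition):

```python
import numpy as np

def kf_kcl(X, c, T, a0=1.0, m=2.0, delta=400.0, seed=0):
    """Sketch of KF_KCL: kernel memberships (15), excitation (10),
    learning rates (11), and the kernelized update (16) on all nodes."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), c, replace=False)].copy()   # initial weights
    a = np.full(c, 1.0)                                  # a_i(0) = 1
    for t in range(1, T + 1):
        for x in X:
            K = np.exp(-np.sum((x - W) ** 2, axis=1) / delta ** 2)  # RBF (13)
            mu = (1.0 - K + 1e-12) ** (-1.0 / (m - 1.0))
            mu /= mu.sum()                               # memberships, Eq. (15)
            h = np.exp(t * (mu - 1.0 / c))               # excitation, Eq. (10)
            a = a / (a0 + h)                             # learning rate, Eq. (11)
            W += (a * h * K)[:, None] * (x - W)          # update, Eq. (16)
    return W
```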
4 Experimental Results The proposed KF_KCL algorithm was implemented in Matlab and tested on both simulated MRI images obtained from the BrainWeb Simulated Brain Database at the McConnell Brain Imaging Centre of the Montreal Neurological Institute (MNI), McGill University [17], and on real MRI data obtained from the Internet Brain Segmentation Repository (IBSR) [18]. Extra-cranial tissues are removed from all images prior to segmentation.
4.1 Results Analysis and Comparison
In this section, we apply our algorithm to a simulated data volume with a T1-weighted sequence, slice thickness of 1 mm, volume size of 21 . The number of tissue classes in the segmentation is set to three, corresponding to gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF). Background pixels are ignored in our experiment. For image data, there is strong correlation between neighboring pixels; to produce a meaningful segmentation, the spatial relationship between pixels is considered in our experiment. The input vector of each pixel is constructed from the intensity of the current pixel and the mean value of its neighborhood. The 3-D neighborhood used in our study is a six-point neighborhood: the voxels north, east, south and west of the center voxel, plus the voxels immediately before and after it. The parameters of the RBF kernel function are set as σ = 400, a = 2 and b = 1. The brain image in Fig. 1(a) is a slice of the simulated 3-D volume; the segmentation result using our proposed KF_KCL algorithm is given in Fig. 1(b), while Fig. 1(c) and (d) are the segmentation results obtained by the standard KCL algorithm and the FCM clustering algorithm, and the “ground truth” of Fig. 1(a) is shown in Fig. 1(e). Although visual inspection shows that the images in Fig. 1(b), (c) and (d) are nearly the same and all similar to the “ground truth”, when we compare the three results with the “ground truth” and compute the similarity indices [19] in Table 1, we can see that the result of our proposed algorithm is better than those of the standard KCL and the FCM algorithm. In addition, KF_KCL converges faster than KCL.
Fig. 1. (a) The original slice of the 3-D brain image (z = 70). (b) Segmentation result by KF_KCL algorithm with 5 iterations. (c) Segmentation result by standard KCL algorithm with 10 iterations. (d) Segmentation result by FCM. (e) The ground truth of (a).
Table 1. Similarity index for different methods in Fig. 1
Tissue   KF_KCL ρ (%)   Standard KCL ρ (%)   FCM ρ (%)
WM       98.85          97.37                97.38
GM       97.76          95.10                96.79
CSF      96.93          92.00                96.99
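The similarity index ρ of [19] is, to our understanding, a Dice-style overlap between a segmented tissue class and the ground truth; assuming that definition, it can be computed as follows (our sketch):

```python
import numpy as np

def similarity_index(seg, truth):
    """Dice-style similarity index (assumed form of rho in [19]):
    2 |A intersect B| / (|A| + |B|), expressed as a percentage."""
    seg, truth = np.asarray(seg, bool), np.asarray(truth, bool)
    return 200.0 * np.logical_and(seg, truth).sum() / (seg.sum() + truth.sum())
```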
Having tested and compared our algorithm on the noise-free image, we then apply the KF_KCL algorithm to images corrupted by noise and other imaging artifacts. Fig. 2(a) shows the same slice as Fig. 1(a) but with 5% Rician noise; Fig. 2(b) is the segmentation result of the KF_KCL algorithm, and the images in Fig. 2(c) and (d) are the segmentation results of the standard KCL and FCM algorithms. Clearly, the result of the proposed KF_KCL algorithm is better than those of the other two methods, because both the kernel-induced distance measure and the spatial constraints reduce the medical image noise effectively. The similarity indices of the images in Fig. 2 are also calculated and shown in Table 2.
Fig. 2. (a) The slice of the 3-D brain image (z = 70) with 5% noise. (b) The segmentation result using KF_KCL. (c) The segmentation result using KCL. (d) The segmentation result using FCM.
Table 2. Similarity index for different methods in Fig. 2
Tissue   KF_KCL ρ (%)   Standard KCL ρ (%)   FCM ρ (%)
WM       94.09          92.21                92.55
GM       92.65          90.46                91.53
CSF      93.10          92.02                90.32
Figure 3(a) shows a slice of the simulated images corrupted by 3% noise and 20% intensity non-uniformity (INU); Fig. 3(b) is the KF_KCL segmentation result. Although there is no bias estimation or correction method in our algorithm, by comparing with the “ground truth” in Fig. 3(c), the similarity indices obtained for WM, GM and CSF are 93.69%, 90.07% and 92.51%, respectively. The similarity
Fig. 3. (a) A slice of the 3-D brain image (z = 130) with 3% noise and 20% INU. (b) The segmentation result using KF_KCL. (c) The ground truth.
Fig. 4. Segmentation results for the whole brain and the white matter
Fig. 5. Segmentation of real MRI images. (a) and (e) are original images. (b) and (f) are the segmentation results of our proposed KF_KCL algorithm. (c) and (g) are FCM segmentation results. (d) and (h) are the segmentation results of the standard KCL.
index ρ > 70% indicates an excellent similarity [19]. In our experiments, the similarity indices ρ of all the tissues are larger than 90% even for a bad condition with noise and INU, which indicates an excellent agreement between our segmentation results and the “ground truth”. Figure 4 shows the 3-D view of the segmentation results for the whole brain and the white matter.
4.2 Performance on Actual MRI Data
The images in Fig. 5(a) and (e) are two slices of real T1-weighted MRI images. Fig. 5(b) and (f) are the KF_KCL segmentation results, Fig. 5(c) and (g) show the clustering results using FCM, and the KCL segmentation results are shown in Fig. 5(d) and (h). Visual inspection shows that our approach produces better segmentations than the other algorithms.
5 Conclusions
A novel fuzzy Kohonen's competitive learning algorithm with a kernel-induced distance measure (KF_KCL) is presented in this paper. Because the transitional regions between tissues in MRI brain images are not clearly defined, fuzzy methods are integrated with the Kohonen's competitive learning (KCL) algorithm to deal with this problem. Though KCL-based segmentation techniques are useful in reducing image noise, kernel methods are also incorporated in our work to further increase the segmentation accuracy. Kernel methods have been widely applied to unsupervised clustering in recent years, and the kernel distance measure can effectively overcome the disadvantages of the Euclidean distance measure, e.g., sensitivity to noise and outliers. Finally, we consider the spatial relationships between image pixels in our experiments. The proposed KF_KCL algorithm is applied to both simulated and real MRI images and compared with the KCL and FCM algorithms. The results reported during the tests show that our approach is better than the others.
References
1. Pham, D.L., Xu, C.Y., Prince, J.L.: A Survey of Current Methods in Medical Image Segmentation. [Technical report version, JHU/ECE 99-01, Johns Hopkins University], Ann. Rev. Biomed. Eng. 2 (2000) 315-337
2. Wells, W.M., Grimson, W.E.L., Kikinis, R., Arrdrige, S.R.: Adaptive Segmentation of MRI Data. IEEE Trans. Med. Imaging 15 (1996) 429-442
3. Gerig, G., Martin, J., Kikinis, R., Kubler, D., Shenton, M., Jolesz, F.A.: Unsupervised Tissue Type Segmentation of 3D Dual-echo MR Head Data. Image and Vision Computing 10 (1992) 349-360
4. Alan, W.C.L., Yan, H.: An Adaptive Spatial Fuzzy Clustering Algorithm for 3-D MR Image Segmentation. IEEE Transactions on Medical Imaging 22 (9) (2003) 1063-1075
5. Philips, W.E., Velthuizen, R.P., Phuphanich, S., Hall, L.O., Clarke, L.P., Silbiger, M.L.: Application of the Fuzzy C-means Segmentation Technique for Differentiation in MR Images of a Hemorrhagic Glioblastoma Multiforme. Mag. Reson. Imaging 13 (1995) 277-290
6. Pham, D.L., Prince, J.L.: Adaptive Fuzzy Segmentation of Magnetic Resonance Images. IEEE Trans. Med. Imaging 18 (1999) 737-752
7. Bezdek, J.: Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum Press (1981)
8. Wu, K.L., Yang, M.S.: Alternative C-means Clustering Algorithms. Pattern Recognition 35 (2002) 2267–2278
9. Hall, L.O., Bensaid, A.M., Clarke, L.P., Velthuizen, R.P., Silbiger, M.S., Bezdek, J.C.: A Comparison of Neural Network and Fuzzy Clustering Techniques in Segmenting Magnetic Resonance Images of the Brain. IEEE Trans. Neural Networks 3 (1992) 672–682
10. Ozkan, M., Dawant, B.M., Maciunas, R.J.: Neural-network-based Segmentation of Multimodal Medical Images: A Comparative and Prospective Study. IEEE Trans. Medical Imaging 12 (1993) 534–544
11. Reddick, W.E., Glass, J.O., Cook, E.N., Elkin, T.D., Deaton, R.: Automated Segmentation and Classification of Multispectral Magnetic Resonance Images of Brain Using Artificial Neural Networks. IEEE Trans. Med. Imaging 16 (1997) 911–918
12. Reddick, W.E., Mulhern, R.K., Elkin, T.D., Glass, J.O., Merchant, T.E., Langston, J.W.: A Hybrid Neural Network Analysis of Subtle Brain Volume Differences in Children Surviving Brain Tumors. Mag. Reson. Imaging 16 (1998) 413–421
13. Chuang, K.H., Chiu, M.J., Lin, C.C., Chen, J.H.: Model-free Functional MRI Analysis Using Kohonen Clustering Neural Network and Fuzzy C-means. IEEE Trans. Medical Imaging 18 (1999) 1117–1128
14. Glass, J.O., Reddick, W.E., Goloubeva, O., Yo, V., Steen, R.G.: Hybrid Artificial Neural Network Segmentation of Precise and Accurate Inversion Recovery (PAIR) Images From Normal Human Brain. Mag. Reson. Imaging 18 (2000) 1245–1253
15. Kohonen, T.: Self-Organizing Maps. New York: Springer-Verlag (1995)
16. Kwan, R.S., Evans, A., Pike, G.: MRI Simulation-based Evaluation of Image-processing and Classification Methods. IEEE Trans. Med. Imaging 18 (11) (1999) 1085-1097. Available: http://www.bic.mni.mcgill.ca/brainweb
17. Kennedy, D.N., Filipek, P.A., Caviness, V.S.: Anatomic Segmentation and Volumetric Calculations in Nuclear Magnetic Resonance Imaging. IEEE Transactions on Medical Imaging 8 (1989) 1-7. Available: http://www.cma.mgh.harvard.edu/ibsr/
18. Zijdenbos, A., Dawant, B.: Brain Segmentation and White Matter Lesion Detection in MR Images. Crit. Rev. Biomed. Eng. 22(5–6) (1994) 401–465
19. Lin, K.C.R., Yang, M.S., Liu, H.C., Lirng, J.F., Wang, P.N.: Generalized Kohonen's Competitive Learning Algorithm for Ophthalmological MR Image Segmentation. Magnetic Resonance Imaging 21 (2003) 863-870
20. Chen, S.G., Zhang, D.Q.: Robust Image Segmentation Using FCM with Spatial Constraints Based on New Kernel-Induced Distance Measure. IEEE Transactions on SMC-Part B 34 (2004) 1907-1916
21. Zhang, D.Q., Chen, S.C.: A Novel Kernelized Fuzzy C-means Algorithm with Application in Medical Image Segmentation. Artificial Intelligence in Medicine 32 (2004) 37-50
A Hybrid Genetic Algorithm for Two Types of Polygonal Approximation Problems Bin Wang1, and Chaojian Shi1,2 1
Department of Computer Science and Engineering, Fudan University, Shanghai, 200433, P. R. China 2 Merchant Marine College, Shanghai Maritime University, Shanghai, 200135, P. R. China
[email protected],
[email protected]
Abstract. A hybrid genetic algorithm combined with split and merge techniques (SMGA) is proposed for two types of polygonal approximation of digital curves, i.e., the Min-# problem and the Min-ε problem. Its main idea is that two classical methods, the split and merge techniques, are applied to repair infeasible solutions. In this scheme, an infeasible solution can not only be repaired rapidly, but also be pushed to a locally optimal location in the solution space. In addition, unlike the existing genetic algorithms, which can only solve one type of polygonal approximation problem, SMGA can solve both types of polygonal approximation problems. The experimental results demonstrate that SMGA is robust and outperforms other existing GA-based methods.
1 Introduction
In image processing, the boundary of an object can be viewed as a closed digital curve. How to represent it so as to facilitate subsequent image analysis and pattern recognition is a key issue. Polygonal approximation is a good representation method for closed digital curves. Its basic idea is that a closed digital curve is divided into a finite number of segments and each segment is approximated by the line segment connecting its two end points. The whole curve is then approximated by the polygon formed by these line segments. Polygonal approximation is a simple and compact representation method which can approximate the curve at any desired level of accuracy. Therefore, this method is widely studied in image processing, pattern recognition, computer graphics, digital cartography, and vector data processing. In general, there are two types of polygonal approximation problems which have attracted many researchers' interest. They are described as follows:
Min-# problem: Given a closed digital curve, approximate it by a polygon with a minimum number of line segments such that the approximation error does not exceed a given tolerance error ε.
Min-ε problem: Given a closed digital curve, approximate it by a polygon with a given number of line segments such that the approximation error is minimized.
Corresponding author.
Both of the above polygonal approximation problems can be formulated as combinatorial optimization problems. Since an exhaustive search for the optimal solution in the potential solution space results in exponential complexity [1], many existing methods for polygonal approximation problems yield suboptimal results to save computational cost. Some existing methods are based on local search techniques. They can be classified into the following categories: (1) sequential tracing approaches [2], (2) split methods [3], (3) merge methods [4], (4) split-and-merge methods [5], and (5) dominant point methods [6]. These methods work very fast, but their results may be very far from the optimal ones because of their dependence on the selection of starting points or on the given initial solutions. In recent years, many nature-inspired algorithms, such as genetic algorithms (GA) [1,8,9,10,11], ant colony optimization (ACO) [12], particle swarm optimization (PSO) [13] and so on, have been applied to solve the Min-# problem or the Min-ε problem and have presented promising approximation results.

In this paper, we focus on using GA-based methods to solve polygonal approximation problems. The power of GA arises from crossover, which causes a structured, yet randomized, exchange of genetic material between solutions, with the possibility that 'good' solutions can generate 'better' ones. However, crossover may also generate infeasible solutions, namely, two feasible parents may generate an infeasible child. This especially arises in combinatorial optimization where the encoding is the traditional bit string representation and the crossover is the general-purpose crossover [11]. Therefore, how to cope with infeasible solutions is the main problem involved in using GA-based methods for polygonal approximation problems.

Among the existing GA-based methods for polygonal approximation problems, there are two schemes used to cope with infeasible solutions. One is to modify the traditional crossover and constrain it to yield feasible offspring; here, we term it the constraining method. Yin [8] and Huang [10] adopt this method for solving the Min-ε problem and the Min-# problem, respectively. Both of them adopt a modified version of the traditional two-cut-point crossover. In the traditional two-cut-point crossover (shown in Fig. 4), two crossover sites are chosen randomly; however, this may generate infeasible solutions. They modified it by choosing appropriate crossover points on the chromosome which maintain the feasibility of the offspring. However, this requires repeatedly testing candidate crossover points on the chromosome and results in an expensive time cost. Furthermore, in some cases, such crossover sites cannot be obtained for the Min-# problem. For solving the Min-ε problem, Ho and Chen [11] proposed a novel crossover, termed orthogonal-array crossover, which can maintain the feasibility of the offspring. However, the complexity of this kind of crossover is also high, and it is only suitable for the Min-ε problem and not for the Min-# problem.

Another method for coping with infeasible solutions is the penalty function method. Yin [1] adopted this scheme for the Min-# problem. Its main idea is that a penalty function is added to the fitness function for decreasing the survival
probability of the infeasible solution. However, it is usually difficult to determine an appropriate penalty function. If the strength of the penalty function is too large, more time will be spent on finding feasible solutions than on searching for the optimum, and if the strength of the penalty function is too small, more time will be spent on evaluating infeasible solutions [11].

To solve the above problems involved in coping with infeasible solutions, we propose a hybrid genetic algorithm combined with split and merge techniques (SMGA) for solving the Min-ε problem and the Min-# problem. The main idea of SMGA is that the traditional split and merge techniques are employed to repair infeasible solutions. SMGA has the following three advantages over the existing GA-based methods. (1) SMGA doesn't require developing a special penalty function, or modifying and constraining the traditional two-cut-point crossover to avoid yielding infeasible solutions; in SMGA, an infeasible solution can be transformed into a feasible one through a simple repairing operator. (2) SMGA combines the advantage of GA, which possesses strong global search ability, with the merits of the traditional split and merge techniques, which have strong local search ability; this improves the solution quality and convergence speed of GA. (3) Different from the existing GA-based methods, which are designed for solving the Min-ε problem or the Min-# problem alone, SMGA is developed for solving both of them. We use four benchmark curves to test SMGA; the experimental results show its superior performance.
2 Problem Formulation
Definition 1. A closed digital curve C can be represented by a clockwise ordered sequence of points C = {p_1, p_2, . . . , p_N}, and this sequence is circular, namely p_{N+i} = p_i, where N is the number of points on the digital curve.

Definition 2. Let arc(p_i, p_j) = {p_i, p_{i+1}, . . . , p_j} denote the arc starting at point p_i and continuing through point p_j in the clockwise direction along the curve. Let p_i p_j denote the line segment connecting points p_i and p_j.

Definition 3. The approximation error between arc(p_i, p_j) and p_i p_j is defined as

    e(arc(p_i, p_j), p_i p_j) = Σ_{p_k ∈ arc(p_i, p_j)} d²(p_k, p_i p_j),   (1)

where d(p_k, p_i p_j) is the perpendicular distance from point p_k to the line segment p_i p_j.
Definition 4. The polygon V approximating the contour C = {p_1, p_2, . . . , p_N} is a set of ordered line segments V = {p_{t_1}p_{t_2}, p_{t_2}p_{t_3}, . . . , p_{t_{M-1}}p_{t_M}, p_{t_M}p_{t_1}}, such that t_1 < t_2 < . . . < t_M and {p_{t_1}, p_{t_2}, . . . , p_{t_M}} ⊆ {p_1, p_2, . . . , p_N}, where M is the number of vertices of the polygon V.

Definition 5. The approximation error between the curve C = {p_1, p_2, . . . , p_N} and its approximating polygon V = {p_{t_1}p_{t_2}, p_{t_2}p_{t_3}, . . . , p_{t_M}p_{t_1}} is defined as

    E(V, C) = Σ_{i=1}^{M} e(arc(p_{t_i}, p_{t_{i+1}}), p_{t_i}p_{t_{i+1}}),   (2)

where t_{M+1} = t_1. Then the two types of polygonal approximation problems are formulated as follows:

Min-# problem: Given a digital curve C = {p_1, p_2, . . . , p_N} and an error tolerance ε, let Ω denote the set of all polygons which approximate the curve C, and let SP = {V | V ∈ Ω ∧ E(V, C) ≤ ε}. Find a polygon P ∈ SP such that

    |P| = min_{V ∈ SP} |V|,   (3)

where |P| denotes the cardinality of P.

Min-ε problem: Given a digital curve C = {p_1, p_2, . . . , p_N} and an integer M, where 3 ≤ M ≤ N, let Ω denote the set of all polygons which approximate the curve C, and let SP = {V | V ∈ Ω ∧ |V| = M}, where |V| denotes the cardinality of V. Find a polygon P ∈ SP such that

    E(P, C) = min_{V ∈ SP} E(V, C).   (4)
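To make the formulation concrete, the following Python sketch (ours, not the authors' code; the helper names are hypothetical) evaluates the arc-to-chord error of Eq. (1) and the total approximation error E(V, C) of Eq. (2) for a closed curve given as a list of (x, y) points and an index-ordered list of vertices.

import math

def point_segment_distance(p, a, b):
    # Perpendicular distance d(p, ab) from point p to the line through
    # chord ab, as used in Eq. (1).
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    norm = math.hypot(dx, dy)
    if norm == 0.0:  # degenerate chord: fall back to point-to-point distance
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / norm

def arc_error(curve, i, j):
    # e(arc(p_i, p_j), p_i p_j): sum of squared distances from the arc's
    # points to the chord; indices wrap because the curve is circular.
    n, err, k = len(curve), 0.0, (i + 1) % len(curve)
    while k != j:
        err += point_segment_distance(curve[k], curve[i], curve[j]) ** 2
        k = (k + 1) % n
    return err

def polygon_error(curve, vertices):
    # E(V, C) of Eq. (2): sum of arc errors over consecutive vertex pairs,
    # including the closing side from the last vertex back to the first.
    m = len(vertices)
    return sum(arc_error(curve, vertices[i], vertices[(i + 1) % m])
               for i in range(m))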
3 Overview of Split and Merge Techniques

3.1 Split Technique
The traditional split technique is a very simple method for solving the polygonal approximation problem. It is a recursive method starting with an initial curve segmentation. At each iteration, a split procedure is conducted to split a segment at a selected point, until the obtained polygon satisfies the specified constraint condition. The split procedure is described as follows. Suppose that curve C is segmented into M arcs arc(p_{t_1}, p_{t_2}), . . . , arc(p_{t_{M-1}}, p_{t_M}), arc(p_{t_M}, p_{t_1}), where the p_{t_i} are segment points. A split operation on curve C is then: for each point p_i ∈ arc(p_{t_j}, p_{t_{j+1}}), calculate the distance to the corresponding chord, D(p_i) = d(p_i, p_{t_j}p_{t_{j+1}}). Seek the point p_u on the curve which satisfies

    D(p_u) = max_{p_i ∈ C} D(p_i).

Suppose that p_u ∈ arc(p_{t_k}, p_{t_{k+1}}). Then the arc arc(p_{t_k}, p_{t_{k+1}}) is segmented at the point p_u into two arcs arc(p_{t_k}, p_u) and arc(p_u, p_{t_{k+1}}), and p_u is added to the set of segment points. Fig. 1 shows a split process. The function of the split operator is to find a new possible vertex in a heuristic way.
Fig. 1. Split operation
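A minimal sketch of one split step, reusing the hypothetical helpers above: it scans every arc for the curve point farthest from its covering chord and inserts that point as a new segment point.

def split_once(curve, vertices):
    # One split operation: find the point p_u maximizing D(p_u) over the
    # whole curve and add it to the index-ordered list of segment points.
    n, m = len(curve), len(vertices)
    best_d, best_idx, best_arc = -1.0, None, None
    for a in range(m):
        i, j = vertices[a], vertices[(a + 1) % m]
        k = (i + 1) % n
        while k != j:
            d = point_segment_distance(curve[k], curve[i], curve[j])
            if d > best_d:
                best_d, best_idx, best_arc = d, k, a
            k = (k + 1) % n
    if best_idx is not None:
        if best_arc == m - 1 and best_idx < vertices[0]:
            vertices.insert(0, best_idx)  # new point lies past the wrap-around
        else:
            vertices.insert(best_arc + 1, best_idx)
    return vertices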
Fig. 2. Merge operation
3.2 Merge Technique
The merge technique is another simple method for yielding an approximating polygon of a digital curve. It is a recursive method starting with an initial polygon which regards all the points of the curve as its vertices. At each iteration, a merge procedure is conducted to merge two selected adjacent segments of the current polygon, until the obtained polygon satisfies the specified constraint condition. The merge procedure is described as follows. Suppose that curve C is segmented into M arcs arc(p_{t_1}, p_{t_2}), . . . , arc(p_{t_{M-1}}, p_{t_M}), arc(p_{t_M}, p_{t_1}), where the p_{t_i} are segment points. A merge operation on curve C is defined as: for each segment point p_{t_i}, calculate its distance to the line segment connecting its two adjacent segment points, Q(p_{t_i}) = d(p_{t_i}, p_{t_{i-1}}p_{t_{i+1}}). Select the segment point p_{t_j} which satisfies

    Q(p_{t_j}) = min_{p_{t_i} ∈ V} Q(p_{t_i}),

where V is the set of the current segment points. Then the two arcs arc(p_{t_{j-1}}, p_{t_j}) and arc(p_{t_j}, p_{t_{j+1}}) are merged into a single arc arc(p_{t_{j-1}}, p_{t_{j+1}}), and the segment point p_{t_j} is removed from the set of the current segment points. Fig. 2 shows a merge process. The function of the merge operator is to remove a possibly redundant vertex in a heuristic way.
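A corresponding sketch of one merge step: it removes the segment point whose distance Q to the chord joining its two neighbouring segment points is minimal.

def merge_once(curve, vertices):
    # One merge operation: delete the segment point p_tj with minimal
    # Q(p_tj) = d(p_tj, p_{tj-1} p_{tj+1}), merging its two incident arcs.
    m = len(vertices)
    if m <= 3:
        return vertices  # a polygon needs at least three vertices
    best_q, best_pos = float('inf'), None
    for a in range(m):
        prev_pt = curve[vertices[(a - 1) % m]]
        next_pt = curve[vertices[(a + 1) % m]]
        q = point_segment_distance(curve[vertices[a]], prev_pt, next_pt)
        if q < best_q:
            best_q, best_pos = q, a
    del vertices[best_pos]
    return vertices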
4 The Proposed Genetic Algorithm (SMGA)

4.1 Chromosome Coding Scheme and Fitness Function
The encoding mechanism maps each approximating polygon to a unique binary string which is used to represent a chromosome. Each gene of the chromosome corresponds to a point of the curve; if and only if its value is 1, the corresponding curve point is considered a vertex of the approximating polygon. The number of genes whose value is 1 thus equals the number of vertices of the approximating polygon. For instance, given a curve C = {p_1, p_2, . . . , p_{10}} and a chromosome '1010100010', the approximating polygon the chromosome represents is {p_1p_3, p_3p_5, p_5p_9, p_9p_1}.

Fig. 3. Mutation

Assume a chromosome α = b_1b_2 . . . b_N. For the Min-ε problem, the fitness function f(α) is defined as

    f(α) = E(α, C).   (5)

For the Min-# problem, the fitness function f(α) is defined as

    f(α) = Σ_{i=1}^{N} b_i.   (6)

For both fitness functions, the smaller the function value is, the better the individual is.
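The coding scheme and the fitness functions of Eqs. (5)-(6) can be sketched as follows, reusing polygon_error from the earlier sketch (decode and fitness are illustrative names, not the authors'). For the example above, decode('1010100010') returns [0, 2, 4, 8], i.e. the polygon {p1p3, p3p5, p5p9, p9p1} in the paper's 1-based labels.

def decode(chromosome):
    # Genes with value 1 mark the curve points chosen as polygon vertices.
    return [i for i, bit in enumerate(chromosome) if bit == '1']

def fitness(chromosome, curve, problem='min-eps'):
    # Eq. (5) for the Min-eps problem, Eq. (6) for the Min-# problem;
    # in both cases a smaller value means a better individual.
    if problem == 'min-eps':
        return polygon_error(curve, decode(chromosome))
    return chromosome.count('1')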
4.2 Genetic Operators
Selection. Select two individuals from the population randomly and keep the better one.

Mutation. Randomly select a gene with value 1 on the chromosome, shift it one site to the left or right at random, and set the original gene site to 0 (shown in Fig. 3).

Crossover. Here, we use the traditional two-cut-point crossover. In detail: randomly select two sites on the chromosome and exchange the two chromosomes' substrings between the two selected sites. For example, given two parent chromosomes '1010010101' and '1011001010' with randomly selected crossover sites 4 and 7, the two children yielded by two-cut-point crossover are '1011001101' and '1010010010' (shown in Fig. 4).
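The operators can be sketched as follows (illustrative code; the crossover sites are 1-based and inclusive, matching the worked example above, and the wrap-around in mutation is our assumption):

import random

def two_cut_point_crossover(parent1, parent2):
    # Exchange the substring between two random sites (inclusive); sites
    # 4 and 7 reproduce the '1011001101'/'1010010010' example above.
    n = len(parent1)
    a, b = sorted(random.sample(range(1, n + 1), 2))
    child1 = parent1[:a - 1] + parent2[a - 1:b] + parent1[b:]
    child2 = parent2[:a - 1] + parent1[a - 1:b] + parent2[b:]
    return child1, child2

def mutate(chromosome):
    # Pick a gene set to 1, shift it one site left or right at random,
    # and clear the original site (cf. Fig. 3).
    genes = list(chromosome)
    ones = [i for i, g in enumerate(genes) if g == '1']
    if not ones:
        return chromosome
    i = random.choice(ones)
    j = (i + random.choice((-1, 1))) % len(genes)
    genes[i], genes[j] = '0', '1'
    return ''.join(genes)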
4.3 Chromosome Repairing
Two-cut-point crossover may yield infeasible offspring. Here, we develop a method using the split and merge techniques introduced in Section 3 to repair the infeasible offspring.

For the Min-ε problem: Suppose that the specified number of sides of the approximating polygon is M. Then for an infeasible solution α, we have L(α) ≠ M,
Fig. 4. Two-cut-point crossover
where L(α) denotes the number of sides of the approximating polygon α. The infeasible solution α can then be repaired through the following process: if L(α) > M, repeat the merge operation until L(α) = M; if L(α) < M, repeat the split operation until L(α) = M.

For the Min-# problem: Suppose that the specified error tolerance is ε. Then for an infeasible solution α, we have E(α) > ε, where E(α) is the approximation error. The infeasible solution α can be repaired through the following process: if E(α) > ε, repeat the split operation until E(α) ≤ ε.

Computational Complexity: Suppose that the number of curve points is n and the number of sides of the infeasible solution is k. From the definitions of the split and merge operations, the complexity of the split procedure is O(n − k) and that of the merge procedure is O(k). For the Min-ε problem: suppose that the specified number of sides of the approximating polygon is m. If k < m, repairing the current infeasible solution requires calling the split procedure m − k times, so the complexity of the repairing process is O((n − k)(m − k)). If k > m, repairing requires calling the merge procedure k − m times, so the complexity is O(k(k − m)). For the Min-# problem, it is difficult to compute the complexity of the repairing process exactly; here we give the complexity of the worst case. In the worst case, all the curve points have to be added to the approximating polygon to maintain the feasibility of the solution; in that case the approximation error equals 0 and the split procedure is called n − k times. Therefore, the complexity of the repairing process in the worst case is O((n − k)²).
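Putting the pieces together, a sketch of the repairing operator for both problem types, again reusing the earlier hypothetical helpers:

def repair(curve, chromosome, problem, M=None, eps=None):
    # Min-eps: split while the polygon has too few sides, merge while it
    # has too many, until L(alpha) = M. Min-#: split until E(alpha) <= eps.
    vertices = decode(chromosome)
    if problem == 'min-eps':
        while len(vertices) < M:
            vertices = split_once(curve, vertices)
        while len(vertices) > M:
            vertices = merge_once(curve, vertices)
    else:
        while polygon_error(curve, vertices) > eps:
            vertices = split_once(curve, vertices)
    genes = ['0'] * len(curve)
    for v in vertices:
        genes[v] = '1'
    return ''.join(genes)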
4.4 Elitism
Elitism is implemented by preserving the best chromosome, unchanged, into the next generation.
5 Experimental Results and Discussion
To evaluate the performance of the proposed SMGA, we utilize four commonly used benchmark curves, as shown in Fig. 5. Among these curves, (a) is a figure-8
curve, (b) is a chromosome-shaped curve, (c) is a curve with four semi-circles and (d) is a leaf-shaped curve. The numbers of their curve points are 45, 60, 102 and 120, respectively. Their chain codes are presented in [6].

Fig. 5. Four benchmark curves: (a) figure-8, (b) chromosome, (c) semicircle, (d) leaf

Two groups of experiments are conducted to evaluate the performance of SMGA: one applies SMGA to the Min-ε problem, the other applies SMGA to the Min-# problem. All experiments are conducted on a computer with a Pentium-M 1.5 CPU under Windows XP. The parameters of SMGA are set as follows: population size Ns = 31, crossover probability pc = 0.7, mutation probability pm = 0.3 and maximum number of generations Gn = 80.

Table 1. Experimental results of SMGA and EEA [11] for the Min-ε problem

Curves                 M    BEST ε           AVERAGE ε         VARIANCE
                            EEA     SMGA     EEA     SMGA      EEA     SMGA
semicircle (N = 102)   10   38.92   38.92    44.23   42.89     78.50   25.98
                       12   26.00   26.00    29.42   27.80      4.68    2.05
                       14   17.39   17.39    20.14   18.55      4.69    1.41
                       17   12.22   12.22    14.46   13.37      2.31    1.11
                       18   11.34   11.19    12.79   12.56      1.47    0.91
                       19   10.04   10.04    11.52   11.22      0.97    0.50
                       22    7.19    7.01     8.63    7.73      0.56    0.32
                       27    3.73    3.70     4.87    4.05      0.57    0.15
                       30    2.84    2.64     3.67    2.93      0.33    0.04
figure-8 (N = 45)       6   17.49   17.49    18.32   17.64      0.45    0.12
                        9    4.54    4.54     4.79    4.71      0.15    0.06
                       10    3.69    3.69     3.98    3.73      0.05    0.02
                       11    2.90    2.90     3.19    3.15      0.04    0.01
                       13    2.04    2.04     2.36    2.05      0.06    0.00
                       15    1.61    1.61     1.87    1.69      0.04    0.01
                       16    1.41    1.41     1.58    1.51      0.03    0.01
chromosome (N = 60)     8   13.43   13.43    15.56   13.99      2.42    1.26
                        9   12.08   12.08    13.47   12.76      1.76    0.55
                       12    5.82    5.82     6.75    5.86      0.88    0.00
                       14    4.17    4.17     5.13    4.56      0.59    0.06
                       15    3.80    3.80     4.27    4.07      0.14    0.04
                       17    3.13    3.13     3.57    3.21      0.16    0.03
                       18    2.83    2.83     3.04    2.95      0.05    0.01
Fig. 6. The comparative results of SMGA and EEA [11] for the Min-ε problem, where M is the specified number of sides of the approximating polygon and ε is the approximation error: (a) EEA (M = 18, ε = 11.34); (b) EEA (M = 22, ε = 7.19); (c) EEA (M = 27, ε = 3.73); (d) EEA (M = 30, ε = 2.84); (e) SMGA (M = 18, ε = 11.19); (f) SMGA (M = 22, ε = 7.01); (g) SMGA (M = 27, ε = 3.70); (h) SMGA (M = 30, ε = 2.64)
5.1 For Min-ε Problem
Ho and Chen [11] proposed a GA-based method, the Efficient Evolutionary Algorithm (EEA), which adopted the constraining method to cope with infeasible solutions when solving the Min-ε problem. Here we use three curves (semicircle, figure-8 and chromosome) to test SMGA and compare it with EEA. For each curve and a specified number of sides M, the simulation conducts ten independent runs for SMGA and EEA, respectively. The best solution, average solution and variance of solutions over the ten independent runs for SMGA and EEA are listed in Table 1. Parts of the simulation results of SMGA and EEA are shown in Fig. 6, where M is the specified number of sides of the approximating polygon and ε is the approximation error. From Table 1 and Fig. 6, we can see that, for the same number of polygon sides, SMGA obtains approximating polygons with smaller approximation error than EEA. The average computation times of EEA for the three benchmark curves semicircle, figure-8 and chromosome are 0.185s, 0.078s and 0.104s respectively, while SMGA only requires 0.020s, 0.011s and 0.015s. It can be seen that SMGA outperforms EEA in convergence speed.
5.2 For Min-# Problem
Yin [1] proposed a GA-based method for solving the Min-# problem (we term it YGA). YGA adopted the penalty-function method to cope with infeasible solutions.
Table 2. Experimental results of SMGA and YGA [1] for the Min-# problem

Curves               ε     BEST M        AVERAGE M        VARIANCE
                           YGA   SMGA    YGA    SMGA      YGA   SMGA
Leaf (N = 120)       150   15    10      15.4   10.1      0.5   0.0
                     100   16    12      16.2   12.6      0.3   0.0
                     90    17    12      17.4   12.8      0.4   0.0
                     30    20    16      20.3   16.0      0.3   0.0
                     15    23    20      23.1   20.0      0.4   0.0
Chromosome (N = 60)  30    7     6       7.6    6.0       0.2   0.0
                     20    8     7       9.1    7.0       0.3   0.0
                     10    10    10      10.4   10.0      0.4   0.0
                     8     12    11      12.4   11.0      0.3   0.0
                     6     15    12      15.4   12.0      0.4   0.0
Semicircle (N = 102) 60    12    10      13.3   10.0      0.3   0.0
                     30    13    12      13.6   12.1      0.4   0.0
                     25    15    13      16.3   13.0      0.5   0.0
                     20    19    14      19.5   14.0      0.3   0.0
                     15    22    15      23.0   15.2      0.7   0.0

Fig. 7. The comparative results of SMGA and YGA [1] for the Min-# problem, where ε is the specified error tolerance and M is the number of sides of the approximating polygon: (a) YGA (ε = 30, M = 20); (b) YGA (ε = 15, M = 23); (c) YGA (ε = 6, M = 15); (d) YGA (ε = 15, M = 22); (e) SMGA (ε = 30, M = 16); (f) SMGA (ε = 15, M = 20); (g) SMGA (ε = 6, M = 12); (h) SMGA (ε = 15, M = 15)
Here, we run SMGA and YGA on three benchmark curves: leaf, chromosome and semicircle. For each curve and a specified error tolerance ε, the simulation conducts ten independent runs for SMGA and YGA, respectively. The best solution, average solution and variance of solutions over the ten independent runs for SMGA and YGA are listed in Table 2. Parts of the simulation results of SMGA
and YGA are shown in Fig. 7, where ε is the specified error tolerance and M is the number of sides of the approximating polygon. From Table 2 and Fig. 7, we can see that, for the same error tolerance, SMGA yields approximating polygons with a relatively smaller number of sides than YGA. The average computation times of YGA for the three benchmark curves leaf, chromosome and semicircle are 0.201s, 0.09s and 0.137s respectively, while SMGA only requires 0.025s, 0.015s and 0.023s for them. It can be seen that SMGA outperforms YGA in convergence speed.
6 Conclusion
We have successfully applied SMGA to two types of polygonal approximation of digital curves, the Min-# problem and the Min-ε problem. The proposed chromosome-repairing technique, based on the split and merge techniques, effectively overcomes the difficulty of coping with infeasible solutions. The simulation results have shown that the proposed SMGA outperforms the existing GA-based methods which use other techniques for coping with infeasible solutions for the two types of polygonal approximation problems.
Acknowledgement The research work in this paper is partially sponsored by Shanghai Leading Academic Discipline Project, T0603.
References

1. Yin, P.Y.: Genetic Algorithms for Polygonal Approximation of Digital Curves. Int. J. Pattern Recognition Artif. Intell. 13 (1999) 1–22
2. Sklansky, J., Gonzalez, V.: Fast Polygonal Approximation of Digitized Curves. Pattern Recognition 12 (1980) 327–331
3. Douglas, D.H., Peucker, T.K.: Algorithms for the Reduction of the Number of Points Required to Represent a Digitized Line or Its Caricature. The Canadian Cartographer 10(2) (1973) 112–122
4. Leu, J.G., Chen, L.: Polygonal Approximation of 2D Shapes through Boundary Merging. Pattern Recognition Letters 7(4) (1988) 231–238
5. Ray, B.K., Ray, K.S.: A New Split-and-Merge Technique for Polygonal Approximation of Chain Coded Curves. Pattern Recognition Letters 16 (1995) 161–169
6. Teh, H.C., Chin, R.T.: On the Detection of Dominant Points on Digital Curves. IEEE Trans. Pattern Anal. Mach. Intell. 11(8) (1989) 859–872
7. Yin, P.Y.: A Tabu Search Approach to the Polygonal Approximation of Digital Curves. Int. J. Pattern Recognition Artif. Intell. 14 (2000) 243–255
8. Yin, P.Y.: A New Method for Polygonal Approximation Using Genetic Algorithms. Pattern Recognition Letters 19 (1998) 1017–1026
9. Huang, S.-C., Sun, Y.-N.: Polygonal Approximation Using Genetic Algorithms. Pattern Recognition 32 (1999) 1409–1420
10. Sun, Y.-N., Huang, S.-C.: Genetic Algorithms for Error-bounded Polygonal Approximation. Int. J. Pattern Recognition and Artificial Intelligence 14(3) (2000) 297–314
11. Ho, S.-Y., Chen, Y.-C.: An Efficient Evolutionary Algorithm for Accurate Polygonal Approximation. Pattern Recognition 34 (2001) 2305–2317
12. Yin, P.Y.: Ant Colony Search Algorithms for Optimal Polygonal Approximation of Plane Curves. Pattern Recognition 36 (2003) 1783–1797
13. Yin, P.Y.: A Discrete Particle Swarm Algorithm for Optimal Polygonal Approximation of Digital Curves. Journal of Visual Communication and Image Representation 15 (2004) 241–260
A Hybrid Model for Nondestructive Measurement of Internal Quality of Peach

Yongni Shao and Yong He

College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310029, China
[email protected]
Abstract. A nondestructive optical method for determining the sugar and acidity contents of peach was investigated. Two types of preprocessing were used before the data were analyzed with the multivariate calibration methods of principal component analysis (PCA) and partial least squares (PLS). A hybrid model combining PLS with PCA was put forward. The spectral data, recorded as the logarithm of the reciprocal of reflectance, were analyzed to build the best model for predicting the sugar and acidity contents of peach. A model with a correlation coefficient of 0.94/0.92, a standard error of prediction (SEP) of 0.50/0.07 and a bias of 0.02/-0.01 showed an excellent prediction performance for sugar/acidity. At the same time, the sensitive wavelengths corresponding to the sugar content and acidity of peaches, or to some element at a certain band, were proposed on the basis of the regression coefficients from PLS.
1 Introduction

Peach is one of the most important fruits in the agricultural markets of China and is favored by many people. However, different varieties of peach differ in taste and quality. Both the appearance (shape, color, size, tactility, etc.) and the interior qualities (sugar content, acidity, vitamin content, etc.) can be used as quality criteria for peach; among these, sugar and acid contents are the most important evaluation criteria affecting consumers' appreciation and selection. Most methods for measuring these qualities are based on complex processing of samples, expensive chemical reagents, and so on. Marco et al. applied high-performance liquid chromatography (HPLC) to test and analyze the quality of peach [1]. Wu et al. also used HPLC to analyze the change of sugar and organic acid in peach during its maturation [2]. Steinmetz et al. used sensor fusion technology to analyze peach quality [3]. Corrado et al. used an electronic nose and visible spectra to analyze peach qualities including SSC and acidity [4].

Near infrared spectroscopy (NIR) has several attractive features, including fast analytical speed, ease of operation and its nondestructive nature. The most important one is that it can give the response of the molecular transitions of the corresponding chemical constituents to the spectrum, such as O-H, N-H, and C-H. In recent years, NIR has attracted considerable attention for the purpose of discrimination between sets of similar biological materials such as citrus oils [5], yogurt variety [6], honey [7], and apple
variety [8]. It is also regarded as a method for nondestructive sensing of fruit quality. Lammertyn et al. examined the prediction capacity for quality characteristics like acidity, firmness and soluble solids content of Jonagold apples in the wavelength range between 380 and 1650 nm [9]. Carlini et al. used visible and near infrared spectra to analyze soluble solids in cherry and apricot [10]. Lu evaluated the potential of NIR reflectance for measurement of the firmness and sugar content of sweet cherries [11]. McGlone et al. used Vis/NIR spectroscopy to analyze mandarin fruit [12]. Pedro and Ferreira predicted solids and carotenoids in tomato by using NIR [13].

There are many multivariate calibration methods used for quantitative analysis of sample constituents in NIRS. Principal Components Regression (PCR), Partial Least Squares (PLS) and Artificial Neural Networks (ANN) are the most useful multivariate calibration techniques [14, 15, 16]. PCR can effectively compress the dimensions of the original independent variables by constructing the relationship between the original independent variables and new reduced-dimension independent variables. However, the correlation degree between the original independent variables and the new reduced-dimension independent variables is decreased, which leads to low prediction precision. The ANN, which is a popular non-linear calibration method in chemometrics, has a high quality in non-linear approximation. Nevertheless, the weaknesses of this method, such as its low training speed, ease of becoming trapped in a local minimum and over-fitting, should be taken into account [17]. PLS is usually considered for a large number of applications in fruit and juice analysis and is widely used in multivariate calibration. One important practical aspect of PLS is that it takes into account errors both in the concentration estimates and in the spectra. Therefore, PLS is certainly an invaluable linear calibration tool. Thus, this paper proposes PLS to predict the sugar and acid contents of peach.

Although NIR-based non-destructive measurements have been investigated on some fresh fruits, information about peach is limited. It is known that SSC and pH values vary as a function of storage time and temperature. Slaughter showed that Vis/NIR spectroscopy could be used to measure non-destructively the internal quality of peaches and nectarines [18]. Pieris et al. researched the spatial variation in soluble solids content of peaches using an NIR spectrometer [19]. Ortiz et al. used impact response and NIR to identify woolly peaches [20]. Golic and Walsh used calibration models based on near infrared spectroscopy for the in-line grading of peach for total soluble solids content [21]. The objective of this research is to examine the feasibility of using Vis/NIR spectroscopy to detect the sugar and acid contents of intact peach through a hybrid model which combines PLS with principal component analysis (PCA), and at the same time to find sensitive wavelengths corresponding to the sugar and acidity contents of peach.
2 Materials and Methodology

2.1 Experimental Material

To get dependable prediction equations from NIRS, it is necessary that the calibration set covers the range of fruit sources to which it will be applied. Three kinds of peaches:
Milu peach (from Fenghua of Zhejiang, China), Dabaitao peach (from Jinhua of Zhejiang, China) and Hongxianjiu peach (from Shandong, China) were used in this experiment. A total of 80 peaches used for the experiment were purchased at a local market and stored for two days at 20°C. By analyzing all samples with PCA, two peaches were detected as outliers and deleted. Thus, 48 peaches were finally used for the calibration model, and 30 samples were used for the prediction model. Peaches to be measured were selected to cover the two parameters (sugar and acidity contents). All the peaches were cut in half and the juice extracted using a manual fruit squeezer (model: HL-56, Shanghai, China). Samples of the filtered juice were then taken for sugar content measurement using a digital refractometer (model: PR-101, ATAGO, Japan) according to the China standard for sugar content measurement in fruit (GB12295-90). Acidity was measured using a pH meter (SJ-4A, Exact Instrument Co., Ltd., Shanghai, China), also according to the China standard.

2.2 Spectra Collection

For each peach, reflection spectra were taken at three equidistant positions approximately 120° apart around the equator, and at each position the scan number was 10 at exactly the same spot, so the total number of scans for one peach was 30. Spectra were acquired with a spectrograph (FieldSpec Pro FR (325–1075 nm)/A110070, Analytical Spectral Devices, Inc. (ASD)) using RS2 software for Windows. Considering its 20° field-of-view (FOV), the spectrograph was placed at a height of approximately 100 mm above the sample, and a light source, a Lowell pro-lam 14.5 V Bulb/128690 tungsten halogen (Vis/NIRS), was placed about 300 mm from the center of the peach to make the angle between the incident light and the detector optimally about 45°. To avoid a low signal-to-noise ratio, only the wavelength range from 400 to 1000 nm was used in this investigation. In order to obtain enough sensitivity to measure the diffuse reflectance of the intact peach, each spectrum was recorded as log(1/R), where R is reflectance.

2.3 Processing of the Optical Data

To test the influence of preprocessing on the prediction of the calibration model, two types of preprocessing were used. First, to reduce noise, Savitzky-Golay smoothing was used with a window of 9 data points. The second type of preprocessing was multiplicative scatter correction (MSC), which was used to correct additive and multiplicative effects in the spectra. Once these preprocessing procedures were completed, a hybrid method combining PLS with PCA was used to develop calibration models for predicting the sugar content and the acidity. The preprocessing and calculations were carried out using 'The Unscrambler V9.2' (CAMO PROCESS AS, Oslo, Norway), a statistical software package for multivariate calibration.

2.4 A Hybrid Method Combining PLS with PCA

PLS is a bilinear modeling method where the original independent information (X-data) is projected onto a small number of latent variables (LVs) to simplify the relationship between X and Y for prediction with the smallest number of LVs. The standard error of calibration (SEC), the standard error of prediction (SEP) and the correlation coefficient (r) were used to judge the success and accuracy of the PLS model.
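As a concrete illustration of the preprocessing chain of Sections 2.2-2.3 (log(1/R) conversion, Savitzky-Golay smoothing, and MSC), the sketch below uses NumPy/SciPy. It is ours, not the authors' (who used The Unscrambler), and the Savitzky-Golay polynomial order is an assumption, since the paper states only the 9-point window.

import numpy as np
from scipy.signal import savgol_filter

def preprocess(reflectance):
    # reflectance: (n_samples, n_wavelengths) array of averaged scans.
    absorbance = np.log10(1.0 / reflectance)          # record as log(1/R)
    # Savitzky-Golay smoothing over a 9-point window (polyorder assumed).
    smoothed = savgol_filter(absorbance, window_length=9, polyorder=2, axis=1)
    # MSC: regress each spectrum on the mean spectrum and remove the fitted
    # additive (a) and multiplicative (b) effects.
    mean_spec = smoothed.mean(axis=0)
    corrected = np.empty_like(smoothed)
    for idx, spec in enumerate(smoothed):
        b, a = np.polyfit(mean_spec, spec, deg=1)     # spec ~ a + b * mean
        corrected[idx] = (spec - a) / b
    return corrected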
In this paper, PCA combined with PLS regression was used to derive the first 20 principal components from the spectral data for further analysis, to examine the relevant and interpretable structure in the data as well as for outlier detection [22]. It was also used to eliminate defective spectra and, at the same time, some unnecessary wavelengths. Because PCA does not consider the concentration of the target property, in this paper the sugar and acidity, PLS was used for further analysis of the sensitive wavelengths corresponding to the sugar and acidity of peaches.
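A rough sketch of this hybrid PCA-then-PLS procedure with scikit-learn (an illustration only: the paper's computations were done in The Unscrambler, and the score-distance outlier rule shown here is a generic choice, not the authors' exact criterion):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression

def hybrid_pca_pls(X_cal, y_cal, X_pred, n_pcs=20, n_lvs=4, z=3.0):
    # Stage 1 (PCA): derive the first principal components and flag
    # outlier spectra by their distance in score space.
    scores = PCA(n_components=n_pcs).fit_transform(X_cal)
    dist = np.linalg.norm(scores, axis=1)
    keep = dist < dist.mean() + z * dist.std()
    # Stage 2 (PLS): regress the constituent (sugar content or acidity)
    # on the cleaned spectra with a small number of latent variables.
    pls = PLSRegression(n_components=n_lvs).fit(X_cal[keep], y_cal[keep])
    return pls.predict(X_pred).ravel(), pls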
3 Results and Discussion

3.1 PCA on the Full Wavelength Region

Spectra were exported from the ViewSpec software for multivariate analysis. First, the pretreatment method of multiplicative scatter correction (MSC) was used to correct additive and multiplicative effects in the spectra after Savitzky-Golay smoothing. Then, PCA was used to derive the first 20 principal components from the spectral data for further analysis, to examine the relevant and interpretable structure in the data as well as for outlier detection. PCA was performed on the whole wavelength range from 400 nm to 1000 nm for the total of 80 peaches in the training set, and two peaches were detected as outliers, possibly caused by man-made error when collecting the spectral curves. It was also noticed that the first four PCs together could explain over 98% of the total population variance, with the remainder accounting for little. Thus, the first four PCs were appropriate for characteristic description of the peach spectral curves.

3.2 Selection of Optimal Wavelengths

Fig. 1 shows the loadings of the first four principal components from the 78 samples across the entire spectral region, called 'the loading plot of PC1 to PC4'. As described above,
Fig. 1. Loadings of first four principal components from 78 peaches across the entire spectral region
the cumulative reliabilities of PC1 to PC4 were very high, so the loadings of PC1 to PC4 should be considered as the basis for eliminating unnecessary spectra when establishing the calibration model. The loading figure also shows that the wavelengths before 700 nm have more wave crests than the wavelengths after 700 nm. This indicates that the wavelengths in the visible spectral region played a more important role than those in the near infrared region. But this may be caused by the color difference of the peaches, not the sugar or acidity. So further PLS analysis was used to ascertain the sensitive wavelengths for the sugar and acidity of peach.

3.3 Sugar Content Prediction

After the PCA analysis, two peaches were detected as outliers, and some unnecessary spectra were eliminated before establishing the calibration model. PLS was finally used to establish the model for peach quality analysis. All 78 samples were separated randomly into two groups: a calibration set with 48 samples, while the remaining 30 samples were used as the prediction set. The correlation coefficient of calibration between the NIR measurements and the sugar content was as high as 0.94, with a SEC of 0.52. When the model was used to predict the 30 unknown samples, the correlation coefficient was 0.94, with a SEP of 0.50 and a bias of 0.02 (Fig. 2).
Fig. 2. Vis/NIR prediction results of sugar content for 30 unknown samples from the PLS model
3.4 Acidity Prediction

The same procedure was used to predict the acidity of peach. The correlation coefficient of calibration between the NIR measurements and the acidity was as high as 0.94, with a SEC of 0.08. In prediction, the correlation coefficient was 0.92, with a SEP of 0.07 and a bias of -0.01 (Fig. 3).
Fig. 3. Vis/NIR prediction results of acidity for 30 unknown samples from the PLS model
3.5 Analysis of the Sensitive Wavelengths Using Loading Weights and Regression Coefficients

In the above discussion of the prediction results from the PLS model, no consideration was given to the contributions of the individual wavelengths to the prediction results.
Fig. 4. Loading weights for sugar content of peaches by PLS
This is because the PLS method first applies a linear transform to the entire set of individual wavelength data. As a result, it is often difficult to ascertain how individual wavelengths are directly related to the quantities to be predicted. However, it would be helpful to examine how sugar and acidity contents are related to individual wavelengths so that a better understanding of NIR reflectance spectroscopy may be obtained.

As to sugar content, after the PLS process carried out with the 48 samples was finished, the number of latent variables (LVs) in the PLS analysis was determined as 4 by cross-validation (Fig. 4). By choosing the spectral wavebands with the highest loading weights in each of those LVs across the entire spectral region, the optimal wavelengths were chosen: 905-910 nm, 692-694 nm, 443-446 nm, 480-484 nm (in PC1), 975-978 nm, 990-992 nm, 701-703 nm, 638-642 nm (in PC2), 984-988 nm (in PC3), and 580-583 nm (in PC4), which were taken as the characteristic wavelengths. The reflectance values of those 42 wavelengths were set as PLS variables to establish the prediction model. The prediction results were better than those using the entire spectral region (Fig. 5).

For acidity, the number of LVs in the PLS analysis was also determined as 4 by cross-validation (Fig. 6). By choosing the spectral wavelengths with the highest loading weights in each of those LVs across the entire spectral region, 38 wavelengths were chosen as the optimal ones and set as PLS variables to establish the acidity prediction model. The prediction result was not as good as that using the entire spectral region (Fig. 7).
Fig. 5. Vis/NIR prediction results of sugar content for 30 unknown samples from the PLS model using several narrower spectral regions
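The selection of characteristic wavelengths by loading weights described above can be sketched as follows (illustrative only; x_weights_ is scikit-learn's name for the PLS loading weights, and per_lv is a free parameter, here set so that a few dozen wavelengths survive):

import numpy as np

def sensitive_wavelengths(pls, wavelengths, per_lv=10):
    # Rank wavelengths by the magnitude of each latent variable's loading
    # weights (shape: n_wavelengths x n_lvs) and pool the top entries.
    selected = set()
    for lv in range(pls.x_weights_.shape[1]):
        top = np.argsort(np.abs(pls.x_weights_[:, lv]))[::-1][:per_lv]
        selected.update(wavelengths[i] for i in top)
    return sorted(selected)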
To further analyze the sensitive wavelengths for sugar content and acidity, the regression coefficients were also examined; the results were similar to those from the loading weights, as shown in Fig. 8 and Fig. 9. From Fig. 8, we can find that the wavelengths 905-910 nm and 975-998 nm might be of particular importance for the sugar content calibration; the wavelengths in the visible region, such as 488-494 nm, may be caused by the color or shape of the peaches. The peak at 968 nm may be caused by the 2ν1+ν3 stretching
Fig. 6. Loading weights for acidity of peaches by PLS
Fig. 7. Vis/NIR prediction results of acidity for 30 unknown samples from the PLS model using several narrower spectral regions
vibration of water. The regression coefficients shown in Fig. 9 also have strong peaks and valleys at certain wavelengths, such as 900-902 nm and 980-995 nm, related to acidity. The visible-region wavelengths for acidity were similar to those for sugar content,
Fig. 8. Regression coefficients with corresponding wavelengths for sugar content
Fig. 9. Regression coefficients with corresponding wavelengths for acidity
because organic acids do not absorb in this region of the spectrum, while the wavelengths between 700 and 950 nm possibly result from a 3rd overtone stretch of C-H and 2nd and 3rd overtones of O-H in peaches, as referred to by Rodriguez-Saona et al. in their article about rapid analysis of sugars in fruit juices by
FT-NIR spectroscopy [23]. Sasic and Ozaki also proposed detailed band assignments for the short-wave NIR region useful for various biological fluids [24]. So in our research, for sugar content, the wavelengths 905-910 nm and 975-998 nm might be of particular importance, and for acidity, 900-902 nm and 980-995 nm were better. This finding is similar to the earlier literature; for example, He found that a wavelength of 914 nm was sensitive to the sugar content of satsuma mandarins, and wavelengths near 900 nm were sensitive to the organic acids of oranges [25].
4 Conclusions

The results from this study indicate that it is possible to use a non-destructive technique to measure the sugar and acidity contents of peach using Vis/NIR spectroscopy. Through a hybrid method of PCA and PLS, a correlation was established between the absorbance spectra and the parameters of sugar content and acidity. The results were quite encouraging, with correlation coefficients of 0.94/0.92 and SEPs of 0.50/0.07 for sugar/acidity, which showed an excellent prediction performance. At the same time, the sensitive wavelengths corresponding to the sugar content and acidity of peaches, or to some element at a certain band, were proposed on the basis of the regression coefficients from PLS. For sugar content, the wavelengths 905-910 nm and 975-998 nm might be of particular importance, and for acidity, 900-902 nm and 980-995 nm were better. Such sensitive-wavelength analysis is very useful in the field of food chemistry. Further research on other fruits is needed to improve the reliability and precision of this technology. Even for peaches, different growing phases and growing conditions may lead to different results. It is also interesting to determine whether there are nondestructive optical techniques for measuring maturity indices of peaches, like skin color and flesh firmness, which could be combined with sugar content and acidity.
Acknowledgments This study was supported by the Teaching and Research Award Program for Outstanding Young Teachers in Higher Education Institutions of MOE, PRC, Natural Science Foundation of China (Project No: 30270773), Specialized Research Fund for the Doctoral Program of Higher Education (Project No: 20040335034) and Natural Science Foundation of Zhejiang Province, China (Project No: RC02067).
References

1. Marco, E., Maria, C.M., Fiorella, S., Antonino, N., Luigi, C., Ennio, L.N., Giuseppe, P.: Quality Evaluation of Peaches and Nectarines by Electrochemical and Multivariate Analyses: Relationships between Analytical Measurements and Sensory Attributes. Food Chemistry 60(4) (1997) 659-666
2. Wu, B.H., Quilot, B., Genard, M., Kervella, J., Li, S.H.: Changes in Sugar and Organic Acid Concentrations during Fruit Maturation in Peaches, P. Davidiana and Hybrids as Analyzed by Principal Component Analysis. Scientia Horticulturae 103(4) (2005) 429-439
3. Steinmetz, V., Sevila, F., Bellon-Maurel, V.: A Methodology for Sensor Fusion Design: Application to Fruit Quality Assessment. Journal of Agricultural Engineering Research 74(1) (1999) 21-31
4. Corrado, D.N., Manuela, Z.S., Antonella, M., Roberto, P., Bernd, H., Arnaldo, D.A.: Outer Product Analysis of Electronic Nose and Visible Spectra: Application to the Measurement of Peach Fruit Characteristics. Analytica Chimica Acta 459(1) (2002) 107-117
5. Steuer, B., Schulz, H., Lager, E.: Classification and Analysis of Citrus Oils by NIR Spectroscopy. Food Chemistry 72(1) (2001) 113-117
6. He, Y., Feng, S.J., Deng, X.F., Li, X.L.: Study on Lossless Discrimination of Varieties of Yogurt Using the Visible/NIR-spectroscopy. Food Research International 39(6) (2006) 645-650
7. Downey, G., Fouratier, V., Kelly, J.D.: Detection of Honey Adulteration by Addition of Fructose and Glucose Using Near Infrared Spectroscopy. Journal of Near Infrared Spectroscopy 11(6) (2004) 447-456
8. He, Y., Li, X.L., Shao, Y.N.: Quantitative Analysis of the Varieties of Apple Using Near Infrared Spectroscopy by Principal Component Analysis and BP Model. Lecture Notes in Artificial Intelligence 3809 (2005) 1053-1056
9. Lammertyn, J., Nicolai, B., Ooms, K., De Smedt, V., De Baerdemaeker, J.: Non-destructive Measurement of Acidity, Soluble Solids and Firmness of Jonagold Apples Using NIR-spectroscopy. Transactions of the ASAE 41(4) (1998) 1089-1094
10. Carlini, P., Massantini, R., Mencarelli, F.: Vis-NIR Measurement of Soluble Solids in Cherry and Apricot by PLS Regression and Wavelength Selection. Journal of Agricultural and Food Chemistry 48(11) (2000) 5236-5242
11. Lu, R.: Predicting Firmness and Sugar Content of Sweet Cherries Using Near-infrared Diffuse Reflectance Spectroscopy. Transactions of the ASAE 44(5) (2001) 1265-1271
12. McGlone, V.A., Fraser, D.G., Jordan, R.B., Kunnemeyer, R.: Internal Quality Assessment of Mandarin Fruit by Vis/NIR Spectroscopy. Journal of Near Infrared Spectroscopy 11(5) (2003) 323-332
13. Pedro, A.M.K., Ferreira, M.M.C.: Nondestructive Determination of Solids and Carotenoids in Tomato Products by Near-Infrared Spectroscopy and Multivariate Calibration. Analytical Chemistry 77(8) (2005) 2505-2511
14. He, Y., Zhang, Y., Xiang, L.G.: Study of Application Model on BP Neural Network Optimized by Fuzzy Clustering. Lecture Notes in Artificial Intelligence 3789 (2005) 712-720
15. Zhang, Y.D., Dong, K., Ren, L.F.: Pattern Recognition of Laser-induced Autofluorescence Spectrum from Colorectal Cancer Tissues Using Partial Least Squares and Neural Network. China Medical Engineering 12(4) (2004) 52-59
16. Dou, Y., Sun, Y., Ren, Y.Q., Ren, Y.L.: Artificial Neural Network for Simultaneous Determination of Two Components of Compound Paracetamol and Diphenhydramine Hydrochloride Powder on NIR Spectroscopy. Analytica Chimica Acta 528(1) (2005) 55-61
17. Fu, X.G., Yan, G.Z., Chen, B., Li, H.B.: Application of Wavelet Transforms to Improve Prediction Precision of Near Infrared Spectra. Journal of Food Engineering 69(4) (2005) 461-466
18. Slaughter, D.C.: Non-Destructive Determination of Internal Quality in Peaches and Nectarines. Transactions of the ASAE 38(2) (1995) 617-623
19. Pieris, K.H.S., Dull, G.G., Leffler, R.G., Kays, S.J.: Spatial Variability of Soluble Solids or Dry-matter Content within Individual Fruits, Bulbs, or Tubers: Implications for the Development and Use of NIR Spectrometric Techniques. HortScience 34(1) (1999) 114-118
20. Ortiz, C., Barreiro, P., Correa, E., Riquelme, F., Ruiz-Altisent, M.: Non-destructive Identification of Woolly Peaches Using Impact Response and Near-infrared Spectroscopy. Journal of Agricultural Engineering Research 78(3) (2001) 281-289
21. Golic, M., Walsh, K.B.: Robustness of Calibration Models Based on Near Infrared Spectroscopy for the In-line Grading of Stonefruit for Total Soluble Solids Content. Analytica Chimica Acta 555(2) (2006) 286-291
22. Naes, T., Isaksson, T., Fearn, T., Davies, A.M.: A User-friendly Guide to Multivariate Calibration and Classification. NIR Publications, UK (2002)
23. Rodriguez-Saona, L.E., Fry, F.S., McLaughlin, M.A., Calvey, E.M.: Rapid Analysis of Sugars in Fruit Juices by FT-NIR Spectroscopy. Carbohydrate Research 336(1) (2001) 63-74
24. Sasic, S., Ozaki, Y.: Short-Wave Near-Infrared Spectroscopy of Biological Fluids. 1. Quantitative Analysis of Fat, Protein, and Lactose in Raw Milk by Partial Least-Squares Regression and Band Assignment. Analytical Chemistry 73(1) (2001) 64-71
25. He, Y.: The Method for Near Infrared Spectral Analysis. In: Yan, Y.L., Zhao, L.L., Han, D.H., Yang, S.M. (Eds.): The Analysis Basis and Application of Near Infrared Spectroscopy. China Light Industry Press, Beijing (1998) 354
A Novel Approach in Sports Image Classification∗

Wonil Kim¹, Sangyoon Oh², Sanggil Kang³, and Kyungro Yoon⁴,**

¹ College of Electronics and Information Engineering at Sejong University, Seoul, Korea
[email protected]
² Computer Science Department at Indiana University, Bloomington, IN, U.S.A.
[email protected]
³ Department of Computer Science, The University of Suwon, Gyeonggi-do, Korea
[email protected]
⁴ School of Computer Science and Engineering at Konkuk University, Seoul, Korea
[email protected]
Abstract. It would be very effective and useful if an image classification system used standardized features such as the MPEG-7 descriptors. In this paper, we propose a sports image classification system that properly classifies sports images into one of eight classes. The proposed system uses normalized MPEG-7 visual descriptors as the input of the neural network system. The experimental results show that the MPEG-7 descriptors can be used as the main features of an image classification system.
1 Introduction

In this paper, we propose a sports image classification system that classifies images into one of eight classes: Taekwondo, Field and Track, Ice Hockey, Horse Riding, Skiing, Swimming, Golf, and Tennis. These eight sports were selected according to the particular features of the given sports. The proposed system uses MPEG-7 visual descriptors as the main input features of the classification system. In this paper, we first analyze several MPEG-7 descriptors regarding color, texture, and shape, after which we discuss several descriptors that perform well on sports image classification. This task is effective and requires no intensive computation; it can be a de facto standard for real-time image classification. The simulation shows that the visual MPEG-7 descriptors can be effectively used as the main features of the image classification process and that the proposed system can successfully classify images into multiple classes depending on the employed descriptors.

In the next chapter, we discuss previous research on neural-network-based image classification, image classification systems using MPEG-7 descriptors, and finally sports image classification. The proposed system is explained in the following section. The simulation environment and the results are discussed in Chapter 4. Chapter 5 concludes.
∗ This paper is supported by Seoul R&BD program.
** Author for correspondence: +82-2-450-4129.
2 Related Works

2.1 Sports Image Classification

Due to the large amount of digitized media being generated and the popularity of sports, sports image classification has become an area that requires all the techniques and methods described in this section. Jung et al. [1] proposed a sports image classification system using a Bayesian network. In this work, they showed that an image mining approach using a statistical model can produce promising results on sports image classification. Existing CBIRS like QBIC and VisualSEEK [2] provide image retrieval based on methods that are limited to low-level features such as texture, shape, and color histograms.

There is research that applies various techniques to specific image domains, such as sports images. For automatic multimedia classification, Ariki and Sugiyama present a general study of the classification problem for TV sports news and propose a method using a multi-space method that provides a sports category with more than one subspace corresponding to its typical scenes. Discrete Cosine Transformation (DCT) components are extracted from the whole image and used as the classification features [3]. Their other paper [4] contains in-depth experimental results. The Digital Video | Multimedia (DVMM) Lab [5] of Columbia University has done much research in the image classification area. One example is structural and semantic analysis of digital videos [6]. Chang and Sundaram develop algorithms and tools for segmentation, summarization, and classification of video data. For each area, they emphasize the importance of understanding domain-specific characteristics, and discuss classification techniques that exploit spatial structure constraints as well as temporal transitional methods.

One of the key problems in achieving efficient and user-friendly retrieval in the domain of images and videos is developing a search mechanism that guarantees the delivery of high-precision information. One restriction of image retrieval systems is that they require a sample object or a sample texture. Khan et al. [7, 8, 9, 10] propose an image processing system which examines the relationships among objects in images to help achieve a more detailed understanding of the content and meaning of individual images. It uses a domain-dependent ontology to create a meaning-based index structure through the design and implementation of a concept-based model. They propose a new mechanism to generate the ontology automatically for a scalable system. Their approach is applied to the sports image domain.

ASSAVID is an EU-sponsored project to develop a system for automatic segmentation and semantic annotation of sports video. Messer, Christmas and Kittler describe their method for automated classification of unknown sports video in their paper [11]. The technique is based on the concept of "cues", which attach semantic meaning to low-level features computed on the video. The paper includes experimental results with sports videos. Hayashi et al. [12] present a method to classify scenes based on motion information. Compared to previous works that use object trajectories and optical flow fields as motion information, they use the instantaneous motions of multiple objects in each image. To deal with varying numbers of objects in a scene, moment statistics are used as features in the method.
The method consists of two phases: scenes in the learning data are clustered in the learning phase, and a newly observed scene is classified in the recognition phase.

2.2 Neural Networks and MPEG-7 Descriptors

Neural networks have long been used to develop methods for high-accuracy pattern recognition and image classification. Kanellopoulos and Wilkinson [13] performed experiments using different neural networks and classifiers to categorize images, including multi-layer perceptron neural networks and a maximum likelihood classifier. The paper examines best practice in areas such as network architecture selection, algorithm optimization, input data scaling, enhanced feature sets, and hybrid classifier methods, and offers recommendations and strategies for effective and efficient use of neural networks as well.

It is known that the neural networks used in modeling an image classification ensemble should make different errors to be effective. Giacinto and Roli [14] propose an approach to the automatic design of neural network ensembles. Their approach aims to select, from a given large set of neural networks, the subset forming the most error-independent nets. The approach consists of an overproduction phase and a choice phase, which chooses the subset of neural networks. The overproduction phase is studied by Partridge [15], and the choice phase is sub-divided into an unsupervised learning step for identifying subsets and a final ensemble creation step that selects among the subsets from the previous step.

In contrast to the relatively long history of neural networks in image classification and content-based image retrieval, MPEG-7 [16] is a recently emerged standard in this area. It is not a standard dealing with the actual encoding and decoding of video and audio, but a standard for describing media content. It uses XML to store metadata, and it solves the problem of a lacking standard to describe visual image content. The aim, scope, and details of the MPEG-7 standard are well described by Sikora of Technical University Berlin in his paper [17].

There is a series of researches that use various MPEG-7 descriptors. Ro et al. [18] present a study of texture-based image description and retrieval using an adapted version of the homogeneous texture descriptor of MPEG-7. Other studies of image classification use descriptors like a contour-based shape descriptor [19], an edge histogram descriptor [20], and a combination of color structure and homogeneous texture descriptors [21]. As part of the EU aceMedia project research, Spyrou et al. propose three image classification techniques based on fusing various low-level MPEG-7 visual descriptors [22]. Since a direct inclusion of descriptors would be inappropriate and incompatible, fusion is required to bridge the semantic gap between the target semantic classes and the low-level visual descriptors. The three image classification techniques are: merging fusion, back-propagation fusion, and a fuzzy-ART neuro-fuzzy network.

There is also a CBIRS that combines neural networks and the MPEG-7 standard: researchers at Helsinki University of Technology developed a neural, self-organizing system to retrieve images based on their content, the PicSOM (Picture + self-organizing map, SOM) [23]. The technique used to develop the PicSOM system is based on pictorial examples and relevance feedback (RF), and the system is implemented using tree-structured SOMs.
The MPEG-7 content descriptor is provided for the system. In
the paper, they compare the PicSOM indexing technique with a reference system based on vector quantization (VQ). Their results show the MPEG-7 content descriptor can be used in the PicSOM system despite the fact that Euclidean distance calculation is not optimal for all of them.
3 The Proposed Sports Image Classification System

3.1 Feature Extraction Module

Our system for classifying sports images is composed of two modules, the feature extraction module and the classification module, connected in serial form as shown in Fig. 1. In the feature extraction module there are three engines. The MPEG-7 XM engine extracts the features of images in XML description format. The parsing engine parses the raw descriptions to transform them into numerical values, which are suitable for a neural network implementation. The preprocessing engine normalizes the numerical values to the 0-1 range. Normalizing the input features prevents features on a large numerical scale from dominating the output of the neural network classifier (NNC) over features on a small scale.
Fig. 1. The schematic of our sports image classification system: a sports image passes through the MPEG-7 XM engine, the parsing engine, and the preprocessing engine (the feature extraction module), and then into the neural network classifier (the classification module)
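To make the preprocessing step concrete, the following minimal sketch (in Python with NumPy; the array layout and function name are our own illustration, not part of the described system) shows the 0-1 min-max normalization that the preprocessing engine applies to the parsed descriptor values:

import numpy as np

def normalize_features(train: np.ndarray, test: np.ndarray):
    """Min-max normalize parsed MPEG-7 feature columns into the 0-1 range.

    Column-wise minima and maxima are estimated on the training set only
    and reused for the test set, so no test information leaks into training.
    """
    lo = train.min(axis=0)
    hi = train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns
    return (train - lo) / span, np.clip((test - lo) / span, 0.0, 1.0)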
3.2 Classification Module
Using the data set of normalized input features and sports classes, we can model an NNC in the classification module. Fig. 2 shows an example of an NNC with three layers: one input layer, one hidden layer, and one output layer. The number of input features varies with the MPEG-7 descriptor. Let us denote the input feature vector obtained from the first MPEG-7 descriptor as $X_{D_1} = (x_{D_1,1}, x_{D_1,2}, \ldots, x_{D_1,i}, \ldots, x_{D_1,n_1})$, where $x_{D_1,i}$ is the $i$-th input feature extracted from MPEG-7 descriptor 1 and the subscript $n_1$ is the dimension of the input features from the first descriptor. In the same way, the input feature vector obtained from the last MPEG-7 descriptor $k$ can be expressed as $X_{D_k} = (x_{D_k,1}, x_{D_k,2}, \ldots, x_{D_k,i}, \ldots, x_{D_k,n_k})$.
Fig. 2. An example of a three-layered neural network classifier (input layer, hidden layer, and output layer)
Also, the output vector can be expressed as $Y = (y_1, y_2, \ldots, y_i, \ldots, y_s)$, where $y_i$ is the output of the $i$-th output node and the subscript $s$ is the number of classes. By utilizing the hard limit function in the output layer, we obtain a binary value, 0 or 1, for each output node $y_i$, as in Equation (1):

$y_i = f_o(\mathrm{netinput}_o) = \begin{cases} 1, & \mathrm{netinput}_o \ge 0 \\ 0, & \text{otherwise} \end{cases}$  (1)

where $f_o$ is the hard limit function at the output node and $\mathrm{netinput}_o$ is the net input of $f_o$. As shown in Equation (2), the net input can be expressed as the product of the output vector of the hidden layer, denoted $Y_h$, and the weight vector $W_o$ at the output layer:

$\mathrm{netinput}_o = W_o Y_h^{T}.$  (2)
In the same way, the hidden layer output vector $Y_h$ can be computed by applying the activation function to the product of the input weight vector and the input vector. Thus, the accuracy of the NNC depends on the values of all the weight vectors. To obtain optimal weight vectors, the NNC is trained using the back-propagation algorithm, which is commonly used for training neural networks. Before training, each sports class is coded into an $s$-dimensional orthogonal vector. For example, if we have eight classes, the classes are coded as (1, 0, 0, 0, 0, 0, 0, 0), (0, 1, 0, 0, 0, 0, 0, 0), ..., (0, 0, 0, 0, 0, 0, 0, 1). Once an optimal weight vector is obtained, we evaluate the performance of the NNC on test data that is unseen during the training phase.
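The class coding and hard-limit readout described above can be sketched as follows. This is an illustrative fragment under the stated eight-class setting, not the authors' implementation:

import numpy as np

def one_hot(labels: np.ndarray, s: int) -> np.ndarray:
    """Code each class label in 0..s-1 as an s-dimensional orthogonal vector."""
    codes = np.zeros((labels.size, s))
    codes[np.arange(labels.size), labels] = 1.0
    return codes

def hard_limit_readout(net_input: np.ndarray) -> np.ndarray:
    """Binarize the output-layer net inputs with the hard limit function of Eq. (1)."""
    return (net_input >= 0.0).astype(int)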
4 Experiment
4.1 Experimental Environment
We implemented our sports image classification system using images of 8 sports, such as Taekwondo, Field & Track, and Ice Hockey. As explained in the previous section, we extracted input features from query images in the feature extraction module using four MPEG-7 descriptors: Color Layout (CL), Edge Histogram (EH), Homogeneous Texture (HT), and Region Shape (RS). The input feature values were normalized into the 0-1 range. A total of 2,544 images were used: 2,400 images (300 per sport) for training the NNC and 144 images (18 per sport) for testing; the training and testing images are disjoint. We structured a three-layered NNC in the classification module. The hyperbolic tangent sigmoid function and the hard limit function were used in the hidden layer and the output layer, respectively, and the hidden layer contains 32 nodes. For training the NNC, we chose the back-propagation algorithm because of its training ability. To obtain optimal weight vectors, a large number of iterations (500,000 in this experiment) was used.
Table 1. The classification accuracies for 4 different MPEG-7 descriptors (%). Rows are (true sport, descriptor); columns are the predicted sports.

Sport / Desc.      Taekwondo  F&Track  IceHockey  HorseRiding  Skiing  Swimming  Golf   Tennis
Taekwondo    CL    77.78      11.11    5.56       0.00         0.00    0.00      0.00   5.56
             RS    50.00      5.56     11.11      5.56         5.56    11.11     11.11  0.00
             HT    87.50      0.00     0.00       0.00         0.00    0.00      12.50  0.00
             EH    27.78      22.22    0.00       0.00         5.56    16.67     22.22  5.56
Field&Track  CL    0.00       66.67    5.56       16.67        0.00    0.00      0.00   11.11
             RS    5.56       50.00    11.11      5.56         0.00    16.67     11.11  0.00
             HT    0.00       86.67    0.00       0.00         0.00    0.00      13.33  0.00
             EH    11.11      50.00    0.00       16.67        0.00    0.00      11.11  11.11
Ice Hockey   CL    0.00       11.11    72.22      0.00         5.56    5.56      0.00   5.56
             RS    0.00       0.00     33.33      38.89        11.11   5.56      11.11  0.00
             HT    4.65       0.00     55.81      6.98         0.00    9.30      23.26  0.00
             EH    5.56       44.44    11.11      33.33        0.00    0.00      0.00   5.56
Horse Riding CL    0.00       0.00     5.56       83.33        0.00    11.11     0.00   0.00
             RS    0.00       5.56     5.56       77.78        0.00    5.56      0.00   5.56
             HT    5.56       16.67    5.56       33.33        11.11   16.67     5.56   5.56
             EH    11.11      27.78    22.22      33.33        0.00    0.00      5.56   0.00
Skiing       CL    0.00       0.00     5.56       5.56         83.33   5.56      0.00   5.56
             RS    0.00       5.56     5.56       0.00         72.22   11.11     5.56   0.00
             HT    0.00       0.00     0.00       0.00         59.09   4.55      36.36  0.00
             EH    5.56       0.00     0.00       16.67        33.33   22.22     16.67  5.56
Swimming     CL    5.56       5.56     5.56       0.00         11.11   66.67     0.00   0.00
             RS    5.56       0.00     0.00       11.11        5.56    44.44     16.67  16.67
             HT    5.71       0.00     0.00       17.14        2.86    54.29     20.00  0.00
             EH    16.67      11.11    11.11      5.56         27.78   22.22     5.56   0.00
Golf         CL    16.67      5.56     0.00       5.56         0.00    0.00      72.22  0.00
             RS    0.00       11.11    5.56       11.11        11.11   11.11     33.33  16.67
             HT    0.00       0.00     0.00       0.00         0.00    25.00     68.18  6.82
             EH    5.56       11.11    0.00       11.11        22.22   11.11     33.33  5.56
Tennis       CL    11.11      11.11    0.00       5.56         16.67   11.11     0.00   44.44
             RS    5.56       11.11    16.67      11.11        5.56    5.56      16.67  27.78
             HT    0.00       0.00     30.95      4.76         0.00    19.05     11.90  33.33
             EH    11.11      5.56     11.11      5.56         27.78   11.11     5.56   22.22
4.2 Experimental Results
Table 1 shows the accuracy of our sports image classification system for each sport according to the 4 different MPEG-7 descriptors. As seen in the table, the input features extracted from the Color Layout descriptor provide the best overall performance (about 70% accuracy) for all sports except Field & Track, whose images contain both track and field scenes. While the results from the Region Shape descriptor are poor for most of the sports, its input features work relatively well for fast-moving sports such as Horse Riding (77.78%) and Skiing (72.22%). The results from Homogeneous Texture for outdoor sports such as Field & Track (86.67%) and Golf (68.18%) are also acceptable. From this analysis, we can say that our sports image classification system shows promising results when the input features extracted from the Color Layout descriptor are used as inputs of the NNC; the other descriptors can serve as complementary features depending on the images and domain.
5 Conclusion
This paper proposed a novel system for classifying sports images using a neural network classifier. From the experimental results, we conclude that the system provides acceptable classification performance (about 70%) when the Color Layout MPEG-7 descriptor is used to extract the input features of the neural network classifier. As further research toward improving the classification performance, we will continue to search for the best combination of MPEG-7 descriptors through heuristic algorithms and empirical experiments. We also plan to extend the number of supported sports from the 8 used in this paper to more than 20.
References
1. Jung, Y., Hwang, I., Kim, W.: Sports Image Classification Using Bayesian Approach. Lecture Notes in Computer Science, Vol. 3697. Springer-Verlag, Berlin Heidelberg New York (2003) 426-437
2. Smith, J., Chang, S.: Tools and Techniques for Color Image Retrieval. In: Proceedings of the Symposium on Electronic Imaging: Science and Technology - Storage and Retrieval for Image and Video Databases (1996) 426-437
3. Ariki, Y., Sugiyama, Y.: Classification of TV Sports News by DCT Features Using Multi-subspace Method. In: Proceedings of the 14th International Conference on Pattern Recognition, Vol. 2 (1998) 1488-1491
4. Sugiyama, Y., Ariki, Y.: Automatic Classification of TV Sports News Video by Multiple Subspace Method. Systems and Computers in Japan, Vol. 31, No. 6 (2000) 90-98
5. Digital Video Multi Media (DVMM) Lab of Columbia University, http://www.ctr.columbia.edu/dvmm/newHome.htm
6. Chang, S., Sundaram, H.: Structural and Semantic Analysis of Video. In: Proceedings of the IEEE International Conference on Multimedia and Expo (2000) 687
7. Khan, L., McLeod, D., Hovy, E.: Retrieval Effectiveness of an Ontology-based Model for Information Selection. The VLDB Journal, Vol. 13, No. 1. ACM/Springer-Verlag (2004) 71-85
8. Khan, L., Wang, L.: Automatic Ontology Derivation Using Clustering for Image Classification. In: Proceedings of the 8th International Workshop on Multimedia Information Systems (2002) 56-65
9. Breen, C., Khan, L., Kumar, A., Wang, L.: Ontology-based Image Classification Using Neural Networks. In: Proceedings of SPIE Internet Multimedia Management Systems III (2002) 198-208
10. Breen, C., Khan, L., Ponnusamy, A.: Image Classification Using Neural Networks and Ontologies. In: Proceedings of the 13th International Workshop on Database and Expert Systems Applications, Vol. 2 (2002) 98-102
11. Messer, K., Christmas, W., Kittler, J.: Automatic Sports Classification. In: Proceedings of the 16th International Conference on Pattern Recognition, Vol. 2 (2002) 1005-1008
12. Hayashi, A., Nakashima, R., Kanbara, T., Suematsu, N.: Multi-object Motion Pattern Classification for Visual Surveillance and Sports Video Retrieval. In: Proceedings of the 15th International Conference on Vision Interface (2002)
13. Kanellopoulos, I., Wilkinson, G.: Strategies and Best Practice for Neural Network Image Classification. International Journal of Remote Sensing, Vol. 18, No. 4 (1997) 711-725
14. Giacinto, G., Roli, F.: Design of Effective Neural Network Ensembles for Image Classification Purposes. Image and Vision Computing, Vol. 19, No. 9-10 (2001) 699-707
15. Partridge, D.: Network Generalization Differences Quantified. Neural Networks, Vol. 9, No. 2 (1996) 263-271
16. MPEG-7 overview, http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm
17. Sikora, T.: The MPEG-7 Visual Standard for Content Description - an Overview. IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 6 (2001) 696-702
18. Ro, Y., Kim, M., Kang, H., Manjunath, B., Kim, J.: MPEG-7 Homogeneous Texture Descriptor. ETRI Journal, Vol. 23, No. 2 (2001) 41-51
19. Bober, M.: The MPEG-7 Visual Shape Descriptors. IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 6 (2001) 716-719
20. Won, C., Park, D., Park, S.: Efficient Use of MPEG-7 Edge Histogram Descriptor. ETRI Journal, Vol. 24, No. 1 (2002) 23-30
21. Pakkanen, J., Ilvesmaki, A., Iivarinen, J.: Defect Image Classification and Retrieval with MPEG-7 Descriptors. Lecture Notes in Computer Science, Vol. 2749. Springer-Verlag, Berlin Heidelberg New York (2003) 349-355
22. Spyrou, E., Le Borgne, H., Mailis, T., Cooke, E., Avrithis, Y., O'Connor, N.: Fusing MPEG-7 Visual Descriptors for Image Classification. Lecture Notes in Computer Science, Vol. 3697. Springer-Verlag, Berlin Heidelberg New York (2005) 847-852
23. Laaksonen, J., Koskela, M., Oja, E.: PicSOM - Self-organizing Image Retrieval with MPEG-7 Content Descriptors. IEEE Transactions on Neural Networks: Special Issue on Intelligent Multimedia Processing, Vol. 13, No. 4 (2002) 841-853
A Novel Biometric Identification Approach Based on Human Hand∗
Jun Kong1,2,∗∗, Miao Qi1,2, Yinghua Lu1, Shuhua Wang1,2, and Yuru Wang1,2
1 Computer School, Northeast Normal University, Changchun, Jilin Province, China
2 Key Laboratory for Applied Statistics of MOE, China
{kongjun, qim801, luyh, wangsh946, wangyr950}@nenu.edu.cn
Abstract. At present, hand-based identification as a biometric technique is being widely researched. A novel personal identification approach is presented in this paper. In contrast with existing approaches, this system extracts multimodal features, including hand shape, palm-print and finger-print, to facilitate coarse-to-fine dynamic identification. Five hand shape geometrical features are used to guide the selection of a small set of similar candidate samples at the coarse-level matching stage. At the fine-level matching stage, the features of one palm-print region and six finger regions segmented from the three middle fingers are used for the final confirmation. Gabor filters and wavelet moments are used to extract the palm-print features. In addition, the maximum matching method and a fusion matching mechanism are applied at the decision stage. The experimental results show the effectiveness and reliability of the proposed approach.
1 Introduction
Hand-based recognition systems verify a person's identity by analyzing his or her physical features. Such biometric features, like fingerprints, face, iris and retina, have been widely used in many personal identification applications because they possess desirable physiological properties: acceptability, uniqueness and difficulty of duplication. It has been reported in [1] that hand-based identification is one of the most acceptable biometrics. The human hand contains many visible characteristic features, including hand shape, principal lines, wrinkles, ridges and finger texture, which are unique to an individual and stable over time. How to extract these features is a key step for identification. From the viewpoint of feature extraction, existing hand-based recognition approaches mainly include line-based approaches [2-4], texture-based approaches [5-7] and appearance-based approaches [8-10]. Most existing systems are based on a single palm-print feature, which can sometimes lead to a low recognition rate. Therefore, multimodal biometric identification systems integrating two or more different biometric features are being developed.
∗ This work is supported by the science foundation for young teachers of Northeast Normal University, No. 20061002, China.
∗∗ Corresponding author.
Fig. 1. Block diagram of the proposed identification system: image acquisition, pre-processing and sub-image segmentation, coarse-level identification against Library1, sub-image feature extraction, fine-level identification against Library2 via an index vector, and a final decision (owner or attacker)
In our multimodal biometric identification system, hand geometrical features, palm-print region of interest (PROI) features and six finger strip of interest (FROI) features are employed, and a coarse-to-fine dynamic identification strategy is adopted to implement a reliable, real-time personal identification system. A block diagram of the proposed system is shown in Fig. 1, where hand geometrical features and texture features are stored in Library1 and Library2, respectively. Firstly, the handprint image is captured using a flatbed scanner as the input device. Then, a series of pre-processing operations is employed for the segmentation of the PROI and FROI, and geometry features are also obtained during pre-processing. The hand shape geometry features are first used for coarse-level identification, and 2D Gabor filters and wavelet moments are used to extract the PROI features for fine-level identification. At the decision stage, the maximum matching method and the fusion matching mechanism are employed to output the identification result. The rest of this paper is organized as follows. Section 2 introduces image acquisition and the segmentation of the FROI and PROI. Section 3 briefly describes the Gabor filters and the wavelet moment function. The process of personal identification is depicted in Section 4. The experimental results are reported in Section 5, and conclusions are given in Section 6.
2 Image Acquisition and Pre-processing
2.1 Image Acquisition
In our system, no guidance pegs are fixed on the flatbed scanner. The users place their right hands freely on the platform of the scanner, and the collected images are shown in Fig. 2. The advantage of this scanning manner is that the palm need not be inked and no docking device is required on the scanner to constrain the hand position. In this way, the user does not feel uncomfortable during image acquisition.
Fig. 2. (a)-(d) and (e1)-(e4) are original gray-level handprint images scanned from different persons and from the same person, respectively
2.2 Pre-processing
Before feature extraction, the segmentation of one PROI and six FROI is performed. The segmentation process mainly includes two steps in our proposed approach: border tracing with key point location, and PROI and FROI generation, which are detailed in the following paragraphs.
Step 1: Binarize the hand image by Otsu's method [11]. Then trace the border starting from the top left to obtain the contour of the hand shape, which is sequentially traced and represented by a set of coordinates. The finger-web location algorithm proposed by Lin [12] is used to obtain the seven key points (a-g) (as shown in Fig. 3).
Step 2: Based on Step 1, the regions of interest, including one PROI and six FROI, are segmented, and five hand shape features are also extracted, as follows:
1. Find the points h and k, which are the intersections of lines db and df with the hand contour. Then compute the midpoints m1, m2, m3 and m4 of the segments hb, bd, df and fk.
2. Find the line AB which is parallel to line bf, such that the distance L between line AB and line bf is 50 pixels.
3. Form five length features by computing the lengths of the segments am1, cm2, em3, gm4 and AB. Locate the top left corner R1 and top right corner R2 of the PROI. As shown in Fig. 3, segment fR2 is perpendicular to line bf and the length of fR2 is 20 pixels. In addition, segment R1R2 is 20 pixels longer than segment bf. Fig. 4(a) shows the segmented square region R1R2R4R3 as the PROI.
4. Extract two finger strips of interest (FSOI) on the ring finger, with sizes 50 × 32 and 60 × 32, according to line cm2 (see Fig. 4(b)).
5. Find two FROI of size 32 × 32 with the maximal entropy value on the two FSOI segmented in Step 4.
6. Repeat Steps 4 and 5 to find the other four FROI on the middle and index fingers, based on lines em3 and gm4. Then store the six FROI (see Fig. 4(c)) as templates.
Fig. 3. The process of handprint segmentation
Fig. 4. (a) The segmented PROI. (b) The FSOI on the three middle fingers. (c) The FROI segmented from (b).
PROI images segmented from different persons, and even from the same person, may differ in size; the size varies with the degree to which the hand is stretched. The PROI is therefore normalized to 128 × 128 pixels in our work.
3 Feature Extraction
3.1 Gabor Filtering
The Gabor filter is widely used for feature extraction [13-16] and has been demonstrated to be a powerful tool in texture analysis. A circular 2-D Gabor filter has the form:

$G(x, y, \theta, u, \sigma) = \frac{1}{2\pi\sigma^2} \exp\left\{-\frac{x^2 + y^2}{2\sigma^2}\right\} \exp\{2\pi i (ux\cos\theta + uy\sin\theta)\},$  (1)

where $i = \sqrt{-1}$, $u$ is the frequency of the sinusoidal wave, $\theta$ controls the orientation of the function and $\sigma$ is the standard deviation of the Gaussian envelope. By visual inspection of PROI images from different persons, we found that the dominant texture lines mainly lie in the $\pi/8$, $3\pi/8$ and $3\pi/4$ directions; however, more pseudo texture lines appear in the $3\pi/4$ direction of the captured images owing to differences in tensility and pressure. Therefore, in our study the Gabor filter is convolved with the PROI in two directions, $\pi/8$ and $3\pi/8$, and only the real part of the filtered image is used. An appropriate threshold value is then selected to binarize the filtered image.
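A sketch of the circular 2-D Gabor filter of Eq. (1), sampled on a discrete grid, follows; the kernel size, frequency u and sigma used below are assumed example values (the paper does not report them):

import numpy as np

def gabor_kernel(size: int, theta: float, u: float, sigma: float) -> np.ndarray:
    """Sample the circular 2-D Gabor filter of Eq. (1) on a size x size grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    carrier = np.exp(2j * np.pi * u * (x * np.cos(theta) + y * np.sin(theta)))
    return envelope * carrier

# Real parts at the two orientations used for the PROI (parameters illustrative)
k1 = gabor_kernel(17, np.pi / 8, 0.1, 4.0).real
k2 = gabor_kernel(17, 3 * np.pi / 8, 0.1, 4.0).real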
Fig. 5. (a) Binary images of the PROI filtered using the Gabor filter with the two directions π/8 and 3π/8, from two different persons. (b) The results of (a) processed by sequential morphological operations.
Fig. 5(a) shows some of the binarized images. Finally, morphological operators, including clean, spur and label, are employed to remove spurs and isolated pixels and to trim some short feature lines (shown in Fig. 5(b)).
3.2 Wavelet Moment Representation
The wavelet moment method is particularly suitable for extracting local discriminative features from normalized images. Its translation, rotation and scale invariance have made it a widely used feature extraction approach [17]. The family of wavelet basis functions is defined as:
$\psi_{a,b}(r) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{r-b}{a}\right),$  (2)

where $a$ is the dilation parameter and $b$ is the shifting parameter. The cubic B-spline wavelet in Gaussian approximation form is:

$\psi^{\beta_n}(r) = \frac{4a^{n+1}}{\sqrt{2\pi(n+1)}}\,\sigma_w \cos\!\left(2\pi f_0 (2r-1)\right) \exp\!\left\{-\frac{(2r-1)^2}{2\sigma_w^2(n+1)}\right\},$  (3)

where $n = 3$, $a = 0.697066$, $f_0 = 0.409177$, and $\sigma_w^2 = 0.561145$. Since the size $r$ of an image is always restricted to the domain [0, 1], the parameters are expressed in terms of 0.5 and the domains for $m$ and $n$ can be restricted as follows:

$a = 0.5^m,\; m = 0, 1, \ldots, M; \qquad b = n \cdot 0.5 \cdot 0.5^m,\; n = 0, 1, \ldots, 2^{m+1}.$  (4)

Then the wavelet defined along a radial axis in any orientation can be rewritten as:

$\psi^{\beta_n}_{m,n}(r) = 2^{m/2}\,\psi^{\beta_n}(2^m r - 0.5n).$  (5)

The cubic B-spline wavelet moments (WMs) are defined as:

$W_{m,n,q} = \iint f(r,\theta)\,\psi^{\beta_n}_{m,n}(r)\,e^{-jq\theta}\,r\,dr\,d\theta.$  (6)

If $N$ is the number of pixels along each axis of the image, the cubic B-spline WMs for a digital image $f(r,\theta)$ can be defined as:

$W_{m,n,q} = \sum_x \sum_y f(x,y)\,\psi^{\beta_n}_{m,n}(r)\,e^{-jq\theta}\,\Delta x\,\Delta y, \qquad r = \sqrt{x^2 + y^2} \le 1,\; \theta = \arctan(y/x).$  (7)
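The cubic B-spline wavelet of Eqs. (3)-(5) can be sketched as follows, using the constants quoted in the text; this is our own illustrative implementation, not the authors' code:

import numpy as np

# Constants quoted in the text for the cubic B-spline (n = 3)
N_ORDER, A_CONST, F0, SIGMA_W2 = 3, 0.697066, 0.409177, 0.561145

def bspline_wavelet(r: np.ndarray) -> np.ndarray:
    """Gaussian approximation of the cubic B-spline wavelet, Eq. (3)."""
    n = N_ORDER
    amp = 4.0 * A_CONST ** (n + 1) / np.sqrt(2.0 * np.pi * (n + 1)) * np.sqrt(SIGMA_W2)
    return amp * np.cos(2 * np.pi * F0 * (2 * r - 1)) * \
        np.exp(-(2 * r - 1) ** 2 / (2 * SIGMA_W2 * (n + 1)))

def dilated_wavelet(r: np.ndarray, m: int, n: int) -> np.ndarray:
    """psi_{m,n}(r) = 2^(m/2) * psi(2^m * r - 0.5 * n), Eq. (5)."""
    return 2.0 ** (m / 2.0) * bspline_wavelet(2.0 ** m * r - 0.5 * n)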
4 Identification
4.1 Coarse-Level Identification
Although the geometrical length features are not highly discriminative, they can be used in coarse-level matching so that the system subsequently works on only a small set of candidates. Five hand shape length values are obtained in the pre-processing block. There are $M$ training samples for every person $X$ in the enrollment stage, and $\mu$ is the template, defined as the mean vector of the $M$ feature vectors. The similarity between a testing sample and the template is measured by the Manhattan distance:

$d(x, \mu) = \sum_{i=1}^{L} |x_i - \mu_i|.$  (8)

If the distance is smaller than a pre-defined threshold value, the index number of the template is recorded into an index vector R for fine-level identification.
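A minimal sketch of this coarse-level screening based on Eq. (8) follows; the template layout and names are illustrative:

import numpy as np

def coarse_candidates(x: np.ndarray, templates: np.ndarray, threshold: float) -> list:
    """Return the index vector R of templates whose Manhattan distance to the
    five hand-shape length features x falls below the threshold, per Eq. (8)."""
    distances = np.abs(templates - x).sum(axis=1)   # templates: (n_persons, 5)
    return np.flatnonzero(distances < threshold).tolist()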
4.2 Fine-Level Identification
The index vector R is recorded at the coarse-level identification stage. At this stage, the testing image is further matched against the templates whose index numbers are in R. One PROI and six FSOI regions are segmented from the testing sample as shown in Fig. 4. The correlation function is adopted to compute the correlation value between each FSOI and its template. The matching rule is that a template from Library2 slides from top to bottom over the FSOI of the testing sample, a correlation value being computed at each position; the maximal value is selected as the correlation score. The PROI of the testing sample is convolved with the Gabor filter in the two directions, and the feature vector of the PROI is then computed by wavelet moments. The correlation function is used again to measure the similarity degree. The outputs of the eight matching results are combined at the matching-score level using a fusion procedure, expressed as:

$S = \sum_{i=1}^{8} w_i \cdot s_i,$  (9)

where $w_i$ is the weight factor associated with each of the hand parts, fulfilling the condition $w_1 + w_2 + \ldots + w_8 = 1$; the values are set to $w_1 = w_8 = 0.13$, $w_2 = 0.14$, $w_3 = 0.12$, $w_4 = w_5 = w_6 = 0.11$ and $w_7 = 0.15$.
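The score-level fusion of Eq. (9) with the quoted weights can be sketched as follows (illustrative, not the authors' implementation):

import numpy as np

# Weights quoted in the text for the eight matching scores (w1 .. w8)
WEIGHTS = np.array([0.13, 0.14, 0.12, 0.11, 0.11, 0.11, 0.15, 0.13])

def fused_score(scores) -> float:
    """Combine the eight correlation scores at the matching-score level, Eq. (9)."""
    s = np.asarray(scores, dtype=float)
    assert s.shape == (8,) and abs(WEIGHTS.sum() - 1.0) < 1e-9
    return float(WEIGHTS @ s)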
5 Experimental Results
In this section, the proposed approach is evaluated for effectiveness and accuracy. A handprint database of 1000 handprint images was collected from the right hands of 100 individuals using our flatbed scanner. The size of all images is 500 × 500 at a resolution of 300 dpi. Five images per user were used for training and the remaining five for testing; the training and testing images are disjoint. Each image was processed by the procedure involving pre-processing, segmentation and feature extraction. At the coarse-level identification stage, the threshold value is set to 30. The final identification results are usually quantified by the false rejection rate (FRR) and the false acceptance rate (FAR), both of which vary with the threshold $p$. The distributions of FRR($p$) and FAR($p$) are depicted in Fig. 6. Another threshold $T_2$ is selected for fine-level identification. More than one template may fall below $T_2$ at the final outputs; we select the template with the smallest distance to the query sample as the final identification result. If the number of outputs is zero, the query sample is deemed an attacker. The accuracy of personal identification is measured by the correct match rate (CMR), defined as:

$CMR = 1 - [FRR(p_0) + FAR(p_0)].$  (10)

As seen from Fig. 6, the CMR reaches 96.21% when $p_0 = 0.815$, with FRR = 1.97% and FAR = 1.82%. Compared with single palm-print methods for personal identification, our approach fuses multiple features at the fine-level identification stage, which increases the reliability of decisions. Identification fails for some handprints; the main reason is the variation of pressure and tensility while acquiring the handprint images. The pseudo texture lines (as shown in Fig. 7) at the side of the PROI lead to mismatches.
Fig. 6. The distributions of FRR(p) and FAR(p)
Fig. 7. The PROI images with different pressure and tensility
6 Conclusions
The proposed coarse-to-fine dynamic matching strategy has three main advantages. Firstly, a peg-free scanning mode is adopted to capture the handprint image, which does not make the user feel uncomfortable. Identification may still fail for some handprint images, because pseudo texture lines appear at the side of the PROI due to variations in pressure and tensility, or because the hand moves during acquisition; this is why the proposed system cannot reach a very high CMR. Secondly, the coarse-to-fine dynamic matching strategy allows the system to run in real time. Thirdly, the system adopts a multimodal approach, rather than concentrating on just one area of the hand, which increases the reliability of decisions. 2-D Gabor filters and wavelet moments are employed to capture the texture features of the PROI; based on the cubic B-spline wavelet function, this representation is near-optimal in terms of space-frequency localization and has the wavelet's inherent property of multi-resolution analysis. The maximum matching method and the fusion matching mechanism are applied at the decision stage. The experimental results show that the proposed multimodal personal identification approach is feasible and reliable.
References
1. Jain, A. K., Bolle, R., Pankanti, S. (eds.): Biometrics: Personal Identification in Networked Society. Kluwer Academic (1999)
2. Gonzalez, R. C., Woods, R. E.: Digital Image Processing Using Matlab. IEEE Press, New York (2001) 405-407
3. Wu, X., Zhang, D., Wang, K., Huang, B.: Palmprint Classification Using Principal Lines. Patt. Recog. 37 (2004) 1987-1998
4. Wu, X., Wang, K.: A Novel Approach of Palm-line Extraction. In: Proceedings of the International Conference on Image Processing, New York (2004)
5. Han, C. C., Cheng, H. L., Lin, C. L., Fan, K. C.: Personal Authentication Using Palm-print Features. Patt. Recog. 36 (2003) 371-381
6. You, J., Li, W., Zhang, D.: Hierarchical Palmprint Identification Via Multiple Feature Extraction. Patt. Recog. 35 (2003) 847-859
7. Zhang, D., Kong, W. K., You, J., Wong, M.: On-line Palmprint Identification. IEEE Trans. Patt. Anal. Mach. Intell. 25 (2003) 1041-1050
8. Jing, X. Y., Zhang, D.: A Face and Palmprint Recognition Approach Based on Discriminant DCT Feature Extraction. IEEE Transactions on Systems, Man and Cybernetics 34 (2004) 2405-2415
9. Wu, X., Zhang, D., Wang, K.: Fisherpalms Based Palmprint Recognition. Patt. Recog. Lett. 24 (2003) 2829-2838
10. Connie, T., Jin, A. T. B., Ong, M. G. K., Ling, D. N. C.: An Automated Palmprint Recognition System. Image and Vision Computing 23 (2005) 501-515
11. Ribaric, S., Fratric, I.: A Biometric Identification System Based on Eigenpalm and Eigenfinger Features. IEEE Trans. Patt. Anal. Mach. Intell. 27 (2005) 1698-1709
12. Lin, C.-L., Chuang, T. C., Fan, K.-C.: Palmprint Verification Using Hierarchical Decomposition. Patt. Recog. 38 (2005) 2639-2652
13. Kong, W. K., Zhang, D., Li, W.: Palmprint Feature Extraction Using 2-D Gabor Filters. Patt. Recog. 36 (2003) 2339-2347
14. Sanchez-Avila, C., Sanchez-Reillo, R.: Two Different Approaches for Iris Recognition Using Gabor Filters and Multiscale Zero-crossing Representation. Patt. Recog. 38 (2005) 231-240
15. Ahmadian, M. A.: An Efficient Texture Classification Algorithm Using Gabor Wavelet. In: Proceedings of the 25th Annual International Conference of the IEEE EMBS, Cancun, Mexico (2003) 17-21
16. Lee, T. S.: Image Representation Using 2-D Gabor Wavelets. IEEE Trans. Patt. Anal. Mach. Intell. 18 (1996) 959-971
17. Pan, H., Xia, L. Z.: Exact and Fast Algorithm for Two-dimensional Image Wavelet Moments via the Projection Transform. Patt. Recog. 38 (2005) 395-402
A Novel Color Image Watermarking Method Based on Genetic Algorithm
Yinghua Lu1, Jialing Han1,2, Jun Kong1,2,*, Gang Hou1,3, and Wei Wang1
1 Computer School, Northeast Normal University, Changchun, Jilin Province, China
2 Key Laboratory for Applied Statistics of MOE, China
3 College of Humanities and Science, Northeast Normal University, Changchun, China
{kongjun, hanjl147, luyh}@nenu.edu.cn
Abstract. In the past few years, many watermarking approaches have been proposed to solve copyright protection problems. Most watermarking schemes employ gray-level images to embed the watermarks, whereas application to color images is scarce and usually works on the luminance or on an individual color channel. In this paper, a novel intensity-adaptive color image watermarking algorithm based on the genetic algorithm (CIWGA) is presented. The adaptive embedding scheme, applied to the wavelet coefficients of texture-active regions in all three channels, not only improves image quality but also greatly enhances the security and robustness of the watermarked image. The experimental results show that our method is more flexible than traditional methods and successfully achieves a compromise between robustness and image quality.
1 Introduction
With the widespread use of digital multimedia and developments in the computer industry, digital multimedia content suffers from copyright infringement owing to its digital nature of unlimited duplication, easy modification and quick transfer over the Internet. As a result, copyright protection has become a serious issue, and digital watermarking has become an active research area [1] [2] [4]. In the past few years, most watermarking schemes have employed gray-level images to embed the watermarks, whereas their application to color images is scarce and usually works on the luminance or on an individual color channel. Fleet [3] embedded watermarks into the frequency domain of the yellow-blue channel. Kutter et al. [5] proposed another color image watermarking scheme that embeds the watermark into the blue channel of each pixel by modifying its pixel value. However, these schemes do not consider that the capacity for hiding information in each color channel varies from image to image. In this paper, a novel watermark embedding method based on the genetic algorithm (GA) is proposed.∗
Corresponding author. This work is supported by science foundation for young teachers of Northeast Normal University, No. 20061002, China.
GA is applied to analyze the influence of embedding on the original image and the capacity for resisting attacks in every channel; the optimized intensity is then selected for every color channel. Using GA can improve image quality and at the same time greatly enhance the security and robustness of the watermarked image. The algorithm thus achieves an optimal compromise between robustness and image quality. This paper is organized as follows: the watermark embedding algorithm and the extraction algorithm are described in Section 2 and Section 3, respectively. Experimental results are presented in Section 4. Finally, conclusions are given in Section 5.
2 The Embedding Algorithm
2.1 Host Image Analysis
Every host image has its own color information and texture features. According to the characteristics of the human visual system, edges and complex textures in an image have a good visual masking effect, so the watermark is embedded into these regions to ensure imperceptibility [4]. In our study, to find the active regions of the host image at low cost, a block-variance method is employed, which divides the host image into sub-blocks and computes each sub-block's variance to detect texture-active regions. The process is block-wise, as follows:
1. Separate the three channel sub-images R, G, B from the host image I.
2. Divide each channel sub-image into non-overlapping 8 × 8 sub-blocks in the spatial domain.
3. Compute each image sub-block's variance. Variance measures the relative smoothness and contrast of the intensity in a region.
4. Compute each sub-image's average variance, and compare each block's variance with it. If a block's variance is greater than the average value, the block is classified as a texture-active region.
The results for the green channels of the 'Lena' and 'Baboon' images after texture region analysis using our algorithm are shown in Fig. 1; the unchanged image sub-blocks are the relatively active regions. A sketch of this block-variance analysis is given after Fig. 1.
Fig. 1. (a) and (b) are the result images of 'Lena' and 'Baboon' texture region analysis using the block-variance method
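The following minimal sketch (Python/NumPy, assuming a single-channel array; the function name is our own) illustrates the block-variance analysis described above:

import numpy as np

def texture_active_mask(channel: np.ndarray, block: int = 8) -> np.ndarray:
    """Flag 8x8 sub-blocks whose variance exceeds the channel's average
    block variance, i.e. the texture-active regions described above."""
    h, w = channel.shape[0] // block, channel.shape[1] // block
    blocks = channel[:h * block, :w * block].reshape(h, block, w, block)
    variances = blocks.transpose(0, 2, 1, 3).reshape(h, w, -1).var(axis=2)
    return variances > variances.mean()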
Fig. 2. (a) and (b) are the texture-active region sub-images of Fig. 1
We extract the sub-blocks belonging to the texture-active regions from the three sub-images R, G and B, and form three new sub-images, called texture-active region sub-images, from these blocks; they are depicted in Fig. 2. Finally, the watermark is embedded into the frequency domain of these sub-images.
2.2 Intensity Optimization Using GA
For the texture-active region sub-images, the discrete wavelet transform is adopted to embed watermarks in the frequency domain. The multi-resolution nature of the wavelet transform and its compatibility with the JPEG-2000 compression standard [7] make the embedded watermark robust to compression. The intensity optimal selection algorithm is described as follows:
1. Transform the three texture-active region sub-images using the discrete wavelet transform.
2. Select coefficients, denoted w_co, in which to embed the watermark W. Insert the watermark signal at these coefficients using additive modulation, where every color channel has its own embedding intensity $\alpha(i)$ and $w\_co^{w}$ denotes the wavelet coefficients after embedding:

$w\_co^{w} = w\_co + \alpha(i) \times W, \quad i \in \{1, 2, 3\}.$  (1)

3. Perform the inverse discrete wavelet transform on $w\_co^{w}$.
4. Embed the watermarked sub-images back into the original host image to obtain the watermarked color image I'.
5. Apply the attacking schemes to I', and then run the GA training process to search for the optimal intensity for each channel.
The flowchart of the intensity optimization algorithm using GA is shown in Fig. 3. Not all watermarking applications require robustness to all possible signal processing operations, and the watermarked image after attack must remain worth using or transmitting; therefore, some attacks, such as image cropping, are not employed in our GA training procedure. In this paper, three major attacking schemes are employed: an additive noise attack, a median filtering attack, and a JPEG attack with a quality factor of 50%.
Fig. 3. The flowchart of intensity optimizing algorithm
The quality of the watermark extracted from the embedded image I' is measured by the normalized correlation (NC). The NC between the embedded watermark $W(i,j)$ and the extracted watermark $W'(i,j)$ is defined as:

$NC = \frac{\sum_{i=1}^{H}\sum_{j=1}^{L} W(i,j) \times W'(i,j)}{\sum_{i=1}^{H}\sum_{j=1}^{L} [W(i,j)]^2}.$  (2)
The watermarked image's quality is represented by the peak signal-to-noise ratio (PSNR) between the original color image I and the watermarked image I':

$PSNR = 10 \times \log_{10}\!\left(\frac{M \times N \times \max(I^2(i,j))}{\sum_{i=1}^{M}\sum_{j=1}^{N} [I(i,j) - I'(i,j)]^2}\right).$  (3)
After obtaining the PSNR of the watermarked image and the three NC values after attacking, we are ready to run the GA training process. The fitness function in the $m$-th iteration is defined as:

$f_m = -\left(PSNR_m + \lambda \sum_{i=1}^{3} NC_{m,i}\right),$  (4)
where $f_m$ is the fitness value and $\lambda$ is the weighting factor for the NC values. Because the PSNR values are dozens of times larger than the NC values, the NC values are magnified by the weighting factor $\lambda$ in the fitness function to balance the influence of the imperceptibility and robustness requirements.
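The fitness evaluation of Eq. (4) can be sketched as follows; the value of lambda below is an assumption for illustration, since the paper does not report it:

def fitness(psnr: float, nc_values, lam: float = 100.0) -> float:
    """Fitness of Eq. (4): f = -(PSNR + lambda * sum of the three NC values).

    lam (here 100.0, an assumed value) magnifies the NC terms so that
    robustness and imperceptibility are balanced. A GA that minimizes f
    therefore jointly maximizes the PSNR and the NC values.
    """
    return -(psnr + lam * sum(nc_values))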
2.3 Watermark Embedding
The first five steps of the watermark embedding algorithm are the same as those of the intensity optimal selection algorithm; the obtained optimal intensities are then used to form the watermarked image. Fig. 4 shows the block diagram of the embedding algorithm.
Fig. 4. The block-diagram of embedding algorithm
3 Watermark Extraction
The watermark extraction algorithm is the exact inverse of the embedding algorithm. The watermark can be extracted only when the optimal intensities are available as the secret keys.
4 Experimental Results
The performance of a digital watermarking system can be characterized by the following aspects: imperceptibility, security and robustness. All of these aspects are evaluated experimentally in our study. In our simulation, the 'Lena' and 'Baboon' images of size 256 × 256 are taken as test images, and the watermark, of size 64 × 64, is shown in Fig. 8(d). The watermarked images are shown in Fig. 5(b) and Fig. 6(b). In the absence of attacks, the PSNR of the watermarked 'Lena' is 35.8487 with NC equal to 1, and the PSNR of the watermarked 'Baboon' is 36.3028 with NC equal to 1. In the GA training process, ten individuals are used in every iteration. The crossover operation is the scattered function of the MATLAB Genetic Algorithm Toolbox, the selection operation is the stochastic uniform function, and the mutation operation is the Gaussian function with scale value 1.0 and shrink value 1.0. The number of training iterations is set to 200. The fitness values converge within 200 iterations, as can be seen from Fig. 7, and the optimized intensities with the optimal fitness value are 62, 64 and 94 for the R, G and B channels, respectively. The result images under different attacks and the extracted watermarks are depicted in Fig. 8. As seen from Table 1, we can conclude that our algorithm is robust to the attacks commonly encountered in image processing and transmission.

Fig. 5. (a) Original host image 'Lena' (b) Watermarked image

Fig. 6. (a) Original host image 'Baboon' (b) Watermarked image
Fig. 7. The diagram of fitness value
Fig. 8. (a) Watermarked 'Baboon' under additive noise attack, (b) watermarked image under filtering attack, (c) watermarked image under compression attack, (d) original watermark, (e)-(g) watermarks extracted from (a)-(c) using our method, respectively, (h) watermark extracted from (c) using Kutter's method.

Table 1. Experimental results under different attacks of our scheme (measured by NC)
Attack Type        Baboon   Lena     Airplane
Attack-free        1        1        1
Additive noising   0.9137   0.9139   0.9479
Filtering          0.9320   0.9536   0.9139
JPEG QF=80         0.9957   0.9830   0.9957
JPEG QF=50         0.9801   0.9547   0.9861
JPEG QF=30         0.9639   0.9390   0.9752
Table 2. Experimental results under different attacks of Kutter's scheme (measured by NC)

Attack-free   Noising   Filtering   JPEG QF=80   JPEG QF=50   JPEG QF=30
0.9684        0.9546    0.9362      0.6386       0.5925       0.5071
To evaluate the robustness of the proposed watermarking scheme, Kutter's algorithm was simulated for comparison; its results under several attacks are shown in Table 2. Compared with Table 1, it can be concluded that our algorithm is more robust than Kutter's, especially against additive noise and JPEG compression. To evaluate the performance of watermarking techniques, Pao-Ta Yu et al. [9] used the mean square error (MSE) as a quantitative index; another quantitative index for robustness is the mean absolute error (MAE). These two indices are defined respectively as:
$MSE = \frac{1}{3 \times M \times N} \sum_{i=1}^{M}\sum_{j=1}^{N} \left[(R_{ij} - R'_{ij})^2 + (G_{ij} - G'_{ij})^2 + (B_{ij} - B'_{ij})^2\right],$  (5)

$MAE = \frac{1}{H \times L} \sum_{i=1}^{H}\sum_{j=1}^{L} |W_{ij} - W'_{ij}|,$

where $M \times N$ and $H \times L$ denote the sizes of the host color image and the watermark image, respectively. Note that the quantitative index MAE is used to measure the similarity between the original watermark and the extracted watermark. Table 3 compares our method with Pao-Ta Yu's method in terms of the two quantitative indices; it shows that our algorithm has better robustness than Pao-Ta Yu's.

Table 3. Experimental results under different attacks, compared with Pao-Ta Yu's method
Attacks       Images   MSE       MAE (Proposed method)   MAE (Pao-Ta Yu's)
Attack-free   Lena     1.597     0.00149                 0.00195
              Baboon   1.667     0                       0.02344
Filtering     Lena     38.714    0.0206                  0.0205
              Baboon   345.778   0.0337                  0.16211
JPEG          Lena     21.103    0.0801                  0.08887
              Baboon   62.631    0.0947                  0.23535
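For reference, the quality indices used in Tables 1-3 (NC of Eq. (2), MSE of Eq. (5) and MAE) can be sketched as follows; this is an illustrative implementation, not the authors' code:

import numpy as np

def nc(w: np.ndarray, w_ext: np.ndarray) -> float:
    """Normalized correlation between embedded and extracted watermarks, Eq. (2)."""
    return float((w * w_ext).sum() / (w ** 2).sum())

def mae(w: np.ndarray, w_ext: np.ndarray) -> float:
    """Mean absolute error between the original and extracted watermarks."""
    return float(np.abs(w - w_ext).mean())

def mse_rgb(img: np.ndarray, img_w: np.ndarray) -> float:
    """Colour MSE of Eq. (5): mean squared difference over all pixels and
    channels of the (M, N, 3) host and watermarked images."""
    return float(((img.astype(float) - img_w.astype(float)) ** 2).mean())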
5 Conclusion
A novel embedding-intensity-adaptive CIWGA is proposed in this paper. A color image is first divided into three channels. The genetic algorithm is then applied to analyze the influence of embedding on the original image and the capacity for resisting attacks in every channel. Finally, the watermark is embedded in the R, G and B channels, respectively. Using the genetic algorithm not only improves image quality but also greatly enhances the security and robustness of the watermarked image. The algorithm achieves an optimal compromise between robustness and image quality.
References
1. Cheung, W. N.: Digital Image Watermarking in the Spatial and Transform Domains. In: Proceedings of TENCON 2000, Vol. 3, Sept. 24-27 (2000)
2. Zhang, X. D., Feng, J., Lo, K. T.: Image Watermarking Using Tree-Based Spatial-Frequency Feature of Wavelet Transform. Journal of Visual Communication and Image Representation 14 (2003) 474-491
3. Fleet, D., Heeger, D.: Embedding Invisible Information in Color Images. In: Proceedings of the 4th IEEE International Conference on Image Processing, Santa Barbara, USA, 1 (1997) 532-535
4. Kong, J., Wang, W., Lu, Y. H., Han, J. L., Hou, G.: Joint Spatial and Frequency Domains Watermarking Algorithm Based on Wavelet Packets Transform. The 18th Australian Joint Conference on Artificial Intelligence (2005)
5. Kutter, M., Jordan, F., Bossen, F.: Digital Watermarking of Color Images Using Amplitude Modulation. J. Electron. Imaging 7(2) (1998) 1064-1087
6. Gen, M., Cheng, R.: Genetic Algorithms and Engineering Design. Wiley, New York, NY (1997)
7. Suhail, M. A., Obaidat, M. S., Ipson, S. S., Sadoun, B.: A Comparative Study of Digital Watermarking in JPEG and JPEG 2000 Environments. Information Sciences 151 (2003) 93-105
8. Shieh, C. S., Huang, H. C., Wang, F. H., Pan, J. S.: Genetic Watermarking Based on Transform-domain Techniques. Pattern Recognition 37 (2004) 555-565
9. Yu, P.-T., Tsai, H.-H., Lin, J.-S.: Digital Watermarking Based on Neural Networks for Color Images. Signal Processing 81 (2001) 663-671
10. Cox, I. J., Kilian, J., Leighton, F. T., Shamoon, T.: Secure Spread Spectrum Watermarking for Multimedia. IEEE Trans. Image Process 6(12) (1997) 1673-1687
11. Ganic, E., Eskicioglu, A. M.: Robust DWT-SVD Domain Image Watermarking: Embedding Data in All Frequencies. In: Proceedings of the 2004 Multimedia and Security Workshop on Multimedia and Security (2004) 166-174
A Novel Emitter Signal Recognition Model Based on Rough Set
Guan Xin, Yi Xiao, and He You
Research Institute of Information Fusion, Naval Aeronautical Engineering Institute, Yantai 264001, P.R. China
[email protected]
Abstract. On the basis of classification, rough set theory regards knowledge as a partition over data using equivalence relations. In this paper, rough set theory is studied in depth and introduced into the problem of emitter recognition, and a new emitter signal recognition model is presented on this basis. A new method of determining weight coefficients is proposed, which is independent of a priori knowledge, and a new classification rule is also presented. An application example demonstrates that the new method is accurate and effective. Moreover, a computer simulation of recognizing radar emitter purpose is carried out and compared with fuzzy pattern recognition and a classical statistical recognition algorithm. The experimental results demonstrate the excellent performance of the new recognition method compared to the two existing pattern recognition techniques, providing a brand-new method for research on emitter recognition.
1 Introduction
With the development of sensor technology, many regular and special emitters are now widely used, and emitter recognition has become an important issue in military intelligence, surveillance, and reconnaissance. In practice, a priori knowledge is hard to obtain and emitter signals overlap to a great degree, so regular algorithms for emitter recognition do not always perform well. Several approaches to emitter recognition have been studied over the past years, such as expert systems [1], fuzzy recognition methods [2], artificial neural networks [3], and attribute mathematics recognition methods [4]. Indeterminacy mathematics methods should be developed to solve this problem objectively, practically and rationally. Rough set theory was developed by Zdzislaw Pawlak in 1982 [5]. Its main goal is to synthesize approximations of concepts from acquired data. On the basis of classification, rough set theory regards knowledge as a partition over data using equivalence relations. It has been conceived as a tool to conceptualize, organize and analyze various types of data, in particular to deal with inexact, uncertain or vague knowledge in applications related to artificial intelligence [6-8]. The main advantage of rough set theory is that it does not need any preliminary or additional information about the data. Considering the special traits of emitter recognition, a new emitter recognition method based on rough set theory is presented with detailed steps, and a new approach to
determining weight coefficients is proposed, which is independent of a priori knowledge. A new classification rule based on the decision table is derived. Finally, an application example is given, which demonstrates that the new method is accurate and effective, and a computer simulation of radar emitter recognition is carried out and compared with a fuzzy recognition approach and classical statistical pattern recognition.
2 Basic Definitions of Rough Set Theory
2.1 Information Systems and Indiscernibility
A data set is represented as a table, where each row represents a case, an event, or simply an object, and every column represents an attribute that can be measured for each object. This table is called an information system. More formally, an information system is a pair $(U, A)$, where $U$ is a non-empty finite set of objects called the universe and $A$ is a non-empty finite set of attributes such that $a : U \to V_a$ for every $a \in A$. The set $V_a$ is called the value set of $a$. Any subset $B$ of $A$ determines a binary relation $I(B)$ on $U$, called an indiscernibility relation, defined as follows: $x\,I(B)\,y$ if and only if $a(x) = a(y)$ for every $a \in A$, where $a(x)$ denotes the
a for element x . Obviously, I ( B ) is an equivalence relation. If ( x, y ) belongs to I ( B ) , we will say that x and y are B -indiscernible. Equivalence classes of the relation I ( B ) (or blocks of the partition U / B ) are refereed to as B -elementary sets. value of attribute
2.2 Reduction of Attributes Attribute reduction is one of the major concerns in the research on rough set theory. In an information system, some attributes may be redundant regarding a certain classification. Rough set theory introduces notions, which help reducing attributes without declining ability of classification. Let R be a set of equivalence relation, and r ∈ R . An attribute r is dispensable in R if ind ( R ) = ind ( R − {r}) . Otherwise, r is indispensable. The dispensable attribute does not improve or reduce the classification when it is present or absent. The set of all attributes indispensable in P is called a core of P , denoted as core( P ) . The core contains attributes that can not be removed from P without losing the original classification. 2.3 Decision Table
S = (U , R,V , f ) can be represented in terms of a decision table, assume that R = C D and C D = φ , where C is the condition attributes An information system
A Novel Emitter Signal Recognition Model Based on Rough Set
and D is the decision attributes. The information system deterministic if C
→ D , otherwise, it is non-deterministic.
83
S = (U , R,V , f ) is
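The indiscernibility partition and the dependency degree used throughout the following algorithm can be sketched as follows; this is a minimal illustration, assuming objects stored as attribute-value dictionaries (the data layout is our own, not the paper's):

from collections import defaultdict

def partition(universe, attrs):
    """Group objects into equivalence classes of the indiscernibility relation:
    two objects fall into the same block iff they agree on every attribute
    in attrs. Each object is a dict mapping attribute name to value."""
    blocks = defaultdict(list)
    for idx, obj in enumerate(universe):
        blocks[tuple(obj[a] for a in attrs)].append(idx)
    return list(blocks.values())

def dependency_degree(universe, cond_attrs, dec_attrs):
    """gamma_C(D): fraction of objects in the positive region, i.e. objects
    whose condition-attribute block fits inside a single decision block."""
    d_blocks = [set(b) for b in partition(universe, dec_attrs)]
    positive = sum(len(b) for b in partition(universe, cond_attrs)
                   if any(set(b) <= d for d in d_blocks))
    return positive / len(universe)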
3 The Algorithm of Emitter Signal Recognition
The detailed steps of the proposed emitter signal recognition model are given as follows.
3.1 Constructing the Relationship Data Model
Assume that we have $r$ radar classes, each with multiple modes, and that the total number of modes in the known radar template library is $n$. Regard the characteristic parameters of the radar emitter signals as condition attributes, marked as $C = \{c_1, c_2, \ldots, c_m\}$, and the classes of radar emitters as decision attributes, marked as $D = \{d_1, d_2, \ldots, d_r\}$. Let us denote a sample $u_t$ in the known radar template library as $u_t = (c_{1,t}, c_{2,t}, \ldots, c_{m,t}; d_t)$. The universe $U = \{u_1, u_2, \ldots, u_n\}$ is also called the sample set. Then the attribute values of $u_t$ are $c_i(u_t) = c_{i,t}$ $(i = 1, 2, \ldots, m;\ t = 1, 2, \ldots, n)$ and $d(u_t) \in D$. The two-dimensional table constituted by the $u_t$ $(t = 1, 2, \ldots, n)$ is the relationship data model of radar emitter
recognition.
3.2 Constructing Knowledge Systems and Discretization
To analyse the dependency among knowledge and the importance of attributes from the samples, the universe must be classified using the attributes, and a knowledge system must be constructed on the universe. Discretization must be applied to continuous attributes before classification, because rough set theory cannot deal with continuous attributes directly. Assuming every object in the universe $U$ has been discretized, we can determine equivalence relations on $U$ and thereby construct the knowledge system. Assume $C' \subseteq C$ and define a binary relation $R_C = \{(u, v) \in U \times U \mid c_i(u) = c_i(v),\ \forall c_i \in C'\}$. In like manner, define $R_D = \{(u, v) \in U \times U \mid d(u) = d(v)\}$. Obviously, $R_C$ and $R_D$ are both equivalence relations on $U$, so these two relations can be used to determine the knowledge system on $U$. Real data sets from emitter signals include continuous variables, which must be partitioned into categories. Considering the traits of the emitter recognition problem, simple and applicable discretization methods can be used, such
as equidistance, equifrequency, the Naive Scaler algorithm, and the Semi Naive Scaler [9]. The result of discretization directly affects the classification quality.
3.3 Determination of Weight Coefficients
In general, a priori information on the importance of each characteristic parameter of an emitter is hard to obtain, so average weight coefficients can be adopted. In fact, however, the characteristic parameters are not all equally important, so weighted processing is preferable. Several methods have been studied, such as the entropy analytical method, the comparable matrix method, and the analytic hierarchy process. Here we adopt a new method to adjust the weight coefficients, which recasts the problem as expressing the significance of attributes and is independent of a priori knowledge. Different attributes may have different importance with respect to the decision set $D$. The significance of an attribute $c_i$ in the decision table can be evaluated by measuring the effect of removing $c_i \in C$ from the attribute set $C$ on the positive region defined by the decision table. The number $\gamma_C(D)$ expresses the degree of dependency between the attributes $C$ and $D$, or the accuracy of the approximation of $U/D$ by $C$. We can ask how the coefficient $\gamma_C(D)$ changes when an attribute $c_i$ is removed, i.e., what the difference is between $\gamma_C(D)$ and $\gamma_{C-\{c_i\}}(D)$. Thus, we can define the significance of an attribute $c_i$ as

$\sigma_D(c_i) = \gamma_C(D) - \gamma_{C-\{c_i\}}(D),$  (1)

where the bigger $\sigma_D(c_i)$ is, the more important the attribute $c_i$ is.
The steps for determining the weight coefficients are described as follows.
Step 1. Calculate the degree of dependency between $R_D$ and $R_C$, that is, the degree of dependency between the emitter attribute set $C$ and the emitter type $D$:

$\gamma_{R_C}(R_D) = \frac{\sum_{[y]_{R_D} \in U/R_D} \mathrm{card}\!\left(\underline{R_C}\,[y]_{R_D}\right)}{\mathrm{card}(U)},$  (2)

where $\mathrm{card}(S)$ expresses the cardinal number of $S$ and $\underline{R_C}$ denotes the lower approximation.
Step 2. Calculate the degree of dependency of $R_D$ on $R_{C-\{c_i\}}$:

$\gamma_{R_{C-\{c_i\}}}(R_D) = \frac{\sum_{[y]_{R_D} \in U/R_D} \mathrm{card}\!\left(\underline{R_{C-\{c_i\}}}\,[y]_{R_D}\right)}{\mathrm{card}(U)}, \quad i = 1, 2, \ldots, m.$  (3)
Step 3. According to Eq. (1), calculate the significance of the $i$-th attribute:

$\sigma_D(c_i) = \gamma_{R_C}(R_D) - \gamma_{R_{C-\{c_i\}}}(R_D), \quad i = 1, 2, \ldots, m.$  (4)

Step 4. Calculate the weight coefficient of the $i$-th attribute:

$\lambda_i = \frac{\sigma_D(c_i)}{\sum_{j=1}^{m} \sigma_D(c_j)}, \quad i = 1, 2, \ldots, m.$  (5)
3.4 Classification Rules Based on the Decision Table
After discretization and reduction of the incomplete decision table, the following classification rule can be constructed.
Rule 1: Calculate the accordance degree between the characteristic parameters of the pending unknown signal and the condition attributes of each class, then choose the decision rule with the biggest accordance degree to assign the pending signal:

$\mu(X_i) = \frac{\mathrm{card}(X_i \cap F_x)}{\mathrm{card}(F_x)},$  (6)

where $F_x$ is the characteristic parameter set of the pending unknown signal, $X_i$ is the condition attribute set of the decision table, and $X_i \cap F_x$ is the set of characteristics in $F_x$ that meet the characteristic conditions of $X_i$.
It is easy to see that average weighting is adopted in Rule 1. In fact, however, each attribute influences the decision differently, so weighted processing of the condition attributes is better for recognition. A new classification rule based on an accordance degree matrix is therefore presented, and a sketch of it is given below.
Rule 2: Compare the characteristic parameters of the pending unknown signal $x_0$ with those of each known signal $x_i$ in the template library to obtain a matrix $S_{n \times m}$, where $n$ is the number of known samples in the template library, $C$ is the set of characteristic parameters, and $m$ is the number of characteristic parameters. If $c_j(x_0) = c_j(x_i)$, $c_j \in C$, $j = 1, 2, \ldots, m$, then $s_{ij} = 1$; otherwise $s_{ij} = 0$. Denote the weight coefficients as $a = (a_1, a_2, \ldots, a_m)'$; then the accordance degree vector can be described as $\mu_{n \times 1} = S_{n \times m} \times a$. If $i_0 = \arg\max_i \mu_i$ $(i = 1, 2, \ldots, n)$, then the pending signal $x_0$ is of the same class as the $i_0$-th emitter in the template library.
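A minimal sketch of Rule 2 follows; the array layout is assumed for illustration:

import numpy as np

def rule2_classify(x0, templates, weights) -> int:
    """Rule 2: weighted accordance matching. templates is an (n, m) array of
    discretized attribute values, x0 an m-vector, and weights the lambda_i of
    Eq. (5). Returns the index of the best-matching template."""
    s = (np.asarray(templates) == np.asarray(x0)).astype(float)  # matrix S, n x m
    mu = s @ np.asarray(weights)                                 # accordance degrees
    return int(np.argmax(mu))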
4 Simulation Analysis
To test the validity and applicability of the new recognition method proposed in this paper, it is applied in the example below to identify the purpose of a hostile radar emitter using simulated radar data.
4.1 Application Example
Assume that the radar characteristic vector comprises three characteristic parameters: radio frequency (RF), pulse repetition frequency (PRF) and pulse width (PW). Radars of three different purposes are selected from the template library, and the extracted incomplete sample characteristic parameters are shown in Table 1. For convenience of notation, we use the following correspondences: No. -- $U$, $U = \{x_1, x_2, \ldots, x_{17}\}$; RF -- $a$; PRF -- $b$; PW -- $c$.

Table 1. Extracted example data of radar emitters

No.  Class  RF(MHz)  PRF(Hz)  PW(μs)
1    1      2774.3   428      3
2    1      1280     300      4
3    1      1313     301      4.1
4    1      1251     601      1.7
5    1      9214     429      0.8
6    1      2746     1873     0.6
7    1      2985     325      2.6
8    1      3109     375      2.1
9    2      2695     335      1.1
10   2      160      190      7
11   2      2700     375      1.7
12   2      2700     330      0.8
13   2      2000     600      3.5
14   3      3400     2500     0.5
15   3      2970     1250     0.8
16   3      9000     1750     0.25
17   3      3700     2250     0.37
Because of the influence of stochastic factors on radar signals during emission, transmission and reception, the radar characteristic vector exhibits statistical variation. Four measured radar emitter samples are given in Table 2.

Table 2. Measured radar characteristic parameters

Sample  RF(MHz)  PRF(Hz)  PW(μs)
1       2682.2   429      2.81
2       1285.5   617.6    1.7402
3       2673.4   326.8    0.8291
4       3821.4   2216.6   0.3732
Discretization must be performed on the extracted radar emitter data. The Naive Scaler discretization approach [9] is adopted here. The results can be seen in Table 3.

Table 3. Discretized extracted example data

U    d  a  b   c
1    1  4  5   6
2    1  2  2   8
3    1  2  2   8
4    1  2  7   5
5    1  8  5   3
6    1  4  9   2
7    1  6  2   6
8    1  6  4   6
9    2  3  3   4
10   2  1  1   9
11   2  3  4   5
12   2  3  3   3
13   2  3  6   7
14   3  7  10  1
15   3  5  8   3
16   3  7  8   1
17   3  7  10  1
After discretization, the measured radar signals are shown in Table 4.

Table 4. Discretized measured radar signal characteristic parameters

U  a  b   c
1  3  5   6
2  2  7   5
3  3  2   3
4  7  10  1
Using the recognition rule described in Section 3.4, the pending unknown radar signals are assigned to class No. 1, No. 1, No. 2 and No. 3, respectively. The recognition result accords with the facts.

4.2 Comparison Experiment

In our simulation, radars of 10 different purposes are first selected from the known radar template library. The simulated template library, built by consulting the measured parameter region of reconnaissance equipment, contains 4016 records. The condition attribute set is {radio frequency, pulse width, pulse repetition frequency, antenna scan type, antenna polarization type, antenna scan period, radio frequency type, pulse repetition frequency type, inner pulse modulation type}, and the decision attribute set is {radar emitter purpose}.
The observation vector comprises two parts: a randomly selected known characteristic vector and measurement noise. The measurement noise is assumed to be zero-mean white noise with standard deviation σ. In our simulation, two different noise environments are selected, whose standard deviations of measurement noise are 2 and 5 percent of the corresponding characteristic parameter, respectively. For continuous attributes, an equifrequency discretization approach is adopted. Table 5 shows the recognition results of the rough-set emitter recognition algorithm compared with existing fuzzy pattern recognition and statistical pattern recognition techniques, obtained over 100 Monte Carlo experiments. Either Cauchy or normal membership functions can be used in fuzzy pattern recognition [2]; in our simulation, normal membership is adopted for continuous attributes. For discrete attributes, the membership function equals 1 when the modulation modes match and 0 otherwise.

Table 5. Correct recognition rate of the three methods over 100 runs of stochastic simulation

Noise          Rough set recognition          Fuzzy pattern recognition     Statistical pattern recognition
               No. of reduced  Correct rate   No. of       Correct rate    No. of       Correct rate
               attributes                     attributes                   attributes
environment 1  4               90.6%          9            81.3%           9            72.1%
environment 2  4               82.1%          9            76.8%           9            61%
Based on the experiments above, we can draw the following conclusions.

(1) The rough-set emitter signal recognition algorithm is not merely a kind of classifier. It can obtain a minimal representation of knowledge under the precondition of retaining key information, identify and evaluate correlative relations, and extract rule knowledge from empirical data. It can be seen from Table 5 that the proposed rough-set recognition approach outperforms fuzzy pattern recognition and the traditional statistical pattern recognition method in such a practical reconnaissance environment.
(2) Subjectivity exists in the fuzzy pattern recognition method when determining membership functions, which becomes an obstacle to its application. In contrast, the rough-set signal recognition model is independent of a priori knowledge and depends only on the samples.
(3) The rough-set signal recognition model shows obvious advantages on large sample sets.
(4) To improve the correct recognition rate, more suitable discretization algorithms should be selected according to the practical application.
5 Conclusions

Emitter recognition is a key technology in multisensor information fusion systems. A new emitter signal recognition model based on rough set theory is presented in this
paper. At the same time, a new method of determining weight coefficients is given and a new classification rule is presented. Finally, detailed simulation experiments are conducted to demonstrate the new method, and it is compared with fuzzy pattern recognition and classical statistical pattern recognition. The new recognition approach shows promising performance and proves effective and feasible for emitter recognition.
Acknowledgements

This paper is supported by the National Natural Science Foundation of China (Grant No. 60572161), the Excellent Ph.D. Paper Author Foundation of China (Grant No. 200036) and the Excellent Ph.D. Paper Author Foundation of China (Grant No. 200237).
References

1. Cheng, X.M., Zhu, Z.W., Lu, X.L.: Research and Implementation on a New Radar Radiating-Source Recognizing Expert System. Systems Engineering and Electronics, Vol. 22, 8 (2000) 58–62
2. Wang, G.H., He, Y.: Radar ID Methods Based on Fuzzy Closeness and Inexact Reasoning. Systems Engineering and Electronics, 1 (1995) 25–30
3. Shen, Y.J., Wang, B.W.: A Fast Learning Algorithm of Neural Network with Tunable Activation Function. Science in China, Ser. F Information Sciences, Vol. 47, 1 (2004) 126–136
4. Guan, X., He, Y., Yi, X.: Attribute Measure Recognition Approach and Its Applications to Emitter Recognition. Science in China Series F, Information Sciences, Vol. 48, 2 (2005) 225–233
5. Pawlak, Z.: Rough Sets. International Journal of Information and Computer Science, 11 (1982) 341–356
6. Pawlak, Z.: Rough Set Theory and Its Application to Data Analysis. International Journal of Cybernetics and Systems, 29 (1998) 661–688
7. Li, M., Zhang, H.G.: Research on the Method of Neural Network Modeling Based on Rough Sets Theory. Acta Automatica Sinica, 1 (2002) 27–33
8. Cho, Y., Lee, K., Yoo, J., Park, M.: Autogeneration of Fuzzy Rules and Membership Functions for Fuzzy Modeling Using Rough Set Theory. IEE Proceedings of Control Theory and Applications, Vol. 145, 5 (1998) 437–442
9. Wang, G.Y.: Rough Set Theory and Knowledge Acquisition. Press of Xi'an Jiaotong University (2001)
A Novel Model for Independent Radial Basis Function Neural Networks with Multiresolution Analysis GaoYun An and QiuQi Ruan Institute of Information Science, Beijing Jiaotong University, Beijing, China, 100044
[email protected],
[email protected]
Abstract. The classical radial basis function (RBF) neural network directly projects input samples into a high-dimensional feature space through radial basis functions, and does not take the high-order statistical relationships among the variables of the input samples into account. However, these high-order statistical relationships play an important part in the pattern recognition (classification) area. In order to take advantage of them, a novel independent radial basis function (IRBF) neural network is proposed in this paper. A new hybrid system for face recognition is then also proposed, combining multiresolution analysis, principal component analysis (PCA) and the proposed IRBF neural network. According to experiments on the FERET face database, our proposed approach outperforms the recently proposed ICA algorithm. It is also confirmed that our proposed approach is more robust to facial expression, illumination and aging than ICA in face recognition.
1 Introduction

Up to now, there have been many successful algorithms for face recognition. Principal Component Analysis (PCA) [6], Fisher's Linear Discriminant (FLD) [7] and Independent Component Analysis (ICA) [1] are three basic algorithms for subspace analysis in face recognition, and have been well developed. But there are still some outliers which impact the performance of face recognition algorithms: facial expression, illumination, pose, masking, occlusion, etc. So how to make current algorithms robust to these outliers, or how to develop more powerful classifiers, is the main task for face recognition. As a useful tool for multiresolution analysis, wavelet decomposition has also been introduced into face recognition to make algorithms much more robust to facial expression, pose and small occlusion, as in the work of [2] and [3]. In [3] it was demonstrated that the Daubechies 4 (db4) wavelet outperforms other wavelets in computation time and recognition accuracy. Lai et al. [2] combined wavelet decomposition and the Fourier transform to propose the spectroface representation for face recognition, which is robust to facial expression, translation, scale and in-plane rotation. So, inspired by [2] and [3], the db4 wavelet will be adopted to
extract subband face images, which are robust to facial expression and small occlusion, for our proposed approach.

From another point of view, the algorithms proposed in [1]-[3], [6] and [7] serve only the feature extraction stage of an identification system; powerful classifiers are needed to classify the extracted features. Meng et al. [4] tried to use the radial basis function (RBF) neural network to classify features extracted by FLD. A classical RBF neural network is formed by an input layer, a hidden layer and an output layer. It directly projects the input samples into a high-dimensional feature space through some radial basis functions, and does not take account of the high-order statistical relationships among the variables of the input samples. As is known, these high-order statistical relationships play an important part in the pattern recognition (classification) area. So, in order to take advantage of them, a novel independent radial basis function (IRBF) neural network is proposed in this paper, together with a novel hybrid system for face recognition. In the hybrid system, the proposed IRBF neural network is adopted to classify the extracted PCA features of the enlarged subband face images obtained by wavelet decomposition. The details of the proposed IRBF neural network and the new hybrid system for face recognition are discussed in the following sections.
2 IRBF Neural Networks with Multiresolution Analysis

In order to take advantage of the information reflected by the high-order statistical relationships among the variables, a novel IRBF neural network is proposed in this section. Then a novel hybrid system for face recognition is also proposed. The hybrid system contains three main sub-models: a multiresolution analysis sub-model, a PCA sub-model and an IRBF neural network sub-model. Given a matrix X (n × N) of training samples, where N is the number of samples and each sample x_i ∈ ℝ^n, the whole hybrid system for face recognition can be described as follows.

2.1 Multiresolution Analysis

Wavelet decomposition is a powerful tool for multiresolution analysis. Here, the db4 wavelet is chosen to extract subband images for the new hybrid system. The subband images of every sample in X are extracted as follows. Let V²(R) denote the vector space of measurable, square-integrable 1D functions. The continuous wavelet decomposition of a 1D signal s(t) ∈ V²(R) is defined as:
(W_a s)(b) = \int s(t)\, \phi_{a,b}(t)\, dt ,  (1)

where the wavelet basis function can be expressed as φ_{a,b}(t) = |a|^{-1/2} φ((t − b)/a), and the arguments a and b denote the scale and location parameters, respectively. Eq. (1) can be discretized by restricting a and b to a discrete lattice (a = 2^n, b ∈ ℤ).
The discrete wavelet decomposition for 2D images can be defined similarly by implementing the 1D discrete wavelet decomposition for each dimension of the image separately. In this paper, only the enlarged subband images corresponding to the low-frequency components in both the vertical and horizontal directions of the original images are chosen for later processing, because, according to wavelet theory, these subband images are smoothed versions of the original images and are insensitive to facial expression and small occlusion in face recognition. Fig. 1 illustrates some examples: the left two images are the original images (one with normal facial expression and glasses, the other with a smiling expression and without glasses), the middle two images are the chosen subband images, and the right two images are the enlarged versions of the corresponding subband images. From the right two images, it can be noticed that facial expression and glasses are largely removed. If we define the correlation factor of two images as their Euclidean distance, it is clear that the correlation factor of the right two images is much smaller than that of the left two images. So the enlarged subband images corresponding to the low-frequency components in both the vertical and horizontal directions of the original images are very useful for face recognition and are chosen in our approach.
Fig. 1. Illustration of subband images of the same person with different facial expression and small occlusion (glasses); the correlation factor is 187.73 for the original image pair and 106.38 for the enlarged subband pair
So, for every sample x_i, y_i = B(W_A(x_i)), where the function B(·) stands for resizing the subband image to the size of x_i with bilinear interpolation and formatting the image as a vector, and the function W_A(·) stands for performing a two-level 2D discrete wavelet decomposition and choosing the subband image corresponding to the low-frequency components in both the vertical and horizontal directions of the original image. All the enlarged subband images y_i ∈ ℝ^n form the new training matrix Y.
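As an illustrative sketch only, y_i = B(W_A(x_i)) could be written with PyWavelets and SciPy; the library choice, the function name and the resize routine are our assumptions, not the authors' implementation:

```python
import pywt                      # PyWavelets
from scipy.ndimage import zoom   # order=1 gives bilinear interpolation

def extract_enlarged_subband(x):
    """y_i = B(W_A(x_i)): two-level db4 decomposition, keep the LL subband,
    resize it back to the original size bilinearly, flatten to a vector."""
    coeffs = pywt.wavedec2(x, 'db4', level=2)
    ll2 = coeffs[0]                                    # low-low subband, level 2
    factors = (x.shape[0] / ll2.shape[0], x.shape[1] / ll2.shape[1])
    enlarged = zoom(ll2, factors, order=1)             # bilinear enlargement B(.)
    return enlarged.reshape(-1)                        # vector y_i in R^n
```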
2.2 Principal Component Analysis

Before transferring every sample in the training matrix Y into the new IRBF neural network for classification, we reduce the dimension of every sample with PCA. In this step, the projection matrix W_pca is calculated as [6]:
W_{pca} = \arg\max_{W} \left| W^{T} \aleph W \right| = [\, w_1\ w_2\ \cdots\ w_m \,] ,  (2)

where m is the dimension of the PCA feature space. The matrix ℵ is the total scatter matrix, calculated as

\aleph = \sum_{i=1}^{N} (\mathbf{y}_i - \boldsymbol{\mu})(\mathbf{y}_i - \boldsymbol{\mu})^{T} ,  (3)

where μ ∈ ℝ^n is the mean vector of all samples in the training matrix Y. All the samples in the training matrix Y are projected into the PCA feature space as

\mathbf{z}_i = W_{pca}^{T} (\mathbf{y}_i - \boldsymbol{\mu}) ,  (4)
where z_i is the representation of sample y_i in the PCA feature space. All the z_i, i = 1, …, N, are then transferred into the IRBF neural network as input samples.
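A hedged sketch of Eqs. (2)-(4) follows; for image-sized n a practical implementation would use the snapshot (Gram-matrix) trick rather than the full n × n scatter matrix used here for clarity:

```python
import numpy as np

def pca_project(Y, m):
    """Eqs. (2)-(4): eigen-decomposition of the total scatter and projection.
    Y is n x N (one enlarged subband vector per column); m is the kept dimension."""
    mu = Y.mean(axis=1, keepdims=True)
    scatter = (Y - mu) @ (Y - mu).T                  # Eq. (3)
    vals, vecs = np.linalg.eigh(scatter)             # ascending eigenvalues
    W = vecs[:, ::-1][:, :m]                         # top-m eigenvectors (Eq. 2)
    Z = W.T @ (Y - mu)                               # Eq. (4); columns are z_i
    return W, mu, Z
```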
2.3 Independent Radial Basis Function Neural Networks

A classical RBF neural network is formed by three layers: input layer, hidden layer and output layer. It directly projects the input samples into a high-dimensional feature space through some radial basis functions φ_i(·), and does not take account of the high-order statistical relationships among the variables of the input samples. As is known, these relationships play an important part in the pattern recognition (classification) area. So, in order to take advantage of them, a novel independent radial basis function (IRBF) neural network is proposed here. As shown in Fig. 2, the IRBF neural network contains four layers: input layer, unmixing layer, hidden layer and output layer. The input layer just transfers the input samples z = [z_1, z_2, …, z_m]^T to the unmixing layer.
Fig. 2. The main structure of IRBF neural networks
The unmixing layer extracts the high-order statistical relationships among the variables of the input samples transferred from the input layer as follows:

s_i = f_i\!\left( \sum_{p=1}^{m} \xi_{pi} z_p \right) , \quad i = 1, \ldots, m ,  (5)
where the s_i, i = 1, …, m, are statistically independent, and ξ_pi is one component of the unmixing matrix Ξ_{m×m}. The function f_i(ϑ) is an invertible squashing function mapping real numbers into the [0, 1] interval; here we chose f_i(ϑ) = 1/(1 + e^{−ϑ}). In order to achieve independence among the s_i, i = 1, …, m, an information maximization approach [5] to blind separation and blind deconvolution is used as follows. The relationship between the joint entropy H(s) and the mutual information I(s) is defined as:
H(s_1, \ldots, s_m) = H(s_1) + \cdots + H(s_m) - I(s_1, \ldots, s_m) ,  (6)
where s = [s_1, …, s_m]^T. Since independent components have zero mutual information, as proposed in [5] the objective of independence among the s_i, i = 1, …, m, can be achieved by maximizing the joint entropy H(s):

\Xi_{opt} = \arg\max_{\Xi} \Big[ H\big(f_1(\textstyle\sum_{p=1}^{m} \xi_{p1} z_p)\big) + \cdots + H\big(f_m(\textstyle\sum_{p=1}^{m} \xi_{pm} z_p)\big) - I\big(f_1(\textstyle\sum_{p=1}^{m} \xi_{p1} z_p), \ldots, f_m(\textstyle\sum_{p=1}^{m} \xi_{pm} z_p)\big) \Big] .  (7)

The optimization of the unmixing matrix Ξ_opt can be calculated through the following gradient update rule [1]:

\Delta\Xi \propto \nabla_{\Xi} H(\mathbf{s}) = (\Xi^{T})^{-1} + E(\mathbf{s}' \mathbf{z}^{T}) ,  (8)

where s′ = [s_1′, …, s_m′]^T, s_i′ = f_i''\big(\sum_{p=1}^{m} \xi_{pi} z_p\big) \big/ f_i'\big(\sum_{p=1}^{m} \xi_{pi} z_p\big), and E(·) stands for calculating the expected value.
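A sketch of one step of the update rule (8), following the paper's formula literally; with the logistic squashing function chosen above, f″/f′ simplifies to 1 − 2f, a standard identity. The function name and learning rate are our assumptions:

```python
import numpy as np

def infomax_update(Xi, Z, lr=0.01):
    """One gradient step of Eq. (8): dXi ~ (Xi^T)^{-1} + E(s' z^T).
    Z is m x N (columns are PCA feature vectors z); Xi is the m x m
    unmixing matrix. For f(u) = 1/(1+exp(-u)), s'_i = f''/f' = 1 - 2 f(u_i)."""
    U = Xi.T @ Z                          # u_i = sum_p xi_pi z_p, per sample
    S = 1.0 / (1.0 + np.exp(-U))          # s = f(u)
    Sprime = 1.0 - 2.0 * S                # f''/f' for the logistic squashing
    grad = np.linalg.inv(Xi.T) + (Sprime @ Z.T) / Z.shape[1]
    return Xi + lr * grad
```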
After getting the optimal unmixing matrix Ξ_opt, for every input sample z^(i) the new feature vector s^(i) = [s_1, …, s_m]^T, which reflects the high-order statistical relationships, can be calculated by Eq. (5). Then these new feature vectors s^(i) are mapped into a high-dimensional feature space through the radial basis function φ(·) of the hidden layer. In our proposed IRBF neural network, the number of nodes of the hidden layer is equal to the number of input training samples, and the radial basis function is defined as:
\varphi_i(\mathbf{s}) = \psi(\lVert \mathbf{s} - \mathbf{t}_i \rVert) , \quad i = 1, 2, \ldots, N ,  (9)
where t_i is the center, chosen as t_i = s_i in this paper, and ψ(·) is chosen as the multiquadric function. The output of the j-th output node of the IRBF neural network is then defined as:

\Gamma_j(\mathbf{s}) = \sum_{i=1}^{N} w_{ij}\, \psi(\mathbf{s}, \mathbf{s}_i) = \sum_{i=1}^{N} w_{ij}\, \psi(\lVert \mathbf{s} - \mathbf{s}_i \rVert) .  (10)
At last, the weight matrix W between the hidden layer and the output layer is calculated through the following optimization problem (a least-squares fit of the outputs to the targets):

W_{opt} = \arg\min_{W} E(\Gamma) = \arg\min_{W} \sum_{i=1}^{N} \sum_{j=1}^{k} \Big( c_{ij} - \sum_{p=1}^{N} w_{pj}\, \psi(\lVert \mathbf{s}_i - \mathbf{t}_p \rVert) \Big)^{2} .  (11)

The solution to Eq. (11) can be calculated by Eq. (12); the details of the calculation procedure can be found in [8]:

W_{opt} = (\Psi^{T} \Psi)^{-1} \Psi^{T} C ,  (12)

where C is an N × k matrix of target outputs and k is the number of classes. The matrix Ψ is

\Psi = \begin{bmatrix} \psi(\mathbf{s}_1,\mathbf{s}_1) & \psi(\mathbf{s}_1,\mathbf{s}_2) & \cdots & \psi(\mathbf{s}_1,\mathbf{s}_N) \\ \psi(\mathbf{s}_2,\mathbf{s}_1) & \psi(\mathbf{s}_2,\mathbf{s}_2) & \cdots & \psi(\mathbf{s}_2,\mathbf{s}_N) \\ \vdots & \vdots & \ddots & \vdots \\ \psi(\mathbf{s}_N,\mathbf{s}_1) & \psi(\mathbf{s}_N,\mathbf{s}_2) & \cdots & \psi(\mathbf{s}_N,\mathbf{s}_N) \end{bmatrix} .  (13)
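A minimal sketch of Eqs. (10)-(13) follows; the multiquadric parameter c is our assumption (the paper does not give one), and a least-squares solver replaces the explicit normal equations of Eq. (12) for numerical stability:

```python
import numpy as np

def train_output_weights(S, C, c=1.0):
    """Eqs. (10)-(13) with a multiquadric basis psi(r) = sqrt(r^2 + c^2).
    S: m x N matrix of unmixed feature vectors s_i (columns double as centers).
    C: N x k one-hot target matrix. Returns W_opt solving Psi W = C."""
    diff = S.T[:, None, :] - S.T[None, :, :]          # pairwise s_i - s_p
    r2 = np.sum(diff * diff, axis=2)                  # squared distances
    Psi = np.sqrt(r2 + c * c)                         # Eq. (13), multiquadric
    W_opt, *_ = np.linalg.lstsq(Psi, C, rcond=None)   # stable form of Eq. (12)
    return W_opt
```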
2.4 Summary
A new IRBF neural network has been proposed in this section. The training procedure contains two steps: first, the unmixing matrix Ξ opt should be adjusted by an information maximization approach; second, the weighted matrix Wopt between hidden layer and output layer should be tuned with Eq. (12). A new hybrid system for face recognition may be constructed with three sub-models: 2D wavelet decomposition, PCA and the trained IRBF classifier. In the new hybrid system, for every sample the enlarged subband images which are robust to facial expression and small occlusion are firstly extracted by 2D wavelet decomposition. Then, these enlarged subband images are projected into the PCA feature space to further reduce the dimension. At last, our proposed IRBF neural network is adopted to classify these PCA features of enlarged subband images. Compared with classical RBF neural network, our proposed IRBF neural network could fully take advantage of the high-order statistical relationship among variables of input samples. So the new hybrid system for face recognition has following three advantages: − The new hybrid system for face recognition is robust to facial expression and small occlusion. − The new hybrid system for face recognition could fully take advantage of the high-order statistical relationship of faces. − The new hybrid system for face recognition could efficiently classify faces with a powerful IRBF neural network classifier.
3 Experimental Results

For the FERET face database [10], the standard testing sets, whose details are shown in Table 1 (copied from the FERET website), are used in this paper. Here, the
training set contains all 1196 frontal face samples with normal facial expression from the gallery set plus 500 other face samples (not contained in the gallery set) randomly chosen from the FERET face database. All face images are cropped, rotated and resized to 70×60 according to the coordinates of the two eyes, nose and mouth.

Table 1. Details of the FERET testing sets (copied from the FERET website)

Evaluation Task     Recognized Names  Gallery (1196)  Probe Set
Aging of subjects   dup1              gallery.names   probe_dup_1_*.names (722)
Aging of subjects   dup2              gallery.names   probe_dup_2_*.names (234)
Facial Expression   fafb              gallery.names   probe_fafb_*.names (1195)
Illumination        fafc              gallery.names   probe_fafc_*.names (194)
As shown in Fig. 3, the eigenvalues with high indices (>200) are very small (near zero). So in our experiments we choose the first 200 eigenvectors, corresponding to the 200 largest eigenvalues, to span the PCA feature space.

Fig. 3. Eigenvalue spectrum of the training set of face images
In the 2004 technical report of Delac et al. [9], ICA with architecture II (ICA2) was demonstrated to outperform other well-known algorithms (PCA, FLD and ICA with architecture I (ICA1)) on the FERET face database. So in our experiments we compare our proposed approach with ICA2 using four common distances (L1, L2, Cos and Md) as the similarity measure for a nearest-neighbor classifier. The four distances are:

L1: D_{L1}(\mathbf{x}, \mathbf{y}) = \sum_{i} \lvert x_i - y_i \rvert  (14)

L2: D_{L2}(\mathbf{x}, \mathbf{y}) = (\mathbf{x} - \mathbf{y})^{T} (\mathbf{x} - \mathbf{y})  (15)

Cos: D_{\cos}(\mathbf{x}, \mathbf{y}) = - \mathbf{x}^{T} \mathbf{y} \,/\, (\lVert\mathbf{x}\rVert\, \lVert\mathbf{y}\rVert)  (16)

Md: D_{Md}(\mathbf{x}, \mathbf{y}) = (\mathbf{x} - \mathbf{y})^{T}\, \Sigma^{-1}\, (\mathbf{x} - \mathbf{y})  (17)
Table 2. The accuracy recognition rates (%) at rank 1 of ICA2 and our proposed approach

Set   Method        L1     L2     Cos    Md     IRBF
fafb  ICA2          75.65  76.07  80.92  76.07  —
      Our Approach  —      —      —      —      91.80
fafc  ICA2          62.37  61.34  73.20  60.82  —
      Our Approach  —      —      —      —      82.99
dup1  ICA2          69.81  70.50  76.18  70.50  —
      Our Approach  —      —      —      —      84.21
dup2  ICA2          77.78  76.92  83.76  76.50  —
      Our Approach  —      —      —      —      91.03
Fig. 4. The accuracy recognition rates at ranks 1-50 of ICA2 and our proposed approach. The horizontal axis is the rank index and the vertical axis is the accuracy recognition rate. (a) fafb testing set, (b) fafc testing set, (c) dup1 testing set, (d) dup2 testing set
Table 2 shows the accuracy recognition rates at rank 1 of ICA2 and of our proposed approach. For the fafb testing set, the accuracy recognition rate of our proposed approach is 10.88% (130/1195) higher than that of ICA2; for the fafc testing set, 9.79% (19/194) higher; for the dup1 testing set, 8.03% (58/722) higher; and for the dup2 testing set, 7.27% (17/234) higher. So it is clear that our proposed approach outperforms ICA2 under all four testing conditions. This also confirms that our proposed approach is more robust to facial expression, illumination and aging than ICA2. Fig. 4 also illustrates the accuracy recognition rates at ranks 1-50 for the four testing conditions. If the accuracy recognition rate at rank 10 is adopted, our proposed approach reaches 96.07% (1148/1195) for the fafb testing set, 91.75% (178/194) for the fafc testing set, 93.08% (672/722) for the dup1 testing set and 96.15% (225/234) for the dup2 testing set.
4 Conclusions

In this paper, a novel IRBF neural network has been proposed. Employing the IRBF neural network as a powerful classifier, a hybrid system for face recognition has also been proposed, in which the IRBF network classifies the extracted PCA features of the enlarged subband face images obtained by wavelet decomposition. According to the experiments on the FERET face database, our proposed approach outperforms the recently proposed ICA algorithm for face recognition. It has also been confirmed that our proposed approach is robust to facial expression, illumination and aging in face recognition.
Acknowledgement

This work was supported by the National Nature Science Foundation of China (60472033), by the Doctoral Program Foundation of the Ministry of Education of China (20030004023), and by the 973 Program of China (2004CB318005). Portions of this paper use the FERET database of facial images collected under the FERET program, and we thank the authors of [10] for their hard work on the face database. The authors also thank the anonymous reviewers.
References

1. Bartlett, M.S., Movellan, J.R., Sejnowski, T.J.: Face Recognition by Independent Component Analysis. IEEE Trans. Neural Networks, vol. 13 (2002) 1450–1464
2. Lai, J.H., Yuen, P.C., Feng, G.C.: Face Recognition Using Holistic Fourier Invariant Features. Pattern Recognition, vol. 34 (2001) 95–109
3. Feng, G.C., Yuen, P.C., Dai, D.Q.: Human Face Recognition Using PCA on Wavelet Subband. SPIE Journal of Electronic Imaging, vol. 9 (2000) 226–233
4. Meng, J.E., Wu, S., Lu, J., Hock, L.T.: Face Recognition with Radial Basis Function (RBF) Neural Networks. IEEE Trans. Neural Networks, vol. 13 (2002) 697–710
5. Bell, A.J., Sejnowski, T.J.: An Information-maximization Approach to Blind Separation and Blind Deconvolution. Neural Computing, vol. 7 (1995) 1129–1159
6. Turk, M., Pentland, A.: Eigenfaces for Recognition. Cognitive Neuroscience, vol. 3 (1991) 71–86
7. Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J.: Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Trans. PAMI, vol. 19 (1997) 711–720
8. Haykin, S.: Neural Networks: A Comprehensive Foundation. 2nd Edition. Pearson Education USA (1999)
9. Delac, K., Grgic, M., Grgic, S.: Independent Comparative Study of PCA, ICA, and LDA on the FERET Data Set. Technical Report, University of Zagreb, FER (2004) www.face-rec.org/algorithms/Comparisons/FER-VCL-TR-2004-03.pdf
10. Phillips, P.J., Wechsler, H., Huang, J., Rauss, P.: The FERET Database and Evaluation Procedure for Face Recognition Algorithms. Image and Vision Computing, vol. 16 (1998) 295–306
A Novelty Automatic Fingerprint Matching System Tianding Chen Institute of Communications and Information Technology, Zhejiang Gongshang University, Hangzhou, China, 310035
[email protected]
Abstract. In recent years, fingerprint identification has become more and more important for security. This paper introduces an automatic fingerprint matching system (AFMS). It includes three stages: fingerprint classification, minutiae extraction and fingerprint matching. In the classification stage, a fingerprint is classified into one of five types: Arch, Right Loop, Left Loop, Whorl and Others. In the minutiae extraction stage, the minutiae, composed of ridge endings and bifurcations, are detected. In the matching stage, a matching score between two minutiae patterns is computed. Our AFMS is tested on 6 databases of fingerprint images. Judged by the type of the top three matches, the recognition rates are excellent. The results reveal the expected performance and applicability of the system, and confirm the soundness of the design methodology proposed for this system.
1 Introduction

Biometrics is the science and engineering of using digital technology to identify or verify individuals based on their unique physical, biological, or behavioral characteristics. Fingerprints have been used for individual identification for many years; however, a fully satisfactory automatic fingerprint matching system has not yet been achieved. So we attempt to establish an algorithm to improve the performance of the AFMS [1]. The process can be divided into three stages: fingerprint classification, minutiae extraction and fingerprint matching.

A fingerprint image can be classified into one of five types: arch, left loop, right loop, whorl, and others. The purpose of classification is to reduce the execution time of fingerprint matching and to increase the recognition rate. After classification, we need to extract the minutiae pattern from a fingerprint image, because the identification method of our AFMS uses minutiae. A minutiae pattern consists of two local ridge characteristics: ridge endings and ridge bifurcations. After classification and minutiae extraction, we use this information to match the patterns.

The first step of fingerprint classification is image enhancement, which enhances the contrast between the valleys and the ridges. Next, block directions are computed. Then the region that we want to process is preserved, and we compute the block directional images anew to diminish the noise. Next, we detect the singular points, named core and delta points, in the directional fingerprint image. Finally, we classify the fingerprint image according to the number and relative positions of the singular points.
After classification, we extract the minutiae. First, the input image is enhanced. Then the image is binarized and smoothed in order to distinguish the ridges and valleys. Next, the image is thinned with Hilditch's thinning algorithm [2]. Finally, the minutiae pattern is extracted according to the local ridge characteristics.

With classification and minutiae extraction done, fingerprint matching is performed. First, we check the fingerprint type. Then we use the minutiae to compute the matching score. If the matching score is high, the fingerprints are very similar to each other.
2 Fingerprint Classification

The classification stage classifies a fingerprint into one of five types: arch, left loop, right loop, whorl and others [3]. The purpose of fingerprint classification is to increase the recognition rate and to reduce the execution time of matching.

2.1 Image Enhancement and Orientation Computation

Taking a gray-value fingerprint image as input, we first normalize the gray values. The purpose of normalization is to enhance the contrast between the valleys and the ridges [4]. Let A(i, j) denote the gray level at (i, j), and let μ and σ² denote the mean and variance of A. The normalized image B(i, j) is defined as follows:
B(i, j) = \alpha + \gamma \times \big( [A(i, j) - \mu] / \sigma \big) ,  (1)
where α = 150 and γ = 95 are assigned in this paper.

The orientations in the fingerprint image are represented by the gradient vector [G_x(x, y), G_y(x, y)]^T, which can be computed by the Sobel operator. Before computing the orientation image, we apply a 5×5 median filter to suppress wrong gradient vectors generated by noise. Then we compute G_x(i, j) and G_y(i, j) at pixel (i, j), which is centered at z_5, by
G_x = (z_7 + 2 z_8 + z_9) - (z_1 + 2 z_2 + z_3) ,
G_y = (z_3 + 2 z_6 + z_9) - (z_1 + 2 z_4 + z_7) .  (2)
A fingerprint image is partitioned into a set of w×w blocks, and we compute the ridge orientation of each block. Opposite gradient vectors would offset each other even though they represent the same ridge-valley orientation, so we can never average the gradient vectors of a block directly. Instead, we double the angles of the gradient vectors before averaging. After doubling the angles, gradient vectors of opposite direction are regarded as the same direction and reinforce each other, while perpendicular gradient vectors cancel. Finally, the length of each gradient vector is squared, which gives strong orientations a higher weight in the average than weaker ones [5]. For doubling the angle and squaring the length, the gradient vector is converted to polar coordinates, in which it is given by [ρ, θ]^T. This conversion is given by
\begin{bmatrix} G_x \\ G_y \end{bmatrix} = \begin{bmatrix} \rho \cos\theta \\ \rho \sin\theta \end{bmatrix} .  (3)
The gradient vector is converted back to its Cartesian representation by

\begin{bmatrix} \rho \\ \theta \end{bmatrix} = \begin{bmatrix} \sqrt{G_x^2 + G_y^2} \\ \tan^{-1}(G_y / G_x) \end{bmatrix} .  (4)

The relationship between [G_x, G_y]^T and [ρ, θ]^T is shown in Fig. 1.
Fig. 1. The relationship between [G_x, G_y]^T and [ρ, θ]^T
Let [α_x, α_y]^T be represented by

\begin{bmatrix} \alpha_x \\ \alpha_y \end{bmatrix} = \begin{bmatrix} \rho^2 \cos 2\theta \\ \rho^2 \sin 2\theta \end{bmatrix} = \begin{bmatrix} \rho^2 (\cos^2\theta - \sin^2\theta) \\ \rho^2 (2 \sin\theta \cos\theta) \end{bmatrix} = \begin{bmatrix} G_x^2 - G_y^2 \\ 2 G_x G_y \end{bmatrix} .  (5)

We compute the orientation by averaging the gradients in each w×w block. The average gradient [\bar{\alpha}_x, \bar{\alpha}_y]^T in each block R can be computed by

\begin{bmatrix} \bar{\alpha}_x \\ \bar{\alpha}_y \end{bmatrix} = \frac{1}{w^2} \sum_{R} \begin{bmatrix} \alpha_x \\ \alpha_y \end{bmatrix} = \frac{1}{w^2} \sum_{R} \begin{bmatrix} G_x^2 - G_y^2 \\ 2 G_x G_y \end{bmatrix} ;  (6)
then the average gradient direction φ is given by

\phi = \frac{1}{2} \angle(\bar{\alpha}_x, \bar{\alpha}_y) ,  (7)
where ∠(x, y) is defined as

\angle(x, y) = \begin{cases} \tan^{-1}(y/x) & x \ge 0 \\ \tan^{-1}(y/x) + \pi & x < 0 \wedge y \ge 0 \\ \tan^{-1}(y/x) - \pi & x < 0 \wedge y < 0 \end{cases}  (8)
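For illustration, a sketch of the block-orientation computation of Eqs. (2)-(8) follows, assuming SciPy's Sobel and median filters as stand-ins for the operators described above; np.arctan2 realizes the quadrant-aware angle ∠(x, y) of Eq. (8):

```python
import numpy as np
from scipy.ndimage import sobel, median_filter

def block_orientation(img, w=16):
    """Doubled-angle gradient averaging per w x w block (Eqs. 2-8)."""
    img = median_filter(img.astype(float), size=5)   # suppress noisy gradients
    gx = sobel(img, axis=1)                          # G_x
    gy = sobel(img, axis=0)                          # G_y
    ax = gx * gx - gy * gy                           # alpha_x = Gx^2 - Gy^2
    ay = 2.0 * gx * gy                               # alpha_y = 2 Gx Gy
    h, wd = img.shape
    phi = np.zeros((h // w, wd // w))
    for i in range(h // w):
        for j in range(wd // w):
            bx = ax[i*w:(i+1)*w, j*w:(j+1)*w].mean() # average over block R
            by = ay[i*w:(i+1)*w, j*w:(j+1)*w].mean()
            phi[i, j] = 0.5 * np.arctan2(by, bx)     # Eqs. (7)-(8)
    return phi
```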
2.2 Region of Interest Detection

In order to avoid obtaining false singular points or minutiae, we remove the noisy regions at the borders of the fingerprint image and preserve the region that we want to process [6]. We use the mean and standard deviation of each block to decide whether the block is of interest.
If a block of the oriented image is one we want to process, the block usually has a low mean and a high standard deviation. We can then use the following linear combination to decide whether the region is of interest:

\nu = w_0 (1 - \mu) + w_1 \times \sigma + w_2 .  (9)

We assign w_0 = 0.5 and w_1 = 0.5; w_2 is the percentage of the distance to the center of the fingerprint. The mean μ and the standard deviation σ are normalized to lie in [0, 1], and ν is the value to be tested. If the value of ν is greater than 0.8, the block is of interest; otherwise, the block belongs to the background region at the borders.

2.3 Singular Point Detection
Each block of the oriented image is quantized to one of eight directions: 0°, 22.5°, 45°, 67.5°, 90°, 112.5°, 135° and 157.5°. We can then extract singular points by computing the Poincaré index [3]. Because of noisy directions, we have to smooth the directions before computing the Poincaré index. First we regard the direction as a vector (cos φ, sin φ); then we double the angle, (cos 2φ, sin 2φ), and use a 3×3 averaging filter to smooth the direction. The 3×3 averaging filter is shown as:
B3 B4 B5        1 1 1
B2 Bc B6        1 2 1
B1 B0 B7        1 1 1

a = \sum_{i=0}^{7} B_{i,x} + 2 B_{c,x} , \qquad b = \sum_{i=0}^{7} B_{i,y} + 2 B_{c,y} , \qquad B_i = (B_{i,x}, B_{i,y}),\ 0 \le i \le 7 \ \text{or}\ i = c .  (10)

The average direction of the block is θ = (1/2) arctan(b/a). Then we can compute the Poincaré index by summing the changes in direction around a block P. For each block P, we compute the angle differences over its 8 neighboring blocks along the counter-clockwise path P1→P2→P3→P4→P5→P6→P7→P8→P1, where the neighbors are laid out as

P1 P8 P7
P2 P  P6
P3 P4 P5

If the sum of the differences is 180°, we define that the block P contains a core. If the sum of the differences is -180°, we define that the block P contains a delta.
2.4 Classification
We classify each fingerprint into one of the five types: arch, right loop, left loop, whorl, and others [7]. An arch fingerprint contains no core and no delta points. Loops and tented arches contain one core and one delta. If the delta point is located to the right of the core point, the fingerprint type is left loop. If the delta point is located to the left of the core point, the fingerprint type is right loop. If the delta point is in the middle of
a core point, the fingerprint type is tented arch. The delta may be missed when we capture the fingerprint image from the sensor, so we have to trace the ridge flow of the fingerprint starting from the core: if the flow skews right, the type is right loop; if the flow skews left, the type is left loop. Whorls and twin loops contain two cores and two deltas; these two types are regarded as the same type. Otherwise, we define the fingerprint as the "others" type.

Table 1. Criteria for the types of fingerprints

Type          arch (tented arch)  left loop  right loop  whorl (twin loops)  others
# of cores    0 or 1              1          1           2                   0 or >2
# of deltas   0 or 1 (middle)     1 (right)  1 (left)    0~2                 0 or >2
3 Minutiae Extraction

Because the rule of our system is based on minutiae matching, we have to extract minutiae from an input fingerprint image. In this section, we introduce what minutiae are, how to extract minutiae, and how to store the information of a minutiae pattern.

3.1 Binarization, Smoothing and Thinning
We have to distinguish the valleys and ridges of a fingerprint image before smoothing and thinning, so the gray values of the pixels in the enhanced fingerprint image are binarized to 0 or 255 [8]. First we compute the gray values P25 and P50 from the enhanced fingerprint image, where Pk is the k-th percentile of the enhanced image histogram. Then we partition the enhanced image into w by w blocks and compute the mean of each block; M_j denotes the mean of the j-th block. If the gray value of pixel S_i is less than P25, we assign 0 to S_i. If the gray value of pixel S_i is greater than P50, we assign 255 to S_i. Otherwise, the pixel value is defined by the following rule:

S_i = \begin{cases} 255 & \text{if } \frac{1}{8} \sum_{x=0,\ x \ne i}^{8} S_x \ge M_j \\ 0 & \text{otherwise} \end{cases}  (11)
After binarization, we find that there is still much noise in the ridge regions. In order to improve the result of thinning, we first smooth the fingerprint image; the smoothing stage uses neighboring pixels to remove noise. First a 5×5 filter is used, and the pixel p_i is assigned by

p_i = \begin{cases} 255 & \text{if } \sum_{5\times5} N_w \ge 18 \\ 0 & \text{if } \sum_{5\times5} N_b \ge 18 \\ p_i & \text{otherwise} \end{cases}  (12)
Then a 3×3 filter is further applied:

p_i = \begin{cases} 255 & \text{if } \sum_{3\times3} N_w \ge 5 \\ 0 & \text{if } \sum_{3\times3} N_b \ge 5 \\ p_i & \text{otherwise} \end{cases}  (13)
The purpose of the thinning stage is to obtain the skeleton structure of the fingerprint image. The ridges are thinned to unit width to extract minutiae. We use Hilditch's algorithm to preserve the connectedness and shape of the fingerprint image [2].

3.2 Minutiae Extraction
We classify a ridge pixel P into one of five types according to the number of its 8-connected neighbors [9]. The types are defined as follows:

Table 2. Types of a ridge pixel P

# of neighbors  0         1       2     3            4
Type            Isolated  Ending  Edge  Bifurcation  Crossing
Fig. 2. Spurious minutiae points
Ending and bifurcation points are defined to be minutiae. Due to broken ridges, fur effects, and ridge endings near the margins of the image, we have to remove spurious minutiae as described below (see Fig. 2):
1) Two endings are too close (within 8 pixels) (Fig. 2(a), Fig. 2(b)).
2) An ending and a bifurcation are too close (Fig. 2(c)).
3) Two bifurcations are too close (Fig. 2(d), Fig. 2(e)).
4) Minutiae are near the margins (Fig. 2(f)).
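A minimal sketch of the neighbor-count classification of Table 2 on a thinned binary image follows (names are ours; many systems use the closely related crossing-number test instead):

```python
# 8-connected neighbor offsets around a ridge pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]

TYPES = {0: 'isolated', 1: 'ending', 2: 'edge', 3: 'bifurcation', 4: 'crossing'}

def pixel_type(skel, i, j):
    """Classify ridge pixel (i, j) of a thinned binary image (1 = ridge)
    by its number of 8-connected ridge neighbors (Table 2)."""
    if not skel[i][j]:
        return None
    n = sum(skel[i + di][j + dj] for di, dj in OFFSETS)
    return TYPES.get(n, 'crossing')   # clamp counts above 4
```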
3.3 Fingerprint Template Data

We want to define a standard format for the fingerprint template data. For security, we store only the information about the singular points and minutiae of a fingerprint. To limit the capacity of the fingerprint database, we would like to use as few bits as possible to store this information. Finally, we match the fingerprint template data instead of checking the fingerprint image itself, which reduces the matching time [10]. The format of the fingerprint template data is described in Table 3. Since fingerprint images are separated into five types, 4 bits of memory are required to represent the type of a fingerprint image. There are at most two cores and two deltas in a fingerprint image, so we use only 2 bits to store the number of cores or deltas. The detailed format of a core or delta is described in Table 4. In general the
size of a fingerprint image is usually not larger than 1024×1024, so we use 10 bits of memory to store the X or Y coordinate. Although we define only 8 directions for a core or delta, 4 bits of memory are reserved, for future expansion, to represent its direction. There are usually 20~70 minutiae detected in a fingerprint image, so 7 bits of memory are used to record the number of minutiae. The detailed format of a minutia is described in Table 5. Although there are only two kinds of minutiae (ending, bifurcation) in our method, other methods use four kinds, so 2 bits of memory are reserved to represent the kind of a minutia.

Table 3. The information format of fingerprint template data

Field  Type    # of cores  Core*    # of deltas  Delta*   # of minutiae  Minutiae*
Size   4 bits  2 bits      24 bits  2 bits       24 bits  7 bits         26 bits
Table 4. The information format of a singular point (core or delta)

Field  X Coordinate  Y Coordinate  Direction
Size   10 bits       10 bits       4 bits
Table 5. The information format of a minutia

Field  Kind of Minutia  X Coordinate  Y Coordinate  Direction
Size   2 bits           10 bits       10 bits       4 bits
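As an illustration of the 26-bit minutia record of Table 5, a hypothetical bit packing is shown below; the field order inside the record is our assumption, as the paper gives only the field widths:

```python
def pack_minutia(kind, x, y, direction):
    """Pack one minutia into a 26-bit word per Table 5 (assumed order):
    2-bit kind | 10-bit X | 10-bit Y | 4-bit direction."""
    assert 0 <= kind < 4 and 0 <= x < 1024 and 0 <= y < 1024 and 0 <= direction < 16
    return (kind << 24) | (x << 14) | (y << 4) | direction

def unpack_minutia(word):
    return ((word >> 24) & 0x3, (word >> 14) & 0x3FF,
            (word >> 4) & 0x3FF, word & 0xF)
```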
4 Fingerprint Matching

The last and most important step of a fingerprint matching system is the match execution. The rule of our system is based on minutiae matching. We first define the registration point, then we use our method to compute the matching score (the larger, the better).

4.1 Registration Point
The registration point is regarded as the origin in fingerprint matching. We usually use the core of a fingerprint image as the registration point. If the fingerprint is classified as a whorl, the core with the smaller row coordinate is employed. If the classification is left loop or right loop, its core is used. If the classification is arch, we use the mask shown in Fig. 3 to find the registration point: if the orientation of a block and its neighbors is similar to the mask, the center of this mask region is the registration point. If the classification is "others", the center coordinate of the image is the registration point.
Fig. 3. Mask of tented region
4.2 Minutiae Matching
Matching a query fingerprint against those stored in a database is equivalent to comparing their minutiae patterns [11]. We read the features of a fingerprint from the corresponding fingerprint template data before the matching process, and there are four steps involved in our matching process:
1) check the type of the fingerprint;
2) overlay by the registration point;
3) rotate and relocate the minutiae pattern;
4) compute the matching score (the larger the score, the better the match).
First, we check whether the types of the fingerprint images are the same; a fingerprint image is only compared with those of the same type. This increases the recognition rate and reduces the matching time. Then we overlay the two fingerprint images by the registration point, and rotate one image to make the orientations of the registration points the same. Because of the rotation, we have to compute the new position and new angle of each minutia. All minutiae points are then compared by their relative coordinates with respect to the registration point. We employ a tolerance disk (radius about 8~16 pixels) to match the minutiae points: if a minutia of the same kind is found in the region, it is said to be a successful minutiae match. After finding all the successful minutiae matches, the matching score of the two fingerprints is calculated by
1 M
M
r
¦ (1 − R j =1
)
(14)
j
where M is the number of potential type-matching minutiae within a disk of a certain user-specified radius R (about 8~16 pixels), and r_j measures the distance between the j-th pair of potentially matched minutiae points.
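A hedged sketch of the score of Eq. (14) over minutiae already aligned to the registration point follows; the direction tolerance is our assumption, since the paper specifies only the tolerance disk and same-kind matching:

```python
import math

def matching_score(query, template, R=12.0, angle_tol=0.3):
    """Eq. (14): S = 100 * (1/M) * sum_j (1 - r_j / R).
    query, template -- lists of (kind, x, y, theta) minutiae, already
    aligned to the registration point; R is the tolerance-disk radius."""
    total, M = 0.0, 0
    for kind, x, y, th in query:
        best = None
        for k2, x2, y2, th2 in template:
            if k2 != kind or abs(th2 - th) > angle_tol:
                continue                      # only same-kind minutiae match
            r = math.hypot(x2 - x, y2 - y)
            if r <= R and (best is None or r < best):
                best = r                      # closest candidate in the disk
        if best is not None:
            M += 1
            total += 1.0 - best / R
    return 100.0 * total / M if M else 0.0
```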
5 Experimental Results and Evaluation

This section first introduces the fingerprint databases we use: Rindex, Lindex and four databases from FVC2000. Our experiments run on a Windows-based system with a P-IV 2.4 GHz CPU and 512 MB RAM. Each fingerprint image is compared with the others in the same database, and the matching score is computed for each comparison. We use a leave-one-out method to evaluate our AFMS: if the matching score with another fingerprint image from the same person is ranked in the top 3, it is said to be a successful match for our system; otherwise, it is a mismatch.

5.1 Fingerprint Databases
The first database is Rindex. It contains 112 images of size 300 by 300 contributed by 28 different individuals, each contributing four times with the same finger. Four fingerprint images from the same individual are given below.
Fig. 4. Four right index fingerprint images of the same individual
The second database is Lindex. It contains 404 images of size 300 by 300 contributed by 101 different individuals. Each contributed 4 times with the same left index finger. Four fingerprint images from the same individual are given below.
Fig. 5. Four left index fingerprint images of the same individual
Fig. 6. Examples of fingerprint images from each database
There are four different databases provided by FVC2000: DB1, DB2, DB3 and DB4. Each database contains 80 fingerprints from 10 different individuals, each contributing 8 times with the same finger. Examples of fingerprint images from each database are given above.

5.2 Experiments on Rindex
Rindex contains 112 right index fingerprint images of size 300 by 300 from 28 persons. Every fingerprint image is compared with the other 111 fingerprint images in this database. Among the 112 matches, we accept 111 matches and reject 1 match. Therefore, a 99.1% recognition rate is achieved. The only unmatched fingerprint image is shown in Fig. 7.
Fig. 7. The fingerprint of poor quality ((a)-(d))
In Fig. 7(b), we detect the singular points of this fingerprint image correctly, so there is no problem with classification. In Fig. 7(d), we find that very few minutiae are detected; the minutiae pattern carries too little information to be compared. This is due to the poor quality of the original fingerprint image: there is too much noise in its right part.

5.3 Experiments on Lindex
Lindex contains 404 left index fingerprint images of size 300 by 300 from 101 persons. Every fingerprint image is compared with the other 403 fingerprint images in
this database. Among the 403 matches, we accept 334 matches, and reject 69 matches. Therefore, an 82.67% recognition rate is achieved. The non-matched fingerprint images are shown in Fig. 8 and Fig. 9.
Fig. 8. Fingerprints with position shift and poor quality ((a)-(c))

Fig. 9. Fingerprints with position shift ((a)-(b))
In Fig. 8, the same person provides all three fingerprint images, but there are problems among them. In Fig. 8(c), the quality of the fingerprint image is bad. In Fig. 8(a) and Fig. 8(b), the contents of the images are slightly different: when the fingerprint image is captured, the provider may place his finger shifted by some distance. The minutiae information in Fig. 8(a) and Fig. 8(b) is therefore quite different, so these images are rejected by our AFMS. In Fig. 9, although the quality of the two fingerprint images is good, rejection occurs because of wrong classification due to missing cores or deltas. Among the 404 images, we have 17 mis-classification results due to such position shifts.

5.4 Experiments on FVC2000
FVC2000 provides 4 databases: DB1, DB2, DB3 and DB4. Each database contains 80 fingerprint images from 10 individuals. We achieve 92.5% (74/80), 90.00% (72/80), 87.5% (70/80), and 92.5% (74/80) recognition rates on these databases with the same evaluation method as for the previous two databases. The matching errors in DB1, DB2 and DB3 mostly occurred due to position shifts of the fingerprints. In addition, the quality of the fingerprint images in DB3 is not good.

5.5 Summary
Table 6 shows the experimental results of our AFMS on the 6 test databases. The enrolling time includes the time for type classification and minutiae detection. Because we match fingerprint template data instead of checking fingerprint images, little matching time is spent.

Table 6. The experimental results on the various databases

                      Rindex     Lindex     DB1       DB2        DB3        DB4
Recognition rate      99.11%     82.67%     92.50%    90.00%     87.50%     92.50%
                      (111/112)  (334/404)  (74/80)   (72/80)    (70/80)    (74/80)
Enrolling time per
fingerprint image     0.28 sec   0.28 sec   0.28 sec  0.28 sec   0.50 sec   0.19 sec
Matching time         0.371 sec  3.22 sec   0.28 sec  0.226 sec  0.243 sec  0.166 sec
6 Conclusion and Future Work

In this paper, we identify three problems that affect the results of our AFMS; if we could overcome them, the performance of the AFMS could be improved. First, noise causes poor binarization: we cannot distinguish ridges and valleys clearly, which leads to a poor thinned image and hence a poor minutiae pattern. Second, broken ridges produce erroneous orientations, which cause wrong type classification. Third, a shifted fingerprint image is difficult to match against the stored minutiae pattern and may also cause wrong type classification because of missing cores or deltas. In order to improve the performance of our system, we have to find better algorithms, such as [12], to solve these problems.

We suggest some ideas to address these problems in the future. First, the fingerprint image should be captured carefully. Second, the quality of the fingerprint image could be enhanced in advance: if our AFMS could distinguish ridges from valleys more clearly, the thinning results would be better, and more useful minutiae points could be detected to improve the recognition rate. Although type classification reduces the matching time, there are some classification errors in our databases; in a small database, we could therefore match fingerprint images without any classification. In large databases, we could look for a robust classification method that does not rely on cores and deltas, which would avoid the type classification errors caused by missing cores and deltas.
References

1. Jea, T.Y., Govindaraju, V.: A Minutiae-based Partial Fingerprint Recognition System. Pattern Recognition, 38(10), (2005) 1672–1684
2. Naccache, N.J., Shinghal, P.: An Investigation into the Skeletonization Approach of Hilditch. Pattern Recognition, 17(3), (1984) 279–284
3. Karu, K., Jain, A.K.: Fingerprint Classification. Pattern Recognition, 29(3), (1996) 284–404
4. Ko, T.: Fingerprint Enhancement by Spectral Analysis Techniques. The Proceedings of Applied Imagery Pattern Recognition Workshop, (2002) 133–139
5. Bazen, A.M., Gerez, S.H.: Directional Field Computation for Fingerprints Based on the Principal Component Analysis of Local Gradients. In Proc. ProRISC2000 Workshop on Circuits, Systems and Signal Processing, Veldhoven, The Netherlands, Nov. (2000)
6. Bazen, A.M., Gerez, S.H.: Segmentation of Fingerprint Images. In Proc. ProRISC2001 Workshop on Circuits, Systems and Signal Processing, Veldhoven, The Netherlands, Nov. (2001)
7. Senior, A.: A Combination Fingerprint Classifier. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10), (2001) 1165–1174
8. Conti, V., Pilato, G., Vitabile, S., Sorbello, F.: Verification of Ink-on-paper Fingerprints by Using Image Processing Techniques and a New Matching Operator. VIII Convegno AI*IA, Siena 10-13, (2002) 594–601
9. Espinosa-Duro, V.: Minutiae Detection Algorithm for Fingerprint Recognition. IEEE Aerospace and Electronics Systems Magazine, 17(3), (2002) 7–10
10. Maio, D., Maltoni, D., Cappelli, R., Wayman, J.L., Jain, A.K.: FVC2000: Fingerprint Verification Competition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3), (2002) 811–816
11. Zorita, D.S., Garcia, J.O., Lianas, S.C., Rodriguez, J.G.: Minutiae Extraction Scheme for Fingerprint Recognition Systems. International Conference on Image Processing, Vol. 2, (2001) 254–257
12. Zhang, Q.Z., Yan, H.: Fingerprint Classification Based on Extraction and Analysis of Singularities and Pseudo Ridges. Pattern Recognition, 37(11), (2004) 2233–2243
Abnormal Pattern Parameters Estimation of Control Chart Based on Wavelet Transform and Probabilistic Neural Network Shaoxiong Wu Dept. of Economics and Management, Fujian University of Technology Fuzhou, Fujian 350014, P.R. China
[email protected]
Abstract. A general framework combining the wavelet transform with a PNN is presented and applied to abnormal pattern parameter estimation of control charts. The simulation results show that the proposed method has many advantages, such as simple structure, quick convergence and a high recognition rate, so it can be used for the abnormal pattern parameter estimation of control charts.
1 Introduction

The control chart is widely used in manufacturing process control, in establishing process parameters and in evaluating process capability. Its value lies in its ability to separate special disturbances from inherent variability by plotting points on the chart. When the process is out of control, the operator should track down the causes, whether tool wear, machine wear, material differences, or environmental factors, and adjust the process in time to keep it under control. Pattern recognition and abnormal pattern parameter estimation of the control chart are the two main components of an intelligent statistical process control system. When process quality data are input into the system for plotting the control chart and recognizing its patterns, we can determine whether the process is in control or out of control. If the process is out of control, we estimate some of its parameters in order to find the degree of drift and adjust the process in time to keep it under control. This process is shown in Fig. 1.

Concerning abnormal pattern parameter estimation of control charts, Le Qinghong et al. put forward a supervised linear feature mapping (SLFM) network [1] (2002); Ruey-Shiang Guh used a BPN to recognize the patterns [2] (2003) and presented a hybrid learning-based model [3] (2005), in which a BPN was combined with a decision tree (DT) to estimate the abnormal pattern parameters.

The wavelet transform, developed as a branch of applied mathematics in the late 1980s, has become a popular tool in many areas, such as signal and image processing. Although both are time-frequency localization methods, the wavelet transform provides a mechanism for zooming in on the fine details of a signal because of its variable time-frequency windows. The PNN has the characteristics of simple structure, stable behavior, strong nonlinear mapping ability, quick training and good convergence. In practice, the PNN is often an excellent pattern classifier, outperforming other classifiers including the BPN. In this paper, a method of combining the wavelet
transform with the PNN is presented and used to estimate the parameters of abnormal control chart patterns. The parameters include the slope of the trend pattern, the magnitude of the shift pattern, and the amplitude and period of the cyclic pattern.
Fig. 1. Model of the on-line detection and analysis system of the control chart
2 WPNN Basic Theories

2.1 Wavelet Transform

If $\Psi(t) \in L^2(R)$ and it satisfies the admissibility condition

$$C_\Psi = \int_{-\infty}^{+\infty} \frac{|\hat{\Psi}(w)|^2}{|w|} \, dw < \infty , \quad (1)$$
then we call $\Psi(t)$ a wavelet basis. A set of discrete wavelets is generated from the wavelet basis $\Psi(t)$ by

$$\Psi_{j,k}(t) = 2^{-j/2} \, \Psi(2^{-j} t - k), \quad j, k \in Z . \quad (2)$$
Mallat's algorithm is generally used for the one-dimensional discrete wavelet transform. According to it, a low-pass filter and a high-pass filter are applied to the discrete signal, and the outputs are then downsampled by two; the procedure is repeated at each level. Fig. 2 illustrates this process, where g[n] and h[n] are the high-pass and low-pass filters of (3) and (4), respectively.
$$y_{high}[k] = \sum_n x[n] \cdot g[2k - n] . \quad (3)$$

$$y_{low}[k] = \sum_n x[n] \cdot h[2k - n] . \quad (4)$$

$$h[N - 1 - n] = (-1)^n g[n] . \quad (5)$$

where
N is the total number of samples in x[n], y_high is the output of the high-pass filter, and y_low is the output of the low-pass filter. The y_low[k] obtained at the last level is called the approximation signal, and the y_high[k] computed at each level is called the detail coefficient of that level.
Fig. 2. Computation of discrete wavelet transform
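As a concrete illustration, the following is a minimal NumPy sketch of one level of this filter bank. The function names are illustrative, and the low-pass filter h is assumed to be given (for example, the db2 coefficients); this is a sketch of equations (3)-(5), not the paper's implementation.

```python
import numpy as np

def qmf_highpass(h):
    """Derive the high-pass filter g from the low-pass filter h via the
    quadrature-mirror relation h[N-1-n] = (-1)^n g[n] of equation (5)."""
    N = len(h)
    return np.array([(-1) ** n * h[N - 1 - n] for n in range(N)])

def dwt_level(x, h):
    """One level of Mallat's algorithm, equations (3) and (4): convolve the
    signal with g (high-pass) and h (low-pass), then downsample by two."""
    g = qmf_highpass(h)
    y_high = np.convolve(x, g)[::2]   # y_high[k] = sum_n x[n] g[2k - n]
    y_low = np.convolve(x, h)[::2]    # y_low[k]  = sum_n x[n] h[2k - n]
    return y_low, y_high
```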
2.2 PNN
PNN was first put forward by D. F. Specht in 1989. It is made up of four layers. The first layer is the input layer, whose activation function is a pure linear function. The second layer is the exemplar layer, whose activation function is $g(z_i) = \exp[(z_i - 1)/\sigma^2]$, where $z_i$ is the input of the $i$th neuron and $\sigma$ is the deviation parameter. The third layer is the summation layer, which performs a linear sum; its number of neurons equals the number of pattern classes. The fourth layer is the output layer, whose decision function is governed by a winner-take-all mechanism; its output is 1 or -1 (or 0), indicating the class of the input pattern.

For abnormal pattern parameter estimation of control charts, the input consists of 32 consecutive sample data points of a control chart. The input data are decomposed by the wavelet function, and the extracted feature vector is then fed to the PNN. The estimator for the probability density function is

$$p(x \mid s_i) = \frac{1}{(2\pi)^{m/2} \sigma_i^m n_i} \sum_{j=1}^{n_i} \exp\left[ \frac{-(x - x_j^{(i)})^T (x - x_j^{(i)})}{2 \sigma_i^2} \right] . \quad (6)$$

Fig. 3. The structure of PNN
where $n_i$ is the cardinality of the set of patterns in class $s_i$, $x_j^{(i)}$ is the $j$th exemplar (training) pattern belonging to class $s_i$, $\sigma_i$ is the smoothing parameter, and $p(x \mid s_i)$ is the probability of vector $x$ occurring in set $s_i$, i.e., in the corresponding type of pattern.

2.3 Structure of Combined Wavelet Transform with PNN
The general framework is illustrated in Fig. 4. The steps are as follows:
1) In the first phase, the input data are decomposed to the 3rd level by the wavelet transform.
2) The energy of the detail coefficients at each level is computed.
3) The feature vector T is formed by combining the approximations of the 3rd level with the energy of each level's detail coefficients.
4) The feature vector T is fed to the PNN for abnormal pattern parameter estimation of the control chart.
Fig. 4. The structure of combined wavelet transform with PNN
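The pipeline can be sketched as below, reusing dwt_level from the earlier sketch. The class representation (a dict mapping labels to exemplar arrays) and the single shared smoothing parameter are illustrative assumptions; equation (6) is evaluated directly.

```python
def wavelet_feature_vector(x, h, levels=3):
    """Steps 1)-3): decompose x to the 3rd level, accumulate the energy of
    the detail coefficients at each level, and concatenate these energies
    with the 3rd-level approximation to form the feature vector T."""
    energies = []
    for _ in range(levels):
        x, d = dwt_level(x, h)
        energies.append(float(np.sum(d ** 2)))
    return np.concatenate([x, energies])

def pnn_classify(t, exemplars, sigma):
    """Step 4): assign T to the class with the largest Parzen density of
    equation (6); `exemplars` maps each class label to an array whose rows
    are that class's training feature vectors."""
    m = len(t)
    def density(X):
        sq = np.sum((t - X) ** 2, axis=1)
        return np.mean(np.exp(-sq / (2 * sigma ** 2))) / ((2 * np.pi) ** (m / 2) * sigma ** m)
    return max(exemplars, key=lambda c: density(np.asarray(exemplars[c])))
```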
3 Control Chart Patterns and Their Description

The control chart patterns considered in this research can be divided into six types: normal pattern, cyclic pattern, increasing trend, decreasing trend, upward shift, and downward shift, as illustrated in Fig. 5. The following expression, written in a general form that includes the process mean and two noise components, is used to generate the training and test data sets:

$$Y(t) = \mu + r(t) + S(t) . \quad (7)$$

where Y(t) is the sample value at time t, $\mu$ is the mean value of the process variable being monitored, r(t) is random normal noise, and S(t) is a special disturbance due to some assignable cause:

1) Normal patterns: S(t) = 0.
2) Cyclic patterns: $S(t) = a \sin(2\pi t / T)$, where a is the amplitude of the cyclic variation and T is the period of a cycle.
3) Increasing or decreasing trends: $S(t) = \pm g t$, where g is the magnitude of the gradient of the trend; if S(t) > 0 the pattern is an increasing trend, otherwise a decreasing trend.
4) Upward or downward shifts: $S(t) = \pm k s$, where k is a parameter determining the shift position and s is the magnitude of the shift; if S(t) > 0 the pattern is an upward shift, otherwise a downward shift.
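A minimal sketch of this generator follows. The default parameter values are illustrative only, and k is interpreted, in the usual reading, as a step indicator that is 0 before the shift point and 1 afterwards.

```python
import numpy as np

def generate_pattern(kind, n=32, mu=0.0, sigma=1.0, a=3.0, T=8, g=0.1, k0=16, s=2.0):
    """Generate one control-chart sample Y(t) = mu + r(t) + S(t), equation (7)."""
    t = np.arange(1, n + 1)
    r = np.random.normal(0.0, sigma, n)            # random normal noise r(t)
    if kind == "normal":
        S = np.zeros(n)
    elif kind == "cyclic":
        S = a * np.sin(2 * np.pi * t / T)          # amplitude a, period T
    elif kind == "trend":
        S = g * t                                  # slope +g (use -g for decreasing)
    elif kind == "shift":
        S = s * (t >= k0)                          # shift of magnitude s at point k0
    else:
        raise ValueError(kind)
    return mu + r + S
```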
Fig. 5. Control chart patterns: (a) normal pattern; (b) cyclic pattern; (c) increasing trend; (d) decreasing trend; (e) upward shift; (f) downward shift
4 Simulation and Results

4.1 Training Samples and Test Samples
The training samples and test samples were generated by formula (7); each input consisted of 32 sample data points. In this work, we take $\mu = 0$ and $\sigma = 1$. The actual training and testing inputs were scaled values of y(t), obtained with the following expression:

$$y'(t) = \frac{y(t) - y_{min}}{y_{max} - y_{min}} . \quad (8)$$

where y'(t) is the scaled pattern value (in the range 0 to 1), $y_{min}$ is the minimum allowed value, and $y_{max}$ is the maximum allowed value.

For increasing and decreasing trends, the slope took values in [-0.14, -0.10, -0.06, -0.02, +0.02, +0.06, +0.10, +0.14]. Each slope produced 300 training samples (2400 in total) and 600 test samples (4800 in total).

For upward and downward shifts, the magnitude of the shift took values in [-7, -5, -3, -1, +1, +3, +5, +7]. Each magnitude produced 300 training samples (2400 in total) and 600 test samples (4800 in total).

For cyclic patterns, the amplitude took values in [1, 3, 5, 7]. Each amplitude produced 300 training samples (1200 in total) and 600 test samples (2400 in total).
For cyclic patterns, the period took values in [4, 8, 12, 16, 20, 24, 28]. Each period produced 300 training samples (2100 in total) and 600 test samples (4200 in total).

4.2 Simulation
In this work, the input data were decomposed to three levels through the wavelet transform, using the Daubechies (dbN), Symlets (symN), and Coiflet (coifN) wavelet families.

4.2.1 Pattern Parameter Estimation of Increasing and Decreasing Trends

For increasing and decreasing trends, Fig. 6 shows the aggregate testing performance of the different wavelet functions.
Fig. 6. Classification performance of different wavelet functions (increasing or decreasing trends); curves: Daubechies (dbN), Coiflet (coifN), Symlets (symN)
From Fig. 6, the best recognition performance is obtained with coif1, db2, and sym2, whose recognition rates are 93.1458%, 93.2917%, and 93.2917%, respectively.

4.2.2 Pattern Parameter Estimation of Upward and Downward Shifts

For upward and downward shifts, Fig. 7 shows the aggregate testing performance of the different wavelet functions. From Fig. 7, the best recognition performance is obtained with coif5 and sym7, whose recognition rate is 89.9583%.
Fig. 7. Classification performance of different wavelet functions (upward and downward shifts); curves: Daubechies (dbN), Coiflet (coifN), Symlets (symN)
4.2.3 Pattern Parameter Estimation of the Cyclic Pattern's Amplitude

For the cyclic pattern's amplitude, Fig. 8 shows the aggregate testing performance of the different wavelet functions. From Fig. 8, the best recognition performance is obtained with coif5, whose recognition rate is 80.5833%.

4.2.4 Pattern Parameter Estimation of the Cyclic Pattern's Period

For the estimation of the cyclic pattern's period, tested with a period interval of 4 and decomposed by the wavelet functions Daubechies (dbN), Symlets (symN), and Coiflet (coifN), the training samples and the test samples could be recognized precisely (shown in Table 1). In another experiment, the samples were decomposed to three levels with the wavelet function coif5 and the period interval was reduced to 3; the training and test samples were then recognized with an aggregate recognition rate of 99.76% (also shown in Table 1). If the period interval is reduced further, the aggregate recognition rate becomes lower: the narrower the interval between periods, the vaguer the features of the samples, which is why the recognition accuracy drops.

Table 1. The coif5 recognition results with reduced interval of period

wavelet functions: dbN, symN, coifN (2100 training samples, 4200 test samples)
  period (T):             4     8     12    16    20    24    28
  recognition rate (%):   100   100   100   100   100   100   100
  aggregate recognition rate: 100%

wavelet function: coif5 (2700 training samples, 5400 test samples)
  period (T):             3     6     9     12    15    18    21    24    27
  recognition rate (%):   100   98.4  100   100   99.8  100   100   99.8  99.8
  aggregate recognition rate: 99.76%
4.3 Comparison of WPNN with PNN
To benchmark the performance of WPNN, a plain PNN was also used to estimate the abnormal pattern parameters of the control chart. The recognition results of WPNN and PNN are shown in Table 2.

Table 2. The recognition results of WPNN and PNN (recognition rate, %)

        slope of trend pattern   magnitude of shift pattern   amplitude of cyclic pattern   period of cyclic pattern
WPNN    93.2917                  89.9583                      80.5833                       100
PNN     37.815                   60.3750                      38.7                          100
The recognition results show that WPNN has a considerably higher recognition rate than PNN.
Fig. 8. Classification performance of different wavelet functions (cyclic pattern's amplitude); curves: Daubechies (dbN), Coiflet (coifN), Symlets (symN)
5 Conclusions

In this work, WPNN was applied to the estimation of abnormal pattern parameters of control charts. The simulation results show that WPNN has several advantages, such as quicker training and better recognition performance than ANN and PNN. From the simulation results we can also draw the following conclusions:

1) The feature vector that combines the approximations with the energy of each level's detail coefficients can be used as the input of the PNN.
2) Applying WPNN to the estimation of abnormal pattern parameters of control charts is feasible, and WPNN achieves a high aggregate recognition rate.

WPNN still has some shortcomings; in particular, the recognition rate for the cyclic pattern's amplitude is only 80.5833%, which is too low for practical application. Improving this aspect is an important direction for further research; for example, multi-class SVM, or a structure combining the wavelet transform with a multi-class SVM, could be used to estimate the parameters of abnormal control chart patterns and improve the recognition precision.
References

1. Le, Q.H.: A Neural Network Approach for Abnormal Pattern Parameters Estimation of Control Charts. Aeronautical Manufacturing Technology, 4 (2002) 31-33
2. Guh, R.-S.: Integrating Artificial Intelligence into On-line Statistical Process Control. Quality and Reliability Engineering International, 19 (2003) 1-20
3. Guh, R.-S.: A Hybrid Learning-based Model for On-line Detection and Analysis of Control Chart Patterns. Computers & Industrial Engineering, 49 (2005) 35-62
4. Pham, D.T., Waini, M.A.: Feature-based Control Chart Pattern Recognition. Int. J. Prod. Res., 35(7) (1997) 1875-1890
An Error Concealment Technique Based on JPEG-2000 and Projections onto Convex Sets

Tianding Chen

Institute of Communications and Information Technology, Zhejiang Gongshang University, Hangzhou, China, 310035
[email protected]
Abstract. This paper proposes an error concealment technique based on ROI (region of interest) embedding. The ROI of an image contains the information of interest to humans and is the most important area. In the proposed technique, several copies of the ROI are embedded into the ROB (region of background) for reconstructing the ROI of a corrupted image: even if the entire ROI is damaged, it can still be reconstructed by extracting the information embedded in the ROB. Although the JPEG-2000 stream of the ROI is embedded into the ROB, it is invisible to the human eye. Simulation results show that the original image suffers only slight distortion after embedding the JPEG-2000 stream of the ROI, while the reconstructed image has high visual quality and PSNR. The proposed technique outperforms existing techniques, especially in the quality of the ROI.
1 Introduction

Nowadays, images and video are used in a wide range of applications on the Internet. Due to the amount of data, an efficient coding mechanism is needed to reduce it. Many image and video coding standards, such as JPEG, H.26x, and MPEG, are based on block-based coding and adopt the discrete cosine transform (DCT), variable-length coding (VLC), and motion compensation (MC) [1][2][3]. Although VLC is very simple and efficient for reducing data, it is sensitive to transmission errors. This is a critical problem when transmitting over an imperfect communication channel: a single bit error may cause error propagation and loss of synchronization with the start and end of blocks.

When transmitting image or video data over error-prone channels, unexpected errors such as lost blocks may occur. Specifically, in the JPEG format an image is segmented into non-overlapping 8×8 blocks; each block is transformed via the DCT, quantized, and entropy coded using VLC. The compressed image can then be transmitted over communication channels or stored. In fact, real-world communication channels are not always reliable, and block-based image or video coding is very sensitive to transmission errors. Although retransmission of lost data, such as automatic repeat request (ARQ) [4], is a useful solution, retransmission may degrade performance and is not suitable for real-time systems. Therefore, when images or video are compressed and transmitted over unreliable communication channels,
some technique must be utilized to make the quality of the decoded image or video acceptable. In our technique, some information is embedded before the image or video is compressed. After decoding, if the image or video has missing blocks, we reconstruct the lost blocks by extracting the embedded information.

To solve the problems mentioned above, error concealment techniques are widely investigated for real applications. Error concealment is an effective method to estimate lost data and has the advantage of consuming no extra bandwidth. Traditional error concealment techniques reconstruct a lost block using pixels from the correct blocks neighboring it. Wang et al. [5] and Hemami and Meng [6] are earlier examples of using spatial interpolation to estimate lost blocks. Nevertheless, spatial interpolation approaches often cause blur in complex or edge blocks. In this paper, we propose an error concealment technique based on JPEG-2000 and projections onto convex sets (POCS). Following the JPEG-2000 standard, the image is divided into the ROI and the ROB. The ROI of the original image is embedded into the ROB for reconstructing error blocks of the ROI in the corrupted image, while the ROB is recovered using the POCS technique.
2 A Survey of Error Concealment in the Spatial Domain

In general, error concealment techniques fall into two categories: spatial domain and temporal domain. In the spatial domain, error concealment attempts to recover the error blocks by utilizing information from spatially neighboring blocks. In the temporal domain, error concealment conceals the error block with a motion vector, utilizing the data in adjacent frames.

2.1 Spatial Domain Interpolation for Error Concealment

A simple approach is to interpolate the pixels of a damaged block from pixels in adjacent correctly received blocks; in general, only the boundary pixels of the neighboring blocks are used for interpolation [7]. Compared with interpolating individual pixels, a simpler approach is to estimate the DC coefficient of a damaged block and replace the block with the estimated DC value, which can be obtained by averaging the DC values of the surrounding blocks [8]. Whether interpolating individual pixels or estimating the DC value, these methods work well only on smooth blocks. For irregular or high-detail blocks, they produce noticeably blurred blocks that do not match well with the surrounding blocks.

2.2 Best Neighborhood Matching Algorithm

The Best Neighborhood Matching (BNM) algorithm was presented by Zhou Wang, Yinglin Yu, and David Zhang [9]. Their algorithm can be summarized as follows.
1) Localize the lost block.
2) Set the range block, which is bigger than the lost block.
3) Set the searching range in the whole image.
4) Search for candidate domain blocks of the same size as the range block within the searching range of the image.
5) Match the known pixels of the range block against the domain block through a matching function.
6) Calculate the minimal MSE between the domain block and the range block.
7) Recover the lost block using pixel values transformed from the corresponding part of the best domain block.

2.3 Multi-directional Interpolation for Error Concealment

Kwork and Sun proposed a technique based on edge detection to conceal the unknown block [10]. First, the missing block is classified into one or more of eight directional categories. Second, for each classified direction, spatial interpolation is applied along that direction to produce a set of blocks. Finally, the blocks in the set are mixed together.

2.4 Projection onto Convex Sets Methods

POCS is an important result from vector-space projection theory. In the image recovery problem, the algorithm searches for a solution that is consistent with a number of a priori constraints, which are defined as convex sets. An iterative projection is then applied: the optimal solution is obtained by iteratively projecting the previous solution onto each convex set. Fig. 1 illustrates projection onto convex sets.
Fig. 1. Projection onto convex set
Park et al. proposed the recovery of image blocks using the method of alternating projections (RIBMAP) [11]. This technique can be summarized as follows:
1) Line orientation detection: compute the gradients of the missing block and classify the error block as vertical or horizontal.
2) Building the surrounding vector library: for the missing block, the neighborhood area is segmented into several pixel blocks; the corresponding vectors are formed by shifting an N×N window over every grid position, as shown in Fig. 2.
3) Building the recovery vectors: according to the dominating line orientation, two positions of the recovery vectors are employed, as shown in Fig. 3. Each vector includes (N−1)×N known and 1×N unknown pixels, or N×(N−1) known and N×1 unknown pixels.
Fig. 2. Surrounding vectors
Fig. 3. Recovery vectors
4) Projection operator P1: the surrounding vectors are used to form a convex hull in an N×N-dimensional space. The closest vertices of the convex hull to the recovery vector are found in the mean-square sense:

$$d_i = \arg\min_j \| r_i - s_j \| \quad \text{for} \quad 1 \le i \le 2, \; 1 \le j \le 8N \quad (1)$$

or

$$d_i = \arg\min_j \| R_i - S_j \| \quad \text{for} \quad 1 \le i \le 2, \; 1 \le j \le 8N \quad (2)$$
where $R_i = T \cdot r_i$, $S_j = T \cdot s_j$, and T is a 2-D DCT kernel. The recovery vectors are then orthogonally projected onto the selected vertex $\hat{S}_j$, as

$$P_{\hat{S}_j}(R_i) = \frac{\langle R_i, \hat{S}_j \rangle}{\| \hat{S}_j \|^2} \cdot \hat{S}_j, \quad i = 1, 2 . \quad (3)$$

To preserve the DC level, the DC value in the recovery vectors is not changed:

$$P_1 \cdot R_i(u, v) = \begin{cases} P_{\hat{S}_j}(R_i(u, v)), & \text{for } u, v \neq 0 \\ R_i(u, v), & \text{otherwise} \end{cases} \quad (4)$$
5) Projection operator P2: P2 imposes constraints on the range of the restored pixel values. The convex set C2 is

$$C_2 = \{ f : F_{min} \le f_n \le F_{max} \ \text{for} \ n \in L \} \quad (5)$$

where L is the set of missing pixels and F_min and F_max are chosen as the minimum and maximum pixel values of the image. P2 is

$$P_2 \cdot f_n = \begin{cases} F_{min}, & \text{for } f_n < F_{min}, \; n \in L \\ F_{max}, & \text{for } f_n > F_{max}, \; n \in L \\ f_n, & \text{for } F_{min} \le f_n \le F_{max}, \; n \in L \\ C_n, & \text{otherwise} \end{cases} \quad (6)$$
6) Projection operator P3: define $h = [(f_0 - g_0), \ldots, (f_N - g_N)] = f - g$, where f is the vector of missing pixels in a recovery vector and g is the vector of adjacent pixels
to the missing line in the same vector. Treating the vector h as a signal bounded by a constant $\alpha$, the convex set for P3 is

$$C_3 = \{ h : |h_n| \le \alpha \} \quad (7)$$

The value of $\alpha$ can be set to the maximum difference between pixels adjacent to the missing block in the surrounding neighborhood (shown in Fig. 4).

Fig. 4. Areas for computing parameters $\alpha_1$ and $\alpha_2$

P3 is given as follows:

$$P_3 \cdot f_n = \begin{cases} g_n - \alpha, & \text{for } h_n < -\alpha \\ g_n + \alpha, & \text{for } h_n > \alpha \\ f_n, & \text{otherwise} \end{cases} \quad (8)$$
7) Iterative operator: P1 P2 P3 is applied iteratively.

8) Projection operator P4: after all pixels in a missing block have been restored, a final convex constraint is applied to the two center lines of the restored block. Define $e = [(f_{c1,0} - f_{c2,0}), \ldots, (f_{c1,N} - f_{c2,N})]$, where $f_{c1}$ and $f_{c2}$ are the final restored pixels of each center line of the restored block. The convex set C4 and operator P4 are

$$C_4 = \{ e : |e_n| \le \beta \}, \qquad P_4 \cdot f_{m,n} = \begin{cases} \dfrac{f_{c1,n} + f_{c2,n} \pm \beta}{2}, & \text{for } e_n < -\beta \\ \dfrac{f_{c1,n} + f_{c2,n} \mp \beta}{2}, & \text{for } e_n > \beta \\ f_{m,n}, & \text{otherwise} \end{cases} \quad (9)$$
Missing pixels are restored iteratively by alternatively projecting onto the convex sets.
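The clipping projections P2 and P3 are simple enough to sketch directly. The following is a minimal NumPy illustration under the assumption that the recovery vector f and its neighbour vector g are 1-D arrays; P1 (the DCT-domain projection onto the hull vertex) and P4 are omitted for brevity, so this is not the full RIBMAP iteration.

```python
import numpy as np

def project_P2(f, in_L, f_min, f_max):
    """Equation (6): clip the restored pixels (boolean mask in_L) to the
    allowed range [f_min, f_max]; other entries are left unchanged."""
    out = f.copy()
    out[in_L] = np.clip(out[in_L], f_min, f_max)
    return out

def project_P3(f, g, alpha):
    """Equation (8): bound the difference h = f - g between restored pixels
    and their adjacent pixels g by the constant alpha."""
    h = f - g
    return np.where(h < -alpha, g - alpha, np.where(h > alpha, g + alpha, f))

def pocs_restore(f, g, in_L, f_min, f_max, alpha, iters=20):
    """Alternate the projections for a fixed number of iterations."""
    for _ in range(iters):
        f = project_P3(project_P2(f, in_L, f_min, f_max), g, alpha)
    return f
```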
3 The Proposed Error Concealment Technique

The proposed error concealment technique is presented in this section. It emphasizes the robustness of the ROI and is based on data hiding techniques to reconstruct lost data in the ROI. The ROB, on the other hand, can be recovered
by a traditional error concealment technique. The proposed technique is briefly described below:
1) Partition the image into ROI and ROB;
2) Embed the JPEG-2000 bit stream of the ROI into the ROB;
3) Recover missing blocks of the ROI by extracting the information from the ROB;
4) Recover the ROB with a traditional error concealment technique.

The proposed technique is divided into two parts, the preprocess and the error concealment method. The architecture and detailed steps are described in the following.
3.1 The Preprocess of the Proposed Technique

Fig. 5 shows the diagram of the preprocess, whose steps are listed below:
1) ROI definition;
2) Generation of the JPEG-2000 bit stream of the ROI;
3) Embedding of the JPEG-2000 bit stream of the ROI into the ROB.
Fig. 5. Preprocess
In the proposed technique, the user first needs to define the ROI, the important area of the image. The ROI is then encoded by JPEG-2000 into a code stream. Finally, the JPEG-2000 bit stream of the ROI is embedded into the ROB. Let the original image be divided into a set of blocks of w×w pixels. All JPEG-2000 bit streams of the ROI are embedded into randomly selected blocks of the ROB. In general, the bit streams are embedded into every pixel value outside the ROI: the least significant bit (LSB) of each ROB pixel value is changed to carry the JPEG-2000 bit stream, following the embedding sequence shown in Fig. 6. Fig. 7 shows the original and embedded images of "Lena".
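A minimal sketch of LSB embedding and extraction follows. It is a pixel-level variant for illustration only: the paper embeds block-wise following the sequence of Fig. 6, and the pseudo-random ordering with a shared seed is an assumption introduced here so that the decoder can locate the bits again.

```python
import numpy as np

def embed_bitstream(image, roi_mask, bits, seed=0):
    """Write the ROI's JPEG-2000 bit stream into the LSBs of ROB pixels,
    visited in a pseudo-random order shared with the decoder.
    Assumes a C-contiguous 8-bit grayscale image."""
    img = image.copy()
    flat = img.ravel()
    rob = np.flatnonzero(~roi_mask.ravel())                # pixels outside the ROI
    order = np.random.default_rng(seed).permutation(rob)[:len(bits)]
    flat[order] = (flat[order] & 0xFE) | np.asarray(bits, dtype=flat.dtype)
    return img

def extract_bitstream(image, roi_mask, n_bits, seed=0):
    """Inverse of the embedding: read the LSBs back in the same order."""
    rob = np.flatnonzero(~roi_mask.ravel())
    order = np.random.default_rng(seed).permutation(rob)[:n_bits]
    return image.ravel()[order] & 1
```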
3.2 The Proposed Error Concealment Method

The proposed error concealment method is listed below.
1) Error concealment method:
   a) extraction of the JPEG-2000 bit stream of the ROI from the corrupted image;
   b) decompression of the compressed ROI by JPEG-2000;
   c) recovery of the ROI error blocks using the decompressed ROI;
   d) recovery of the ROB using a traditional technique (RIBMAP).
2) Extraction of the JPEG-2000 bit stream of the ROI from the corrupted image: the extraction stage is the inverse of the embedding process; the JPEG-2000 bit streams are extracted when the image is received at the decoder.
3) Decompression of the compressed ROI by JPEG-2000: from the extracted JPEG-2000 bit stream, the compressed ROI is decompressed.
4) Recovery of the ROI error blocks: the ROI error blocks are recovered using the decompressed ROI.
5) Recovery of the ROB: finally, errors in the ROB are recovered by the RIBMAP technique.
Fig. 6. Embedding sequence of the JPEG-2000 bit stream in a w×w block

Fig. 7. (a) Original Lena; (b) embedded Lena, PSNR = 56.25 dB
4 Experimental Simulation Results

Simulations have been conducted to compare the proposed technique with some recent error concealment techniques [11][12][13][14][15]. All images are 512×512 monochrome still images with 256 gray levels per pixel, and the ROI size is 128×128. To evaluate the quality of the reconstructed image, the peak signal-to-noise ratio (PSNR) criterion is adopted:

$$PSNR = 10 \times \log_{10} \frac{255^2}{MSE} \quad (10)$$

where the MSE is

$$MSE = \frac{1}{N_1 \times N_2} \sum_{i=0}^{N_1 - 1} \sum_{j=0}^{N_2 - 1} \left( x(i, j) - \tilde{x}(i, j) \right)^2 \quad (11)$$
where N1 and N2 are the height and width of the image, and x(i, j) and $\tilde{x}(i, j)$ are the original and reconstructed images, respectively. In the experiments, we use seven monochrome still images, "Lena", "Masquerade", "Peppers", "Boat", "Elaine", "Couple", and "Barbara", as the test images. All programs are written in Visual C++ 6.0. Taking F_max and F_min as the maximum and minimum values in the surrounding vectors instead of the full image results in a better PSNR value for projection operator P2. Simulation results for isolated and consecutive error blocks are summarized in the following figures, together with comparisons against other techniques.
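For reference, a direct NumPy transcription of equations (10) and (11):

```python
import numpy as np

def psnr(x, x_rec):
    """Peak signal-to-noise ratio in dB for 8-bit images, equations (10)-(11)."""
    x = np.asarray(x, dtype=np.float64)
    x_rec = np.asarray(x_rec, dtype=np.float64)
    mse = np.mean((x - x_rec) ** 2)       # equation (11)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```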
4.1 Isolated Error Blocks

1) Boat with 8×8 error blocks

Fig. 8. Error concealment results of Boat (8×8 error blocks): (a) original Boat; (b) stream-embedded Boat, PSNR = 55.69 dB; (c) ROI of Boat; (d) error image, PSNR = 10.89 dB; (e) RIBMAP recovery, full image PSNR = 30.78 dB; (f) proposed, ROI PSNR = 36.96 dB, full image PSNR = 32.43 dB
2) Elaine with 8×8 error blocks

Fig. 9. Error concealment results of Elaine (8×8 error blocks): (a) original Elaine; (b) stream-embedded Elaine, PSNR = 55.75 dB; (c) ROI of Elaine; (d) error image, PSNR = 11.29 dB; (e) RIBMAP recovery, full image PSNR = 34.62 dB; (f) proposed, ROI PSNR = 38.40 dB, full image PSNR = 35.13 dB
3) Couple with 8×8 error blocks

Fig. 10. Error concealment results of Couple (8×8 error blocks): (a) original Couple; (b) stream-embedded Couple, PSNR = 55.59 dB; (c) ROI of Couple; (d) error image, PSNR = 12.23 dB; (e) RIBMAP recovery, full image PSNR = 31.50 dB; (f) proposed, ROI PSNR = 35.75 dB, full image PSNR = 31.66 dB
4.2 Consecutive Error Blocks

1) Lena with missing rows of 8×8 blocks

Table 1. Comparison with RIBMAP (8×8 isolated error blocks, PSNR in dB)

            Lena    Masquerade   Peppers   Boat    Elaine   Couple   Average
RIBMAP      34.66   29.88        34.21     30.77   34.62    31.48    32.61
proposed    35.21   30.10        35.50     32.44   35.13    31.68    33.33

Table 2. Comparison with RIBMAP (missing rows of 8×8 consecutive error blocks and 16×16 isolated error blocks, PSNR in dB)

            missing rows of 8×8 blocks   16×16 isolated error blocks
RIBMAP      30.19                        32.71
proposed    31.35                        33.25
2) Image with consecutive 16×16 error blocks

Table 3. Comparison with Lin et al. [14] (PSNR in dB)

Block loss rate (%)   Lin: ROI   Lin: Full image   proposed: ROI   proposed: Full image
10.0                  46.73      36.34             47.26           37.11
12.5                  41.28      34.36             46.83           36.25
15.0                  40.30      33.56             45.55           35.38
17.0                  40.04      31.77             45.01           34.57
Fig. 11. Reconstructed ROI
Fig. 12. Reconstructed full image
3) Block loss rate: ROI 10%, ROB 10%

Fig. 13. Comparison with Wang [15] (consecutive 8×8 error blocks, block loss rate: ROI = 10%, ROB = 10%): (a) original Lena; (b) stream-embedded Lena, PSNR = 55.58 dB; (c) ROI of Lena; (d) error image, PSNR = 15.96 dB; (e) Wang [15] recovery, ROI PSNR = 24.5 dB, ROB PSNR = 26.6 dB; (f) proposed, ROI PSNR = 48.42 dB, ROB PSNR = 36.79 dB
5 Conclusion

Traditional error concealment techniques utilize neighboring blocks to reconstruct missing blocks, and therefore require that the neighboring blocks not be damaged. In this paper,
a novel error concealment technique based on ROI embedding has been proposed. The proposed technique provides better performance than traditional techniques, especially for the ROI of the image. This is because several copies of the ROI are embedded into the ROB for reconstructing the ROI of the corrupted image: even if the entire ROI is damaged, it can still be recovered by extracting the information embedded in the ROB. Although the JPEG-2000 stream of the ROI is embedded into the ROB, it is invisible to the human eye. Simulation results demonstrate that the original image suffers little distortion after embedding the JPEG-2000 stream of the ROI, while the reconstructed image has high visual quality and PSNR.
References

1. CCITT Recommendation T.81: Digital Compression and Coding of Continuous-tone Still Images (1992)
2. Wallace, G.: The JPEG Still Picture Compression Standard. IEEE Trans. Consumer Electronics, 38 (1992) 30-44
3. CCITT Recommendation H.261: Video Codec for Audiovisual Services at px64 kbit/s (1990)
4. Lin, S., Costello, D.J., Miller, M.J.: Automatic Repeat Request Error Control Schemes. IEEE Communication Magazine, 22 (1984) 5-17
5. Wang, Y., Zhu, Q., Shaw, L.: Coding and Cell-loss Recovery in DCT based Packet Video. IEEE Trans. on Circuits and Systems for Video Technology, 3 (1993) 248-258
6. Hemami, S., Meng, T.: Transform Coded Image Reconstruction Exploiting Interblock Correlation. IEEE Trans. on Image Processing, 4 (1995) 1023-1027
7. Aign, S.: Error Concealment for MPEG-2 Video. In: Signal Recovery Techniques for Image and Video Compression and Transmission. Kluwer Academic Publishers (1998) 235-268
8. Hong, M.C., Kondi, L., Scwab, H., Katsaggelos, A.K.: Video Error Concealment Techniques. Signal Processing: Image Communications, 14 (1999) 437-492
9. Wang, Z., Yu, Y., Zhang, D.: Best Neighborhood Matching: an Information Loss Restoration Technique for Block-based Image Coding Systems. IEEE Trans. on Image Processing, 7 (1998) 1056-1061
10. Kwork, W., Sun, H.: Multi-directional Interpolation for Spatial Error Concealment. IEEE Trans. on Consumer Electronics, 39 (1993) 455-460
11. Park, J., Park, D.C., Marks II, R.J., El-Sharkaw, M.A.: Recovery of Image Blocks using the Method of Alternating Projections. IEEE Trans. on Image Processing, 14 (2005) 461-474
12. Ma, Y.F., Cai, A.N.: A New Spatial Interpolation Method for Error Concealment. Proceedings of the IEEE Sixth International Symposium on Multimedia Software Engineering (2004) 65-69
13. Hsia, S.C.: An Edge-oriented Spatial Interpolation for Consecutive Block Error Concealment. IEEE Signal Processing Letters, 11 (2004) 577-580
14. Lin, S.D., Shie, S.C., Chen, J.W.: Image Error Concealment Based on Watermarking. The 7th International Conference on Digital Image Computing: Techniques and Applications (DICTA '03), Sydney, Australia (2003) 137-143
15. Wang, J., Ji, L.: A Region and Data Hiding Based Error Concealment Scheme for Images. IEEE Trans. on Consumer Electronics, 47 (2001) 257-262
An Extended Learning Vector Quantization Algorithm Aiming at Recognition-Based Character Segmentation

Lei Xu, Bai-Hua Xiao, Chun-Heng Wang, and Ru-Wei Dai

Laboratory of Complex System and Intelligent Science, Institute of Automation, Chinese Academy of Sciences, Zhongguancun East Rd. No. 95, Beijing 100080, P.R. China
[email protected]
Abstract. Recognition-based segmentation strategies have greatly improved the performance of optical character recognition systems. The key issue of these strategies is to design a classifier that can provide accurate rejection information. Many learning algorithms, such as GLVQ and H2M-LVQ, are not suitable for large category sets and multiple prototypes. More seriously, they often suffer from local minimum state and overtraining. In this paper, we propose an extended learning vector quantization algorithm which can efficiently train the nearest prototype classifier with negative samples. The cost function is based on multiple confusable prototype-pairs so that our algorithm is insensitive to initialization. We also introduce the criterion of safe zone to avoid overtraining. Experimental results show that the classifier trained by our proposed method can achieve good recognition performance and can provide accurate rejection information for segmentation.
1 Introduction

Optical character recognition (OCR) is one of the most challenging topics in the field of pattern recognition. Given a document image, we first perform layout analysis and extract text lines. Each text line is then segmented into isolated individual character images. Finally, these character images are sent to the classifier and the corresponding class labels are obtained. The whole process is straightforward for well-printed documents and is shown in Fig. 1. However, through many experiments and applications, people have found that a major proportion of recognition errors are caused by incorrect segmentation. In fact, in many practical documents of poor quality, neighboring character images may easily touch or overlap each other, and as a result it is intractable to find the correct segmentation points by means of image analysis alone. Fig. 2 shows a text line extracted from a picture taken by a mobile camera. We can see in this figure that the character pairs "www", "kj", and "co" are all difficult to segment. For example, characters "c" and "o" touch each other, so that if we cannot validate the character pair "co", we will miss the correct segmentation point B.

Recently, the recognition-based segmentation strategy [1] has become prevalent. The essence of this strategy is to establish a feedback loop from the output of the classifier to the segmentation engine, such that tentative segmentation points can be
Fig. 1. Flow diagram of traditional OCR systems
validated by the classifier until the correct one is found. Obviously, a classifier with a highly reliable rejection mechanism is crucial for recognition-based segmentation. For example, if the character pair "co" in Fig. 2 is rejected by the classifier, we will know that there must be some mistake in the current segmentation result, and the text line should be resegmented.
Fig. 2. A text line which is difficult for segmentation
Among all the existing rejection strategies, negative training [1] is one of the most effective and efficient. It is implemented as follows. In the training phase, an additional class, called the dummy class, is introduced. Negative samples, which do not belong to any valid class, such as the character pair "kj" in Fig. 2, are collected and assigned to the dummy class. The number of negative samples should be large enough to cover all the usual segmentation errors. We then design a classifier containing both the dummy class and the valid classes. In the segmentation phase, for a tentative segmentation point, we extract the corresponding character images and send them to the classifier. If at least one of these character images is assigned to the dummy class, we conclude that the current segmentation point must be incorrect and search for another one. Fig. 3 shows the architecture of the OCR system with a recognition-based segmentation engine. The main task of this paper is to present a methodology for designing a classifier that can provide precise rejection information.

In many practical pattern recognition systems with large category sets and unknown probability density distributions, such as OCR systems, the Euclidean-distance-based nearest prototype classifier (NPC) has yielded superior performance. Compared with other nonparametric techniques such as the k-nearest-neighbor (k-NN) classifier, each class in an NPC is represented by several prototypes instead of all the original training samples, so that the computation cost and storage requirements are greatly reduced.
Fig. 3. Recognition-based segmentation using negative training scheme
The key issue for an NPC is to abstract or generate the optimal prototype set that allows the lowest possible error rate. Genetic algorithms [2] and the maximum variance cluster algorithm (MVC) [3] have been proposed to search for the most representative prototypes, but these methods do not take the boundaries between classes into account. Learning vector quantization (LVQ), a two-layer competitive network, has proved to be the most powerful tool for NPC design [4]. The early LVQ algorithms, such as LVQ2.1, adjust the prototypes without optimizing any criterion and are therefore only rules of thumb. As a result, the performance cannot be guaranteed, and, more seriously, the training procedure may diverge. To overcome this drawback, several extended LVQ algorithms [5,6,7,8] have been proposed based on empirical cost functions and stochastic gradient descent.

One of the things that makes the negative training scheme difficult to implement is the complicated distribution of the negative samples. To make the classification precise, a large number of prototypes are necessary for the dummy class, which makes the task of classifier design rather difficult. The GLVQ algorithm builds its cost function on only two prototypes, so the training procedure often gets trapped in a local minimum. The H2M-LVQ algorithm utilizes the harmonic average of all the incorrect prototypes to avoid local minima, but it is computationally expensive and may also make the training procedure diverge.

In this paper, we modify the traditional LVQ training algorithm so that it is suitable for negative training. In Section 3, a cost function based on multiple confusable prototype-pairs is introduced and the corresponding update algorithm for the network is derived. The experimental results in Section 4 show that the classifier designed by the proposed method is practical and can provide accurate rejection information to the recognition-based character segmentation engine.
2 GLVQ and H2M-LVQ

Given a set of M labeled prototypes $W = \{(w_1, c_1), (w_2, c_2), \ldots, (w_M, c_M)\}$, where $w_i \in R^D$ is a prototype vector in feature space and $1 \le c_i \le C$ is the corresponding class label (C is the number of classes), the class label y of any unknown sample x is determined by

$$y = c_q$$
where

$$q = \arg\min_i d(x, w_i)$$

and $d(x, w_i)$ denotes the squared Euclidean distance between x and $w_i$. GLVQ [8] is one of the most powerful such algorithms and proceeds as follows. Given the labeled sample $(x_t, y_t)$, we find the closest genuine prototype $w_k$ and the closest incorrect prototype $w_l$ using

$$k = \arg\min_{i, \, c_i = y_t} d(x_t, w_i)$$

$$l = \arg\min_{i, \, c_i \neq y_t} d(x_t, w_i)$$
Based on $d_k$ and $d_l$, the cost function can be defined as

$$J_{GLVQ}(X, W) = \frac{1}{N} \sum_{t=1}^{N} \frac{1}{1 + e^{-\xi(t)\mu(x_t, W)}} \quad (1)$$

where

$$\mu(x_t, W) = \frac{d_k - d_l}{d_k + d_l} \quad (2)$$
and N is the size of the training set. Here, $\xi(t)$ is a positive number increasing linearly with the iteration number t so as to approximate the zero-one step loss function. For simplicity, we write $\mu = \mu(x_t, W)$ and $\xi = \xi(t)$. Since equation (1) is differentiable, we can update the prototypes $w_k$ and $w_l$ by the stochastic gradient descent scheme:

$$\Delta w_k = \alpha \frac{\xi e^{-\xi\mu}}{(1 + e^{-\xi\mu})^2} \frac{4 d_l}{(d_k + d_l)^2} (x_t - w_k) \quad (3)$$

$$\Delta w_l = -\alpha \frac{\xi e^{-\xi\mu}}{(1 + e^{-\xi\mu})^2} \frac{4 d_k}{(d_k + d_l)^2} (x_t - w_l) \quad (4)$$
where $\alpha$ is the learning rate. Although the convergence of GLVQ can be guaranteed, its ultimate performance still depends heavily on the prototypes' initial positions. To alleviate this influence, Qin and Suganthan proposed the H2M-LVQ algorithm, in which $d_k$ and $d_l$ in equation (2) are replaced by

$$D_k = \frac{M_{y_t} d_k}{1 + \sum_{i \neq k, \, c_i = y_t} (d_k / d_i)^{\eta}} \quad (5)$$

and

$$D_l = \frac{(M - M_{y_t}) d_l}{1 + \sum_{i \neq l, \, c_i \neq y_t} (d_l / d_i)^{\eta}} \quad (6)$$

respectively, where $M_{y_t}$ is the number of prototypes with label $y_t$. If the parameter $\eta$ changes from 1 to 0 as training proceeds, equations (5) and (6) correspondingly
change from the harmonic average distance to the minimum distance. The introduction of $D_k$ and $D_l$ makes the training algorithm of H2M-LVQ rather complicated. For brevity, we first differentiate $D_k$ and $D_l$ with respect to $d_i$:

$$\frac{\partial D_k}{\partial d_i} = \begin{cases} \eta M_{y_t} (d_k / d_i)^{\eta+1} / S_k^2, & i \neq k, \ c_i = y_t \\ M_{y_t} [\eta + (1 - \eta) S_k] / S_k^2, & i = k \\ 0, & \text{otherwise} \end{cases} \quad (7)$$

$$\frac{\partial D_l}{\partial d_i} = \begin{cases} \eta (M - M_{y_t}) (d_l / d_i)^{\eta+1} / S_l^2, & i \neq l, \ c_i \neq y_t \\ (M - M_{y_t}) [\eta + (1 - \eta) S_l] / S_l^2, & i = l \\ 0, & \text{otherwise} \end{cases} \quad (8)$$

where

$$S_k = 1 + \sum_{i \neq k, \, c_i = y_t} (d_k / d_i)^{\eta}, \qquad S_l = 1 + \sum_{i \neq l, \, c_i \neq y_t} (d_l / d_i)^{\eta} .$$
Then, the entire training algorithm is

$$\Delta w_i = \alpha \frac{\xi e^{-\xi\mu}}{(1 + e^{-\xi\mu})^2} \frac{4 D_l}{(D_k + D_l)^2} \frac{\partial D_k}{\partial d_i} (x_t - w_i) \quad \text{if } c_i = y_t \quad (9)$$

$$\Delta w_i = -\alpha \frac{\xi e^{-\xi\mu}}{(1 + e^{-\xi\mu})^2} \frac{4 D_k}{(D_k + D_l)^2} \frac{\partial D_l}{\partial d_i} (x_t - w_i) \quad \text{if } c_i \neq y_t \quad (10)$$
Because all prototypes contribute to the cost function (1), the whole network is updated at every step t when the H2M-LVQ method is used.
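For concreteness, a minimal NumPy sketch of one GLVQ update, equations (2)-(4), is given below. The array layout (a prototype matrix W with a parallel label array C) is an illustrative assumption, not the paper's implementation.

```python
import numpy as np

def glvq_step(x, y, W, C, xi, alpha):
    """One stochastic update of GLVQ: move the closest genuine prototype
    toward the sample x and the closest incorrect prototype away from it."""
    d = np.sum((W - x) ** 2, axis=1)                  # squared Euclidean distances
    same, diff = np.flatnonzero(C == y), np.flatnonzero(C != y)
    k = same[np.argmin(d[same])]                      # closest genuine prototype
    l = diff[np.argmin(d[diff])]                      # closest incorrect prototype
    mu = (d[k] - d[l]) / (d[k] + d[l])                # equation (2)
    common = xi * np.exp(-xi * mu) / (1 + np.exp(-xi * mu)) ** 2 / (d[k] + d[l]) ** 2
    W[k] += alpha * common * 4 * d[l] * (x - W[k])    # equation (3)
    W[l] -= alpha * common * 4 * d[k] * (x - W[l])    # equation (4)
```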
3 Proposed Methods

Although the reported experimental results of the above-mentioned algorithms are rather inspiring, we note that they are mostly based on small-scale databases and low-dimensional feature spaces. For OCR systems, further modification is necessary.

3.1 An Extended Training Algorithm for the LVQ Network

In character recognition, multiple prototypes per class are necessary due to the large number of fonts, so the total number of prototypes is usually very large. From the standpoint of efficiency, the computational cost of updating all the prototypes for every $(x_t, y_t)$ is unacceptable. Furthermore, LVQ is a local learning algorithm in the sense that the classification boundary is approximated locally: if prototypes far away from the local boundary are adjusted frequently, the training procedure may become unstable. In addition, there are many similar patterns in an OCR system, and a training sample may be located around the boundaries of more than two classes. Under these complicated
circumstances, the adjustment should be determined according to the overall local information rather than by the nearest prototype-pair alone. In the GLVQ algorithm, only the most confusable prototype-pair is chosen to form the cost function (1). In this paper, we retain $w_k$ but extend the number of nearest incorrect prototypes to L (they may come from different classes). For every candidate prototype-pair, we use a cost function similar to expression (2):

$$J_i = \frac{1}{1 + e^{-\xi(t)\mu_i(x_t)}} \quad (11)$$

where

$$\mu_i(x_t) = \frac{d_k - d_{l_i}}{d_k + d_{l_i}}, \quad i = 1, 2, \cdots, L \quad (12)$$
Based on the individual $J_i$, we define the synthesized cost function

$$J(x_t, W) = \left[ \frac{1}{L} \sum_{i=1}^{L} \left( \frac{1}{1 + e^{-\xi(t)\mu_i(x_t)}} \right)^{\eta} \right]^{1/\eta} = \left[ \frac{1}{L} \sum_{i=1}^{L} (J_i)^{\eta} \right]^{1/\eta} \quad (13)$$
Thus, by the stochastic gradient descent scheme, we get

$$\Delta w_{l_i} = -\alpha \frac{\partial J}{\partial J_i} \frac{\xi e^{-\xi\mu_i}}{(1 + e^{-\xi\mu_i})^2} \frac{4 d_k}{(d_k + d_{l_i})^2} (x_t - w_{l_i}) \quad (14)$$

and

$$\Delta w_k = \alpha \left[ \sum_{i=1}^{L} \frac{\partial J}{\partial J_i} \frac{\xi e^{-\xi\mu_i}}{(1 + e^{-\xi\mu_i})^2} \frac{4 d_{l_i}}{(d_k + d_{l_i})^2} \right] (x_t - w_k) \quad (15)$$

where

$$\frac{\partial J}{\partial J_i} = \left( \frac{1}{L} \right)^{1/\eta} \left[ \sum_{i=1}^{L} (J_i)^{\eta} \right]^{\frac{1-\eta}{\eta}} (J_i)^{\eta - 1} \quad (16)$$
In the above expressions, the parameter $\eta$ changes from 1 to $+\infty$ as training proceeds. When $\eta$ equals 1, J is the arithmetic mean of the $J_i$ and all L prototypes around the boundary are involved; when $\eta$ approaches $+\infty$, J becomes the maximum of the $J_i$ and only the most confusable prototype-pair is taken into account.

3.2 Special Measures for Optimizing the Training Procedure

Perfect performance on the training set does not necessarily imply the same performance on the test set, especially when the distribution of the negative samples is very complicated. To prevent the network from overtraining, we introduce the concept of the safe zone [5]. In other words, if

$$\frac{d(w_k, w_{l_i})}{d_{l_i}} < \gamma, \quad i = 1, 2, \cdots, L \quad (17)$$

where $\gamma$ is a threshold determined by preliminary experiments, we consider the training sample to be in the safe zone and make no adjustment for it.
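A minimal sketch of one update of the proposed method follows, combining equations (11)-(17). The reconstruction of the safe-zone test (17) as a ratio, and the default value of gamma, are assumptions labeled as such; the paper tunes gamma by preliminary experiments.

```python
import numpy as np

def extended_lvq_step(x, y, W, C, xi, eta, alpha, L=5, gamma=0.5):
    """One update over the L nearest incorrect prototypes (cost (13));
    samples satisfying the safe-zone test (17) are skipped. gamma = 0.5
    is a placeholder value, not taken from the paper."""
    d = np.sum((W - x) ** 2, axis=1)                  # squared Euclidean distances
    same, diff = np.flatnonzero(C == y), np.flatnonzero(C != y)
    k = same[np.argmin(d[same])]
    ls = diff[np.argsort(d[diff])[:L]]                # L nearest incorrect prototypes
    if all(np.sum((W[k] - W[l]) ** 2) / d[l] < gamma for l in ls):
        return                                        # safe zone, equation (17)
    mu = (d[k] - d[ls]) / (d[k] + d[ls])              # equation (12)
    J = 1.0 / (1.0 + np.exp(-xi * mu))                # equation (11)
    dJ = (1 / L) ** (1 / eta) * np.sum(J ** eta) ** ((1 - eta) / eta) * J ** (eta - 1)  # (16)
    common = xi * np.exp(-xi * mu) / (1 + np.exp(-xi * mu)) ** 2 / (d[k] + d[ls]) ** 2
    W[k] += alpha * np.sum(dJ * common * 4 * d[ls]) * (x - W[k])   # equation (15)
    for j, l in enumerate(ls):
        W[l] -= alpha * dJ[j] * common[j] * 4 * d[k] * (x - W[l])  # equation (14)
```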
Additionally, allowing the network to continue training longer than necessary usually results in poor generalization performance. After every iteration t, classification on the validation set is performed and the correct rate $E_t$ and its increment $\Delta E_t = E_t - E_{best}$ are calculated, where $E_{best}$ is the highest correct rate obtained so far [9]. If the latest three consecutive $\Delta E_t$ are all less than 0.01%, the training procedure is stopped.

Most of the computational cost of the training procedure is spent on the Euclidean distances between the current sample and all the prototypes. Since only the local boundaries are considered, only the distances associated with the confusing classes need to be involved, and the training procedure can therefore be accelerated. As a preliminary step, we first design a Euclidean distance classifier containing all the valid classes and record a fixed number of nearest classes of each sample as its confusing subset. During training, only the prototypes that belong to the current sample's confusing subset or to the dummy class are taken into account. Note that the classification of the validation (or test) set should still be performed on the whole class set to provide accurate information about overtraining.
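The stopping rule can be expressed as a small helper; assuming the correct rates are stored as fractions, the 0.01% threshold becomes 1e-4.

```python
def should_stop(E_history, E_best, tol=1e-4):
    """Stop when the latest three consecutive increments E_t - E_best
    all fall below 0.01% (rates assumed to be stored as fractions)."""
    recent = [E - E_best for E in E_history[-3:]]
    return len(recent) == 3 and all(dE < tol for dE in recent)
```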
4 Experimental Results and Analysis

The database of our experimental OCR system consists of numerals, English characters, and other common symbols. Character-pairs which appear frequently and are inherently difficult for segmentation, such as "rn", are added to our OCR system as new categories; the list of such character-pairs can be found in [1], except "ffi". Furthermore, some similar characters, such as "C" and "c", are combined into one class and are distinguished from each other in the postprocessing stage. There are in total 179 classes and 194330 samples in our OCR system, as shown in Table 1 and Table 2. The test set is also used as the validation set.

Table 1. The categories in our system

Numeral   English Character   Character-pair   Symbol   Total
10        55                  45               69       179
Each character image is represented as a multi-scale directional element feature (DEF) [10] vector, which is extracted as follows. In preprocessing, the character image is linearly normalized to 64×64 and the character contour is extracted. Four types of elements denoting different orientations (horizontal, vertical, and ±45°) are calculated and assigned to each contour pixel. To make the feature more stable, the whole contour image is partitioned into 7×7 subareas; for each subarea, the elements with the same orientation are accumulated over all contour pixels. Since each subarea has four types of elements, the dimensionality is 4×7×7 = 196. Similarly, if the contour image is partitioned into 5×5 and 6×6 subareas, the dimensionalities of the corresponding feature vectors are 100 and 144,
respectively. By concatenating the three feature vectors serially, we obtain the 440-dimensional feature vector of the character image.

The parameters are set as follows. The number of prototypes is 8 for each valid class and 64 for the dummy class; these prototypes are initialized by the k-means algorithm. The related parameters are L = 5, α = 0.175, and ξ = t/1.5. η is 1 − t/80 for H2M-LVQ and 1 + 0.11(t − 1) for the proposed method.

We first test the performance of GLVQ. The recognition rate over the whole training process is shown in Fig. 4. We can see in this figure that GLVQ seriously suffers from overtraining: although the recognition rate on the training set keeps increasing as training proceeds, the recognition rate on the test set begins to decrease after several iterations. In practical applications, the training procedure should be stopped at point A.
Fig. 4. The recognition rate at each iteration of GLVQ (x-axis: iteration number; y-axis: recognition rate in %; curves: train set and test set, with stopping point A marked)
Table 2. The division of the samples

            Normal Sample   Negative Sample   Total
Train Set   102333          38000             140333
Test Set    42597           11400             53997
Total       144930          49400             194330
Fig. 5 depicts the comparison of H2M-LVQ and the proposed method. It is obvious from this figure that both methods can effectively avoid overtraining and yield good generalization performance. The main reason is that these methods build the cost function on multiple prototypes, and consequently the training procedure can jump out of the local minimum state.
Fig. 5. The comparison of H2M-LVQ and the proposed method (x-axis: iteration number; y-axis: recognition rate in %; curves: train/test set for the proposed method and for H2M-LVQ)

Table 3. Recognition rate of the ultimate classifiers (%)
                   Train set                       Test set
                   Normal   Negative   Total       Normal   Negative   Total
GLVQ               99.14    96.77      98.50       98.75    97.21      98.42
H2M-LVQ            99.01    98.63      98.90       98.63    98.41      98.59
Proposed method    99.39    98.70      99.19       98.81    98.57      98.76
We also note that the convergence of H2M-LVQ is much slower than that of the proposed method, especially at the initial stage. This is because H2M-LVQ updates all prototypes at this stage, and prototypes far away from the current boundary weaken the adjustment of the most confusable ones. Another remarkable advantage of the proposed method is that it needs much less computation than H2M-LVQ. Table 3 lists the ultimate recognition rates of the classifiers designed by the above-mentioned methods. From this table we can conclude that the proposed method yields performance superior to the other two methods.
5 Conclusion

From our experimental OCR system we find that a large proportion of errors is caused by inaccurate segmentation, so recognition-based segmentation schemes are necessary. The key issue of these schemes is to design a classifier that can provide accurate rejection information to the segmentation engine. We present an extended LVQ algorithm to solve this problem.
We build the cost function on multiple confusable prototype-pairs so that the network can effectively jump out of the local minimum state. Furthermore, the criterion of the safe zone is utilized so that the training procedure keeps away from overtraining. Coupled with the negative training scheme, our classifier is capable of providing reliable rejection information to the segmentation engine. The rate at which negative samples are assigned to the dummy class is 98.70% on the training set and 98.57% on the test set, so most incorrect segmentations can be detected and the overall performance is improved.
References

1. Huo, Q., Feng, Z.D.: Improving Chinese/English OCR Performance by Using MCE-based Character-pair Modeling and Negative Training. In: Proceedings of the Seventh International Conference on Document Analysis and Recognition, Vol. 1. IEEE (2003)
2. Kuncheva, L.I., Bezdek, J.C.: Nearest Prototype Classification: Clustering, Genetic Algorithms, or Random Search? IEEE Transactions on Systems, Man and Cybernetics, Part C, 28 (1998) 160-164
3. Veenman, C.J., Reinders, M.J.: The Nearest Subclass Classifier: A Compromise between the Nearest Mean and Nearest Neighbor Classifier. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27 (2005) 1417-1429
4. Liu, C.L., Nakagawa, M.: Evaluation of Prototype Learning Algorithms for Nearest-neighbor Classifier in Application to Handwritten Character Recognition. Pattern Recognition, 34 (2001) 601-615
5. Pedreira, C.E.: Learning Vector Quantization with Training Data Selection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28 (2006) 157-162
6. Qin, A.K., Suganthan, P.N.: Initialization Insensitive LVQ Algorithm Based on Cost-function Adaptation. Pattern Recognition, 38 (2005) 773-776
7. Seo, S., Bode, M., Obermayer, K.: Soft Nearest Prototype Classification. IEEE Transactions on Neural Networks, 14 (2003) 390-398
8. Sato, A.S., Yamada, K.: Generalized Learning Vector Quantization. In: Advances in Neural Information Processing Systems, Vol. 7. MIT Press (1995)
9. Palmes, P.P., Hayasaka, T., Usui, S., Obermayer, K.: Mutation-Based Genetic Neural Network. IEEE Transactions on Neural Networks, 16 (2005) 587-600
10. Kato, N., Suzuki, M., Omachi, S., Aso, H., Nemoto, Y.: A Handwritten Character Recognition System Using Directional Element Feature and Asymmetric Mahalanobis Distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21 (1999) 258-262
Improved Decision Tree Algorithm: ID3+

Min Xu¹,², Jian-Li Wang¹, and Tao Chen¹

¹ Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
² Graduate School of the Chinese Academy of Sciences, Beijing 100039, China
[email protected]
Abstract. This paper proposes an improved decision tree algorithm, ID3+. Through autonomous backtracking, information gain reduction, and surrogate values, our method overcomes some of ID3's disadvantages, such as preference bias and the inability to deal with unknown attribute values. The first problem often leads to inferior decision trees, while the second limits ID3's applicability in real-world domains. The experimental results show that our method solves both problems competitively and efficiently, and it could be a good starting point for a more robust decision tree learning system.
1 Introduction

As one of the most popular concept learning methods, the decision tree, with its many advantages over alternatives such as neural networks [9], Bayesian learning, and nearest neighbors, is among the most widely used machine learning methods. Up to now, ID3 [2] is one of the best known decision tree algorithms. The basic idea of the ID3 family of algorithms is to infer decision trees by growing them from the root downward, greedily selecting the next best attribute for each new decision branch added to the tree. ID3 searches a complete hypothesis space and is capable of representing any discrete-valued function defined over discrete-valued instances. Effective and expressive as it is on many learning tasks, ID3 in its basic form still has some severe limits.

One shortcoming of ID3 is its inability to handle noisy data, which leads to overfitting. Solutions to this problem include validation-set pruning and the introduction of some fuzziness; in fact, previous research reported good performance after such noise-tolerant techniques were employed with ID3. But assuming that the training data set is noise-free and adequate, does ID3 always generate a correct decision tree? The answer is no: the preference bias of ID3 can generate inferior decision trees. The most widely used attribute evaluation method is entropy-based information gain. From the information theory point of view, it attempts to encode the class of a randomly drawn member of the training set with the smallest number of bits. The problem with this measure is that it is biased toward attributes that have many possible values. Such attributes give ID3 the appearance of being good class predictors, since they split the training data into perfectly classified partitions, but in many cases they are not. Besides information gain, there are other attribute evaluation
methods; to our knowledge, all selection criteria proposed so far incur some kind of search bias.

The second shortcoming is that ID3 in its basic form misses opportunities in complex, real-world applications, even though it is computationally efficient. One of ID3's non-incremental learning assumptions is that all attribute values are present in both the training examples and the test instances. However, situations where the values of some attributes are missing from training examples occur often; this can happen, for example, when some sensors fail during data collection. Simply ignoring the examples with unknown values leads to bad decisions. ID3 is not supposed to be responsible for an incorrect prediction on a test instance with unknown attribute values, but it should be capable of making a reasonable guess based on the non-missing feature elements of that instance, as most medical doctors would do with a patient in similar circumstances.

The remainder of this paper is organized as follows. Section 2 surveys related work. Section 3 describes our proposed solutions to the above problems. Section 4 gives the results of our simple experiments. Section 5 summarizes our conclusions and discusses future work.
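To make the bias discussion above concrete, here is a minimal Python sketch of the entropy-based information gain measure; the example representation (dicts with a 'label' key) is an illustrative assumption, not the paper's data format.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attr):
    """Entropy reduction from splitting `examples` on attribute `attr`.
    An attribute with many distinct values tends to shrink the partitions
    and hence inflate this score, which is the preference bias at issue."""
    labels = [e["label"] for e in examples]
    values = Counter(e[attr] for e in examples)
    remainder = sum(
        count / len(examples)
        * entropy([e["label"] for e in examples if e[attr] == v])
        for v, count in values.items()
    )
    return entropy(labels) - remainder
```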
2 Related Work

A large variety of extensions to the basic ID3 algorithm has been developed by different researchers. These include methods for post-pruning trees, handling real-valued attributes, accommodating training examples with missing attribute values, incrementally refining decision trees as new training examples become available, using attribute selection measures other than information gain, and considering costs associated with instance attributes. Several ID3-based systems have been built [3-7].

To the best of our knowledge, there is no previous work that attacks the preference bias problem of ID3 through general backtracking. Dealing with missing attribute values is usually addressed by incremental learning systems, which try to avoid rebuilding an entirely new decision tree when a new training example arrives; instead, they change only the 'faulty' parts of the learned knowledge base to accommodate the new observations.

Schlimmer's STAGGER [6] system represents concepts as a probabilistic summary of important concept subcomponents. Motivated by the fact that real-world applications require resistance to noise, STAGGER does not insist on perfect consistency between the decision tree and the training environment, nor does it make abrupt repairs following each misclassification. Experiments demonstrate that probabilistic representations and a conservative revision strategy enable STAGGER to deal effectively with noise.

A system that appears quite different from STAGGER at a cursory level, but that draws important principles from it, is Schlimmer and Fisher's ID4 [5]. At the core of the incremental ability of ID4 is the observation that the information gain evaluation need not be computed directly from the set of training examples; a probabilistic summary of the observations is sufficient. The general findings of Schlimmer and Fisher
are that ID4 converges on decision trees equivalent in quality to ID3's, while the cost of updating a decision tree in ID4 is often lower than that of ID3 when the tree is large. Utgoff proposed several useful tree revision operators in ID5R [7] for efficient tree reorganization. The two most important ones are tree transposition and cutpoint slewing. When a different test attribute should replace the current one at a decision node, each non-leaf subtree is first recursively revised so that the new test occurs at the root of each subtree; then the tree is transposed at the decision node. Cutpoint slewing is accomplished by checking the instances everywhere below the current node: each instance that is in the wrong subtree is backed out to the current node, removing its information from the tests along the way. The most important goal for an incremental method is that its incremental cost be less than the cost of starting from scratch upon the arrival of a new observation. The disadvantage of this approach is that localized changes may result in a decision tree of lower quality. Thus, incremental learning methods trade quality against cost.
3 Proposed Solutions: ID3+

To overcome the two problems discussed in Section 1, we first introduce autonomous backtracking to ID3 to reduce preference bias, then augment ID3 so that it can deal with unknown attribute values in induction and make reasonable guesses for test instances with unknown attribute values. We name the improved decision tree learning system ID3+.

3.1 Autonomous Backtracking

In Section 1 we claimed that ID3 may generate an incorrect decision tree even if the training set is self-consistent and adequate. Here are the definitions used in this paper.

Definition 1. A training set is adequate if there exists a decision tree such that every leaf of the tree is supported by at least one example in the training set.

Definition 2. A training set is self-consistent if there are no conflicting examples in it.

Intuitively, adequacy means that the training examples provide enough information to understand the instances in a verifiable way. If a leaf in a decision tree is not supported by any example, the reasoning behind it is unsound because it is not verifiable from the training set. This kind of tree is not desirable; rather, we would like each leaf of our decision tree to be confirmed by some of the training examples. If the training data were not collected well, no classification can be found without making guesses for some cases. That is in fact a problem with data collection, and the learner is not to blame. The problem with ID3, however, is that even with an adequate training set it is likely to make inferior splits, because it favors shallowness over correctness of the tree.
*ID3 algorithm in [8] (the "<==" marks the voting steps referred to below):

ID3(examples, attributes) {
    If all examples are in one category,
        Then return a leaf node with this category as label
    If attributes = empty,
        Then return a leaf node with the most common category as label   <==
    A <- the "best" decision attribute of attributes
    For each value Vi of A:
        Let examplesi be the subset of examples with value Vi for A
        If examplesi = empty,
            Then create a leaf node with the most common category as label   <==
        Else call ID3(examplesi, attributes - {A})
}

When we examine the ID3 algorithm, we find that every step is sound except when it runs out of attributes or runs out of examples. In those cases ID3 uses voting to select some common category as the label, as marked with arrows in the pseudo-code. Since nodes far from the root typically cover a small number of examples, choosing the most common category under such a node does not make probabilistic sense; on the other hand, choosing the most common category from the entire training set provides little information about a given node. These classification paths are more likely to be wrong than those that terminate at "all examples in one category". There are several possible reasons for an empty attribute set or an empty training data set. One is inadequate attributes or inadequate examples, or both. This is essentially a problem with data collection; it cannot be solved by any machine learning algorithm without asking for more features or more examples. Another possible reason, however, is a bad earlier split due to the search bias of ID3. With information gain as the attribute evaluation method, a short and wide tree is preferred over a deep and narrow tree, even if the former might be an incorrect one. We backtrack upon realizing either that there are no more attributes to divide an impure example set, or that no remaining example takes a given value of the chosen attribute. These are exactly the moments we realize that we must have made some incorrect split earlier, assuming that the training set is adequate and self-consistent. The basic idea is that we do not guess if induction is possible without guessing. We backtrack from the dead end and try another split, giving the augmented algorithm as follows.
*ID3+ algorithm:

ID3+(examples, attributes) {
    If all examples are in one category,
        Then return a leaf node with this category as label
    If attributes = empty or examples = empty,
        Then return NIL
loop:
    A <- the next "best" decision attribute of attributes
         (if no untried attribute remains, return NIL)
    For each value Vi of A:
        Let examplesi be the subset of examples with value Vi for A
        If ID3+(examplesi, attributes - {A}) = NIL,
            Then { nullify subtree(A = Vi); goto loop }
    Return the node testing A with its subtrees
}

Note that this autonomous backtracking does not conflict with validation set backtracking; on the contrary, the two complement each other. Validation set pruning is controlled backtracking, since an elaborately selected validation set is forced on the (potentially) overfitting decision tree. Usually the validation set simply consists of some of the training examples that could otherwise have been used in top-down induction. Validation set backtracking helps filter out noise in the training data and avoid coincidental regularities, while autonomous backtracking serves to reduce preference bias. No separate set of pruning data is involved, and ID3+ still respects the principles of ID3, which is why the backtracking is called autonomous. To avoid the danger of infinite loops introduced by backtracking, the attributes need to be tried in a fixed order at each node. By associating each node with a list of attributes ordered according to their information gains, we avoid infinite loops as well as repeated computations of information gain whenever the search backtracks to this node. By trying the attributes in descending order of information gain, ID3+ still respects the principles of ID3, although correctness now takes priority over making the tree wide and shallow. There are cases in which multiple correct decision trees exist for the same training data; in these cases ID3+ simply chooses the one that best fits the information gain criterion. As long as the training examples are self-consistent and adequate, ID3+ will eventually find a correct decision tree. We do not present a formal proof of this due to space limitations, but the intuitive explanation is that, by backtracking, ID3+ can try every possible tree, and adequacy of the training set guarantees that a correct one exists.

3.2 Dealing with Unknown Attribute Values

Unknown attribute values need to be taken care of twice in induction, i.e., in attribute selection and in individual case classification. Several methods have been explored for both phases. We implemented Quinlan's information gain reduction [1,3] during training and borrowed Breiman's surrogate splits [4] during testing. Originally these were results of research in incremental learning. We do not intend to make ID3+ an incremental learner like ID4 [5] or ID5R [7]; delayed batch-style restructuring to accommodate new observations is acceptable in most situations and usually generates a decision tree of higher quality.

3.2.1 Information Gain Reduction

When evaluating a test on attribute A, Quinlan [1] reduces the apparent information gain of the test by the proportion of cases with unknown values of A. The rationale for this reduction is that, if A has an unknown rate of x%, a test on A will yield no information x% of the time. ID3+ includes this method to deal with unknown attribute values in the training set.
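In symbols, one common way to write this reduction (our notation, not the paper's) is

Gain'(A) = \Big(1 - \frac{n_{unknown}(A)}{n}\Big)\, Gain(A) ,

where n is the number of training cases at the node, n_unknown(A) of them have an unknown value for A, and Gain(A) is computed over the cases whose value for A is known.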
3.2.2 Surrogate Value

By contrast, Breiman [4] tries to "fill in" the missing values of A before calculating the information gain of A. We borrow this idea of surrogate splits and apply it to ID3+ in the test phase. ID3+ examines all non-missing feature elements of the test sample and takes the value that is most probable according to the other instances in the training set; this most probable value then serves as a surrogate for the real one. For example, suppose the value of A1, one of the m attributes A1, A2, ..., Am, is missing from a test instance i. We scan the training set and find that, among the training examples whose values for the m-1 other attributes all match those of i, 15 examples have the value v11 for attribute A1, 7 examples have the value v12, and 2 examples have the value v13. We then say that v11 is the most likely value for A1 in test instance i. If none of the training examples matches instance i on all m-1 non-missing feature elements, we lower the matching threshold from m-1 to m-2. This search continues until no matching values can be found even when the threshold is as low as, say, m/2. At that point we decide that the class of this test instance is simply not predictable; choosing an educated guess is beyond the scope of this paper.
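The following is a minimal sketch of this surrogate-value lookup; the function and variable names are ours, and examples are assumed to be dictionaries mapping attribute names to values (None marking a missing value).

    from collections import Counter

    def surrogate_value(instance, attr, training_set, min_matches=None):
        """Fill in instance[attr] from training examples that agree with the
        instance on as many non-missing attributes as possible, lowering the
        required number of matching attributes until a candidate is found."""
        known = [a for a, v in instance.items() if a != attr and v is not None]
        if min_matches is None:
            min_matches = len(known) // 2          # the "m/2" floor in the text
        for need in range(len(known), min_matches - 1, -1):
            votes = Counter(
                ex[attr] for ex in training_set
                if ex.get(attr) is not None
                and sum(ex.get(a) == instance[a] for a in known) >= need)
            if votes:
                return votes.most_common(1)[0][0]  # e.g. v11 in the example above
        return None                                # class not predictable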
4 Experiment Results

In this section we present experiments showing the efficiency and competitiveness of our algorithm ID3+ relative to ID3. The performance measure for autonomous backtracking is the accuracy that ID3+ achieves on self-consistent training and test data, compared against the accuracy of the original ID3 from [8]. For information gain reduction and surrogate values, the performance measure is the tolerance of ID3+ to missing values.

4.1 Autonomous Backtracking

To illustrate why ID3+ can outperform ID3 in terms of accuracy, we built a training set together with ID3 and ID3+ trace files; the training set is adequate and supports the three-layer tree shown in Fig. 1(a). All training examples with (A0=v01 && A1=v13) are of class Y regardless of their values for attribute A2. Figure 1(b) is the decision tree generated by ID3. After splitting on attribute A0, ID3 chooses A2 over A1 as the best attribute on the branch (A0=v01), as directed by the information gain metric: A2 has 4 possible values while A1 has only 3, and both attributes partition the remaining examples equally well. As ID3 goes down (A0=v01 && A2=v24), there is no training example with (A1=v13) in this branch, which means ID3 runs out of examples and has to make up a label for this leaf by guessing. Voting leads to N because of the dominating number of N-class examples in branch (A0=v02); unfortunately, this is a wrong label. In contrast to ID3, ID3+ backtracks and tries partitioning on attribute A1 first. It finds that downstream of (A0=v01 && A1=v13), although A2 takes several values, all remaining examples are of class Y. So ID3+ comfortably concludes with a label of Y over this pure set of training examples, as in Fig. 1(a). How does ID3+ perform compared to ID3 on larger data sets? We chose Voting, Students, Intelligent Cards and Weather to carry out the experiment; the numbers of their instances are large enough to reveal the performance difference between ID3 and ID3+.
The detailed experimental results are presented in Table 1. The two rightmost columns are accuracies obtained by applying the programs to test data after training them on the training data. Both ID3 and ID3+ attain 100% accuracy on the Voting and Students data sets. On Intelligent Cards, the accuracy of ID3 is 94.1% while that of ID3+ is 99.5%; on Weather, the accuracy of ID3 is 93.3% while that of ID3+ is 99.0%. ID3+ thus matches or exceeds ID3 on every data set, and from this comparison we can see that ID3+ outperforms ID3.
Fig. 1. The Difference of ID3+ and ID3: (a) the correct decision tree; (b) the incorrect decision tree

Table 1. Experiment Results of Autonomous Backtracking

data set           training instances  test instances  attributes  ID3     ID3+
Voting             400,000             136             16          100%    100%
Students           20,000              320             22          100%    100%
Intelligent Cards  1,860               430             10          94.1%   99.5%
Weather            9,700               200             8           93.3%   99.0%
4.2 Information Gain Reduction and Surrogate Value

Information gain reduction and surrogate values are both based on probability estimates. We illustrate how information gain reduction works and obtain performance numbers in the experiment. Given the 12 training examples in Table 2 without instance i', the information gains for the three attributes A1, A2 and A3 are all the same, 0.000, since each of them evenly splits
the training objects into two classes. The hierarchical decision tree is then A1—A2—A3, from top to bottom (Fig. 2(a)). With instance i' added to the training set, the information gains for A1, A2, and A3 become 0.007, 0.000, and 0.004, respectively, and the tree becomes A1—A3—A2 (Fig. 2(b)). The reduced information gain for attribute A2 moves it one level down the decision tree. The experiment shows that ID3+ deals with unknown attribute values in the expected way.
Table 2. Results of the 12 Training Examples (plus instance i', whose value for A2 is unknown)

instance  A1  A2   A3   class
i1        v1  v21  v31  y
i2        v1  v21  v32  n
i3        v1  v22  v31  n
i4        v1  v22  v32  y
i5        v2  v21  v31  y
i6        v2  v21  v32  n
i7        v2  v22  v31  n
i8        v2  v22  v32  y
i9        v3  v21  v31  y
i10       v3  v21  v32  n
i11       v3  v22  v31  n
i12       v3  v22  v32  y
i'        v1  *    v31  y

Fig. 2. Information Gain Reduction Example: (a) the tree without instance i'; (b) the tree with instance i'
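As a quick check of these numbers, the following sketch (our code, assuming Quinlan's reduction multiplies the gain computed over known-valued cases by the fraction of cases whose value is known) reproduces the gains above:

    import numpy as np

    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -(p * np.log2(p)).sum()

    def reduced_gain(values, labels):
        """Information gain of an attribute over the known-valued cases,
        scaled by the fraction of cases whose value is known."""
        known = values != '*'
        v, y = values[known], labels[known]
        gain = entropy(y) - sum((y[v == a].size / y.size) * entropy(y[v == a])
                                for a in np.unique(v))
        return known.mean() * gain

    labels = np.array(list('ynnyynnyynnyy'))
    A1 = np.array(['v1'] * 4 + ['v2'] * 4 + ['v3'] * 4 + ['v1'])
    A2 = np.array(['v21', 'v21', 'v22', 'v22'] * 3 + ['*'])
    A3 = np.array(['v31', 'v32'] * 6 + ['v31'])
    for name, col in [('A1', A1), ('A2', A2), ('A3', A3)]:
        print(name, round(reduced_gain(col, labels), 3))  # 0.007, 0.0, 0.004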
5 Conclusions

In this paper we have proposed an augmented decision tree learner, ID3+. Experiments show that it can be developed into a useful tool that outperforms ID3. With autonomous backtracking, ID3+ prevents itself from getting stuck in a dead end
caused by an earlier inferior split. With information gain reduction and surrogate values, ID3+ deals with unknown values in the training and test sets in a reasonable way. ID3+ thus efficiently addresses the two problems with ID3: preference bias and the inability to deal with unknown attribute values. The first problem often leads to inferior decision trees, while the second limits ID3's applicability in real-world domains. In future work, we will further study how to make ID3+ noise tolerant and how to adapt ID3 and ID3+ to continuous attribute values. Another interesting direction is tree simplification to alleviate subtree replication, in order to improve the space efficiency and understandability of the obtained decision tree.
References

1. Quinlan, J.: Unknown Attribute Values in Induction. International Conference on Machine Learning (1989)
2. Quinlan, J.: Discovering Rules by Induction from Large Collections of Examples. Expert Systems in the Microelectronic Age, Edinburgh University Press (1979)
3. Quinlan, J.: C4.5: Programs for Machine Learning. Morgan Kaufmann, California (1993)
4. Breiman, L., Nagy, G.: Decision Tree Design Using a Probabilistic Model. IEEE Trans. Information Theory, IT-30 (1984) 93-99
5. Schlimmer, J., Fisher, D.: A Case Study of Incremental Concept Induction. Proc. Fourth National Conference on Artificial Intelligence (1986) 496-501
6. Schlimmer, J.: Concept Acquisition Through Representational Adjustment. UC Irvine, Dept. of Information and Computer Science, TR 87-19 (1987)
7. Utgoff, P.: Incremental Induction of Decision Trees. Machine Learning, 4 (1989) 161-186
8. Mitchell, T.: Machine Learning. McGraw-Hill, Singapore (1997) 52-78
9. Huang, D. S.: Systematic Theory of Neural Networks for Pattern Recognition. Publishing House of Electronic Industry of China, Beijing (1996)
Application of Support Vector Machines with Binary Tree Architecture to Advanced Radar Emitter Signal Recognition Gexiang Zhang, Haina Rong, and Weidong Jin School of Electrical Engineering, Southwest Jiaotong University, Chengdu 610031 Sichuan, China
[email protected]
Abstract. Classifier design is an important issue in radar emitter signal (RES) recognition, in which response time is a very important and strict performance criterion. For computational efficiency, the multi-class support vector machine (SVM) with binary tree architecture is introduced to recognize advanced RESs. The resemblance coefficient is used to convert multi-class problems into binary-class problems, and consequently the structure of the multi-class SVM is obtained. The presented classifier has good classification capability and fast decision-making speed. Experimental results show that the introduced classifier is superior to one-against-all, one-against-one, directed acyclic graph, bottom-up binary tree and several classification methods in the recent literature.
1 Introduction
Radar emitter signal (RES) recognition, especially advanced RES recognition, is one of the key procedures of signal processing in electronic intelligence, electronic support measure and radar warning receiver systems in modern electronic warfare [1-3]. The state of the art of RES recognition reflects the technical merit of electronic reconnaissance equipment. Besides feature extraction and feature selection, classifier design is an important issue in RES recognition. Because RESs are corrupted by heavy noise during transmission and reception, the correct recognition rates of advanced RESs are usually very low when traditional classifiers are used. What is more, RES recognition is mainly used in electronic warfare, in which response time is a very important and strict performance criterion for classifiers. How to design a classifier with good classification capability and fast decision-making speed is an ongoing issue in advanced RES recognition. Recently, the support vector machine (SVM) has become a very popular classification method because of its good robustness and generalization [4, 5]. SVM was originally designed for binary classification, and it is not a straightforward issue to
This work was supported by the National Natural Science Foundation of China (60572143), Science Research Foundation of SWJTU (2005A13) and National EW Lab Pre-research Foundation (NEWL51435QT220401).
extend the binary-class SVM to multi-class problems [6-8]. Constructing multi-class SVMs is still an ongoing research issue [6-12]. In the existing literature, there are mainly 5 combination approaches for incorporating binary-class SVMs: one-against-all (OAA) [9], one-against-one (OAO) [10], directed acyclic graph (DAG) [11], bottom-up binary tree (BUBT) [13] and binary tree architecture (BTA) [6-8,12,14-16]. For an N-class classification problem, BTA needs to test only about log2 N binary SVMs for a classification decision, while the other 4 methods need to make at least (N - 1) binary decisions. So BTA has faster response speed and is more suitable for recognizing RESs than OAA, OAO, DAG and BUBT. In the procedure of constructing a binary tree, how to choose the root node of every layer is an open issue. In [8], a k-means clustering method was used to construct the binary tree. In [7], a kernel-based self-organizing map was used to convert the multi-class problem into binary hierarchies; the conversion employed two methods, human drawing and automatic clustering that maximizes a scattering measure calculated from the data. In [12], the minimum spanning tree algorithm was used as a tool for finding binary partitions of classes in a multi-class learning problem. However, k-means clustering and automatic clustering [7] cannot effectively handle cases in which there are overlaps among multiple or all classes. Human drawing is suitable only for classification problems with 2-dimensional or 3-dimensional features, because high-dimensional feature vectors cannot be plotted intuitively. Furthermore, the conversion in [7] requires an exhaustive search over all possible combinations, and the minimum spanning tree algorithm produces multiple binary tree architectures [12], which makes choosing the best binary tree difficult. To obtain good classification ability and efficiency, BTA is used here to design classifiers for RES recognition. The resemblance coefficient is introduced in this paper to convert multi-class problems into binary-class problems, and consequently the classifier, called Resemblance Coefficient based support vector machine with Binary Tree Architecture (RCBTA), is obtained. The outstanding characteristics of RCBTA are fast decision-making speed and good classification capability.
2 RCBTA
The existence of confusion classes is one of the most important problems in multi-class classification [8]; it is the main source of misclassifications between different classes. How to deal with the overlapping samples of different classes is the key technique in converting a multi-class classification problem into multiple binary-class classification problems [8]. When binary-class SVMs are incorporated into a binary tree, the critical step is to make a coarse discrimination between confusion classes and then a finer discrimination within them, which determines how to partition the classes at every binary-class SVM in the tree. This section gives an algorithm to accomplish this task, using the resemblance coefficient [17] to judge the degree of confusion between classes and to construct the BTA.
Definition 1. Suppose that one-dimensional functions f(x) and g(x) are continuous, positive and real. The resemblance coefficient of f(x) and g(x) is defined as

C_r = \frac{\int f(x)g(x)\,dx}{\sqrt{\int f^2(x)\,dx \cdot \int g^2(x)\,dx}} .  (1)

In (1), the integral domains of f(x) and g(x) are their definable domains of the variable x. Moreover, when x is within its definable domain, the function f(x) or g(x) cannot be identically equal to 0. Because f(x) and g(x) are positive, the well-known Cauchy–Schwarz inequality gives

0 \le \int f(x)g(x)\,dx \le \sqrt{\int f^2(x)\,dx} \cdot \sqrt{\int g^2(x)\,dx} .  (2)
From (2), the value domain 0 <= C_r <= 1 of the resemblance coefficient C_r can be inferred. According to the equality conditions of the Cauchy–Schwarz inequality, if f(x) equals g(x), the resemblance coefficient C_r attains the maximal value 1; in fact, C_r equals 1 if and only if the f(x)-to-g(x) ratio is constant at every point. If and only if the integral of the product of f(x) and g(x) is zero, i.e., for every x, f(x) = 0 or g(x) = 0, the resemblance coefficient C_r equals the minimal value 0. By Definition 1, computing the resemblance coefficient of two functions amounts to computing their correlation, and its value depends mainly on the characteristics of the two functions. If f(x) is proportional to g(x), i.e., f(x) = kg(x), k > 0, the resemblance coefficient C_r equals 1, which indicates that f(x) resembles g(x) completely. As the overlap of the two functions decreases gradually, C_r decreases gradually, indicating that f(x) and g(x) are only partly resemblant. When f(x) and g(x) are completely separable, C_r reaches the minimal value 0, which implies that f(x) does not resemble g(x) at all. To measure the degree of sample overlapping of two classes, we introduce the following criterion function:

J = 1 - \frac{\int f(x)g(x)\,dx}{\sqrt{\int f^2(x)\,dx \cdot \int g^2(x)\,dx}} .  (3)

According to (1) and (2), the value of J is always greater than or equal to zero. If, for every x, at least one of f(x) and g(x) is zero, so that the product f(x)g(x) vanishes identically, J attains its maximal value; if f(x) is the same as g(x), J = 0. So the criterion function J in (3) satisfies the three conditions that a class separability criterion based on probability distributions must satisfy [17], i.e., (i) the criterion value is not negative; (ii) if the distribution functions of the two classes have no overlapping part, the criterion value attains its maximum; (iii) if the distribution functions of the two classes are identical, the criterion value is 0.
Fig. 1. Three separability cases of functions f(x) and g(x): (a) completely separable, J = 1; (b) partly overlapping, 0 < J < 1; (c) proportional (f(x) = kg(x)), J = 0
When f(x) and g(x) in (3) are regarded as the probability distribution functions of the feature samples of two classes A and B, several separability cases of A and B are shown in Fig. 1. If, for all x, at least one of f(x) and g(x) is zero, as in Fig. 1(a), A and B are completely separable and J attains the maximal value 1. If there are some points x at which f(x) and g(x) are both nonzero, as in Fig. 1(b), A and B are partly separable and J lies between 0 and 1. If f(x) = kg(x) for all x, k in R+ (Fig. 1(c) shows the case k = 2), A and B are not separable at all and J attains the minimal value 0. In pattern recognition, the extracted features usually follow certain laws. In general, the features vary in the neighborhood of their expectation value because of heavy noise and measurement errors. If the occurrences of all feature samples are counted statistically, a feature probability distribution function can be obtained; this function can be approximated by a Gaussian distribution with the expectation and variance of the feature samples as parameters. Thus, f(x) and g(x) in (3) can be taken as the feature distribution functions of two different classes. Fig. 1(b) and (3) indicate that the more serious the overlap of two classes, the smaller J is. In other words, when f(x) and g(x) in (3) stand for the feature distribution functions of two classes, J measures the separability of the two classes. Thus, we can use J in (3) to split multiple classes into binary groups gradually. An additional remark is that any function satisfying the conditions in Definition 1 can be used as f(x) or g(x) in (3). Based on the above criterion function, the detailed algorithm for constructing the binary tree architecture that combines multiple binary-class SVMs is given as follows.

Algorithm for constructing a multi-class SVM with binary tree architecture

Begin
  Initialization: N = number of classes; M = number of features;
                  L = number of samples in the training set;
  for i = 1 to N
    for j = 1 to M
      compute the mean a(i,j) and the variance v(i,j);
    end
  end
  for j = 1 to M
    sort a(1,j), a(2,j), ..., a(N,j) from small to large;
    the sorted values are b(1,j), b(2,j), ..., b(N,j);
  end
  choose a real number T as the adjusting parameter;
  for j = 1 to M
    code(1,j) = 0;  % the class with the smallest mean b(1,j)
                    % is encoded as zero, the initial value
    for k = 2 to N
      compute the criterion value J(k,j) of b(k-1,j) and b(k,j) using (3);
      if J(k,j) >= T
        code(k,j) = code(k-1,j) + 1;
      else
        code(k,j) = code(k-1,j);  % insufficient separation: share the code
      end
    end
  end
End
When the adjusting parameter T = 1, two neighboring classes receive different codes only if their samples do not overlap at all; when T = 0, even neighboring classes whose samples overlap seriously are separated. As T decreases from 1 to 0, more and more confused classes get split apart. By decreasing the adjusting parameter T from 1 toward 0, we can therefore gradually convert a multi-class classification problem into binary-class classification problems. In the procedure of computing the criterion value J(k, j), the function f(x) is taken as a Gaussian function with mean b(k-1, j) and its corresponding variance, and the function g(x) as a Gaussian function with mean b(k, j) and its corresponding variance. After the above algorithm is performed once, an encoded table is obtained; an example is shown in Table 1. According to the codes in Table 1, we can easily classify multiple classes into two groups: classes with identical columns are considered a group. Taking Table 1 as an example, columns 1, 2, 3, 8 and 9 are identical, and columns 4, 5 and 10 are identical. So we first group classes 1, 2, 3, 8 and 9; the remaining classes 4, 5, 6, 7 and 10 form the other group. Classes 6 and 7 could belong to either group, because their codes differ from those of all other classes; to balance the binary-class SVM, we group classes 6 and 7 with classes 4, 5 and 10 here. To subdivide the two groups, the adjusting parameter T is decreased and the above algorithm is performed again, yielding the new table shown in Table 2. Classes 1, 2 and 8 now have the same codes, so we split the former group of classes 1, 2, 3, 8 and 9 into two subgroups: one composed of classes 1, 2 and 8, the other of classes 3 and 9. Similarly, the latter group of classes 4, 5, 6, 7 and 10 is split into two subgroups: one composed of classes 4 and 5, the other of classes 6, 7 and 10.
Table 1. An example of the encoded table (2 rows denote two features and 10 columns denote 10 classes)

Classes    1  2  3  4  5  6  7  8  9  10
Feature 1  1  1  1  0  0  3  2  1  1  0
Feature 2  0  0  0  0  0  0  0  0  0  0
Table 2. The encoded table obtained when the adjusting parameter decreases (2 rows denote two features and 10 columns denote 10 classes)

Classes    1  2  3  4  5  6  7  8  9  10
Feature 1  1  1  2  0  0  4  3  1  2  0
Feature 2  0  0  0  2  2  0  1  0  2  0
Fig. 2. An example of the constructed binary tree architecture. The root separates classes {1,2,3,8,9} from {4,5,6,7,10}; the next level splits {1,2,8} vs. {3,9} and {4,5} vs. {6,7,10}; the following levels split {1,2} vs. {8}, {3} vs. {9} and {6,7} vs. {10}; finally {1} vs. {2}, {4} vs. {5} and {6} vs. {7}
The rest may be deduced by analogy. Finally, a whole binary tree, shown in Fig. 2, is constructed. According to this tree, a multi-class problem can be converted into binary-class sub-problems, and a classifier using SVMs with binary tree architecture is thus obtained. Incidentally, the results of Table 1, Table 2 and Fig. 2 were obtained using the RES features of Section 4. Because the overlapping of different classes is dealt with in the procedure of constructing the binary tree, the multi-class classifier with the structure of this tree achieves both good classification capability and high recognition efficiency.
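As an illustration, here is a minimal sketch of the per-feature encoding step, assuming Gaussian class-conditional feature distributions evaluated on a numerical grid; the function names and the use of numpy are ours, not the paper's.

    import numpy as np

    def gaussian(x, mu, var):
        # Gaussian density with mean mu and variance var on grid x
        return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

    def criterion_J(f, g, x):
        # J = 1 - Cr, Eqs. (1) and (3), evaluated numerically on grid x
        cr = np.trapz(f * g, x) / np.sqrt(np.trapz(f ** 2, x) * np.trapz(g ** 2, x))
        return 1.0 - cr

    def encode_feature(means, variances, T, x):
        """Assign integer codes along one feature: sort the classes by mean
        and increment the code whenever J of two neighboring class
        distributions reaches the adjusting parameter T."""
        order = np.argsort(means)
        codes = np.zeros(len(means), dtype=int)
        for k in range(1, len(means)):
            a, b = order[k - 1], order[k]
            J = criterion_J(gaussian(x, means[a], variances[a]),
                            gaussian(x, means[b], variances[b]), x)
            codes[order[k]] = codes[order[k - 1]] + (1 if J >= T else 0)
        return codes

For example, three classes with means 0.0, 0.2 and 5.0 and unit variance, encoded with T = 0.9 on x = np.linspace(-20, 30, 5001), yield the codes [0, 0, 1]: the two heavily overlapping classes share a code, while the well-separated class gets a new one.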
3 Testing Experiments
Prediction of yeast protein cellular localization sites is a general benchmark and a very difficult classification problem [15, 18-20]. The dataset of yeast protein localization sites was donated by Horton and Nakai in 1996; detailed information can be found in [18, 19], and the dataset is available on the web [21]. In this dataset the class is the localization site, there are 10 classes and 8 features for prediction, and the number of samples is 1484. The 10 classes are cytosolic or cytoskeletal (CYT), nuclear (NUC), mitochondrial (MIT), membrane protein with no N-terminal signal (ME3), membrane protein with uncleaved signal (ME2), membrane protein with cleaved signal (ME1), extracellular (EXC), vacuolar (VAC), peroxisomal (POX) and endoplasmic reticulum lumen (ERL), labeled 1 through 10, respectively. The dataset is described in Table 3. From the samples of the 10 classes shown in Table 3, the proposed algorithm is used to construct the binary tree architecture shown in Fig. 3. In order to compare with the methods in [18-20], we use the 5-fold cross-validation methodology to test the introduced algorithm; that is to say, the samples of every class are divided into 5 subsets of approximately equal size.

Table 3. Yeast protein dataset description (NoS is an abbreviation of number of samples)

Classes  CYT  NUC  MIT  ME3  ME2  ME1  EXC  VAC  POX  ERL
NoS      463  429  244  163  51   44   35   30   20   5
Fig. 3. Binary tree architecture for predicting yeast protein localization sites. The root separates classes {1,2,4,5} from {3,6,7,8,9,10}; the next level splits {1,2} vs. {4,5} and {3,9,10} vs. {6,7,8}; the following levels split {1} vs. {2}, {4} vs. {5}, {3,9} vs. {10} and {6,7} vs. {8}; finally {3} vs. {9} and {6} vs. {7}
Table 4. Comparisons of RCBTA and the methods in [18-20] (blank cells have no values in [20]) (%)

Classes  RCBTA  PCS   kNN    DT     NB     HN     GCS  FN  GA  ERR
1        68.90  74.3  55.78  55.10  53.74  55.10
2        52.21  35.7  59.18  51.02  57.82  55.78
3        51.64  47.1  60.96  56.16  56.16  58.22
4        80.98  85.3  65.75  58.22  58.22  55.48
5        33.33  15.7  48.63  50.00  45.21  47.95
6        77.07  63.6  62.33  57.53  54.11  53.42
7        54.29  45.7  68.49  65.75  60.27  67.81
8        3.33   10.0  58.90  57.53  61.64  56.16
9        30.00  0     56.85  56.85  56.16  55.48
10       60.00  60.0  58.22  57.53  59.59  57.53
Ave.     59.37  54.9  59.51  56.57  56.29  56.29  55   57  55  56
Table 5. Comparisons of RCBTA and the 13 methods in [15] (%)

Classes  RCBTA  1AA   AAA   DDAG  ECOC  1     2     3     4     5     6     7     8     DT
CYT      72.7   67.7  72.2  70.2  69.8  71.3  56.0  70.0  74.7  71.7  69.6  61.2  71.3  57.5
NUC      53.4   50.8  48.2  49.9  50.1  51.3  59.4  50.4  49.4  52.0  48.0  56.4  51.3  50.1
MIT      49.6   57.7  55.3  54.5  57.7  48.3  52.0  50.4  45.8  51.1  45.8  50.7  50.3  52.4
ME3      81.6   84.8  78.1  78.6  81.1  75.6  78.7  78.7  71.9  71.3  83.0  79.3  75.0  83.6
ME2      33.3   33.3  41.3  41.3  39.3  33.7  41.3  43.3  26.0  39.0  12.0  25.0  33.7  41.0
ME1      79.5   81.0  71.5  71.5  73.5  69.5  73.5  75.5  71.5  51.0  73.5  80.0  69.5  74.5
EXC      60.0   50.0  61.7  61.7  61.7  47.5  61.7  52.5  65.0  40.0  65.0  52.5  55.0  60.0
VAC      0.0    3.3   0.0   3.3   6.7   0.0   0.0   6.7   3.3   0.0   0.0   3.3   0.0   3.3
POX      40.0   45.0  50.0  50.0  45.0  45.0  45.0  45.0  45.0  50.0  45.0  50.0  45.0  30.0
Ave.     60.9   60.1  59.9  59.8  60.5  57.6  57.6  59.2  58.3  58.3  57.1  57.8  58.8  55.8
The Gaussian function is chosen as the kernel function of the SVMs. To decrease the effect of changing parameters, 63 combinations of the constant C in {10^0, 10^1, 10^2, 10^3, 10^4, 10^5, 10^6} and the kernel parameter sigma in {0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10} are used to test the multi-class SVMs with the BTA shown in Fig. 3. The final experimental result obtained using these classifiers is given in Table 4. In [18-20], several methods were used to predict yeast protein localization sites: a probabilistic classification system (PCS) [19], kNN [18], decision tree (DT) [18], Naïve Bayes (NB) [18], HN [18], growing cell structures (GCS) [20], feed-forward neural networks (FN) [20], genetic algorithm (GA) [20] and ERR [20]. The
experimental results in [18-20] are also shown in Table 4. An additional explanation is that the numbers of samples of CYT and NUC differ slightly: in [18], CYT and NUC have 444 and 426 samples respectively, while in this paper they have 463 and 429. The baseline correct recognition rate, i.e., the rate of a classifier that always predicts the class with the most instances, is 54.50%. From Table 4, the introduced method achieves a 59.37% correct recognition rate, which is much higher than PCS, DT, NB, HN, GCS, FN, GA and ERR, and a little lower than kNN. In [15], only 9 classes were used, with 10-fold cross-validation experiments. To compare RCBTA with the methods in [15], we also use these 9 classes to test the performance of the proposed method. In this experiment, 1479 samples are divided into 10 subsets; in every test, 9 subsets are employed to train the SVMs and the remaining subset is used to test the trained SVMs. The average correct recognition rate of RCBTA is shown in Table 5. There are 13 methods in [15]: 1AA (one-against-all), AAA (all-against-all), DDAG (decision directed acyclic graph), ECOC (error-correcting output codes), methods 1 through 8, and DT. 1AA, AAA, DDAG and ECOC are 4 multi-class SVM methods; 1AA, AAA and DDAG are respectively identical to OAA, OAO and DAG in this paper. The experimental results of the 13 methods in [15] are also shown in Table 5, which shows that the introduced method is superior to all 13 in correct recognition rate. Tables 4 and 5 indicate that the resemblance coefficient algorithm is a valid method for constructing a binary tree architecture that combines multiple binary-class SVMs to solve multi-class classification problems.
4 Radar Emitter Recognition Using RCBTA
In our prior work [22, 23], two entropy features were extracted from 10 advanced RESs. The 10 signals are represented by x1, x2, ..., x10, respectively. Every RES has 500 samples, so there are 5000 samples of the 10 signals in total. Using these samples, the resemblance coefficient method is applied to construct the binary tree architecture shown in Fig. 2; the encoded tables produced during construction are Table 1 and Table 2. It must be explained that the results of Table 1, Table 2 and Fig. 2 were obtained using only the samples of the 10 RESs. In the experiment, the samples are divided equally into two groups, a training group and a testing group. The Gaussian function is chosen as the kernel function of the SVMs. The parameter sigma of the Gaussian kernel varies over {0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10} and the constant C over {10^0, 10^1, 10^2, 10^3, 10^4, 10^5, 10^6}; thus there are 63 parameter combinations of C and sigma for this RES recognition experiment, so as to decrease the effect of changing parameters. Table 6 shows the lowest error recognition rate (ERR) among the 63 tests of the introduced method. To compare the classification performance and recognition efficiency of RCBTA with those of several popular multi-class SVMs, namely OAA, OAO, DAG and BUBT, each of these 4 methods likewise performs the 63 tests, and the lowest ERR of each is also given in Table 6.
Table 6. Comparisons of RCBTA, OAA, OAO, DAG and BUBT (%)

Signals           RCBTA   OAA     OAO     DAG     BUBT
x1                9.88    29.63   22.81   21.81   23.56
x2                44.88   51.81   38.06   33.25   35.63
x3                0.00    2.13    1.38    0.00    0.00
x4                0.31    0.94    0.44    0.19    0.44
x5                2.25    6.94    3.44    2.25    1.38
x6                0.00    0.00    0.00    0.00    0.00
x7                0.00    2.94    0.00    0.00    0.00
x8                11.75   9.69    9.56    9.94    10.69
x9                0.00    0.19    0.00    0.00    0.00
x10               3.88    6.88    3.44    7.06    5.94
ERR               7.29    11.11   7.91    7.45    7.76
Testing time (s)  93.01   492.21  625.19  472.52  503.34
Classification performance and recognition efficiency are evaluated with ERR and testing time, respectively. Table 6 shows that RCBTA achieves both the lowest ERR and the shortest testing time among the 5 multi-class SVM classifiers. The experimental results once more verify the validity of the introduced algorithm for constructing a multi-class SVM with binary tree architecture.
5 Conclusions
The SVM with BTA is introduced into advanced RES recognition because BTA has faster decision-making speed than OAA, OAO, DAG and BUBT. In the procedure of constructing a binary tree, how to choose the root node of every layer is a very important issue. This paper presents a valid algorithm based on the resemblance coefficient to convert multi-class classification problems into multiple binary-class sub-problems. After validating the performance of the introduced algorithm on the difficult classification problem and general benchmark of predicting yeast protein cellular localization sites, RCBTA is employed to recognize 10 advanced RESs. A large number of experimental results show that RCBTA has high efficiency and good classification capability. Furthermore, the experimental results also show that if the structure of the binary tree is chosen properly, BTA can achieve better classification performance than OAA, OAO, DAG and BUBT. A promising classifier for recognizing advanced RESs is thus obtained. However, only two examples were used to compare the performance of RCBTA with other methods; further study will apply more examples to test the performance of RCBTA. What is more, the resemblance coefficient method is mainly suited to continuous feature values; how to deal with discrete feature values when constructing the binary tree architecture is also a problem for further study.
References

1. Zhang, G.X., Rong, H.N., Jin, W.D., Hu, L.Z.: Radar Emitter Signal Recognition Based on Resemblance Coefficient Features. In: Tsumoto, S., et al. (eds.): Rough Sets and Current Trends in Computing. Lecture Notes in Artificial Intelligence, Vol. 3066. Springer-Verlag, Berlin Heidelberg New York (2004) 665-670
2. Shieh, C.S., Lin, C.T.: A Vector Network for Emitter Identification. IEEE Transactions on Antennas and Propagation, 50 (2002) 1120-1127
3. Zhang, G.X., Hu, L.Z., Jin, W.D.: Intra-pulse Feature Analysis of Radar Emitter Signals. Journal of Infrared and Millimeter Waves, 23 (2004) 477-480
4. Vapnik, V.N.: An Overview of Statistical Learning Theory. IEEE Transactions on Neural Networks, 10 (1999) 988-999
5. Wang, L.P. (ed.): Support Vector Machines: Theory and Applications. Springer-Verlag, Berlin Heidelberg New York (2005)
6. Lei, H.S., Govindaraju, V.: Half-Against-Half Multi-class Support Vector Machines. In: Oza, N.C., et al. (eds.): Multiple Classifier Systems. Lecture Notes in Computer Science, Vol. 3541. Springer-Verlag, Berlin Heidelberg New York (2005) 156-164
7. Cheong, S.M., Oh, S.H., Lee, S.Y.: Support Vector Machines with Binary Tree Architecture for Multi-class Classification. Neural Information Processing - Letters and Reviews, 2 (2004) 47-51
8. Schwenker, F., Palm, G.: Tree-Structured Support Vector Machines for Multi-class Pattern Recognition. In: Kittler, J., Roli, F. (eds.): Multiple Classifier Systems. Lecture Notes in Computer Science, Vol. 2096. Springer-Verlag, Berlin Heidelberg New York (2001) 409-417
9. Rifkin, R., Klautau, A.: In Defense of One-Vs-All Classification. Journal of Machine Learning Research, 5 (2004) 101-141
10. Kreßel, U.: Pairwise Classification and Support Vector Machines. In: Schölkopf, B., et al. (eds.): Advances in Kernel Methods - Support Vector Learning. MIT Press (1999) 185-208
11. Platt, J.C., Cristianini, N., Shawe-Taylor, J.: Large Margin DAGs for Multiclass Classification. Advances in Neural Information Processing Systems, 12 (2000) 547-553
12. Lorena, A.C., Carvalho, A.C.P.L.F.: Minimum Spanning Trees in Hierarchical Multiclass Support Vector Machines Generation. In: Ali, M., Esposito, F. (eds.): Innovations in Applied Artificial Intelligence. Lecture Notes in Artificial Intelligence, Vol. 3533. Springer-Verlag, Berlin Heidelberg New York (2005) 422-431
13. Guo, G.D., Li, S.Z.: Content-Based Audio Classification and Retrieval by Support Vector Machines. IEEE Transactions on Neural Networks, 14 (2003) 209-215
14. Kahsay, L., Schwenker, F., Palm, G.: Comparison of Multiclass SVM Decomposition Schemes for Visual Object Recognition. In: Kropatsch, W., et al. (eds.): Pattern Recognition. Lecture Notes in Computer Science, Vol. 3663. Springer-Verlag, Berlin Heidelberg New York (2005) 334-341
15. Lorena, A.C., Carvalho, A.C.P.L.F.: Protein Cellular Localization with Multiclass Support Vector Machines and Decision Trees. In: Setubal, J.C., Verjovski-Almeida, S. (eds.): Advances in Bioinformatics and Computational Biology. Lecture Notes in Bioinformatics, Vol. 3594. Springer-Verlag, Berlin Heidelberg New York (2005) 42-53
16. Zhang, G.X.: Support Vector Machines with Huffman Tree Architecture for Multiclass Classification. In: Lazo, M., Sanfeliu, A. (eds.): Progress in Pattern Recognition, Image Analysis and Applications. Lecture Notes in Computer Science, Vol. 3773. Springer-Verlag, Berlin Heidelberg New York (2005) 24-33
17. Zhang, G.X., Hu, L.Z., Jin, W.D.: Resemblance Coefficient and a Quantum Genetic Algorithm for Feature Selection. In: Suzuki, E., Arikawa, S. (eds.): Discovery Science. Lecture Notes in Artificial Intelligence, Vol. 3245. Springer-Verlag, Berlin Heidelberg New York (2004) 155-168
18. Horton, P., Nakai, K.: Better Prediction of Protein Cellular Localization Sites with k-nearest Neighbor Classifiers. In: Proceedings of the International Conference on Intelligent Systems for Molecular Biology, Vol. 5 (1997) 147-152
19. Horton, P., Nakai, K.: A Probabilistic Classification System for Predicting the Cellular Localization Sites of Proteins. In: Proceedings of the International Conference on Intelligent Systems for Molecular Biology, Vol. 4 (1996) 109-115
20. Cairns, P., Huyck, C., Mitchell, I., Wu, W.X.: A Comparison of Categorisation Algorithms for Predicting the Cellular Localization Sites of Proteins. In: Proceedings of the 12th International Workshop on Database and Expert Systems Applications (2001) 296-300
21. Chang, C.C., Lin, C.J.: LIBSVM: A Library for Support Vector Machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm/
22. Zhang, G.X., Rong, H.N., Hu, L.Z., Jin, W.D.: Entropy Feature Extraction Approach of Radar Emitter Signals. In: Proceedings of the International Conference on Intelligent Mechatronics and Automation (2004) 621-625
23. Zhang, G.X., Hu, L.Z., Jin, W.D.: Radar Emitter Signal Recognition Based on Entropy Features. Chinese Journal of Radio Science, 20 (2005) 440-445
Automatic Target Recognition in High Resolution SAR Image Based on Electromagnetic Characteristics Wen-Ming Zhou, Jian-She Song, Jun Xu, and Yong-An Zheng Department of Information Engineering, Xi'an Research Institute of Hi-Tech, Hongqing Town, Xi'an, Shaanxi, P.R. China 710025
[email protected], [email protected], [email protected], [email protected]
Abstract. We propose an improved hybrid method (HM) for microwave backscattering calculation together with 3D modeling of targets. Based on the HM, the high-frequency radar cross section (RCS) of complex military targets is calculated and a high resolution inverse SAR (ISAR) image is simulated. The method of affine transforms is used to extract the invariant features of the image contours of several types of airplanes. The image contours are acquired in real time from the space distribution image of backscatter characteristics through Cellular Neural Networks. The target can then be recognized automatically from the simulated ISAR image. A series of experiments is presented using high resolution SAR images acquired by the Chinese airborne SAR system.
1 Introduction
With the rapid advance of high resolution synthetic aperture radar (SAR) technology, model-based automatic target recognition has become of great interest. The resolution of SAR images acquired by airborne systems already reaches the centimeter level, which provides good conditions for the recognition of small military targets such as vehicles, airplanes and tanks. Therefore, the fast, accurate and automatic identification of radar targets from a large amount of SAR data has become an important aspect of SAR applications. Owing to the many merits of SAR over conventional radar, many advanced countries are developing automatic target recognition (ATR) for national defence. Since SAR images involve intensive speckle noise, their features are variable; moreover, slight fluctuations of the imaging parameters, the depression angle, the target azimuth angle and the circumstances can create great variations in the features of a SAR image. Automatic target recognition in SAR images is therefore a very difficult task. According to the input supplied to the classifier, automatic target recognition systems can be divided into template-based and model-based SAR ATR systems. Since SAR imaging is sensitive to the attitude of the target, a template-based SAR ATR system needs a huge
amount of memory space to store a large amount of template data. In contrast, according to the stored 3D target model, a model-based SAR ATR system can provide target images of continuous attitude in real time to perform automatic target recognition. Therefore, model-based automatic target recognition has become of great interest [1]. In this paper, based on the principles of electromagnetic scattering, an effective hybrid method (HM) is used to calculate the high-frequency radar cross section (RCS) of complex radar targets [2][9]. The target backscatter characteristics image is simulated by the synthetic aperture principle under different attitudes [3]. Then a fast, accurate automatic target recognition based on the simulated image is performed. Finally, we use Chinese airborne SAR images for analytical tests; the results show that the method in this paper is feasible and efficient.
2 Target Backscatter Characteristics
Owing to the modulation by the radar target, the backscattered radar signal carries physical and geometrical configuration information about the target. In order to obtain the space distribution of the radar target characteristics, the electromagnetic scattering field must be calculated accurately, from which the radar cross section is also obtained. Then, according to the principle of SAR imaging, the space distribution of the electromagnetic scattering characteristics can be computed. The flowchart of the backscatter characteristics computation is shown in Fig. 1.
Fig. 1. The flowchart of backscatter characteristics computation: 3D modeling of the radar target -> electromagnetic scattering computation (RCS) -> ISAR imaging (imaging algorithm) -> space distribution of backscatter characteristics -> feature extraction -> output
2.1 Target Geometric Modeling
A computer-aided design (AutoCAD) package for the geometric modeling of aircraft has been used for modeling the target geometry. Until now, the plane-facet
representation has been the most extensively used method for the geometrical description of RCS targets. In this method, the aircraft is described as a collection of facets and wedges. Some of its principal advantages lie in its simplicity, and in its capacity to represent any geometry. Some types of military weapons are modeled (see Fig.2 and Fig.3):
Fig. 2. Plane-facet surface model of a battleplane and a missile

Fig. 3. Plane-facet surface model of a tank and a warship
2.2 Electromagnetic Scattering Computation
In this section, according to electromagnetic scattering principles, the various scattering mechanisms of radar targets are explored, and an effective hybrid method (HM) is used to calculate the high-frequency radar cross section (RCS) of complex radar targets. Recently, the method of equivalent currents (MEC) has been widely applied in computing the high-frequency RCS of complex radar targets [4]-[6]. It can compute electromagnetic fields in non-Keller-cone directions, eliminating a limitation of the geometrical theory of diffraction (GTD) and the physical theory of diffraction (PTD). However, results computed by MEC become infinite on the shadow and reflection boundaries. To solve this problem, this paper modifies the expressions of MEC in the transition region by introducing a transition-function factor and some proper transformations, imitating the form of the uniform theory of diffraction (UTD). New expressions are derived for the uniform equivalent currents which are finite on the shadow and reflection boundaries; we call this the improved method of equivalent currents (IMEC). An area projection/physical optics method (AP/PO) is derived for the computation of multiple scattering. We use the physical optics (PO) technique to analyze
the multiple scattering. When a plane wave irradiates a target, an illuminated patch A on the target surface reflects it onto another part of the surface, B, and patch B reflects it again, forming the second-order scattered field. It is supposed that the nature of the plane wave is maintained after the first reflection, so the scattered field from the second illuminated surface can be computed by PO. The basic ideas are as follows: first, determine the fields and direction reflected by A on the object surface through the geometrical optics (GO) method and use them as the second incident field; second, project the illuminated region on A onto B to determine the illuminated region of the second incident wave on B; last, compute the second scattered fields by PO. With this new efficient hybrid method (HM), involving the PO, IMEC, AP/PO and GO algorithms, the high-frequency radar cross section (RCS) of electrically large, complex radar targets is calculated. The IMEC effectively eliminates the infinities on the shadow and reflection boundaries, and the AP/PO is well suited to the computation of multiple scattering. The monostatic RCS for a battleplane and a warship are given in Fig. 4. The left graph is the RCS of the battleplane model (the dashed line is the unimproved result and the solid line the improved result). The right graph is the RCS of the warship model (the dashed line is the RCS of the hull with the deck attached, and the solid line that of the hull with the upper structure attached).
Fig. 4. (a) The RCS of the battleplane model; (b) the RCS of the warship model (horizontal axes: azimuth angle theta, deg)
2.3 Space Distribution of Backscatter Characteristics
According to the principle of inverse synthetic aperture radar (ISAR) imaging, the space distribution of the target electromagnetic scattering characteristics is simulated. The process of imaging simulation is as follows:

Step 1. Suppose a fixed sensor transmits and receives the microwave, with enough angular width to irradiate the radar target homogeneously.

Step 2. Complex RCS values are obtained by irradiating the target model at a fixed angle with electromagnetic waves of different frequencies.
Step 3. Rotate the target model and repeat Step 2 at different angles.

Step 4. Down-range profiles of the complex radar target are obtained by a Fourier transform at every angle.

Step 5. Using the angle as the independent variable, apply a Fourier transform to the received signal within each range cell; cross-range profiles of every range cell of the complex radar target are obtained.

Step 6. The two-dimensional (2-D) RCS distribution of the radar target in the down-range and cross-range domains is obtained.
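The following is a minimal sketch of this two-step transform, assuming small rotation angles and bandwidth so that simple FFTs suffice (larger apertures would require polar reformatting); the array layout and function name are ours, not the paper's.

    import numpy as np

    def isar_image(s):
        """Turntable ISAR imaging sketch. s[k, m] holds the complex backscatter
        measured at frequency index k and aspect-angle index m. An inverse FFT
        over frequency yields down-range profiles (Step 4); an FFT over angle
        within each range cell yields the cross-range dimension (Step 5)."""
        range_profiles = np.fft.ifft(s, axis=0)         # frequency -> down-range
        image = np.fft.fft(range_profiles, axis=1)      # angle -> cross-range
        return np.abs(np.fft.fftshift(image, axes=1))   # center zero cross-range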
3 Target Recognition
Supported by the radar target electromagnetic scattering characteristics image, the ATR of specific military targets is performed according to the flowchart of electromagnetic scattering-based ATR shown in Fig. 5.
Fig. 5. The flowchart of electromagnetic scattering-based ATR in SAR. Front end (focus of attention): SAR image -> CFAR detection -> discriminant 1 and discriminant 2 -> ROI. Back end (predict, extract, match, search): the 3D model drives a predict-extract-match-search loop in which the predicted scattering distribution vector is matched against the feature vector extracted from the ROI, producing the match result and the ATR output
First, potential target regions are obtained from a large SAR image by a detection window and constant false alarm rate (CFAR) detection; the detection window is sized according to the target to be recognized. Second, several fast discrimination operators are used to reduce the false alarm rate over the segmented regions of interest (ROIs). Last, the method of affine transforms is used to extract the invariant features of the image contours of several types of airplanes. The image contours of the airplanes are acquired in real time from the space distribution image of backscatter characteristics through Cellular Neural Networks, and the invariant features of the airplane contours are acquired correspondingly. Then a feature vector space is created from the target templates and the segmented potential targets, and ATR is completed by searching for the best-matched target.
3.1 CFAR Detection
A high-speed target detection algorithm is obtained based on the CFAR technique and the target variance character. This CFAR detector is composed of two passes, a horizontal CFAR and a vertical CFAR. The overlap of adjacent reference windows and the distribution character of the image are used to speed up the parameter estimation, and the variance character of the target is utilized to reduce the effect of bright natural clutter, such as grass and trees, so as to lower the false alarm rate. The basic principle of the CFAR detector is as follows: for every pixel, take a specific region around it as the reference window, determine a threshold x_0 from the statistical character of the reference window, and make the detection with a constant false alarm rate P_FA:

P_{FA} = \int_{x_0}^{\infty} f_x(X \mid \mathrm{clutter})\, dX .  (1)
3.2
if (xc > x0 ) then xc is target pixel point esle xc is a clutter point
(2)
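A minimal sketch of a cell-averaging CFAR in this spirit is shown below. It is a single-pass scheme rather than the paper's two-pass horizontal/vertical detector, it assumes Gaussian clutter so that the threshold $x_0 = \text{mean} + k \cdot \text{std}$ realizes a chosen $P_{FA}$, and all names (cfar_detect, window, k) are illustrative.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def cfar_detect(image, window=21, k=3.0):
    """Per-pixel CFAR: estimate clutter mean/std in a reference window and
    threshold at x0 = mean + k*std, realizing Eq. (1) under a Gaussian
    clutter model; Eq. (2) is the final comparison."""
    image = image.astype(float)
    mean = uniform_filter(image, size=window)
    sq_mean = uniform_filter(image ** 2, size=window)
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 1e-12))
    x0 = mean + k * std
    return image > x0   # True -> target pixel, False -> clutter pixel
```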
3.2 SAR Image Target Discrimination
The vector quantization (VQ) discriminator is based on the silhouette of the target [8]; it can eliminate the false alarms caused by the bright clutter of man-made scenery, such as buildings. The vector quantization discriminator is defined as:
$$\text{if } \exists\, i \ \text{s.t.}\ d(X_0, W_i) < E \ \text{then ROI is a target, else ROI is a false alarm} \qquad (3)$$

The feature vectors of the training samples are obtained by the Radon transform, and the feature code table $W$ is then obtained by vector quantization coding. For every target region obtained during detection, the Radon transform is used to compute its feature vector $X_0$. Finally, $X_0$ is compared with $W$: if all distances between $X_0$ and the codes in $W$ are greater than $E$, the ROI contains only clutter; as soon as one distance is less than $E$, the ROI is judged to contain a target. The peak power ratio (PPR) is defined as the energy of the brightest $a$ percent of all pixels in the target region relative to the total energy of the target region. The threshold $T$ is set from training data. The PPR discriminator is defined as:
$$\text{if } (PPR > T) \text{ then ROI is a target, else ROI is a false alarm} \qquad (4)$$
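A hedged sketch of the two discriminators of Eqs. (3) and (4), assuming the Radon-transform features and the codebook W are already computed; the function names and the percentage convention are illustrative rather than the paper's implementation.

```python
import numpy as np

def vq_discriminate(x0, codebook, E):
    """Eq. (3): the ROI is a target iff some codeword of W lies within
    distance E of the Radon-transform feature vector X0."""
    dists = np.linalg.norm(codebook - x0, axis=1)
    return bool(np.any(dists < E))

def ppr_discriminate(roi, a_percent, T):
    """Eq. (4): peak power ratio -- energy of the brightest a percent of
    pixels over the total energy of the target region."""
    power = np.sort((roi.astype(float) ** 2).ravel())[::-1]
    n_peak = max(1, int(len(power) * a_percent / 100.0))
    ppr = power[:n_peak].sum() / power.sum()
    return ppr > T   # True -> target, False -> false alarm
```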
3.3 Target Recognition Based on Backscatter Characteristics
SAR image recognition must cope with scale, rotation and translation transforms, so finding features invariant to all of these transforms is very important. In this paper, the affine transform method [7]
is used to extract the invariant features of the image contours of several types of airplanes. The contours are acquired in real time from the space distribution image of backscatter characteristics through cellular neural networks, and the invariant features of the airplane contours are computed correspondingly. To recognize an airplane type, the first step is to calculate the invariant features of the contour of the potential airplane in the real-time SAR image and of the contour of each hypothesized target in its ISAR image. The second step is to calculate the correlation coefficient between the contour invariant features of the potential airplane target and those of every airplane image obtained from the space distribution image of backscatter characteristics. The maximum correlation coefficient indicates the type of the airplane to be recognized, as sketched below.
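A minimal sketch of this final matching step, assuming the invariant contour feature vectors have already been extracted for the candidate and for each simulated airplane type; all names are illustrative.

```python
import numpy as np

def recognize(candidate_features, template_features):
    """Pick the airplane type whose contour-invariant feature vector is
    most correlated with that of the potential target (Section 3.3)."""
    best_type, best_rho = None, -1.0
    for plane_type, tmpl in template_features.items():
        rho = np.corrcoef(candidate_features, tmpl)[0, 1]
        if rho > best_rho:
            best_type, best_rho = plane_type, rho
    return best_type, best_rho
```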
4 Experimental Results
An airfield is selected as the test region; the experimental data are domestic airborne high-resolution Ku-band SAR images with a resolution of 0.3 × 0.3 m. The imaging parameters are as follows: the slant range is 15 km, the depression angle is 15.8°, and the flight line is parallel to the airfield runway. The test region is 2 × 2 km; a miniature of it is shown in Fig. 6. The full-resolution SAR image is shown in Fig. 7(a), and Fig. 7(b) is the corresponding QuickBird image. The aim of this experiment is to automatically identify a specific type of airplane. A three-dimensional model of the airplane is built (see Fig. 8(a)); it is composed of 320 nodes and 410 facets. From the 3D model, the HM is used to calculate the RCS of the airplane, and the space electromagnetic scattering distribution image is acquired by the principle of inverse synthetic aperture radar (see Fig. 8(b)). The imaging parameters are as follows: the radar incidence angle is 74.2°, the imaging center frequency is 10 GHz,
Fig. 6. SAR image in experimental region
Fig. 7. (a) Full-resolution SAR image; (b) QuickBird image
Fig. 8. (a) 3D model of the airplane; (b) scattering characteristics image
the bandwidth is 1.28 GHz, and the angle window is 0.15 radian. The azimuth angle of the simulated scattering characteristics image is 70°. During automatic target recognition in the SAR image, the size of the CFAR window is calculated from the true size of the airplane (width 34.2 m, length 38.8 m) and the resolution (0.3 × 0.3 m) of the SAR image, and 58 potential targets are acquired. To reduce the dimension of the search space during matching, the vector quantization (VQ) discriminator and the peak power ratio (PPR) discriminator are applied, which reduces the number of potential airplane targets to 28. The SAR image after discrimination is then decisively matched, by correlation measure, with the calculated electromagnetic scattering characteristics image of the airplane, and the best match is obtained (see Fig. 9). The plane in the white square is the potential target of maximal correlation; its azimuth angle (70°) is consistent with the scattering characteristics image.
Fig. 9. Target recognition result
5 Conclusion
In this paper, the plane-facet representation is used to describe the 3D surface model of the target: the aircraft is described as a collection of facets and wedges, whose principal advantages are its simplicity and its capacity to represent any geometry. An improved hybrid method (HM) of microwave backscattering calculation is proposed. The improved method of equivalent currents (IMEC) effectively eliminates the infinities on the shadow and reflection boundaries, and the AP/PO is better suited to computing the multiple scattering. With this new efficient hybrid method (PO+IMEC+AP/PO+GO), the high-frequency radar cross section (RCS) of electrically large, complex radar targets is calculated. Furthermore, according to the principle of inverse synthetic aperture radar (ISAR) imaging, the space distribution of the target's electromagnetic scattering characteristics is simulated. Based on the scattering characteristics image of a specific airplane acquired by the above method, an airfield is selected as the test region, and domestic airborne high-resolution Ku-band SAR images are used to automatically identify the airplane in the SAR image. The experimental results prove the validity of electromagnetic scattering characteristics ISAR imaging and the feasibility of SAR-image ATR based on it.
Acknowledgment. This work is partially supported by the National Natural Science Foundation of China under Grant No. 60272022 to J.S. Song. The airborne SAR image data are supplied by the China Electronic Technology Corporation No. 38 Institute.
References
1. Kuang, G.Y., Ji, K.F., Su, Y., Yu, W.X.: A Survey of Researches on SAR ATR. Chinese Journal of Image and Graphics 8(10) (2003) 1115-1120
2. Ruan, Y.Z.: Radar Cross Section and Hiding Technology. Publishing House of Defence Industry, Beijing (2001)
3. Mensa, D.L.: High Resolution Radar Cross-Section Imaging. Artech House, Boston, London (1991)
4. Youssef, N.N.: Radar Cross Section of Complex Targets. Proc. IEEE 77(5) (1989) 722-734
5. Rius, J.M., Ferrando, M., Jofre, L.: High-Frequency RCS of Complex Radar Targets in Real-Time. IEEE Trans. Antennas and Propagat. 41(9) (1993) 1308-1319
6. Domingo, M., Rivas, F., Perez, J., Torres, R.P., Catedra, M.F.: Computation of the RCS of Complex Bodies Modeled Using NURBS Surfaces. IEEE Antennas and Propagat. Magazine 37(6) (1995) 36-47
7. Zhang, Y.H., Yang, X.Q., Guo, H.T.: Aircraft Image Recognition Based on Affine Transformation. Acta Aeronautica et Astronautica Sinica 24(3) (2003) 251-254
8. Zhang, C.: Research on Automatic Target Recognition in High Resolution SAR Images. National University of Defense Technology, Changsha (2003)
9. Rius, J.M., Ferrando, M., Jofre, L.: High-Frequency RCS of Complex Radar Targets in Real-Time. IEEE Trans. Antennas and Propagat. 41(9) (1993) 1308-1319
Boosting in Random Subspace for Face Recognition

Yong Gao and Yangsheng Wang

Institute of Automation, Chinese Academy of Sciences, Beijing 100080, P.R. China
[email protected]
Abstract. Boosting is a general method for improving the performance of any learning algorithm. In this paper, an unusual regularization technique, boosting in random subspace, is employed to improve the generalization capability of boosting. Meanwhile, the space complexity of training is lowered, and a classifier combination strategy can be used to further improve recognition accuracy. Using the method, we achieved a 98.99% rank-1 recognition rate on the FERET fb probe set.
1 Introduction
Boosting is a general method for improving the accuracy of any given learning algorithm. As an important and typical pattern classification problem, face recognition is a major application field of boosting, and many boosting-based algorithms have been proposed for it [1] [2] [3] [4]. The main difference among these algorithms is that they boost the weak learner in different feature spaces (Haar-like features, Gabor features, Local Binary Pattern (LBP) [5] features, etc.); there is little optimization work on boosting itself according to the type of features. In this paper we propose a novel boosting framework for face recognition: instead of boosting the weak learner in the original feature space, only a randomly selected feature subspace is used. We argue that using a randomly selected feature subspace indirectly lowers the VC dimension [6] of the original base classifier function space, so classifiers trained in the random subspace have better generalization capability than those trained in the original feature space under the same training error. A side product is that the space complexity of training is also lowered. Moreover, we can produce several random subspaces and combine the classifiers trained in them, which further improves classification accuracy because of the different properties of these classifiers. All of these deductions are justified by our extensive experiments on the FERET database [7].
2 Related Work and Motivation
In [4], Zhang et al. proposed a boosting-based face recognition algorithm. By scanning the face image with a scalable sub-window, over 7,000 sub-regions are obtained, from which corresponding Local Binary Pattern (LBP) [5] histograms are
extracted. Using the concepts of intra-personal variation and extra-personal variation, face recognition is converted from a multi-class pattern recognition problem into a two-class one. The Chi-square distances between corresponding LBP histograms are used as discriminative features for this intra/extra-personal classification. Then AdaBoost [8] is employed to learn a classifier for the two-class problem: does a pair of face images come from the same person or from different persons? The Chi-square statistic of histograms in sub-windows is a feature with strong discriminant power. In [5], a face image is equally divided into 7 × 7 sub-regions, from which LBP histograms are extracted, and recognition is performed with a nearest neighbor classifier in the computed feature space using the Chi-square distance as the dissimilarity measure. With 49 (7 × 7 sub-regions) Chi-square distances, the weighted version of the algorithm achieves impressive recognition accuracy on the FERET database. To further demonstrate the strong discriminant power of the Chi-square distance of histograms in sub-regions, we implemented the algorithm proposed in [4]. The training error curve on the FERET database is shown in Fig. 1. The weak learner is a stump classifier, that is, only one feature is used in each weak classifier. The training error is only 0.052 after the first feature is learned, and almost reaches zero after 24 features are learned.
Fig. 1. Training error on FERET database using the algorithms proposed in [4], boosting with LBP based Chi square statistic of histograms in sub-windows. The weak learner is stump classifier.
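For concreteness, the Chi-square dissimilarity used as the feature here can be computed as below. This is a generic sketch, not the authors' code; the small eps guard is an added assumption to avoid division by zero on empty bins.

```python
import numpy as np

def chi_square(h1, h2, eps=1e-10):
    """Chi-square dissimilarity between two (sub-window) LBP histograms,
    the discriminative feature boosted in [4] and used as the
    nearest-neighbour distance in [5]."""
    h1, h2 = np.asarray(h1, float), np.asarray(h2, float)
    return np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```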
Boosting is a family of methods for generating a combination that uses only simple rules but combines them so that the combined rule fits the training data well [9]. The generalization capability comes mainly from the simplicity of the basic classifiers. Freund and Schapire [10] bound the generalization error of the final classifier in terms of its training error, the size m of the sample, the VC-dimension d of the base classifier space and the number of rounds T of boosting.
They show that the generalization error of the final classifier H(x) is, with high probability, at most

$$\tilde{\Pr}[H(x) \neq y] + \tilde{O}\left(\sqrt{\frac{Td}{m}}\right) \qquad (1)$$

where $\tilde{\Pr}[\cdot]$ denotes the empirical probability on the training sample. (Note that Schapire et al. later gave an alternative analysis in terms of the margins of the training examples [11], and the new bound is entirely independent of T. However, other researchers found that boosting forever can over-fit the data [12], so we still use the earlier analysis here.) Let $g(\cdot)$ denote the stump classifier and $\phi(\cdot)$ the mapping function from the original image space $R^d$ to the feature space $R^1$. For the LBP-based Chi-square statistic of histograms, $\phi(\cdot)$ is a nonlinear mapping. In the above boosting recognition algorithm, the true base classifier is

$$G(x) = g(\phi(x)), \qquad (2)$$
not $g(\cdot)$, as is often confused. The VC-dimension of the space of functions $G(\cdot)$ is much higher than that of the stump classifier function space, and it is not an easy task to determine the VC-dimension of a space of functions $G(x)$. A frequently encountered question is "What classifier can be boosted?" This is a very difficult problem, part of the difficulty coming from the unknown VC-dimension. If boosted decision stump classifiers are used, as analyzed in Section 2, the question becomes "What features can be used together with the boosting classification algorithm?"
3 Boosting in Random Subspace
To partly relieve this problem, an unusual regularization technique, boosting in random subspace, is proposed. Only part of the dimensions of a given feature space are randomly selected; all samples are projected onto this subspace, and boosting is carried out on the projected training samples. The effect of this process is equivalent to indirectly lowering the VC-dimension of the base classifier space, so better generalization capability can be expected.

3.1 Two Boosting Methods
Based upon the above idea, we propose two boosting methods, Boosting Method 1 and Boosting Method 2, which make use of random subspaces. The two methods are given in Fig. 2. In Boosting Method 1, a subspace is randomly selected from the original feature space, and boosting is carried out on the training samples projected into this subspace; this improves generalization capability by indirectly weakening the discriminant power of the base classifier space. Boosting Method 2 generates multiple random subspaces, so multiple classifiers are produced, which are then combined with a simple fusion scheme,
the sum rule; the improved accuracy mainly comes from classifier combination. Boosting Method 1 has one parameter L, the dimensionality of the random subspace; Boosting Method 2 has two parameters, L and M, where M is the number of random subspaces. They can be decided using cross-validation.

Boosting Method 1
Training:
Step 1: Create the feature space $F = \{\phi_1, \phi_2, \ldots, \phi_N\}$.
Step 2: Randomly select a feature subspace $F' = \{\phi'_1, \phi'_2, \cdots, \phi'_L\}$ with dimensionality L from F.
Step 3: Project all training samples into the random subspace $F'$.
Step 4: Carry out boosting using the projected training samples, and output the classifier H(x).
Recognition: Evaluate the classifier to get the similarity score S(x) = H(x).

Boosting Method 2
Training:
Step 1: Create the feature space $F = \{\phi_1, \phi_2, \ldots, \phi_N\}$.
Step 2: Generate M random feature subspaces $F'_1, F'_2, \cdots, F'_M$ with dimensionality L.
Step 3: For $i = 1, 2, \cdots, M$: project all training samples into the random subspace $F'_i$, carry out boosting using the projected training samples, and output the classifier $H_i(x)$.
Recognition:
Step 1: For $i = 1, 2, \cdots, M$: evaluate the classifier to get the similarity score $S_i(x) = H_i(x)$.
Step 2: Combine the scores using the sum rule $S(x) = \sum_{i=1}^{M} S_i(x)$.

Fig. 2. Two boosting methods based on random subspace
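A compact sketch of both methods is given below. It stands in scikit-learn's AdaBoostClassifier (whose default weak learner is a depth-1 decision stump) for the paper's boosting step; the function names, the n_estimators value and the random seed are illustrative assumptions, not the authors' configuration.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)

def train_method1(X, y, L):
    """Boosting Method 1: boost decision stumps in one random
    L-dimensional subspace of the N-dimensional feature space."""
    idx = rng.choice(X.shape[1], size=L, replace=False)  # random subspace F'
    clf = AdaBoostClassifier(n_estimators=100).fit(X[:, idx], y)
    return idx, clf

def train_method2(X, y, L, M):
    """Boosting Method 2: M independent random subspaces, one boosted
    classifier per subspace; scores are later fused with the sum rule."""
    return [train_method1(X, y, L) for _ in range(M)]

def score_method2(models, X):
    # Sum rule: S(x) = sum_i S_i(x), with S_i the margin of classifier H_i.
    return sum(clf.decision_function(X[:, idx]) for idx, clf in models)
```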
3.2 Random Subspace
In [13], T. K. Ho first showed that combining multiple trees constructed in randomly selected subspaces can achieve a nearly monotonic increase in generalization accuracy while preserving perfect accuracy on the training data, provided that the features are sufficient to distinguish all samples belonging to different classes. The theory behind Ho's algorithm is stochastic discrimination (SD), which studies the combination of various ways to partition the feature space [14]. In the SD theory, classifiers are constructed by combining many components that have weak discriminative power but generalize very well. The Boosting Method 2 proposed here may have some relation with the SD theory; however, the component classifiers output by boosting are not weak, so it is more natural to find its theoretical basis in classifier fusion. Classifiers trained in different subspaces, randomly produced from the original feature space, have different characteristics, which is a very important condition for classifier
combination. With this property, combining the outputs of these classifiers will achieve improved accuracy, and Boosting Method 2 will achieve higher accuracy than Boosting Method 1. Boosting Method 1 weakens the discriminant power of the original feature space, which, in a sense, produces the effect of lowering the VC-dimension of the base classifier function space. Under the same empirical error, higher accuracy is expected than with boosting in the original feature space.
4 Experiments
A series of experiments was carried out on the FERET database [7] to test the two proposed boosting methods, using the Chi-square statistic of histograms in sub-windows as features and a stump classifier as the weak learner. By changing the sub-window size and position, a total of 11,700 features are created. The FERET standard training set includes 736 images of 314 subjects, which yields 592 intra-personal pairs and 269,888 extra-personal pairs. Because the number of extra-personal pairs is too large, we randomly selected 7,000 of them as training samples, while all intra-personal pairs are used. All images are cropped and rectified according to the manually located eye coordinates; the normalized face images are 142 pixels high by 120 pixels wide.

4.1 Experiments on Boosting Method 1
In these experiments, we test Boosting Method 1. One of our purposes is to compare the recognition accuracies of Boosting Method 1 and of boosting in the original feature space. To avoid the influence of other factors, we do not use the cascade structure [15], that is, only one stage is trained, so the experimental results may differ from those obtained with a cascade. To compare generalization capabilities under the same training error, we still use the training error as the stopping criterion. The dimensionality L of the random subspace is the only parameter of Boosting Method 1. To shed light on the effect of the subspace dimensionality on recognition accuracy, eight kinds of random subspaces with different dimensionality are generated from the 11,700-dimensional feature space: 600, 800, 1000, 2000, 3000, 4000, 5000 and 6000 dimensions. For every dimensionality, at least 12 random subspaces are produced. The results on the FERET standard probe sets fb, fc, dupI and dupII are shown in Figs. 3, 4, 5 and 6. The blue solid lines are the rank-1 recognition rates of boosting in the original 11,700-dimensional feature space; the red dashed lines are the mean rank-1 recognition rates in random subspaces with the dimensionality indicated on the x-axis; error bars are standard deviations. From these results, we find that i) Boosting Method 1 achieves better recognition accuracy than boosting in the original LBP feature space under the same training error; ii) the recognition accuracies differ with the dimensionality of the random subspaces, and there is a peak for random
Fig. 3. Rank-1 recognition rate comparison between Boosting Method 1 and boosting in the original feature space on the FERET fb probe set. The error bar is the standard deviation.
Fig. 4. Rank-1 recognition rate comparison between Boosting Method 1 and boosting in the original feature space on the FERET fc probe set. The error bar is the standard deviation.
subspaces with 1000 dimensions. However, there is no clear relationship between the dimensionality L and the recognition accuracy, which is a little contrary to our expectation.

4.2 Experiments on Boosting Method 2
For Boosting Method 2, we use the same setup as in the previous experiments. The number M of random subspaces is an extra parameter; we test the effect of
Fig. 5. Rank-1 recognition rate comparison between Boosting Method 1 and boosting in original feature space on FERET dupI probe set. The error bar is standard deviation.
Fig. 6. Rank-1 recognition rate comparison between Boosting Method 1 and boosting in original feature space on FERET dupII probe set. The error bar is standard deviation.
different values of M, from 4 to 12, on the final recognition accuracy. The test results on the FERET fb probe set are shown in Table 1; the corresponding minimum, mean and maximum rank-1 recognition rates of Boosting Method 1 are listed in the last three columns for comparison. First, it is very clear that Boosting Method 2 greatly improves recognition accuracy compared with Boosting Method 1: its recognition rates are higher than even the maximum ones of Boosting Method 1. The
Table 1. Rank-1 recognition rate comparison between Boosting Method 2 and Boosting Method 1 on the FERET fb probe set. M is the number of random subspaces.
         Boosting Method 2 (M)                                            Boosting Method 1
L        4      5      6      7      8      9      10     11     12      Min    Mean   Max
600    0.979  0.978  0.979  0.982  0.982  0.982  0.985  0.985  0.983    0.936  0.950  0.956
800    0.986  0.987  0.989  0.987  0.987  0.989  0.990  0.990  0.990    0.962  0.968  0.978
1000   0.990  0.990  0.989  0.990  0.989  0.989  0.990  0.990  0.989    0.959  0.971  0.978
2000   0.968  0.972  0.975  0.977  0.979  0.977  0.977  0.978  0.977    0.922  0.933  0.941
3000   0.975  0.974  0.976  0.976  0.975  0.978  0.978  0.978  0.979    0.940  0.950  0.964
4000   0.976  0.976  0.976  0.977  0.980  0.982  0.981  0.982  0.981    0.921  0.945  0.961
5000   0.977  0.975  0.977  0.978  0.979  0.978  0.978  0.978  0.980    0.937  0.949  0.956
6000   0.967  0.966  0.968  0.968  0.969  0.969  0.972  0.970  0.971    0.905  0.931  0.960
Table 2. Recognition accuracies of several top recognition methods on the FERET fb test set

Method     Fisherface   GFC    Result of [5]   W-LGBPHS
Accuracy     0.94       0.96       0.97          0.98
improvement is about 1-4% on the fb set compared to the corresponding maximum recognition rates of Boosting Method 1. Our other works [16] [17] also demonstrate the effectiveness of Boosting Method 2 with other kinds of features for face recognition. Second, as M increases, the recognition rates of Boosting Method 2 show an improving trend, though it is not very significant. For example, in Table 1, the recognition rate of Boosting Method 2 with four 600-dimensional random subspaces is 0.979; when the number of random subspaces reaches 7, the recognition rate is 0.982, and it is 0.985 when 10 random subspaces are used. The same phenomenon is observed on the other probe sets and for random subspaces of different dimensionality. In practical applications, however, more random subspaces mean more features to be stored and a longer total training time, so there is a tradeoff between complexity and accuracy in the selection of M. In Table 2, we also report the rank-1 recognition rates of some top face recognition algorithms, including Fisherface [18], Gabor Fisher Classifier (GFC) [19], the LBP-based method [5] and weighted LGBPHS [20]. Boosting Method 2 with 800- or 1000-dimensional random subspaces is comparable with any of these top algorithms.

4.3 Computation Complexity
On a computer with a P4 2.6 GHz CPU and 2 GB RAM, the average time for training a single classifier in the original feature space and in random subspaces of different dimensionality is reported in Table 3. The time complexity of the proposed algorithm with a proper number M of random subspaces is comparable
Table 3. Average training time using LBP features on the FERET face database. D and T denote dimensionality and training time respectively.

D   11,700    1000     2000     3000     4000     5000
T   2 hours   30 min   34 min   40 min   47 min   50 min
with the original boosting method. Moreover, the new boosting method can be distributed over multiple processors, which can further reduce the training time.
5 Conclusions
In this paper, we proposed an unusual regularization method, boosting in random subspace, to improve the generalization capability of boosting when the discriminant power of the feature space is too strong. Guided by this idea, two boosting methods were proposed. Boosting Method 1 carries out boosting in a randomly selected feature subspace, which, we believe, indirectly lowers the VC-dimension of the base classifier function space and improves generalization capability; our extensive experiments on the FERET database have shown that, under the same training error, Boosting Method 1 has better recognition accuracy than boosting in the original feature space. Boosting Method 2 is, in fact, the combination of classifiers trained with Boosting Method 1. Because these classifiers are trained in different random subspaces, they have different generalization characteristics, which provides a solid foundation for further classifier fusion; using the simple sum rule, Boosting Method 2 achieves better recognition accuracy than Boosting Method 1, which is also verified by a series of experiments on the FERET database. We then compared the two methods with other state-of-the-art algorithms on the FERET database; the results demonstrate that our methods are favorably comparable with them.
References
1. Jones, M., Viola, P.: Face Recognition Using Boosted Local Features. MERL Technical Report 25 (2003)
2. Yang, P., Shan, S., Gao, W., Li, S.Z., Zhang, D.: Face Recognition Using AdaBoosted Gabor Features. FGR (2004)
3. Zhang, L., Li, S.Z., Qu, Z.Y., Huang, X.: Boosting Local Feature Based Classifiers for Face Recognition. CVPR Workshops (2004)
4. Zhang, G., Huang, X., Li, S.Z., Wang, Y., Wu, X.: Boosting Local Binary Pattern (LBP)-Based Face Recognition. Sinobiometrics'04 (2004)
5. Ahonen, T., Hadid, A., Pietikäinen, M.: Face Recognition with Local Binary Patterns. ECCV (2004) 469-481
6. Vapnik, V.N., Chervonenkis, A.Y.: On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities. Theory of Probability and its Applications (1971)
7. Phillips, P.J., Moon, H., Rizvi, S.A., Rauss, P.J.: The FERET Evaluation Methodology for Face Recognition Algorithms. IEEE Trans. on PAMI 22 (2000)
8. Freund, Y., Schapire, R.E.: A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences 55 (1997) 119-139
9. Freund, Y.: An Introduction to Boosting Based Classification. Proceedings of the AT&T Conference on Quantitative Analysis (1998)
10. Schapire, R.E., Singer, Y.: Improved Boosting Algorithms Using Confidence-rated Predictions. Proceedings of the Eleventh Annual Conference on Computational Learning Theory (1998) 80-91
11. Schapire, R.E., Freund, Y., Bartlett, P., Lee, W.S.: Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods. The Annals of Statistics 26 (1998) 1651-1686
12. Grove, A., Schuurmans, D.: Boosting in the Limit: Maximizing the Margin of Learned Ensembles. Proceedings of the Fifteenth National Conference on Artificial Intelligence (1998)
13. Ho, T.K.: Random Decision Forests. Proc. Third Int'l Conf. Document Analysis and Recognition (1995) 278-282
14. Kleinberg, E.M.: An Overtraining-Resistant Stochastic Modeling Method for Pattern Recognition. Annals of Statistics 24 (1996) 2319-2349
15. Viola, P., Jones, M.: Robust Real Time Object Detection. ICCV (2001)
16. Gao, Y., Wang, Y.: Boosting in Random Subspaces for Face Recognition. Accepted by ICPR06 (2006)
17. Gao, Y., Wang, Y., Feng, X., Zhou, X.: Boosting Gabor Features for Face Recognition Using Random Subspace. Accepted by ICASSP06 (2006)
18. Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J.: Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Trans. on PAMI 19 (1997)
19. Liu, C., Wechsler, H.: Gabor Feature Based Classification Using the Enhanced Fisher Linear Discriminant Model for Face Recognition. IEEE Trans. Image Processing 11 (2002) 467-476
20. Zhang, W., Shan, S., Gao, W., Chen, X., Zhang, H.: Local Gabor Binary Pattern Histogram Sequence (LGBPHS): A Novel Non-Statistical Model for Face Representation and Recognition. ICCV (2005)
Component-Based Human Body Tracking for Posture Estimation

Kyoung-Mi Lee

Intelligent Multimedia Lab., Department of Computer Science, Duksung Women's University, Seoul 132-714, Korea
[email protected] http://imlab.duksung.ac.kr/
Abstract. To track a human body and estimate its posture, a component-based approach is less susceptible to changes in posture and lighting. This paper proposes model-based tracking with a component-based human body model comprising 10 components and their hierarchical links. The proposed method first divides a video frame into blobs based on color, groups the blobs into components, and matches the components to human body parts. Instead of matching blobs individually, the proposed model-based tracking uses components and their links. The matching is performed as a coarse-to-fine search, which makes human-body matching more time-efficient.
1 Introduction

Determining a human body's shape and estimating its posture is an important issue in computer vision and poses several difficulties, because the human body is a non-rigid form. A full-body approach suffers under large deformations because it is affected significantly by changes in posture and lighting conditions. Therefore, a component-based approach has been widely adopted, which considers each part of the human body as a component and subsequently uses the relationships between the components to represent the entire human body. Human posture can be estimated by tracking body parts in video frames. One simple approach is to group pixels by color into blobs, as parts of components, and to map them from the previous frame to the current frame. Such a blob-based approach allows easy image processing, but causes problems because the number of blobs differs in each frame [5,6]. The other approach is to build complete components, use their configuration in the current frame, and predict the next configuration [2]. Such a dynamic configuration, however, requires complex image processing and suffers from difficult data association. In this paper, we use a component-based human body model to determine the human body's parts and a model-based approach to track those parts. The proposed model-based tracking system treats a set of blobs as a component of the human model instead of tracking the blobs individually, which prevents tracking from being affected by errors that can occur in image processing.
2 Component-Based Human Body Model

We use a human-body model that consists of 10 body components connected in a hierarchical manner (Fig. 1). Each component contains geometric information (position, relative size, and shape) and appearance information (average color and standard deviation). Each component also includes information on its links to other components, such as which of the four sides representing each component are connectible, the names of the parts being connected, the connecting angles and the connecting distances. The human-body model can be represented as:

$$HB_i = (g_i, a_i, R_i), \qquad i = 1, \ldots, I \qquad (1)$$

where the model consists of I (= 10) components; $g_i$ and $a_i$ refer to each component's geometric and appearance information, respectively, while $R_i$ represents the information on the links between components.
Fig. 1. A component-based hierarchical human model consists of a head (H), a torso (T), a left upper arm (LUarm), a left lower arm (LLarm), a right upper arm (RUarm), a right lower arm (RLarm), a left upper leg (LUleg), a left lower leg (LLleg), a right upper leg (RUleg), and a right lower leg (RLleg). Lines between components mean their hierarchical relations for matching.
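A minimal sketch of how the model of Eq. (1) might be held in code; the field layout and the part abbreviations follow Fig. 1, while the concrete types are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Component:
    """One body part HB_i = (g_i, a_i, R_i) of the 10-component model."""
    name: str                        # e.g. "H", "T", "LUarm", ...
    geometry: Tuple[float, ...]      # g_i: position, relative size, shape
    appearance: Tuple[float, ...]    # a_i: average colour, std deviation
    links: Dict[str, dict] = field(default_factory=dict)  # R_i: connectible
                                     # side, connected part, angle, distance

def make_model() -> List[Component]:
    parts = ["H", "T", "LUarm", "LLarm", "RUarm", "RLarm",
             "LUleg", "LLleg", "RUleg", "RLleg"]
    return [Component(p, (), ()) for p in parts]
```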
3 Matching with a Component-Based Model

In this section we introduce an approach that matches the component-based human body model to image frames. The proposed approach compensates for illumination noise, separates the human body from the background, forms blobs by grouping pixels with similar colors, initializes the human model with sets of blobs, and matches the blobs to the corresponding components using hierarchical relations.

3.1 Adaptive Background Subtraction

Video frames taken by a camera vary in lighting conditions caused by the light sources, the time of day, and so on. Because such noise inhibits tracking, the
lighting noise should be removed from the images. To separate noise from an image, an intrinsic image can be used: a noise image is obtained by subtracting the intrinsic image from the image frame. Recently, Matsushita et al. proposed a method for time-dependent intrinsic image estimation [4]. In this paper, we update the noise image frame by frame to estimate a time-varying intrinsic image: we first initialize the noise image by subtracting the first image frame from the intrinsic image, and then update it frame by frame; if a pixel is similar to a noise pixel, the pixel is updated. To detect the human body after illumination correction, background subtraction provides the most complete feature data. We build an adaptive background model using the background's mean and standard deviation. Whenever a new frame arrives, the change in pixel intensity is computed using the Mahalanobis distance to classify the pixel as background or foreground (human). The evaluated distance is compared to a difference threshold previously observed from the sequence of images; if a pixel is classified as background, the adaptive background model is updated with that pixel.

3.2 Human Model Initialization

After background subtraction, tracking of human body parts should be initiated when they start to appear in the video. To group segmented foreground pixels into blobs and to locate the blobs on body parts, we use a connected component algorithm that compares the intensities of a pixel and its neighbors; small blobs are then merged into large ones, and neighboring blobs that share similar colors are further merged to overcome the over-segmentation generated by the initial grouping. Each blob contains information such as area, central position, color, bounding box and boundary. Some of the created blobs are then removed according to criteria such as being too small, too long, or too heterogeneous and incompact. As a human can be defined as a subset of blobs corresponding to human body parts, the blobs in a frame should be assigned to the corresponding individuals to facilitate multiple-individual tracking. Let $P_0$ be a subset of the blobs $B_i$. The set of potential person areas is built iteratively, starting from the set $P_0$ and its adjacency graph. Set $P_1$ is obtained by merging compatible adjacent blobs of $P_0$; each new set of merged blobs $P_k$ is then obtained by merging $P_{k-1}$ with the original set $P_0$. Finally, a candidate body part $HB_i$ contains the assembled sets of merged blobs, i.e., $HB_i = \bigcup_{k=0}^{K} P_k$.

3.3 Hierarchical Matching

Estimating postures with a component-based hierarchical human body model is a matching process used to configure a proper human body model by combining the detected components. The human body model (Eq. (1)) is matched by taking the combination that has the least variance, or the greatest probability of matching:

$$d_i = \sum_i \min_j \left\| g_i - p_j \right\| + \sum_i \min_j \left\| a_i - q_j \right\| + \sum_i \min_j \left\| R_i - r_j \right\| \qquad (2)$$
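A minimal sketch of the matching cost of Eq. (2). For simplicity it assumes the geometric and appearance terms are Euclidean norms and omits the link term; all names are illustrative.

```python
import numpy as np

def match_cost(model_g, model_a, blobs_g, blobs_a):
    """Eq. (2) without the link term: for each model component take the
    closest detected component in geometry and in appearance, and sum."""
    cost = 0.0
    for g_i, a_i in zip(model_g, model_a):
        cost += min(np.linalg.norm(np.asarray(g_i) - p_j) for p_j in blobs_g)
        cost += min(np.linalg.norm(np.asarray(a_i) - q_j) for q_j in blobs_a)
    return cost
```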
where $(p_j, q_j, r_j)$ refer to the geometric information, the appearance information, and the relationships to other components of the j-th detected component. To match the model to a human body, we adopt coarse-to-fine searches. Such searches limit the relationships between components to tree structures in a top-down model and restrict the matching sequence hierarchically, so as to reduce the search range for the different components. For example, the left arm is matched prior to a detailed search for the upper/lower left arm, and the left arm's matching score $d_{Larm}$ is taken to be the sum of the matching scores of the upper and lower left arm ($d_{LUarm}$, $d_{LLarm}$): $d_{Larm} = d_{LUarm} + d_{LLarm}$. The same hierarchical technique is applied iteratively to detect the other parts.
4 Model-Based Human Body Tracking

Tracking a human body poses several difficulties because the human body is a non-rigid form. After forming blobs, blob-based human body tracking maps blobs from the previous frame to the current frame by computing the distance between blobs in consecutive frames. Such a blob-based approach to tracking multiple body parts may cause problems, however, because the number of blobs differs in each frame: blobs can be split or merged, and can even disappear or be newly created. Many-to-many blob mapping can be applied to overcome this situation. We assume that a human body $HB_i^{t-1}$ has already been tracked up to frame t−1 and that new blobs $B_i^t$ are formed in frame t. Multiple parts are then tracked as follows (a code sketch of the four cases follows the list). Case 1: if $B_i^t$ is included in $HB_i^{t-1}$, the corresponding blob in $HB_i^{t-1}$ is tracked to $B_i^t$.
Case 2: if a blob in $HB_i^{t-1}$ is separated into several blobs in frame t, the blob in $HB_i^{t-1}$ is tracked to one blob in frame t and the other blobs at time t are appended to $HB_i^{t-1}$.
Case 3: if several blobs in $HB_i^{t-1}$ are merged into $B_i^t$, then one blob in $HB_i^{t-1}$ is tracked to $B_i^t$ and the other blobs are removed from $HB_i^{t-1}$.
Case 4: if $B_i^t$ is included in $HB_i^{t-1}$ but the corresponding blob does not exist, then $B_i^t$ is added to $HB_i^{t-1}$,
where inclusion of a region in a body part (via its bounding box) means the region overlaps over 90% of the part. The correspondence between a blob and the human body model is computed using Eq. (2). One advantage of such model-based tracking is that it eliminates the burden of perfectly correct blob extraction: even if a blob is missed because of an illumination change, model-based tracking can retain the individual's identity using the other existing blobs.
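The following sketch condenses the four cases into one update loop. The predicates in_part (bounding-box 90% overlap) and corresponds (the Eq. (2)-based blob mapping) are assumed helper functions, not functions from the paper.

```python
def update_body_part(part_blobs, frame_blobs, in_part, corresponds):
    """Sketch of the four tracking cases of Section 4 for one body part.
    part_blobs: blobs of HB^{t-1}; frame_blobs: new blobs B^t."""
    updated = []
    for b in frame_blobs:
        if not in_part(b):           # blob does not belong to this part
            continue
        old = [o for o in part_blobs if corresponds(o, b)]
        # Case 1 (one old blob) and Case 3 (several old blobs merged into
        # b): b inherits the identity; extra old blobs are dropped.
        # Case 2 (an old blob split) and Case 4 (no correspondence): the
        # new blob is simply appended to the part.
        updated.append(b)
    return updated
```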
5 Adaptive Human Modeling

In this section we introduce component-based online learning for the hierarchical human body model, which is ideally suited to highly repetitive tasks: the components are updated online whenever a new human body example arrives. It is more desirable to learn such a pattern incrementally, based on the examples provided while the system runs.

5.1 Component-Based Online Learning

Each component of the component-based human body model can change with posture and lighting environment, so the geometric and appearance information needs to be updated. If a set of human examples is given, the model is simply updated by calculating each component's averages, $\mu$, and standard deviations, $\sigma$. Such a batch update, however, is not suitable for maintaining the model in an online environment. Therefore, instead of collecting all previous examples each time a new example is modeled, it is more useful to extend the human model with only the new example [3].
When the n-th example $h_n$ is modeled, the averages and standard deviations of the human model's i-th component are updated as follows:

$$\mu_{i,n} = \frac{(n-1)\,\mu_{i,n-1} + h_{i,n}}{n}, \qquad \sigma_{i,n}^2 = \frac{(n-1)\,u_{n-1} + (\mu_{i,n} - h_{i,n})^2}{n}$$

where $h_i$ is the geometric ($g_i$) or appearance ($a_i$) information of the i-th component and $u_{n-1} = (\sigma_{i,n-1})^2 + (\mu_{i,n} - \mu_{i,n-1})^2$. It follows that Eqs. (1) and (2) can respectively be modified as follows:
$$HB_i = \left( (\mu_i^g, \sigma_i^g), (\mu_i^a, \sigma_i^a), R_i \right) \quad \text{and} \quad d_i = \sum_i \min_j \frac{\left\| \mu_i^g - p_j \right\|}{\sigma_i^g} + \sum_i \min_j \frac{\left\| \mu_i^a - q_j \right\|}{\sigma_i^a} + \sum_i \min_j \left\| R_i - r_j \right\| \qquad (3)$$
where $\mu_i^g$ and $\mu_i^a$ are the averages and $\sigma_i^g$ and $\sigma_i^a$ the standard deviations of the geometric ($g_i$) and appearance ($a_i$) information of the i-th component. Thus, after training on n examples, each component represents the statistical information of the corresponding body part in the component space. This makes it possible to adaptively estimate human posture under posture and lighting changes.

5.2 Online Update of the Human Model

Because human body parts are tracked using a human model, the human model's information is stored to track multiple body parts. Even though a human's total motion is relatively small between frames, large changes in the color-based model can cause simple tracking to fail. To resolve this sensitivity of color-model tracking, we compare the current components to a reference human model, which is compensated for occlusion as well as illumination changes. Let a human $HB_i^{t-1}$ be represented by an average ($\mu_i^{t-1}$) and a deviation ($\sigma_i^{t-1}$) computed up to time t−1, and let current components $(p_j^t, q_j^t, r_j^t)$ be formed in frame t. The minimum difference between the human model $HB_i^{t-1}$ (Eq. (3)) and the current components $(p_j^t, q_j^t, r_j^t)$ is computed as follows:

$$d_i^t = \sum_i \min_j \frac{\left\| \mu_i^{g,t-1} - p_j^t \right\|}{\sigma_i^{g,t-1}} + \sum_i \min_j \frac{\left\| \mu_i^{a,t-1} - q_j^t \right\|}{\sigma_i^{a,t-1}} + \sum_i \min_j \left\| R_i - r_j \right\| \qquad (4)$$
where $\mu_i^{g,t-1}$ and $\mu_i^{a,t-1}$ are the averages and $\sigma_i^{g,t-1}$ and $\sigma_i^{a,t-1}$ the standard deviations of the geometric ($g_i$) and appearance ($a_i$) information of the i-th component up to time t−1. If the minimum distance is less than a predefined threshold, the online learning algorithm adds the components $(p_j^t, q_j^t, r_j^t)$ to the corresponding human body model ($\mu_i^{g,t-1}$ and $\mu_i^{a,t-1}$) and updates the adaptive model by recalculating their centers and uncertainties. The similarity thresholds are set empirically and can be adjusted by a user.
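A sketch of the incremental statistics update of Section 5.1, applied per component whenever a matched example arrives; the function signature is an assumption.

```python
import numpy as np

def online_update(mu, sigma, h, n):
    """Update a component's mean and standard deviation when the n-th
    example h arrives, using the incremental identities of Section 5.1."""
    mu_new = ((n - 1) * mu + h) / n
    u = sigma ** 2 + (mu_new - mu) ** 2      # u_{n-1}
    var_new = ((n - 1) * u + (mu_new - h) ** 2) / n
    return mu_new, np.sqrt(var_new)

# Illustrative usage: fold a new colour observation into a running model.
mu, sigma = online_update(mu=np.array([0.0]), sigma=np.array([0.0]),
                          h=np.array([2.0]), n=2)
```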
6 Experimental Results and Conclusions

The proposed algorithm was implemented in Java and tested on Windows 2000 with a Pentium-IV 1.8 GHz CPU and 512 MB of memory. The video frames used in this experiment are taken from images (420×316) acquired with a Sony DSC-P10. To evaluate the tracking performance of the proposed algorithm, we used ROC curves with tracking sensitivity and tracking accuracy. Tracking sensitivity is the likelihood that a body part, as a component, will be tracked by the component-based model; tracking accuracy is the likelihood that a tracked component is the corresponding body part:
$$\text{Tracking sensitivity} = \frac{\text{number of tracked persons}}{\text{number of persons being tracked}}, \qquad \text{Tracking accuracy} = \frac{\text{number of correctly tracked persons}}{\text{number of tracked persons}}$$
Fig. 2. Tracking results using model-based tracking (tracking accuracy vs. tracking sensitivity (%) for the head, torso, legs and arms)
Fig. 2 illustrates the average accuracy of tracking body parts to estimate human body posture. The head and torso tracking rates in Fig. 2 are quite good, at 100% and 99.17%, respectively. Arms and legs, which are tracked by dividing them into upper and lower parts, achieve tracking rates of 91.74% and 95.04%, respectively. The estimation rate for human-body posture is calculated on the assumption that an estimate is good only when all human-body parts are matched correctly; with this matching method, we obtain a 90.08% estimation rate for human-body posture. Table 1 shows the tracking rates of each part, comparing them to Colombo et al. [1]. While they tracked a head, arms and legs without change in shape or orientation, the proposed method tracks 10 different components with variations in shape, orientation or occlusion for non-rigid tracking. Fig. 3 shows tracking results in four consecutive frames.
Table 1. Tracking performance of head, arms, and legs

                         Head      Arms      Legs
C. Colombo et al. [1]   93.10%    93.05%    92.70%
Proposed method          100%     91.74%    95.04%
In this paper, we propose a method to track the human body and estimate its posture through a component-based human body model and model-based tracking. A component-based method is useful for estimating an appropriate human body posture even when not all components are clearly detected due to partial distortion. The component-based human body model proposed in this paper makes use of external shape, including the color and size of human body components, flexible connection data and hierarchical relationship data. Additionally, the proposed model-based tracking reduces the burden of exact blob extraction by keeping a set of blobs as an individual component of the human model.
Fig. 3. Tracking results in four consecutive frames
Acknowledgments This work was supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (R04-2003-000-10092-0 (2005)).
References
1. Colombo, C., Bimbo, A.D., Valli, A.: Non Intrusive Full Body Tracking for Real-time Avatar Animation. Proc. of International Workshop on Very Low Bitrate Video Coding (2001) 36-51
2. Huang, Y., Huang, T.S.: Model-based Human Body Tracking. Proc. of ICPR (2002) 552-555
3. Lee, K.M., Street, W.N.: Model-based Detection, Segmentation and Classification using On-line Shape Learning. Machine Vision and Applications, Vol. 19, No. 4 (2003)
4. Matsushita, Y., Nishino, K., Ikeuchi, K., Sakauchi, M.: Illumination Normalization with Time-dependent Intrinsic Images for Video Surveillance. IEEE Trans. on PAMI 26(10) (2004) 1336-1347
5. Mori, G., Malik, J.: Estimating Human Body Configurations using Shape Context Matching. Proc. of ECCV (2002) 666-680
6. Wren, C., Azarbayejani, A., Darrell, T., Pentland, A.: Pfinder: Real-time Tracking of the Human Body. IEEE Trans. on PAMI, Vol. 19, No. 7 (1997)
Computation of the Probability on the Number of Solution for the P3P Problem

Jianliang Tang¹,², Xiao-Shan Gao², and Wensheng Chen¹

¹ College of Science, Shenzhen University, Shenzhen 518060, P.R. China
{jtang, chenws}@szu.edu.cn
² Key Laboratory of Mathematics Mechanization, CAS, Beijing 100080, P.R. China
[email protected]
Abstract. The perspective-n-point (P nP ) problem is to find the position and orientation of a camera with respect to a scene object from n correspondence points and is a widely used technique for pose determination in the computer vision community. This paper studies the multi-solution phenomenon for the perspective 3-point (P3P) problem. For the P3P problem, we give: 1) an algorithm to compute the number of solutions based on a Monte-Carlo type method; 2) the probabilities for the P3P problem to have zero, one, two, three and four solutions using the algorithm.
1 Introduction
One of the fundamental goals of computer vision is to discover properties that are intrinsic to a scene from one or several images of that scene. Within this paradigm, an essential process is the determination of the position and orientation of the sensing device (the camera) with respect to objects in the scene. This problem is known as the exterior camera calibration problem and has many applications in pattern recognition, robotics, image analysis, automated cartography and photogrammetry, etc. Fischler and Bolles [5] summarize the problem as follows: "Given the relative spatial locations of n control points, and given the angle to every pair of control points from an additional point called the Center of Perspective (CP), find the lengths of the line segments joining CP to each of the control points." This problem is referred to as the perspective n-point (PnP) problem. One of the major concerns with the PnP problem is its multi-solution phenomenon: if the solution is not unique, we must further determine which solution is the one we want. Unfortunately, PnP problems for
Partially supported by Shuxue Tianyuan Foundation (No.10526031). IEEE Member.
n ≤ 5 all have multiple solutions. It is a well-known result that the P3P problem can have one to four solutions [1,5,22,24,25], and a complete solution classification of the P3P problem was given in [7]. To obtain a unique solution, one natural way is to add one control point and consider a P4P problem. Algorithms for solving the P4P problem were proposed in [10,17,18] without considering the number of solutions. When the control points are coplanar, the P4P problem has a unique solution [1]; if the control points are not coplanar, the P4P problem can have up to five solutions [12]. It has been proved that the P5P problem can have two solutions [13]. For n ≥ 6, the PnP problem has one solution and can be solved with the DLT method [6]. In this paper, we give the probability for the number of solutions of the P3P problem. We compute two sets of probabilities with a Monte-Carlo type method: the feasible probability and the physical probability. We show that for parameters satisfying certain "reality conditions", the feasible probabilities for the P3P problem to have zero, one, two, three and four solutions are 0.5903, 0.3661, 0.0433, 0.0002 and 0.0001, respectively. If we assume that the parameters come from real observations and the P3P problem has at least one solution, then the physical probabilities for the P3P problem to have one, two, three and four solutions are 0.8934, 0.1058, 0.0005 and 0.0003, respectively. The information on probability gives more insight into the multi-solution phenomenon of the P3P problem and can be used to guide the computation process. For instance, we may conclude that the P3P problem has one solution with very high probability, and that the probability of having more solutions decreases. The rest of the paper is organized as follows. In Section 2, we present the P3P problem. In Section 3, we compute the probability for a special case of the P3P problem. In Section 4, we give an algorithm to compute the probability for the P3P problem based on a Monte-Carlo type method and report the resulting probabilities. In Section 5, conclusions are presented.
2 Notations and Preliminary Results
Let P be the center of perspective and A, B, C the control points (Fig. 1). Let $|PA| = X$, $|PB| = Y$, $|PC| = Z$, $\alpha = \angle BPC$, $\beta = \angle APC$, $\gamma = \angle APB$, d the diameter of the circumscribed circle of triangle ABC, $A = \angle BAC$, $B = \angle ABC$, $C = \angle ACB$. From triangles PBC, PAC and PAB, we obtain the P3P equation system:

$$\begin{cases} Y^2 + Z^2 - 2\cos(\alpha)\,YZ - d^2\sin^2(A) = 0 \\ Z^2 + X^2 - 2\cos(\beta)\,XZ - d^2\sin^2(B) = 0 \\ X^2 + Y^2 - 2\cos(\gamma)\,XY - d^2\sin^2(C) = 0 \end{cases} \qquad (1)$$

where $A + B + C = \pi$. The above equation system is in the variables X, Y, Z with parameters d, A, B, C, α, β, γ.

Definition 1. A set of values for the parameters d, A, B, C, α, β, γ is called feasible for the P3P problem if the following "reality conditions" are satisfied.
Fig. 1. The P3P problem
$$\begin{cases} d > 0,\ 0 < A < \pi,\ 0 < B < \pi,\ 0 < C < \pi \\ 0 < \alpha, \beta, \gamma < \pi,\ 0 < \alpha + \beta + \gamma < 2\pi \\ \alpha + \beta > \gamma,\ \alpha + \gamma > \beta,\ \gamma + \beta > \alpha \end{cases} \qquad (2)$$
A solution of the P3P problem for a set of feasible parametric values is a set of positive solutions of equation system (1). Notice that for one solution of equation system (1), there exist two solutions for point P, which are symmetric about the plane ABC. Since in practice we know clearly that the center of perspective is on a fixed side of plane ABC, we need only consider the solutions on one side of plane ABC.

Definition 2. A set of feasible parametric values is called physical for the P3P problem if the P3P problem has at least one solution. Since in practice the parametric values are obtained from real measurements, these values must be physical.

To simplify the equation system, set $X = xZ$, $Y = yZ$, $\sqrt{v}\,Z = d\sin(C)$ in (1). We obtain a new equation system:

$$\begin{cases} y^2 + 1 - 2\cos(\alpha)\,y - v\,\frac{\sin^2(A)}{\sin^2(C)} = 0 \\ x^2 + 1 - 2\cos(\beta)\,x - v\,\frac{\sin^2(B)}{\sin^2(C)} = 0 \\ x^2 + y^2 - 2\cos(\gamma)\,xy - v = 0 \end{cases} \qquad (3)$$

Lemma 1. For any set of feasible parametric values, the number of positive solutions of equation system (1) is the same as that of (3).

Proof. Since $|\cos(\gamma)| < 1$, we have $v = x^2 + y^2 - 2\cos(\gamma)xy > 0$. We thus have the following one-to-one correspondence between the positive solutions of (1) and (3):

$$\{(x, y, v)\} \xleftrightarrow{\ d\sin(C) = \sqrt{v}\,Z\ } \{(x, y, Z)\} \xleftrightarrow{\ X = xZ,\ Y = yZ\ } \{(X, Y, Z)\}.$$
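A numerical sketch of counting the positive solutions of system (3) for one parameter set: v is eliminated via the third equation, y is eliminated with a resultant, and real positive roots are paired back. This is an illustrative stand-in for the symbolic criteria of [7]; tolerances and names are assumptions.

```python
import numpy as np
import sympy as sp

def count_solutions(A, B, C, alpha, beta, gamma, tol=1e-8):
    """Count the positive real solutions (x, y) of system (3)."""
    x, y = sp.symbols('x y')
    v = x**2 + y**2 - 2*sp.cos(gamma)*x*y            # third equation of (3)
    p1 = sp.expand(y**2 + 1 - 2*sp.cos(alpha)*y - v*sp.sin(A)**2/sp.sin(C)**2)
    p2 = sp.expand(x**2 + 1 - 2*sp.cos(beta)*x - v*sp.sin(B)**2/sp.sin(C)**2)
    res = sp.Poly(sp.resultant(p1, p2, y), x)        # eliminate y
    count = 0
    for xr in np.roots([float(c) for c in res.all_coeffs()]):
        if abs(xr.imag) > tol or xr.real <= tol:
            continue                                  # keep positive real x
        py = sp.Poly(p1.subs(x, xr.real), y)          # recover y from p1
        for yr in np.roots([float(c) for c in py.all_coeffs()]):
            ok = abs(yr.imag) < tol and yr.real > tol
            if ok and abs(complex(p2.subs({x: xr.real, y: yr.real}))) < 1e-6:
                count += 1                            # consistent positive pair
    return count
```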
As a consequence of Lemma 1, to compute the probabilities of the number of solutions of the P3P problem for all possible parameter values, we need only consider equation (3). In other words, the number of solutions of the P3P problem does not depend on d and can be obtained from (3). Considering (3) instead of (1) simplifies the problem in two respects. The first is simple: equation system (3) has five parameters while (1) has six, so the amount of computation is reduced. For the second, we will use a Monte-Carlo type method to compute the probabilities of the number of solutions; in the computation, we need to assume that each parameter takes values in a finite interval [a, b]. The parameters A, B, C, α, β, γ satisfy this property due to (2), but the parameter d has no upper bound. Since d does not occur in (3), this difficulty is avoided.
3 A Working Example
We will use a special case of the P3P problem to illustrate how to compute the probabilities of the number of solutions. Let $A = B = C = \frac{\pi}{3}$ and $\beta = \gamma$. Set $p = 2\cos(\alpha)$, $q = 2\cos(\beta)$. Then the solutions of (3) depend on p and q, and this dependence is shown in Figure 2 [7].
Fig. 2. Solution distribution for the case $A = B = C = \frac{\pi}{3}$, $\beta = \gamma$
In the above figure, $L_1$ is $p = \frac{4+q^2}{4}$, $L_2$ is $p = q^2 - 2$, and $L_3$ is $p = 3 - \frac{q^2}{2}$. Now, the reality conditions (2) become $0 < \alpha, \beta < \pi$, $0 < \alpha + 2\beta < 2\pi$, and $2\beta > \alpha$, which can be reduced to

$$-2 < p < 2, \quad -2 < q < 2, \quad q^2 - 2 < p.$$
In other words, the area between the line p = 2 and the curve $L_2$ is the region of feasible parametric values (Figure 2). Let S be the area of this region and $S_i$, i = 0, 1, 2, 3, 4, the areas of the regions in Figure 2 in which the P3P problem has 0, 1, 2, 3, 4 solutions, respectively. We may then compute two sets of probabilities. Let

$$f_i = S_i / S, \quad i = 0, 1, 2, 3, 4. \qquad (4)$$

Then $f_i$ is the probability for the P3P problem to have i solutions over all possible feasible parametric values; we call these the feasible probabilities. Let

$$p_i = S_i / (S_1 + S_2 + S_3 + S_4), \quad i = 1, 2, 3, 4. \qquad (5)$$
Then $p_i$ is the probability for the P3P problem to have i solutions over all physical parametric values; we call these the physical probabilities. Using Maple, the areas $S_i$, and hence the probabilities, can be computed easily; they are listed in Table 1.

Table 1. Probabilities obtained by computing areas

# of solutions          0        1        2        3        4
Feasible probability  0.2612   0.6174   0.0668   0.0321   0.0225
Physical probability    —      0.8356   0.0904   0.0435   0.0315
These probabilities (or the areas of the regions) may be computed with the following Monte-Carlo type method.

1. Let N be a positive integer and ni = 0, i = 0, 1, 2, 3, 4.
2. For i from 0 to N, let qi = −2 + 4i/(N+1) and do Step 3.
3. For j from 0 to N, let pj = −2 + 4j/(N+1), and if qi² − 2 < pj do Step 4.
4. Let p and q be random numbers in the intervals (pj, pj + 4/(N+1)) and (qi, qi + 4/(N+1)). For p and q, compute the number of solutions of the P3P problem. If the P3P problem has i solutions, then add one to ni.

It is clear that ni is the number of sample points for which the P3P problem has i solutions. Therefore, when N is large enough, the feasible and physical probabilities can be approximately computed as follows (a compact code sketch of this procedure is given below):

  fi = ni / (n0 + n1 + n2 + n3 + n4), i = 0, 1, 2, 3, 4,
  pi = ni / (n1 + n2 + n3 + n4), i = 1, 2, 3, 4.    (6)
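The sketch below is our own illustration of this sampling loop; count_solutions_pq is a hypothetical helper returning the number of solutions of the P3P problem for given p = 2cos(α) and q = 2cos(β) in this special case.

```python
import random

def estimate_probabilities(N, count_solutions_pq):
    n = [0] * 5
    h = 4.0 / (N + 1)                      # grid cell width on [-2, 2]
    for i in range(N + 1):
        qi = -2 + 4 * i / (N + 1)
        for j in range(N + 1):
            pj = -2 + 4 * j / (N + 1)
            if qi**2 - 2 < pj:             # feasibility: q^2 - 2 < p
                p = random.uniform(pj, pj + h)
                q = random.uniform(qi, qi + h)
                n[count_solutions_pq(p, q)] += 1
    feasible = [ni / sum(n) for ni in n]            # f_i in (6)
    physical = [ni / sum(n[1:]) for ni in n[1:]]    # p_i in (6)
    return feasible, physical
```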
Note that n0 + n1 + n2 + n3 + n4 is smaller than the total number of grid cells, since only feasible parametric values are considered. Table 2 gives the probabilities computed in this way for N = 1000. Comparing Tables 1 and 2, we see that the Monte-Carlo type method gives a very good approximation.
Table 2. Probabilities obtained with a Monte-Carlo type method

  Number of solutions   0       1       2       3       4
  Feasible probability  0.2612  0.6174  0.0668  0.0321  0.0225
  Physical probability  -       0.8356  0.0904  0.0435  0.0305
4 Probabilities for the Number of Solutions
From now on, we will consider equation system (3) with the five parameters A, B, α, β, γ (C = π − A − B). It is well known that the P3P problem can have one, two, three or four solutions. In [7], we give the complete solution classification of the P3P equation system, that is, explicit criteria for the P3P problem to have one, two, three, or four solutions. These criteria are equations and inequalities in the parameters of the P3P equation system. They divide R⁵ into many regions whose boundaries are algebraic hypersurfaces. As a consequence, the volumes of these regions exist and are finite numbers due to conditions (2). For parametric values taken from each region, the number of solutions of the P3P problem is fixed. We thus have the following definition.

Definition 3. Let Si, i = 0, 1, 2, 3, 4, be the volumes of the regions of feasible parameters for which the P3P problem has i solutions. Then the feasible probabilities fi, i = 0, 1, 2, 3, 4, and the physical probabilities pi, i = 1, 2, 3, 4, are defined as follows:

  fi = Si / (S0 + S1 + S2 + S3 + S4), i = 0, 1, 2, 3, 4,    (7)
  pi = Si / (S1 + S2 + S3 + S4), i = 1, 2, 3, 4.

But the explicit criteria are too complicated to be used to compute the integrals corresponding to the volumes of the regions. To compute these probabilities, we will use a Monte-Carlo type method [15] similar to method (6) used in Section 3. We first estimate the value ranges of the parameters. From (2), A, B, C may take feasible parametric values in the intervals A ∈ (0, π), B ∈ (0, π − A), C = π − A − B; and α, β, γ in the intervals α ∈ (0, π), β ∈ (0, π), γ ∈ (max{α − β, β − α}, min{2π − α − β, α + β}). Based on these estimates, we design the following procedure to compute the probabilities (a code sketch is given after the procedure).

1. Let N = 9 and ni = 0, i = 0, 1, 2, 3, 4.
2. For Ai = iπ/(N+1), i = 0, ..., N, do Step 3.
3. For Bj = jπ/(N+1), j = 0, ..., (π − Ai)N/π, do Step 4.
4. For αk = kπ/(N+1), k = 0, ..., N, do Step 5.
5. For βl = lπ/(N+1), l = 0, ..., N, do Step 6.
6. Let m1 = ((N+1)/π) max{αk − βl, βl − αk} and m2 = ((N+1)/π) min{2π − αk − βl, αk + βl}. For γm = mπ/(N+1), m = m1, ..., m2, do Step 7.
7. Let A, B, α, β, γ be random numbers in the intervals (Ai, Ai + π/(N+1)), (Bj, Bj + π/(N+1)), (αk, αk + π/(N+1)), (βl, βl + π/(N+1)), (γm, γm + π/(N+1)), and let C = π − A − B. For these feasible parametric values A, B, C, α, β, γ, compute the number of positive solutions of equation (3). If the number is t, then add one to nt.
8. Now ni is the number of sample points for which the P3P problem has i solutions. Use (6) to compute the feasible and physical probabilities of the P3P problem.
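The sketch below (our own illustration) mirrors Steps 1-8; count_solutions is assumed to behave like the count_positive_solutions helper sketched in Section 2.

```python
import math, random

def estimate_probabilities_5d(N, count_solutions):
    n = [0] * 5
    h = math.pi / (N + 1)                          # grid step
    for i in range(N + 1):                         # Step 2
        Ai = i * h
        for j in range(int((math.pi - Ai) * N / math.pi) + 1):  # Step 3
            Bj = j * h
            for k in range(N + 1):                 # Step 4
                ak = k * h
                for l in range(N + 1):             # Step 5
                    bl = l * h
                    m1 = int(abs(ak - bl) / h)     # Step 6
                    m2 = int(min(2 * math.pi - ak - bl, ak + bl) / h)
                    for m in range(m1, m2 + 1):    # Step 7
                        A = random.uniform(Ai, Ai + h)
                        B = random.uniform(Bj, Bj + h)
                        a = random.uniform(ak, ak + h)
                        b = random.uniform(bl, bl + h)
                        g = random.uniform(m * h, m * h + h)
                        C = math.pi - A - B
                        n[count_solutions(A, B, C, a, b, g)] += 1
    return n                                       # Step 8: feed into (6)
```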
To ensure that the computation is reliable, we use different numbers of sample points and plot the results in Figures 3 and 4. In Figures 3 and 4, the x-axis is the number of sample points and the y-axis is the feasible and the physical probabilities, respectively. From these figures, we see that when the number of sample points becomes larger, the probabilities become stable. The approximate probabilities computed in this way are listed in Table 3.
Fig. 3. Feasible probabilities vs. the number of sample points
Fig. 4. Physical probabilities vs. the number of sample points

Table 3. Probabilities for the P3P problem

  Number of solutions   0       1       2       3       4
  Feasible probability  0.5903  0.3661  0.0433  0.0002  0.0001
  Physical probability  -       0.8934  0.1058  0.0005  0.0003
5 Conclusion
For the P3P problem, we present an algorithm to compute the probabilities of the number of solutions, and with it we give the probabilities for the problem to have one, two, three and four solutions. From these probabilities, we may conclude that with very high probability the P3P problem has one solution, and the probability of having more solutions decreases as the number of solutions grows. This kind of information may provide guidance on the number of solutions during the solving process.
References
1. Abidi, M. A., Chandra, T.: A New Efficient and Direct Solution for Pose Estimation Using Quadrangular Targets: Algorithm and Evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence 17(5) (1995) 534-538
2. Ansar, A., Daniilidis, K.: Linear Pose Estimation from Points and Lines. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(5) (2003) 578-589
3. DeMenthon, D., Davis, L. S.: Exact and Approximate Solutions of the Perspective-Three-Point Problem. IEEE Transactions on Pattern Analysis and Machine Intelligence 14(11) (1992) 1100-1105
4. DeMenthon, D. F., Davis, L. S.: Model-Based Object Pose in 25 Lines of Code. International Journal of Computer Vision 15 (1995) 123-141
5. Fischler, M. A., Bolles, R. C.: Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Communications of the ACM 24(6) (1981) 381-395
6. Ganapathy, S.: Decomposition of Transformation Matrices for Robot Vision. Proc. IEEE Conf. Robotics and Automation, IEEE Press (1984) 130-139
7. Gao, X. S., Hou, X. R., Tang, J. L., Cheng, H.: Complete Solution Classification for the Perspective-Three-Point Problem. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(8) (2003) 534-538
8. Haralick, R. M., Lee, C., Ottenberg, K., Nolle, M.: Analysis and Solutions of the Three Point Perspective Pose Estimation Problem. Proc. of the Int. Conf. on Computer Vision and Pattern Recognition (1991) 592-598
9. Hartley, R. I., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge University Press (2000)
10. Horaud, R., Conio, B., Leboulleux, O.: An Analytic Solution for the Perspective 4-Point Problem. Computer Vision, Graphics and Image Processing 47 (1989) 33-44
11. Horn, B. K. P.: Closed Form Solution of Absolute Orientation Using Unit Quaternions. Journal of the Optical Society of America 5(7) (1987) 1127-1135
12. Hu, Z. Y., Wu, F. C.: A Note on the Number of Solutions of the Non-coplanar P4P Problem. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(4) (2002) 550-555
13. Hu, Z. Y., Wu, F. C.: A Study on the P5P Problem. Chinese Journal of Software 12(5) (2001) 768-775 (in Chinese)
14. Mishra, B.: Algorithmic Algebra. Springer-Verlag, New York (1993)
15. Mikhailov, G. A.: Parametric Estimates by the Monte Carlo Method. VSP, Utrecht, The Netherlands / Tokyo (1999)
16. Kalos, M. H., Whitlock, P. A.: Monte Carlo Methods, Volume I: Basics. A Wiley-Interscience Publication, John Wiley & Sons, New York (1986)
17. Quan, L., Lan, Z.: Linear N-Point Camera Pose Determination. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(8) (1999) 774-780
18. Rives, P., Bouthémy, P., Prasada, B., Dubois, E.: Recovering the Orientation and the Position of a Rigid Body in Space from a Single View. Technical Report, INRS-Télécommunications, place du Commerce, Ile-des-Soeurs, Verdun H3E 1H6, Quebec, Canada 3 (1981)
19. Su, C., Xu, Y., Li, H., Liu, S.: Necessary and Sufficient Condition of Positive Root Number of P3P Problem (in Chinese). Chinese Journal of Computer Sciences 21 (1998) 1084-1095
20. Su, C., Xu, Y., Li, H., Liu, S.: Application of Wu's Method in Computer Animation. The Fifth Int. Conf. on Computer Aided Design/Computer Graphics 1 (1997) 211-215
21. Winkler, G.: Image Analysis, Random Fields and Dynamic Monte Carlo Methods. Springer-Verlag, Berlin (1995)
22. Wolfe, W. J., Jones, K.: Camera Calibration Using the Perspective View of a Triangle. Proc. SPIE Conf. Auto. Inspection Measurement 730 (1986) 47-50
23. Wu, W. T.: Mathematics Mechanization. Science Press, Beijing (in Chinese) (2000); English version: Kluwer Academic Publishers, London (2000)
24. Wolfe, W. J., Mathis, D., Weber, C., Magee, M.: The Perspective View of Three Points. IEEE Transactions on Pattern Analysis and Machine Intelligence 13(1) (1991) 66-73
25. Yuan, J. S. C.: A General Photogrammetric Method for Determining Object Position and Orientation. IEEE Transactions on Robotics and Automation 5(2) (1989) 129-142
Context-Awareness Based Adaptive Classifier Combination for Object Recognition

Mi Young Nam, Battulga Bayarsaikhan, Suman Sedai, and Phill Kyu Rhee

Dept. of Computer Science & Engineering, Inha University, 253 Yong-Hyun Dong, Incheon, Korea
{rera, battulga, suman}@im.inha.ac.kr, [email protected]
Abstract. In this paper a classifier combination scheme is presented as a cascade of adaptive selection of classifiers and fusion of classifiers. In the proposed scheme, the system working environment is learned and the environmental context is identified. Then the group of classifiers most likely to produce accurate output is generated for each environmental context. The decision of fusing more than one classifier or selecting the best classifier is made using the proposed t-test decision model, which ensures the reliability of fusion. The proposed scheme has been tested in the area of face recognition using the standard FERET database, taking illumination as an environmental context. Experimental results showed that using context awareness and the t-test decision model in classifier combination provides robustness to varying environmental conditions.
1 Introduction

A classifier is a black box that uses observations made on an object and from these assigns a label to the object. In this paper we present a face recognition system in which the system is supposed to recognize a person given some prior observations of the person. We present our system as a combination of classifiers. Classifier combination can be thought of as classifier selection and classifier fusion. In a real-world analogy, classifier selection can be thought of as choosing experts who are specialized in a particular field, and classifier fusion as combining the decisions of multiple experts to make the final decision. Classifier selection methods are divided into static and dynamic classifier selection methods [1], and some popular classifier fusion methods are majority voting [2], decision templates [3], naive Bayesian, neural networks [4], etc. The proposed method primarily aims at robust object recognition under uneven environments by cascading selection and fusion of different classifiers using context awareness. The method decides whether to use multiple-classifier fusion or the best single classifier using the proposed t-test decision model, which is derived from the reliability condition of combination. The main difference of the proposed classifier combination method from other methods is that it can combine classifiers in accordance with the identified context in a reliable way. The proposed method adopts the strategy of context
knowledge accumulation, so that the classifier combination scheme is able to adapt itself to changing environments based on experience. The context knowledge base consists of the knowledge of classifiers and their combination scheme for each environmental context. Once the context knowledge base is constructed, the system can react to changing environments at run-time. This paper is organized as follows. Section 2 presents the model of the proposed context-aware classifier combination scheme (ACCS) and describes the framework and implementation issues such as context learning and context identification. Section 3 presents the t-test decision model for classifier combination. Section 4 presents design examples of ACCS on a face recognition system and shows the experimental results, and finally Section 5 concludes the paper.
2 Framework of Context-Aware Classifier Combination Scheme

In this section, we discuss the architecture of the context-aware classifier combination scheme with the capability of adaptation, and the issue of context modeling.

2.1 Architecture of Context-Aware Classifier Combination Scheme

There is no general approach or theory for efficient classifier combination yet [1]. Classifier combination can be thought of as the generation of candidate classifiers and the decision aggregation of candidate classifiers. For simplicity of explanation, we assume that a classifier combination consists of four stages: preprocessing, feature representation, class decision, and aggregation. Each stage consists of several competitive and complementary components. An example classifier combination scheme is given in Fig. 1.
Fig. 1. Simple Classifier Combination scheme
The classifier combination scheme combines classifier components and produces a combined classifier system to achieve high performance. Fig. 2 shows our proposed context-based classification and classifier fusion scheme. We propose the framework of the adaptive classifier combination scheme (ACCS) for realizing the model of context-aware classifier combination.
Fig. 2. Architecture of proposed context aware adaptive classifier combination scheme (ACCS)
Fig. 2 shows the architectural blocks of the context-aware adaptive classifier combination scheme. In the figure, the flow of training and testing data is set across the various architectural blocks. The feature extraction section extracts features from the input data; the result is a vector of lower dimension than the input data. The context modeling block divides the training data into several meaningful context clusters by employing unsupervised learning. Once the context is learned, the context-aware block is able to identify the context of test data using a classification method. Depending on the identified context of the test data, the classifier combination block decides whether to combine the outputs of different classifiers (fusion) or to choose the best classifier. This knowledge is initialized in the training phase by applying the t-test decision model described in Section 3. Our framework addresses the following issues: i) How to learn and identify an environmental context in association with application working environments? ii) How to find the knowledge of classifier selection or fusion for each identified context? We assume that context data can be modeled as clusters of several environmental contexts called context categories. For each context, classifiers are combined using the t-test decision to produce an optimal output of the scheme. Details of the t-test model and the fusion methods used are discussed in Section 3.

2.2 Context Modeling and Identification

Context data is defined as any observable and relevant attributes and their interaction with the surrounding environment at an instance of time [2, 3, 4]. Examples of context data are various configurations, computing resource availability, dynamic task requirements, application conditions, application-related environment [5], etc. Context knowledge describes a trigger of a system action, i.e., the selection of classifiers for an identified context, stored in the context knowledge base over a period of time. Context learning, also referred to as context modeling, basically implies clustering context data into context categories. The result of context learning is several context categories, where each category represents
the context to which an environment can be associated at a given timestamp t. Context learning can be performed by an unsupervised learning algorithm such as SOM, Fuzzy ART, or K-means. Context identification determines the context category of given context data y; it can be carried out by a normal classification method such as NN, K-NN, SVM, etc. A brief sketch follows.
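As a concrete illustration (ours, not the paper's), the following sketch substitutes scikit-learn's KMeans for the SOM used later in the paper and a k-NN classifier for the RBF-network identifier; the function name and parameter choices are assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def learn_contexts(context_data, n_contexts=6):
    """Cluster context data (e.g. illumination features of face images)
    into context categories and fit an identifier for new context data."""
    km = KMeans(n_clusters=n_contexts, n_init=10).fit(context_data)
    identifier = KNeighborsClassifier(n_neighbors=3)
    identifier.fit(context_data, km.labels_)
    return km.labels_, identifier
```

identifier.predict(y) then returns the context category of new context data y at run-time.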
3 T-Test Decision Model for Classifier Combination

In this section we discuss the statistical model used to achieve a reliable fusion system for combining results from several constituent classifiers. Fusion gives better results when the classifiers being fused are closely related to each other. Kuncheva applied a paired t-test to find whether the best classifier is different enough from the rest in decision space [3]. Here we estimate a fusion condition that achieves reliability of the fusion system without much loss in accuracy: we want the lower limit of the recognition rate of the fusion system to be higher than that of the best classifier, hence achieving higher reliability.

Let us consider the restricted case of only two classifiers CLS_A and CLS_B, where CLS_A is found to be the best one in decision space. The recognition rate of classifier CLS_A is normally distributed with mean P̂_A and standard deviation δ_A, denoted P_A ~ N(P̂_A, δ_A). Similarly, for classifier CLS_B the distribution is P_B ~ N(P̂_B, δ_B), and the distribution of the recognition rate after fusion is P_F ~ N(P̂_F, δ_F). If P̂_F = aP̂_A + bP̂_B with a + b = 1, it can be shown that the standard deviation of the recognition rate of the fusion is

  δ_F = sqrt(a²δ_A² + b²δ_B²).    (1)

Figure 3 shows the distributions of the classifiers CLS_A, CLS_B and of the fusion of (CLS_A, CLS_B). In the figure, S_A = 2δ_A, S_B = 2δ_B and S_F = 2δ_F, so that (P_A − S_A), (P_B − S_B) and (P_F − S_F) denote the lower limits of the 95% confidence intervals of the respective distributions. From equation (1), it can be shown that

  S_F = sqrt(a²S_A² + b²S_B²).    (2)

We want to perform fusion only when the lower limit of the fusion (P̂_F − S_F) is greater than the lower limit of the best classifier (P̂_A − S_A). That is,

  P̂_F − S_F > P̂_A − S_A, where P̂_F = aP̂_A + bP̂_B,    (3)

i.e.,

  aP̂_A + bP̂_B − sqrt(a²S_A² + b²S_B²) > P̂_A − S_A.    (4)

We are interested in the gap Δ = P̂_A − P̂_B between the recognition rates of the best and the second-best classifiers for which fusion is reliable. Taking a = b = 1/2 in equation (4),

  P̂_A − P̂_B < 2S_A − sqrt(S_A² + S_B²).    (5)
Fig. 3. Example distributions of the recognition rates of the best classifier CLS_A, the second best CLS_B, and their fusion CLS_F
Assuming that the standard deviations of P_A and P_B are similar, i.e., S_A ≈ S_B, this reduces to

  Δ < 0.6 S_A.    (6)

S_A can be calculated from N observations of the recognition rate as follows:

  S_A = t(0.05, N−1) · sqrt(P̂_A(1 − P̂_A)/N),    (7)

where t(0.05, N−1) is the t value for the 95% significance level (α = 0.05) and N − 1 degrees of freedom. For N > 100 we can use t(0.05, N−1) = 1.96. From (6) and (7),

  Δ < 1.176 · sqrt(P̂_A(1 − P̂_A)/N).    (8)
Equation (8) gives the condition for beneficial fusion: it increases reliability by raising the lowest recognition rate from that of the best classifier (P_A − S_A) to P_F − S_F. Four fusion methods, namely Decision Templates (DT) [2], Majority Voting (MV) [8], Product and Average [8], are used for aggregating classifier outputs when the t-test decides for fusion. A sketch of the decision rule is given below.
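A minimal sketch of the decision rule (8) — our own illustration, with a hypothetical function name:

```python
import math

def should_fuse(p_best, p_second, n_obs):
    """Fuse when the gap between the two top classifiers' recognition
    rates satisfies condition (8); otherwise select the best classifier.
    Assumes n_obs > 100 so that t(0.05, N-1) ~ 1.96 applies."""
    delta = p_best - p_second
    bound = 1.176 * math.sqrt(p_best * (1.0 - p_best) / n_obs)
    return delta < bound   # True -> fusion; False -> best selection
```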
4 Experiments on Face Recognition

The proposed method was tested in the area of face recognition using the standard FERET database. Its performance was evaluated through extensive experiments and shown to be reliable and superior to those of most popular methods, especially under changing illumination. Face images are used as context data as well as action input data. The changes in image data under changing illumination are modeled as environmental contexts. First, face images are clustered into several distinguishable contexts according to illumination variations. The model of illumination environmental contexts is constructed by considering light direction and brightness. Context modeling is implemented with Kohonen's self-organizing map (SOM) [6, 7] and a radial basis function (RBF) network. SOM has the capability of unsupervised learning; it models the illumination environment as several context categories. The RBF neural network is trained using the clustered face data in
order to identify the context category of an input image. Histogram equalization (HE) is used for the preprocessing components, and Gabor wavelets are used as the feature representation [9]. Gabor wavelets show desirable characteristics in orientation selectivity and spatial locality. As an example, Gabor13 is generated using 13 fiducial points, as shown in Figure 4 (a feature-extraction sketch is given after the figure caption).
Fig. 4. An example of 13 Gabor feature points for face recognition
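The sketch below is our own illustration of Gabor feature extraction at fiducial points; OpenCV's getGaborKernel is used as a convenient stand-in for the paper's Gabor wavelets, and the kernel parameters are illustrative assumptions.

```python
import cv2
import numpy as np

def gabor_features(image, points, n_orientations=4, wavelengths=(4, 8)):
    """Sample Gabor magnitude responses of a face image at fiducial points
    (e.g. 13 points for Gabor13), concatenated into one feature vector."""
    feats = []
    img = image.astype(np.float32)
    for lam in wavelengths:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            kern = cv2.getGaborKernel((21, 21), sigma=lam / 2.0, theta=theta,
                                      lambd=lam, gamma=0.5, psi=0)
            resp = cv2.filter2D(img, -1, kern)
            feats.extend(abs(resp[y, x]) for (x, y) in points)
    return np.array(feats)
```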
Three classifiers, namely Gabor13, Gabor28 and Gabor32, are used as candidate classifiers for each context. These classifiers are trained using enrollment images. The knowledge of the classifier combination scheme for each context is then learned with the t-test decision model. The knowledge of the effective classifier structure for a data context is described by the pair of the data context category and the corresponding set of best classifiers, or the single best classifier, along with the decision of fusion or selection.

4.1 Experimental Results

The feasibility of the proposed method has been tested using the FERET database. The data set consists of 2182 frontal face images from 1091 people, divided into a training set and a testing set. A single image per person is used for registration, giving 1091 enrollment images; the remaining 1091 images are used as intermediate test images to perform the t-test and train the classifier combiner. First, the training images used in enrollment are divided into data context categories, or clusters (six, nine and twelve clusters are investigated separately), by unsupervised learning [6, 7]. An example of the different contexts of face data identified using SOM is shown in Figure 5. It can be seen that each cluster consists of a group of faces having similar intensity and direction of light.
Fig. 5. The example of face images clustered into six context categories using SOM
Fig. 6. A set of 3 classifiers for each context (HE preprocessing; Gabor13, Gabor28 and Gabor32 features; Euclidean or cosine distance (ED/CD) matching)

Table 1. Comparison of results of various fusion strategies and the best-selection result for the six-context model and the first set of classifiers

  Context cluster  T-test decision  DT    MV    Product  Average  Best selection
  1                Fusion           90.1  90.5  90.1     90.1     90.1
  2                Selection        -     -     -        -        95.5
  3                Selection        -     -     -        -        94.3
  4                Selection        -     -     -        -        92.4
  5                Selection        -     -     -        -        91.7
  6                Fusion           86.5  85.4  86.5     86.5     87.0
  Total            -                91.5  91.4  91.5     91.5     91.6
Experiments are conducted on two sets of classifiers built from the three Gabor feature representations for each context. The first set uses Euclidean distance as the distance measure in feature space and the second uses cosine distance; see Fig. 6. Furthermore, learning continues with training the fusion system. Here, best selection or fusion for a particular data context is decided by the t-test as described in Section 3. The interval Δ between the best and second-best classifiers is compared against the reliable-fusion condition stated by equation (8). If the condition is met, fusion is done; otherwise best selection is done. After training is completed, the classifier combination scheme is stored in a lookup table, also referred to as the context knowledge base (CKB), as knowledge of the context-action association. After the learning mode is complete, the system runs in action (recognition) mode using the testing data set. The recognition rate of the classifier combination scheme for each data context is recorded. Table 1 shows the fusion results of each context for the six-context model. It shows that the t-test decides on either best selection or fusion for each context so that fusion is reliable. The results also show that we get reliable fusion, with the fusion result approaching the best selection. In the cases where the t-test decided for fusion (see context clusters 1 and 6 in Table 1), all the fusion strategies give almost similar recognition rates, since the fusion method is not a primary factor when the
ensemble contains a diverse set of classifiers [1]. This gives the insight that our classifier selection method has chosen an optimal set of classifiers for each context. Table 2.A and Table 2.B show the results of six experiments (3 context models and 2 types of distance measure) along with the rank of each combination strategy for 6, 9 and 12 clusters. The first five rows show the results of t-test-based combination and the last four rows show the results of the normal combination method. In almost all experiments, t-test-based combination gives a higher recognition rate. The output with the highest rank in each column is marked bold in the original tables. These tables suggest that the t-test gives reliable results in classifier combination.

Table 2.A. Comparison over six experiments conducted on classifiers using different fusion methods, for the (6, 9) context models and (Euclidean, cosine) distances, with t-test and non-t-test combination (A = accuracy, R = rank)
  Classifiers/         6-Context Model           9-Context Model
  Combiner             Euclidean    Cosine       Euclidean    Cosine
                       A     R      A     R      A     R      A     R
  T-test  BS           89.7  4      91.6  9      88.4  1      92.1  6
          DT           90.3  7      91.5  5.5    89.1  7      92.1  6
          MV           91.6  9      91.4  8      88.7  3      92.1  6
          PRODUCT      90.3  7      91.5  5.5    89.0  5      92.1  6
          AVERAGE      90.3  7      91.5  5.5    89.0  5      92.1  6
  Normal  DT           89.3  2      91.1  2      89.1  7      91.9  2
          MV           89.9  5      91.5  5.5    88.6  2      92.2  9
          PRODUCT      89.3  2      91.1  2      89.1  7      91.9  2
          AVERAGE      89.3  2      91.1  2      89.0  5      91.9  2
Table 2.B. Comparison over the six experiments for the 12-context model and (Euclidean, cosine) distances, with t-test and non-t-test combination (A = accuracy, R = rank)

  Classifiers/         12-Context Model               Total R
  Combiner             Euclidean     Cosine
                       A     R       A     R
  T-test  BS           88.7  1       90.7  1.5       21
          BS+T-DT      88.8  3       91.1  8         28.5
          BS+MV        89.4  9       90.7  1.5       35
          BS+PRO       88.8  3       91.1  8         26.5
          BS+AVR       88.8  3       91.1  8         26.5
  Normal  DT           88.9  6       91.0  5         19
          MV           88.9  6       90.9  3         27.5
          PRO          88.9  6       91.0  5         19
          AVR          89.1  8       91.0  5         19
Context-Awareness Based Adaptive Classifier Combination for Object Recognition
209
Table 3 shows the recognition rate of the proposed method in comparison with other methods. From the experimental results, we see that our proposed method is better than the other approach.

Table 3. Comparison with another approach

  Method                    6 Clusters  9 Clusters  12 Clusters
  Our t-test   Best         91.56       92.11       90.74
               DT           91.47       92.11       91.10
               MV           91.38       92.11       90.74
               Product      91.47       92.11       91.10
               Average      91.47       92.11       91.10
  Kuncheva     DT           91.10       91.93       91.01
  [2, 3]       MV           91.47       92.20       90.92
               Product      91.10       91.93       91.01
               Average      91.10       91.93       91.01
5 Conclusion

Since no single classifier is best for all environmental contexts, we choose the best classifiers for each context. The fusion of the chosen best classifiers of a context may not always prove effective, so a t-test is performed to check the effectiveness of the fusion in terms of reliability. If the expected fusion seems unreliable, then best selection is chosen. The proposed method, tested on a face recognition system, shows reliable results across varying environmental contexts. We found almost consistent results across different illuminations in face images and across the various experiments we conducted. It can be concluded that the use of context awareness and the t-test decision model gives high reliability to the system. It is also shown that the system adapts itself to the changing environment, so it is suitable for real-time systems.
References
1. Kuncheva, L.I.: Switching Between Selection and Fusion in Combining Classifiers: An Experiment. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, Vol.32, No.2 (2002) 146-156
2. Kuncheva, L.I.: A Theoretical Study on Six Classifier Fusion Strategies. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.24, No.2 (2002) 56-68
3. Kuncheva, L.I., Bezdek, J.C., Duin, R.P.W.: Decision Templates for Multiple Classifier Fusion: An Experimental Comparison. Pattern Recognition, Vol.34, No.2 (2001) 299-314
4. Huang, Y.S., Suen, C.Y.: A Method of Combining Multiple Classifiers - A Neural Network Approach. Proc. 12th Int'l Conf. Pattern Recognition (2002)
5. Yau, S., Karim, F., Wang, Y., Wang, B.: Reconfigurable Context-Sensitive Middleware for Pervasive Computing. IEEE Pervasive Computing, Vol.1, No.3 (2002) 33-40
6. Nam, M.Y., Rhee, P.K.: An Efficient Face Recognition for Variant Illumination Conditions. ISPACS 2005, Vol.1 (2004) 111-115
7. Nam, M.Y., Rhee, P.K.: A Novel Image Preprocessing by Evolvable Neural Network. LNAI 3214, Vol.3 (2004) 843-854
8. Kuncheva, L.I., Jain, L.C.: Designing Classifier Fusion Systems by Genetic Algorithms. IEEE Transactions on Evolutionary Computation, Vol.4, No.4 (2000) 327-336
9. Jeon, I., Jung, E.-S., Rhee, P.K.: Adaptive Normalization Based Highly Efficient Face Recognition under Uneven Environments. KES 2005, LNCS 3682 (2005) 759-768
Detecting All-Zero Coefficient Blocks Before Transformation and Quantization in H.264/AVC∗

Zhengyou Wang¹,², Quan Xue³, Jiatao Song⁴, Weiming Zeng¹, Guobin Chen³, Zhijun Fang¹, and Shiqian Wu¹

¹ School of Information Technology, Jiangxi University of Finance & Economics, Nanchang 330013, China
² Key Laboratory of High-Performance Computing Technology, Jiangxi Province, Nanchang 330022, China
[email protected], [email protected], [email protected], [email protected]
³ Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China
[email protected], [email protected]
⁴ College of Electronic and Information Engineering, Ningbo University of Technology, Ningbo 315016, China
[email protected]
Abstract. In this paper a new early detection algorithm for all-zero coefficient blocks before integer transformation and quantization is proposed for very low bit-rate video coding with the H.264/AVC standard. The all-zero detection threshold is critical as a tradeoff between encoder complexity and image quality. First, a feasible threshold range is analyzed theoretically; then, based on extensive experiments on typical video sequences from a statistical point of view, a better optimal threshold is derived, which balances the mis-justice and leak-out ratios in detecting all-zero coefficient blocks better than the theoretically optimal threshold. In the proposed algorithm, the sum of absolute differences (SAD) of the motion prediction error of each inter block is used; this SAD is a by-product of the motion estimation process, so no additional computation is required. The simulation results show that the proposed algorithm can reduce the computational load significantly, almost without video quality loss.
1 Introduction

As the latest international video coding standard, proposed by the JVT between ITU-T VCEG and ISO/IEC MPEG, H.264/MPEG4 AVC [1] uses state-of-the-art coding tools and provides enhanced coding efficiency for a wide range of applications,∗
This work was partially supported by Innovation Fund of Jiangxi University of Finance and Economics; Fund of Key Laboratory of High-Performance Computing Technology (JXHC2005-004), Jiangxi Province; and the Science and Technique Program of Educational Department of Jiangxi Province (No: [2006]232).
including videophone and videoconferencing. However, the enormous number of computations required by H.264/MPEG4 AVC video coding limits its applications. Therefore, reducing the computations of the encoder is vital for this new standard. The transform and quantization portion [2] is an important module that cannot be neglected in the realization of a real-time encoder. Therefore, if all-zero coefficient blocks can be detected early, before transformation and quantization, the number of computations can be reduced effectively, so it is quite meaningful to develop an efficient all-zero block detection criterion for very low bit-rate video encoding [3]-[6]. For H.264/AVC, Sousa [7] theoretically derived a precise sufficient condition for all DCT coefficients to quantize to zero. Based on it, Moon [8] proposed a more precise sufficient condition by modifying the calculation order of the sum of absolute differences. In order to reduce the transform and quantization complexity, an improved early detection method for all-zero coefficient blocks in H.264/AVC is proposed in this paper. The feasible range of the all-zero block detection threshold is analyzed theoretically to get its upper and lower limits; then, based on extensive experiments on typical video sequences from simple to complex, the best optimal threshold is determined statistically, with a good balance between the mis-justice and leak-out ratios. In the whole detection process, the sum of absolute differences (SAD) is used, which is a by-product of motion estimation, so no additional computation is needed. The paper is structured as follows: in Section 2 the conventional existing conclusion is given in detail. The optimal detection threshold is discussed theoretically and statistically in Section 3. In Section 4, simulation results are presented to show the efficiency and effectiveness of the proposed method. The last section summarizes our conclusions.
2 Conventional Early Detection Method

In the H.264/MPEG4 AVC video codec, a 4x4 integer transform, rather than the 8x8 floating-point discrete cosine transform (DCT), is used in order to reduce blocking and ringing artifacts and eliminate any mismatch between the encoder and the decoder. Sousa gives an early detection method for the integer transform, derived from the conventional approach. Its development from the 4x4 DCT is given by:
  Y = A X A^T, where

  A = [ a   a   a   a
        b   c  -c  -b
        a  -a  -a   a
        c  -b   b  -c ],    (1)

and a = 1/2, b = sqrt(1/2)·cos(π/8), c = sqrt(1/2)·cos(3π/8). This matrix multiplication can be factorized, and the matrix entries are approximated so that the transform remains orthogonal. The final forward transform becomes

  Y = (C X C^T) ⊗ E,    (2)

where ⊗ denotes element-by-element multiplication and

  C = [ 1   1   1   1        E = [ a²    ab/2  a²    ab/2
        2   1  -1  -2              ab/2  b²/4  ab/2  b²/4
        1  -1  -1   1              a²    ab/2  a²    ab/2
        1  -2   2  -1 ],           ab/2  b²/4  ab/2  b²/4 ].
Z i , j = round (Yi , j Qstep ).
(3)
where Y is coefficient of the transform described above, Qstep is a quantized step size and Z is a quantized coefficient. A total of 52 values of Qstep are supported indexed by a Quantization Parameter (QP). Qstep doubles in size for every increment of 6 in QP. Both parameters are in the range 0~51. 2 2 The post-scaling factor a , ab / 2 , b / 4 is incorporated into the forward quantizer. First the input block X is transformed to give a block of unscaled coeffiT cients W = CXC . Then, each coefficient w is quantized and scaled in a single operation: i, j
i, j
ij
MF · § Z ij = round ¨ wij ⋅ qbits ¸ . 2 ¹ ©
(4)
qbits = 15 + floor (QP 6) .
(5)
and
In integer arithmetic, (4) can be implemented as follows:
(
)
Z ij = wij ⋅ MF + f >> qbits .
(6)
sign( Z ij ) = sign( wij ) .
(7)
where >> indicates a binary shift right, and f is 2 / 6 for Inter blocks. If all of the upper bound of (6) is smaller than 1, it is obvious that all transform coefficients are simultaneously quantized into zero. Based on this fact, the sufficient condition of the Soura’s threshold is as follows: qbits
(( wij ⋅ MF + f ) >> qbits ) < 1 .
(8)
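For reference, a sketch (ours, not from the paper) of the core transform (2) and the integer quantization (6)-(7), which can be used to verify a posteriori whether a residual block is all-zero. MF is position-dependent in H.264; the matrix below holds for QP % 6 == 0 and is an illustrative assumption.

```python
import numpy as np

C = np.array([[1, 1, 1, 1],
              [2, 1, -1, -2],
              [1, -1, -1, 1],
              [1, -2, 2, -1]])

# MF pattern for QP % 6 == 0 (13107 / 5243 / 8066 depending on position)
MF0 = np.array([[13107, 8066, 13107, 8066],
                [8066, 5243, 8066, 5243],
                [13107, 8066, 13107, 8066],
                [8066, 5243, 8066, 5243]])

def is_all_zero_block(X, QP):
    """Transform and quantize the residual block X per (2) and (6);
    return True when every quantized coefficient Z_ij is zero."""
    qbits = 15 + QP // 6
    f = (1 << qbits) // 6                  # rounding offset for inter blocks
    W = C @ X @ C.T                        # unscaled coefficients
    Z = (np.abs(W) * MF0 + f) >> qbits     # magnitude quantization (6)
    return not Z.any()
```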
3 Proposed Algorithm

3.1 Theoretical Upper Threshold Analysis

Just as in the discussion for H.263 [3], if a block is an all-zero one, then its DC coefficient must of course be zero. Normally, the DC coefficient of a block is the largest compared with the other AC coefficients, and if it quantizes to zero the other AC coefficients can be expected to quantize to zero as well. Therefore, using the DC coefficient as the biggest value in the whole 4x4 block, we get

  DC = Σ_{i=0..3} Σ_{j=0..3} x_ij < (2^qbits − f)/MF,    (9)

where the x_ij are the elements of the motion prediction error block X. Through the absolute inequality "the absolute value of a sum is no greater than the sum of absolute values", we get from (9) a condition to decide whether the block is all-zero:

  Σ_{i=0..3} Σ_{j=0..3} |x_ij| < (2^qbits − f)/MF.    (10)

In this inequality, we notice that the right-hand side is a function of QP, denoted f(QP), which can be tabulated beforehand, and the left-hand-side term is the block's sum of absolute differences (SAD) obtained from the motion prediction error. It is a necessary condition for judging an all-zero block: if a block is an all-zero block, it must satisfy inequality (10). Of course, there are examples where the DC coefficient is zero but some AC coefficient is not.
3.2 Theoretical Lower Threshold Analysis

Applying the Cauchy-Schwarz inequality to (1), we can get

  Σ_{i=0..3} Σ_{j=0..3} |w_ij| ≤ [Σ_{i=0..3} Σ_{j=0..3} x_ij²]^(1/2) × [tr((C·C^T)²)]^(1/2) = √232 × [ΣΣ x_ij²]^(1/2).    (11)

At the same time, as Li [4] shows, comparing the arithmetic mean with the geometric mean, combined with the integer transform matrix C and the distribution of the motion prediction error, we normally have

  [Σ_{i=0..3} Σ_{j=0..3} x_ij²]^(1/2) < (1/(4√2)) Σ_{i=0..3} Σ_{j=0..3} |x_ij|.    (12)

From inequalities (11) and (12), we get a new condition for detecting an all-zero block:

  Σ_{i=0..3} Σ_{j=0..3} |x_ij| < (2/√29) · f(QP).    (13)

In this inequality, f(QP) was already obtained in (9), and the left-hand-side term is the block's SAD after motion estimation, i.e., an intermediate result of the motion estimation. Here we thus have another detection criterion, and there is no conflict with (10) at all. In fact, (13) is obtained through a series of inequalities and is a very rigid sufficient condition, so that many all-zero blocks cannot pass this judgment.
3.3 Statistical Optimal Threshold Analysis

From equation (13) we know that the threshold (2/√29)·f(QP) represents a sufficient condition for all-zero DCT coefficients, and f(QP) in inequality (10) represents a necessary condition. So in our research we tried to increase the lower-limit threshold by a coefficient named belta, to get a more accurate criterion for most sequences so that the amount of unnecessary computation is reduced as far as possible. Notice that the determination must be made before the transformation and quantization, so the data distribution of the spatial block must be analyzed in terms of the frequency domain; this correspondence is complex and nonlinear, since different sequences have different motion and texture in the moving picture. Hence the best threshold (with both mis-justice and leak-out ratios equal to zero) for all video sequences does not exist. Under this condition, we propose a method to find, from the statistical point of view, a threshold for judging the all-zero block question that works for most sequences. From the theoretical analysis above, the optimal detection criterion must lie between the lower limit and the upper limit. Here, we use an approaching method, enlarging the lower limit toward the upper limit by a factor belta, as in the following definition:

  SAD < belta · (2/√29) · f(QP).    (14)

Fig. 1. Mis-justice and leak-out ratio against different detection thresholds
To examine the effectiveness of the aforementioned threshold, we chose sequences with vastly varying content from the standard test video clips and encoded them with the H.264/MPEG4 AVC reference encoder. These sequences are classified into three categories, which represent different kinds of motion: 1) small motion scenes: Salesman, Akiyo and Mother & Daughter, which possess little movement and a still background; 2) moderate motion scenes: News, Children2, Container and Silent; 3) large motion scenes: Mobile, Stefan and Foreman, which possess large facial motion and a fast-moving background, and Coastguard, which possesses large but uniform motion.
Fig. 2. Comparison between the mis-justice and leak-out ratios with different belta: (a) belta = 2.3; (b) belta = 2.6
Here, we define a proportion that combines the mis-justice and leak-out ratios. The numerator of this proportion is the difference between the number of actual all-zero blocks and the number of detected all-zero blocks, and the denominator is the number of actual all-zero blocks. If the proportion is positive, some all-zero blocks are not found by this threshold, so the leak-out ratio is higher. On the contrary, if the proportion is negative, some non-all-zero blocks are judged to be all-zero blocks, so the mis-justice ratio becomes higher. Ten typical video sequences were tested using this principle; Figure 1 shows the ratio of detected all-zero blocks against the detection threshold. We notice that the experimental mis-justice and leak-out ratios vary monotonically as the criterion is increased from the lower limit to the upper limit: the leak-out ratio decreases and the mis-justice ratio increases. Thus we can decide that the fitting value of the belta coefficient is near 3. To get a more exact coefficient, we examined values of belta from 2.5 to 3.5 under different quantization parameters, i.e., different encoding bit rates. As an example, we give two results showing the mis-justice and leak-out ratios against the detection threshold: in Figure 2(a) belta is 2.3 and in Figure 2(b) belta is 2.6, respectively. From these figures we notice that belta equal to 2.6 gives a better tradeoff between image distortion and computational burden: both the mis-justice and leak-out ratios are close to zero, especially at very low bit rates. In conclusion, the best optimal threshold is chosen as
  SAD < 2.6 · (2/√29) · f(QP).    (15)
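A sketch of the resulting early-skip test (our own illustration): f_QP below follows the definition in Section 3.1, f(QP) = (2^qbits − f)/MF, where the single value MF = 13107 (QP % 6 == 0, DC position) is an illustrative assumption.

```python
import math

def f_QP(QP, MF=13107):
    qbits = 15 + QP // 6
    f = (1 << qbits) // 6
    return ((1 << qbits) - f) / MF

def skip_transform_and_quantization(sad, QP, belta=2.6):
    """Declare an all-zero block when SAD < belta * (2/sqrt(29)) * f(QP).
    The SAD is already available from motion estimation, so this test
    adds no extra computation."""
    return sad < belta * (2.0 / math.sqrt(29)) * f_QP(QP)
```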
4 Performance Results

In our experiments, we applied the proposed method to the H.264/MPEG4 AVC video encoder. The frame rate is fixed at 30 frames per second. We test ten typical sequences: Stefan, Mobile, Container, Foreman, Hall Monitor, Akiyo, M&D, Car Phone, Salesman and Sign_iren. In addition, one reference frame was used for motion estimation. The proposed Statistical Threshold (S.T) was compared with the Lower Threshold (L.T) and the Upper Threshold (U.T) in order to verify the improvement from the optimal criterion. In general, the transform and quantization must be implemented efficiently in the encoding process, so it is necessary to observe how many all-zero blocks are detected by the proposed method. Table 1 shows the simulation results for the proposed rule. With a quantization step size of 42, nearly 84.634% of the inter blocks can be determined to be all-zero coefficient blocks. If the lower threshold is used, the leak-out ratio is higher: about 8% of the all-zero blocks on average cannot be affirmed and still have to be transformed and quantized, so more computation has to be done. However, if the upper threshold is used,
the mis-justice comes up: 2.836% of the blocks in the picture are judged to be all-zero although they are not, and as a result a degradation of the reconstructed image is noticed. The proposed optimal threshold detects all-zero blocks that were impossible to detect with the lower threshold method, at the cost of a small mis-justice ratio.

Table 1. Mis-justice and leak-out ratio comparison

  Sequence      Original  L.T     S.T    U.T
  Stefan        64.27     76.64   62.77  58.26
  Mobile        69.44     83.59   68.02  62.23
  Container     85.42     92.57   85.08  82.79
  Foreman       75.40     86.80   74.84  72.39
  Hall Monitor  91.53     95.12   89.67  89.50
  Akiyo         97.91     101.07  97.89  96.80
  M&D           96.62     100.13  96.59  95.42
  Car Phone     87.36     92.78   87.22  85.47
  Salesman      89.46     94.39   89.37  87.92
  Sign_iren     88.93     93.62   88.81  87.20
Table 2 gives the PSNR comparison among the lower, upper and statistical thresholds. Of course, if we use the lower threshold, despite its leak-out ratio, there is no effect on the image quality. But when the upper threshold is selected, the PSNR value decreases and the loss is clearly observable by human eyes. This proves that the optimal threshold obtained from the experiments works well for most video sequences.

Table 2. PSNR comparison (dB)

  Sequence      Original  L.T     S.T     U.T
  Stefan        24.73     24.73   24.62   24.32
  Mobile        23.042    23.042  22.922  22.452
  Container     27.046    27.046  26.996  26.986
  Foreman       28.357    28.357  28.297  28.267
  Hall Monitor  28.444    28.444  28.434  28.404
  Akiyo         30.395    30.395  30.395  30.395
  M&D           30.190    30.190  30.190  30.180
  Car Phone     26.97     26.97   26.95   26.94
  Salesman      25.71     25.71   25.71   25.69
  Sign_iren     27.63     27.63   27.62   27.64
5 Conclusion

In this paper, we propose an improved algorithm for detecting all-zero blocks early in H.264/MPEG4 AVC video encoding. Based on theoretical analyses and experimental statistics, we have derived an optimal criterion as the all-zero coefficient block
threshold, under which the quantized coefficients of a block can be judged to be zero more accurately. The simulation results show that the proposed algorithm detects all-zero blocks that are not determined by the Sousa algorithm; the blocks missed by the lower threshold correspond to approximately 4%-12% of the total all-zero blocks. The proposed algorithm achieves approximately a 10%-35% computational saving compared to the Sousa 4x4 algorithm. Therefore, the proposed algorithm effectively eliminates redundant calculations in the transform and quantization without degrading video quality. In addition, the performance of the proposed algorithm does not depend on the number of reference frames.
References 1. Joint Video Team of ITU-T and ISO/IEC JTC 1: Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC). Document JVT-G050r1, May 2003; technical corrigendum 1 documents JVTK050r1 (non-integrated form) and JVT-K051r1 (integrated form), March 2004; and Fidelity Range Extensions documents JVT-L047 (nonintegrated form) and JVT-L050 (integrated form), July 2004 2. Wiegand, T., Sullivan, G.J., Bjontegarrd, G., Luthra, A.: Overview of the H.264/AVC Video Coding Standard. IEEE Transaction on Circuits and Systems. Video Technology, No. 7 (2003) 560-576 3. Alice, Y., Lee, R., Flynn, M.: Early Detection of All-zero Coefficients in H.263, Proceedings of the picture coding symposium, Berlin, Germany (1997)159-164 4. Li, Q., Cui, H.J., Tang, K.: Adaptive Zero-Block Detection Algorithm with MultiResolution. Journal of Software. 13 (12) (2002) 5. Pao, I.M., Sun, M.T.: Modeling DCT Coefficients for Fast Video Encoding. IEEE Trans. On CSVT, 9 (1999)608-616 6. Xuan, Z., Zhenghua, Y., Songyu, Y.: Method for Detection All-zero DCT Coefficients Ahead of Discrete Cosine Transformation and Quantization. Electron. Lett., 34 (1998) 1839–1840 7. Sousa, L.A.: General Method for Eliminating Redundant Computations in Video Coding. Electronics Lett., 36 (2000) 306-307 8. Moon, Y.H., Kim, G.Y., Kim, J.H.: An Improved Early Detection Algorithm for All-Zero Blocks in H.264 Video Encoding. IEEE Trans. On Circuits and Systems for Video Technology, 15 (2005) 1053-1058
Efficient KPCA-Based Feature Extraction: A Novel Algorithm and Experiments

Yong Xu¹,²,*, David Zhang³, Jing-Yu Yang¹, Zhong Jing¹, and Miao Li²

¹ Department of Computer Science & Technology, Nanjing University of Science & Technology, Nanjing, China
² Bio-Computing Research Center and Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China
³ The Biometrics Research Center and Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong
Abstract. KPCA has been widely used for feature extraction. It is noticeable that the efficiency of KPCA-based feature extraction is in inverse proportion to the size of the training sample set. In order to speed up KPCA-based feature extraction, we develop a novel algorithm (IKPCA) which improves KPCA from a distinctive viewpoint. The algorithm is methodologically consistent with KPCA and has a clear physical meaning. Experiments on several benchmark datasets illustrate that IKPCA-based feature extraction is much faster than KPCA-based feature extraction: the ratio of IKPCA feature extraction time to KPCA feature extraction time may be smaller than 0.30. Furthermore, the classification accuracy of IKPCA is comparable with that of KPCA.
1 Introduction

Principal component analysis (PCA) is based on second-order statistical information of data and substantially reduces the complexity of data in which a large number of variables are interrelated, such as large-scale gene expression data obtained across a variety of different samples or conditions. PCA accomplishes this by computing a new, much smaller set of uncorrelated variables which best represents the original data. Indeed, PCA is a powerful, well-established technique for data reduction, and for some real-world applications PCA is a success in feature extraction and dimensionality reduction [1-8]. However, if data points are not linearly separable, PCA cannot perform well in feature extraction; in this case, nonlinear techniques are more appropriate than linear ones [7]. Thus, kernel PCA (KPCA) was derived from PCA as a nonlinear feature extraction method [9, 10]. Theoretically, KPCA may be viewed as a combination of two procedures: the first implicitly transforms the input space into a higher-dimensional feature space, and the second carries out PCA in the feature space. By virtue of so-called kernel functions, KPCA is computationally tractable compared to other nonlinear methods.*
Corresponding author.
For PCA, the eigenvectors of the correlation matrix (or covariance matrix) corresponding to large eigenvalues are considered to be optimal feature extractors, and feature extraction can be implemented by projecting samples onto these feature extractors. As for KPCA, an optimal feature extractor can certainly be expanded in terms of all the training samples in the feature space. Consequently, if we use KPCA to extract features of one sample, we must work out all the kernel functions between this sample and the total training samples ahead of time and then carry out feature extraction based on them. As a result, the larger the size of the training sample set, the lower the efficiency of feature extraction. In particular, with large numbers of training samples, KPCA-based feature extraction becomes inefficient and even unfeasible for real-world applications. Other kernel methods also suffer from similar problems [11-14]. On the other hand, almost all real-world applications desire efficient feature extraction. Under these circumstances, some algorithms have been proposed to accelerate feature extraction based on kernel methods. Generally, these algorithms are rooted in the following two ideas. The first idea is based on the supposition that in feature space one or more training samples can be expressed as a linear combination of the others. Provided that this is true, any feature extractor in feature space can be expanded in terms of some training samples rather than all of them, and the corresponding feature extraction may become more efficient than naive KPCA. The second idea is that in feature space a feature extractor may be approximately expanded in terms of some vectors. These vectors may or may not come from the training sample set; their dimensionality is identical to that of the training samples, while their number is less than that of the total training samples, so feature extraction can be performed more quickly. However, the first idea is not always consistent with real-world applications. If the training samples in feature space are linearly dependent, then one training sample can undoubtedly be expressed as a linear combination of the others. Therefore, excluding one training sample from the set of total training samples, we can still expand an optimal extractor exactly in terms of the remaining training samples; nothing changes in the forthcoming feature extraction except that the process becomes more efficient. However, this situation may not occur in practice. An exception is the feature space associated with the Gaussian kernel, in which none of the training samples can be expanded in terms of the others. On the other hand, most algorithms based on the second idea were developed from the viewpoint of numerical approximation, and it is not clear whether these algorithms are methodologically consistent with the KPCA algorithm. In this paper, we attempt to develop a novel algorithm (IKPCA) which makes KPCA-based feature extraction faster. The algorithm is distinct from the existing ones and consistent with the methodology of KPCA. Experimental results show that IKPCA is comparable with KPCA in classification correctness, whereas the efficiency of IKPCA-based feature extraction is considerably higher than that of KPCA-based feature extraction. The rest of this paper is organized as follows. KPCA is introduced in Section 2 as an extension of the PCA technique. Then the algorithm of IKPCA is presented in Section 3, followed by the experimental results in Section 4. Finally, a brief conclusion is drawn in Section 5.
2 Nonlinear PCA with Kernel Function

PCA can yield a low-dimensional subspace that best represents the original data according to a minimum-square-error criterion; however, if the interaction relations of the features of the original data are complicated and nonlinear, the linear subspace obtained by PCA may be a poor representation, and nonlinear methods may be needed [2]. In fact, as a nonlinear method, KPCA is nothing but PCA in the feature space associated with the corresponding kernel function. We assume that there are N training samples, x1, x2, ..., xN, in total. If the training samples have been mapped into a feature space F by a nonlinear function φ, we may perform PCA based on the corresponding training samples φ(x1), φ(x2), ..., φ(xN) in the feature space. The correlation matrix in the feature space can be computed as

  Σφ = (1/N) Σ_{i=1..N} φ(xi)φ(xi)^T,

and the corresponding eigenvalue equation is Σφ u = λu. It is easy to demonstrate that optimal feature extractors in the feature space must come from the set of eigenvectors of Σφ; exactly, the optimal extractors in the feature space are the eigenvectors associated with large eigenvalues. In practice, with these feature extractors, we can obtain the optimal features of samples, through which we can reconstruct the samples with the minimum mean-square error. All solutions u of the eigenvalue equation lie in the subspace spanned by all the training samples in the feature space [9], i.e. u = Σ_{j=1..N} αj φ(xj). Therefore the above eigenvalue problem can take the form of the set of equations

  φ(xk) · (Σφ u) = λ (φ(xk) · u), k = 1, 2, ..., N,    (1)

with the constraint u = Σ_{j=1..N} αj φ(xj). Substituting this constraint into (1), we get

  Kα = λα,    (2)

where the matrix K is defined by (K)_ij = k(xi, xj) = φ(xi) · φ(xj), i, j = 1, 2, ..., N.

Suppose that the eigenvalues of the matrix K in (2) are λ1 ≥ λ2 ≥ ... ≥ λN. If the m eigenvectors of K associated with the first m (m < N) largest non-zero eigenvalues are α(1), α(2), ..., α(m) respectively, then the unitary eigenvectors u1, u2, ..., um of Σφ are

  ui = Σ_{j=1..N} α(i)_j φ(xj) / √λi, i = 1, 2, ..., m,

where α(i)_j is the j-th component of the vector α(i) [9, 10]. In other words, in the feature space, the first m eigenvectors of the correlation matrix are determined by the first m eigenvectors of the K in Eq. (2), respectively. As a feature extraction method, KPCA is optimal in the sense of minimal reconstruction error. Obviously, the projection of φ(x) onto ui can be written as Σ_{j=1..N} α(i)_j k(xj, x)/√λi, and all the projections of φ(x) onto u1, u2, ..., um can be integrated to form the following vector:

  Y = [ Σ_{j=1..N} α(1)_j k(xj, x)/√λ1,  Σ_{j=1..N} α(2)_j k(xj, x)/√λ2,  ...,  Σ_{j=1..N} α(m)_j k(xj, x)/√λm ]^T.    (3)
3 Idea and Algorithm of Improving KPCA 3.1 Idea of Fast Feature Extraction Section 2 has shown that, in the feature space, feature extraction can be implemented according to (3). However, (3) indicates that in order to get the feature vector of one sample, we should work out all the kernel functions between this sample and the total training samples, which means the feature extraction process associated with a training sample set of a large size is inefficient. To speed up KPCA-based feature extraction, we assume that in the feature space one feature extractor may be approximately expressed as a linear combination of a portion of training samples, called nodes. The assumption is supported by the fact that when one feature extractor is expanded in terms of all the training samples, different training samples will correspond to dissimilar effects on the expansion of the feature extractor. In other words, some training samples contribute much to the expansion, whereas the others contribute less[13]. If we find out the “important” training samples, which contribute much to the expansion, and newly expand feature extractors in terms of them, then the “important” samples may be taken as the nodes. As a result, we can extract features of one sample based on all the kernel functions between this sample and the nodes. Since the number of the nodes is smaller than that of the total training samples, we can lead to more efficient KPCA-based feature extraction. The strategy of determining nodes will be given in section 3.2. Actually, similar ideas have been successfully applied to kernel-based discriminant analysis methods[11,12], though the corresponding technical routines are distinct from that of IPKCA presented below. Suppose that a feature extractor ui can be approximately expanded in terms of s
ui ≈ ¦ j =1 β jφ ( x 0j ), s < N and consequently Σφ ui ≈ λui . For simplicity, we replace the sign “ ≈ ” with “ = ” in the context below. Here φ ( x10 ),φ ( x20 ),...,φ ( x s0 ) are the nodes. Note that the following set of equations is certain:
φ ( xk0 ) ⋅ Σφ )ui = λ φ ( xk0 ) ⋅ ui , k = 1,2,..., s . Substituting the above expansion of s
λ ¦ j =1 β j (φ ( xk0 ) ⋅ φ ( x 0j )) =
(4)
ui into (4) arrives at
s 1 N β (φ ( x k0 ) ⋅ φ ( xi ))(φ ( xi ) ⋅ φ ( x 0j )), k = 1,2,..., s. ¦ i =1 ¦ j =1 j N
224
Y. Xu et al.
We can formulate this set of equations as follows:
1 K1 ( K 1 ) T β = λK 2 β , N ªk ( x10 , x1 ) « 0 «k ( x2 , x1 ) where «. K1 = « «. «. « «¬k ( xs0 , x1 )
(5)
ª k ( x10 , x10 ) . . . k ( x10 , x s0 )º . . . k ( x10 , x N )º » « 0 0 » 0 0 0 . . . k ( x2 , x N ) » , « k ( x2 , x1 ) . . . k ( x 2 , x s )» . » «. » » K2 = « » » «. » » «. » » « » 0 0 0 0 0 «¬ k ( x s , x1 ) . . . k ( x s , x s )»¼ . . . k ( xs , x N )»¼
The above demonstration relates the problem of determining nodes to an eigenvalue equation. In a word, we may approximately expand one feature extractor as a linear combination of the nodes, and the coefficients of the linear combination can be determined through Eq.(5). The above approach is called improved KPCA(IKPCA), which aims to extract features of samples more efficiently. In practice, how to determine the nodes is the key of IKPCA. We will address the issue in the following subsection. 3.2 Algorithm of Determining Nodes For both PCA and KPCA, the performance of the optimal feature extractors can be assessed by the corresponding eigenvalues(For either of the two methods, the optimal feature extractors must be the eigenvectors of the corresponding eigenvalue equations). For the PCA-based(or KPCA-based) feature extraction, we prefer one feature extractor(eigenvector) corresponding to a large eigenvalue to that corresponding to a small one, because a larger eigenvalue means less construction error. IKPCA is derived from KPCA and it may be considered an approximation version of KPCA, thus the performance of one feature extractor from IKPCA may be also assessed by the eigenvalues of Eq.(5). That is, for an eigenvalue equation taking the form of Eq.(5), the larger one eigenvalue is, the more powerful the corresponding eigenvector(feature extractor) is. Consequently, we can determine nodes using the following algorithm. Step 1. Determine the first node For the i − th training
K1 = [ k ( xi , x1 )
λ = K1 ( K1 )
sample
K1 , K 2 , λ
xi ,
k ( xi , x2 ) . . . k ( xi , xl ) ]
T
,
are
computed
K 2 = [k ( xi , xi )]
using and
T
K 2 . Obviously, K 2 and K 1 ( K 1 ) are both scalars and every training sample corresponds to its own λ . When all the training samples have been searched and investigated, the one associated with the maximum λ is taken as the first node, 0
denoted by x1 . Then, the matrices K 10 , K 20 respectively, i.e.
K10 = [k ( x10 , x1 )
K 1 , K 2 corresponding to x10 are recorded as
k ( x10 , x2 ) . . . k ( x10 , x N )] , K 20 = [k ( x10 , x10 )] .
Efficient KPCA-Based Feature Extraction: A Novel Algorithm and Experiments
225
Step l . Determine the l − th node If l − 1 nodes, x10 , x20 ,..., xl0−1 , have been determined by the previous l − 1 steps, the
l − th node may be determined as follows. Firstly, vector k 1j is defined as k 1j = [k ( x j , x1 ), k ( x j , x2 ),..., k ( x j , xl )] . Let K10 , K 20 denote the matrices K 1 , K 2 corresponding to x10 , x20 ,..., xl0−1 , i.e.
K10
ij
= k ( xi0 , x j ), i = 1,2,..., l − 1, j = 1,2,..., N ; ( K 20 ) ij = k ( xi0 , x 0j ) i, j = 1,2,..., l − 1 .
{
}
Note that the l − th node should be from the sample set P = x j | x j ≠ x10 , x20 ,..., xl0−1 , which is a subset of the set of the total training samples. In the step, we will take each element of P as one candidate for the l − th node and assess them respectively to select the optimal candidate as the l − th node. For assessing one sample(element)
x j from P , we define the corresponding K 1 , K 2 as : ª K 20 K2 = « 2 «¬k j
ª K10 º K1 = « 1 » , ¬k j ¼
(k 2j ) T º » k ( x j , x j ) »¼
where k 2j = [k ( x j , x10 )
k ( x j , x20 ) ...k ( x j , xl0−1 )] . Based on the K1 , K 2 , an eigenvalue equation in the form of Eq.(5) can be constructed, and then its eigenvalues, λ1 , λ2 ,..., λ N , can be worked out. Suppose that we want m feature extractors. A variable v is introduced and evaluated as follows: if l ≤ m , v = λ1 + λ2 + ... + λl ; otherwise, v = λ1 + λ2 + ... + λm . After all the samples(elements) in P have been researched and investigated by the above procedure, the maximum v is denoted by vl . Then, the candidate associated with vl , is sorted out as the l − th node and denoted by xl0 . Meanwhile, K10 , K 20 are newly defined to be the matrices K 1 , K 2 0
0
corresponding to x10 , x 2 ,…, xl respectively. The above procedure is not terminated until s ≥ N ∗ t , where t < 1 , N is the total number of the training samples and s is the number of the determined nodes. After the procedure of determining nodes is terminated, the sample φ (x ) in the feature space can be featured by
f =
[¦
s j =1
β j(1) k ( x 0j , x )
λ1
¦
s j =1
β j( 2 ) k ( x 0j , x )
s
λ2 ¦ j =1 β (j m ) k ( x 0j , x)
λm
], T
where β (i ) = [ β1( i ) β 2(i ) . . . β s(i ) ]T . In fact, β (1) , β ( 2 ) ,…, β ( m ) are the first m eigenvectors associated with the first m largest eigenvalues of the corresponding eigenvalue equation taking the form of (5), which is based on the determined nodes x10 , x 20 ,…, x s0 and all the training samples x1 , x2 ,..., x N .
226
Y. Xu et al.
4 Experiments To illustrate the efficiency and performance of IKPCA, we conduct experiments on 4 benchmark datasets(http://ida.first.gmd.de/~raetsch/data/). Every dataset includes 100 subsets except for “Splice” where are only 20 subsets. Moreover, each subset consists of one training subset and one test subset. We use the Gaussian kernel k(x, y) = exp (−||x−y||2/2σ2), and let σ2 be equal to the square of Frobenius norm of the correlation matrix of the first training subset. The training procedure is performed on the first training subset, and test is implemented for all the test subsets with the near neighbor classifier. For dataset “Splice” t is set to be 0.25, while for the other datasets t is set to be 0.5. Every test subset corresponds to one classification error rate, so we can figure out the average error rate of one dataset based on its whole test subsets. We can also figure out the average deviation of the error rates on a dataset based on all the corresponding test subsets. Table 1 and Table 2 show the experimental results on the datasets. In the two tables, the sign “±” connects the average error rate(the former) and the average deviation of error rates on one dataset(the latter). It appears that IKPCA is much more efficient than KPCA in feature extraction. Take the dataset “Splice” for an example, we see that the feature extraction time based on IKPCA is only 260 seconds when the number of the feature extractors is 100. However, the feature extraction time based on KPCA is 864 seconds, which is much longer than that based on IKPCA. Averagely, the ratio of IKPCA-based feature extraction time to KPCA-based feature extraction time is Table 1. Experimental result of KPCA on benchmark datasets
Number of feature extractors Splice
Diabetis
Banana
Cancer
100 90 80 70 100 90 80 70 100 90 80 70 70 60 50 40
Average error rate and deviation 25.5±2.6 24.7±2.5 24.0±2.4 21.8±2.2 11.5±2.8 11.7±2.8 11.5±2.8 11.8±2.9 13.8±0.2 13.8±0.2 13.8±0.2 13.8±0.2 9.0±3.2 8.5±3.0 8.5±3.0 9.8±3.3
Total number of training samples 1000
468
400
200
Feature extraction time(s) 864 832 801 778 238 212 202 191 3128 2983 2908 2825 25.4 23.2 22.2 21.8
Efficient KPCA-Based Feature Extraction: A Novel Algorithm and Experiments
227
between 0.28 and 0.60(see Table 2). Meanwhile, for datasets “Diabetis” and “Cancer”, there are comparative classification error rates for the two methods. For the dataset “Banana”, the classification result of IKPCA is slightly inferior to KPCA. It is noticeable that, for the dataset “Splice” in which the dimensionality of the sample space is 1000, IKPCA performs much better than KPCA with a lower classification error rate. These data indicate that IKPCA surely performs well in feature extraction as we expect. Table 2. Experimental result of IPCA on benchmark datasets Number of feature extractors
Splice
Diabetis
Banana
Cancer
100 90 80 70 100 90 80 70 100 90 80 70 70 60 50 40
Average error rate and deviation
17.8±1.8 17.8±1.8 17.5±1.8 17.6±1.7 11.4±2.8 11.9±2.9 11.6±2.8 12.0±2.9 14.1±0.2 14.2±0.2 14.2±0.2 14.2±0.2 9.2±3.3 8.6±2.9 8.6±2.9 8.1±2.9
Number of nodes
250
234
200
100
Feature extraction time(s)
260 247 230 214 140 125 121 114 1807 1799 1734 1688 15.3 13.9 13.2 12.7
The ratio of IKPCA-based feature extraction time to KPCA-based feature extraction time 0.30 0.30 0.29 0.28 0.59 0.59 0.60 0.60 0.58 0.60 0.60 0.60 0.60 0.60 0.59 0.58
5 Conclusion As a nonlinear method, KPCA has been widely used for feature extraction. If KPCA is used to extract features of one sample, all the kernel functions between the sample and the total training samples should be obtained in advance. As a result, with large numbers of training samples, the efficiency of KPCA will become very low. On the other hand, real-world applications usually desire pattern recognition systems with efficient feature extraction. In this paper, we develop the IKPCA algorithm to improve KPCA for more efficient feature extraction. The algorithm is straightforward and reasonable. Moreover, it is distinct from the existing algorithms of improving KPCA and subject to the principle of KPCA with clear physical meaning. For IKPCA, the optimal feature extractors are the eigenvectors corresponding to large eigenvalues of an eigenvalue equation. The experimental results on the benchmarks show that the efficiency of IKPCA-based feature extraction is much higher than that of KPCA-based feature extraction. The lowest ratio of the time taken by IKPCA-based feature extraction to that used by KPCA-based feature extraction is only 0.28 and the highest ratio is 0.60. Moreover, we can carry out classification based on the features generated from IKPCA with a high accuracy.
228
Y. Xu et al.
Acknowledgments This article is partly supported by National Nature Science Committee of China under grants No. 60472060, No. 60473039, No.60472061 and No. 60620160097.
References [1] Fukunaga, K.: Introduction to Statistical Pattern Recognition, 2nd Edition, Academic Press, Inc., New York, (1990) [2] Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, China Machine Press, Beijing, (2004) [3] Kirby, M., Sirovich, L.: Application of the KL Procedure for the Characterization of Human Faces, IEEE Trans. Pattern Anal. Machine Intell. 12 (1) (1990) 103-108 [4] Turk, M., Pentland, A.: Face Recognition Using Eigenfaces, Proc. IEEE Conf. On Computer Vision and Pattern Recognition, (1991)586-591 [5] Yang, J., Yang, J.-Y.: Why can LDA be Performed in PCA Transformed Space? Pattern Recognition 36(2) (2003) 563-566 [6] Liu, C.: Gabor-based Kernel PCA with Fractional Power Polynomial Models for Face Recognition, IEEE Trans. Pattern Analysis and Machine Intelligence 26(5) (2004) 572-581 [7] Bian, Z., Zhang, X.: Pattern Recognition(in Chinese), Tsinghua University Press, Beijing, (2000) [8] Jin, Z., Davoine, F., Lou, Z., Yang, J.-Y.: A Novel PCA-based Bayes Classifier and Face Analysis, IAPR International Conference on Biometrics (ICB2006), Hong Kong, Jan. 5-7, (2006) [9] Scholkopf, B., Smola, A., Muller, K.R.: Nonlinear Component Analysis as a Kernel Eigenvalue Problem, Neural Computation 10 (5) (1998) 1299-1319 [10] Scholkopf, B., Smola, A., Muller, K.R.: Kernel Principal Component Analysis, Artificial Neural Networks- ICANN'97, Berlin, (1997) 583-588 [11] Xu, Y., Yang, J.-Y., Lu, J., Yu, D.-J.: An Efficient Renovation on Kernel Fisher Discriminant Analysis and Face Recognition Experiments, Pattern Recognition 37 (2004) 2091-2094 [12] Xu, Y.,. Yang, J.-Y, Yang, J.: A Reformative Kernel Fisher Discriminant Analysis, Pattern Recognition 37 (2004) 1299-1302 [13] Xu, Y., Yang, J.-Y. , Lu, J.-F.: An Efficient Kernel-based Nonlinear Regression Method for Two-class Classification, Proceedings of 2005 International Conference on Machine Learning and Cybernetics, Guangzhou, China, August, (2005) 4442-4445 [14] Xu, Y., Zhang, D., Jin, Z., Li, M., Yang, J.-Y.: A Fast Kernel-based Nonlinear Discriminant Analysis for Multi-class Classification, Pattern Recognition, (2006) 39(6), 1026-1033
About the Author —YONG XU received the B.S. degree, the M.S. degree and the PhD degree in 1994, 1997 and 2005 respectively. His current interests include face recognition, handwritten character recognition linear and nonlinear discriminant analysis. About the Author — David Zhang graduated in computer science from Peking University in 1974 and received the MSc and PhD degrees in computer science and
Efficient KPCA-Based Feature Extraction: A Novel Algorithm and Experiments
229
engineering from the Harbin Institute of Technology (HIT) in 1983 and 1985, respectively. He received a second PhD degree in electrical and computer engineering from the University of Waterloo, Ontario, Canada, in 1994. After that, he was an associate professor at the City University of Hong Kong and a professor at the Hong Kong Polytechnic University. Currently, he is a founder and director of the Biometrics Technology Centre supported by the UGC of the Government of the Hong Kong SAR. He is the founder and editor-in-chief of the International Journal of Image and Graphics and an associate editor for some international journals such as the IEEE Transactions on Systems, Man, and Cybernetics, Pattern Recognition, and International Journal of Pattern Recognition and Artificial Intelligence. His research interests include automated biometrics-based identification, neural systems and applications, and image processing and pattern recognition. So far, he has published more than 180 papers as well as 10 books, and won numerous prizes. He is a senior member of the IEEE and the IEEE Computer Society. About the Author —JING-YU YANG received the BS degree in computer science from Nanjing University of Science and Technology (NUST), Nanjing, China. From 1982 to 1984, he was a visiting scientistat the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign. From 1993 to 1994, he was a visiting professor in the Department of Computer Science, Missuria University. And, in 1998, he acted as a visiting professor at Concordia University in Canada. He is currently a professor and chairman in the Department of Computer Science at NUST. He is the author of more than 300 scientific papers in computer vision, pattern recognition, and artificial intelligence. He has won more than 20 provincial and national awards. His current research interests are in the areas of pattern recognition, robot vision, image processing, data fusion, and artificial intelligence. About the Author —ZHONG JIN received the BS degree in mathematics, the MS degree in applied mathematics, and the PhD degree in pattern recognition and intelligence system from Nanjing University of Science and Technology (NUST), China, in 1982, 1984, and 1999, respectively. He is a professor in the Department of Computer Science, NUST. He visited the Department of Computer Science and Engineering, The Chinese University of Hong Kong from January 2000 to June 2000 and from November 2000 to August 2001 and visited the Laboratoire HEUDIASYC, UMR CNRS 6599, Universite de Technologie de Compiegne, France, from October 2001 to July 2002. Dr. Jin is now visiting the Centre de Visio per Computador, Universitat Autonoma de Barcelona, Spain, as the Ramon y Cajal program Research Fellow. His current interests are in the areas of pattern recognition, computer vision, face recognition, facial expression analysis, and content-based image retrieval. About the Author —MIAO LI obtained her BS degree in computer science from Jilin University, Changchun, China, in 2003. Now she is a MS.D. student in Bio-Computing Research Center and Shenzhen graduate school, Harbin Institute of Technology, Shenzhen, China. Her current interests include pattern recognition, image processing, and neural network.
Embedded System Implementation for an Object Detection Using Stereo Image Cheol-Hong Moon1 and Dong-Young Jang2 1
Gwangju University, Gwangju, Korea
[email protected], http://web2.gwangju.ac.kr/ chmoon/ 2 Gwangju University, Gwangju, Korea
[email protected]
Abstract. This paper reports the implementation of an Embedded system for object detection using a stereo image. The Embedded system can be divided into three areas: image capture, memory control block, system control block. The system control block was implemented using a 32bit RISC processor. The other block was assisted by FPGA using H/W in order to obtain image information. An algorithm using mean squared error (MSE) was used for object detection and was performed in the system control block. In order to identify an identical object in a stereo image block, a matching algorithm was used to create DEPTH MAP for the image-processing algorithm. After searching the distance to the object by inputting the DEPTH value from the user, the object in a specific position was detected by comparing the DEPTH to the DEPTH MAP. The information and image was transmitted to a PC and identified.
1
Introduction
Image recognition in digital image signal processing can be applied to various fields in real life, and can be classified as still image recognition and moving picture recognition. The representative application of the still image is automobile plate recognition. However, it can be used in other fields such as finger print recognition, retina recognition, face recognition, character recognition, non-conforming product selection in a factory, OMR, and OCR. In addition, still images are used in medical equipment. In this still image recognition, the most important consideration is how effectively the characteristics can be obtained from an image.[1] The moving picture recognition field is used to examine the automatic tracking system that traces the object automatically. It has been being actively performed as a method for dramatically reducing human labor.[1] Many studies on automatic visual tracing or automatic target recognition systems as a device automatically tracing the object have been performed. Image tracing has became one of the most important elements in some applications including image base control, human-computer interface, monitoring devices, agricultural automation, medical image analysis and image recovery. The main research direction of image tracing is to detect the distance between two adjacent images D.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNCIS 345, pp. 230–240, 2006. c Springer-Verlag Berlin Heidelberg 2006
Embedded System Implementation for an Object Detection
231
according to their movement and then to trace the moving object. This includes techniques such as the image matching between two adjacent images[2][4], image detection of a moving object from a background image, and an estimation of the movement of a moving object in the next frame based on the parameter of the past movement. The key areas of object tracing are how to detect an accurate object, to process it in real time and to adapt it to dynamically changing atmospheric environments. To trace such an object, two CCD cameras will be needed and the desired object should be traced through the stereo matching of the two images acquired from the two CCD cameras.[3] This study configured the stereo matching that can recognize a certain object from a stereo image and a DEPTH MAP to detect the subject in a photograph, which is located in a certain position amongst many objects, in order to recognize and trace the object automatically[2]. This embedded system was used to perform all the related algorithms. In the image entry part of the implemented embedded system, FIFO logic was designed to automatically save the image signals from the stereo camera, and the embedded system was configured using a RISC 32bit processor to control the various image processing algorithms and systems.[3][5]
2 2.1
Image Processing Theory Stereo Image Acquiring Method and Configuration
Stereo image capture system requires two or more cameras to obtain a pair of images right and left. Each camera is configured to have a parallel structure. Focus F of the two cameras should be identical and those cameras should be adjusted so there will be no asymmetry such as a perpendicular difference. A different focus or optical center of the cameras will cause serious error in the left and right image conformance even if it is very tiny difference. It will cause difficulties in estimating the difference and compression, and adversely affect the recovered image quality.[2][6] This study used a parallel camera configuration method, which employs two cameras at some distance where the retinas and
Fig. 1. Parallel Configuration
232
C.-H. Moon and D.-Y. Jang
the optical center of lenses are parallel, while the optical center of the camera is directed to a convergence point. A horizontal image is shown on the screen if the CCD sensor of the camera moves horizontally. 2.2
Stereo Matching
Human sight acquires the distance information by combining the two images acquired from the different locations. Stereo matching is one of the computer sight fields used to automate the distance detection capabilities. The basic step to acquire the distance information consists of image acquisition, detection of the characteristics, stereo matching, calculating the distance from the displacement. The important factors are to select a matching element used as a special feature and to calculate the most appropriate matching method for that element.[2] The matching element can be classified as a feature-based method and an area-based method. This study used a stereo matching algorithm, which uses the areabased method. The stereo matching algorithm is used frequently in areas where a movement vector is detected. It detects how an object moves according to the difference. It calculates the DEPTH (difference information) by comparing
Fig. 2. Stereo Matching Method
the current left and right image with the previous frame.[1] Figure 2 shows how the block conforms the stereo image. If the base block is set using the left image as the standard image, the right image will be conformed and set as the matching block. It will take too much time if the whole block of the right image is set. Therefore, in order to reduce the time, the block was set on a part of the image.[1] Figure 2 shows the stereo matching order . It moves from the initial coordinate of the block F(0, Y) to the right pixel by pixel. This study used the MSE (Mean Squared Error), which was formulated according to the schematic diagram as shown in Figure 2. The MSE is the matching between the base
Embedded System Implementation for an Object Detection
233
block and the target block. It is calculated by squaring the difference between the pixels corresponding to the location of each block. Here, the block with the lowest average value is the conformed block. Equation (1) shows the MSE. M SE(i, Y ) =
B 1 ((I(X + M, Y + N ) − I(i + M, Y + N ))2 ) B2
(1)
M,N =1
In(1) , initial coordinate of the test block are defined X, Y , test block moving width are defined i , block size is defined B (B × B) , pixel width traveling in the block are defined M, N. 2.3
Focus
Figure 3 shows the proportion to the object looking from the Stereo Camera. The two triangles are proportional because the two triangles are symmetrical to the axis of the camera. Equations (2), (3), and (4) can be obtained when formulating the proportion of the actual image distance and the Focus with X. D =X ×F
(2)
M L = X × RL
(3)
M R = X × RR
(4)
The distance between the image center of the left camera and that of the right camera is L, which is the camera axis distance, and is expressed as equation (5). L = ML + MR
Fig. 3. Proportion to the object looking from the Stereo Camera
(5)
234
C.-H. Moon and D.-Y. Jang
Equation (6) is obtained by substituting equations (3) and (4) for equation (5). X = L/(RL + RR)
(6)
Equation (7) can be obtained by substituting equation (2) for (6). D = L × F/(RL + RR)
(7)
The distance from the target object can be obtained using equation (7). The distance between the camera lens (Focus) and the image sensor can be calculated from equation (7). F = (RL + RR) × D/L (8) As ML+MR, the object coordinates the distance between which was known are the coordinates of a certain object in stereo image, we can convert the value of difference (RL + RR) to DEPTH. 2.4
DEPTH MAP
The DEPTH MAP is acquired by saving the DEPTH value a pixel difference value acquired by stereo matching the array with the size of the image.[6] The larger the DEPTH value is, the closer the object is and visa versa. The DEPTH MAP converts the DEPTH value into data so that the distance of the object in each pixel can be expressed as a visual image. In addition, because the DEPTH MAP itself shows the distance of the object, the object can be detected at a certain distance by comparing the DEPTH MAP with the DEPTH value of a certain distance (the distance that user wishes to know).[2][4][6][7]
3
Hardware Design
Figure 4 shows the system architecture of the embedded system developed in this article . The Embedded System controls the whole system using a 32bit
Fig. 4. Architecture of the Embedded System
Embedded System Implementation for an Object Detection
235
RISC processor [5] and processes the stereo matching in parallel. The whole system architecture consists of an Image Input Decoder [8] that converts the analog image signals from the stereo CCD camera to digital, a Memory Control Part (FPGA) that is responsible for the saving the entered image data in the memory or displaying the data from the memory[9], and a system control part that performs stereo matching and controls the entire system.
4
Experiment and Result
A stereo image was acquired from the decoder and the Y signal was detected from the acquired image. When the stereo image was first received, stereo matching was performed to determine the distance between the camera lens and the image sensor, and the Focus F was calculated. Whole stereo matching was then performed to obtain the DEPTH MAP. The target distance was entered to obtain the DEPTH value of the entered distance, and the calculated value was compared with the DEPTH MAP to detect the object at a certain distance. Figure 5 shows a flow chart from image acquisition to the detection of the object in the embedded system.
Fig. 5. Experiment Flowchart
236
C.-H. Moon and D.-Y. Jang
4.1
Frame Buffer Simulation
Figures 6 and 7 show the verification of the read/write functions of the frame buffer memory through the synchronization signals entered from the decoder.
Fig. 6. Frame Buffer Write Signal Verification
Fig. 7. Frame Buffer Read Signal Verification
Fig. 8. MSE of the Stereo Matching
Embedded System Implementation for an Object Detection
4.2
237
Specific Area Stereo Matching
The specific area was chosen as a base block to obtain the matching area. The start point of the base block was (118, 290) and end point was (134, 306). As the graph of figures 8, the start point of block area was (0, 290) and the end point was(134, 306). MSE value was obtained equation(1). Figure 8 shows the maximum MSE value(11014) and minimum value(665). The stereo matching is to improve the block matching. The difference of computational count is equal to Table 1 in the specific area. The image size is 640×480 and a base block size is 16×16. Table 1. Compare Compatational Count Type Compatational Count Block Matching[4] Full Image size(640×480) ×Base Block size(16×16) Stereo Matching Specific Area(118) ×Base Block size(16×16)
4.3
Object Detection
Because it was assumed that the stereo image satisfies the epipolar conditions, it only has a horizontal difference not a vertical difference. Therefore, in the matching process, the block was set in the horizontal zone with the vertical zone excluded. Figure 9 shows the method for calculating the Focus by detecting
Fig. 9. Block Coordination Calculation and Focus
the DEPTH using stereo matching. Here, 118 pixels means the block coordinate corresponding to the first left image, and 50 pixels is the value acquired from stereo matching of the right image. DEPTH Value : 118pixel - 50pixel = 68pixel The Focus F was calculated using the DEPTH value as follows: F = 68pixel × 3.15m / 66mm = 3245pixel
238
C.-H. Moon and D.-Y. Jang
As the unit of the calculated F value is a pixel, a pixel value of 6.35 was multiplied to obtain the actual distance. The Actual Distance of the Focus = 3245pixel × 6.35 = 20.6mm.
Fig. 10. DEPTH MAP using stereo matching
Fig. 11. Detection of the Object 3.15 m away from the camera
Fig. 12. Detection of the Object 3.51 m away from the camera
Embedded System Implementation for an Object Detection
239
In figure 10, the object was detected by comparing the DEPTH MAP with whole stereo matching. Figure 10 shows a visual image of the DEPTH value acquired by stereo matching of the left and right image, which was saved in DEPTH MAP. The potato chips were the brightest, which means that they were closest to the camera. Next were the neighboring the lighter and monitor followed by the books in the bookshelf. The black part is the area that could not be calculated accurately through stereo matching. The entered distance was converted to a DEPTH value using stereo matching so that the object could be detected at a certain distance. An object at a certain distance can be obtained by a comparison of the DEPTH value of a certain distance with the pre-made DEPTH MAP. The coordinate in the DEPTH MAP is the object of interest, which has the same value of the DEPTH value. DistanceM easurement : D = L × F/(RL + RR)
(9)
RL + RR can be calculated using the following equations: RL + RR = L × F/D
(10)
DEP T Hvalue = L × F/D
(11)
The object can be detected at a certain distance by comparing the DEPTH MAP shown in Figure 10 with the DEPTH value calculated from equation 12. The two figures below shows the objects, 3.15 m and 3.51m away from the camera. Table 2 shows distance measuring errors from 30 DEPTH value(742.1cm) to 100 DEPTH value(211.9cm). Table 2. Distance Measuring Error No DEPTH value Computational Actual Error (pixel) Distance(cm) Distance(cm) (%) 1 30 713.9 742.1 3.95 2 40 535.43 558.3 4.27 3 50 428.34 438.2 2.3 4 60 356.95 360.8 1.07 5 70 305.96 306.2 0.7 6 80 267.71 265.8 0.71 7 90 237.97 234.7 1.37 8 100 214.17 211.9 1.06
5
Conclusion
A stereo image can be used to obtain distance information of an object without the limitations of atmospheric conditions. It is expected that including this stereo image system into an embedded system will result in a compact system with an image memory that can be controlled with a single IC, lower construction costs and versatility in various fields with slight modification. This study
240
C.-H. Moon and D.-Y. Jang
implemented an embedded system that can detect an object at a certain distance using stereo image. In order to acquire a stereo image, two CCD cameras were installed in parallel structure and an analog image (camera output) was acquired as a digital image signal, YUV 4:2:2, using two image decoders. The frame memory was controlled using a processor control signal and a decoder reference signal through the FPGA logic design and the digital image data was transmitted to the processor. The Y signal was detected with the system level software of the processor and the DEPTH MAP was created through stereo matching. After acquiring the DEPTH value of the distance that the user wanted, it was compared with the pre-created DEPTH MAP to detect the object of interest at a certain distance. The detected information and image was transmitted to a PC.
Acknowledgements This research was supported by the Program for the Training of Graduate Students in Reginal Innovation and the Technical Developement of Regional Industry which was conducted by the Ministry of Commerce, Industry and Energy of the Korean Government.
References 1. Cho, Y.S.: A Study on High Speed Tracing of the Moving Object using Moving Vector Detection and Separation Method. Ph.D Thesis 13-15 2. Hong C.: A Study on the Separation of the Object and Background using Difference Information in 3 Dimensional Image, Engineering Master Degree Thesis, (2001) 3. International Techno Information Laboratory : CCD Camera and Image Processing Circuit Design, Seon Ho Park and et al, (2000) 4. Gyaourova,A., Kamath, C., Cheung S.-C.: Block Matching for Object Tracking, UCRL-TR-200271, October, (2003) 5. Intel : StrongARM SA-1110 Microprocessor Developer’s Manual, June, (2000). 6. Kim E.S., Lee’ S.H.: Base of 3 dimensional Image, (1998) 7. Do, K.H.: Image Segmentation using Stereo matching Transition Information, Dongseo Thesis Collection of Dongseo University (1996) 10. 115-127 8. Philips : SAA7114H PAL/NTSC/SECAM video decoder Mar, (2000) 9. Averlogic : AL422B Data sheet Revision V1.3, April, (1999)
Graphic Editing Tools in Bioluminescent Imaging Simulation Hui Li1,2, Jie Tian1, Jie Luo1, and Yujie Lv1 1
Key Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences, P.O. Box 2728, Postfach 100080, Beijing, China
[email protected] 2 Department of Education Technology, Capital Normal University, Postfach 100037, Beijing, China
Abstract. It is a challenging task to accurately describe complicated biological tissues and bioluminescent sources in bioluminescent imaging simulation. To a certain extent, complicated anatomical structures and bioluminescent sources can be approximately modeled by combining a sufficient large number of geometric building blocks with Boolean operators. However, those models are too simple to describe the local features and fine changes in 2D/3D irregular contours. Therefore, interactive graphic editing tools are developed to interactively correct or improve the initial models of anatomical structures or bioluminescent sources and to efficiently model each part of the bioluminescent simulation environment. With initial models composed of geometric building blocks, interactive spline mode is applied to conveniently perform dragging and compressing operations on 2D/3D local surface of biological tissues and bioluminescent sources inside the region/volume of interest. Several applications of the interactive graphic editing tools will be presented in this article.
1 Introduction With the development of molecular marker technique and optical imaging technique, in vivo bioluminescent imaging attracts more and more attention and can be used to non-invasively visualize the physiological and pathological process of biological tissues in real time[1,2]. As an important part of bioluminescence tomography, bioluminescent simulation environment (BSE) is developed to simulate bioluminescent phenomena in the living small animal and to predict bioluminescent signals detectable outside the animal. It is a challenging task to build the virtual BSE which accurately describes the complicated biological tissues and bioluminescent sources. The accuracy of the BSE directly influences the precision of the simulated results, so it is crucial to build appropriate graphic editing tools. Several graphic editing tools (GET) have been developed to efficiently model each biological tissue of the BSE. In general, there are two major types of graphic editing tools, i.e., non-interactive GET and interactive GET. The first type is usually applied in most of optical simulation platforms (e.g. MCNP[3], EGS4[4], MCML[5], TracePro[6]). Regular geometric graphics and superquadrics are taken as the geometric building D.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNCIS 345, pp. 241 – 250, 2006. © Springer-Verlag Berlin Heidelberg 2006
242
H. Li et al.
blocks (GBB) of these platforms. With Boolean operators, anatomical structures of biological tissues and bioluminescent sources can be approximately modeled by combining a sufficient large number of GBB. However, because of the complexity of medical images, GBB models are too simple to describe the local features and fine changes in 2D/3D irregular contours. Therefore, interactive graphic editing tools are developed to facilitate the local modifications of any initial surface model, especially the virtually specific contours. In our bioluminescent imaging simulation software MOSE[7], Bezier cubic spline mode is applied to conveniently perform dragging and compressing operations on 2D/3D local surface of biological tissues and bioluminescent sources inside the region/volume of interest. Key surface characters of complicated anatomical structures and bioluminescent sources can be accurately described by shifting the control points of Bezier cubic spline curves. In this article, we present these graphic editing tools not only for building the virtual biological environment but also for conveniently modifying the shape of the biological tissues and bioluminescent sources. In particular, interactive graphic editing tools are emphasized in the article due to its advantages of describing the complicated biological tissues. These tools have been integrated into MOSE which realizes the forward problem of the bioluminescent tomography imaging.
2 Non-interactive Graphic Editing Tools Because of the simpleness and convenience in building a virtual environment, noninteractive graphic editing tools are pervasively used in most of the simulation platforms. Most of the non-interactive GET can be described by mathematic functions with several parameters, so different parameters generate various geometric shapes. As the basic building blocks, regular geometric graphics and superquadrics are chosen in MOSE platform. With various parameters of these proposed primitives and Boolean operators (e.g. intersection, and subtraction), a large variety of solid building blocks can be modeled into a distinctly irregular structure of biological tissues. Related information of these building blocks can be stored in the graphic database, so that the simulated tissue structures can be quickly input and conveniently modified. 2.1 Regular Geometric Graphics 2D/3D regular geometric graphics usually involve ellipse, polygon, ellipsoid, cylinder, and so on, which are basic building blocks in many simulation platforms. Functions of these primitives can be easily obtained according to certain geometric parameters (e.g. center coordinates ( x0 , y 0 ) , long axis a , and short axis b of the ellipse). By varying these parameters, shapes of the geometric graphics are conveniently changed. Shown as Fig. 1, a 2D thorax phantom model of the mouse has been realized by the combination of 11 ellipses, whose geometric functions can be expressed as ( x - x0 )2 / a 2 + ( y - y0 )2 / b2 = 1 . In Fig. 1, it is easy to see the complicate structures generated by combination operations of different ellipses. For example, the left/right lung tissue is made up of the union of three ellipses; and muscle is described by subtraction between the mouse skin (i.e. the outer ellipse) and all the other tissues
Graphic Editing Tools in Bioluminescent Imaging Simulation
243
(i.e. lung, heart, spine, sternum), which is a typical irregular structure. Sternum y Skin
Left lung
Heart Right lung
x
Muscle
Spine
Fig. 1. 2D thorax phantom model of the mouse with the combination operations of different ellipses
2.2 Superquadrics The superquadric models were introduced to the computer graphics field in 1981[8-11]. There are four types of superquadric models: superellipsoid, supertoroid, and superhyperboloid with one or two sheets. Among these four types, only the superellipsoid defines a closed surface without holes, which is always consistent with the condition of real environment. Therefore, the superellipsoid is commonly referred to as the superquadric. In our simulation platform, the superellipsoid is used as a particular geometric primitive. A superellipsoid surface is defined by an implicit equation: ( x / rx
2/ε2
+ y / ry
2 / ε2 ε / ε 2 1
)
+ z / rz
2 / ε1
=1
(1)
where radius parameters rx , ry , rz denote the scaling factors on x, y and z axes,
squareness parameters ε 1 , ε 2 are the shape parameters related to the squareness/ roundness/pinchedness in the longitudinal and horizontal directions respectively. Another concise definition of the superellipsoid in the sphere coordinate system is described as
ª rx sgn(cosη cos ω ) cosη ε1 cos ω ε 2 º « » ε ε r (η , ω ) = « ry sgn(cosη sin ω ) cosη 1 cos ω 2 » « » « rz sgn(sin η ) sin η ε1 » ¬ ¼
(2)
where η ,ω are latitude and longitude angles with −π / 2 ≤ η ≤ π / 2 , −π ≤ ω ≤ π , and sgn(x ) is the sign function. By varying the value squareness parameters ε 1 , ε 2 , a wide
244
H. Li et al.
range of shapes can be conveniently generated. Roughly speaking, if a squareness parameter is significantly less than 1, the geometry is somewhat square; if it is close to 1, the object is quite round; if it is close to 2, the shape has a flat bevel; if it is greater than 2, the structure is pinched. Fig. 2 presents a 3D mouse thorax and abdomen model simulated by superquadrics.
Fig. 2. 3D mouse thorax and abdomen model with the combination of different ellipsoids, cylinders, superquadrics, etc
When the building blocks are the superellipsoid, it is easy to judge whether a point p( x, y, z ) is inside of the shape according to the inside-outside function:
F ( x, y, z ) = ( x / rx
2 / ε2
+ y / ry
2/ ε 2 ε / ε 2 1
)
+ z / rz
2 / ε1
.
(3)
If function F ( x, y, z ) < 1 , point p is inside the superellipsoid; if F ( x, y, z ) = 1 , point p is on the boundary of the superellipsoid; if F ( x, y, z ) > 1 , point p is inside the superellipsoid.
3 Interactive Graphic Editing Tools The combinations of non-interactive graphic editing tools can describe some relatively complex shapes, but it is hard to describe local features and fine changes in 2D/3D contours of anatomical structures. Because of the remarkable complexity of biological tissues and bioluminescent sources, interactive GET models are usually chosen to modify local structures of 2D/3D simulated shapes. 2D/3D interactive GET have been respectively realized and integrated in the MOSE. In the following section, our work focuses on the generation and modification of 3D bioluminescent sources for brevity. The original shape of any 3D irregular object is usually chosen as a sphere or a cylinder whose parameters are determined according to the prior knowledge of operators. The Bezier cubic spline mode is applied as the interactive editing tool.
Graphic Editing Tools in Bioluminescent Imaging Simulation
245
First, the volume of interest (VOI) of the original object is selected, which defines the local surface to be modified. Then, the selected local surface of the original 3D object can be dragged via a so-called control point pc along any direction, which can be repositioned interactively. The position of the original control point pc is determined by the shape of the local surface S L which needs to be modified. When the position of the control point is changed, the whole local surface is modified accordingly by a Bezier cubic spline mode. In 3D simulation environment, multiple Bezier cubic splines are used to describe the modified local surface inside the VOI of the 3D object. To make this 3D interactive mode easier to be understood, we begin with the 2D Bezier cubic spline curve. The 2D Bezier cubic spline curve can be described algebraically by a Bernstein polynomial of degree 3[12,13]: p (t ) = (1 − t ) 3 p 0 + 3(1 − t ) 2 tp1 + 3(1 − t )t 2 p 2 + t 3 p 3
(4)
where t varies between 0 and 1. Fig. 3 shows the 2D typical Bezier cubic splines. pc
p1
SL
p′c p2
S′L
p1
p2
Spline
p0
Initial spline
p3 p0
p3
Object
Object
(a)
(b)
Fig. 3. The schematics of the 2D local surface modified by the control point pc and described by Bezier cubic splines
From the Eq. (4), it is evident that four points p0 , p1 , p2 , and p3 are needed to determine the shape of the spline curve. The two end points p0 and p3 (shown as Fig. 3) are fixed, because they are the boundary points of the selected local surface of the original object. However, it is still difficult to interactively operate on the other two points p1 and p2 simultaneously. The solution is to search one control point pc to express two points p1 and p2 . In the MOSE, we chose points p1 and p2 are the midpoints of line segments p0 pc and p3 pc respectively. Then, Eq. (4) is rewritten as: p (t ) = 1.5[(1 − t )2 t + (1 − t )t 2 ] pc + [(1 − t )3 + 1.5(1 − t ) 2 t ] p0 + [t 3 + 1.5(1 − t )t 2 ] p3 .
(5)
246
H. Li et al.
Because the two points p0 and p3 are known, the whole Bezier cubic spline can be determined by the control point pc only. When the control point is moved, the local surface S L can be conveniently modified accordingly (Fig. 3 (b)). The mechanism of 3D surface modification is the same as that of 2D surface modification. The difference lies in that 3D local surface is made up of many Bezier cubic splines with the same control point pc . Known the 3D original local surface to be modified and a series of initial Bezier cubic splines determined by the control point, the 3D local surface of the object is modified by a group of Bezier cubic splines in 3D simulation environment (Fig. 4).
pc′
pc
p1
p2
p1
Initial spline
p2 Spline
p3
p0 Object
p0
p3 Objec
Fig. 4. The schematics of 3D local surface modification
In the MOSE, all the 3D shape including the biological tissues and bioluminescent sources are described by a series of triangle meshes. It is crucial to judge whether a point is inside a 3D irregular shape. Given a point P and a certain irregular contour, ))* the line LP with certain direction can be easily obtained. The number N of ))* intersections between LP and the contour described by 3D interactive graphic editing tools is used to determine whether the point P is inside of the contour. If N is odd, the point P is inside the shape; if N is even or zero, the point P is outside the shape; if the point P satisfies the function of contour composed of many triangle meshes, it is on the boundary of the 3D shape.
4 Experimental Results The input of our simulation platform MOSE is a series of micro-CT slices or image volume including prerequisite parameters (e.g. the input includes image width/height of each slice, the total number of slices, the inter-slice distance, and the optical properties of biological tissues). With 3D imaging processing algorithms[14,15] (e.g. segmentation[15,16], surface rendering[17,18], mesh simplification[19,20]), any irregular biological tissue can be described by a series of triangular meshes. The virtual
Graphic Editing Tools in Bioluminescent Imaging Simulation
247
biological environment is completed after the combination of all biological tissues. Then, with the interactive graphic editing tools presented above, the irregular 2D/3D shapes can be conveniently generated and modified according to the interactive operations and prior knowledge of the operator. The following experiment shows the implementation of the virtual biological environment and the modification of 3D bioluminescent source. In this experiment, the input to the MOSE is a 145×122×86 volume from 86 CT slices. The voxel dimensions are 0.156, 0.156, and 0.1428 millimeter, respectively. With 3D imaging processing algorithms, the geometric model of the mouse thorax with transparent effect, i.e. the virtual biological environment, is obtained, shown as the purple graphics in Fig. 5. Furthermore, the 3D original bioluminescent source can be conveniently given by initializing the parameters (e.g. the coordinates of the center, the scaling factors on x, y and z axes, the original shape and distribution function of the object) of the control panel in MOSE interface. With the movement of slides, the position and dimensions of the original bioluminescent source can be easily modified. Fig. 5 (a) shows the virtual biological environment and the original shape of the 3D bioluminescent source; Fig. 5 (b) presents the source movement with the modification of the position parameters; and Fig. 5 (c) denotes the change of source dimensions with the modification of the scaling parameters.
(a)
(b)
(c)
Fig. 5. The position and dimensions of a 3D bioluminescent source are conveniently modified by the control panel
Besides the change of source position and dimension, the 3D local surface of the bioluminescent source can be conveniently modified by the Bezier cubic spline mode introduced in Section 3. In this experiment, the original bioluminescent source is given as a solid sphere described by a series of triangle meshes, which is shown as the red graphics in Fig. 6. First, according to the operators’ experience or prior
248
H. Li et al.
knowledge, he should select the volume of interest (VOI) of the original bioluminescent source, which defines the 3D original local surfaces to be modified. Then, the default position of the original control point pc is determined by the shape of the chosen local surface and automatically displayed in the MOSE interface. And the original local surface is described by a series of Bezier cubic splines. When the location of the control point pc is changed, the selected local surface inside the VOI is updated to a new series of Bezier cubic splines. The yellow graphics in Fig. 6 shows four modified local surfaces of the original bioluminescent sources generated by four groups of Bezier cubic splines.
Fig. 6. Three views of the original bioluminescent sources (the solid sphere) and the local surface modified by the Bezier cubic spline curves (the local convex surfaces)
5 Discussion Several graphic editing tools have been developed and successfully applied in accurately modeling 2D/3D complicated shapes in the bioluminescent imaging simulation. All computation algorithms have been realized and completely integrated in our bioluminescent imaging simulation platform named MOSE. Our experiment results indicate that graphic editing tools (especially geometric building blocks and Bezier cubic spline mode) can be used to efficiently model the local surface features and to interactively modify the initial contours of irregular shapes (e.g. the bioluminescent sources) in the virtual bioluminescent simulation environment. Moreover, we proposed efficient algorithms to judge whether a point lies in the modified 2D/3D shape and to calculate the intersection points where the photon hits the boundaries of given phantoms, which ensures higher precision of the simulation results. In our bioluminescent imaging simulation environment, the interactive graphic editing tool is efficiently combined with the geometric building blocks, so that local features of 2D irregular contours and 3D complex surfaces can be efficiently modeled by manual corrections of the initial GBB models. As a result, according to operators’ prior knowledge, the bioluminescent imaging simulation environment especially bioluminescent sources can be precisely simulated by the methods proposed in this artical.
Graphic Editing Tools in Bioluminescent Imaging Simulation
249
Acknowlegement This work was supported in part by the National Natural Science Fund for Distinguished Young Scholars of China (Grant No. 60225008), the National Natural Science Fundation of China (Grant No. 30500131).
Harmonics Real Time Identification Based on ANN, GPS and Distributed Ethernet Zhijian Hu and Chengxue Zhang School of Electrical Engineering, Wuhan University, 430072 Wuhan, P. R. China
[email protected],
[email protected]
Abstract. A novel method for real-time harmonic identification by artificial neural network, based on GPS technology and distributed Ethernet, is proposed in this paper. The method uses an artificial neural network to estimate the amplitudes and phase angles of distorted currents/voltages in a power system. Only half a cycle of the harmonic current signal is used as the input of the neural network. In order to improve the accuracy of harmonic source identification, the Global Positioning System (GPS) provides the synchronization signal for an embedded harmonics measurement system based on a digital signal processor (DSP). The sample selection and training methods of the artificial neural network are explained, and the hardware structure of the embedded harmonic identification system is given. Real-Time Digital Simulator (RTDS) simulation results prove the effectiveness of the proposed method.
1 Introduction

We know that nonlinear loads, such as inverters, rectifiers, AC drives and DC drives, generate power system harmonics. Harmonics can flow into the distribution system, causing many problems for power system operation. In order to avoid these problems and to improve the quality of the delivered energy, on the one hand, harmonic parameters such as amplitudes and phase angles should be known; on the other hand, we need to identify the harmonic sources in the power system and take measures to eliminate the harmonics. The difficulty in measuring power system harmonics comes from the fact that harmonic-generating loads are dynamic in nature. Harmonics monitoring is still considered underdeveloped [1]. The use of the FFT algorithm reduces the computing time required for evaluation of the DFT by several orders of magnitude. However, the DFT method has several limitations due to the implicit windowing of the data that occurs when processing blocks of data with the FFT; windowing manifests itself as 'leakage' in the spectral domain [5]. Recently, applications of Artificial Neural Networks (ANN) to power systems have gained considerable attention, and many neural network models have been proposed for power system harmonic estimation. In [2], the initial estimates by ANN are used as pseudo-measurements for harmonic state estimation. This approach permits
measurement with relatively few measuring instruments. An ANN is used to estimate the harmonic components in [3]. A Hopfield neural network is proposed to identify the amplitudes and phase angles of harmonics in [4]. A Fourier Neural Network (FNN) is developed and implemented for harmonic estimation in [5]. In [9], a neural-network (NN)-based approach to non-intrusive harmonic source identification was proposed: NNs are trained to extract important features from the input current waveform to uniquely identify various types of devices using their distinct harmonic "signatures." In [10], a new approach based on non-linear least squares parameter estimation was proposed, using Hopfield-type feedback neural networks for real-time harmonic evaluation. In [11], an ANN is used to control the HPC amidst a very dynamic power system environment; the training and performance of the ANN are optimized in terms of training set size and training set packing. In [12], a technique to distinguish between magnetizing inrush and internal fault currents of a power transformer is presented; unlike existing relaying techniques, this method is independent of the harmonic content of the differential current. In this paper, a method for real-time harmonic identification and measurement by artificial neural network, based on the Global Positioning System (GPS) and embedded Ethernet, is proposed. This method overcomes the shortcoming of the FFT algorithm in calculating the phase angles of harmonics. The ANN algorithm uses only half a cycle of the distorted wave to estimate each order of harmonics, so it can be used for online harmonic measurement. Another new idea is that, in order to monitor harmonic power flow and identify harmonic sources, all harmonic currents at the different monitoring spots should be sampled simultaneously; GPS is used as the synchronization signal in the measurement system.
2 Artificial Neural Network Model and Algorithm for Harmonics Identification

A back-propagation (BP) network is used to estimate the power system harmonics. A typical three-layered feed-forward neural network is shown in Fig. 1.
Fig. 1. Three-layered feed-forward neural network (input layer nodes $x_j$, hidden layer nodes $y_i$ and output layer nodes $o_l$, with connection weights $w_{ij}$ and $w_{li}$)
The nodes are the processing units, which receive input from their upper side and deliver output on the lower side. The input signals are applied to the input layer. The input pattern is transmitted to the input of the hidden layer through the weighted network connections. The hidden units receive the weighted pattern, where the signals are combined in an activation function. Suppose $x_j$ is an input node, $y_i$ is a hidden node, $o_l$ is an output node and $t_l$ is the expected output of the ANN. The BP neural network algorithm is briefly outlined below.

2.1 The Calculation of the Output Nodes

The output nodes $o_l$ are calculated by (1) and (2),

$$y_i = f\Bigl(\sum_j w_{ij} x_j - \theta_i\Bigr) \tag{1}$$

$$o_l = f\Bigl(\sum_i w_{li} y_i - \theta_l\Bigr) \tag{2}$$

where $w_{ij}$ is the connection weight between node $j$ and node $i$, $w_{li}$ is the connection weight between node $l$ and node $i$, $\theta_i$ and $\theta_l$ are the thresholds of node $i$ and node $l$ respectively, and $f(\cdot)$ is the activation function.

2.2 The Adjustment of the Output Nodes

The adjustment of the output nodes is given by (3), (4), (5) and (6),

$$e_k = \sum_{l=1}^{n} \bigl| t_l^{(k)} - o_l^{(k)} \bigr| \tag{3}$$

$$\delta_l = (t_l - o_l)\, f'(\mathrm{net}_l) \tag{4}$$

$$w_{li}(k+1) = w_{li}(k) + \Delta w_{li} = w_{li}(k) + \eta \delta_l y_i \tag{5}$$

$$\theta_l(k+1) = \theta_l(k) + \eta \delta_l \tag{6}$$

2.3 The Adjustment of the Hidden Nodes

The adjustment of the hidden nodes is given by (7), (8) and (9),

$$\delta_i = y_i (1 - y_i) \sum_l \delta_l w_{li} \tag{7}$$

$$w_{ij}(k+1) = w_{ij}(k) + \Delta w_{ij} = w_{ij}(k) + \eta' \delta_i x_j \tag{8}$$

$$\theta_i(k+1) = \theta_i(k) + \eta' \delta_i \tag{9}$$

In order to accelerate the convergence of the ANN, a momentum term is added to (5) and (8), so that $\Delta w_{li}$ and $\Delta w_{ij}$ can be written as follows,

$$\Delta w_{li}(k+1) = \eta \delta_l y_i + \beta \Delta w_{li}(k) \tag{10}$$

$$\Delta w_{ij}(k+1) = \eta' \delta_i x_j + \beta \Delta w_{ij}(k) \tag{11}$$

where $\eta$ and $\eta'$ are learning rates and $\beta$ is the momentum factor.
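A minimal Python sketch (assuming NumPy) of one training step of this BP network, implementing Eqs. (1)-(11) with the momentum terms; the sigmoid activation and the learning-rate values are illustrative assumptions, not taken from the paper.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def bp_step(x, t, w_ih, w_ho, th_h, th_o, dw_ih, dw_ho,
                eta=0.1, eta2=0.1, beta=0.8):
        # Forward pass, Eqs. (1) and (2)
        y = sigmoid(w_ih @ x - th_h)
        o = sigmoid(w_ho @ y - th_o)
        # Deltas, Eqs. (4) and (7); f'(net) = o*(1-o) for the sigmoid
        delta_o = (t - o) * o * (1.0 - o)
        delta_h = y * (1.0 - y) * (w_ho.T @ delta_o)
        # Weight increments with momentum, Eqs. (10) and (11)
        dw_ho = eta * np.outer(delta_o, y) + beta * dw_ho
        dw_ih = eta2 * np.outer(delta_h, x) + beta * dw_ih
        # Updates, Eqs. (5), (8), (6) and (9)
        w_ho += dw_ho
        w_ih += dw_ih
        th_o += eta * delta_o
        th_h += eta2 * delta_h
        return w_ih, w_ho, th_h, th_o, dw_ih, dw_ho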
3 Harmonics Identification Algorithm by Artificial Neural Network

One of the purposes of harmonic measurement is to compensate the harmonic current components, namely, to eliminate harmonics. Active power filters are widely used nowadays. The principal function of the active power filter is to separate the unwanted harmonic components from the original fundamental component. An artificial neural network is employed to predict the magnitude and angle of the harmonic current. This method applies only half a cycle of the load current to estimate the harmonic current from the distorted signals. Suppose the power system of given fundamental angular frequency $\omega$ is distorted by higher order harmonics of unknown magnitudes and phases. The general form of the line current $i(t)$ is

$$i(t) = \sum_{n=1}^{N} A_n \sin(n\omega t + \varphi_n), \quad n = 1, 2, \ldots, N \tag{12}$$

where $A_n$ is the magnitude and $\varphi_n$ is the phase of the $n$-th harmonic component.

In normal operating conditions, only odd harmonics are present in electrical equipment and electronic appliances, so (12) may be written as

$$i(t) = \sum_{n=1,3,5,\ldots}^{N} \bigl(a_n \sin n\omega t + b_n \cos n\omega t\bigr) \tag{13}$$

where $a_n = A_n \cos\varphi_n$ and $b_n = A_n \sin\varphi_n$. By estimating the values of $a_n$ and $b_n$, the magnitude and phase of the $n$-th harmonic component can be determined from (14) and (15),

$$A_n = \sqrt{a_n^2 + b_n^2} \tag{14}$$

$$\varphi_n = \arctan(b_n / a_n) \tag{15}$$
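As a small worked example of (14) and (15), the Python snippet below converts sine/cosine components into harmonic magnitudes and phase angles; the component values are made up purely for illustration.

    import numpy as np

    orders = np.array([1, 3, 5, 7])          # odd harmonic orders
    a = np.array([0.98, 0.26, 0.21, 0.12])   # sine components a_n (illustrative)
    b = np.array([0.70, 0.08, 0.05, 0.03])   # cosine components b_n (illustrative)

    A = np.hypot(a, b)                       # magnitude, Eq. (14)
    phi = np.degrees(np.arctan2(b, a))       # phase, Eq. (15); arctan2 resolves the quadrant

    for n, An, pn in zip(orders, A, phi):
        print(f"harmonic {n}: magnitude {An:.4f} p.u., angle {pn:.2f} deg")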
The process of identifying the magnitude and phase of each harmonic from the distorted current by the artificial neural network is shown in Fig. 2 and Fig. 3. In this method, only half a cycle of the harmonic signal is used to estimate the harmonic components.
Fig. 2. ANN based harmonic component determination (hidden layer completely connected). The input layer receives half a cycle of the load current; the output layer detects the sine and cosine components of the 1st, 3rd, 5th, 7th, ..., 15th harmonics.
Fig. 3. ANN based harmonic component determination (hidden layer partly connected). Same input and output layers as in Fig. 2, but the hidden layer is split into two partly connected groups for the sine and cosine component detection.
In order to check the accuracy of the ANN with the partly connected hidden layer, we input untrained harmonic currents with different frequencies to the trained ANN; the measurement results are given in Table 1. From the measurement results we can see that the measurement accuracy of the ANN is quite high.
Table 1. ANN measurement results for untrained harmonic currents

Frequency (Hz) | Harmonic | Quantity         | Test Signal | ANN Measurement | Relative Error (%)
49.8           | 3rd      | magnitude (p.u.) | 0.08        | 0.08007         | 0.093
49.8           | 3rd      | angle (degree)   | 180         | 179.902         | -0.054
49.8           | 5th      | magnitude (p.u.) | 0.07        | 0.07006         | 0.084
49.8           | 5th      | angle (degree)   | 30          | 30.0159         | 0.053
50.0           | 3rd      | magnitude (p.u.) | 0.27        | 0.2699          | -0.078
50.0           | 3rd      | angle (degree)   | 90          | 89.9755         | -0.027
50.0           | 5th      | magnitude (p.u.) | 0.22        | 0.2202          | 0.091
50.0           | 5th      | angle (degree)   | 90          | 89.937          | -0.070
50.2           | 3rd      | magnitude (p.u.) | 0.43        | 0.4296          | -0.093
50.2           | 3rd      | angle (degree)   | 60          | 59.968          | -0.053
50.2           | 5th      | magnitude (p.u.) | 0.46        | 0.4597          | -0.065
50.2           | 5th      | angle (degree)   | 150         | 149.871         | -0.086
4 The Hardware Structure of the Embedded Harmonics Measurement System

For real-time harmonic measurement and harmonic source identification, the harmonic currents at all measuring spots must be measured simultaneously. The 1PPS pulse of GPS is used as the synchronization signal of the measurement system [6]. The hardware structure of the monitoring instrument based on GPS technology is shown in Fig. 4.

Fig. 4. The hardware structure of the monitoring instrument (current/voltage transformers, A/D converter, DSP, double-port RAM, main CPU with RS-232 and Ethernet interfaces, GPS receiver with PPS output, inverter, hard disk, I/O and LCD monitor on the system bus)
In Fig. 4, an embedded DSP (TMS320-V33) is used as the processor for the ANN harmonic estimation algorithm. The harmonics estimated by the ANN are stored in the double-port RAM; the main CPU reads the estimated measurement results from the double-port RAM and sends them to the central processing computer through the Ethernet interface. At the same time, according to the harmonic identification results, the DSP sends control signals to an inverter to compensate the harmonic current components. Because the ANN uses only half a cycle of the current wave to identify the amplitudes and angles of harmonics, the measurement and control system can realize real-time measurement and compensation of the harmonic current. As the GPS synchronization signal is used in the measurement system, synchronized data acquisition at different locations is realized, so the system can measure the harmonic power flow of the power system and identify harmonic sources. The measurement configuration of the harmonic monitoring system is shown in Fig. 5; its detailed hardware platform is the same as the hardware structure shown in Fig. 4.
Fig. 5. The measurement configuration of the harmonics monitoring system
5 Network Configuration of Harmonics Measurement System

The system configuration of the harmonics monitoring network is shown in Fig. 6. The measurement system consists of a few basic hardware components: 1. harmonics monitoring meters; 2. modems and Ethernet for communication; 3. GPS interface; 4. central server and workstation; 5. data storage; 6. client PCs. The system is designed to be accessible through a Web browser via the internet/intranet; the advantages of using the internet/intranet for power management and monitoring are well discussed by researchers [8]. The system provides data logging and storage functions as well as collecting real-time data into the database. The Web Java program reads the required data based on the timestamp and posts it to the Web for internet/intranet access.
Fig. 6. Harmonics monitoring system network layout
6 Simulation Results and Analysis

Dynamic simulation tests are carried out to investigate the effectiveness of the proposed method. In the simulations, all the harmonic currents are generated by a Real-Time Digital Simulator (RTDS). The diagram of the RTDS simulation is shown in Fig. 7. A 7-bus power network is shown in Fig. 8. The network has three generator buses, with linear and nonlinear loads at the other buses. Only odd harmonics, i.e. the 3rd, 5th, 7th, 11th, 13th and 15th harmonics, are generated in the network. Harmonic monitoring systems are installed at lines 1-2 and 4-5. In the harmonic component identification process, the training patterns are constructed by varying the magnitudes and phase angles of the fundamental and the different harmonic components. The magnitude of the fundamental component is varied over 20%, 40%, 60%, 80% and 100%, while the magnitudes of the 3rd, 5th, 7th, 9th, 11th, 13th and 15th harmonic components are varied over 60%, 45%, 30% and 15% for each value of the fundamental magnitude. The phase angles of the fundamental and harmonic components are varied from -180 degrees to +180 degrees with an interval of 30 degrees. The frequency of the fundamental component is varied from 45 Hz to 55 Hz with an interval of 0.5 Hz. The number of hidden layer nodes is an important factor, because the number of connection weights is decided by it. Suppose that the sampling rate is 64 points per cycle, the output layer has 8 nodes, and the hidden layer has 20 nodes. The number of connection weights is shown in Table 2.
Fig. 7. The diagram of the RTDS simulation (RTDS outputs are fed through amplifiers to the ANN harmonic detector, which is synchronized by a GPS receiver)
Fig. 8. 7-bus simulation network (generator buses G1, G2, G3; loads L1, L2; harmonic sources; Meter 1 and Meter 2 on lines 1-2 and 4-5)

Table 2. The number of connection weights

           | Hidden layer completely connected (20 nodes) | Hidden layer partly connected (10+10 nodes)
Half cycle | 2*(32*20+20*8)=1600                          | 2*(32*10+10*8)=800
One cycle  | 2*(64*20+20*8)=2880                          | 2*(64*10+10*8)=1440
From Table 2, we know that if the hidden layer is divided into two parts and only half a cycle of sampling points is used for harmonics identification, the number of connection weights is the smallest, so the calculation speed is also the fastest. If we decrease the number of hidden layer nodes, the number of connection weights decreases; the number of hidden layer nodes depends on the harmonics and on the required calculation accuracy. The average training error curves of the ANN with 20 hidden nodes (completely connected) and 10+10 hidden nodes (partly connected) are shown in Fig. 9 and Fig. 10 respectively. The two sets of simulation results from the RTDS tests are shown in Table 3 and Table 4. In Table 3, no noise was injected, while in Table 4, noise with a signal-to-noise ratio (SNR) of 20 dB was injected in the measurement. The results calculated by the ANN are compared with the FFT. All magnitudes are in per unit; the unit of angle is the degree. The sampling rate is 64 points per cycle.
Fig. 9. The training error curve with hidden nodes completely connected

Fig. 10. The training error curve with hidden nodes partly connected
The error is defined as the difference between the test signal (generated by the RTDS) and the ANN estimation result, or between the test signal and the FFT calculation result. Because the frequency of the power system varies around 50 Hz or 60 Hz, it
Table 3. Simulation results with no noise injected

Harmonic | Quantity         | Test Signal | ANN Estimation | Error  | FFT Estimation | Error
1st      | magnitude (p.u.) | 1.0         | 1.0061         | 0.61%  | 0.981          | -1.91%
1st      | angle (degree)   | 45.5        | 46.046         | 1.2%   | 43.225         | -5.10%
3rd      | magnitude (p.u.) | 0.27        | 0.268          | -0.75% | 0.262          | -2.78%
3rd      | angle (degree)   | 88.3        | 89.943         | 1.86%  | 97.574         | 10.51%
5th      | magnitude (p.u.) | 0.22        | 0.218          | -0.91% | 0.207          | -5.91%
5th      | angle (degree)   | 126.6       | 128.767        | 1.71%  | 142.024        | 12.18%
7th      | magnitude (p.u.) | 0.13        | 0.131          | 0.78%  | 0.120          | -7.69%
7th      | angle (degree)   | 32.5        | 33.112         | 1.89%  | 45.846         | 41.07%
9th      | magnitude (p.u.) | 0.11        | 0.109          | -0.91% | 0.116          | 5.45%
9th      | angle (degree)   | 168.2       | 171.73         | 2.11%  | 183.252        | 8.95%
11th     | magnitude (p.u.) | 0.1         | 0.1023         | 2.30%  | 0.089          | -11.2%
11th     | angle (degree)   | 56.5        | 57.986         | 2.63%  | 79.474         | 40.66%
Table 4. Simulation results with noise injected

Harmonic | Quantity         | Test Signal | ANN Estimation | Error  | FFT Estimation | Error
1st      | magnitude (p.u.) | 1.0         | 1.084          | 0.84%  | 0.968          | -3.20%
1st      | angle (degree)   | 45.5        | 46.274         | 1.71%  | 41.446         | -8.91%
3rd      | magnitude (p.u.) | 0.27        | 0.265          | -1.85% | 0.284          | 5.19%
3rd      | angle (degree)   | 88.3        | 90.543         | 2.54%  | 107.952        | 22.26%
5th      | magnitude (p.u.) | 0.22        | 0.224          | 1.82%  | 0.205          | -6.82%
5th      | angle (degree)   | 126.6       | 129.461        | 2.26%  | 148.881        | 17.61%
7th      | magnitude (p.u.) | 0.13        | 0.133          | 2.31%  | 0.112          | -13.85%
7th      | angle (degree)   | 32.5        | 33.323         | 2.53%  | 50.081         | 54.10%
9th      | magnitude (p.u.) | 0.11        | 0.108          | -2.73% | 0.125          | 13.61%
9th      | angle (degree)   | 168.2       | 172.250        | 2.41%  | 192.602        | 14.50%
11th     | magnitude (p.u.) | 0.1         | 0.104          | 4.01%  | 0.086          | -14.11%
11th     | angle (degree)   | 56.5        | 59.161         | 4.71%  | 82.742         | 46.44%
is difficult to realize a sampling rate of exactly 64 points per cycle, so the FFT algorithm also generates calculation errors. From Table 3 and Table 4, we can see that the results calculated by the ANN and by the FFT were both affected by the additive noise in terms of estimation accuracy. The ANN algorithm has better calculation accuracy than the FFT, especially for phase angle estimation. Besides high accuracy, the ANN algorithm is also fast in estimating harmonics: it needs only half a cycle to identify the harmonics, while the FFT needs at least one full cycle, which may cause serious problems for the active power filter inverter when the load conditions fluctuate. Therefore, the ANN algorithm can be used in active power filters.
7 Conclusion

In this paper, a new real-time harmonic identification approach using an adaptive neural network, based on a GPS- and embedded-DSP-based measuring system, is proposed. This method can estimate each order of power system harmonics in real time. RTDS simulation results show that the proposed method can correctly detect harmonic components from half a cycle of the distorted wave, so it can be used in active power filters. As the GPS synchronization signal is used in the measurement system and all data are tagged with GPS time, the system can also measure the harmonic power flow of power systems and identify harmonic sources.
References
1. Arrillaga, J., Bradley, D.A., Bodger, P.S.: Power System Harmonics. John Wiley & Sons, New York (1985)
2. Hartana, R.K., Richards, G.G.: Harmonic Source Monitoring and Identification Using Neural Networks. IEEE Trans. on Power Systems, Vol. 5, No. 4 (1990) 1098-1104
3. Rukonuzzaman, M.: Magnitude and Phase Determination of Harmonic Current by Adaptive Learning Back-propagation Neural Network. IEEE PEDS'99, Hong Kong (1999) 1168-1172
4. Lai, L.L.: A Two Approach to Frequency and Harmonic Evaluation. Artificial Neural Networks, Conference Publication No. 440 (1997) 245-250
5. El-Amin, I.: Artificial Neural Networks for Power Systems Harmonic Estimation. IEEE/PES ICHQP'98, Athens, Greece, Oct. 14-16 (1998) 999-1009
6. Hu, Z., Zhang, C.: GPS Based Synchronous Clock and Its Application in Power Plant Automation System. Automation of Electric Power Systems, Vol. 26, No. 12 (2002) 72-73
7. Lee, R.P.K., Lai, L.L.: A Web-based Multi-channel Power Quality Monitoring System for a Large Network. Power System Management and Control (2002) 112-117
8. Mori, H., Itou, K.: An Artificial Neural Network Based Method for Predicting Power System Voltage Harmonics. IEEE Trans. on Power Delivery, Vol. 7, No. 1 (1992) 402-409
9. Srinivasan, D., Ng, W.S., Liew, A.C.: Neural-network-based Signature Recognition for Harmonic Source Identification. IEEE Trans. on Power Delivery, Vol. 21, No. 1 (2006) 398-405
10. Lai, L.L., Chan, W.L.: Real-time Frequency and Harmonic Evaluation Using Artificial Neural Networks. IEEE Trans. on Power Delivery, Vol. 14, No. 1 (1999) 52-59
11. van Schoor, G., van Wyk, J.D., Shaw, I.S.: Training and Optimization of an Artificial Neural Network Controlling a Hybrid Power Filter. IEEE Trans. on Industrial Electronics, Vol. 50, No. 3 (2003) 546-553
12. Zaman, M.R., Rahman, M.A.: Experimental Testing of the Artificial Neural Network Based Protection of Power Transformers. IEEE Trans. on Power Delivery, Vol. 13, No. 2 (1998) 510-517
The Synthesis of Chinese Fine-Brushwork Painting for Flower Tianding Chen Institute of Communications and Information Technology, Zhejiang Gongshang University, Hangzhou, China 310035
[email protected]
Abstract. Chinese Fine-Brushwork Painting has become more and more important within traditional Chinese Ink Painting since the Tang Dynasty. Compared to Free Style Chinese Ink Painting, Fine-Brushwork Painting puts emphasis on realistic, detailed depiction and on the use of colors. It consists of two major categories: one is the Birds and Flowers Painting and the other is the Figure Painting. In this paper, we focus on the drawing of flowers in the former category. In our system, we simulate two important processes: sketching the contour and coloring. Given an input photo of a flower, we apply brush strokes to the outline and simulate the pigment's flowing paths to imitate multi-level coloring. Therefore, users may easily generate the Fine-Brushwork Painting style by using our system without any painting skill.
1 Introduction

Chinese Ink Painting has a long history in China. Painters use ink and water to create their compositions. After the Tang Dynasty, Chinese Ink Painting divided into two branches: Free Style Chinese Ink Painting and Chinese Fine-Brushwork Painting. The former emphasizes that emotion is more important than shape; the artists exploit the interaction of ink, water and rice paper (Xuan paper) to express their creations. Different from Free Style Painting, Fine-Brushwork Painting focuses on the object's appearance [1][2]: it depicts the object's shape realistically and uses a great quantity of colors as a foil to it. Although Free Style Painting became the mainstream, Fine-Brushwork Painting is still a major part of Chinese Ink Painting. In this paper, we propose a system to synthesize the realistic style of Chinese Fine-Brushwork Painting, specifically for flowers. Since the Birds and Flowers Painting is the main kind of Fine-Brushwork Painting, we focus on illustrating flowers. Although this genre already has a complete background and mature painting procedures, up to now few studies have focused on this topic.
2 Stroke Generation

In this section, we describe how to obtain the stroke information from the input image and use it to sketch the contour in the style of Chinese Fine-Brushwork Painting. We
introduce our proposed brush model first, and then discuss it in three parts: the petal, the leaf and the pistil.

2.1 The Brush Model of Hon-Do Pen

In the field of Chinese Fine-Brushwork, artists use the tapering and flexible Hon-Do pen. In this section, we explain how we simulate its characteristics. The Hon-Do pen has the characteristics of stiffness and taper, so, in the sketching of Chinese Fine-Brushwork Painting, it generates clear and obvious lines; the lines seldom diffuse. In drawing, artists often sketch with the Zhong Feng skill: the artist places the tip of the pen in the middle without any slanting, to make sound, strong, neat and smooth strokes. Therefore we use a single circle to simulate the contact region of the Hon-Do pen on alum-Xuan paper. We do not consider ink diffusion; we only take the stroke's variation of width and color into consideration. We define a circle whose radius changes along with the variation of the tangent line to represent the contact surface of the Hon-Do pen on alum-rice paper. This circle moves along a curve and sweeps out footprints as the ink calligraphy on the canvas. Consider Fig. 1: when the contact circles are close enough, the whole set of footprints looks like a brush painting stroke. Fig. 2 shows a curve depicted by a Bezier polynomial. We define a contact circle C with radius r; $P_i$ is the point generated by substituting $u_i$ into the Bezier polynomial $Q(u)$, and $S_1$ and $S_2$ are the thresholds at the curve's two extremities that restrict circle C's variation on them.
Fig. 1. The footprints of circles: (a) thin; (b) tight

Fig. 2. The brush model
We define Eq. (1), where the variables mean: 1) $r_i$: the radius of circle C at point $P_i$; 2) base: the minimum radius of circle C, representing the minimum width of a Hon-Do pen stroke; 3) start: the initial radius of circle C, representing the initial pressure when drawing; 4) $\theta$: the included angle between the tangent lines at adjacent points, as shown in Fig. 3; 5) k: an adjustable weight for the angle $\theta$;
6) m: the unit variation of u, i.e. the disparity between $u_i$ and $u_{i+1}$.
$$r_i = \begin{cases} \mathrm{base} + \dfrac{k\theta}{m}, & \text{if } S_1 < u_i < S_2 \\[4pt] \mathrm{start} \rightarrow \mathrm{base} + \dfrac{k\theta}{m}, & \text{if } u_i \le S_1 \\[4pt] \mathrm{base} + \dfrac{k\theta}{m} \rightarrow 0, & \text{if } u_i \ge S_2 \end{cases} \tag{1}$$
Fig. 3. The representation of the included angle θ
Consider Eq. (1). When the value of $u_i$ lies between $S_1$ and $S_2$, the width of the line depends on the variation of the line's curvature. In real painting, artists usually press harder, generating a thicker stroke, when the line's curvature becomes larger. We reproduce this effect through the variable $\theta$: the larger the included angle $\theta$, the greater the line's curvature, and thus the wider the line, and vice versa. We can also adjust the influence of angle $\theta$ by the parameter k. We then divide the term by m to cancel the effect of different disparities between $u_i$ and $u_{i+1}$: a smaller disparity of u causes a smaller $\theta$, which would indirectly affect the radius r. In other words, without the parameter m we might get different stroke widths whenever the unit variation of u is different, which is not reasonable; the variable m avoids this. When u is smaller than $S_1$, we simulate the effect of the initial stroke: the radius r increases from start to the radius at point $S_1$, and the radii between $P_0$ and $P_{S_1}$ are generated by blending start and $r_{S_1}$. On the other hand, when u is greater than $S_2$, we decrease the radius r from $r_{S_2}$ to 0, simulating the ending of the stroke. So, after all parameters are determined, we set a very small disparity of u; we can then draw many circles with various radii as the footprints of the ink calligraphy, achieving the goal of simulating the Hon-Do pen style. With respect to color variation, artists always sketch with brushes full of ink, so we do not reproduce the dried-pen effect; but the ink can still be distributed non-uniformly, and it fades away gradually. In our proposed model, we obtain this effect by adjusting the intensity of the color. The user can choose a preferred color; the system then transforms it from the RGB domain to the HSI domain. When drawing, we raise the intensity of the color and let it turn pale smoothly. In order to produce non-uniform color, we cut circle C into several parts, then apply colors with slightly different intensities to each part. Finally, we apply a smoothing filter to the stroke to make it vary smoothly and naturally.
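The following Python sketch illustrates the radius rule of Eq. (1) along a cubic Bezier curve. It is a simplified reconstruction: the Bezier evaluation, the tangent-angle computation and the linear blending near the stroke's extremities are our own assumptions about details the text leaves open.

    import numpy as np

    def bezier(p, u):
        # Cubic Bezier curve Q(u) with control points p (4x2 array)
        b = np.array([(1 - u)**3, 3*u*(1 - u)**2, 3*u**2*(1 - u), u**3])
        return b @ p

    def stroke_radii(p, m=0.01, base=2.0, start=1.0, k=5.0, s1=0.1, s2=0.9):
        # Radius of the contact circle at each sample u_i, following Eq. (1)
        us = np.arange(0.0, 1.0 + m, m)
        pts = np.array([bezier(p, u) for u in us])
        tang = np.gradient(pts, axis=0)
        ang = np.arctan2(tang[:, 1], tang[:, 0])
        theta = np.abs(np.diff(ang, prepend=ang[0]))   # included angle of adjacent tangents
        r = base + k * theta / m                       # middle case of Eq. (1)
        head = us <= s1                                # blend start -> radius at S1
        r[head] = np.interp(us[head], [0.0, s1], [start, r[head][-1]])
        tail = us >= s2                                # taper radius at S2 -> 0
        r[tail] = np.interp(us[tail], [s2, 1.0], [r[tail][0], 0.0])
        return pts, r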
2.2 Sketching the Contour on Petal
In this sub-section, we introduce the procedure for sketching the contours of flower petals according to the input image. We use the Canny edge detector to find the silhouettes in an image. The Canny edge detector is a well-known algorithm: it uses the gradient of color in an image to determine the location of lines, and finds continuous, single-pixel-wide lines. The user may input three parameters to decide the output lines' delicacy and importance; the full and detailed algorithm can be found in [3][4]. In our proposed system, the user can use proper parameter values to exclude unnecessary detail and obtain appropriate lines. Since the goal of this step is to sketch the contours, the lines which show the outlines of petals have higher priority. After the edge extraction, we obtain a group of edge pixels, but we cannot yet decide which points form a stroke. In Chinese Fine-Brushwork Painting, an artist sketches each line based on the shape of the object: an artist may draw one stroke for a petal, or two symmetric strokes for it, etc. Therefore, we have to group these pixels into strokes, cutting off or eliminating unsuitable connections according to the following rules (a small sketch of the first rule follows at the end of this sub-section):
1) A pixel chain with fewer than X linked pixels is eliminated. The default value of X is 30, which can be changed by the user. The goal is to avoid lines that are too short or corrupted by noise.
2) Lines which would be connected into a stroke inside the pistil area are eliminated. The pistil area is a region selected by the user. Because the procedure for pistils is different from that for petals, we eliminate the pistil's lines to avoid them being redrawn. The method for the pistils is discussed in Section 2.4.
3) If two connected pixels accumulate a variation of the gradient's direction of more than 90 degrees, we cut the chain there. The direction of the color gradient is obtained from the Canny edge detector. This rule mainly cuts the connection between two petals.
The goal of vectorization is to represent the pixels of a stroke by a polynomial, so that we definitely know the stroke's starting point, ending point and tangent directions. This field is well developed, so we perform vectorization based on Schneider's work [5]. After vectorization, we apply the brush model to generate each stroke. The user can define appropriate parameters, or use the default values offered by the system.
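As a minimal sketch of the edge extraction and grouping rule 1), the Python snippet below runs the Canny detector and discards connected pixel chains shorter than X; OpenCV availability and the threshold values are assumptions, and rules 2) and 3) are omitted for brevity.

    import cv2
    import numpy as np

    def petal_edges(image_path, low=100, high=200, x_min=30):
        # Rule 1): keep only edge-pixel chains with at least x_min pixels
        img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        edges = cv2.Canny(img, low, high)
        n, labels, stats, _ = cv2.connectedComponentsWithStats(edges, connectivity=8)
        keep = np.zeros_like(edges)
        for i in range(1, n):                 # label 0 is the background
            if stats[i, cv2.CC_STAT_AREA] >= x_min:
                keep[labels == i] = 255
        return keep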
2.3 Sketching the Contour on Leaf

In this sub-section, we introduce the generation of veins. The user needs to specify the primary vein; the subsidiary veins are then generated automatically, and finally we apply the brush model to each vein. From the primary vein, we can obtain the direction and the curvature of the leaf. The user does not need to specify the primary vein's position carefully, but only to select several dots; as shown in Fig. 4, the red points are selected by the user, and we produce a fitting curve to approach them.
Fig. 4. Example of sketching the main vein

Fig. 5. Vein generation
In this paper, we produce the secondary veins using the primary vein's information, so there may be a slight disparity from the original image. First, we approximate the primary vein with a Bezier curve; we can then obtain some information about the leaf, such as its length and curvature, and use it to generate the secondary veins. Consider Fig. 5: curve Q is the primary vein and we want to generate secondary veins $Q_i$, $i = 0, 1, 2, \ldots, n-1$. $\theta$ is the rotation angle defined by the user, and n is the number of secondary veins, defined as follows:

$$n = \begin{cases} 3, & \text{if } \mathrm{dist}(S-E) \le X \\[4pt] \dfrac{\mathrm{dist}(S-E) - X}{X} + 3, & \text{else} \end{cases} \tag{2}$$

where $\mathrm{dist}(S-E)$ is the distance between curve Q's two extremities, S and E. The default value of parameter X is 60, but it can be adjusted by the user; it affects the interval between adjacent $P_i$. We divide curve Q equally into n segments with points $P_i$, $i = 0, 1, 2, \ldots, n-1$, and select each point $P_i$ ($P_0 = S$) as the root of secondary vein $Q_i$. For example, in Fig. 5, n is 4, and we generate veins $Q_0, Q_1, Q_2, Q_3$ with roots $P_0, P_1, P_2, P_3$. In order to avoid overly uniform veins, we perturb the location of each $P_i$ slightly by a small random offset (see the sketch below).
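A compact Python sketch of Eq. (2) and the root placement; working in the curve's parameter domain and using a uniform jitter for the random offset are simplifications of ours:

    import numpy as np

    def vein_roots(s, e, x=60.0, jitter=0.05, rng=np.random.default_rng(0)):
        # Number of secondary veins per Eq. (2)
        dist = np.linalg.norm(np.asarray(e, float) - np.asarray(s, float))
        n = 3 if dist <= x else int((dist - x) / x + 3)
        # Roots P_i at equal parameter spacing along the primary vein, P_0 = S,
        # each shifted by a small random amount (P_0 kept fixed)
        us = np.arange(n) / n
        us[1:] += rng.uniform(-jitter, jitter, n - 1)
        return n, np.clip(us, 0.0, 1.0)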
2.4 Sketching the Contour on Pistil

We describe the procedure of generating the pistil in this section. First, we use a process similar to that of Section 2.2 to produce the pistil's strokes; then, a clustering algorithm provides colors that approximate the colors in the input image. In our proposed system, the user specifies the pistil area by dragging an ellipse, and this pistil image is the input of the following process. First, we use the Canny edge detector to extract its edges [6][7][8]. Since artists usually draw the pistil exquisitely, we choose lower thresholds than for the petals to get more detailed information. Then, we define strokes as in Section 2.2, changing the second restriction to eliminate lines that are connected outside the pistil area; since these lines do not belong to the pistil, we must avoid mistaking them for pistil strokes. After defining the strokes, we apply the vectorization and the brush model to them. Unlike the strokes of petals, painters draw pistils with more colors, so we use a simple color clustering algorithm to determine the colors. The user can decide the number of preferred colors by adjusting the parameter num; the system then selects the num
colors which appear most often in the pistil area as reference colors. When drawing a stroke, we calculate the disparities between these reference colors and the realistic color at the same stroke in the input image, and choose the most similar reference color to dye the stroke.
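A minimal Python sketch of this reference-color selection, assuming the clustering is a histogram over coarsely quantized RGB values (the text does not fix the exact algorithm):

    import numpy as np

    def reference_colors(pistil_rgb, num=4, step=16):
        # Pick the num most frequent (quantized) colors of the pistil area
        pixels = pistil_rgb.reshape(-1, 3) // step * step
        colors, counts = np.unique(pixels, axis=0, return_counts=True)
        return colors[np.argsort(counts)[::-1][:num]]

    def closest_reference(stroke_color, refs):
        # Dye the stroke with the most similar reference color
        d = np.linalg.norm(refs.astype(float) - np.asarray(stroke_color, float), axis=1)
        return refs[np.argmin(d)]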
3 Washing

We now discuss the dyeing effect in our system. First, we get the color information from the input image; then, we simulate the washing technique, a traditional skill in Chinese Fine-Brushwork Painting; finally, we blend the resulting image with the stroke image.

3.1 Washing Paths Generation
This sub-section introduces how the washing paths are generated. The washing paths simulate the routes a watered pen follows when artists apply the washing skill; we dye along these paths when applying washing. First, the user specifies an area for dyeing; the system then generates the washing paths automatically. Before washing starts, the user specifies one starting point and several ending points. Since painters always wash from pistil to petal, we set the starting point at the pistil and the ending points at the petal; naturally, the user can set the points anywhere, and the system will apply the washing technique to the specified area. According to the user-specified starting and ending points, we generate the paths of washing. Artists often use colored pens to apply dyestuff at the center of the flower, and then use watered pens to dye from the center toward the petal; we simulate only the paths of the watered pens.

3.2 Washing Model
This sub-section introduces how we simulate the traditional dyeing technique "washing" of Chinese Fine-Brushwork Painting. Our model is based on the actual painting procedure and on observations: we imitate the procedures of painting and apply several image processing methods to synthesize the result of washing. First, washing paths are constructed as discussed in Section 3.1; then, as the brush sweeps along the washing paths, we apply an exponential function to simulate color diffusion; finally, we mix the color pigments with the background color to produce the result. In our proposed model, we combine the colored and watered pens and apply pigments to the watered pen directly. This is different from sketching the contour: since the brush for washing is softer than the Hon-Do pen, we use an ellipse to model the contact region of the watered pen on alum-rice paper. For ease of implementation, the size of the ellipse is fixed. Fig. 6(a) shows the contact region C; variables a and b are its major and minor axes respectively. Ellipse C moves along the washing paths and sweeps out footprints as the color calligraphy on the canvas. This is similar to the brush model described in Section 2, with one small difference: ellipse C rotates while moving along the path. Consider Fig. 6(b): $Q_i$ is a washing path and $P_k$ is a point on it. When C moves forward along $Q_i$, it aims its y-axis at the direction of point $P_k$'s tangent on the curve.
Fig. 6. Washing model: (a) contact surface; (b) a path for washing
When C moves along $Q_i$, C leaves pigments on the canvas. We hypothesize that every point has a pigment capacity: a pigment level of 100 is full and 0 is empty. When C passes through $P_k$, we decide what percentage of pigment C leaves according to an exponential function. To implement this, we set two thresholds, tLow and tHigh, which influence the dyeing color's variation, and pick a random number r between them; r is the power of the exponential function shown in Eq. (3). We pick 100 points P on curve $Q_i$, which means ellipse C covers $Q_i$ one hundred times. $N_{all}$ is the total pigment quantity on the brush. The pigment quantity $N_{P_k}$ that C leaves at $P_k$ is

$$N_{P_k} = \frac{(100 - k)^r}{\sum_{j=0}^{99} j^r} \times N_{all}, \quad k = 0, 1, \ldots, 99 \tag{3}$$
In Eq. (3), when a larger r is set, the exponential function varies sharply, giving an intense variation of color; with a smaller r, we generate a smooth color variation. The value of r is controlled by the thresholds tLow and tHigh, which can be adjusted automatically according to the input image. When applying washing, we need to determine three parameters: tHigh, tLow and $N_{all}$. The last one is specified by the user, because it depends on the consistency of the color pigment to be dyed; but tHigh and tLow can be decided by the color variation of the input image. As shown in Fig. 7, after all points are assigned, we can find the boundary of the two groups, such as the line L in Fig. 7.
Fig. 7. Parameters determination
Since L corresponds to where the color changes sharply in the input image, we determine the thresholds tHigh and tLow according to L's position. If L is close to point Ustart, we set large thresholds so that the dyeing color varies sharply near Ustart, and vice versa. Table 1 shows the values of tHigh and tLow.

Table 1. The values of the thresholds in washing

L position            | value of tHigh | value of tLow
between Ustart and U1 | 13             | 10
between U1 and U2     | 8              | 3
between U2 and U3     | 3              | 1
between U3 and Uend   | 1.5            | 1
Every time after simulating the effect of washing, we perform the color mixture. In our system we take the Subtractive Color Mixture to model the overlapping of color. In Section 3.2 we obtained an image with the distribution of color pigments; we map the pigment quantity to color. We set the color of the canvas as the background color, such as yellowish brown. Because the pigment capacity is 100, full pigment (100) totally covers the background and shows the color we want to dye; if the pigment is less than full capacity, the percentage of pigment determines the degree of color mixture. As shown in Eq. (4) below, $N_{x,y}$ is the pixel's quantity of pigment, $color_d$ is the color we want to dye, and $color_b$ is the background color. The pixel's color $color_{x,y}$ will be

$$color_{x,y} = \frac{N_{x,y}}{100} \times color_d + \Bigl(1 - \frac{N_{x,y}}{100}\Bigr) \times color_b \tag{4}$$
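The Python sketch below combines the pigment deposit of Eq. (3) with the mixture of Eq. (4) along one washing path; working with per-point scalars instead of the rotating elliptical brush mask is a simplification for illustration.

    import numpy as np

    def wash_path(color_d, color_b, n_all=1.5e6, t_low=1.0, t_high=8.0,
                  rng=np.random.default_rng(0)):
        r = rng.uniform(t_low, t_high)        # random exponent between tLow and tHigh
        k = np.arange(100)
        n_pk = (100 - k)**r / np.sum(k.astype(float)**r) * n_all   # Eq. (3)
        n_pk = np.clip(n_pk, 0.0, 100.0)      # pigment capacity: 100 full, 0 empty
        frac = n_pk[:, None] / 100.0          # percentage of pigment at each point
        # Eq. (4): mix the dyeing color with the background color
        return frac * np.asarray(color_d, float) + (1 - frac) * np.asarray(color_b, float)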
Since the washing process is performed several times, we let the current pixel's color be the background for the next washing pass; we can thereby achieve multilayer color mixture. Table 2 shows the washing algorithm.

3.3 Dimming Procedure
After washing, we apply a dimming procedure to make the colors approach traditional Chinese colors. Since we wash with colors obtained from the input photograph, the colors are too bright to simulate traditional Chinese colors. Unlike Western painting, Chinese painters usually use sober colors to convey an implicit and gentle mood; furthermore, the material of Chinese colors is made from minerals or plants, so most of them are turbid and dim. To achieve this effect, we mask a gray-level image onto the dyed image: we transform the original input into a gray-level image and mix the dyed image with it in the ratio 8:2, as shown in Fig. 8. Finally, when the whole washing procedure is completed, the combination of the sketched image and the dyed image is the final result, shown in the next section.
Table 2. The washing algorithm
Washing Algorithm
Input:
  CloseSetClas area       /* painting area and washing paths */
  ImageClass inputImg     /* background image */
  color col               /* dyeing color */
  int Nall                /* quantity of pigment in the watered pen */
  double tHigh, tLow      /* thresholds of the exponential function */
Output:
  ImageClass outputImg    /* washing image */

void Washing()
  initialize the watered pen;
  for (every path Qi in area)
    r = Exponential(tHigh, tLow);
    for (every point Pk on path Qi, k = 0..99)
      NPk = (100 - k)^r / (sum over j = 0..99 of j^r) * Nall;   /* Eq. (3) */
      for (every point x in the watered pen mask)
        rotate the mask of the watered pen, aiming its y-axis at Pk's tangent;
        Nx = NPk / (watered pen size) + Nx;
        if (Nx > FULL) Nx = FULL;
  for (every point (x, y) in outputImg)
    if (Nx,y > outputImgx,y) outputImgx,y = Nx,y;
  colorMixture(inputImg, outputImg, col);                       /* Eq. (4) */
  smoothFilter(outputImg);
Fig. 8. Examples of the dimming procedure: (a) gray-level image; (b) dyed image; (c) mix by 0.2*(a)+0.8*(b)
4 Experiment and Results

The input sources are colored photographs, and users need to separate them into different objects with painting software. The algorithms are implemented in C++ on a Windows XP machine with a PIV 2.4 GHz CPU and 512 MB DDR RAM. The example is an 800×600 image, shown in Fig. 9. Fig. 10 shows the intermediate results of sketching the contour and multi-washing. Fig. 10(a) shows the partial
Fig. 9. The original image

Fig. 10. (a) Sub-original image; (b)(c) sketching the contour; (d)(e) multi-washing by two times; (f) dimming procedure

Fig. 11. The composition of Fig. 10(c) and (f)

Fig. 12. Final result of example 1 by two times washing

Fig. 13. (a)(b)(c) Multi-washing by three times; (d) dimming procedure

Fig. 14. Final result of example 1 by three times washing
Table 3. Parameters of example 1

Procedure                                                   | Parameters
Sketching the Contour:
  Canny Edge Detector                                       | high=0.9, low=0.6, σ=1.2
  Stroke Definition                                         | X=50
  Vectorization                                             | error=8.0
  Brush Model                                               | start=1.0, base=2.0
Multi-Washing:
  First washing, from petal to pistil (shown in Fig. 10(d)) | tHigh=8, tLow=1, Nall=1500000, colord=RGB(229,243,254) (selected from Uend's color)
  Second washing, from pistil to petal (shown in Fig. 10(e))| tHigh=8, tLow=3 (automatically generated as in Section 3.2), Nall=1500000, colord=RGB(233,217,104) (selected from Ustart's color)
  Third washing, from pistil to petal (shown in Fig. 13(c)) | tHigh=8, tLow=5, Nall=1000000, colord=RGB(179,145,53)
original image; its extracted contours are shown in Fig. 10(b), and Fig. 10(c) shows the result after applying the brush model. Fig. 10(d)-(f) show the intermediate results of the multi-washing procedure: we first apply the washing skill from petal to pistil, as shown in Fig. 10(d), then apply it again in the opposite direction, as shown in Fig. 10(e); finally, Fig. 10(f) shows the result after the dimming procedure. Fig. 11 is the composition of Fig. 10(c) and (f). The final result of this example is shown in Fig. 12, and the corresponding parameters are listed in Table 3. Fig. 13 and Fig. 14 show further results obtained with a different number of washing passes.
5 Conclusion

In this paper, we propose a method to synthesize Chinese Fine-Brushwork Painting of flowers. We simulate this style by two important processes: contour sketching and coloring. In the former, we design a brush model to simulate the Hon-Do pen style and propose a complete procedure for extracting the contours from the input image. In coloring, we synthesize the traditional Chinese dyeing technique, washing, and approximate the effect of multi-coloring by washing several times. Therefore, users may easily generate the Fine-Brushwork Painting style by using our proposed system without any painting skill. However, some issues remain for future study: 1) the contour extraction procedure is not generic, and our approach cannot find correct strokes for complex flowers such as the peony; 2) our dyeing procedure only covers the washing skill; 3) our proposed system uses the basic Subtractive Color Mixture method. This
method simulates traditional Chinese color mixture roughly but not exactly; moreover, other advanced methods, such as the KM model, focus on the color mixture of Western painting. We hope to find a suitable color mixture method, or to integrate several of them, for traditional Chinese colors.
References
1. Baxter, B., Scheib, V., Lin, M.C., Manocha, D.: DAB: Interactive Haptic Painting with 3D Virtual Brushes. Proceedings of ACM SIGGRAPH 01 (2001) 461-468
2. Saito, S., Nakajima, M.: 3D Physics-based Brush Model for Painting. Proceedings of ACM SIGGRAPH 99 (1999) 226-233
3. Canny, J.: A Computational Approach to Edge Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6) (1986) 679-698
4. Chan, C., Akleman, E., Chen, J.: Two Methods for Creating Chinese Painting. IEEE Computer Graphics and Applications (2002) 403-411
5. Schneider, P.J.: An Algorithm for Automatically Fitting Digitized Curves. Graphics Gems, Book 1 (1990) 612-626
6. Wong, H.T.F., Ip, H.H.S.: Virtual Brush: A Model-based Synthesis of Chinese Calligraphy. Computers and Graphics (2000) 99-113
7. Xu, S.H., Lau, F.C.M., Tang, F., Pan, Y.: Advanced Design for a Realistic Virtual Brush. EUROGRAPHICS, 22(3) (2003) 533-542
8. Yu, Y.J., Lee, D.H., Lee, Y.B., Cho, H.G.: Interactive Rendering Technique for Realistic Oriental Painting. Journal of WSCG, 11 (2003) 213-225
Hybrid Bayesian Super Resolution Image Reconstruction Tao Wang, Yan Zhang, and Yong Sheng Zhang Institute of Surveying and Mapping, Zhengzhou Information Engineering University, No. 66 Longhai Middle Road, Zhengzhou 450052, Henan, China
[email protected]
Abstract. Super-resolution (SR) image reconstruction can overcome the resolution limit of a camera imaging system by integrating the information of multiple low resolution (LR) images, so that the perceived resolution of the reconstructed image is much higher than that of any individual image. Since the ill-posed SR reconstruction problem can be regularized within a Bayesian context by adopting an a priori image model, we propose a hybrid Bayesian method for image reconstruction, which first estimates the unknown point spread function (PSF) and an approximation of the original ideal image, then sets up the Huber Markov Random Field (HMRF) image prior model and assesses its tuning parameter using a maximum likelihood (ML) estimator, and finally computes the regularized solution automatically. Hybrid Bayesian estimates computed on simulated images, satellite images and a video sequence show dramatic visual and quantitative improvements over bilinear interpolation and Maximum A Posteriori reconstruction results, with sharp edges, correctly restored textures and a high PSNR improvement.
1 Introduction

There are increasing demands for high-resolution (HR) images in various applications. Although the most direct way to increase spatial resolution is to use an HR image acquisition system, the high cost of high-precision optics and image sensors is always an important concern in many commercial applications. Therefore, a new approach to increasing spatial resolution is required to overcome these limitations of sensor and optics manufacturing technology. One promising approach is to use image SR reconstruction to obtain an HR image (or sequence) from multiple observed LR images; this has been one of the most active image processing techniques recently [1]. Since Tsai and Huang's work [2], which exploited the aliasing present in each LR image in the frequency domain to reconstruct an HR image, much work has been reported in the literature, including the weighted least-squares algorithm [1], the non-uniform interpolation approach [1], the projection onto convex sets (POCS) method [3]-[4] and the MAP Bayesian approach [5]-[8]. Among these algorithms, the Bayesian approach is currently the most notable. MAP SR reconstruction from an LR video sequence based on the HMRF image prior model was proposed by Schultz [5]. A MAP framework for the joint estimation of image registration parameters and the HR image was presented by
Hardie [6]. Cheeseman [7] applied Bayesian estimation with a Gaussian prior model to integrating multiple satellite images observed by the Viking orbiter. Jalobeanu [8] proposed an inhomogeneous Gaussian model for MAP reconstruction of SPOT satellite images. Robustness and flexibility in modeling noise characteristics and a priori knowledge about the solution are the major advantages of the Bayesian approach. Assuming the noise process is white Gaussian, Bayesian estimation with convex energy functions in the priors ensures the uniqueness of the solution. But existing Bayesian reconstruction methods suffer from several impractical assumptions. Previous research often assumes that the PSF is known beforehand, which is impossible in an actual imaging process. Further, an image prior model founded upon the upsampled LR image greatly affects the quality of the reconstruction result, as the LR images are already contaminated and the resulting prior model is not robust to noise. Finally, the tuning parameter of the image prior model needs to be adjusted empirically by experienced experts, which limits the wide usage of the Bayesian estimator. Therefore, we propose a hybrid Bayesian estimator for SR image reconstruction. Under the Bayesian framework, it first deconvolves the upsampled LR image with the APEX algorithm to obtain the PSF and an approximation of the ideal HR image, then models the HMRF image prior and assesses its tuning parameter through ML estimation, and finally regularizes the ill-posed reconstruction process automatically.
2 Observation Model

Above all, we formulate an observation model that relates the original HR image to the observed LR images. Consider the desired HR image $\mathbf{x} = [x_1, x_2, \ldots, x_N]^T$, $N = L_1 N_1 \times L_2 N_2$, which is sampled at or above the Nyquist rate from a hypothetically bandlimited continuous scene. $L_1$ and $L_2$ are the horizontal and vertical down-sampling factors, respectively. Let the k-th LR image be denoted as $\mathbf{y}_k = [y_{k,1}, y_{k,2}, \ldots, y_{k,M}]^T$, $M = N_1 \times N_2$. During the imaging process, the observed LR image results from warping, blurring and subsampling operators performed on $\mathbf{x}$, and is also corrupted by additive noise. We can then represent the observation model as

$$\mathbf{y}_k = \mathbf{D}\mathbf{H}\mathbf{T}_k \mathbf{x} + \mathbf{n}_k = \mathbf{W}_k \mathbf{x} + \mathbf{n}_k, \quad 1 \le k \le p \tag{1}$$

where $\mathbf{T}_k$ is a warp matrix, $\mathbf{H}$ represents a blur matrix, $\mathbf{D}$ is a subsampling matrix and $\mathbf{n}_k$ represents the noise vector. A block diagram of the observation model is illustrated in Fig. 1.
Fig. 1. The observation model relating LR image to HR image
3 Statement of the Hybrid Bayesian Reconstruction Algorithm

The SR image reconstruction problem is ill-posed. A well-posed problem can be formulated using the stochastic regularization technique of Bayesian estimation by introducing an a priori constraint. The MAP optimization of the HR image $\mathbf{x}$ can be expressed as

$$\hat{\mathbf{x}} = \arg\max_{\mathbf{x}} \bigl\{ \log P(\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_p \mid \mathbf{x}) + \log P(\mathbf{x}) \bigr\} \tag{2}$$

Both the image prior model $P(\mathbf{x})$ and the conditional density $P(\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_p \mid \mathbf{x})$ are defined by a priori knowledge concerning $\mathbf{x}$ and the statistical information of the noise. If the motion estimation error between images is assumed to be independent and the noise is assumed to be independent identically distributed zero-mean Gaussian, the conditional density can be expressed in the compact form

$$P(\{\mathbf{y}_k\} \mid \mathbf{x}) = \frac{1}{(2\pi)^{N/2} \sigma^N} \exp\Bigl\{ -\frac{1}{2\sigma^2} \|\mathbf{y} - \mathbf{W}\mathbf{x}\|^2 \Bigr\} \tag{3}$$

where $\sigma^2$ is the error variance, $\mathbf{y} = [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_p]^T$ and $\mathbf{W} = [\mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_p]^T$. In order to reconstruct the high-frequency components of the image lost through imaging, we take the HMRF model, which represents piecewise smooth image data, with the probability density function defined as follows [5]:

$$P(\mathbf{x}) = \frac{1}{Z} \exp\Bigl\{ -\frac{1}{2\beta} \sum_{c \in S} \rho(d_l^t \mathbf{x}, \alpha) \Bigr\} = \frac{1}{Z} \exp\Bigl\{ -\frac{1}{2\beta} \sum_{m,n} \sum_{l=1}^{4} \rho(d_l^t \mathbf{x}, \alpha) \Bigr\} \tag{4}$$

where Z is a normalizing constant, $\beta$ is the temperature parameter, c is a local group of pixels contained within the image cliques S, and $\alpha$ is a threshold parameter separating the quadratic and linear regions. The quantity $d_l^t \mathbf{x}$ measures the second-order finite differences in four directions at each pixel of the HR image; it is small in smooth locations and large at edges [5]. The likelihood of edges in the data is controlled by the Huber penalty function

$$\rho(x, \alpha) = \begin{cases} x^2, & |x| \le \alpha \\ 2\alpha |x| - \alpha^2, & |x| > \alpha \end{cases} \tag{5}$$

The regularized solution is then equivalent to minimizing the cost function

$$U(\mathbf{x}) = \|\mathbf{y} - \mathbf{W}\mathbf{x}\|^2 + \sum_{m,n} \sum_{l=1}^{4} \rho(d_l^t \mathbf{x}, \alpha) \tag{6}$$
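To make the prior term concrete, the following Python sketch evaluates the Huber penalty of Eq. (5) and the cost of Eq. (6); using only horizontal and vertical second differences instead of the four-direction operator $d_l^t$ is a simplification of ours.

    import numpy as np

    def huber(d, alpha):
        # Huber penalty, Eq. (5): quadratic below alpha, linear above
        a = np.abs(d)
        return np.where(a <= alpha, d**2, 2.0 * alpha * a - alpha**2)

    def cost(x_img, y, W, alpha):
        # Cost function of Eq. (6): data term plus HMRF prior term
        data = np.sum((y - W @ x_img.ravel())**2)
        d_h = x_img[:, :-2] - 2*x_img[:, 1:-1] + x_img[:, 2:]   # horizontal 2nd difference
        d_v = x_img[:-2, :] - 2*x_img[1:-1, :] + x_img[2:, :]   # vertical 2nd difference
        return data + np.sum(huber(d_h, alpha)) + np.sum(huber(d_v, alpha))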
The HMRF prior model should be founded on the ideal undegraded HR image, and the threshold parameter $\alpha$ and the conditional density should also be decided upon it. The problem is that the ideal HR image is unknown. In existing Bayesian MAP estimation research, this problem is often ignored by taking the upsampled LR image as a substitute for the ideal HR image, and the threshold parameter is set empirically. This is a suboptimal method, and a bad initialization often leads to degenerate solutions [8].
The ideal HR image cannot be approximated by its degraded version, because the LR image is blurred and noisy. Parameters estimated on a blurred image have too high a value and lead to over-smoothed solutions; parameters estimated on a noisy image are too low and provide insufficient regularization, leading to noisy solutions. The Bayesian estimator is only meaningful, and supplies good parameter estimates, when applied to the ideal HR image. Therefore, an approximation of $\mathbf{x}$ has to be accurately determined by an excellent restoration algorithm if we want the HMRF model and the parameters obtained in this way to be significant for regularization. We choose the APEX algorithm to compute the approximation of $\mathbf{x}$. This algorithm has been chosen because the deconvolved result exhibits sharp textures and noise-free areas, and is sufficiently close to the original image to enable us to estimate the tuning parameter from it; moreover, the unknown PSF can also be determined. The proposed method is hybrid, since the estimation is not done directly on the observed image, but on an intermediate image. In the following, we detail how to get an approximation of $\mathbf{x}$ with the APEX algorithm, how to estimate the inhomogeneous control parameter $\alpha$ from the approximation image, and how to generate a reconstruction estimate automatically.
4 Hybrid Bayesian Reconstruction Solution

4.1 APEX Prior Blind Deconvolution

The APEX method [9] is an FFT-based direct blind deconvolution technique. The method is applicable to a restricted class of two-dimensional radially symmetric shift-invariant G-class blurs; this class generalizes Gaussian and Lorentzian densities. The OTF (Optical Transfer Function) form of a G-class blur h(x, y) is defined as

$$H(\varepsilon, \eta) = \int_{R^2} h(x, y)\, e^{-i 2\pi(\varepsilon x + \eta y)}\, dx\, dy = e^{-a(\varepsilon^2 + \eta^2)^b} \tag{7}$$
where a > 0 and 0 < b ≤ 1. When only the blurring factor is considered, the relationship between the HR image and the LR image is

y(x,y) = h(x,y) * x(x,y) + n(x,y) .   (8)
The Fourier transform of equation (8) has the following form

Y(\varepsilon,\eta) = H(\varepsilon,\eta)\,X(\varepsilon,\eta) + N(\varepsilon,\eta) .   (9)

We may safely assume that the noise n(x, y) satisfies \int_{R^2} n(x,y)\,dx\,dy \le \int_{R^2} f(x,y)\,dx\,dy = \sigma > 0 (σ is a normalizing constant), so that we can ignore N(ε, η) and further normalize equation (9) into equation (10), assuming Y(ε, η), X(ε, η) and the OTF keep the following relation in a region Ω of the frequency domain

\log|Y(\varepsilon,\eta)| \approx -a(\varepsilon^{2}+\eta^{2})^{b} + \log|X(\varepsilon,\eta)| .   (10)
We replace log|X(ε, η)| by a constant −A and solve for (a, b) by fitting the curve −a(ε² + η²)^b − A using nonlinear least squares algorithms. Putting (a, b) into equation (11), we obtain the optimal approximation of the ideal HR image after an inverse Fourier transform; here \overline{H} is the conjugate of H, and K and s are adjustable parameters

F(\varepsilon,\eta) = \frac{\overline{H}(\varepsilon,\eta)\, G(\varepsilon,\eta)}{|H(\varepsilon,\eta)|^{2} + K^{-2}\,|1 - H^{s}(\varepsilon,\eta)|^{2}} .   (11)
4.2 Maximum Likelihood Estimation of the HMRF Tuning Parameter

The ML estimate of the inhomogeneous control parameter α, based on the approximation provided by the APEX deconvolution, is calculated as

\hat{\alpha} = \arg\max_{\alpha} P(x \mid \alpha) .   (12)
Since an approximation of the original HR image is known, α can be assessed according to a predetermined cutoff ratio T, which roughly corresponds to the percentage of high-frequency components in the image

\frac{f_\alpha(d_l^t x)}{f(d_l^t x)} = T .   (13)

where f(d_l^t x) is the norm of the second-order derivative and f_α(d_l^t x) is the norm when the threshold α is taken into consideration (any value lower than α is set to zero). Under the assumption that there is more energy in the low-frequency components than in the high-frequency components, T is usually chosen within (0, 0.5]. If no prior information on the energy distribution is available, T can be set to 0.5 to allow enough high-frequency components to appear in the reconstructed HR image. After the ratio T is set, estimating the threshold α consists of solving the following system
\frac{\partial \log P(x \mid \alpha)}{\partial\alpha} = \frac{\partial}{\partial\alpha}\left[\sum_{r}\sum_{l=1}^{4}\rho(d_l^t x,\alpha)\right] = 0 .   (14)

where r ranges over the set R of all high-frequency components above T. This gives \hat{\alpha} = \sum_{r\in R}(d_l^t x)\,/\,n, where n is the number of high-frequency components.
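One possible reading of this estimate in C: collect the absolute second-order differences, keep the largest fraction T of them (the high-frequency components), and average. The helper below is hypothetical (names and the sorting strategy are ours), intended only to illustrate Eqs. (13)-(14).

    #include <stdlib.h>
    #include <math.h>

    static int cmp_desc(const void *a, const void *b)
    {
        double x = *(const double *)a, y = *(const double *)b;
        return (x < y) - (x > y);   /* sort in descending order */
    }

    /* Average the largest fraction T of the |second-order differences|
     * d[0..n-1]; a rough sketch of the threshold estimate after Eq. (14). */
    double estimate_alpha(const double *d, int n, double T)
    {
        double *mag = malloc(n * sizeof *mag);
        double sum = 0.0;
        int i, k;
        for (i = 0; i < n; i++) mag[i] = fabs(d[i]);
        qsort(mag, n, sizeof *mag, cmp_desc);
        k = (int)(T * n);
        if (k < 1) k = 1;
        for (i = 0; i < k; i++) sum += mag[i];
        free(mag);
        return sum / k;
    }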
4.3 Gradient Projection Solution

Since the objective function in equation (6) is convex, gradient-based methods may be employed to compute the unique minimum solution. We select the conjugate gradient optimization technique because it avoids the complex computation of the Hessian matrix required by the Newton approach and converges faster than steepest descent. The conjugate gradient technique converges to a global minimum of the objective function by following the trajectory defined by the conjugate directions.
Any starting point x₀ that satisfies equation (1) is valid. We use the APEX-restored image as the initial value x₀. Before the iteration, the gradient is computed as g₀ = ∇U(x₀) and the conjugate direction is taken as p₀ = −g₀. In the subsequent iterations i = 0, …, K, to minimize U(x_i), \hat{x} moves in the descent direction p_i with step size τ_i

\tau_i = -\frac{g_i^T p_i}{p_i^T (W^T W)\, p_i} .   (15)

x_{i+1} = x_i + \tau_i p_i .   (16)
The conjugate direction is determined according to the following formulas:

g_{i+1} = g_i + \tau_i (W^T W)\, p_i .   (17)

p_{i+1} = -g_{i+1} - \frac{g_{i+1}^T (W^T W)\, p_i}{p_i^T (W^T W)\, p_i}\, p_i .   (18)
The iteration then repeats equations (15) to (18), generating a sequence of iterates \{x_i\}_{i=0}^{K} that approaches \hat{x}. Convergence is declared when the relative state change for a single iteration falls below a predetermined threshold ε, i.e. \|x_{i+1} - x_i\| / \|x_i\| \le \varepsilon.
The whole reconstruction procedure of the hybrid Bayesian estimator is summarized as follows (a sketch of the core update follows the list).
1) Upsample the LR images according to the given enhancement factor q using bilinear interpolation, construct the downsampling matrix D according to q, and construct the geometric distortion matrix T using the hierarchical block matching algorithm [5].
2) Deconvolve the reference upsampled image with the APEX algorithm and compute the PSF and the optimal approximation of the HR image.
3) Calculate the conjugate direction p_i. In the first iteration the conjugate direction is set as p₀ = −g₀; in the following iterations i = 1, …, K, p_i is calculated according to equation (18).
4) Compute the step size τ_i according to equation (15).
5) Update the state according to equation (16).
6) If the convergence criterion is satisfied, the estimate is given as \hat{x} = x_{i+1}. Otherwise, set x_{i+1} = x_i + τ_i p_i and return to step 3.
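A minimal C sketch of one iteration of steps 3)-5), i.e. Eqs. (15)-(18), written for the quadratic data term only (the Huber prior contribution to the gradient is omitted here, as in the paper's formulas); the caller is assumed to supply Ap = (W^T W)p.

    /* One conjugate-gradient update following Eqs. (15)-(18).  x, g, p are
     * length-n state, gradient and search-direction vectors; Ap must hold
     * (W^T W)p.  A hypothetical sketch, not the authors' code. */
    void cg_step(double *x, double *g, double *p, const double *Ap, int n)
    {
        double gp = 0.0, pAp = 0.0, beta = 0.0, tau;
        int i;
        for (i = 0; i < n; i++) { gp += g[i] * p[i]; pAp += p[i] * Ap[i]; }
        tau = -gp / pAp;                       /* Eq. (15) */
        for (i = 0; i < n; i++) {
            x[i] += tau * p[i];                /* Eq. (16) */
            g[i] += tau * Ap[i];               /* Eq. (17) */
        }
        for (i = 0; i < n; i++) beta += g[i] * Ap[i];
        beta /= pAp;                           /* coefficient in Eq. (18) */
        for (i = 0; i < n; i++) p[i] = -g[i] - beta * p[i];  /* Eq. (18) */
    }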
5 Results

In order to demonstrate the performance of the proposed algorithm, several experimental results are presented here. The first experiment involves simulation data derived from a single HR image. The second considers multiple 5.0 m resolution SPOT 5 satellite images. The third uses an LR video sequence grabbed from a digital video film during playback.
5.1 Visual and Quantitative Results for Simulation Imagery

Nine sets of random translational shifts are generated by the IDL random function. Using these shifts, a sequence of nine translated images is generated from the HR image "board" in Fig. 2(a). These nine images are further blurred by a 3×3 Gaussian smoothing filter and decimated by a factor of L1 = L2 = 2 to produce nine LR images of size 128×128.

Fig. 2. Simulation board sequence. (a) Original high-resolution image. (b) Bilinear interpolation of the reference image. (c) Huber-MAP estimate. (d) Hybrid Bayesian estimate.

The bilinear interpolation of the reference image is shown in Fig. 2(b). The Huber-MAP estimate with α = 1 is shown in Fig. 2(c); the initial HR image is the bilinear interpolation of Fig. 2(b) and 20 iterations are performed. Fig. 2(d) shows the hybrid Bayesian estimate; the initial estimate is the approximation provided by APEX deconvolution and 15 iterations are performed. The PSNR (Peak Signal-to-Noise Ratio) of the bilinear interpolation is 20.1 dB, that of the Huber-MAP result is 23.2 dB, and that of the hybrid Bayesian estimate is 26.7 dB, which demonstrates a significant improvement in
PSNR. The hybrid Bayesian estimator has clearly created an HR image with considerably higher resolution than both the bilinear interpolation result and the Huber-MAP reconstruction result. The digit numbers, the characters and the circuit nodes in the hybrid Bayesian estimate are much clearer than in the bilinear interpolation and Huber-MAP results.

5.2 Visual Results for Actual Satellite Images

Now we apply the proposed estimator to a sequence of five 5.0 m resolution SPOT 5 satellite images. The enhancement factor q is set to 2. The bilinear interpolation of the reference image is shown in Fig. 3(a). The Huber-MAP estimate is shown in Fig. 3(b) for α = 1 and the hybrid Bayesian estimate is shown in Fig. 3(c).
Fig. 3. Actual satellite image sequence. (a) Bilinear interpolation of the reference image. (b) Huber-MAP estimate. (c) Hybrid Bayesian estimate.
It can be seen clearly from Fig. 3 that the visual resolution is significantly improved in the hybrid Bayesian estimate, in which the roads and other edge details appear most continuous and dominant.

5.3 Visual Results for an Actual Film Video Sequence

Nine frames are grabbed from the video sequence. The enhancement factor q is set to 2. The frame shown in Fig. 4(a) is the bilinear interpolation of the reference frame. The Huber-MAP result after 20 iterations is shown in Fig. 4(b) and the hybrid Bayesian estimate after 16 iterations is shown in Fig. 4(c). The experimental results show that the image generated by the hybrid Bayesian estimator outperforms those generated by bilinear interpolation and Huber-MAP reconstruction. The digit numbers and the word 'January' can be clearly seen in Fig. 4(c). Moreover, the other details in the calendar are more prominent in it.
Fig. 4. Actual video sequence. (a) Bilinear interpolation of the reference frame. (b) Huber-MAP estimate. (c) Hybrid Bayesian estimate.
6 Conclusion

The main contribution of this paper is a novel hybrid Bayesian algorithm for HR image reconstruction from multiple LR images or a video sequence. The proposed method is hybrid because the estimation is not made directly on the observed image, but on an intermediate image. The approach comprises obtaining a good approximation of the ideal HR image with the APEX algorithm, estimating the inhomogeneous control
parameter from the intermediate data by ML estimation, and generating a regularized reconstruction estimate automatically. Its essential novelties are providing a good approximation of the original image and assessing the inhomogeneous control parameter by ML estimation, which enables the reconstruction to be carried out automatically and ensures the robustness of the estimate. Experimental results demonstrate that this new technique is robust and gives excellent reconstruction results on simulation data, actual satellite data and actual video data. Furthermore, the resulting images exhibit much sharper and clearer details than images reconstructed by the Huber-MAP estimator.
References
1. Park, S.C., Park, M.K., Kang, M.G.: Super-Resolution Image Reconstruction: A Technical Overview. IEEE Signal Processing Magazine 5 (2003) 21-36
2. Tsai, R.Y., Huang, T.S.: Multiframe Image Restoration and Registration. In: Huang, T.S. (Ed.): Advances in Computer Vision and Image Processing, JAI Press (1984) 317-339
3. Patti, A.J., Sezan, M.I., Tekalp, A.M.: Superresolution Video Reconstruction with Arbitrary Sampling Lattices and Nonzero Aperture Time. IEEE Trans. Image Processing 8 (1997) 1064-1997
4. Patti, A.J., Altunbasak, Y.: Artifact Reduction for Set Theoretic Super Resolution Image Reconstruction with Edge Adaptive Constraints and Higher-order Interpolants. IEEE Trans. Image Processing 1 (2001) 179-186
5. Schulz, R.R., Stevenson, R.L.: Extraction of High-Resolution Frames from Video Sequences. IEEE Trans. Image Processing 6 (1996) 996-1011
6. Hardie, R.C., Barnard, K.J., Armstrong, E.E.: Joint MAP Registration and High-resolution Image Estimation Using a Sequence of Undersampled Images. IEEE Trans. Image Processing 12 (1997) 1621-1633
7. Cheeseman, P., Kanefsky, B., Kraft, R., et al.: Super-resolved Surface Reconstruction from Multiple Images. NASA Ames Research Center, Moffett Field, CA, Tech. Rep. FIA-9412 (1994)
8. Jalobeanu, A., Blanc-Féraud, L., et al.: An Adaptive Gaussian Model for Satellite Image Deblurring. IEEE Trans. Image Processing 4 (2004) 613-621
9. Carasso, A.S.: The APEX Method in Image Sharpening and the Use of Low Exponent Lévy Stable Laws. SIAM J. Appl. Math. 2 (2002) 593-618
Image Hiding Based Upon Vector Quantization Using AES Cryptosystem Yanquan Chen and Tianding Chen Institute of Communications and Information Technology, Zhejiang Gongshang University, Hangzhou, Zhejiang, China 310035 {chenyq776, chentianding}@163.com
Abstract. A novel gray-level image-hiding scheme is proposed. The goal of this scheme is to hide an important gray-level image inside another meaningful gray-level image. The secret image to be protected is first compressed using the vector quantization (VQ) scheme. Then, the Advanced Encryption Standard (AES) cryptosystem is applied to the VQ indices and related parameters to generate the encrypted message. Finally, the encrypted message is embedded into the rightmost two bits of each pixel in the cover image.
1 Introduction

Due to the growth of network bandwidth, the Internet has become a popular channel for data transmission. The demand for image data safety and for protecting important image content from disclosure has become more and more urgent. Several encryption techniques have been employed to protect digital data from illegal tampering or stealing [1, 2]. Encrypted data has a common characteristic: it is transformed into a set of meaningless codes. Even if interceptors cannot interpret such codes, they may suspect that the codes indicate something valuable. To protect data safety and to keep the value of data from being exposed, information hiding techniques have been introduced [3]. Information hiding differs from encryption in that it embeds critical information in a noncritical host message to distract opponents' attention. Based on the level of significance of the embedded information, information hiding schemes can be divided into two classes: watermarking schemes and steganography schemes [4]. The goal of watermarking schemes is to protect the copyright of digital media, while steganography schemes aim to embed other information into the host message. In general, the robustness of watermarking schemes is the main concern for copyright protection of digital media such as text, image, audio and video; usually, the amount of the embedded copyright message is small. In contrast, steganography schemes intend to embed an important message into the host message, so the hiding capacity instead of robustness is their main consideration. Recently, research on information hiding techniques has mostly been related to images, i.e. hiding one secret image that needs to be protected into another meaningful image called a host image, and then transmitting the embedded image via the Internet. For example, one can
embed some useful data into a picture and transmit it to others via the Internet; interceptors then cannot tell which picture carries the useful data. In general, most image hiding schemes are designed for certain classes of images such as bi-level images [5, 6], half-tone images [7], gray-level images [8], and color images, because the properties of images in different classes are quite different. Among these image classes, research toward the hiding of gray-level images is the most attractive, because gray-level images occupy a large portion of the image-related data on the Internet and most hiding schemes for gray-level images can easily be modified to hide color images. In this paper we develop a data hiding method that is based upon VQ, supports a varying number of secret images, assures greater safety of the secret image, and causes little distortion to the host image. To make it more difficult for interceptors to detect the secret image in the host image, we first use the AES encryption technique on the codebook and indices, and then insert the encrypted data in the LSB positions of the host image [9].
2 Related Works

The purpose of this paper is to design an image hiding method which uses the AES cryptosystem but causes little distortion to the host image. For the explanations to be given in Sec. 3, assume there are two n-bit gray-level images, S and H, both of size w×h. We name image S the "secret image" and image H the "host image". Under these assumptions we will introduce VQ, AES and our method. In this paper, we use the PSNR (Peak-Signal-to-Noise Ratio) to measure the degree of similarity between two given gray-level images of w×h pixels [10]. The PSNR is defined as

PSNR = 10\log_{10}\frac{(2^{n}-1)^{2}}{MSE} ,   (1)

MSE = \frac{1}{w\times h}\sum_{i=1}^{w}\sum_{j=1}^{h}(z_{ij}-h_{ij})^{2} .   (2)

In Eqs. (1) and (2), z_{ij} and h_{ij} represent the original and modified pixel values.
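A small C helper implementing Eqs. (1)-(2) for n-bit gray-level images; this is our illustrative sketch, not code from the paper.

    #include <math.h>

    /* PSNR between two n-bit gray-level images of w*ht pixels, per
     * Eqs. (1)-(2).  Returns HUGE_VAL when the images are identical. */
    double psnr(const unsigned char *z, const unsigned char *h,
                int w, int ht, int nbits)
    {
        double mse = 0.0, peak = (double)((1 << nbits) - 1);
        int i, npix = w * ht;
        for (i = 0; i < npix; i++) {
            double d = (double)z[i] - (double)h[i];
            mse += d * d;
        }
        mse /= npix;
        if (mse == 0.0) return HUGE_VAL;
        return 10.0 * log10(peak * peak / mse);
    }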
2.1 The Vector Quantization Scheme

We use the vector quantization (VQ) scheme to decrease the volume of the transmitted data of the secret image. In this subsection, we introduce the basic concept of VQ [10, 11]. In general, VQ can be divided into three processes: the codebook design process, the encoding process and the decoding process. The goal of the VQ codebook design process is to cluster the training images and then find codewords to represent the training set. Among the research concerning the codebook design process, the LBG algorithm [12] is the most popular method.
Using the LBG algorithm to design a codebook consisting of Nc codewords, with each codeword a u-dimensional vector, each training image is first partitioned into a set of nonoverlapping image blocks of u pixels. Then, an initial codebook is randomly selected from the training set of image blocks. Each image block in the training set is classified to the closest codeword, i.e. the one with the smallest Euclidean distance to the block. Thus, Nc groups of training blocks are obtained. The centroid of each group is calculated, and the new centroids of these Nc groups are treated as a new codebook. The same operations are executed repeatedly until all centroids converge. The codebook generating process of the LBG algorithm using one training image is illustrated in Fig. 1. Here, the codebook is composed of 256 codewords and each codeword is a 16-dimensional image vector. In other words, the training image is segmented into a set of nonoverlapping image blocks of 4×4 pixels.
Fig. 1. The flowchart of the VQ codebook design process using the LBG algorithm
After the codebook design process, a set of representative codewords forms the codebook. An identical codebook is used in both the VQ encoding and decoding processes. In the VQ encoding process, each image to be compressed is partitioned into a set of nonoverlapping u-dimensional image blocks. The closest codeword in the codebook is the one with the minimum squared Euclidean distance to the image block, and the index of this closest codeword is transmitted to the receiver. Compression is accomplished by transmitting the index of the codeword in the codebook instead of the image block itself. In the VQ decoding process, the same codebook is used. By successively substituting the codeword corresponding to each received index, each encoded image block can be recovered; after all image blocks are decoded, the whole encoded image is reconstructed.

2.2 AES Theory

AES is the Advanced Encryption Standard [13]. It specifies a FIPS-approved cryptographic algorithm that can be used to protect electronic data. It is a symmetric block cipher that can encrypt and decrypt information: encryption converts data to an unintelligible form called ciphertext, and decryption converts the ciphertext back into its original form, called plaintext. It supports key sizes of 128, 192 and 256 bits and is a substitution-linear transformation network with 10 to 14 rounds, depending on the key size. A data block to be encrypted is split into an array of bytes, and each encryption operation is byte-oriented.
3 The Proposed Scheme

The purpose of this paper is to design a gray-level image hiding method that embeds a secret image into one host image using AES. In Sec. 3.1, we introduce the proposed image hiding procedure. It includes six steps: decomposition; generating the secret image's codebook; index mapping; encryption; embedding; and the secret image extraction procedure. We present the procedure together with the experimental results.

3.1 The Proposed Image Hiding Procedure

The proposed gray-level image hiding scheme is also based on the VQ scheme. The VQ scheme is employed to reduce the volume of the secret image. In addition, to assure the security of the secret image, the AES cryptosystem is used to encrypt the metadata, including the indices and codebook of the secret image, to form the encrypted metadata. Lastly, the pixels of the encrypted metadata are embedded back into image H using R as a reference. The process is illustrated in Fig. 2. The steps are explained in detail below, where we also show our experimental results. We use Bridge (Fig. 3) as the secret image and Lena (Fig. 4) as the host image.
Fig. 2. Flowchart of embedding by VQ
Fig. 3. Bridge size 512 × 512
Fig. 4. Lena size 512 × 512
Step 1. Decomposition. To hide a secret image S into host image H, we first determine three parameters: r for LSB substitution, the number of codewords Nc, and the block size u for the secret image S. We then separate the secret image S into a set of vectors {S1, S2, S3, …, Sns}. Here, ns is the number of vectors in S, and the size of each vector is u. In
addition, we extract the rightmost r bits of each pixel in host image H to compose a residual image R, whose width and height are the same as those of H. In our experiment, letting r = 2, we extract the rightmost two bits of each pixel in Lena to form the residual image R. Setting Nc = 128 and u = 16, we divide the secret image into a set of vectors, with the size of each vector being 16.

Step 2. Generating the Secret Image's Codebook for VQ. The LBG algorithm, originally proposed by Linde, Buzo and Gray [12], is used here to generate the codebook with Nc codewords for secret image S as stated below (a sketch of one LBG pass follows Table 1):
Step 2.1 Randomly pick Nc codewords from the ns vectors {S1, S2, S3, …, Sns} to be the initial codebook.
Step 2.2 Classify the training vectors into codeword groups based on the minimum squared Euclidean distance between each vector and the codewords of the codebook, yielding Nc codeword groups.
Step 2.3 Recalculate the centroids of the Nc groups to generate a new codebook.
Step 2.4 Repeat Step 2.2 and Step 2.3 until the Nc centroids converge.
In our experiment we obtain the codebook from Bridge shown in Table 1.

Table 1. Codebook from Bridge (Nc = 128, u = 16)
Index   Codeword
I1      72 81 72 73 78 79 74 77 71 78 74 77 …
I2      81 82 79 82 79 83 65 75 107 86 75 101 91 87 76 91
I3      99 93 91 97 92 92 105 94 96 121 100 86 89 98 93 96
…       …
Ii      80 79 69 73 …
I127    194 195 190 194 196 196 197 196 194 195 194 197 195 193 196 194
I128    207 209 206 207 209 208 209 207 213 209 207 210 207 210 209 213
Step 3. Index Mapping. Each image vector of secret image S and each codeword has u values. Write each vector of the secret image as Si = {Si1, Si2, Si3, …, Siu} and each codeword of the codebook as Cj = {Cj1, Cj2, Cj3, …, Cju}. Sequentially select the vectors of the secret image {S1, S2, S3, …, Sns}, 1 ≤ i ≤ ns, and the Nc codewords of the codebook {C1, C2, C3, …, CNc}, and perform calculations using these values according to the following formula. The set of vectors {S1, S2, S3, …, Sns} of image S shall be replaced by the
corresponding index values of the codebook to create a set of indices of the secret image {I1, I2, I3, …, Ins},

\sum_{l=1}^{u}(s_{il}-c_{xl})^{2} = \min_{1\le h\le N_c}\sum_{l=1}^{u}(s_{il}-c_{hl})^{2} ,\quad 1\le i\le n_s,\ 1\le l\le u,\ 1\le x\le N_c .   (3)
In Eq. (3), x is the index assigned to image vector Si, i.e. x = Ii, i = 1, 2, 3, …, ns. Finally, we obtain a set of indices {I1, I2, I3, …, Ins} of secret image S. In our experiment, every vector's corresponding index value is shown in Table 2.

Table 2. Index values of the secret image's vectors
Secret image's vector   S1   S2   S3   …   Sns
Index value             47   51   55   …   59
Step 4. Encryption. AES is a symmetric block cipher that encrypts and decrypts 128-bit blocks of data. In this paper, each block size is 128 bits and we use a 128-bit key; in our experiment, the key is 1234567890123456. After Steps 2 and 3, we have the codebook and indices of the secret image; we then combine them with other parameters, including the 512 × 512 size of the secret image, the block size 16 of each vector, the LSB parameter r = 2 and the number of codewords 128, to generate a set of metadata. Afterwards, AES is applied to the metadata.

Step 5. Embedding. To generate the embedding result image Z based on r, we replace the rightmost r bits of the corresponding pixel in host image H by the bit values of the encrypted metadata. For example, if r = 4, we replace the rightmost four bits of the corresponding pixel in host image H by the bit values of the encrypted metadata. Based on image R, we embed the encrypted metadata into the rightmost two bits of each pixel of the host image Lena, shown in Fig. 5; the embedding result that hides the secret image Bridge is called image Z, as shown in Fig. 6.
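A minimal C sketch of the Step 5 substitution: clear the rightmost r bits of each host pixel and insert r bits of encrypted metadata. get_bits() is a hypothetical helper returning the next r bits of the metadata stream; it is our assumption, not code from the paper.

    /* Replace the rightmost r bits of each of npix host pixels with
     * r bits taken from the encrypted metadata stream. */
    void embed_lsb(unsigned char *host, int npix, int r,
                   unsigned (*get_bits)(int))
    {
        unsigned mask = (1u << r) - 1u;
        int i;
        for (i = 0; i < npix; i++)
            host[i] = (unsigned char)((host[i] & ~mask) | (get_bits(r) & mask));
    }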
Fig. 5. Original Lena
Fig. 6. Embedding Lena
Step 6. The Proposed Secret Extraction Procedure. The secret extraction procedure is symmetric with the proposed image hiding procedure. First, we fetch the encrypted working parameters from the first row of the embedded image and decrypt them by the AES decryption procedure. Next, we fetch the encrypted message consisting of the codebook and the indices. The encrypted message is then decrypted using the AES decryption procedure; each encoded image block can then be sequentially reconstructed from the corresponding codeword in the codebook. After all image blocks are processed, the compressed secret image is available. The extracted images are shown in Fig. 7 and Fig. 8 below.
Fig. 7. The decrypted Bridge extracted from Lena

Fig. 8. The host image Lena after the Bridge was extracted
Because our proposed scheme applies AES to encrypt the data compressed by VQ, we need to examine whether the VQ influences the quality of the secret image. All of the LBG codebook generation programs, the index mapping programs and the AES encryption/decryption programs were written in C. We used 512 × 512 images for both the host and secret images. When the secret image is embedded into the host image, the secret image generates its own codebook; the resulting PSNR values are shown in Table 3 and Table 4.

Table 3. Quality of the secret image after performing vector quantization using different numbers of codewords
Codewords                  128      256      512      1024
Secret image Bridge (dB)   30.263   31.798   32.523   33.630
Table 4. Quality of host images hiding secret images using our proposed scheme
Codewords                                   128      256      512      1024
Host image after hiding secret image (dB)   44.152   44.167   44.203   44.217
4 Conclusions

In this paper, we proposed a new scheme that uses the AES algorithm to encrypt an image compressed by VQ. From the experimental results in Table 4, we found that our scheme is a good way to hide a secret image, and it provides good image quality for both the host image and the retrieved secret image.
References
1. Anderson, R.J., Petitcolas, F.A.P.: On the Limits of Steganography. IEEE J. Selected Areas Commun. (1998) 474-481
2. Artz, D.: Digital Steganography: Hiding Data within Data. IEEE Internet Comput. (2001) 75-80
3. Augot, D., Boucqueau, J.M., Delaigle, J.F., Fontaine, C., Goray, E.: Secure Delivery of Images over Open Networks. In: Proc. IEEE 87(7) (1999) 1251-1266
4. Fu, M.S., Au, O.C.: Data Hiding Watermarking for Halftone Images. IEEE Trans. Imag. Process. 11(4) (2002) 477-484
5. Tseng, Y.C., Chen, Y.Y., Pan, H.K.: A Secure Data Hiding Scheme for Binary Images. IEEE Trans. Commun. 50(8) (2002) 1227-1231
6. Tseng, Y.C., Pan, H.K.: Data Hiding in 2-color Images. IEEE Trans. Comput. 51(7) (2002) 873-878
7. Fu, M.S., Au, O.C.: Halftone Image Data Hiding with Intensity Selection and Connection Selection. Sign. Process.: Imag. Commun. 16 (2001) 909-930
8. Chang, C.C., Lin, M.H., Hu, Y.C.: A Fast and Secure Image Hiding Scheme Based on LSB Substitution. Int. J. Pattern Recognition and Artificial Intelligence 16(4) (2002) 399-416
9. Chang, C.C., Hu, Y.C.: A Fast Codebook Training Algorithm for Vector Quantization. IEEE Trans. Consum. Electron. 44(4) (1998) 1201-1208
10. Fridrich, J., Goljan, M., Du, R.: Detecting LSB Steganography in Color and Grayscale Images. IEEE Multim. 8(4) (2001) 22-28
11. Hu, Y.C., Chang, C.C.: A Predictive Subcodebook Search Algorithm for Vector Quantization. Opt. Eng. 39(6) (2000) 1489-1496
12. Linde, Y., Buzo, A., Gray, R.M.: An Algorithm for Vector Quantizer Design. IEEE Trans. Commun. 28 (1980) 84-95
13. Rhee, M.Y.: Cryptography and Secure Communication. McGraw-Hill, Singapore (1994)
Image Ownership Verification Via Unitary Transform of Conjugate Quadrature Filter Jianwei Yang1 , Xinxiang Zhang2 , Wen-Sheng Chen3,4 , and Bin Fang5 1
Department of Mathematics, Nanjing University of Information Science and Technology, Nanjing, 210044, P.R. China 2 Department of Computer Science, Henan Institute of Finance and Economics, Zhengzhou, 450002, P.R. China 3 Department of Mathematics, Shenzhen University, Shenzhen, P.R.China 4 Key Laboratory of Mathematics Mechanization, CAS, Beijing, P.R.China 5 Department of Computer Science, Chongqing University, Chongqing, P.R. China
Abstract. A wavelet-based watermarking system is described for ownership verification of digital images. The wavelet filters used in this system are constructed by unitary transforms of a two-dimensional conjugate quadrature filter (CQF). Tensor-product wavelet filters are only special cases of this construction. The construction provides more ways to randomly generate perfect reconstruction filters, and increases the difficulty for counterfeiters to gain exact knowledge of our watermark. Furthermore, the watermark is inserted into several middle-frequency sub-bands and the existence of the watermark is asserted if any one of the correlation values is greater than a pre-determined threshold. Experimental results show that the proposed algorithm achieves invisibility, blind detection, and robustness to noise, sharpening, cropping, etc.
1 Introduction
Image watermarking is finding more and more support as a possible solution for the protection of intellectual property rights [1]. In [2], Wang, Y. et al. presented a wavelet-based watermarking algorithm for ownership verification. Recently, Huang, Z. et al. [3], [4] also provided a watermarking algorithm for ownership verification. It was noted in [2] that to make these watermarking schemes more robust to counterfeit attempts, we need to create as many ways as possible to randomly generate perfect reconstruction filters, which increases the difficulty for counterfeiters to gain exact knowledge of the filters. However, the wavelet filters used in all these schemes are tensor products of one-dimensional orthogonal wavelet filters derived from the method given by Vaidyanathan, P. et al. [6]. In this paper, we propose a method to construct two-dimensional wavelet filters by unitary transforms of a conjugate quadrature filter (CQF). Tensor-product two-dimensional wavelet filters are only special cases of our construction. Hence, this approach provides more ways to randomly generate perfect reconstruction filters, and offers more room for analysis and optimization. Furthermore, in our algorithm, watermarks are inserted into different middle-frequency sub-bands.
Even when only one of the several correlation values is greater than the threshold, we can declare the existence of the watermark. Hence, our algorithm is robust to some image manipulations. The construction of wavelet filters by unitary transforms of a CQF is presented in Section 2. The watermarking algorithm is described in Section 3. Experimental results are presented in Section 4. We compare our method with previous methods in Section 5.
2 Construction of Two-Dimensional Wavelet Filters

To construct a two-dimensional orthogonal lowpass wavelet filter and its associated highpass wavelet filters, we need to construct sequences {b_α}_{α∈Z²} and {d^j_α}_{α∈Z²} (j = 1, 2, 3) such that (see [5])

\sum_{\alpha\in Z^2} b_\alpha\, b_{\alpha+2\beta} = \delta_{0,\beta} ,   (1)

\sum_{\alpha\in Z^2} d^{j_1}_\alpha\, d^{j_2}_{\alpha+2\beta} = \delta_{j_1,j_2}\,\delta_{0,\beta} ,   (2)

\sum_{\alpha\in Z^2} b_\alpha\, d^{j}_{\alpha+2\beta} = 0 ,   (3)

for j, j₁, j₂ = 1, 2, 3 and β ∈ Z².
A sequence {b_α}_{α∈Z²} satisfying (1) is called a CQF. A sequence {b_α}_{α∈Z²} is called finitely supported if the set {α : b_α ≠ 0} is finite.

Definition 1. Let {b_α}_{α∈Z²} be a finitely supported sequence and U an arbitrary 4 × 4 orthogonal matrix. Define

(\tilde b_{(2i,2j)}, \tilde b_{(2i,2j+1)}, \tilde b_{(2i+1,2j)}, \tilde b_{(2i+1,2j+1)}) = (b_{(2i-2A_1,\,2j-2B_1)}, b_{(2i-2A_2,\,2j-2B_2+1)}, b_{(2i-2A_3+1,\,2j-2B_3)}, b_{(2i-2A_4+1,\,2j-2B_4+1)})\,U ,   (4)

where A_i, B_i (i = 1, 2, 3, 4) are integers; then {\tilde b_α}_{α∈Z²} is called a unitary transform of {b_α}_{α∈Z²}.

Definition 2. {\tilde b_α}_{α∈Z²} is called the unitary transform 1 of {b_α}_{α∈Z²} if we choose A₂ = A₄ = 1, A₁ = A₃ = B_i = 0 (i = 1, 2, 3, 4) in (4), and the unitary transform 2 of {b_α}_{α∈Z²} if we choose B₃ = B₄ = 1, B₁ = B₂ = A_i = 0 (i = 1, 2, 3, 4) in (4). Unitary transform 1 and unitary transform 2 are denoted by {\tilde b_\alpha}_{\alpha\in Z^2} = {}^{1}U\{b_\alpha\}_{\alpha\in Z^2} and {\tilde b_\alpha}_{\alpha\in Z^2} = {}^{2}U\{b_\alpha\}_{\alpha\in Z^2}, respectively.
For arbitrary ξ₀, λ₀, ξ, λ ∈ R, let {B_α^{λ₀,ξ₀}}_{α∈Z²} and {D_α^{k,λ₀,ξ₀}}_{α∈Z²} (k = 1, 2, 3) be the filters

B_\alpha = \begin{cases} \cos\xi_0\cos\lambda_0, & \alpha=(0,0);\\ \cos\xi_0\sin\lambda_0, & \alpha=(0,1);\\ \sin\xi_0\cos\lambda_0, & \alpha=(1,0);\\ \sin\xi_0\sin\lambda_0, & \alpha=(1,1);\\ 0, & \text{otherwise,} \end{cases}
\qquad
D^1_\alpha = \begin{cases} \cos\xi_0\sin\lambda_0, & \alpha=(0,0);\\ -\cos\xi_0\cos\lambda_0, & \alpha=(0,1);\\ \sin\xi_0\sin\lambda_0, & \alpha=(1,0);\\ -\sin\xi_0\cos\lambda_0, & \alpha=(1,1);\\ 0, & \text{otherwise,} \end{cases}

D^2_\alpha = \begin{cases} \sin\xi_0\cos\lambda_0, & \alpha=(0,0);\\ \sin\xi_0\sin\lambda_0, & \alpha=(0,1);\\ -\cos\xi_0\cos\lambda_0, & \alpha=(1,0);\\ -\cos\xi_0\sin\lambda_0, & \alpha=(1,1);\\ 0, & \text{otherwise,} \end{cases}
\qquad
D^3_\alpha = \begin{cases} \sin\xi_0\sin\lambda_0, & \alpha=(0,0);\\ -\sin\xi_0\cos\lambda_0, & \alpha=(0,1);\\ -\cos\xi_0\sin\lambda_0, & \alpha=(1,0);\\ \cos\xi_0\cos\lambda_0, & \alpha=(1,1);\\ 0, & \text{otherwise,} \end{cases}

and let U_λ, U_ξ be the unitary matrices

U_\lambda = \begin{pmatrix} \cos\lambda & \sin\lambda & 0 & 0\\ -\sin\lambda & \cos\lambda & 0 & 0\\ 0 & 0 & \cos\lambda & \sin\lambda\\ 0 & 0 & -\sin\lambda & \cos\lambda \end{pmatrix}, \qquad U_\xi = \begin{pmatrix} \cos\xi & 0 & \sin\xi & 0\\ 0 & \cos\xi & 0 & \sin\xi\\ -\sin\xi & 0 & \cos\xi & 0\\ 0 & -\sin\xi & 0 & \cos\xi \end{pmatrix}.

Theorem 1. For positive integers N and M, choose λ₀, λ₁, …, λ_{N−1}, ξ₀, ξ₁, …, ξ_{M−1} ∈ R such that

λ₀ + λ₁ + ··· + λ_{N−1} = 2n₁π + π/4,  and  ξ₀ + ξ₁ + ··· + ξ_{M−1} = 2n₂π + π/4.

Then

\{b^{N,M}_\alpha\}_{\alpha\in Z^2} = {}^{\varepsilon_{n_{N+M-2}}}U_{n_{N+M-2}}\; {}^{\varepsilon_{n_{N+M-3}}}U_{n_{N+M-3}} \cdots {}^{\varepsilon_{n_1}}U_{n_1} \{B^{\lambda_0,\xi_0}_\alpha\}_{\alpha\in Z^2} ,   (5)

\{d^{k,N,M}_\alpha\}_{\alpha\in Z^2} = {}^{\varepsilon_{n_{N+M-2}}}U_{n_{N+M-2}}\; {}^{\varepsilon_{n_{N+M-3}}}U_{n_{N+M-3}} \cdots {}^{\varepsilon_{n_1}}U_{n_1} \{D^{k,\lambda_0,\xi_0}_\alpha\}_{\alpha\in Z^2} ,   (6)

are two-dimensional lowpass and highpass wavelet filters, where

n₁, …, n_{N+M−2} ∈ {λ₁, λ₂, …, λ_{N−1}, ξ₁, ξ₂, …, ξ_{M−1}}

and ε_{n_j} = 1 if n_j ∈ {λ₁, λ₂, …, λ_{N−1}}, and ε_{n_j} = 2 if n_j ∈ {ξ₁, ξ₂, …, ξ_{M−1}}.
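As a quick sanity check of Theorem 1 (our example, not from the paper): taking N = M = 1 forces λ₀ = ξ₀ = π/4 (mod 2π), so B_α = 1/2 at all four positions α ∈ {0,1}², while D¹, D² and D³ become (1/2, −1/2; 1/2, −1/2), (1/2, 1/2; −1/2, −1/2) and (1/2, −1/2; −1/2, 1/2). This is exactly the two-dimensional Haar filter bank, which indeed satisfies (1)-(3).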
Remark 1. In Theorem 1, we construct wavelet filters by unitary transform 1 and unitary transform 2. In fact, we can define different unitary transforms of CQF by choosing different integers in (4). Many two-dimensional orthogonal wavelet filters can be constructed in terms of these unitary transforms. For example, if we choose integers Ai , Bi (i = 1, 2, 3, 4) in (4) such that B2 = B4 = 1, B1 = B3 = Ai = 0 (or A3 = A4 = 1, A1 = A2 = Bi = 0) (i = 1, 2, 3, 4), then we obtain a type of unitary transform which is called unitary transform T1 (or unitary transform T2). It can be proved that all tensor product two-dimensional wavelet filters can be constructed in terms of unitary transforms T1 and T2.
3 Watermarking Scheme

3.1 Watermark Preprocessing
The watermark used in our algorithm is a binary image. We rotate the binary image into a real-numbered image by the method given by Y. Wang et al. [2]. Furthermore, to make the watermark different from image to image, we scramble the watermark by chaos.

3.2 Watermark Embedding

The block diagram of the embedding process is given in Fig. 1. We insert the watermark into several middle-frequency sub-bands in the discrete wavelet transform domain. One example of these sub-bands is shown in Fig. 2. We replace the selected sub-bands by a magnified preprocessed watermark image. The wavelet filters used in this scheme are derived by unitary transforms of a CQF. In our watermarking scheme, unitary transform 1 and unitary transform 2 are used. N is set equal to M, and λ₀, λ₁, …, λ_{N−1}, ξ₀, ξ₁, …, ξ_{N−1} ∈ R are selected such that

λ_i = 2π·rdx(i),  ξ_j = 2π·rdy(j),   (7)

λ_{N−1} = 2nπ + \frac{\pi}{4} − \sum_{i=0}^{N−2} λ_i ,  ξ_{N−1} = 2nπ + \frac{\pi}{4} − \sum_{j=0}^{N−2} ξ_j .   (8)
where n ∈ Z, i, j = 0, 1, · · · , N − 2, and rdx(i) & rdy(j) can be randomly generated from a uniform distribution of zero mean and unit variance.
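A minimal C sketch of this angle generation, under our assumption that rdx and rdy come from a uniform [0,1) source (the sketch uses the C library rand(), not the paper's generator); the last angle closes each sum to π/4 as required by Theorem 1 (taking n = 0).

    #include <stdlib.h>

    static const double PI = 3.14159265358979323846;

    /* Uniform [0,1) stand-in for the paper's random source (hypothetical). */
    static double rand01(void) { return rand() / (RAND_MAX + 1.0); }

    /* Draw the rotation angles of Eqs. (7)-(8): the first N-1 angles are
     * random; the last one closes the sum to pi/4. */
    void make_angles(double *lam, double *xi, int N)
    {
        double sl = 0.0, sx = 0.0;
        int i;
        for (i = 0; i < N - 1; i++) {
            lam[i] = 2.0 * PI * rand01();
            xi[i]  = 2.0 * PI * rand01();
            sl += lam[i];
            sx += xi[i];
        }
        lam[N - 1] = PI / 4.0 - sl;
        xi[N - 1]  = PI / 4.0 - sx;
    }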
Fig. 1. Block diagram of embedding process
Fig. 2. An example of middle-frequency sub-bands to embed the watermark
Fig. 3. Block diagram of detection process
3.3 Watermarking Detection
The watermark detector is correlation-based. The process is basically the inversion of the embedding process, and is accomplished without referring to the
original image (blind). The block diagram of the detection process is given in Fig. 3. If any one of the correlations is greater than a preset threshold T , we claim that the suspect image contains the watermark. In our scheme, we set the threshold T to be 0.4 as in [2]. It can be proved that the false alarm probability is less than 1.356 × 10−11 .
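For illustration, a C sketch of a normalized-correlation detector of the kind described here; the exact statistic of [2] may differ, so treat the formula as an assumption.

    #include <math.h>

    /* Normalized correlation between an extracted sub-band w_ext and the
     * reference watermark w_ref (both length n).  The suspect image is
     * declared watermarked if any sub-band correlation exceeds T = 0.4. */
    double ncorr(const double *w_ext, const double *w_ref, int n)
    {
        double s = 0.0, se = 0.0, sr = 0.0;
        int i;
        for (i = 0; i < n; i++) {
            s  += w_ext[i] * w_ref[i];
            se += w_ext[i] * w_ext[i];
            sr += w_ref[i] * w_ref[i];
        }
        return s / (sqrt(se) * sqrt(sr) + 1e-12);  /* guard against 0/0 */
    }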
4 Experimental Results

We use the standard gray-level images lena, boat, mandrill, peppers and goldhill as test images. All test images are of size 512 by 512 pixels. The algorithm and attacks are performed in the Matlab 6.5 environment. In our experiments, the symmetric extension mode is used for the discrete wavelet transform. Our binary watermark is of size 20 by 20, as shown in Fig. 4(a). The watermark is rotated by the method given in [2] and then scrambled by chaos; the scrambled watermark is denoted w, as shown in Fig. 4(b). We randomly generate a seed, then by (7) and (8) we generate rdx^k(i) and rdy^k(j) (i, j = 1, 2, 3, k = 1, 2, 3, 4), and by (5) and (6) we derive four sets of wavelet filters {b^{(k)}_{k1,k2}}, {d^{s,(k)}_{k1,k2}} (k1, k2 = 0, 1, 2, 3, 4, 5, k = 1, 2, 3, 4, s = 1, 2, 3). With these wavelet filters, we decompose the test images. The number of decomposition levels is chosen to be four. Three middle-frequency sub-bands are replaced by the rotated and scrambled watermark; in our experiments, watermarks are embedded in the sub-bands shown in Fig. 2 (the three black domains in that figure). The selected sub-bands are replaced by a magnified watermark. We choose the threshold to be 0.4 as in [2], which produces a false alarm probability of the order of less than 10^{−11}. In some cases, for randomly generated wavelet filters, the image details of the watermarked images show noticeable degradation. To avoid this, we choose the wavelet filters such that the peak signal-to-noise ratio (PSNR) of the watermarked image is not less than 39 dB. The original "peppers" image and its watermarked version are shown in Fig. 5(a) and Fig. 5(b) respectively. We can see that our algorithm achieves perceptual invisibility.
Fig. 4. (a) Original watermark, (b) Scrambled watermark
Fig. 5. (a) The original "peppers"; (b) the watermarked "peppers"; (c) the half-cropped watermarked "peppers"

Table 1. The correlations of noised watermarked images by Gaussian noise with σ² = 600

Images     Correlation1   Correlation2   Correlation3
lena       0.3766         0.5570         0.4647
boat       0.4117         0.4644         0.4559
mandrill   0.5814         0.5514         0.5688
peppers    0.3704         0.4371         0.3302
goldhill   0.2979         0.4396         0.4835
Table 2. The correlations of noised watermarked images by 4% salt-and-pepper noise

Images     Correlation1   Correlation2   Correlation3
lena       0.4016         0.4784         0.4602
boat       0.3256         0.4986         0.4464
mandrill   0.5449         0.5500         0.5532
peppers    0.4505         0.3506         0.3731
goldhill   0.2684         0.4106         0.4037
To verify the robustness of our watermarking scheme, the watermarked images have been tested under various attacks. Tables 1 and 2 show the correlations computed from the Gaussian-noise watermarked images (with σ² = 600) and the salt-and-pepper-noise watermarked images (with 4% salt-and-pepper noise), respectively. In Tables 3 and 4, we list the correlations computed from histogram-equalized and sharpened watermarked images, respectively. Fig. 5(c) shows a cropped watermarked image of "peppers"; Table 5 shows the robustness of our watermark against the cropping attack. Table 6 lists the correlation values under JPEG compression with quality 15%. From these results, we observe that our algorithm is robust to noise, histogram equalization, sharpening, half cropping, JPEG compression, etc.
Table 3. The correlations of histogram equalization of the watermarked images

Images     Correlation1   Correlation2   Correlation3
lena       0.7626         0.7191         0.7609
boat       0.6595         0.6049         0.6090
mandrill   0.7105         0.7140         0.6823
peppers    0.7345         0.6985         0.7070
goldhill   0.6837         0.7037         0.7014
Table 4. The correlations of the sharpened watermarked images

Images     Correlation1   Correlation2   Correlation3
lena       0.7301         0.6891         0.6959
boat       0.7000         0.6765         0.6271
mandrill   0.6798         0.6653         0.6235
peppers    0.7315         0.6793         0.6899
goldhill   0.6870         0.7127         0.6880
Table 5. The correlations of the half-cropped watermarked images

Images     Correlation1   Correlation2   Correlation3
lena       0.5577         0.5387         0.5398
boat       0.5081         0.5014         0.4820
mandrill   0.4873         0.4950         0.4624
peppers    0.5411         0.5239         0.4992
goldhill   0.5359         0.5205         0.5469
Table 6. The correlations of the JPEG-compressed watermarked images (quality of 15%)

Images     Correlation1   Correlation2   Correlation3
lena       0.5010         0.5313         0.4997
boat       0.3959         0.4999         0.4137
mandrill   0.6373         0.6411         0.6148
peppers    0.4461         0.4526         0.3555
goldhill   0.4368         0.5575         0.5077
5 Comparisons with Previous Methods
In this paper, we have described a wavelet-based watermarking system for ownership verification of digital images. To make the scheme more robust to counterfeit attempts, we need to create as many ways as possible to randomly generate perfect reconstruction filters, which increases the difficulty for counterfeiters to gain exact knowledge of the filters [2]. By Remark 1, the tensor-product wavelet filters can be constructed in terms of the unitary transforms T1 and T2; in other words, the tensor-product orthogonal wavelet filters used in [2], [3], [4] are only special cases of our construction. Therefore, the construction of two-dimensional orthogonal wavelet filters given in Section 2 provides more ways to randomly generate perfect reconstruction filters, and offers more room for optimization due to the ample choice of filters. On the other hand, in our scheme, watermarks are inserted into different middle-frequency sub-bands. The existence or absence of the watermark is determined by whether a correlation value is greater or smaller than a predetermined threshold T. For example, although the correlation values Correlation1 and Correlation3 in Table 1 are less than the threshold for the peppers image, Correlation2 is greater than 0.4, so we confirm that the watermark is present.
References
1. Barni, M., Bartolini, F., Piva, A.: Improved Wavelet-based Watermarking through Pixel-wise Masking. IEEE Trans. on Image Processing 10 (2001) 783-791
2. Wang, Y., Doherty, J., Robert, E.: A Wavelet-based Watermarking Algorithm for Ownership Verification of Digital Images. IEEE Transactions on Image Processing 11 (2002) 77-88
3. Huang, Z., Jiang, Z.: Image Ownership Verification via Private Pattern and Watermarking Wavelet Filters. In: Proceedings of the VII Digital Image Computing: Techniques and Applications, Sydney (2003) 801-810
4. Huang, Z., Jiang, Z.: Watermarking Still Images Using Parametrized Wavelet Systems. In: Proceedings of Image and Vision Computing '03, Palmerston North (2003) 215-220
5. Lai, M.: Construction of Multivariate Compactly Supported Orthonormal Wavelets. Advances in Computational Math. (to appear)
6. Vaidyanathan, P., Nguyen, T., Doganata, Z., Saramaki, T.: Improved Technique for Design of Perfect Reconstruction FIR QMF Banks with Lossless Polyphase Matrices. IEEE Trans. Acoust., Speech, Signal Processing 37 (1989) 1042-1056
7. Gonzalez, R., Woods, R.: Digital Image Processing. Prentice Hall, 2nd edition (2001)
8. Zhao, D., Chen, G., Liu, W.: A Chaos-based Robust Wavelet-domain Watermarking Algorithm. Chaos, Solitons and Fractals 22 (2004) 47-54
Inter Layer Intra Prediction Using Lower Layer Information for Spatial Scalability Zhang Wang1, Jian Liu1, Yihua Tan2, and Jinwen Tian2 1 Department of Electronics and Information, Huazhong University of Sci.&Tech., 430074, Wuhan, China
[email protected] 2 Key Lab on Image Processing & Intelligence Control, Huazhong University of Sci.&Tech., 430074, Wuhan, China
[email protected],
[email protected]
Abstract. This paper proposes an efficient scalable coding scheme with intra coding for spatial scalability, where the base layer is encoded to be H.264/AVC compatible, and the enhancement layer is encoded with an improved extension of the base layer intra prediction. In order to improve the coding efficiency of the enhancement layer, the proposed scheme uses base layer information for intra prediction when enhancement layer information is not available. Experimental results show that the average increment in PSNR-Y is about 0.086 dB for the sequence Mobile and 0.11 dB for the sequence Foreman, with almost no change in either the encoding bit-rate or the computational complexity.
1 Introduction

Scalability has become an important feature of video coding design for heterogeneous networks due to variable bandwidth, variable wired and wireless network conditions and variable terminal capabilities. Streaming of multimedia content over the Internet, where a variety of end-users may request the same material while experiencing different available bandwidth, is the natural environment for scalable video coding. The media provider, using these scalable video coding techniques, generates a single compressed bitstream, from which appropriate subsets, providing different visual quality, frame-rate and resolution, can be extracted to meet the bit-rate requirements of a broad range of clients without the need for low-level transcoding; a code-stream parser suffices. Recently, to combine the scalability feature with H.264/AVC [1], the experts of the Joint Video Team (JVT) from both the ITU-T and MPEG committees have been working toward this goal. They have focused on the standardization of Scalable Video Coding (SVC) [2], which serves as an extension of the existing H.264 video compression standard. Generally, scalability refers to three aspects: spatial, temporal and SNR quality. In SVC, to provide multiple resolutions, spatial scalability is achieved by using a layered approach which consists of one H.264-compatible base layer and several enhancement layers. For temporal scalability, a temporal decomposition
technique called Motion Compensated Temporal Filtering (MCTF) [3] is used to provide multiple frame-rate levels; it performs a temporal biorthogonal wavelet transform on frames within a group of pictures (GOP). Similar to H.264, SVC also uses the Fine Granular Scalability (FGS) method based on bit-plane coding to provide quality scalability [4][5]. In this paper, we investigate an intra coding method for spatial scalability in SVC. Due to the compatibility requirement, base layer intra coding remains the same as in H.264, while for enhancement layer intra coding, an improved scheme is proposed. It uses base layer information for intra coding when the enhancement layer information is not available. We also explain when the base layer information should or should not be used. The remainder of this paper is structured as follows: in Section 2 we describe the intra coding method in H.264 used for base layer intra prediction. In Section 3 we propose an improved intra coding scheme using base layer information for enhancement layer coding. The configuration used to perform the experiments and the corresponding results are reported and commented on in Section 4. Conclusions are drawn in Section 5.
2 Base Layer Intra Coding Algorithm

Due to the compatibility requirement, the base layer intra coding remains the same as in H.264/AVC. The H.264 standard exploits the spatial correlation between adjacent macroblocks or blocks for intra prediction [6]. In contrast to some previous video coding standards such as H.263 and MPEG-4, where intra prediction was conducted in the transform domain, intra prediction in H.264 is always conducted in the spatial domain, by referring to neighboring samples of previously coded blocks which are to the left of and/or above the block.

2.1 Multiple Intra Prediction Modes

H.264 offers a rich set of prediction patterns for intra prediction, i.e. nine prediction modes of Intra_4x4 type for luminance blocks and four prediction modes of Intra_16x16 type for luminance macroblocks [7]. For the Intra_16x16 prediction type, the whole luminance component of a macroblock is predicted. The four Intra_16x16 modes are listed in Table 1. Vertical prediction (mode 0), horizontal prediction (mode 1) and DC prediction (mode 2) are similar to the same modes in the Intra_4x4 prediction type, except that instead of 4 neighbors, 16 neighbors are used to predict a 16x16 macroblock.

Table 1. Four prediction modes of Intra_16x16 type

Mode ID     Mode 0     Mode 1       Mode 2   Mode 3
Direction   Vertical   Horizontal   DC       Plane
For Intra_4x4 prediction type, each 4x4 luminance block is predicted from spatially neighboring samples as illustrated in Figure 1-(a). The 16 samples of the 4x4 block which are labeled as a-p are predicted using prior encoded samples in adjacent blocks labeled as A-Q. In addition to DC prediction mode (mode 2) where a mean value of all prediction samples is used to predict the entire 4x4 block, eight directional prediction modes are specified as illustrated in Figure 1-(b).
Fig. 1. (a) Intra_4x4 prediction is conducted for samples a-p of a block using samples A-Q. (b) 8 of the 9 prediction modes for INTRA_4x4 prediction.
2.2 Mode Selection Process

To take full advantage of the prediction modes described above, the H.264 standard selects the best mode using rate-distortion optimization (RDO) [8]. Figure 2 shows the major steps of the intra prediction mode selection process.
Fig. 2. Intra prediction mode selection process in H.264/AVC
As shown, the two types Intra_16x16 and Intra_4x4 first select the best prediction mode of their own respectively. The two obtained candidates are then submitted to a final RDO mode decision, which outputs the overall best mode. The RDO mode decision exhaustively searches for the best mode for each 4x4 block [9], i.e. the one that produces the minimum rate-distortion (RD) cost given by
S(o, r, \text{mode} \mid QP, \lambda_{\text{mode}}) = D(o, r, \text{mode} \mid QP) + \lambda_{\text{mode}} \times R(o, r, \text{mode} \mid QP) .   (1)

where QP is the macroblock quantization parameter and the Lagrangian multiplier is λ_mode = 0.85 × 2^{QP/3}; D(·) denotes the distortion between the original 4x4 luminance block o and its reconstructed version r, and R(·) represents the number of bits associated with the chosen mode.
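A one-line illustration of Eq. (1) in C, with the stated multiplier λ_mode = 0.85 × 2^{QP/3}; distortion and rate are assumed to be computed elsewhere for the candidate mode.

    #include <math.h>

    /* Lagrangian RD cost of Eq. (1) for one candidate intra mode. */
    double rd_cost(double distortion, double bits, int qp)
    {
        double lambda = 0.85 * pow(2.0, qp / 3.0);
        return distortion + lambda * bits;
    }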
3 Enhancement Layer Intra Coding Algorithm

3.1 Inter Layer Frame Relationship

In a spatial layered coding structure, when a frame in the enhancement layer is about to be coded, its corresponding lower-resolution frame in the base layer has already been encoded and reconstructed. Furthermore, the base layer frame can be magnified to the same resolution as the enhancement layer frame via an efficient upsampling filter [10]. Figure 3 shows the frame relationship between two spatial layers.
Fig. 3. Inter layer frame relationship between adjacent layers
As shown in Figure 3, both the frame-rate and the spatial resolution of the base layer are half of those in the enhancement layer. Frames B1 and B2 have already been encoded and reconstructed inside a GOP. UB1 and UB2 are generated by applying a 2-D upsampling filter to B1 and B2, and have the same resolution as frames E1 and E3 in the enhancement layer.

3.2 Inter Layer Intra Prediction Mechanism

Given this important property of UB1, it is obvious that information from the base layer frame can be used for prediction in enhancement layer coding. The Intra_BL mode adopted in SVC is based on this concept: it uses the macroblock in frame UB1 to predict the corresponding macroblock in frame E1 pixel by
pixel [11]. By using the Intra_BL mode, SVC has greatly improved the coding efficiency of the enhancement layer. Here we propose an improved enhancement layer intra coding scheme using information from both the base layer and the enhancement layer. Figure 4 shows the principle of the proposed scheme.
Fig. 4. Inter layer intra prediction scheme
In Figure 4, the enhancement layer macroblock MBe uses the adjacent macroblock values Ue and Le for intra prediction when they are available. Otherwise, MBe can be predicted with the following rules:

(a) In case only Ue is not available, the prediction value pred_MBe for MBe is given by

pred_MBe = P(Le, αUb, (1−α)Db, mode) .   (2)

where Ub and Db represent the top and bottom rows of the upsampled base layer macroblock MBb respectively, α (0 < α < 1) is the weight coefficient for Ub and Db, e.g. α = 0.5, and P(·) denotes the prediction mode decision process.

(b) In case only Le is not available, pred_MBe is similarly given by

pred_MBe = P(Ue, βLb, (1−β)Rb, mode) .   (3)

where Lb and Rb represent the left and right columns of MBb respectively, and β (0 < β < 1) is the weight coefficient for Lb and Rb, e.g. β = 0.5.

(c) In case both Le and Ue are not available, pred_MBe is given by

pred_MBe = P(αUb, (1−α)Db, βLb, (1−β)Rb, mode) .   (4)

The main process of our proposed intra coding algorithm is described by the following program listing.
The C language pseudocode of the intra coding algorithm used in the enhancement layer:
    /* Intra Coding Algorithm Module Start */
    if ( Current_Layer_ID == ENHANCEMENT )
    {
        bEnhLeftAvailable = GetLeftAdjPixels();
        bEnhUTopAvailable = GetUTopAdjPixels();
        if ( bEnhLeftAvailable == TRUE && bEnhUTopAvailable == TRUE )
            nCaseID = 0;
        else if ( bEnhLeftAvailable == TRUE )
            nCaseID = 1;
        else if ( bEnhUTopAvailable == TRUE )
            nCaseID = 2;
        else
            nCaseID = 3;
        switch ( nCaseID )
        {
            case 0:  DoBaseIntraCoding(); break;  /* both neighbors available */
            case 1:  DoIntraCodingA();    break;  /* only Ue missing: rule (a) */
            case 2:  DoIntraCodingB();    break;  /* only Le missing: rule (b) */
            case 3:  DoIntraCodingC();    break;  /* both missing: rule (c)    */
            default: NotDoAnything();     break;
        } /* end of switch */
    } /* end of if ( Current_Layer_ID == ENHANCEMENT ) */
    /* Intra Coding Algorithm Module End */
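To complement the listing, here is what the boundary setup inside DoIntraCodingA() might look like for rule (a): the missing top reference row of MBe is synthesized as a weighted mix of the top and bottom rows of the upsampled base layer macroblock. Names and the 16-pixel row width are our assumptions, not code from the paper.

    /* Synthesize the missing top reference row of MBe per rule (a),
     * Eq. (2): a weighted mix of Ub and Db, e.g. alpha = 0.5. */
    void build_top_boundary(unsigned char dst[16],
                            const unsigned char Ub[16],
                            const unsigned char Db[16],
                            double alpha)
    {
        int i;
        for (i = 0; i < 16; i++)
            dst[i] = (unsigned char)(alpha * Ub[i] + (1.0 - alpha) * Db[i] + 0.5);
    }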
4 Experiments and Result Analysis

To evaluate the performance of the proposed enhancement layer intra prediction scheme, it is compared with the original intra coding method in the SVC test model software JSVM-4.0 [12]. The standard test sequences Foreman and Mobile in QCIF and CIF format are used, and the encoded frame-rate is 15 Hz. Both the GOP size and the intra period are equal to 1, which means all frames are of INTRA type. Only two spatial layers are assumed: one base layer and one enhancement layer. Table 2 lists the detailed configuration options for the two layers. The encoded bit-rate and PSNR-Y results for the enhancement layer are collected. Tables 3 and 4 list these results, comparing the original and proposed methods for Foreman and Mobile, respectively.
Table 2. The option configuration for the two spatial layers

Encoding Option          Base Layer                      Enhancement Layer
Resolution               QCIF                            CIF
Frame-rate               15 Hz                           15 Hz
AVC Compatible           Yes                             No
FRExt                    On                              On
Intra Mode Usage         On                              On
Adaptive QP              Yes                             Yes
Num FGS Layers           0                               0
Inter Layer Prediction   No                              Adaptive
Close Loop               L+H rate                        L+H rate
Decode Loops             Single                          Single
QP Values                8, 12, 16, 20, 24, 28, 32, 36   8, 12, 16, 20, 24, 28, 32, 36
Update Step Usage        Off                             Off
Entropy Mode             CABAC                           CABAC
Table 3. Bit-rate and PSNR-Y values for Foreman

            Original Method                  Proposed Method
QP Value    Bit-rate (kbit/s)  PSNR-Y (dB)   Bit-rate (kbit/s)  PSNR-Y (dB)
8           7784.41            53.056        7787.90            53.168
12          5815.18            49.674        5818.62            49.790
16          4256.38            46.394        4254.51            46.508
20          2923.16            43.002        2923.41            43.112
24          1961.45            40.033        1962.41            40.141
28          1276.71            37.321        1275.68            37.428
32          790.07             34.559        788.01             34.667
36          481.48             32.108        482.53             32.213
Table 4. Bit-rate and PSNR-Y values for Mobile

            Original Method                  Proposed Method
QP Value    Bit-rate (kbit/s)  PSNR-Y (dB)   Bit-rate (kbit/s)  PSNR-Y (dB)
8           13426.35           53.001        13430.18           53.095
12          11056.02           49.544        11058.55           49.635
16          8996.51            46.045        9002.44            46.132
20          7040.85            42.202        7041.26            42.289
24          5372.50            38.633        5374.91            38.716
28          3944.92            35.146        3943.87            35.227
32          2753.65            31.518        2753.48            31.601
36          1842.42            28.302        1843.33            28.381
For the sequence Foreman in Table 3, the fluctuation of the bit-rate is less than 4 kbit/s, which shows that the change in bit-rate is negligible. The maximal increase in PSNR-Y is 0.116 dB at QP = 12 and the minimal one is 0.105 dB at QP = 36; as a result, the average increase in PSNR-Y is about 0.11 dB over all QP values. Similarly, for the sequence Mobile in Table 4, the fluctuation of the bit-rate is less than 6 kbit/s, the maximal increase in PSNR-Y is 0.094 dB at QP = 8 and the minimal one is 0.079 dB at QP = 36, so the average increase in PSNR-Y is about 0.086 dB over all QP values. Figures 5 and 6 display the rate-distortion curves corresponding to the data in Tables 3 and 4, respectively. Since the curve for the proposed method always lies above that of the original method, the proposed method outperforms the original one.
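For reference, the average gains quoted above can be reproduced directly from the PSNR-Y columns of Tables 3 and 4, for example with the short check below (values copied from the tables):

# Recomputing the average PSNR-Y gains of the text from Tables 3 and 4.
foreman_orig = [53.056, 49.674, 46.394, 43.002, 40.033, 37.321, 34.559, 32.108]
foreman_prop = [53.168, 49.790, 46.508, 43.112, 40.141, 37.428, 34.667, 32.213]
mobile_orig  = [53.001, 49.544, 46.045, 42.202, 38.633, 35.146, 31.518, 28.302]
mobile_prop  = [53.095, 49.635, 46.132, 42.289, 38.716, 35.227, 31.601, 28.381]

def avg_gain(orig, prop):
    return sum(p - o for o, p in zip(orig, prop)) / len(orig)

print(avg_gain(foreman_orig, foreman_prop))  # ~0.110 dB
print(avg_gain(mobile_orig, mobile_prop))    # ~0.086 dB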
[Figure 5 plots PSNR-Y (dB) against bit-rate (kbit/s), comparing the original method ("old") with the proposed method ("new").]
Fig. 5. Bit-rate and PSNR-Y Curve for Foreman
[Figure 6 plots PSNR-Y (dB) against bit-rate (kbit/s), comparing the original method ("old") with the proposed method ("new").]
Fig. 6. Bit-rate and PSNR-Y Curve for Mobile
5 Conclusion

In this paper, we first introduced the concept of spatial scalability in scalable video coding and described the intra coding method used in H.264 for the spatial base layer. An improved intra coding scheme that uses base layer information for intra prediction was then proposed for the spatial enhancement layer. Experimental results show that the average increase in PSNR-Y is about 0.086 dB for Mobile and 0.11 dB for Foreman, with almost no change in either the encoding bit-rate or the computational complexity. The proposed scheme can therefore be applied to enhancement layer intra coding as a complement to the Intra_BL mode in the SVC standard.
References
1. Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG: Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC), JVT-G050 (2003)
2. Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG: Scalable Video Coding Joint Draft 4, JVT-Q201, 17th Meeting, Nice, France (2005)
3. Golwelkar, A., Woods, J. W.: Scalable Video Compression Using Longer Motion Compensated Temporal Filters. In: Proc. SPIE Visual Communication and Image Processing, 5150 (2003) 1406-1417
4. Mayer, C., Crysandt, H., Ohm, J. R.: Bit Plane Quantization for Scalable Video Coding. In: Proc. SPIE Visual Communication and Image Processing, 4671 (2002) 1142-1152
5. Van der Schaar, M., Radha, H.: A Hybrid Temporal-SNR Fine-Granular Scalability for Internet Video. IEEE Trans. on Circuits and Systems for Video Technology, 11 (2001) 318-331
6. Wiegand, T., Sullivan, G. J., Luthra, A.: Overview of the H.264/AVC Video Coding Standard. IEEE Trans. on Circuits and Systems for Video Technology, 13 (2003) 560-576
7. H.264/MPEG-4 Part 10 White Paper: Prediction of Intra Macroblocks. http://www.vcodex.com/h264.html (2002)
8. Pan, F., Lin, X.: Fast Mode Decision for Intra Prediction, JVT-G013, 7th Meeting, Pattaya, Thailand (2003)
9. Kim, C. S., Shih, H. H., Kuo, C. C. J.: Multistage Mode Decision for Intra Prediction in the H.264 Codec. In: 16th Annual Symposium on Electronic Imaging (2004)
10. Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG: Adaptive Upsampling for Spatially Scalable Coding, JVT-O010, 15th Meeting, Busan, Korea (2005)
11. Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG: Intra-BL Prediction Considering Phase Shift, JVT-O023, 15th Meeting, Busan, Korea (2005)
12. Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG: Joint Scalable Video Model JSVM-4, JVT-Q202, 17th Meeting, Nice, France (2005)
Matching Case History Patterns in Case-Based Reasoning*
Guoxing Zhao1, Bin Luo2, and Jixin Ma1
1 School of Computing and Mathematical Sciences, University of Greenwich, UK {g.Zhao, j.ma}@gre.ac.uk
2 School of Computer Science and Technology, Anhui University, China [email protected]
* This research is supported in part by the National Natural Science Foundation of China (No. 60375010).
Abstract. This paper introduces a mechanism for representing and recognizing case history patterns with rich internal temporal aspects. A case history is characterized as a collection of elemental cases as in conventional case-based reasoning systems, together with the corresponding temporal constraints that can be relative and/or with absolute values. A graphical representation for case histories is proposed as a directed, partially weighted and labeled simple graph. In terms of such a graphical representation, an eigen-decomposition graph matching algorithm is proposed for recognizing case history patterns.
1 Introduction

The notion of a case is fundamental for many real-life applications. In conventional case-based systems, the various cases in the world under consideration are usually represented as isolated episodes. Generally speaking, temporal representation and reasoning is essential for many areas of computer science, where one is interested not only in the representation of distinct episodes of an enterprise, but also in the temporal relations among the episodes. In particular, appropriate temporal representation and reasoning is fundamental for many case-based systems, where the history of cases, rather than isolated cases, plays an important role in solving problems such as explanation, diagnosis, prediction, planning, process management, and history reconstruction. For instance, in the area of medical information systems, patients' medical histories are obviously very important: to prescribe the right treatments, the doctor needs to know not only the patients' current status, but also their previous health records. Similarly, in weather forecasting, without a good understanding of climate phenomena based on past observations, the weather expert cannot make good predictions about the future. Although temporal representation and reasoning have been neglected in most conventional case-based reasoning systems, which only address snapshot episodes, a few interesting approaches have been proposed to incorporate temporal concepts into isolated elemental cases. Examples of these are those of
Nakhaeizadeh [1], of Branting and Hastings [2], of Jaczynski and Trousse [3], of Hansen [4], and of Jære, Aamodt and Skalle [5]. The underlying time models employed in most of these systems are point-based, and therefore absolute time points [1-4], or intervals delimited by a pair of points [5], must be associated with the time-dependent statements being addressed. However, there are many applications in which only some relative temporal knowledge about the time-dependent statements is to hand, and their precise temporal characteristics, such as the exact starting and finishing times, are not available (e.g., "John ran 3 miles yesterday morning", "John arrived at the office before Mary went home", etc.).

Pattern recognition aims at the operation and design of technologies that pick up meaningful patterns in data [6]. While pattern classification is about putting a particular instance of a pattern into a category, the goal of pattern matching is to determine how similar a pair of patterns is [7].

The objective of this paper is to introduce a mechanism for case history representation and recognition. Section 2 presents the formalism, including: the temporal basis, which allows the expression of both absolute time values and relative temporal relations; a formal characterization of fluents and elemental cases; and two equivalent schemas for case history representation. A network called the Case History Graph, given in terms of a directed, partially weighted and labeled simple graph, is introduced in section 3 for the graphical representation of case histories. In section 4, an eigen-decomposition algorithm is proposed for matching case history graphs, together with some illustrative experimental results. Finally, section 5 provides the conclusions.
2 The Formalism

We shall describe the formalism in terms of a many-sorted reified logic with equality [8], consisting of four disjoint sorts of objects, T, F, C and H, called time elements, fluents, elemental cases and case histories, respectively. Firstly, each time element is defined to be in one of the following four forms:

(p1, p2) = {p | p1, p2, p ∈ R ∧ p1 < p < p2}
[p1, p2) = {p | p1, p2, p ∈ R ∧ p1 ≤ p < p2}
(p1, p2] = {p | p1, p2, p ∈ R ∧ p1 < p ≤ p2}
[p1, p2] = {p | p1, p2, p ∈ R ∧ p1 ≤ p ≤ p2}

where R is the set of real numbers, and ≤, < and = are the conventional order relations over real numbers. In this paper, p1 and p2 shall be called the left-bound and the right-bound of time element t, respectively. The absolute values of the left and/or right bounds of some time elements may be unknown; in this case, real-number variables are used to express relations relative to other time elements. If the left-bound and right-bound of time element t are the same, we shall call t a time point; otherwise t is called a time interval. Without confusion, we shall take the time element [p, p] as identical to p. Also, if a time element is not specified as open or closed at its left (right) bound, we shall use "<" instead of "(" and "[" as its left
bracket; similarly, we shall use ">" instead of ")" and "]" as its right bracket. In addition, we define the duration of a time element t, Dur(t), as the distance between its left bound and its right bound:

t = <p1, p2> ⇔ Dur(t) = p2 − p1

Following Allen's terminology [9], we shall use Meets to denote the immediate-predecessor order relation over time elements, defined by:

Meets(t1, t2) ⇔ ∃p1,p,p2∈R( t1 = (p1, p) ∧ t2 = [p, p2) ∨ t1 = [p1, p) ∧ t2 = [p, p2) ∨ t1 = (p1, p) ∧ t2 = [p, p2] ∨ t1 = [p1, p) ∧ t2 = [p, p2] ∨ t1 = (p1, p] ∧ t2 = (p, p2) ∨ t1 = [p1, p] ∧ t2 = (p, p2) ∨ t1 = (p1, p] ∧ t2 = (p, p2] ∨ t1 = [p1, p] ∧ t2 = (p, p2])

It is easy to see that the intuitive meaning of Meets(t1, t2) is that, on the one hand, t1 is an "earlier" time element than t2 and no other time element stands between them; on the other hand, t1 and t2 do not overlap (i.e., they have no part in common, not even a point). Analogous to those introduced by Allen [9], other order relations between time elements can be derived in terms of the primitive relation Meets, including Equals, Before/After, Meets/Met-by, Overlaps/Overlapped-by, Starts/Started-by, During/Contains and Finishes/Finished-by [10]. As shown in [11], the time model adopted here has all the expressive power and convenience of the approach that treats intervals as primitive [9, 10, 12]. In particular, since the open/closed nature of a time element may be left unspecified, it overcomes the disadvantage of conventional point-based approaches in representing possibly incomplete temporal knowledge, and bypasses some historical puzzles such as the so-called Dividing Instant Problem [13]. In what follows, we shall use TR to denote the set of these 13 exclusive temporal order relations.

Secondly, a fluent is defined as a statement (or proposition) whose truth-value depends on time. The sort of fluents F is characterized as the minimal set closed under the following rules:

f1, f2 ∈ F ⇒ f1 ∨ f2 ∈ F
f ∈ F ⇒ not(f) ∈ F

In order to associate a fluent with a time element, we shall use a global predicate [8, 14], Holds(f, t), to state that fluent f holds true over time t:

Holds(f, <p1, p2>) ⇔ ∀p3,p4∈R(p1 ≤ p3 ∧ p4 ≤ p2 ⇒ Holds(f, <p3, p4>))
Holds(f, <p1, p2>) ∧ Holds(f, <p2, p3>) ∧ Meets(<p1, p2>, <p2, p3>) ⇒ Holds(f, <p1, p3>)
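As an illustration only, the time elements and the Meets relation might be modelled in code as follows; the class and field names are our own assumptions, not part of the formalism.

from dataclasses import dataclass

# A time element <p1, p2> with open/closed bounds. Meets(t1, t2) holds iff the
# two elements share the bound p without overlapping, i.e. exactly one of them
# includes the shared point (cf. the eight disjuncts in the definition above).
@dataclass(frozen=True)
class TimeElement:
    p1: float
    p2: float
    left_closed: bool    # '[' if True, '(' if False
    right_closed: bool   # ']' if True, ')' if False

    def duration(self) -> float:
        return self.p2 - self.p1

def meets(t1: TimeElement, t2: TimeElement) -> bool:
    return t1.p2 == t2.p1 and (t1.right_closed != t2.left_closed)

# e.g. [0, 1) meets [1, 2]:
print(meets(TimeElement(0, 1, True, False), TimeElement(1, 2, True, True)))  # True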
Thirdly, an elemental case is defined as a collection of fluents. We shall use Belongs(f, c) to denote that fluent f belongs to case c [15]. Without confusion, we also use Holds(c, t) to state that case c holds over time t, provided that every fluent f belonging to case c holds true over time t:

Holds(c, t) ⇔ ∀f(Belongs(f, c) ⇒ Holds(f, t))

In addition, we introduce two binary operators, Union and Intersection, over the sort of elemental cases C, so that Union(c1, c2) and Intersection(c1, c2) denote the union and the intersection of case c1 and case c2, respectively:

Belongs(f, Union(c1, c2)) ⇔ Belongs(f, c1) ∨ Belongs(f, c2)
Belongs(f, Intersection(c1, c2)) ⇔ Belongs(f, c1) ∧ Belongs(f, c2)

Finally, a case history h can be formalized in terms of one of two equivalent schemas. In the first schema, Schema I, a case history is represented as a quadruple <Caseh, Holdsh, Relationh, Durh>, where Caseh is a collection of elemental cases, Holdsh is a collection of Holds formulae, Relationh is a collection of temporal order relations, and Durh is a collection of duration knowledge. That is:

Schema I
h = <Caseh, Holdsh, Relationh, Durh>
Caseh = {chi | chi ∈ C, i = 1, …, m}
Holdsh = {Holds(chi, thi) | thi ∈ T, 1 ≤ i ≤ m}
Relationh = {Relation1,2(th1, th2) | for some th1, th2 ∈ Th, Relation1,2 ∈ TR}
Durh = {Dur(t) = r | for some t ∈ Th, r ∈ R}

where Th is the minimal subset of T closed under the following rules:

thi ∈ Th, i = 1, …, m;
t ∈ Th ⇔ ∃t' ∈ Th(Meets(t, t') ∨ Meets(t', t))

It is for the sake of general treatment that the temporal relationships in Schema I are given as a collection of order relations, each of which can be any one of the 13 relations in TR, that is, Equals, Before, After, Meets, Overlaps, Overlapped-by, Met-by, Starts, Started-by, During, Contains, Finishes and Finished-by. However, since all these order relations can be derived from the single Meets relation, we have another schema, Schema II, which is equivalent to Schema I:

Schema II
h = <Caseh, Holdsh, Meetsh, Durh>
Caseh = {chi | chi ∈ C, i = 1, …, m}
Holdsh = {Holds(chi, thi) | thi ∈ T, 1 ≤ i ≤ m}
Meetsh = {Meets(th1, th2) | for some th1, th2 ∈ Th}
Durh = {Dur(t) = r | for some t ∈ Th, r ∈ R}
3 Graphical Representation of Scenarios

In [16], a graphical representation for expressing temporal knowledge was introduced in terms of a directed and partially weighted graph. It can be extended to express case histories presented in Schema II as introduced in section 2. In fact, a given case history h can be represented in terms of a temporal network, defined as a directed, partially weighted/labeled simple graph Gh, called the Case History Graph, where:
• Each time element t in Th is denoted as a directed arc of the graph labeled by t, bounded by a pair of nodes called the tail-node and the head-node of the arc, respectively.
• Each relation Meets(ti, tj) in Meetsh is represented by merging the head-node of ti and the tail-node of tj into a common node, of which ti is an in-arc and tj is an out-arc (see Fig. 1).
• Each formula Holds(chi, thi) in Holdsh is represented by simply adding chi as an additional label to the arc labeled by the corresponding thi. For any time element t in Th for which there is no Holds knowledge, the arc is labeled by the empty state {}.
• Each piece of duration knowledge Dur(t) = r in Durh is expressed as a real number r alongside the corresponding arc t.
Fig. 1. Merging the head-node of ti and the tail-node of tj as a common node if Meets(ti, tj)
In what follows, we shall simply assume |F| = n. Corresponding to a case history graph Gh with m nodes, we define an m*m matrix Mh, named the characteristic matrix, where Mh(u, v) is an (n+1)-dimensional vector luv ∈ R^(n+1), such that: (a) for any adjacent pair of nodes u and v in Gh, if (u, v) is an arc representing time element t, then luv(k) = 1 if Holds(fk, t) and luv(k) = 0 otherwise, for 1 ≤ k ≤ n, and luv(n+1) = Dur(t); (b) for any non-adjacent pair of nodes u and v in Gh, luv = <w, w, …, w>, where w is a negative real number which will be used to adjust the edit distance of deleting operations in the graph matching process. In this paper, we shall use M_k^h to denote the matrix whose (u, v) entry is the k-th element of the (u, v) entry of Mh.
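A characteristic matrix along these lines could be assembled as in the sketch below; the data layout (each arc given as a (u, v, fluent-index-set, duration) tuple) is our own assumption, and w = -0.3 anticipates the experimental setting of section 4.3.

import numpy as np

def characteristic_matrix(m, n, arcs, w=-0.3):
    # m nodes, n fluents; arcs: iterable of (u, v, fluent_ids, duration).
    # Returns M of shape (m, m, n+1): M[u, v, k] = 1 if fluent k holds over
    # the arc (u, v); M[u, v, n] = Dur(t); non-adjacent entries are all w.
    M = np.full((m, m, n + 1), float(w))
    for u, v, fluent_ids, dur in arcs:
        M[u, v, :] = 0.0
        for k in fluent_ids:
            M[u, v, k] = 1.0
        M[u, v, n] = dur
    return M

# M[:, :, k] then plays the role of the matrix M_k^h used below.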
4 Eigen-Decomposition Graph Matching Algorithm

Spectral graph theory is a branch of mathematics which aims to characterize the properties of graphs using the eigenvalues and eigenvectors of the adjacency matrix or the closely related Laplacian matrix [17]. However, conventional spectral approaches usually deal only with symmetric real matrices, whereas the adjacency matrices of directed graphs, such as the case history graph introduced in this paper, are in general asymmetric. Moreover, the entries of the characteristic matrix of a given case history graph are (n+1)-dimensional vectors, rather than single real values as in conventional spectral models. In what follows, we extend the eigen-decomposition graph matching algorithm proposed by Umeyama [18] to match case history graphs.

4.1 Definition of Case History Similarity

In what follows, if M is a complex matrix, we shall use ||M|| to denote the Frobenius norm of M, and |M| to denote the matrix whose elements are the moduli of the corresponding elements of M. Given two case histories h1 and h2, assume their characteristic matrices are M^h1 and M^h2, with sizes m1*m1 and m2*m2. Without loss of generality, we assume m1 = m2 = m: if m1 < m2, we can simply add m2 − m1 isolated dummy nodes to graph Gh1 to obtain an extended graph Gh1' whose characteristic matrix M^h1' has the same size as M^h2, i.e., m2*m2, and a similar treatment applies when m2 < m1. The similarity degree between h1 and h2 is then defined by

sim(h1, h2) = 1 − min_{p ∈ perm(m)} [ Σ_{k=1}^{n+1} ||p M_k^h1 p^T − M_k^h2||² ] / [ Σ_{k=1}^{n+1} ( ||M_k^h1||² + ||M_k^h2||² ) ]
where perm(m) denotes the set of all m*m permutation matrices. It is easy to see that sim(h1, h2) falls in the range of [0, 1].
4.2 Calculating the Similarity

Computing the similarity degree defined in section 4.1 only involves calculating the minimal value with respect to all possible permutation matrices. In what follows, we extend Umeyama's algorithm, defined for a single pair of asymmetric matrices, to the n+1 pairs of asymmetric matrices M_k^h1, M_k^h2. In fact, to calculate

min_{p ∈ perm(m)} Σ_{k=1}^{n+1} ||p M_k^h1 p^T − M_k^h2||²

we define

E_k^h1 = (M_k^h1 + (M_k^h1)^T)/2 + √(−1) · (M_k^h1 − (M_k^h1)^T)/2
E_k^h2 = (M_k^h2 + (M_k^h2)^T)/2 + √(−1) · (M_k^h2 − (M_k^h2)^T)/2

where k = 1, 2, …, n+1. Since E_k^h1 and E_k^h2 are Hermitian matrices, we can obtain their eigen-decompositions

E_k^h1 = U_k^h1 D_k^h1 (U_k^h1)*,   E_k^h2 = U_k^h2 D_k^h2 (U_k^h2)*

where U_k^h1 and U_k^h2 are unitary matrices, and D_k^h1 and D_k^h2 are diagonal matrices formed from the ordered eigenvalues of E_k^h1 and E_k^h2, respectively. N.B. Here, * denotes the Hermitian transpose. Let

V1 = <U_1^h1, U_2^h1, …, U_{n+1}^h1>,   V2 = <U_1^h2, U_2^h2, …, U_{n+1}^h2>

Then the optimized permutation matrix p is obtained by means of the Hungarian algorithm:

p = Hungarian(V2 V1^T)
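The whole procedure can be sketched as follows, using NumPy/SciPy rather than the paper's MATLAB implementation. Taking the moduli of the stacked eigenvector matrices follows Umeyama's original formulation [18], and whether the assignment yields p or its transpose depends on the indexing convention; both points are assumptions of this sketch.

import numpy as np
from scipy.optimize import linear_sum_assignment

def hermitian_embedding(M):
    # E = (M + M^T)/2 + sqrt(-1)(M - M^T)/2 is Hermitian for real asymmetric M
    return (M + M.T) / 2 + 1j * (M - M.T) / 2

def match(Ms_h1, Ms_h2):
    # Ms_h1, Ms_h2: lists of the n+1 real m*m matrices M_k^h1 and M_k^h2
    U1 = np.hstack([np.linalg.eigh(hermitian_embedding(M))[1] for M in Ms_h1])
    U2 = np.hstack([np.linalg.eigh(hermitian_embedding(M))[1] for M in Ms_h2])
    W = np.abs(U2) @ np.abs(U1).T              # V2 V1^T, with moduli taken
    rows, cols = linear_sum_assignment(-W)     # Hungarian step, maximizing W
    m = Ms_h1[0].shape[0]
    p = np.zeros((m, m))
    p[rows, cols] = 1.0                        # permutation matrix p
    return p

def sim(Ms_h1, Ms_h2, p):
    # the similarity degree of section 4.1 for the permutation found above
    num = sum(np.linalg.norm(p @ A @ p.T - B) ** 2
              for A, B in zip(Ms_h1, Ms_h2))
    den = sum(np.linalg.norm(A) ** 2 + np.linalg.norm(B) ** 2
              for A, B in zip(Ms_h1, Ms_h2))
    return 1.0 - num / den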
4.3 Experimental Results

The algorithm has been implemented in MATLAB. What follows describes some of the experiments conducted, where the weight w was set to −0.3. As shown in Fig. 2 – Fig. 8, for the 7 pairs of case history graphs Gi^h1 and Gi^h2 (i = 1, 2, …, 7), the results computed by the algorithm provide a stable similarity measurement, as expected.
Fig. 2. The similarity between G1h1 and G1h2 is 1
Fig. 3. The similarity between G2h1 and G2h2 is 0.7800
Fig. 4. The similarity between G3h1 and G3h2 is 0.8413
Fig. 5. The similarity between G4h1 and G4h2 is 0.9148
Fig. 6. The similarity between G5h1 and G5h2 is 0.7301
Fig. 7. The similarity between G6h1 and G6h2 is 0.7669
Fig. 8. The similarity between G7h1 and G7h2 is 0.5711
4.4 Computational Complexity

For two given case history graphs Gh1 and Gh2 with m nodes and n fluents, the computation consists of two significant parts. The first part involves 2(n+1) eigen-decompositions of matrices of size m*m, giving a complexity of O(nm³). The second part involves applying the Hungarian algorithm to a matrix of size m*m, giving a computational complexity of O(m³). Therefore, the overall complexity of matching two case history graphs is O(nm³).
5 Conclusions

In this paper, we have introduced a mechanism for representing and recognizing case histories with rich internal temporal aspects in the domain of case-based reasoning. The formalism includes two equivalent schemas for case history formalization and a graphical representation corresponding to the unified second schema. It is shown that case history pattern recognition and matching can be transformed straightforwardly into graph matching. By extending the eigen-decomposition algorithm from weighted graphs to vector-labeled graphs, we obtain good results in matching pairs of case history graphs in most states of affairs. However, in some special states of affairs the algorithm may fail to work as expected. Future work includes identifying the reasons for such failures and improving the algorithm for general real-life applications.
References
1. Nakhaeizadeh, G.: Learning Prediction of Time Series: A Theoretical and Empirical Comparison of CBR with Some Other Approaches. In: Proceedings of the Workshop on Case-Based Reasoning, AAAI-94, Seattle, Washington (1994) 67-71
2. Branting, L., Hastings, J.: An Empirical Evaluation of Model-Based Case Matching and Adaptation. In: Proceedings of the Workshop on Case-Based Reasoning, AAAI-94, Seattle, Washington (1994) 72-78
3. Jaczynski, M.: A Framework for the Management of Past Experiences with Time-Extended Situations. In: Proceedings of the 6th International Conference on Information and Knowledge Management (CIKM'97), Las Vegas, Nevada, USA (1997) 32-39
4. Hansen, B.: Weather Prediction Using Case-Based Reasoning and Fuzzy Set Theory. MSc Thesis, Technical University of Nova Scotia, Halifax, Nova Scotia, Canada (2000)
5. Jære, M. D., Aamodt, A., Skalle, P.: Representing Temporal Knowledge for Case-Based Reasoning. In: Proceedings of the 6th European Conference, ECCBR 2002, Aberdeen, Scotland, UK (2002) 174-188
6. Tveter, D.: The Pattern Recognition Basis of Artificial Intelligence. Wiley-IEEE Computer Society Press (1998)
7. Theodoridis, S., Koutroumbas, K.: Pattern Recognition, Second Edition. Academic Press (2003)
8. Ma, J., Knight, B.: Reified Temporal Logic: An Overview. Artificial Intelligence Review, Vol. 15 (2001) 189-217
9. Allen, J.: Maintaining Knowledge about Temporal Intervals. Communications of the ACM, Vol. 26(11) (1983) 832-843
10. Allen, J., Hayes, P.: Moments and Points in an Interval-Based Temporal Logic. Computational Intelligence, Vol. 5 (1989) 225-238
11. Ma, J., Hayes, P.: Primitive Intervals vs Point-Based Intervals: Rivals or Allies? The Computer Journal, Vol. 49(1) (2006) 32-41
12. Ma, J., Knight, B.: A General Temporal Theory. The Computer Journal, Vol. 37(2) (1994) 114-123
13. Ma, J., Knight, B.: Representing the Dividing Instant. The Computer Journal, Vol. 46(2) (2003) 213-222
14. Shoham, Y.: Temporal Logics in AI: Semantical and Ontological Considerations. Artificial Intelligence, Vol. 33 (1987) 89-104
15. Shanahan, M.: A Circumscriptive Calculus of Events. Artificial Intelligence, Vol. 77 (1995) 249-284
16. Knight, B., Ma, J.: A General Temporal Model Supporting Duration Reasoning. Artificial Intelligence Communications, Vol. 5(2) (1992) 75-84
17. Chung, F.: Spectral Graph Theory. CBMS Series 92, American Mathematical Society, Providence, RI (1997)
18. Umeyama, S.: An Eigendecomposition Approach to Weighted Graph Matching Problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 10(5) (1988) 695-703
Moment Invariant Based Control System Using Hand Gestures
P. Premaratne1, F. Safaei1, and Q. Nguyen2
1 School of Electrical, Computer & Telecommunications Engineering, The University of Wollongong, North Wollongong, 2522, NSW, Australia
2 Research School of Information Sciences and Engineering, Australian National University, Canberra, ACT, Australia
[email protected]
Abstract. Almost all consumer electronic equipment today uses remote controls for its user interface. However, the variety of physical shapes and functional commands among remote controls raises numerous problems: the difficulty of locating the wanted remote control, confusion over button layouts, replacement issues, etc. A consumer electronics control system using hand gestures is an innovative user interface that resolves the complications of using numerous remote controls for domestic appliances. Based on one unified set of hand gestures, this system interprets the user's hand gestures as pre-defined commands to control one or many devices simultaneously. The system has been tested and verified under both incandescent and fluorescent lighting conditions. The experimental results are very encouraging, as the system produces real-time responses and highly accurate recognition of the various gestures.
1 Introduction

Human Computer Interaction (HCI) has become an increasingly important part of our lives due to the massive infusion of technology into our lifestyles. Whether it is our lounge room, bedroom or office, there can be a number of electronic devices that need commands to perform valuable tasks. It could be the television set, the VCR or the settop box waiting for our command to provide us with music or perhaps news, and the command may reach them with the push of a button on a remote controller or a keyboard. People have long tried to replace these devices using voice recognition or glove-based devices [1, 2, 3, 4, 5], with mixed results. Glove-based devices are tethered to the main processor with cables, which restricts the user's natural ability to communicate. Many of these approaches have been implemented to focus on a single aspect of gestures, such as hand tracking, hand posture estimation, or hand pose classification, using uniquely colored gloves or markers on hands and fingers [6, 7, 8, 9, 10, 11, 12, 13].
Our research is distinguished from previous attempts by a few marked differences:
1. A minimum number of gestures is used, offering higher accuracy with less confusion.
2. Only low processing power is required to process the gestures, making the system suitable for simple consumer control devices.
3. It is very robust to lighting variations.
4. It operates in real time.
The decision to develop a limited set of distinctive hand gestures has improved the processing accuracy of captured gestures while requiring less computing power. It also allows a less sophisticated neural-network classification system that does not need much processing power to work in real time. The system has been thoroughly tested under both incandescent and fluorescent lighting to simulate home environments. It also incorporates overlaid text feedback to stop the system from responding to unintentional hand movements. In section 2, we discuss the hardware implementation that realizes this system, incorporating the camera, the hand gesture processing unit and the parallel-port-controlled universal remote control interface. Section 3 is about gesture registration, followed by feature extraction in section 4. Section 5 discusses the neural network implementation of the classification system and the experimental results, and the conclusions are drawn in section 6.
2 Hardware Design Overview

The system initially captures the user's hand gestures via a webcam. Real-time image data is then processed using the MATLAB software package. Once the user's hand gesture matches a pre-defined command, the command is issued to the corresponding remote control via the parallel port. Fig. 1 illustrates the system in operation. The primary objective of the hardware design process is to construct interface circuitry for the parallel port to manipulate the output of the remote control. The operation of the interface circuit is summarized in the block diagram shown in Fig. 2.
Fig. 1. System Overview
Fig. 2. Hardware Module Block Diagram
3 Gesture Registration

The captured hand gestures from a real-time video stream need to be processed before they can be interpreted by a computer. It is extremely important that the captured image is registered as a hand gesture using skin segmentation after removing the background of the image. The skin segmentation technique used in this research involves converting the image from RGB format to YCbCr format [14]. A threshold filter is then applied to remove 'non-skin' components. The major advantage of this approach is that the influence of luminosity can be removed during the conversion process, which makes the segmentation less dependent on the lighting conditions, a perennial obstacle in image recognition. The threshold values were obtained using our own data set.
Fig. 3. RGB Format (left) versus YCbCr Format (right)
Having taken a number of sample images, a random one is picked to find the characteristics of the pixels representing the skin region. The pixels' properties are categorized to determine the threshold limits for the filter. The converted picture in YCbCr format is then viewed in the imtool of MATLAB so that every pixel and its
associated values, such as x, y coordinates and intensity, can be determined accurately. A number of sample points representing skin patches and non-skin patches are obtained. Fig. 3 highlights the color variation under the different color formats. According to Fig. 4, both the 'skin' pixels and the 'non-skin' pixels have luminosity values spreading over the full range, i.e. the Y component of the skin patch runs from 110 to 165 whereas that of the non-skin patch runs from 16 to 208. This property of the Y component implies that we are not filtering on the luminance of the image, but on the two remaining components, Cb and Cr. For better visualization, the Cb and Cr values of both skin and non-skin patches are plotted in a graph to find the region in which they are likely to fall. As can be seen from Fig. 4, the 'skin' pixels clearly distinguish themselves from the 'non-skin' ones, and the thresholds of the filter can hence be determined easily. A similar approach is repeated on another image under fluorescent lighting; the threshold values, as expected, are quite different, as depicted in Table 1.
Fig. 4. Correlation between Cr and Cb for skin patch and non-skin patch pixels

Table 1. Filter Thresholds of Different Lighting Conditions

Incandescent:  100 ≤ Cb ≤ 122,  132 ≤ Cr ≤ 150
Fluorescent:   114 ≤ Cb ≤ 128,  140 ≤ Cr ≤ 158
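A sketch of this segmentation step is given below; it is our own OpenCV rendering of the method (the paper's implementation is in MATLAB), with the incandescent thresholds from Table 1 as defaults.

import cv2
import numpy as np

def skin_mask(bgr_image, cb_range=(100, 122), cr_range=(132, 150)):
    # OpenCV converts to YCrCb channel order; Y is ignored, which makes the
    # filter largely invariant to luminance, as discussed above.
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    _, cr, cb = cv2.split(ycrcb)
    mask = ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))
    return mask.astype(np.uint8) * 255

# e.g. mask = skin_mask(cv2.imread('frame.png'))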
A number of images are used to evaluate the effectiveness of the segmentation. Both the original image and the skin-segmented image are used to observe and verify the accuracy of the segmentation algorithm, and images under low lighting conditions are also tested. It is quite apparent from Fig. 5 that there are always a number of noisy spots in the filtered images, regardless of the lighting condition. This distortion, as expected, becomes more pronounced in low lighting conditions. As a result, the skin-segmented image is noisy and distorted and would likely cause incorrect recognition at the subsequent stages. These distortions, however, can be removed during the gesture normalization stage.
Fig. 5. Skin Segmentation
3.1 Gesture Normalization

Gesture normalization is done by the well-known morphological filtering technique of erosion combined with dilation [15]. The output of this stage is a smooth region of the hand figure, which is stored as a logical bitmap image, as shown below in Fig. 6.
Fig. 6. Gesture Normalization
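This stage could be rendered as follows (our own OpenCV sketch; the 5×5 kernel size is an assumption):

import cv2
import numpy as np

def normalize_gesture(mask):
    # erosion removes the isolated noisy spots left by skin segmentation;
    # the subsequent dilation restores the extent of the hand region
    kernel = np.ones((5, 5), np.uint8)
    return cv2.dilate(cv2.erode(mask, kernel), kernel)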
The experiments are carried out on an average computer with a 1.6 GHz processor and 256 MB of RAM. This is mainly to determine whether the system could operate from a settop box with limited processing power. The observed execution time of 0.2 seconds is acceptable, as it consumes only one fifth of the available processing time (1 second). Shorter execution times can be obtained on a computer with a better specification. Above all, when the system is implemented on a single integrated circuit (IC), hardware-based processing will be swifter.
4 Feature Extraction

It is straightforward to see that effective real-time classification cannot be achieved by approaches such as template matching [16]. Template matching itself
is very much prone to error when a user cannot exactly reproduce a hand gesture that is already stored in the library. It also fails under scaling, since the distance to the camera may produce a scaled version of the gesture. The gesture variations due to rotation, scaling and translation can be circumvented by using a set of features that are invariant to these operations. Moment invariants offer a set of features that encapsulate these properties.

4.1 Moment Invariants

The moment invariants algorithm is known as one of the most effective methods for extracting descriptive features for object recognition applications. The algorithm has been widely applied in the classification of aircraft, ships, ground targets, etc. [17]. Essentially, the algorithm derives a number of self-characteristic properties from a binary image of an object; these properties are invariant to rotation, scale and translation. Let f(i, j) be a point of a digital image of size M×N (i = 1, 2, …, M and j = 1, 2, …, N). The two-dimensional moments and central moments of order (p + q) of f(i, j) are defined as:

m_pq = Σ_{i=1}^{M} Σ_{j=1}^{N} i^p j^q f(i, j) .  (1)

U_pq = Σ_{i=1}^{M} Σ_{j=1}^{N} (i − ī)^p (j − j̄)^q f(i, j) ,  (2)

where ī = m_10/m_00 and j̄ = m_01/m_00. From the second-order and third-order moments, a set of seven moment invariants is derived as follows [18]:

φ1 = η20 + η02  (3)
φ2 = (η20 − η02)² + 4η11²  (4)
φ3 = (η30 − 3η12)² + (3η21 − η03)²  (5)
φ4 = (η30 + η12)² + (η21 + η03)²  (6)
φ5 = (η30 − 3η12)(η30 + η12)[(η30 + η12)² − 3(η21 + η03)²] + (3η21 − η03)(η21 + η03)[3(η30 + η12)² − (η21 + η03)²]  (7)
φ6 = (η20 − η02)[(η30 + η12)² − (η21 + η03)²] + 4η11(η30 + η12)(η21 + η03)  (8)
φ7 = (3η21 − η03)(η30 + η12)[(η30 + η12)² − 3(η21 + η03)²] − (η30 − 3η12)(η21 + η03)[3(η30 + η12)² − (η21 + η03)²]  (9)

where η_pq is the normalized central moment defined by η_pq = U_pq / U_00^r, with r = (p + q)/2 + 1 and p + q = 2, 3, …
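In practice, equations (1)-(9) need not be coded by hand; OpenCV's moment routines compute them directly, as in this sketch (our own illustration, consistent with the classifier's use of only the first four invariants):

import cv2

def gesture_features(mask):
    # mask: binary image of the normalized hand gesture
    m = cv2.moments(mask, binaryImage=True)
    phi = cv2.HuMoments(m).flatten()   # the seven invariants phi1..phi7
    return phi[:4]                     # feature vector: phi1..phi4 only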
4.2 Example of Invariant Properties

Fig. 7 shows images containing the letter 'A', together with rotated and scaled, translated and noisy versions of the letter. Their respective moment invariants, calculated using Equations (1) to (9), are shown in Table 2.
Fig. 7. Letter 'A' in Different Orientations

Table 2. Moment Invariants of the Different Orientations of Letter 'A'

      A1 (normal)   A2 (rotated, scaled)   A3 (translated)   A4 (noisy)
φ1    0.2165        0.2165                 0.204             0.25153
φ2    0.001936      0.001936               0.001936          0.002161
φ3    3.69E-05      3.69E-05               3.69E-05          0.004549
φ4    1.64E-05      1.64E-05               1.64E-05          0.002358
φ5    -4.03E-10     -4.03E-10              -4.03E-10         7.59E-06
φ6    7.21E-07      7.21E-07               7.21E-07          7.11E-05
φ7    0             0                      0                 1.43E-06
It is obvious from Table 2 that the algorithm produces the same result for the first three letters 'A' despite the different transformations applied to them. Only one value, φ1, displays a small discrepancy of 5.7%, due to the difference in scale; the values of φ2 through φ7 are effectively the same for the three figures. The last letter, however, reveals the drawback of the algorithm: it is susceptible to noise. Specifically, the added noisy spot in the letter has changed the
entire set of moment invariants. This drawback suggests that moment invariants should only be applied to noise-free images in order to achieve the best results. Since the algorithm is firmly robust to the transformations, a simple classifier can exploit the moment invariant values to differentiate and recognize the letter 'A' from other letters, such as the letter 'L'.

4.3 Application

The example has shown that moment invariants can be used for object recognition applications, since they are rigidly invariant to scale, rotation and translation. The following points summarize the advantages of the moment invariants algorithm for gesture classification.
• For each specific gesture, the moment invariants always give a specific set of values. These values can be used to classify the gesture from a sample set; the set of chosen gestures has a set of unique moments.
• Moment invariants are invariant to translation, scaling and rotation. Therefore, the user can issue commands regardless of the orientation of his or her hand.
• The algorithm is susceptible to noise. Most of this noise, however, is filtered out at the gesture normalization stage.
• The algorithm is moderately easy to implement and requires only an insignificant computational effort from the CPU. Feature extraction, as a result, can proceed rapidly and efficiently.
• The first four moments, φ1, φ2, φ3, and φ4, are adequate to represent a gesture uniquely and hence result in a simple feature vector with only four values.
5 Gesture Classification

Having accomplished all the above stages, we have successfully extracted a data set from an image of a user hand gesture. However, this data set remains meaningless unless the program can interpret it as a preset command to control the electronic device. Despite the special properties of the moment invariants algorithm with respect to gesture variations, there are still unwanted errors in the resultant data set. The classification process can therefore be done either by a nearest-neighbor classifier or via a neural network. On the one hand, both methods require a training set containing a number of sample images. After the feature extraction stage, each group of sample images representing the same gesture produces a certain range of φ1, φ2, φ3, and φ4. These ranges are then used as preset values to classify a random input image. The procedure implies that the more samples we have, the better the classification becomes. On the other hand, the nearest-neighbor classifier is more computationally intensive than the neural network. In particular, the former approach involves calculating the distance from a new point to all of the points in the sample set. The
value of the new point is then rounded to that of the sample point which produces the minimum distance. Therefore, the more values in the sample set, the longer it takes to compute and determine the output, especially when the system complexity is elevated. A statistical approach can be applied to determine a small number of prototypes out of a sample set. However, this is extremely time-consuming to analyse and therefore not practically feasible for real-time operation.

The neural network classifier, on the other hand, proves to be more effective and more efficient. Neural networks have been applied to perform complex functions in numerous applications, including pattern recognition, classification and identification. Once implemented, a neural network computes its output significantly quicker than the nearest-neighbor classifier. It also encompasses the ability to learn and to predict over time. This property enables the system to be viewed more as a human-like entity that can actually understand the user, which is also one of the major objectives of our research.

5.1 The Proposed Neural Network Design

The designed neural network is a backpropagation network, in which input vectors (the sample set of user hand gestures) and the corresponding target vectors (the command set) are used to train the network until it approximates a function between the input and the output. There are a number of neuron layers between the input and the output, and each layer contains a number of nodes whose properties are characterized by a transfer function such as the log-sigmoid, tan-sigmoid or linear transfer function. In this particular design, there are only three layers due to the limited number of hand gestures to be classified; a more complex network could be designed and implemented, but this is neither practical nor necessary for our research.

The first layer, i.e. the input layer, contains four nodes simulating the tansig transfer function f(x) = 2/(1 + e^(−2x)) − 1. The second layer, i.e. the hidden layer, contains three nodes and simulates the purelin transfer function f(x) = x; it is called the hidden layer because it is not directly connected to either the input or the output of the network. The last layer, i.e. the output layer, consists of ten nodes simulating the logsig transfer function f(x) = 1/(1 + e^(−x)).

The input is the set of moment invariant values derived from the sample set of hand gestures. The output is the target set of commands corresponding to each gesture. For better visualization, the network can be illustrated as follows (Fig. 8):
Fig. 8. Structure of the Neural Network Classifier
In Fig. 8, W represents the weighting function, in which each input is weighted with an appropriate w; b represents the bias coefficient, which is set to 1 in this design. The sum of the weighted inputs and the bias forms the input to the transfer function f: here f = tansig for the input layer, f = purelin for the hidden layer and f = logsig for the output layer. After the network is initialized, 300 sample values of the seven gestures and the 300 corresponding outputs are used to train the network. The iteration limit is set to 500 and the mean square error (MSE) goal to 0.05. These limits ensure that the network has a sufficient set of training data to develop an accurate transfer function between the input and the output. Furthermore, the network is trained twice to improve the accuracy and precision of the transfer function. If the network fails the training, i.e. it exceeds the maximum number of iterations without meeting the preset MSE goal, we simply initialize a new network and restart the training session. The output of the network is a 10×1 vector named label, in which there is exactly one 1 and the rest are 0's. The index of the 1 gives the command as which the input gesture is interpreted. For instance, label = [0 0 1 0 0 0 0 0 0 0] indicates command number 3, and label = [0 0 0 0 1 0 0 0 0 0] indicates command number 5. These commands are then transferred to the interface circuit to control the device.
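A minimal sketch of the described 4-3-10 forward pass is shown below; it is our own NumPy rendering of the MATLAB design, and the trained weight matrices W1, W2, W3 are placeholders that would come from backpropagation training.

import numpy as np

def tansig(x):
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def logsig(x):
    return 1.0 / (1.0 + np.exp(-x))

def classify(phi, W1, W2, W3, b=1.0):
    # phi: 4-vector of moment invariants (phi1..phi4); bias b = 1 as in the design
    a1 = tansig(W1 @ phi + b)      # input layer, 4 nodes (tansig)
    a2 = W2 @ a1 + b               # hidden layer, 3 nodes (purelin)
    a3 = logsig(W3 @ a2 + b)       # output layer, 10 nodes (logsig)
    label = np.zeros(10, dtype=int)
    label[int(np.argmax(a3))] = 1  # one-hot command vector, e.g. command 3
    return label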
5.2 Test Results

The experiments were carried out on a computer featuring a 1.6 GHz processor with 256 MB of RAM running MATLAB 7.01 SP1. A software program, RCS, was written to display feedback to the user as well as to show the command being executed when the hardware is controlled. A Panasonic CRT television and a VCR were used for the experiments. Currently the system needs a few seconds to analyse the user's hand in order to determine and store the threshold values for skin segmentation. The first gesture needed to initialize the hardware is Start, followed by the Power-On gesture; this can be followed by VCR or TV selection. Even though we used only two consumer electronics devices, any number of devices can be controlled. Commands can be issued in any order; however, if they are not issued in a logical sequence, no action is taken. For instance, if the Up or Down command is issued prior to Volume or Channel, the command is recognised but no action is taken. The system was observed to be 100% accurate under normal lighting conditions for both fluorescent and incandescent lights. The tests have firmly validated the hardware design and the software interface of the developed prototype. The hardware module responds very quickly to the outputs of the parallel port and delivers correct commands to the remote control.
6 Conclusions and Future Work

The system is developed to reject unintentional and erratic hand gestures (such as children's random movements) and to supply visual feedback on the gestures registered. In our research, we have devised a set of gestures that are distinct from each other yet easy for the system to recognize. This set has four unique invariant moments, which results in highly accurate, real-time classification. The accuracy of the control system was 100%, mainly due to the limited number of hand gestures. This set of hand gestures is adequate for any consumer electronics control system. The software interface provides a flexible key mapping ability, such that the Volume gesture in TV mode can be mapped to the Speed function of a ceiling fan. In future, we expect to utilize an IR camera to address poor lighting conditions. The system is currently ready to be implemented on dedicated hardware such as a digital TV settop box.
References
1. Baudel, T., Beaudouin-Lafon, M.: Charade: Remote Control of Objects Using Free-Hand Gestures. Comm. ACM, 36(7) (1993) 28-35
2. Fels, S. S., Hinton, G. E.: Glove-Talk: A Neural Network Interface between a Data-Glove and a Speech Synthesizer. IEEE Trans. Neural Networks, 4 (1993) 2-8
3. Quam, D. L.: Gesture Recognition with a DataGlove. Proc. 1990 IEEE National Aerospace and Electronics Conf., 2 (1990) 755-760
4. Sturman, D. J., Zeltzer, D.: A Survey of Glove-Based Input. IEEE Computer Graphics and Applications, 14 (1994) 30-39
5. Wang, C., Cannon, D. J.: A Virtual End-Effector Pointing System in Point-and-Direct Robotics for Inspection of Surface Flaws Using a Neural Network-Based Skeleton Transform. Proc. IEEE Int'l Conf. Robotics and Automation, 3 (1993) 784-789
6. Cipolla, R., Okamoto, Y., Kuno, Y.: Robust Structure from Motion Using Motion Parallax. Proc. IEEE Int'l Conf. Computer Vision (1993) 374-382
7. Davis, J., Shah, M.: Recognizing Hand Gestures. Proc. European Conf. Computer Vision, Stockholm (1994) 331-340
8. Kuno, Y., Sakamoto, M., Sakata, K., Shirai, Y.: Vision-Based Human Computer Interface with User Centered Frame. Proc. IROS'94 (1994)
9. Lee, J., Kunii, T. L.: Model-Based Analysis of Hand Posture. IEEE Computer Graphics and Applications (1995) 77-86
10. Maggioni, C.: A Novel Gestural Input Device for Virtual Reality. 1993 IEEE Annual Virtual Reality Int'l Symp. (1993) 118-124
11. Lee, L. K., Ki, S., Choi, Y., Lee, M. H.: Recognition of Hand Gesture to Human-Computer Interaction. IEEE 26th Annual Conf., 3 (2000) 2117-2122
12. Hasanuzzaman, Md., Zhang, T., Ampornaramveth, V., Kiatisevi, P., Shirai, Y., Ueno, H.: Gesture-Based Human-Robot Interaction Using a Frame-Based Software Platform. IEEE International Conference on Systems, Man and Cybernetics, 3 (2004) 2883-2888
13. Zobl, M., Geiger, M., Schuller, B., Lang, M., Rigoll, G.: A Real-Time System for Hand Gesture Controlled Operation of In-Car Devices. Proc. Int. Conf. Multimedia and Expo, 3 (2003) 541-544
14. Marius, D., Pennathur, S., Rose, K.: Face Detection Using Colour Thresholding and Eigenimage Template Matching. EE368: Digital Image Processing Project, http://www.stanford.edu/class/ee368/Project_03/Project/reports/ee368group15.pdf
15. Gonzalez, R. C., Woods, R. E., Eddins, S. L.: Digital Image Processing Using MATLAB. Pearson Prentice Hall, New Jersey (2004)
16. Shan, C., Wei, Y., Qiu, X., Tan, T.: Gesture Recognition Using Temporal Template Based Trajectories. Proc. of the 17th Int. Conf. Pattern Recognition, 3 (2004) 954-957
17. Zhongliang, Q., Wenjun, W.: Automatic Ship Classification by Superstructure Moment Invariants and Two-Stage Classifier. ICCS/ISITA '92, Communications on the Move (1992) 544-547
Multiple-ROI Image Coding Method Using Maxshift over Low-Bandwidth
Kang Soo You1, Han Jeong Lee2, and Hoon Sung Kwak2
1 School of Liberal Arts, Jeonju University, Jeonju, 560-759, Korea [email protected]
2 Department of Image Engineering, Chonbuk National University, Jeonju, 561-756, Korea {sosim, hskwak}@chonbuk.ac.kr
Abstract. In this paper, we propose an enhanced ROI (Region of Interest) image compression algorithm for transmission over communication links with low bandwidth. The proposed algorithm compresses images that include multiple ROIs using the Maxshift method of Part 1 of JPEG2000. We evaluate the performance of the proposed method at different bit rates. Simulation results show that the proposed method improves the PSNR versus compression ratio performance over the Maxshift method.
1 Introduction
In general, by exploiting the redundancy of image signals, many worldwide standardization organizations have put effort into standardizing techniques for compressing image signals, in order to reduce communication bandwidth and storage space while keeping the visual quality of the original image [1]. The recent development of the multimedia information era enables images to be used in various applications, and JPEG2000 has continued to pursue the development of new algorithms for better compression of 2-dimensional still images [2]. ROI coding using the Maxshift method in Part 1 of the JPEG2000 standard is used for applications where the user's area of interest is required to be transmitted first, before reconstructing the entire image, guaranteeing its visual quality [3] [4]. This method can show the user the ROI of the given image first, because of the consecutive transmission enabled by the embedded coding of the coefficients in the ROI. Also, since it uses lossless compression, ROI coding provides a much better image quality than in the non-ROI area. The user may request more than one ROI as the size of the image increases, but the Maxshift method does not support coding of images with multiple ROIs. Therefore, this paper proposes an algorithm to support multiple-ROI coding based on the scaling method of the existing Maxshift method. This paper is organized as follows. Section 2 introduces the existing Maxshift method and Section 3 the proposed multiple-ROI coding method. The performance evaluation is given in Section 4, and Section 5 concludes.
2 The Maxshift Method in JPEG2000

2.1 What Is ROI?
In the 1990s, papers on progressive image transmission using wavelets increased interest in the multi-resolution analysis of images. The development of communication media such as the Internet has led users to seek different interests, and progressive image coding based on multi-resolution analysis can satisfy these various wants while also supporting ROI coding on top of progressive transmission. An image with a ROI can be transmitted progressively with a user-specified area of interest, so the compression rate and transmission time perform better than with simple progressive transmission methods. Furthermore, it supports the user's various needs, such as efficient memory management. Part 1 of JPEG2000 recommends the Maxshift method for ROI coding [5] [6], which is covered in the following section.

2.2 ROI Coding Method: Maxshift
The Maxshift method first finds the largest coefficient in the background area. Then, it places the interest area on bit-planes higher than the largest coefficient of the background area. This encoding method keeps the coding efficiency without requiring a large number of bits, because it only needs the value of the largest background coefficient instead of information on the ROI shape itself. There is one disadvantage of this encoding method, though: the coefficients of the background area cannot be obtained until the interest area is decoded, and only a single interest area is supported. In such ROI coding, a ROI mask needs to be generated first to select the interest area. This ROI mask should be in the wavelet domain, since the image to be coded is wavelet-transformed. The ROI mask transformation is given below:
M(x, y) = 1, inside the ROI;  M(x, y) = 0, in the background.  (1)

Here, M(x, y) indicates each pixel coordinate of the input image: 1 represents a coordinate within the ROI and 0 one in the background. Inside the encoder, the coefficients inside the ROI are scaled so that they lie on higher bit-planes than those outside the ROI. The scaling value s used in this process is given by

s ≥ max(Mb) .  (2)

Mb in equation (2) denotes the largest coefficient in the background area after quantization in each sub-band. This also means that, after scaling, all coefficients
in the ROI are bigger than those in the non-ROI area; therefore all ROI coefficients are located on higher bit-planes than those in the background area. Figure 1 illustrates how the ROI coefficients are scaled up by the scaling value s obtained from the biggest background coefficient. This ensures that the smallest non-zero ROI coefficient is still larger than the largest background coefficient, as shown in Figure 1.
Fig. 1. Scaling of ROI Coefficients
Fig. 2. Selecting Scaling Value s in Maxshift
Figure 2 illustrates how the biggest background coefficient is found to obtain the scaling value s. ROI coding using the Maxshift method supports ROIs of any arbitrary shape without a need to send information about the ROI shape: when reconstructing the image, the decoder can perform the scale-down without any shape information, since it recognizes that the coefficients lying on bit-planes above the scaling value s have been scaled up.
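The shift and de-shift can be illustrated on quantized coefficient magnitudes as below. This is a toy sketch of our own: a real JPEG2000 codec applies the shift per sub-band within the embedded coder, and choosing s from the bit length of the largest background magnitude is one way of satisfying the condition of equation (2).

import numpy as np

def maxshift_encode(mag, roi_mask):
    # mag: non-negative quantized coefficient magnitudes (integer array);
    # choose s so that every shifted non-zero ROI value exceeds the largest
    # background value
    s = int(mag[~roi_mask].max()).bit_length()
    out = mag.copy()
    out[roi_mask] = out[roi_mask] << s
    return out, s

def maxshift_decode(mag, s):
    # no shape information needed: values on bit-planes >= s belong to the ROI
    roi = mag >= (1 << s)
    out = mag.copy()
    out[roi] = out[roi] >> s
    return out, roi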
3 Proposed Multiple-ROI Coding Method
Let us consider an example of two ROIs when using the Maxshift algorithm. When there are two ROIs, (A) and (B), as shown in Figure 3, the standard ROI algorithm needs to specify (C), which includes both (A) and (B), to generate the mask value for both ROIs, as if there were one large ROI. Each of the ROIs (A) and (B) covers only 10%-20% of the entire image, so coding them as ROIs is worthwhile. However, (C) is not as effective, since its area occupies almost 60% of the image. In an extreme case, when the ROIs are distributed at both the beginning and the end of the image, the enclosing ROI covers the entire image, and there is no real difference from simply transmitting the entire image.
Fig. 3. Two ROIs
Therefore, this paper processes ROIs (A) and (B) separately using the properties of the coordinate system, and seeks a more efficient image coding for ROIs with different priorities. To select masks for both ROIs, (A) is selected in the conventional way. The location and size of (B) are obtained from the coordinates of the start and the end of area (A). Figure 4 shows the location of the start and the end of ROI (B), derived from the x- and y-coordinates of ROI (A). The mask transformation given in equation (1), as used in the Maxshift method in Part 1 of JPEG2000, performs the ROI mask coding for both ROIs (A) and (B). After generating the mask, wavelet transformation and quantization are performed using it. Then the ROI is distinguished from the background area. The biggest coefficient value in the background area is calculated to distinguish the areas, as given in equation (2). The variable s in equation (2) is used to scale up the first ROI using the Maxshift method. This s value is called the scaling variable and is used to distinguish the ROI from the background region. Equation (3) is used for finding the
Fig. 4. The Proposed multiple-ROI Masks
biggest value in the first ROI Mr1, in order to determine the priority between the two ROIs this paper proposes:

p = max(Mr1) .                                              (3)
The value p bounds s1, which indicates the priority. In other words, the value of s1 needs to be smaller than or equal to the maximum p of Mr1, and greater than or equal to 0, so that the second ROI can have a higher priority. s1, which is used for scaling up the second ROI, can be 0; this is the case when the two ROIs have the same priority. Equation (4) defines the values on the bit-planes, where 0 ≤ s1 ≤ max(Mr1):

s = s ,       if M(x, y) ∈ Mr1(x, y)
s = s + s1 ,  if M(x, y) ∈ Mr2(x, y)                        (4)
Mr1 means the mask for ROI (A) and Mr2 the mask for ROI (B). s is derived from the largest value in the background region, and s1 is the variable that distinguishes the priority of Mr1 and Mr2. If vb[n] denotes the coefficients' values in the background region after quantization, the values in ROI Mr1(x, y) become vb[n] + s and those in ROI Mr2(x, y) become vb[n] + s + s1. Figure 5(a) shows the coefficients prior to the Maxshift method, with the coefficients of the two ROIs in the gray area. The coefficients after scaling up using multiple-ROI are shown in Figure 5(b). The coefficients, after ROI scaling and quantization, are compressed code-block by code-block using the EBCOT algorithm.
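The two-level shift of equation (4) can be sketched in the same style. This is a hypothetical illustration, assuming that the "+ s" of equation (4) denotes an upward shift of s bit-planes, consistent with Maxshift; the mask and variable names are ours.

```python
import numpy as np

def multi_roi_shift(vb, mask_r1, mask_r2, s, s1):
    """Apply the two-level shift of equation (4).

    vb:      quantized coefficients vb[n] (non-negative integers)
    mask_r1: boolean mask of the first ROI  (Mr1, ROI (A))
    mask_r2: boolean mask of the second ROI (Mr2, ROI (B))
    s:       scaling value separating ROI from background
    s1:      extra shift giving Mr2 priority over Mr1 (0 = equal priority)
    """
    out = vb.copy()
    out[mask_r1] <<= s            # first ROI: s bit-planes up
    out[mask_r2] <<= (s + s1)     # second ROI gets the higher priority
    return out
```

With s1 = 0 the two ROIs occupy the same bit-plane range, i.e. equal priority, exactly as stated above.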
Fig. 5. The bit-planes for the proposed method: (a) coefficients prior to the Maxshift method, with the two ROIs in gray; (b) coefficients after scaling up using multiple-ROI
As can be seen from the bit-planes where the multiple-ROI coefficients lie (Figure 5(b)), the proposed algorithm does not require the shape information for encoding, just like the standard ROI algorithm. Nonetheless, there is the disadvantage of coding non-ROI coefficients as well: handling the priority between the two ROIs may require more bits. In other words, the overall coding overhead increases because the non-ROI coefficients also have to be coded. In an extreme scenario, scaling up to the maximum of Mr1 may increase the transmission overhead. Therefore, it is recommended to keep s1 well below the value of the largest coefficient in Mr1, minimizing the transmission overhead while still specifying the priority between the two ROIs. Simulation suggests that a value of s1 of about 10% to 20% of the largest coefficient in Mr1 is efficient.
4 Performance Evaluation
512×512 Lena and Barbara images are used for simulation. This paper uses RMSE and PSNR for the performance evaluation of the proposed multiple-ROI
Fig. 6. PSNR Comparison with 0.065 bpp Lena
Fig. 7. PSNR Comparison with 0.125 bpp Lena
Table 1. RMSE and PSNR (dB) comparison at 0.065 bpp, Barbara

                              Wavelet layer
Method              1           2          3          4          5
RMSE  Maxshift      2800.90039  612.39331  394.49902  330.47784  331.21633
RMSE  multiple-ROI  2591.00977  590.07599  396.25919  330.63394  331.21633
PSNR  Maxshift      13.65782    20.26050   22.17034   22.93938   22.92967
PSNR  multiple-ROI  13.99611    20.42172   22.15101   22.93733   22.92967
Table 2. RMSE and PSNR (dB) comparison at 0.125 bpp, Barbara

                              Wavelet layer
Method              1           2          3          4          5
RMSE  Maxshift      2800.52490  610.44824  386.34168  292.41498  259.75247
RMSE  multiple-ROI  2583.78760  579.49390  373.02985  282.55164  258.09610
PSNR  Maxshift      13.65840    20.27432   22.26109   23.4709    23.98521
PSNR  multiple-ROI  14.00824    20.50032   22.41337   23.61983   24.01299
method in comparison with the Maxshift method under fixed transmission rates of 0.065 bpp and 0.125 bpp with various wavelet layers. Figures 6 and 7 show the PSNR values for the Lena image at 0.065 bpp and 0.125 bpp respectively. The RMSE and PSNR values for the Barbara image are given in Tables 1 and 2. The reconstructed Lena image at 0.125 bpp is shown in Figures 8 and 9. The simulation shows that the proposed method improves performance by 0.0 to 0.4 dB for layers 1 to 4 under the low transmission rates of 0.065 bpp and 0.125 bpp. When all the layers were transmitted, as seen in the last images of Figures 8 and 9, the results were the same.
Fig. 8. Reconstruction of Lena image (0.125 bpp) with the Maxshift method
Fig. 9. Reconstruction of Lena image (0.125 bpp) with the proposed method
5 Conclusions
This paper proposes a multiple-ROI coding method to support one or more regions of interest, using the Maxshift method of Part 1 of JPEG2000. Standard
testing images such as Lena and Barbara are used for simulation and performance evaluation. The proposed technique shows the following characteristics:
1. It retains the characteristics of the Maxshift method.
2. Unlike Maxshift, it can support the multiple ROIs that a user requires.
3. It exhibits better performance than the existing method at lower transmission rates.
This coding method can be utilized in different areas, such as biomedical images requiring a particular image area without any loss, applications that require fast transmission of a particular image area, or communication media with low bandwidth.
References
1. Pennebaker, W.B., Mitchell, J.L.: JPEG: Still Image Data Compression Standard. Van Nostrand Reinhold, New York (1993)
2. Rabbani, M., Joshi, R.: An Overview of the JPEG 2000 Still Image Compression Standard. Signal Processing: Image Communication Vol. 17 (2002) 3–48
3. Taubman, D.S., Marcellin, M.W.: JPEG2000 Image Compression Fundamentals, Standards and Practice (2002)
4. Christopoulos, C., Askelof, J., Larsson, M.: Efficient Methods for Encoding Regions of Interest in the Upcoming JPEG 2000 Still Image Coding Standard. IEEE Signal Processing Letters Vol. 7(9) (2000) 247–249
5. Askelof, J., Larsson Carlander, M., Christopoulos, C.: Region of Interest Coding in JPEG2000. Signal Processing: Image Communication Vol. 17 (2002) 105–111
6. ISO/IEC JTC1/SC29/WG1 (ITU-T SG8) N1646R: JPEG2000 Part 1 Final Committee Draft Version 1.0 (2000)
Multi-resolution Image Fusion Using AMOPSO-II
Yifeng Niu and Lincheng Shen
School of Mechatronics and Automation, National University of Defense Technology, 410073 Changsha, China
{niuyifeng, lcshen}@nudt.edu.cn
Abstract. Most approaches to multi-resolution image fusion are based on experience, and the fusion results are not optimal. In this paper, a new approach to multi-resolution image fusion based on AMOPSO-II (Adaptive Multi-Objective Particle Swarm Optimization) is presented, which can achieve optimal fusion results by optimizing the fusion parameters. First the uniform model of multi-resolution image fusion in the DWT (Discrete Wavelet Transform) domain is established; then proper evaluation indices of multi-resolution image fusion are given; and finally AMOPSO-II is proposed and used to search for the fusion parameters. AMOPSO-II not only uses an adaptive mutation operator and an adaptive inertia weight to raise the search capacity, but also uses a new crowding operator to improve the distribution of nondominated solutions along the Pareto front, and uses uniform design to obtain the optimal combination of its own parameters. Results show that AMOPSO-II has better exploratory capabilities than AMOPSO-I, and that the approach to multi-resolution image fusion based on AMOPSO-II realizes Pareto optimal multi-resolution image fusion.
1 Introduction
Multi-resolution image fusion techniques merge the spatial information from a high-resolution image with the radiometric information from a low-resolution image to improve the performance of the fused image in information content and reliability of interpretation [1], [2]. The process can also be considered as sharpening of the low-resolution image. A pixel in the low-resolution image contains the information of multiple pixels in the high-resolution image. If the pixel information in the low-resolution image is used to revise that in the high-resolution image in the course of image fusion, the high resolution will be preserved and the uncertainty of interpretation will be reduced. Different methods of multi-resolution image fusion have the same objective, i.e. to acquire a better fusion effect. Different methods have given parameters, and different parameters can result in different fusion effects. In general, we establish the parameters based on experience, or the parameters adaptively change with the image contents, so it is fairly difficult to attain the optimal fusion effect. If one image is regarded as one information dimension, image fusion can be regarded as an optimization problem in
several information dimensions. A better result, even the optimal result, can be acquired by searching for the optimal parameters and discarding the given values in the process of multi-resolution image fusion. Therefore, a proper search strategy is very important for this optimization problem. In [3], an approach to image fusion based on a multi-objective optimization algorithm was explored, which can deal with multiple evaluation indices that may be compatible or incompatible with one another; but this approach required the source images to have the same resolution, which is often untenable, e.g. for remote sensing images. At present, multi-objective optimization algorithms include the Pareto Archived Evolution Strategy (PAES) [4], the Strength Pareto Evolutionary Algorithm (SPEA2) [5], the Nondominated Sorting Genetic Algorithm II (NSGA-II) [6], Non-dominated Sorting Particle Swarm Optimization (NSPSO) [7], Multiple Objective Particle Swarm Optimization (MOPSO) [8], [9], etc. Many experiments with two-objective optimization problems show that MOPSO has better optimization capacity and a higher convergence speed [8]. However, once the number of objectives is greater than three, MOPSO needs too much calculation time and fails to allocate memory, even in integer format. So we presented an adaptive multi-objective particle swarm optimization (AMOPSO-I) in [3], in which the adaptive grid is discarded, and the crowding distance [6], adaptive mutation, and a new adaptive inertia weight are introduced to improve the search capacity. Moreover, AMOPSO-I was applied to optimize the parameters of multi-objective image fusion. But the crowding distance needs too much time and the optimal combination of the parameters is difficult to obtain in AMOPSO-I, so we make an improvement and propose AMOPSO-II. AMOPSO-II adopts a new distance based on the Manhattan distance, and introduces uniform design to obtain the optimal combination of the parameters. In contrast to AMOPSO-I, AMOPSO-II has a higher convergence speed and better exploratory capabilities, and the approach to multi-resolution image fusion based on AMOPSO-II is more successful. The remainder of this paper is organized as follows. The methodology of multi-resolution image fusion in the DWT domain is introduced in Section 2. The evaluation indices of multi-resolution image fusion are given in Section 3. The algorithm of AMOPSO-II is designed in Section 4. The experimental results of multi-resolution image fusion are given in Section 5. Finally, a summary of our studies and future research are given in Section 6.
2 Multi-resolution Image Fusion in DWT Domain
As shown in Fig. 1, the approach to multi-resolution image fusion in the DWT (Discrete Wavelet Transform) domain is as follows. Step 1. Input the source images A and B, where A is the low-resolution image and B is the high-resolution image. Register the low-resolution image A to the same size as the high-resolution image B so that they can be superimposed. Sharpen the edges of the high-resolution image B if B is blurred.
[Source images → Wavelet transform → Fusion optimization → Inverse transform → Fused image]
Fig. 1. Illustration of multi-resolution image fusion in DWT domain
Step 2. Compute the DWT of A and B to the specified number of decomposition levels in order to perform multi-resolution fusion accurately; at each level we will have one approximation subband and 3×Ji detail subbands (i = A or B, JA < JB), where Ji is the decomposition level. When Ji equals 0, the result of the DWT is the original image. Step 3. The details of the different-resolution images in the DWT domain affect the fused image [10]. However, the details of the low-resolution image are not sufficient, so the coefficients of the high-resolution image are used as the fused coefficients:
WFj(x, y) = WBj(x, y)                                       (1)
where WFj(x, y) is the final fused coefficient and WBj is the coefficient of B at level j. Step 4. For the approximations in the DWT domain, we use weighted factors to calculate the approximation of the fused image F. Let CF, CA, and CB be the approximations of F, A, and B respectively; two different fusion rules are adopted. One rule, called the "uniform weight method (UWM)", is given by
CF(x, y) = w1·CA(x, y) + w2·CB(x, y)                        (2)
where the weighted factors w1 and w2 take values in the range [0, 1] and are the decision variables. The other rule, called the "adaptive weight method (AWM)", is given by

CF(x, y) = w1(x, y)·CA(x, y) + w2(x, y)·CB(x, y)            (3)
where w1(x, y) and w2(x, y) are the decision variables. Step 5. Using the multi-objective algorithm AMOPSO-II, we can find the optimal decision variables of multi-resolution image fusion in the DWT domain and realize the optimal multi-resolution fusion. Step 6. The new sets of coefficients are used to compute the inverse transform to obtain the fused image F:

F = Σ_{j=1}^{J} WFj + CF                                    (4)
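As an illustration of Steps 1 to 6, the following sketch implements the UWM rule of equation (2) with the PyWavelets package, under the simplifying assumptions that the two images are already registered to the same size and are decomposed to a single common level (the paper uses different levels JA < JB for A and B); the function name uwm_fuse is ours.

```python
import pywt  # PyWavelets

def uwm_fuse(img_a, img_b, w1, w2, level=3, wavelet='db2'):
    """Fuse two registered images with the uniform weight method (UWM)."""
    ca = pywt.wavedec2(img_a, wavelet, level=level)[0]   # approximation of A
    cb, *details_b = pywt.wavedec2(img_b, wavelet, level=level)
    cf = w1 * ca + w2 * cb                               # equation (2)
    return pywt.waverec2([cf] + details_b, wavelet)      # equations (1), (4)
```

In the optimization loop, AMOPSO-II would propose candidate (w1, w2) pairs, call a routine like this one, and score the result with the evaluation indices of Section 3.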
3 Evaluation Indices of Multi-resolution Image Fusion
In our approach to multi-resolution image fusion, the establishment of an evaluation index system is the basis of the optimization and determines the quality of the final fused image. However, in the image fusion literature only a few indices for the quantitative evaluation of different image fusion methods have been proposed. Generally, the construction of the perfect fused image is an ill-defined problem, since in most cases the optimal combination is unknown in advance. In [3], we explored the possibility of establishing an impersonal evaluation index system and obtained some meaningful results. In fact, the evaluation indices of multi-resolution image fusion include subjective indices and objective indices. The subjective indices rely on people's ability of comprehension and are hard to apply in practice. The objective indices, by contrast, can overcome the influence of human vision, mentality, and knowledge, and let machines automatically select a superior algorithm to accomplish the mission of multi-resolution image fusion. The objective indices can be divided into three categories based on the subjects reflected. One category reflects the features of the fused image itself, such as entropy and gradient. The second reflects the relation of the fused image to the source images, such as mutual information (MI) and information symmetry (IS). The third reflects the relation of the fused image to the reference image (when the reference image exists), such as the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) [11]. The definitions of these indices can also be found in [3].
4 AMOPSO-II for Multi-resolution Image Fusion
Kennedy and Eberhart proposed particle swarm optimization (PSO), inspired by the choreography of a bird flock, in 1995 [12]. Unlike conventional evolutionary algorithms, PSO possesses the following characteristics: 1) each individual (or particle) is given a random velocity and flies through the decision space; 2) each individual has its own memory; 3) the evolution of each individual is composed of cooperation and competition among the particles. Since PSO was proposed, it has received much attention and become a new research field. PSO has shown a high convergence speed in single-objective optimization [13], and it is also particularly suitable for multi-objective optimization [8], [9], [14]. In order to improve the performance of AMOPSO-I in [3], we make an improvement and propose "AMOPSO-II" (adaptive multi-objective particle swarm optimization), in which not only are an adaptive mutation operator and an adaptive inertia weight used to raise the search capacity, but also a new crowding operator based on the Manhattan distance is used to improve the distribution of nondominated solutions along the Pareto front and maintain the population diversity, and uniform design is used to obtain the optimal combination of the algorithm parameters.
4.1 AMOPSO-II Algorithm
The algorithm of AMOPSO-II is the following. Step 1. Initialize the position of each particle: pop[i]=arbitrary, where i=1,…,NP, NP is the particle number; initialize the velocity of each particle: vel[i]=0; initialize
the record of each particle: pbests[i]=pop[i]; evaluate each of the particles in the POP: fun[i, j], where j=1,…,NF, and NF is the objective number; and store the positions that represent nondominated particles in the repository REP according to Pareto optimality. Step 2. Update the velocity of each particle using (5):

vel[i] = W·vel[i] + c1·rand1·(pbests[i] − pop[i]) + c2·rand2·(rep[h] − pop[i])    (5)
where W is the adaptive inertia weight [15]; c1 and c2 are the learning factors [16]; rand1 and rand2 are random values in the range [0, 1]; pbests[i] is the best position that particle i has had; h is the index of the repository member with the maximum crowding distance, which implies that this particle is located in a sparse region, and this choice aims to maintain the population diversity; pop[i] is the current position of particle i. Step 3. Update the new positions of the particles by adding the velocity produced in the previous step:

pop[i] = pop[i] + vel[i]                                    (6)
Step 4. Maintain the particles within the search space in case they go beyond their boundaries (avoid generating solutions that do not lie in the valid search space). Step 5. Adaptively mutate each of the particles in the POP with a probability Pm. Step 6. Evaluate each of the particles in the POP. Step 7. Update the contents of the REP, inserting all the current nondominated positions into the repository. Step 8. Update the record of each particle. When the current position of the particle is better than the position contained in its memory, the latter is updated:

pbests[i] = pop[i]                                          (7)
Step 9. If the maximum cycle number is reached, stop the process and output the Pareto solutions; else go to Step 2.
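A compact sketch of Steps 2 to 4 in Python is given below. It is illustrative only: the selection of the guide rep[h] is replaced by a random pick, whereas the algorithm above chooses the repository member with the maximum crowding distance; the bounds, names, and default values are assumptions.

```python
import numpy as np

def amopso_step(pop, vel, pbests, rep, W, c1=1.0, c2=1.0, bounds=(0.0, 1.0)):
    """One velocity/position update (Steps 2-4), vectorised over particles."""
    n, d = pop.shape
    r1 = np.random.rand(n, d)
    r2 = np.random.rand(n, d)
    h = np.random.randint(len(rep))                 # stand-in for the sparse-region pick
    vel = W * vel + c1 * r1 * (pbests - pop) + c2 * r2 * (rep[h] - pop)  # eq. (5)
    pop = np.clip(pop + vel, bounds[0], bounds[1])  # eq. (6) plus Step 4 clamping
    return pop, vel
```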
4.2 Repository Control
The external repository REP is used to record the nondominated particles of the primary population. At the beginning of the search, the REP is empty. The nondominated vectors found at each iteration are compared with the contents of the REP. If the REP is empty, the current solution is accepted. If the new solution is dominated by an individual within the REP, it is automatically discarded. Otherwise, if none of the elements contained in the REP dominates the solution wishing to enter, the solution is stored in the REP. If there are solutions in the REP that are dominated by a new element, they are removed from the REP. Finally, if the REP has reached its allowed maximum capacity, the new nondominated solution and the contents of the REP are combined into a new population and, according to the objectives, the individuals with lower crowding distances (located in dense regions) do not enter the REP.
4.3 Crowding Distance
In order to improve the distribution of nondominated solutions along the Pareto front, we introduce a concept of crowding distance from NSGA-II [6] that indicates the
population density. When comparing the Pareto optimality of two individuals, the one with the higher crowding distance (located in a sparser region) is considered superior. In [6], the crowding distance is defined as the size of the largest cuboid enclosing point i without including any other point in the population, and it can be computed as the average distance of the two points on either side of point i along each objective. However, this definition has O(m·n·log n) (m = NF, n = NP) computational complexity and may need too much time because of the sorting. Here we propose a new crowding distance that can be calculated using the Manhattan distance between a point and the barycentre of the objectives, based on cluster analysis. It is defined as

Dis[i] = Σ_{j=1}^{NF} |fij − Gj|                            (8)
where Dis[i] is the distance of the ith particle, fij is the jth objective of the ith particle, and Gj is the barycentre of all the jth objectives. The new crowding distance is superior to the crowding distance of NSGA-II [6], for it does not need sorting and has less computational complexity; it is also superior to the grid [4], [8], because the latter may fail to allocate memory when there are too many objectives.
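Equation (8) reduces to a few lines of array code. The sketch below assumes the objective values are gathered in an NP × NF matrix; note that no sorting is needed, which is the source of the complexity advantage over the NSGA-II distance.

```python
import numpy as np

def manhattan_crowding(F):
    """Crowding distance of equation (8).

    F: (NP, NF) array of objective values.
    Cost is O(NP * NF), versus O(NF * NP * log NP) for NSGA-II's distance.
    """
    G = F.mean(axis=0)                    # barycentre G_j of each objective
    return np.abs(F - G).sum(axis=1)      # Dis[i] = sum_j |f_ij - G_j|
```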
4.4 Uniform Design for Parameter Establishment
Uniform design converts the problem of parameter establishment into an experimental design with multiple factors and multiple levels, which can greatly reduce the experimental workload of simulation [17]. The main objective of uniform design is to sample a small set of points from a given set of points, such that the sampled points are scattered uniformly. Let there be n factors and q levels per factor. When n and q are given, the uniform design selects q combinations out of the q^n possible combinations, such that these q combinations are scattered uniformly over the space of all possible combinations. The selected combinations are expressed in terms of a uniform array U(n, q) = [Ui,j]_{q×n}, where Ui,j is the level of the jth factor in the ith combination. When q is prime and q > n, it has been proved that Ui,j is given by

Ui,j = (i·σ^{j−1} mod q) + 1                                (9)
where σ is a parameter determined by the number of factors and the number of levels per factor [18].
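Equation (9) can be sketched directly; the function below (name ours) reproduces the U(4, 5) array used in Section 5.1 when called with sigma = 2.

```python
def uniform_array(n, q, sigma):
    """Uniform design array U(n, q) of equation (9): q rows (combinations),
    n columns (factors); valid when q is prime and q > n."""
    return [[(i * sigma ** (j - 1)) % q + 1 for j in range(1, n + 1)]
            for i in range(1, q + 1)]

# uniform_array(4, 5, 2) yields the five rows [2,3,5,4], [3,5,4,2],
# [4,2,3,5], [5,4,2,3], [1,1,1,1] shown below.
```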
5 Experiments
The performances of the proposed multi-resolution image fusion approach are tested and compared with those of different fusion schemes. The image "aerial" is selected as the reference image R. The two source images A and B are shown in Fig. 2(a) and Fig. 2(b) respectively, where A is the lower-resolution image (128×128) with a pixel size of 30 m, but is clear, while B is the higher-resolution image (256×256) with a pixel size of 15 m, but is blurred. The entropy of A is 6.9323, the gradient of A is
21.5313, the entropy of B is 6.6950, and the gradient of B is 6.1047. We use AMOPSO-II to search for the Pareto optimal weights of the multi-resolution image fusion and compare the results with those of AMOPSO-I and MOPSO. Since the solutions to the optimization of image fusion are nondominated by one another, we give an order of preference over the six indices so as to select the Pareto optimal solutions to compare; e.g. one order of preference is SSIM > Entropy > MI > PSNR > Gradient > IS. When the reference image exists, SSIM is the principal objective. Otherwise, Entropy becomes the principal objective, and the order is Entropy > MI > Gradient > IS.
5.1 Results of Uniform Design for AMOPSO-II Parameters
The important parameters of AMOPSO-II include the number of particles, the number of cycles, the size of the repository, and the mutation probability. We construct a uniform array with four factors and five levels as follows, where σ is equal to 2. We compute U(4, 5) based on (9) and get
U(4, 5) =
    [ 2  3  5  4 ]
    [ 3  5  4  2 ]
    [ 4  2  3  5 ]
    [ 5  4  2  3 ]
    [ 1  1  1  1 ]
In the first combination, the four factors have respective levels 2, 3, 5, 4; in the second combination, the four factors have respective levels 3, 5, 4, 2, etc. The value range of the number of particles is [40, 120]; the range of the number of cycles is [50, 250]; the range of the size of the repository is [100, 300]; the range of the mutation probability is [0.02, 0.06]. All combinations are run for a maximum of 500 evaluations. Results show that the third combination is the optimal for this problem. Note that the optimal combination of the parameters for one multi-objective problem may not be the optimal combination for another. Thus, the parameters of AMOPSO-II are as follows: the particle number NP is 100; the maximum cycle number Gmax is 100; the allowed maximum capacity MEM is 200; the mutation probability Pm is 0.06; the inertia weight Wmax is 1.2 and Wmin is 0.2; the learning factors c1 and c2 are both 1. The parameters of MOPSO are the same, while its inertia weight W is 0.4 and its grid number Ndiv is 20, for a greater number (e.g. 30) may cause the program execution to fail.
5.2 Comparison Among Different Schemes
In order to compare the different schemes of multi-resolution image fusion, all the approaches are run for a maximum of 100 evaluations and the sum of the weights at each position of two source images is limited to 1. The fused images from the Pareto optimal solutions are shown in Fig. 2(c) and Fig. 2(d). It can be seen that the fused image of AWM at decomposition level 3 (of the high-resolution image) is the optimal. Table 1 shows the evaluation indices of the fused images from different schemes,
where MOPSO denotes the AWM based on MOPSO (see [8]), AWM-I is the AWM based on AMOPSO-I, AWM is the AWM based on AMOPSO-II, and (JA, JB) denotes that the decomposition levels of A and B are JA and JB respectively. From Table 1, we can see that the indices of AWM are inferior to those of UWM when the decomposition level of B is equal to 1. The reason is that the decision variables of AWM are too many and AWM cannot reach the Pareto optimal front in a limited time, e.g. when the number of iterations is 100; the run time of AWM increases with the number of decision variables. When the level of B is greater than 1 in the DWT domain, the indices of AWM are superior to those of UWM because the weights of AWM are adaptive in different regions. The higher the decomposition level, the better the fused image, for a higher level decreases the number of decision variables and improves the adaptability. However, the pixels in the sub-images will cause distortion as the level grows, so the maximum value of the level is limited to 3. The indices of AWM-I and MOPSO are inferior to those of AWM, which indicates that the new crowding distance can increase the running speed and achieve better results than the distance in [6], that the uniform design helps to improve the performances of AMOPSO-II, and that MOPSO needs too much memory because the grid [4] behaves poorly with many objectives, e.g. 6. Therefore, the approach to multi-resolution image fusion that uses AMOPSO-II to search for the adaptive fusion weights at level 3 (of the high-resolution image) in the DWT domain is the optimal. This approach can preserve the optical features of the images, overcome the limitations of fixed fusion parameters, and obtain the optimal fusion.
Fig. 2. Source and fused images: (a) Source Image A; (b) Source Image B; (c) UWM fused for JB=3; (d) AWM fused for JB=3

Table 1. Evaluation indices of the fused images from different schemes

Schemes (JA, JB)  Entropy  Gradient  MI       IS      PSNR     SSIM    Time (s)
UWM (0, 1)        7.1529   12.1093   12.4220  1.9825  27.5151  0.9115  482.06
AWM (0, 1)        6.9679   14.2213   13.6555  1.9841  15.8836  0.8076  752.17
UWM (1, 2)        6.8752   11.6481   12.4026  1.9807  24.1379  0.9280  548.87
AWM (1, 2)        7.1623   12.1546   12.4263  1.9825  27.5682  0.9621  333.07
UWM (2, 3)        6.9595   11.9852   12.4410  1.9819  24.9924  0.9414  294.12
MOPSO (2, 3)      7.3104   12.2209   12.9813  1.9988  27.6257  0.9634  1023.79
AWM-I (2, 3)      7.1529   12.1093   12.4220  1.9825  27.5151  0.9636  331.01
AWM (2, 3)        7.1823   12.4250   12.4272  1.9823  27.7075  0.9637  282.86
6 Conclusions
The approach using AMOPSO-II to optimize the parameters of multi-resolution image fusion is feasible and effective, and can obtain Pareto optimal fusion results. Multi-objective optimization of the fusion parameters avoids too heavy a dependence on experience and simplifies the algorithm design for multi-resolution image fusion. Once valid evaluation indices are established, the method of multi-objective optimization can be used to deal with objectives that may conflict with one another and to eliminate the influence of preference effectively. The proposed AMOPSO-II is an effective algorithm for solving multi-objective problems, which can reach the Pareto front of optimization problems quickly and attain the optimal solutions. AMOPSO-II can also be applied effectively to other multi-objective optimization problems. One aspect that we would like to explore in the future is the analysis of the evaluation index system for pixel-level image fusion using PCA (Principal Component Analysis) to acquire a meaningful measurement, which will improve the performances of multi-resolution image fusion. We are also considering improving the adaptability of multi-objective image fusion. Finally, it is desirable to study the applications of the optimization algorithm to color images and other fusion methods.
References
1. Pohl, C., van Genderen, J.L.: Multisensor Image Fusion in Remote Sensing: Concepts, Methods and Applications. International Journal of Remote Sensing 5 (1998) 823-854
2. Zhukov, B., Oertel, D., Lanzl, F., Reinhackel, G.: Unmixing-Based Multisensor Multiresolution Image Fusion. IEEE Transactions on Geoscience and Remote Sensing 3 (1999) 1212-1226
3. Niu, Y.F., Shen, L.C.: A Novel Approach to Image Fusion Based on Multi-Objective Optimization. In: Proceedings of the 6th World Congress on Intelligent Control and Automation (WCICA06), in press (2006)
4. Knowles, J.D., Corne, D.W.: Approximating the Nondominated Front Using the Pareto Archived Evolution Strategy. Evolutionary Computation 2 (2000) 149-172
5. Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: Improving the Strength Pareto Evolutionary Algorithm. TIK-Report 103, ETH, Zurich, Switzerland (2001)
6. Deb, K., Pratap, A., Agarwal, S., et al.: A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 2 (2002) 182-197
7. Li, X.: A Non-dominated Sorting Particle Swarm Optimizer for Multiobjective Optimization. In: Cantu-Paz, E., et al. (Eds.): Genetic and Evolutionary Computation. Lecture Notes in Computer Science, Vol. 2723. Springer-Verlag, Berlin Heidelberg (2003) 37-48
8. Coello, C.A., Pulido, G.T., Lechuga, M.S.: Handling Multiple Objectives with Particle Swarm Optimization. IEEE Transactions on Evolutionary Computation 3 (2004) 256-279
9. Sierra, M.R., Coello, C.A.: Improving PSO-Based Multi-objective Optimization Using Crowding, Mutation and e-Dominance. In: Coello, C.A., et al. (Eds.): Evolutionary Multi-Criterion Optimization. Lecture Notes in Computer Science, Vol. 3410. Springer-Verlag, Berlin Heidelberg (2005) 505-519
10. Huang, X.S., Chen, Z.: A Wavelet-Based Image Fusion Algorithm. In: Proceedings of the IEEE Region 10 Conference on Computers, Communications, Control and Power Engineering (TENCON 2002), Beijing (2002) 602-605
11. Wang, Z., Bovik, A.C., Sheikh, H.R., et al.: Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Transactions on Image Processing 4 (2004) 600-612
12. Kennedy, J., Eberhart, R.C.: Particle Swarm Optimization. In: Proceedings of the IEEE International Conference on Neural Networks (ICNN95), Perth (1995) 1942-1948
13. Kennedy, J., Eberhart, R.C.: Swarm Intelligence. Morgan Kaufmann, San Mateo (2001)
14. Ray, T., Liew, K.M.: A Swarm Metaphor for Multiobjective Design Optimization. Engineering Optimization 2 (2002) 141-153
15. Shi, Y., Eberhart, R.C.: A Modified Particle Swarm Optimizer. In: Proceedings of the IEEE International Conference on Evolutionary Computation, Anchorage (1998) 69-73
16. Eberhart, R.C., Shi, Y.: Particle Swarm Optimization: Development, Applications and Resources. In: Proceedings of the IEEE Congress on Evolutionary Computation, Seoul (2001) 81-86
17. Leung, Y.W., Wang, Y.P.: Multiobjective Programming Using Uniform Design and Genetic Algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 3 (2000) 293-304
18. Fang, K.T., Ma, C.X.: Orthogonal and Uniform Experimental Design. Science Press, Beijing (2001) (in Chinese)
Multiscale Linear Feature Extraction Based on Beamlet Transform*
Ming Yang, Yuhua Peng, and Xinhong Zhou
School of Information Science and Engineering, Shandong University, Jinan, Shandong Province, People's Republic of China
[email protected], [email protected], [email protected]
Abstract. The beamlet [1] is an efficient tool for multiscale image analysis. A fast algorithm for the discrete beamlet transform [2] is proposed. It greatly reduces the complexity of computing the coordinates of the pixels on beamlets, and concentrates the beamlet transform on the summation of the pixel grayscale values. This paper also improves Donoho's method of using the complexity-penalized energy [1] to extract multiscale linear features. It establishes the two-scale relationship of the maximal beamlet energy in the dyadic square, and presents a threshold-processed maximal beamlet energy algorithm which avoids the problem of selecting the penalty factor. Experimental results prove the efficiency of the proposed method.
1 Introduction
Linear feature extraction is very important in image processing. Wavelet analysis has great advantages in point feature extraction, but it is not good at extracting linear features. The Radon transform integrates along line segments that span the whole image, so it cannot provide precise information about the positions of endpoints and the lengths of line segments. Linear feature extraction based on the beamlet transform was proposed against this background. Beamlets provide a framework in which line segments play a role analogous to the one played by points in wavelet analysis. While wavelets offer localized scale/location representation near fixed regions of space, beamlets offer localized scale/location/orientation representation based on dyadically organized line segments. So it is valuable to research the beamlet transform and its applications in image processing.
2 Discrete Beamlet Transform
Donoho proposed the theory of the continuous beamlet transform. Xiaoming Huo developed the concept of the discrete beamlet [2].
* This project is sponsored by SRF for ROCS, SEM (2004.176.4) and NSF SD Province (Z2004G01) of China.
A digital image of size n × n (n = 2^j) is called a dyadic square with scale j. Connecting two arbitrary pixels on the boundary of the square, we get a discrete beamlet basis; see Fig. 1. Pixels on the beamlet are determined by interpolation. The beamlets generated by interpolating pixels should approximate the ideal line segments optimally.
Fig. 1. A discrete beamlet generated by interpolation
The scope of the dyadic square scale is j = 0, 1, 2, …, J for an image of N × N (N = 2^J). The collection of all beamlets at various scales, locations and orientations is denoted B. The beamlet transform of a digital image on one beamlet basis is just the sum of the grayscale values of the pixels on this beamlet. Then the whole beamlet transform of the image can be defined as follows:

T(b) = Σ_{(i,j)∈b} p(i, j) ,  b ∈ B                         (1)

Here p(i, j) is the grayscale value of pixel (i, j) on beamlet b.
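Equation (1) can be sketched as follows. This illustration generates the pixels of a beamlet with the classic Bresenham algorithm [6] rather than the interpolation of Fig. 1, which is an assumption on our part, and the function names are ours.

```python
def bresenham(p0, p1):
    """Integer pixels approximating the segment p0-p1 (Bresenham [6])."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx, sy = (1 if x0 < x1 else -1), (1 if y0 < y1 else -1)
    err, pts = dx + dy, []
    while True:
        pts.append((x0, y0))
        if (x0, y0) == (x1, y1):
            return pts
        e2 = 2 * err
        if e2 >= dy:
            err += dy; x0 += sx
        if e2 <= dx:
            err += dx; y0 += sy

def beamlet_transform(image, p0, p1):
    """T(b) of equation (1): sum of grayscale values along beamlet b."""
    return sum(image[x, y] for x, y in bresenham(p0, p1))
```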
3 Fast Beamlet Transform
3.1 Fast Algorithm of Pixel Coordinates
The emphasis of the fast algorithm is finding a way of rapidly calculating the coordinates of the pixels on the beamlets. Qinfeng Shi put forward a fast algorithm [3] using the Bresenham method [6]. But the huge quantity of beamlets leads to unacceptable computational complexity if each beamlet basis uses the Bresenham method separately. This paper presents a fast algorithm using the symmetry and parallelism properties of line segments in the dyadic square. Any beamlet has a vertical and a horizontal projection, and the projection length is counted in pixels. Fig. 2 shows six beamlets and their vertical and horizontal projection lengths. We use {i, j} to represent the beamlets whose projection lengths are
i and j (suppose i ≥ j , and define i as the length of beamlet
b, that is, l(b) = i). Then the six beamlets b1 ~ b6 in Fig. 2 can be marked as {13,7}, {16,3}, {3,1}, {16,7}, {11,7} and {8,5}. For a dyadic square of n × n, we sort a beamlet into the first kind if l(b) < n, and into the second kind if l(b) = n. The first kind, represented by {i, j}, includes 8 beamlets at different positions (see Fig. 3; b1 ~ b8 can be represented by {12,7}). The second kind, represented by {n, j}, includes 4 beamlet families (see Fig. 4; b1k ~ b4k can be represented by {16,7}), and each family consists of n − j + 1 beamlets, that is, k = 1, 2, …, n − j + 1.
Fig. 2. The projection length of the discrete beamlets
We define the pixel coordinates in the dyadic square of n × n as in equation (2):

    [ (1,1)  (1,2)  ...  (1,n) ]
    [ (2,1)  (2,2)  ...  (2,n) ]
    [  ...    ...   ...   ...  ]                            (2)
    [ (n,1)  (n,2)  ...  (n,n) ]

Then we can draw the following conclusions:
(1) For the 8 beamlets of the first kind (see Fig. 3):
a. On two beamlets with left-right symmetry in the square (such as b1 and b2), the horizontal coordinates of corresponding pixels are the same, and the sums of the corresponding vertical coordinates are n + 1. For example, the coordinates of the pixels on b1 are (1,7), (2,6), (3,6), ..., (11,2), (12,1), while those on b2 are (1,10), (2,11), (3,11), ..., (11,15), (12,16). We can see that they have the same horizontal coordinates, and all the sums of corresponding vertical coordinates are 17.
b. On two beamlets with up-down symmetry in the square (such as b1 and b4), the sums of the horizontal coordinates of corresponding pixels are n + 1, and the corresponding vertical coordinates are the same. For example, the coordinates of the
pixels on b1 are (1,7), (2,6), (3,6), ..., (11,2), (12,1), while those on b4 are (16,7), (15,6), (14,6), ..., (6,2), (5,1). We can see that all the sums of corresponding horizontal coordinates are 17, and the corresponding vertical coordinates are the same.
Fig. 3. The first kind of beamlets
The horizontal coordinates of pixels on b1 can be written as b1(x) = 1 : i. The vertical coordinates b1(y) can be calculated through the Bresenham method. So we can solve the coordinates of b2 ~ b4 as follows:

    b2(x) = b1(x),           b2(y) = n + 1 − b1(y)
    b3(x) = n + 1 − b2(x),   b3(y) = b2(y)                  (3)
    b4(x) = b3(x),           b4(y) = b1(y)

Because b5 ~ b8 and b1 ~ b4 are symmetric about the diagonal y = x, we can get the coordinates of b5 ~ b8 by exchanging the horizontal and vertical coordinates of b1 ~ b4. So we solve 8 beamlets by using the Bresenham method only once.
(2) For all the parallel beamlets in the beamlet families b1k and b2k, which belong to the second kind (see Fig. 4):
a. All the beamlets have the same horizontal coordinate sequence 1 : n (or n : −1 : 1).
b. The vertical coordinates of neighboring beamlets differ by 1. For example, the coordinates of the pixels on b11 and b12 are (1,7), (2,7), (3,6), …, (15,1), (16,1) and (1,8), (2,8), (3,7), …, (15,2), (16,2); the corresponding vertical coordinates differ by 1. The beamlets with the same label k in b1k and b2k have up-down symmetry, so they have the same vertical coordinates. So once we calculate b11(y), that is, the vertical coordinates of b11, by the Bresenham method, all the beamlets can be solved as follows:

    b1k(x) = 1 : n,          b1k(y) = b1,k−1(y) + 1
    b2k(x) = n : −1 : 1,     b2k(y) = b1k(y)                (4)

We can get the coordinates of b3k and b4k through exchanging the horizontal and vertical coordinates of b1k and b2k. So we solve 4(n − j + 1) beamlets by using the Bresenham method only once.
Fig. 4. The second kind of beamlets
3.2 Steps of Fast Beamlet Transform 1. Scanning the first kind of beamlets (represented by {i, j} )
i = 2 , j =1. (2) Calculate b1 ( y ) through Bresenham method, and then solve other beamlets b1 ~ b4 according to equation (3). (3) Solve b5 ~ b8 by exchanging horizontal and vertical coordinates of b1 ~ b4 . (1) Initialization
(4) Sum the pixel grayscale values and get the beamlet transform of
b1 ~ b8 .
j = j + 1 . If j ≤ i , go to step (2); else go to step (6). (6) i = i + 1 ; If i ≤ n , go to step (2); else end. 2. Scanning the second kind of beamlets (represented by {n, j} ) (1) Initialization j = 1 . (2) Calculate b11 ( y ) through Bresenham method, and then solve other beamlets b1k and b2 k according to equation (4). (5)
(3) Solve
b1k and b2 k .
b3k and b4 k by exchanging the horizontal and vertical coordinates of
358
M. Yang, Y. Peng, and X. Zhou
(4) Sum the pixel grayscale values and then get the beamlet transform of
b1k ~ b4 k . (5) j = j + 1 . If j ≤ n , go to step (2); else end. 3.3 Comparison of Computational Time Between Fast and General Algorithm We compare the computational time of Qinfeng Shi's algorithm [3] (here we call it general algorithm) and this paper's method as follow (CPU: Pentium III 800):
Fig. 5. Comparison of computational time between fast algorithm and general algorithm
We can see the fast algorithm exhibits more and more advantages while the size of dyadic square gets bigger. When the scale j=9, that is the size of the square is 512×512, the fast algorithm reduces about 60% time.
4 Multiscale Linear Feature Extraction For a dyadic partition of the dyadic square, if pieces of partition are (optionally) decorated by associated beamlets, we call this partition "beamlet-decorated recursive dyadic partition" [1] (BD-RDP). See fig.6.
Multiscale Linear Feature Extraction Based on Beamlet Transform
359
Fig. 6. BD-RDP
The most important property of “BD-RDP” is inter-scale inhibition between beamlets: A beamlet in large scale can be represented by at most three beamlets in small scale, see fig. 7(1). If the linear feature of the dyadic square can be represented by the beamlet in large scale, we will not decompose it into small scale. So the multiscale linear feature extraction is essentially searching for the optimal BD-RDP. Donoho proposed a method named "complexity-penalized energy" [1]:
J ( P) = ¦ C S2 − λ # P
(5)
S~P
T 2 (b ) stands for the maximal beamlet energy of dyadic b~ S l (b ) square S. Ȝ is the penalty factor. # P represents the number of dyadic squares in the RDP. The method for optimizing RDP is looking for the maximal J (P ) , that is max J ( P ) . We can write J (P ) as the additive function: J ( P ) = ¦ ΓS , here Here C S2 = max
P ~ BDP
S~P
ΓS = C − λ . 2 S
So we can get a fast algorithm of searching for max J ( P ) , that is, the “bottom-up P ~ BDP
tree pruning process” [1]: starting from the smallest scale, moving up level by level, we can decide whether decompose the dyadic square by comparing ΓS . 4.1 Problems of Donoho's Method 1. It is difficult to establish the two-scale relationship between and
C S2i − λ . Here they correspond to dyadic square S and sub-square Si
C S2 − λ making
up S. 2. The selection of penalty factor Ȝ is very difficult. It will vary with people's experience and different images. In Donoho and Xiaoming Huo's words, how to choose an optimal penalty factor is a future work [4] [5].
360
M. Yang, Y. Peng, and X. Zhou
Considering the example of fig. 7(1), the beamlet b0 in large scale is composed of three beamlets
b0 ~ b0 in small scale. Then we hope to keep b0 and not divide it 1
3
into three pieces. Obviously, we have:
T = T1 + T2 + T3 ® ¯l = l1 + l 2 + l3 Here
(6)
T , T1 , T2 , T3 stand for the beamlet coefficients of b0 , b01 , b02 , b03 , and
l , l1 , l 2 , l3 is the length of them. There is C S2 − λ =
T2 − λ in large scale and l
2
2 2 T T1 T + 2 + 3 − 3λ in small scale. Even if the easiest l1 l2 l3 i circumstance, that is, l1 = l 2 = l 3 , we have:
¦ (C S2i − λ ) =
C S2 =
T 2 T12 + T22 + T32 + 2(T1T2 + T2T3 + T1T3 ) = ≤ l 3l1
T12 + T22 + T32 + (T12 + T22 ) + (T22 + T32 ) + (T12 + T32 ) T12 + T22 + T32 = = ¦ C S2 i 3l1 l1 i
(7)
We can see that, it is difficult to choose a proper penalty factor Ȝ which can make all the dyadic squares satisfy C S2 − λ ≥ C S2 − 3λ (that is keeping the linear feature
¦ i
i
in large scale) when they have the relationship of equation (6). 4.2 Improvements of Donoho's Method 1. Establish the two-scale relationship of maximal beamlet energy An improved maximal beamlet energy is presented as follow:
ES =
¦ p 2 (i, j )
(8)
( i , j )∈b0
T (b ) , that is, b0 is the beamlet l (b ) which has the maximal normalized beamlet coefficient in S. And p (i, j ) has the Here
b0 is the beamlet which satisfies max b~ S
same meaning as the one in equation (1). So we can get the two-scale relationship between correspond to dyadic square S and sub-square
ES
Si respectively):
and
ESi
(Here they
Multiscale Linear Feature Extraction Based on Beamlet Transform
361
(1) When the linear feature of S can be represented by one beamlet in large scale, there is
E S ≥ ¦ E S , see fig.7(1) (If pixels on b0 coincide with the pixels on i
i
b0 completely, E S = ¦ E S . If there are a few pixels not coincided, E S > ¦ E S ). i
i
i
i
i
(2) When the linear feature of S should be decomposed and represented by several beamlets in small scale,
E S < ¦ E S (see fig.7 (2)). i
i
Fig. 7. Two-scale relationship of maximal beamlet energy
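The two-scale test that this relationship enables can be sketched as follows. The search for the best beamlet b0 in each square is omitted and assumed to be given; the function names are ours.

```python
def improved_energy(image, best_beamlet):
    """E_S of equation (8): squared-grayscale energy of the pixels on the
    beamlet b0 that maximises the normalised coefficient T(b)/l(b) in S."""
    return sum(image[x, y] ** 2 for x, y in best_beamlet)

def keep_large_scale(e_parent, e_children):
    """Bottom-up pruning test: keep the parent beamlet when its maximal
    beamlet energy dominates the sum over the sub-squares."""
    return e_parent >= sum(e_children)
```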
2. Replace the penalty factor with the threshold-processed maximal beamlet energy. If the difference between the maximal normalized beamlet coefficient max_{b~S} T(b)/l(b) and the average of the grayscale values of the dyadic square is lower than the threshold, we consider the dyadic square a background region and set E_S = 0. Then the bottom-up tree pruning process can be described as follows:
(1) Consider each dyadic square S of 2 by 2. We decide whether to divide it into 4 pieces by comparing E_S and the sum of the E_Si. We mark the bigger as E_S*, record this partition in P, and record the beamlets corresponding to this partition (if E_S* > 0) in B.
(2) Move up one level and consider each square S of 4 by 4. We decide whether to divide it or leave it undivided by comparing E_S and the optimized E_S*. Again we mark the bigger as E_S*, and record the partition and the beamlets (if E_S* > 0) in P and B.
(3) Move up level by level as above, up to the whole image. Then P and B are the optimal partition and the corresponding beamlets.
4.3 Comparison of Extraction Results
We extract linear features from an image named "Picasso" (see Fig. 8). We can see that the result of this paper's method (see Fig. 10) is better than that of Donoho's method (see Fig. 9); in particular the detail features, such as those marked with "1", "2", "3", are much clearer.
Fig. 8. Original image "Picasso"
Fig. 9. Experimental result of Donoho's method
Fig. 10. Result of this paper's method
5 Conclusion
The fast beamlet transform greatly reduces the computational complexity of calculating the pixel coordinates on the line segments: for the beamlet transform of a 512×512 image, it reduces the computation time by about sixty percent. The two-scale relationship and the threshold-processed maximal beamlet energy method simplify the extraction of linear features efficiently, and the extraction results are clearer.
References
1. Donoho, D.L., Huo, X.M.: Beamlets and Multiscale Image Analysis. In: Multiscale and Multiresolution Methods. Lecture Notes in Computational Science and Engineering, Vol. 20 (2001)
2. Huo, X.M., Chen, J.H.: JBEAM: Multiscale Curve Coding via Beamlets. IEEE Transactions on Image Processing, 14 (2005)
3. Shi, Q.F., Zhang, Y.N.: Adaptive Linear Feature Detection Based on Beamlet. In: Proceedings of the Third International Conference on Machine Learning and Cybernetics, Shanghai (2004) 26-29
4. Donoho, D.L., Huo, X.M.: Beamlet Pyramids: A New Form of Multiresolution Analysis, Suited for Extracting Lines, Curves, and Objects from Very Noisy Image Data. SPIE 2000 (2000)
5. Donoho, D.L., Huo, X.M.: Near-Optimal Detection of Geometric Objects by Fast Multiscale Methods. IEEE Transactions on Information Theory, 51(7) (2005)
6. Bresenham, J.E.: Algorithm for Computer Control of a Digital Plotter. IBM Systems Journal, 4(1) (1965) 25-30
Multisensor Information Fusion Application to SAR Data Classification
Hai-Hui Wang (1,2), Yan-Sheng Lu (1), and Min-Jiang Chen (2)
1 College of Computer Science & Technology, Huazhong University of Science & Technology, Wuhan, 430074, China
2 School of Computer Science & Engineering, Wuhan Institute of Technology, Wuhan, 430073, China
[email protected]
Abstract. A new multisensor information fusion classifier is introduced and applied to land cover classification using SAR composites. This classifier aims at the integration of multi-source, contextual, and a priori information within a single, homogeneous framework. Statistical and fuzzy logic approaches are employed in the experiments. Fuzzy membership maps for the different thematic classes are first calculated using a priori knowledge of the classes and sensors. These maps are then iteratively updated using spatial contextual information. A classification rule is associated with the different iterations. The confidence map constitutes an important means of evaluating the complexity of the classification process and the validity of the assumptions used. Finally, after comparing the statistical properties of the fusion results of the different methods, the proposed method shows satisfactory results.
1 Introduction
Information extraction for multisensor image data fusion is a process whereby data from various modalities are merged to provide a more complete description of a scene or object under identification. Multi-source image interpretation is one of the most important issues in many image processing tasks, ranging from remotely sensed satellite imagery to medical imaging, robot vision, etc. [1-3]. Two approaches are generally conducted: the analyst's visual interpretation and computer-based interpretation. The analyst's visual interpretation consists in using one specific sensor (visible, for example), or a color composite of three sensors, as a visual input to which the theoretical knowledge and expertise of the analyst are applied. This method suffers from subjectivity and slowness. Nevertheless, it has the main advantage of being capable of using several knowledge sources simultaneously during the interpretation process. Computer-based interpretation methods consider the reflected radiation, or emission, from different sensors as a numerical feature vector in the feature space. Thus, the multi-source image interpretation problem is considered a classification problem. In this study, a fuzzy-based multisensor data fusion classifier is developed and applied to land cover classification using ERS-1/JERS-1 (European Resource
Satellite-1/Japanese Earth Resources Satellite-1) SAR (Synthetic Aperture Radar) composites. This classifier aims at the integration of multi-source, contextual, and a priori information within a single, homogeneous framework [4], [5]. Fuzzy membership maps (FMM) for the different thematic classes are first calculated using a priori knowledge of the classes and sensors. These FMM are then iteratively updated using spatial contextual information. A classification rule is associated with the different iterations. This classifier has the following advantages. First, owing to the use of fuzzy concepts, it has the flexibility of integrating multi-source, contextual, and a priori information within a single, homogeneous framework. Second, the classification results consist of a thematic map as well as a confidence map. The confidence map (a classification certainty map representing the degree of certainty in the thematic map) constitutes an important means of evaluating the complexity of the classification process and the validity of the assumptions used. The application of this classifier to ERS-1/JERS-1 SAR composites is shown to be very promising.
2 Classification Algorithm
The inadequacy of conventional mathematics in modeling the expert's approximate reasoning process has motivated researchers to seek other alternatives [6]. Fuzzy set theory provides us with a powerful mathematical tool for modeling the human ability to reach conclusions when the available information is imprecise, incomplete, and not totally reliable. In conventional crisp set theory, an element either belongs to a set or it does not. The major characteristic that distinguishes fuzzy set theory from traditional crisp set theory is that it allows intermediate grades of membership. In this section, the definition of fuzzy sets as well as their application to multi-source pattern recognition problems is given. Let Ω denote the universal set. A fuzzy set A over Ω is defined as the set of ordered pairs
A = {(X, μ_A(X)), X ∈ Ω}, where μ_A(X) ∈ [0, 1] is termed the grade of membership, or simply the membership function, of the element X in the fuzzy set A. In a multi-source classification problem, a pattern X is described as a vector in an N-dimensional space, X = [x1, x2, …, xN], xn ∈ Ωn, n = 1, 2, …, N, where Ωn denotes the nth sensor data observation space (i.e., the source universe), and N the number of sensors. Using this formulation, the universal set Ω is the Cartesian product representing the multi-source observation space. If C = {C1, C2, …, CM} is the set of M predefined classes, then each class Cm is defined as a fuzzy set over Ω. Thus,
μ_Cm(X) conveys information on the degree to which the pattern X ∈ Ω may be treated as belonging to the class Cm. The application of fuzzy concepts to multi-source data classification can be decomposed into the following three steps. The first step, fuzzification in the source universes, aims at the determination of the membership value of each pattern X over the nth source universe (Ωn) for the different classes: μ_{Cm,n}(X), m = 1, …, M, n = 1, …, N.
These membership functions, $\mu_{C_m,n}(\cdot)$, will be referred to as the Source Related (SR) membership functions. Notice that the term "source" is used in this study to indicate different information or data granules. These data may be issued from physical sensors as well as from ancillary data or new features computed over them. It is worthwhile to underline the great interest of this step in the context of multi-source data fusion. In fact, it permits the integration of numeric as well as linguistic descriptions in the same homogeneous framework of membership values [7,8]. In a multi-source classification application, the fuzzification in the source universes operates on the different sources of data separately in order to obtain the SR membership functions $\mu_{C_m,n}(X)$. Based on these SR membership functions, the fuzzy reasoning step aims at using the expert's as well as a priori knowledge in order to compute the membership functions $\mu_{C_m}(X)$, m = 1, .., M, of the pattern X to the different classes:
$$\mu_{C_m}(X) = F\left(\mu_{C_{m'},n}(X)\right), \quad n = 1, \dots, N;\ m' = 1, \dots, M. \tag{1}$$
where F denotes a fuzzy combination operator. A first simplification of (1), generally used in fuzzy multi-source classification, is to consider only the SR membership functions concerning the same class of interest (i.e., m' = m). Equation (1) is thus reduced to:
$$\mu_{C_m}(X) = F\left(\mu_{C_m,n}(X)\right), \quad n = 1, \dots, N. \tag{2}$$
This equation means that $\mu_{C_m}(X)$ is computed as a function of the SR membership functions for the same class $C_m$. The wide range of combination operators proposed in the fuzzy set literature reflects the power as well as the flexibility of fuzzy concepts. In this study, the simple fuzzy intersection operator is used. Notice that this operator considers the N sensors as equally informative about the considered class $C_m$. In the case where a priori knowledge concerning the sensors/classes relationship is available, other more appropriate combination rules should be used. In this study, we assume the absence of such a priori knowledge. As mentioned earlier, the two preceding steps aim at computing the membership functions $\mu_{C_m}(X)$, m = 1, .., M, of the pattern X to the different classes. The defuzzification step aims at obtaining a 'hard' membership decision by attributing the pattern X to only one of the predefined classes. Similarly to the maximum likelihood decision approach, the defuzzification step generally uses the maximum membership decision rule.
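To make the combination and defuzzification steps concrete, here is a minimal sketch in Python (our own illustration, assuming NumPy; the array layout and function name are not from the paper) of equation (2) with F taken as the fuzzy intersection (minimum), followed by the maximum membership rule:

```python
import numpy as np

def fuse_and_defuzzify(sr_memberships):
    """sr_memberships: array of shape (M, N, H, W) holding, for every pixel,
    the SR membership to class m according to sensor n."""
    # Fuzzy intersection across the N sources (equation (2) with F = min)
    mu = sr_memberships.min(axis=1)            # shape (M, H, W)
    # Maximum membership decision rule (defuzzification)
    labels = mu.argmax(axis=0)                 # shape (H, W)
    return mu, labels
```

A product operator could be swapped in for `min` if the sources were assumed independent rather than equally informative.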
3 Fuzzy Classifier Design

As previously mentioned, the expert decision making process is mainly based on the use of multi-source and contextual (spatial) knowledge sources. In this study, we propose a fuzzy classifier that is strongly inspired by the expert's reasoning approach. An important issue is also addressed through this classifier.
It concerns the fact that the classification results (given as a thematic map) lack additional information related to the degree of certainty, and/or complexity, associated with each thematic decision. The following assumptions are made for the development of this classifier:
• The observed scene contains M thematic classes denoted C1, ..., Cm, .., CM.
• Each pixel in the observed scene is characterized by a feature vector $X = [x_1, x_2, \dots, x_N]$, $x_n \in \Omega_n$, $n = 1, 2, \dots, N$, where $x_n$ is the observed, measured or computed data from the nth sensor at the pixel's location, $\Omega_n$ denotes the nth sensor data observation space, and N the number of sensors.
• Each thematic class $C_m$ is described by a learning set $B_m$ of samples extracted, by the expert, from the analyzed scene.

This classifier is composed of three systems: 1) a fuzzy membership determination system, 2) a contextual processing system, and 3) a decision system. The fuzzy membership determination system aims at realizing the multi-source data fusion by computing the membership values, $\mu_{C_m}(\cdot)$, of the different pixels to the M predefined thematic classes. The contextual processing system is devoted to the use of contextual information in order to compute contextual-based membership values, denoted $\eta_{C_m}(\cdot)$. Finally, the decision system is in charge of making classification decisions, when possible, based on the contextual membership values. When a decision is made, this information is back-propagated in order to update the initial fuzzy membership values. In the following subsections, these systems are detailed.

3.1 Fuzzy Membership Data Fusion
The fuzzy membership determination constitutes the only processing step at the pixel level. It aims at assigning, to each object of the observed scene, membership values to the M predefined thematic classes. This determination process can be intuitive (i.e., empirical, representing the expert's knowledge) or it can be based on algorithmic, probabilistic or logical operations. Consider a thematic class $C_m$ and let $p_{C_m}(x_n)$ denote its probability density function defined over the set $\Omega_n$, the observation space associated with the nth sensor. The probability density functions are computed using the histograms estimated from the learning set $B_m$. In the framework of fuzzy set theory, multi-source information fusion aims at combining the membership values in order to obtain a single membership value $\mu_{C_m}(\cdot)$ of the observed pixel to the class $C_m$. This value is assumed to summarize all the knowledge concerning the observed pixel in terms of membership to the class $C_m$. In this study, the fuzzy intersection operator is used. This operator assumes that the different sensors to be merged are reliable, meaning that they should be
in agreement. This is without doubt the simplest combining operator, in the sense that it is sensor and class independent. Nevertheless, in the case of conflicting sensors or an unreliable sensor, the membership value obtained through this operator is worthless. Notice that specific knowledge related to the different sensors and to their thematic class discrimination capacities can easily be introduced through the use of more sophisticated aggregation operators.
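As an illustration of how a source-related membership function might be derived from the learning set, here is a minimal sketch (ours, assuming NumPy; normalizing the class histogram to peak at 1 is our simplification, since the paper only states that the densities are estimated from histograms):

```python
import numpy as np

def sr_membership_from_samples(train_samples, n_bins=256, value_range=(0, 255)):
    """Build mu_{Cm,n}(x) from the learning set Bm of sensor n by
    normalizing the class histogram so its maximum equals 1."""
    hist, edges = np.histogram(train_samples, bins=n_bins, range=value_range)
    hist = hist.astype(float)
    mu = hist / hist.max() if hist.max() > 0 else hist

    def membership(x):
        # Map an observed value (or array of values) to its bin's membership
        idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
        return mu[idx]

    return membership
```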
3.2 Contextual Membership Data Fusion

The aim of the contextual processing system is to exploit the spatial correlation between adjacent pixels in order to improve the classification results. When analyzing the expert's decision making process, we notice that if a pixel-based multi-source decision is difficult to make, then the expert's decision is based on the "homogeneous region" surrounding the considered pixel. This task is particularly difficult when dealing with data affected by speckle noise (radar images, for instance). In this study, the proposed classifier introduces an intermediate contextual level (a restricted spatial neighborhood of each pixel). This local neighborhood is exploited in order to update the membership values to the different thematic classes. This restricted-neighborhood contextual membership updating is iteratively repeated in order to propagate (or diffuse) the validated decisions to their surrounding regions. The restricted neighborhood of a pixel P0 considered in this study is a 3x3 window centered on P0. Only the membership values, in the restricted neighborhood, to one class of interest are considered in order to update the membership values (i.e., cross membership values are not considered). The contextual membership determination process for each class Cm (m=1, .., M) affects pixels for which the membership to the considered class is different from zero, meaning that P0 possibly belongs to the thematic class. In this case, the contextual membership value is computed using the multi-source membership assignments, to the same class, in the restricted neighborhood of this pixel. The mean value of the membership degrees is used here. An attractive feature of this contextual process is that it does not use any particular knowledge concerning existing structures in the analyzed scene. Thus, linear structures will not necessarily be preserved, as will be shown through the simulation results.

3.3 Decision and Confidence Map Computing
The decision making system aims at attributing, based on the contextual membership values, a classification decision label to each pixel in the analyzed scene. A decision rule is applied in order to determine, for each pixel, whether a decision can be justified, and thus made, or not. Based on the initial fuzzy membership values, the decision rule attributing the pixel P0 to the class m0 is as follows:
$$\text{if } \left(\eta^{t}_{C_{m_0}}(P_0) \approx 1\right) \text{ AND } \left(\eta^{t}_{C_m}(P_0) \approx 0,\ \forall m \neq m_0\right) \text{ then decide class } m_0. \tag{3}$$
This rule means that if the pixel-based membership degree to a given class m0 is the highest (unity) and is null for all other classes, then a decision can be made. Such a decision is associated with the highest certainty value (unity). If t > 0 (contextual membership values are used), the decision rule is given by:
$$\text{if } \left(\eta^{t}_{C_{m_0}}(P_0) \geq 0.5\right) \text{ AND } \left(\eta^{t}_{C_m}(P_0) < 0.5,\ \forall m \neq m_0\right) \text{ then decide class } m_0. \tag{4}$$
The reason why this decision rule is less "severe" than the pixel-based rule is that the contextual membership values at iteration t are obtained by averaging those of iteration t-1 over the restricted neighborhood of the pixel. Therefore, the existence of pixels belonging to other classes in this neighborhood will not significantly disturb the decision making process. The certainty factor associated with a decision is considered as a decreasing function of the iteration t. The stopping condition used in this algorithm is simply that the number of pixels newly labeled during the current contextual membership updating iteration becomes insignificant compared to the total number of pixels in the analyzed scene. Notice that when a pixel P0 is labeled as belonging to the class m0, this decision is diffused to its surrounding pixels by setting the membership value of P0 to the class m0 to unity and its membership values to the other classes to zero. As a result of the application of this algorithm, a thematic map as well as a confidence map are obtained.
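A minimal sketch of one contextual iteration (ours, assuming NumPy and SciPy; the tolerance `eps` and the update bookkeeping are our simplifications of rules (3) and (4)):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def contextual_iteration(mu, t, eps=1e-3):
    """One iteration over the membership maps mu, shaped (M, H, W).

    For t == 0 the pixel-based rule (3) is applied to mu itself; for t > 0
    the maps are first averaged over each pixel's 3x3 neighborhood and the
    relaxed rule (4) is applied."""
    if t == 0:
        eta, hi, lo = mu, 1.0 - eps, eps           # rule (3): ~1 vs ~0
    else:
        eta = np.stack([uniform_filter(m, size=3) for m in mu])
        hi, lo = 0.5, 0.5                           # rule (4): >=0.5 vs <0.5
    labels = eta.argmax(axis=0)
    second = np.sort(eta, axis=0)[-2]               # largest competing membership
    decided = (eta.max(axis=0) >= hi) & (second < lo)
    # Diffuse each validated decision: membership 1 for the winning class,
    # 0 elsewhere, so neighbors see it at the next iteration
    r, c = np.where(decided)
    mu = eta.copy()
    mu[:, r, c] = 0.0
    mu[labels[r, c], r, c] = 1.0
    return mu, labels, decided
```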
4 Simulation Experiments

In remote sensing SAR images, objects display versatile textures. Partitioning them into only a few classes would cause great intra-class scattering and inter-class overlapping, so it is appropriate to adopt a stratified scheme. The first layer is composed of the concept classes, which represent the textures' basic properties. Each class is then divided into several sub-classes in the next layer. In our experiment, a SAR image of Marseilles is used. In this image, the concept classes are ocean, cultural, hill and vegetation. The intensity and the contrast of water vary with location, so the ocean class is divided into four sub-classes. Similarly, the cultural class is divided into downtown, habitation and suburb. The hill class includes two branches, and the vegetation cluster has only one sub-class due to its high uniformity. Half of the samples were chosen as the training set and the other half as the testing set. The samples for each class are shown in Fig. 1; panels (a) to (j) show, in order, all the sub-class textures of the ocean, cultural, hill, and vegetation classes. In the experiment, the features of each pixel were calculated over its 16×16 neighborhood. There are various classifiers, and some of them are relatively complex. Since we mean to compare the new features with other ones, the widely used statistical classifier, i.e. the Bayes classifier, is considered.
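For reference, a minimal sketch (ours, assuming NumPy; the texture feature extraction is omitted) of such a Gaussian Bayes classifier, fitting one multivariate normal per sub-class and assigning each feature vector to the most likely class under equal priors:

```python
import numpy as np

class GaussianBayes:
    def fit(self, features_per_class):
        """features_per_class: list of (n_samples, n_features) training arrays."""
        self.means = [f.mean(axis=0) for f in features_per_class]
        self.covs = [np.cov(f, rowvar=False) for f in features_per_class]

    def predict(self, x):
        scores = []
        for mean, cov in zip(self.means, self.covs):
            d = x - mean
            inv = np.linalg.inv(cov)
            # Gaussian log-likelihood up to a constant (equal priors assumed)
            scores.append(-0.5 * (d @ inv @ d + np.log(np.linalg.det(cov))))
        return int(np.argmax(scores))
```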
Fig. 1. Examples of the training sample sets
Table 1 lists the classification accuracy for each testing texture; it also shows the rate at which a texture is mistaken for each of the other textures. From Table 1 it can be seen that the classification accuracy of the ocean class is the highest. Its mistakes are mainly concentrated on the vegetation class, which is confirmed by the mistakes of the vegetation class. The table also shows that some suburb textures are difficult to distinguish from the hill textures. To show the advantage of the new features, we also chose other classification techniques for comparison: the wavelet histogram signatures [10] and Kaplan's features [9]. Reference [10] reported another signature, the co-occurrence matrix signatures for the detail images, including inertia, total energy, cluster shade, cluster prominence, local homogeneity, entropy, max probability, and information measure of correlation, but we did not adopt these features in order to avoid excessive computation time.

Table 1. Classification results for training samples (unit: %)
             Ocean    Cultural   Hill     Vegetation
Ocean1       92.92     0          2.83     4.25
Ocean2       87.35     0          0       12.65
Ocean3       95.16     0          3.23     1.61
Ocean4       91.67     0          2.78     5.56
Cultural1     0.91    87.36      11.73     0
Cultural2     1.72    96.55       1.72     0
Cultural3     0       90.83       9.17     0
Hill1         0       10.79      89.21     0
Hill2         4.09     6.36      84.45     5.09
Vegetation   10.44     0          3.49    86.07
Overall correct rate: 91.26
Fig. 2. (a) gray-level image; (b) result of classification

Table 2. Classification accuracy comparison for the image of Marseilles (unit: %)
                     Wavelet histogram   Kaplan features   Information fusion
Ocean1                    79.72               91.04              92.92
Ocean2                    94.12               82.35              87.35
Ocean3                    69.35               93.55              95.16
Ocean4                    50.00               88.89              91.67
Cultural1                 68.18               82.27              87.36
Cultural2                 93.10               96.55              96.55
Cultural3                 84.43               90.42              90.83
Hill1                     62.07               68.97              89.21
Hill2                     59.09               59.09              84.45
Vegetation                44.19               76.74              86.07
Overall correct rate      70.43               82.99              91.26
The classification result for the test set is shown in Fig. 2. Fig. 2(a) is a gray-level image including ocean, urban, suburb, shore, and island areas. Fig. 2(b) shows the classes obtained. It can be seen that the correct classification rate for the ocean cluster is rather high. The island and part of the shore are basically taken as the hill and bare-land class. The land has more errors, because some of the textures there are similar to those of the hills and some cannot properly be assigned to any class, although most of the textures belong to the suburb class. This also reflects the complexity and the overlapping of the real data. Examining the obtained results clearly shows that the proposed approach allows the correct classification of more than 90% of the test site. Table 2 shows that the overall correct classification rate for our features is 91.26%, versus 82.99% when using the Kaplan features and 70.43% when using the wavelet histogram signatures.
5 Conclusions and Further Comments

In this study, a fuzzy-based multi-source data fusion classifier is developed and applied to land cover classification using ERS-1/JERS-1 SAR composites. This classifier aims at the integration of multi-source, contextual and a priori information in a single, homogeneous framework. Initial fuzzy membership maps to the different thematic classes are first calculated using a priori knowledge about the classes and sensors. These maps are then iteratively updated using spatial contextual information. A classification rule is associated with the different iterations. This classifier has the following advantages: first, due to the use of fuzzy concepts, it has the flexibility of integrating multi-source, contextual and a priori information in a single, homogeneous framework. Second, the classification results consist of a thematic map as well as a confidence map. The confidence map is an important tool for evaluating the complexity of the classification process and the validity of the assumptions used. The application of this classifier to ERS-1/JERS-1 SAR composites is shown to be very promising. The proposed approach is currently being studied in order to introduce, through the computation of the fuzzy membership maps, a priori knowledge concerning each sensor's thematic class discrimination power.
Acknowledgment

This work was supported by the Natural Science Foundation of Hubei Province, China (No. 2005ABA232). The authors would like to thank the reviewers for providing valuable comments and insightful suggestions that have improved several aspects of this manuscript.
References

1. Thomas, N., Hendrix, C., Congalton, R.G.: A Comparison of Urban Mapping Methods Using High-resolution Digital Imagery. Photogrammetric Engineering and Remote Sensing (2003) 963-972
2. Franklin, S.E., Hall, R.J., Moskal, L.M., Maudie, A.J., Lavigne, M.B.: Incorporating Texture into Classification of Forest Species Composition from Airborne Multispectral Images. International Journal of Remote Sensing (2000) 61-79
3. Jiao, Z., Li, X., Wang, J.: A New Fusion Algorithm Based on Classification. Journal of Image and Graphics (2002) 71-775
4. Benz, U.C., Hoffmann, P., Willhauck, G., Lingenfelder, I., Heynen, M.: Multi-resolution Object-oriented Fuzzy Analysis of Remote Sensing Data for GIS-ready Information. ISPRS Journal of Photogrammetry and Remote Sensing (2004) 239-258
5. Itti, L., Koch, C.: A Saliency-based Search Mechanism for Overt and Covert Shifts of Visual Attention. Vision Research (2000) 1489-1506
6. Bosse, E., Roy, J., Paradis, S.: Modelling and Simulation in Support of Design of a Data Fusion System. Information Fusion (2000) 77-87
7. Shaw, G., Manolakis, D.: Signal Processing for Hyperspectral Image Exploitation. IEEE Signal Processing Magazine (2002) 12-16
8. Jeon, B., Landgrebe, D.: Decision Fusion Approaches for Multitemporal Classification. IEEE Transactions on Geoscience and Remote Sensing (1999) 1227-1233
9. Kaplan, L.M.: Texture Roughness Analysis and Synthesis via Extended Self-similar (ESS) Model. IEEE Trans. on PAMI (1995) 1043-1056
10. Wouwer, G., Scheunders, P., Dyck, D.: Statistical Texture Characterization from Discrete Wavelet Representations. IEEE Trans. on Image Processing (1999) 592-598
NDFT-Based Audio Watermarking Scheme with High Robustness Against Malicious Attack

Ling Xie, Jiashu Zhang, and Hongjie He

Sichuan Key Lab of Signal and Information Processing, Southwest Jiaotong University, Chengdu 610031, China
[email protected]
Abstract. Because the public frequency points of DFT-based audio watermarking algorithms lead to weak robustness against malicious attack, a novel audio watermarking scheme based on the nonuniform discrete Fourier transform (NDFT) is proposed in this paper to improve robustness against malicious attack. The NDFT can place sampling points in the frequency domain as desired, rather than at the fixed frequency points of the DFT. This characteristic of the NDFT makes it possible to hide the embedding positions. Within the NDFT-domain selectable frequency range that satisfies the necessary requirements of an audio watermarking scheme, the proposed scheme uses a chaotic system to select random, hidden embedding points. Theoretical analysis and simulation results show that the proposed scheme not only guarantees robustness against common operations, but also further improves system security and robustness against malicious attack thanks to the secret embedding positions.
1 Introduction

With the rapid development of the Internet and the maturity of audio compression technology, network music, represented by MP3, is spreading more and more widely. On the other hand, this also leads to unscrupulous duplication and piracy of music products, which causes enormous harm to the interests of authors and publishers. The need to protect intellectual property has become more and more important [1-2]. Recently, robust audio watermarking techniques have become an effective solution for copyright protection and authentication [1]. Transform-domain watermarking algorithms achieve very good inaudibility with respect to the Human Auditory System (HAS) and strong robustness against common signal processing operations [3-5]. One typical representative quantizes discrete Fourier transform (DFT) parameters (amplitude or phase) to embed the watermark, as proposed by Wang and Sun [5], in which each segment of 8 samples is transformed by the DFT. The frequency points of the DFT are [0, π/4, π/2, 3π/4, π, 5π/4, 3π/2, 7π/4]. Generally, the middle frequency point π/2 is considered a good tradeoff between inaudibility and robustness for embedding the watermark. The scheme has very high robustness against filtering, compression, and re-sampling. However, the fixed embedding position in the DFT domain is a hidden security weakness. According to Kerckhoffs' principle of 1883, the security of a cipher or
other mechanism must rely solely on the secret key and not on the secrecy of the algorithm [6]. As the DFT-based algorithm itself is public, the embedding position for every segment of the audio signal is fixed and public. This serious weakness can be exploited to destroy the watermark maliciously with minimal effort. For example, a malicious attack that adds or subtracts a constant value at the frequency point π/2 can be designed against [5]; the watermark can no longer be extracted from the attacked signal, while the attack remains inaudible. That is to say, DFT-based audio watermarking algorithms contain a security hole against malicious attack. Because of the above problem, a robust audio watermarking scheme based on the nonuniform discrete Fourier transform (NDFT) is proposed to improve robustness against malicious attack by hiding the watermark embedding positions. The NDFT's ability to use arbitrary frequency points makes it possible to hide the embedding positions, and a chaotic system is used to select random embedding positions in the selectable frequency range of the NDFT domain. Theoretical analysis and simulation results show that the proposed scheme further enhances robustness against malicious attack while keeping comparable robustness against normal signal processing operations.
2 NDFT-Based Watermarking Scheme

2.1 1-D NDFT Transform

The 1-D NDFT and its inverse transform (INDFT) are:

$$F(k_m) = \sum_{n=0}^{N-1} f(n)\, e^{-j k_m n}, \qquad f(n) = \frac{1}{M} \sum_{m=0}^{M-1} F(k_m)\, e^{j k_m n}, \qquad m = 0, \dots, M-1. \tag{1}$$
where $k_m$ can be any real number within $[0, 2\pi)$ and M is the number of frequency sampling points. The difference between the DFT and the NDFT lies in the method of choosing the frequency sampling points, i.e., the choice of $k_m$, $m = 1, \dots, M$ in formula (1). Fig. 1 shows the two choices for M = 8: (a) shows the equal-interval sampling of the DFT, where $k_m = (2\pi/8)\, m$, $m = 0, 1, \dots, 7$; (b) shows one form of nonuniform-interval sampling, where $k_m \in [0, 2\pi)$, $m = 0, 1, \dots, 7$. The proposed scheme adopts the following method [7] to realize the NDFT. An NDFT with N points can be expressed through arbitrary N points on the unit circle in the Z-plane, i.e.:

$$X(z_k) = \sum_{n=0}^{N-1} x(n)\, z_k^{-n}, \qquad k = 0, 1, \dots, N-1. \tag{2}$$
Fig. 1. Sketch of the DFT and NDFT sampling methods in the frequency domain: (a) DFT; (b) NDFT
where $z_0, z_1, \dots, z_{N-1}$ are arbitrary N distinct points in the Z-plane. The matrix expression of formula (2) is given by:

$$X = Dx \tag{3}$$

where

$$X = \begin{bmatrix} X(z_0) \\ X(z_1) \\ \vdots \\ X(z_{N-1}) \end{bmatrix}, \qquad x = \begin{bmatrix} x(0) \\ x(1) \\ \vdots \\ x(N-1) \end{bmatrix}, \qquad D = \begin{bmatrix} 1 & z_0^{-1} & z_0^{-2} & \cdots & z_0^{-(N-1)} \\ 1 & z_1^{-1} & z_1^{-2} & \cdots & z_1^{-(N-1)} \\ \vdots & & & & \vdots \\ 1 & z_{N-1}^{-1} & z_{N-1}^{-2} & \cdots & z_{N-1}^{-(N-1)} \end{bmatrix}.$$

The matrix D is a Vandermonde matrix, entirely determined by the N points $z_k$. Its determinant is:

$$\det(D) = \prod_{i \neq j,\ i > j} \left( z_i^{-1} - z_j^{-1} \right). \tag{4}$$

Since the points are distinct, D is nonsingular, so the INDFT exists and is unique: $x = D^{-1} X$.
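In code, the forward NDFT is a matrix-vector product with D, and the INDFT solves the Vandermonde system; a minimal sketch (ours, assuming NumPy and M = N distinct frequency points on the unit circle):

```python
import numpy as np

def ndft(x, k):
    """Forward NDFT of x at arbitrary frequency points k (radians in [0, 2*pi))."""
    n = np.arange(len(x))
    D = np.exp(-1j * np.outer(k, n))     # rows are z_k^{-n} with z_k = e^{jk}
    return D @ x

def indft(X, k):
    """Inverse NDFT: x = D^{-1} X; unique when all points k are distinct."""
    n = np.arange(len(X))
    D = np.exp(-1j * np.outer(k, n))
    return np.linalg.solve(D, X)
```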
2.2 NDFT-Based Watermarking Algorithm

2.2.1 The Selectable Frequency Range in NDFT

In the proposed scheme, a 16-bit, 44.1 k samples/s audio signal named '1.wav' is used as the carrier, and a binary image 'swjtu.bmp' is used as the watermark. They are shown in Fig. 2. The signal-to-noise ratio (SNR) is used to evaluate the inaudibility of the watermarked audio signal, and the normalized correlation coefficient is used to evaluate the similarity between the original watermark image and the extracted one, defined as:
$$\rho(W, W_r') = \frac{\sum_{i=1}^{M_1}\sum_{j=1}^{M_2} w(i,j)\, w_r'(i,j)}{\sqrt{\sum_{i=1}^{M_1}\sum_{j=1}^{M_2} w(i,j)^2}\ \sqrt{\sum_{i=1}^{M_1}\sum_{j=1}^{M_2} w_r'(i,j)^2}}. \tag{5}$$
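A direct transcription of equation (5) (ours, assuming NumPy):

```python
import numpy as np

def similarity(w, w_extracted):
    """Normalized correlation coefficient between two watermark images."""
    num = np.sum(w * w_extracted)
    den = np.sqrt(np.sum(w ** 2)) * np.sqrt(np.sum(w_extracted ** 2))
    return num / den
```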
Fig. 2. Original audio signal and watermark image ('swjtu.bmp')
Fig. 3. SNR and similarity curves as functions of the variable middle frequency point: (a) SNR; (b) similarity
In our experiments, in order to estimate the range of selectable middle frequency coefficients that can serve as embedding positions with good inaudibility and high robustness against common operations, the watermark is embedded, for each segment of 8 samples, with frequency points $[0, \pi/4, f_i, 3\pi/4, \pi, 5\pi/4, 2\pi - f_i, 7\pi/4]$, $f_i \in (\pi/4, 3\pi/4)$, in the NDFT domain. With a step of $\pi/2048$ for $f_i$, Figure 3(a) shows the SNR curve as a function of $f_i$. To test the robustness, a certain amount of Gaussian noise is added to the watermarked signal; Figure 3(b) shows the similarity between the original watermark and the watermark extracted from the noisy watermarked signal. The thresholds are SNR = 32 dB and ρ = 0.8, which are considered to indicate good inaudibility and successful extraction. From the above experiments it can easily be seen that the middle frequency variable range is [817π/2048, 1329π/2048] by the criterion SNR ≥ 32 dB
and [527π/2048, 1535π/2048] by ρ ≥ 0.8; their intersection, [817π/2048, 1329π/2048], is the selectable frequency range in the NDFT domain, as shown in Fig. 1(b).

2.2.2 Embedding Algorithm

In order to conveniently compare the NDFT-based with the DFT-based watermarking algorithm, the proposed scheme uses a visible binary image as the watermark and adopts the same dimension-reduction and encryption methods as [5]. The embedding steps are as follows:
(1) Segment the audio signal; each segment is denoted $Ae(k)$.
(2) Select an embedding frequency point $f(k)$. For each segment $Ae(k)$, an embedding frequency point $f(k)$ in the selectable frequency range calculated in Sect. 2.2.1 is chosen, based on a chaotic secret key $x_k$, for carrying out the NDFT. Here, the Tent chaotic map is adopted to generate the secret key sequence. The Tent map is defined as:

$$x_{1,k+1} = \begin{cases} x_{1k}/a, & 0 \leq x_{1k} < a \\ (1 - x_{1k})/(1 - a), & a \leq x_{1k} \leq 1 \end{cases} \tag{6}$$
where a is a constant and 0 < a < 1; here a = 0.351, and the secret key serves as the initial value of the Tent map.
(3) Carry out the NDFT with the chosen frequency points for every segment: $Fe(k) = \mathrm{NDFT}(Ae(k), f(k))$.
(4) Quantize the chosen frequency coefficient to embed the reduced, encrypted watermark [5]; the watermarked coefficient is denoted $f(k)_w$. During this process, two aspects deserve attention:
(4a) To guarantee that the temporal values obtained by the INDFT remain real numbers, embedding is implemented under a positive symmetric condition similar to that of the DFT; that is, at the chosen frequency point, $f(k) = f^*(N-k)$. The positive symmetric condition is defined as:

$$|F(k)| \leftarrow |F(k)| + \varepsilon, \tag{7a}$$
$$|F(N-k)| \leftarrow |F(N-k)| + \varepsilon. \tag{7b}$$
(4b) The value of the quantization step: the quantization step and amplitude of [5] are used as a reference; here the quantization step is 5120, a compromise between inaudibility and robustness.
(5) Apply the INDFT to the embedded frequency coefficients; the segmented watermarked audio signal is obtained as $Ae(k)_w = \mathrm{INDFT}(f(k)_w)$.
(6) Combine the segmented watermarked audio signals into the watermarked audio signal $A_w$.
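Putting steps (2)-(4) together for one 8-sample segment, a minimal sketch (ours, assuming NumPy; the even/odd magnitude quantization is a common stand-in for the exact quantization rule of [5], and the function and constant names are not from the paper):

```python
import numpy as np

A_TENT = 0.351      # Tent map constant used in the paper
DELTA = 5120        # quantization step used in the paper

def tent_next(x, a=A_TENT):
    """One iteration of the Tent map (6); the initial value is the secret key."""
    return x / a if x < a else (1 - x) / (1 - a)

def embed_segment(segment, bit, x_key, f_lo=817*np.pi/2048, f_hi=1329*np.pi/2048):
    """Embed one watermark bit into an 8-sample segment at a key-chosen
    middle frequency point inside the selectable range."""
    x_key = tent_next(x_key)                       # next chaotic state
    f = f_lo + x_key * (f_hi - f_lo)               # hidden embedding point
    k = np.array([0, np.pi/4, f, 3*np.pi/4, np.pi, 5*np.pi/4, 2*np.pi - f, 7*np.pi/4])
    D = np.exp(-1j * np.outer(k, np.arange(8)))    # NDFT matrix
    F = D @ segment
    # Even/odd quantization of the magnitude encodes the bit; keeping
    # |F(k)| = |F(N-k)| preserves the symmetry so the INDFT stays real
    mag = DELTA * (2 * np.round(np.abs(F[2]) / (2 * DELTA)) + bit)
    for i in (2, 6):
        F[i] = mag * np.exp(1j * np.angle(F[i]))
    return np.real(np.linalg.solve(D, F)), x_key
```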
2.2.3 Extracting Algorithm

In this paper, extracting the watermark does not require the original audio signal, since the NDFT amplitude coefficients are quantized. The extraction process is the inverse of embedding; the detailed steps are:

(1) Segment the received audio signal $A^*$ with the same length as at the sender; each segment is denoted $Ae(k)^*$.
(2) Carry out the NDFT for each $Ae(k)^*$ at the same embedding frequency points, based on the shared chaotic secret key: $Fe(k)^* = \mathrm{NDFT}(Ae(k)^*, f(k))$.
(3) Extract the watermark by the inverse quantization rule at the chosen frequency point $f(k)$.
3 Performance Analyses and Simulation

3.1 Performance Analyses
The proposed NDFT-based audio watermarking scheme embeds the watermark at a key-based hidden frequency point within the NDFT-domain selectable frequency range. The performance analysis is as follows:
(1) Comparable robustness against common signal processing operations: the NDFT-domain selectable frequency range is chosen as a tradeoff between inaudibility and robustness against normal operations. All embedded frequency points come from this range, so the NDFT-based watermarking algorithm has robustness against common signal processing operations comparable to the DFT-based one.
(2) High robustness against malicious attack: the selectable frequency range makes hiding possible, and the embedding point chosen by the chaotic secret key yields a hidden, random embedding position. These two measures guarantee the privacy of the embedding position, giving the NDFT-based audio watermarking algorithm higher robustness against malicious attack than the fixed embedding positions of the DFT-based scheme. An intruder without the secret key will therefore find it difficult to mount a malicious attack on the NDFT-based scheme.
(3) High system security: chaotic systems offer high security due to their extreme sensitivity to initial values. In the proposed method, the chaotic Tent map is used to choose the embedding position, and the initial value of the Tent map serves as the system secret key. The security of the whole system therefore relies only on the secret key, which gives it high system security.

3.2 Robustness Against Common Operations
The experimental results are shown in Table 1, where the MP3 compression is at 128 kbps; the re-sampling attack changes the sampling rate from 44.1 kHz to 96 kHz and back to 44.1 kHz; the re-quantization attack changes the quantization from 16 bits to 8 bits and back to 16 bits; and the low-pass filter is of order 6 with a 22.05 kHz cutoff frequency.
Table 1. Performance comparison

                       DFT [5]               NDFT (proposed)
                       ρ        SNR          ρ        SNR
No attack              1        35.4072      1        34.5925
MP3 compression        0.9989   35.1246      1        34.2847
Re-sampling            0.9998   35.3736      0.9883   33.7254
Re-quantization        1        35.1785      1        34.4072
Low-pass filtering     1        35.2881      1        34.5542

ρ: degree of similarity; SNR: signal-to-noise ratio.
From Table 1 we can easily see that the robustness against common signal processing operations is as high as that of the DFT-based scheme [5]. Even though the SNR is slightly lower than in [5], the influence on the hearing sense is negligible.

3.3 Robustness Against Malicious Attack
In this experiment, we adopt the attack method mentioned in the introduction to compare the robustness of the NDFT-based scheme with that of the DFT-based one. A fraction of the watermarked audio signal is subjected to the malicious attack, and the similarity between the extracted and the original watermark is used as the performance index.
Fig. 4. Comparison of similarity between the NDFT-based and DFT-based methods
Fig. 4 shows the experimental results. From the results, it is clear that regardless of the attack strength, the NDFT-based watermarking algorithm always yields higher similarity than the DFT-based one. Therefore, the NDFT-based audio watermarking algorithm has stronger robustness against malicious attack due to
the secret embedding position, in contrast to the public position in the DFT domain. This means that if an attacker wants to destroy the watermark in an NDFT-based watermarked audio signal, he has to pay a much higher cost in terms of audible distortion, which is usually not tolerable or acceptable for the receiver.
4 Conclusion

In this paper, a robust NDFT-based audio watermarking scheme with high robustness against malicious attack is proposed. The scheme exploits the NDFT's ability to place sampling points arbitrarily in the frequency domain in order to hide the embedding positions. In addition, a chaotic system is adopted to select random, hidden embedding points, so the embedding points within the selectable frequency range remain private. Consequently, the proposed method has higher robustness against malicious attack thanks to the secret embedding position, in contrast to the public position in the DFT domain. The performance analyses and simulation results show that the proposed scheme not only guarantees robustness against common operations, but also further improves robustness against malicious attack and system security.
5 Future Work

In further work, we will study the computational cost of this algorithm and investigate whether the Fast Fourier Transform could benefit a fast nonuniform Fourier transform. Moreover, the SNR values of the proposed scheme need to be improved.
References

1. Bender, W., Gruhl, D., Morimoto, N., Lu, A.: Techniques for Data Hiding. IBM Systems Journal (1996) 313-336
2. Wu, S.Q., Huang, J.W., Huang, D., Shi, Y.Q.: Efficiently Self-synchronized Audio Watermarking for Assured Audio Data Transmission. IEEE Transactions on Broadcasting (2005) 69-76
3. Yang, H.Y., Wang, X.Y., Zhao, H.: A New Digital Audio Watermarking Algorithm for Copyright Protection. Microelectronics and Computer (2004) 12-18
4. Khan, M.K., Zhang, J.S., Tian, L.: Protecting Biometric Data for Personal Identification. Lecture Notes in Computer Science, Springer-Verlag (2004) 629-638
5. Wang, Q.S., Sun, S.H.: Watermark Embedding Algorithms Based on Quantizing Frequency Domain Parameters of Digital Audio Signal. Acta Acustica (2002) 379-385
6. Kerckhoffs, A.: La Cryptographie Militaire. Journal des Sciences Militaires, 9th series (1883) 5-38, 161-191
7. Bagchi, S., Mitra, S.K.: The Nonuniform Discrete Fourier Transform and Its Application in Filter Design. IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing (1996) 422-433
New Multiple Regions of Interest Coding Using Partial Bitplanes Scaling for Medical Image Compression

Li-bao Zhang and Ming-quan Zhou

College of Information Science and Technology, Beijing Normal University, 100875, Beijing, China
[email protected]
Abstract. In this paper, a new Multiple Regions of Interest (MROIs) coding method for medical image compression is proposed. The new method takes full advantage of two scaling strategies: Multiple Bitplanes Grouping (MBG) and Partial Bitplanes Scaling (PBS). According to the different importance of the ROI and BG (background) bitplanes, the MBG divides all bitplanes into three parts: MSRB, GSRB and LSB. The PBS scheme then performs the bitplane shifts. When MROIs are coded in an image, the new method can adjust the bitplane scaling value of every ROI in the GSRB based on the ROI's degree of interest. The experimental results show that the new method not only has the advantages of the Maxshift method, but also supports MROIs coding with different degrees of interest without any ROI shape information.
1 Introduction

Medical image compression is necessary because imaging modalities such as CT or MRI, which produce digital pictures of the human body, generate prohibitive amounts of data. Additionally, the wireless transmission of medical images and telemedicine applications require efficient, high-quality medical image compression methods. Many current compression schemes provide a very high compression rate but with considerable loss of quality. On the other hand, in some areas of medicine it may be sufficient to maintain high image quality only in the Region of Interest (ROI), i.e., in the diagnostically important regions. Based on these facts, ROI coding for medical images has been proposed. In recent years, several ROI coding technologies for medical image compression have been proposed [1-3]. In [1] and [3], a Region-Based Discrete Wavelet Transform (RBDWT) is proposed, and based on the RBDWT an Object-Based extension of the Set Partitioning in Hierarchical Trees (OB-SPIHT) coding algorithm is applied to region-based medical image compression, improving the compression efficiency over full-image methods. In [3], I. Ueno and W. A. Pearlman perform ROI coding using 3D-SPIHT on medical volumetric images. Although the above algorithms support ROI coding of medical images, they utilize zerotree or hierarchical-tree schemes, which are incompatible with the JPEG2000 standard.
In the JPEG 2000 standard, two coding algorithms, the so-called Maxshift (maximum shift) method in Part 1 [4-6] and the general scaling-based method in Part 2 [4-6], are specified along with the syntax of a compressed codestream. In these methods, regions of interest can have better quality than the rest of the image at any decoding bit-rate; in other words, the quality is distributed non-uniformly inside the image. The Maxshift method [4] is efficient, but it cannot support Multiple Regions of Interest (MROIs) coding with different degrees of interest, since all ROIs in an image have the same scaling value. The general scaling-based method can support MROIs coding, but it needs to code every ROI shape, which not only increases coding complexity but also restricts the ROI shape selection. In this paper, we propose a new and flexible MROIs coding method for medical image compression, called Multiple Bitplanes Grouping and Partial Bitplanes Scaling (MBG-PBS). The MBG strategy divides all bitplanes into three parts: MSRB, GSRB and LSB; the PBS scheme then performs the bitplane shifts. When MROIs are coded in an image, according to the different significance of different areas in a CT or MRI image, the new method can adjust the bitplane scaling value of every ROI in the GSRB based on the ROI's degree of interest. The experimental results show that the new method not only has the advantages of the Maxshift method, but also supports MROIs coding with different degrees of interest without any ROI shape information. The remainder of this paper is structured as follows. In Section 2, the main ROI coding methods for medical images are reviewed. In Section 3, the MBG-PBS method is presented. In Section 4, MROIs coding based on the MBG-PBS method is proposed, and the experimental results are given in Section 5. Finally, the conclusions are drawn in Section 6.
2 The Significant ROI Coding Methods

The ROI functionality is important in applications where certain parts of the image are of higher importance than others. In such a case, these ROIs need to be encoded at higher quality than the background, and during transmission these regions need to be transmitted first or at a higher priority, as for example in progressive transmission. ROI coding is based on wavelet transforms and the lifting scheme. Two basic coding strategies are presented in the literature: ROI coding based on zerotree or zeroblock schemes, and ROI coding based on bitplane coding and EBCOT [4-7]. The former has been researched and applied widely, for example EZW, SPIHT and SPECK for volumetric medical data [1-3]; however, these methods have high coding complexity and do not provide spatial scalability. The latter achieves spatial scalability and reduces coding complexity, which is why it is recommended by JPEG2000.

2.1 The General Scaling Based Method

The general scaling based method is recommended by Part 2 of the JPEG2000 standard. In this method, regions of interest can have better quality than the rest of the image at any
decoding bit-rate; in other words, the quality is distributed non-uniformly inside the image. The general scaling-based method supports bitplane scaling with an arbitrary value, and so allows fine control of the relative importance of the ROI and the BG. However, the general scaling based method has two major drawbacks [7-9]. First, it needs to encode and transmit the shape information of the ROIs, which rapidly increases the algorithm complexity. Second, if arbitrary ROI shapes are desired, the shape coding will consume a large number of bits, which significantly decreases the overall coding efficiency.

2.2 Maxshift Method

To solve these problems of the general scaling based method, JPEG2000 proposes the Maxshift method, which is the particular case of the general scaling-based method in which the scaling value is so large that there is no overlap between BG and ROI bitplanes. After scaling, all significant bits associated with the ROI lie in higher bitplanes than all significant bits associated with the background. Therefore, the ROI shape is implicit for the decoder, and arbitrarily shaped ROI coding can be supported. Figure 1 shows these two ROI coding methods in JPEG2000. The Maxshift method has two main disadvantages [7-9]. First, it does not allow an arbitrary scaling value to define the relative importance of the ROI and BG coefficients, as the general scaling-based method does. This means that, in all subbands, no information about the non-ROI coefficients can be received until every detail of the ROI coefficients has been fully decoded, even if that detail is imperceptible random noise or unnecessary information. Second, the EBCOT algorithm in JPEG2000 also increases the complexity of ROI coding. Figure 1 shows the diagram of the Maxshift method.

Fig. 1. Standard ROI coding methods in JPEG2000, where coefficient bitplanes are represented by the gray blocks: no scaling (top); the general scaling based method (middle); the Maxshift method (bottom)
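A minimal sketch of the Maxshift idea (ours, assuming NumPy and integer coefficients handled in sign-magnitude form; real JPEG2000 codecs operate inside the EBCOT pipeline): the scaling value s is chosen so that 2^s exceeds every BG magnitude, so the decoder recognizes ROI coefficients implicitly.

```python
import numpy as np

def maxshift_encode(coeffs, roi_mask):
    """coeffs: integer wavelet coefficients; roi_mask: boolean ROI membership."""
    mag = np.abs(coeffs)
    s = int(np.ceil(np.log2(mag[~roi_mask].max() + 1)))  # 2**s > every BG magnitude
    mag[roi_mask] <<= s                                   # shift ROI bitplanes up by s
    return np.sign(coeffs) * mag, s

def maxshift_decode(shifted, s):
    mag = np.abs(shifted)
    roi_mask = mag >= (1 << s)        # ROI shape is implicit: no shape coding needed
    mag[roi_mask] >>= s
    return np.sign(shifted) * mag, roi_mask
```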
2.3 ROI Coding Based on RBDWT and OB-SPIHT

Region of interest (ROI) coding techniques are particularly suitable for medical imaging. Such methods make it possible to compress the regions with diagnostic relevance at better quality than the rest of the image. In [1] and [3], W. A. Pearlman and co-workers proposed the OB-SPIHT coding algorithm, an object-based extension of the full-image SPIHT algorithm that preserves the features of the original method. When OB-SPIHT is used with the RBDWT, it is possible to efficiently perform wavelet subband decomposition of a region of arbitrary shape, while keeping the number of wavelet coefficients to be coded equal to the number of pixels within the region and preserving spatial correlation and self-similarity across subbands. The main drawback of the RBDWT in ROI coding is that the shape information of the ROI must be coded, which significantly decreases the ROI coding efficiency.
3 The Basic Scheme of MBG-PBS Method

The ROI coding system based on the MBG-PBS method includes three parts: integer wavelet transform, ROI mask generation, the MBG-PBS method, and bitplane encoding. In the basic MBG-PBS method, the ROI should obtain higher quality than the BG at low bit rates, while at high bit rates both ROI and BG should be coded with high quality so that the difference between them is not very noticeable [8], [9].

3.1 ROI Mask Generation in the MBG-PBS Method

When an image is coded with one or multiple ROIs, it should be possible to reconstruct the entire ROI at a higher bit rate than the BG part. It is therefore necessary to identify the wavelet coefficients needed for the reconstruction of the ROIs, so that they can be coded at a higher quality. For this purpose an ROI mask is created. This mask indicates which wavelet coefficients belong to an ROI; in fact, if all coefficients in the ROI mask are losslessly encoded, the ROI can be losslessly reconstructed. This identification is performed for lines and columns at each decomposition level, and the process is repeated for the remaining levels until the entire wavelet tree is processed. Figure 2 gives the ROI mask of a CT image, for two ROIs and a three-level wavelet decomposition.
Fig. 2. ROI mask of a CT image with two ROIs
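A deliberately simplified sketch of the mask generation (ours, assuming NumPy/SciPy, even image dimensions, and a fixed dilation standing in for the exact 5/3 filter support):

```python
import numpy as np
from scipy.ndimage import binary_dilation

def roi_mask_pyramid(mask, levels, support=2):
    """Per-level subband masks for a binary spatial ROI mask.

    At each level the mask is dilated to cover the wavelet filter support,
    then subsampled onto the coefficient grids of the four subbands."""
    masks, m = [], mask.astype(bool)
    for _ in range(levels):
        m = binary_dilation(m, iterations=support)
        even, odd = m[::2, ::2], m[1::2, 1::2]
        masks.append({'LL': even, 'LH': even | odd, 'HL': even | odd, 'HH': odd})
        m = even                      # recurse on the approximation band
    return masks
```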
3.2 Scaling Strategy of the MBG-PBS Method

Although the MBG-PBS method still applies bitplane shifting, it differs from the Maxshift method. The new method takes full advantage of two scaling strategies: Multiple Bitplanes Grouping (MBG) and Partial Bitplanes Scaling (PBS). First, according to the different importance of the ROI and BG bitplanes, the MBG divides all bitplanes into three parts: MSRB, GSRB and LSB. Second, the PBS scheme shifts the bitplanes in the MSRB and GSRB; the bitplanes in the LSB are not shifted. Figure 3 shows the diagram of the MBG-PBS method for single ROI coding.
Fig. 3. Diagram of MBG-PBS method in single ROI coding
3.3 Basic Steps of the MBG-PBS Method

In this paper, we index the original bottom bitplane (original LSB) as bitplane 1, the next one above it as bitplane 2, the next one below it as bitplane −1, and so on. At the encoder, the MBG-PBS method can be described as follows:

1. Bitplane scaling parameter definition:
$S_{MSR-1}$: the number of bitplanes in the first group belonging to the MSRB in the ROI;
$S_{MSR-2}$: the number of bitplanes in the second group belonging to the MSRB in the ROI;
$S_{MSB}$: the number of bitplanes belonging to the MSRB in the BG;
$S_{GSR}$: the number of bitplanes belonging to the GSRB in the ROI;
$S_{GSB}$: the number of bitplanes belonging to the GSRB in the BG;
$S_{LSB}$: the number of bitplanes belonging to the LSB in the BG.

2. For any bitplane b of an ROI coefficient:
   If $b \in S_{GSR}$, do not shift; encode directly.
   If $b \in S_{MSR-2}$, scale b up to bitplane $b + S_{LSB} + S_{GSB} - S_{GSR}$.
   If $b \in S_{MSR-1}$, scale b up to bitplane $b + S_{LSB} + S_{GSB} + S_{MSB} - S_{GSR}$.

3. For any bitplane b of a BG coefficient:
   If $b \in S_{GSB}$, do not shift; encode directly.
   If $b \in S_{LSB}$, scale b down to bitplane $-b$.
   If $b \in S_{MSB}$, scale b up to bitplane $b + S_{MSR-2}$.

At the decoder, for any given non-zero wavelet coefficient, the first step is to identify whether a bitplane belongs to an ROI coefficient or to a BG coefficient. The ROI decoding algorithm is as follows:
1. If $b < 0$, then $b \in BG$; scale b up to bitplane $-b$.
2. If $0 < b \leq S_{GSB} + S_{LSB}$, then $b \in BG$ or $b \in ROI$; no shift, decode directly.
3. If $S_{GSB} + S_{LSB} < b \leq S_{MSR-2} + S_{GSB} + S_{LSB}$, then $b \in ROI$; scale b down to bitplane $b - S_{GSB} - S_{LSB} + S_{GSR}$.
4. If $S_{MSR-2} + S_{GSB} + S_{LSB} < b \leq S_{MSB} + S_{MSR-2} + S_{GSB} + S_{LSB}$, then $b \in BG$; scale b down to bitplane $b - S_{MSR-2}$.
5. If $b > S_{MSR-2} + S_{MSB} + S_{GSB} + S_{LSB}$, then $b \in ROI$; scale b down to bitplane $b - S_{MSB} - S_{GSB} - S_{LSB} + S_{GSR}$.
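The decoder rules amount to a simple mapping from a received bitplane index b back to its origin and original position; a minimal sketch for the single-ROI case (ours; the s_* arguments are the group sizes defined above):

```python
def descale_bitplane(b, s_gsr, s_gsb, s_lsb, s_msb, s_msr2):
    """Map a received bitplane index b back to (origin, original bitplane),
    following decoder rules 1-5."""
    if b < 0:                                         # rule 1: down-shifted BG LSB
        return 'BG', -b
    if b <= s_gsb + s_lsb:                            # rule 2: unshifted GSRB
        return 'ROI or BG', b
    if b <= s_msr2 + s_gsb + s_lsb:                   # rule 3: second ROI group
        return 'ROI', b - s_gsb - s_lsb + s_gsr
    if b <= s_msb + s_msr2 + s_gsb + s_lsb:           # rule 4: BG MSRB
        return 'BG', b - s_msr2
    return 'ROI', b - s_msb - s_gsb - s_lsb + s_gsr   # rule 5: first ROI group
```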
4 MROIs Coding Based on MBG-PBS Method

In JPEG2000, both the Maxshift method and the general scaling based method can support MROIs coding; however, each has its own drawbacks. The main drawback of the Maxshift method is that the coefficient bitplanes of all ROIs must be scaled by the same value, so it lacks the flexibility of an arbitrary scaling value to define the relative importance of the ROIs and the BG wavelet coefficients, and it cannot code ROIs according to different degrees of interest. Additionally, in the Maxshift method, no bitplane of the BG coefficients can be decoded until all bitplanes of all ROIs have been decoded. The general scaling based method can offer MROIs coding with different degrees of interest, but it needs to encode the shape information of the ROIs, which increases the complexity of the encoder/decoder as the number of ROIs grows. Moreover, when arbitrary ROI shapes are desired, the shape coding of the ROIs consumes a large number of bits, which reduces the overall coding efficiency.
Fig. 4. Diagram of MBG-PBS method in MROIs coding
The MBG-PBS method not only supports arbitrary ROI shapes without shape coding, but also allows an arbitrary scaling value between the ROIs and the BG, which enables flexible adjustment of the compression quality of the ROIs and BG according to different degrees of interest. The scheme of the MBG-PBS method for multiple ROI coding is illustrated in Figure 4. The encoding and decoding methods for MROIs are similar to those for a single ROI. However, a key point must be noticed: for each ROI,
$S_{MSR-2}$ and $S_{GSR}$ are variable, but the scaling value from $S_{GSR}$ to $S_{MSR-2}$ is a constant $S_v$ for every ROI. According to this rule, the identification of ROI coefficients is as simple as follows (i = 1, 2, 3, ..., n):

1. If $b < 0$, then $b \in BG$; scale b up to bitplane $-b$.
2. If $0 < b \leq S_{GSB} + S_{LSB}$, then $b \in BG$ or $b \in ROI$; no shift, decode directly.
3. If $S_{GSB} + S_{LSB} < b \leq S_{MSR-2-i} + S_{GSB} + S_{LSB}$, then $b \in ROI$; scale b down to bitplane $b - S_{GSB} - S_{LSB} + S_{GSR-i}$.
4. If $S_{MSR-2-i} + S_{GSB} + S_{LSB} < b \leq S_{MSB} + S_{MSR-2-i} + S_{GSB} + S_{LSB}$, then $b \in BG$; scale b down to bitplane $b - S_{MSR-2-i}$.
5. If $b > S_{MSR-2-i} + S_{MSB} + S_{GSB} + S_{LSB}$, then $b \in ROI$; scale b down to bitplane $b - S_{MSB} - S_{GSB} - S_{LSB} + S_{GSR-i}$.
In the above algorithm, n is the number of ROIs. As illustrated in Fig. 4, at low bit rates, different bitplanes are decoded according to the different degrees of ROI interest. At high bit rates, both ROIs and BG are coded with high quality, and the difference between them is not very noticeable. Additionally, the MBG-PBS method also allows some BG bitplanes to be encoded with priority if the ROI detail is imperceptible random noise or unimportant.
5 Experimental Results

For the examples provided here, one bitstream, in progressive-by-accuracy mode, is generated for each test image using the ROI coding scheme explained above. We select CT, MRI and standard test images and compare the MBG-PBS method with other efficient ROI coding methods.
Fig. 5. Comparison of different ROI coding methods for a CT image with a single ROI at 0.25 bpp; the original CT image (left); the image reconstructed using the Maxshift method (middle); the image reconstructed using the MBG-PBS method (right)
Figure 5 gives an original CT image and two reconstructed CT images with a single ROI. The left picture shows the original image with an arbitrarily shaped ROI whose covering area is about 5.02% of the whole image. The image is a 512 × 512 gray CT
image. In the encoding, we use the 5/3 wavelet filter bank, a five-level decomposition, and a decoding rate of 0.25 bpp. The middle picture is the CT image reconstructed using the Maxshift method, and the right picture is the CT image reconstructed using the MBG-PBS method. Figure 6 shows three CT images with two ROIs. The CT image is a 512 × 512 gray image. The whole covering area is about 4.35% of the CT image; the area of ROI-1 is about 2.30% and that of ROI-2 is 2.05%. The left picture shows the original image with two arbitrarily shaped ROIs. The middle picture is the CT image reconstructed using the MBG-PBS method at 0.25 bpp and the right picture at 0.5 bpp. ROI-1 is the top-left part and ROI-2 is the bottom-right part. Figure 7 shows three MRI images with three ROIs. The MRI image is a 512 × 512 gray image. The covering area of all ROIs is about 10.87% of the MRI image; the area of ROI-1 is about 2.06%, that of ROI-2 is 4.77% and that of ROI-3 is 4.04%. The left picture shows the MRI image reconstructed using the MBG-PBS method at 0.25 bpp. The middle picture is the reconstruction at 0.5 bpp and the right picture at 1.0 bpp. ROI-1 is the top-left part, ROI-2 is the bottom part and ROI-3 is the right part.
Fig. 6. The reconstructed CT image with two ROIs using the MBG-PBS method; the original CT image (left); 0.25 bpp (middle); 0.5 bpp (right); ROI-1 is the top-left part; ROI-2 is the bottom-right part
Fig. 7. The reconstructed MRI image with three ROIs using the MBG-PBS method; 0.25 bpp (left); 0.5 bpp (middle); 1.0 bpp (right); ROI-1 is the top-left part; ROI-2 is the bottom part; ROI-3 is the right part
Figure 8 shows three Goldhill images with two ROIs. The image is a 512 × 512 gray standard test image. The whole covering area is about 9.00% of the image; the area of ROI-1 is about 4.53% and that of ROI-2 about 4.47%. The left picture shows the original image with two arbitrarily shaped ROIs. The middle picture is the Goldhill image reconstructed using the MBG-PBS method at 0.25 bpp and the right picture at 1.0 bpp. ROI-1 is the top part and ROI-2 is the bottom part. The experimental results show that the presented method can efficiently code multiple ROIs with different degrees of interest without any shape information of the ROIs. Additionally, the new method can flexibly adjust the compression quality between ROI and background by selecting an arbitrary scaling value. Hence, the MBG-PBS method can support multiple ROI coding over a certain range of bit rates, which depends on the number of up-shifted bitplanes for each ROI.
Fig. 8. The reconstructed Goldhill image with two ROIs using the MBG-PBS method; the original image (left); 0.25 bpp (middle); 1.0 bpp (right); ROI-1 is the top part; ROI-2 is the bottom part
6 Conclusions

ROI coding is a very desirable feature for medical applications such as telemedicine, volumetric medical data compression and medical image analysis. The proposed scheme is highly flexible: it allows the user to request an ROI at any moment, new ROIs at any other moment, to switch between different ROIs, and to switch between ROI and BG transmission. It also supports lossless coding of the ROIs only or of the entire image. These features leave room for network server optimization; in fact, the network server can optimize the switching bit-rate as well as the exact geometry of the ROI. The MBG-PBS method has three main advantages. First, it supports arbitrarily shaped multiple ROI coding with different degrees of interest without coding the ROI shapes, which is very important for interactive network medical image transmission and distance diagnosis based on large images. Second, the new method allows some BG bitplanes to be encoded with priority if the ROI detail is not important. Finally, the complexity of the new scheme at the decoder is very low: only the ROI mask has to be generated and some arithmetic coder contexts reinitialized. We expect this idea to be valuable for future medical image coding based on MROIs.
References

1. Penedo, M., Pearlman, W.A., Tahoces, P.G., Souto, M., Vidal, J.J.: Region-Based Wavelet Coding Methods for Digital Mammography. IEEE Trans. on Medical Imaging, Vol. 22 (2003) 1288-1296
2. Tasdoken, S., Cuhadar, A.: Quadtree-based Multi-region Multi-quality Image Coding. IEEE Signal Processing Letters, Vol. 11 (2004) 101-103
3. Ueno, I., Pearlman, W.A.: Region of Interest Coding in Volumetric Images with Shape-Adaptive Wavelet Transform. SPIE/IS&T Electronic Imaging 2003, Vol. 5022 (2003) 1048-1055
4. ISO/IEC, ISO/IEC 15444-1: Information Technology JPEG 2000 Image Coding System - Part 1: Core Coding System. http://www.jpeg.org (2004)
5. Skodras, A., Christopoulos, C.A., Ebrahimi, T.: The JPEG 2000 Still Image Compression Standard. IEEE Signal Processing Magazine, Vol. 12 (2001) 36-58
6. Christopoulos, C., Askelf, J., Larsson, M.: Efficient Methods for Encoding Regions of Interest in the Upcoming JPEG 2000 Still Image Coding Standard. IEEE Signal Processing Letters, Vol. 7 (2000) 247-249
7. Zhang, L.-b., Wang, K.: New Regions of Interest Image Coding Using Up-Down Bitplanes Shift for Network Applications. Lecture Notes in Computer Science, Vol. 3222, Springer-Verlag, Berlin Heidelberg New York (2004) 513-516
8. Liu, L., Fan, G.: A New JPEG 2000 Region of Interest Image Coding Method: Partial Significant Bitplanes Shift. IEEE Signal Processing Letters, Vol. 10 (2003) 35-38
9. Zhang, L.-b.: A New Region of Interest Image Coding for Narrowband Network: Partial Bitplane Alternating Shift. Lecture Notes in Computer Science, Vol. 3779, Springer-Verlag, Berlin Heidelberg New York (2005) 425-432
Particle Swarm Optimization for Road Extraction in SAR Images

Ge Xu, Hong Sun, and Wen Yang

Signal Processing Laboratory, Wuhan University, Wuhan 430079, China
[email protected]
Abstract. This paper proposes a new method for road extraction in SAR images. We assume that a road in a SAR image can be represented by a B-spline curve. Firstly, we manually select the road's extremities. Secondly, we calculate each pixel's road membership value using a local road detector on the original SAR image. Thirdly, with particle swarm optimization, one of the most powerful methods for optimization problems, we obtain the optimal B-spline control points from the road detection result. Finally, from the optimal B-spline control points, we obtain the B-spline curve that constitutes the road extraction result. Experimental results show that the proposed method can accurately extract roads.
1 Introduction

Recently, research on extracting linear features, including roads, railroads and rivers, from optical or radar images has been carried out; most applications follow a two-step procedure: the first step detects pixels that are situated on the road, and the second step uses starting points or interest regions to find the best pixel candidates belonging to the road [1],[2]. The first step is usually implemented with an edge detection operator. Owing to speckle in SAR images, traditional edge detection operators are not practical, so approaches based on the statistical properties of the image or on ratios of region intensities have been developed. The second step needs to construct a road model and link the best pixel candidates using an active contour model (ACM) [3] or a Markov random field (MRF) [4]. However, ACM techniques need an initial contour close to the road and are quite sensitive to noise and local minima, while MRF methods need many parameters and their computational cost is high. In [5], the idea is to first segment the image data and recognize basic urban classes (vegetation, built areas and roads) by means of fuzzy clustering; the road is then extracted by means of FPCWHT (Fuzzy Pyramidal Connectivity Weighted Hough Transform), FPRT (Fuzzy Pyramidal Rotation Transform) and FSPE (Fuzzy Shortest Path Extraction). However, this method is clearly suited to straight roads, and its extraction accuracy is low for roads whose curvature exceeds a certain value.
To overcome the shortcomings of the above methods, this paper presents a new method. Compared to other methods, the proposed algorithm is distinctive in that it uses a swarm intelligence algorithm to improve accuracy and computational efficiency. It is well known that extracting roads in satellite images is difficult because these images contain speckle and their resolution is low. Owing to particle swarm optimization (PSO), our method can extract roads in space-borne SAR images with high accuracy and low computational cost. The overall flow is shown in figure 1.
Fig. 1. Overall flow of our method
Fig. 1 presents the overall flow of our method. After the extremities are selected, road extraction amounts to finding the optimal curve that passes through the two selected extremities; this optimal curve corresponds to the true road. In the local road detection step, we adopt the ratio line detector [4], and particle swarm optimization is then used to search for the optimal curve in the detection result. In the following, we discuss the road model in section 2 and local road detection in section 3. We describe particle swarm optimization and road extraction via PSO in section 4, and lastly we provide extraction results on real SAR images and conclusions.
2 Road Model in SAR Image

All the cited works make the following assumptions in road extraction: (1) the amplitude of pixels inside the road is locally homogeneous; (2) there is a significant contrast between the road and the background; (3) the road width varies slowly; (4) the local curvature is less than a certain value. Since SAR is a backscatter radar and the road surface is smooth, the amplitude of pixels on the road is low: in SAR images, the road appears darker than its neighborhood, as shown in figure 2.
Fig. 2. Road object in SAR image
The road can be described by a B-spline curve [6]. Since road curvature is generally low, we can characterize the road using a low-order B-spline curve, as shown in figure 3.
Fig. 3. Road represented with B-spline curve
In figure 3, the green B-spline curve has 5 control points (red points denote control points), and its order is 2. From figure 3 we can see that the road can be represented by a B-spline curve described by several control points. In a word, once we obtain a set of B-spline control points, we determine a unique curve. So road extraction can be viewed as finding the optimum B-spline control points corresponding to the true road. We discuss the method that finds the optimum control points with PSO in section 4.
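As an illustration of this road model, the following sketch (not the authors' code; the control-point coordinates are hypothetical) evaluates a degree-2 B-spline curve from its control points with SciPy:

```python
import numpy as np
from scipy.interpolate import BSpline

# Hypothetical control points in (row, col) image coordinates.
ctrl = np.array([[10.0, 5.0], [40.0, 30.0], [80.0, 35.0],
                 [120.0, 60.0], [150.0, 100.0]])
N, k = len(ctrl), 2                       # N control points, quadratic curve
# Clamped knot vector, so the curve passes through both extremities.
t = np.concatenate(([0.0] * k, np.linspace(0.0, 1.0, N - k + 1), [1.0] * k))
spline = BSpline(t, ctrl, k)              # vector-valued spline in 2-D
u = np.linspace(0.0, 1.0, 200)
curve = spline(u)                         # (200, 2) array of road points
```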
3 Local Road Detection

Using the statistical properties of road amplitude, we are able to detect the pixels on the road. We implement the detection procedure with a ratio line detector, modified from the ratio edge detector [4],[7]. Our ratio line detector is derived from coupling two such edge detectors on both sides of a region. The detector mask is shown in figure 4.
Fig. 4. Mask of road detection
Let index 1 denote the central region and indices 2 and 3 the two lateral regions (Fig. 4), and let $X_0$ be the pixel under detection. We define the radiometric mean $\mu_i$ of a given region $i$ as

$\mu_i = \frac{1}{n_i}\sum_{s \in i} A_s$  (1)

where the amplitude of pixel $s$ is denoted $A_s$ and $n_i$ is the number of pixels in region $i$. We then define the response of the edge detector between regions $i$ and $j$ as

$r_{i,j} = 1 - \min(\mu_i/\mu_j,\; \mu_j/\mu_i)$  (2)

So we define the response of the local road detector as

$r_{\mathrm{road}} = \min(r_{1,2},\, r_{1,3})$  (3)
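To make the detector concrete, the following is a minimal sketch (not the authors' code) of Eqs. (1)-(3) for one mask position, assuming the central and two lateral regions have already been sliced out of the amplitude image:

```python
import numpy as np

def ratio_line_response(central, lateral1, lateral2):
    # Radiometric means of the three mask regions, Eq. (1).
    mu = [float(np.mean(r)) for r in (central, lateral1, lateral2)]
    # Edge-detector response between regions i and j, Eq. (2).
    def r(i, j):
        return 1.0 - min(mu[i] / mu[j], mu[j] / mu[i])
    # Local road detector response, Eq. (3).
    return min(r(0, 1), r(0, 2))
```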
We take the response of the local road detector as the road membership value of a pixel. Generally speaking, the detector response of pixels on the road should be higher than that of other pixels; without speckle noise, the membership value of pixels on the road would clearly be high. Since the road width cannot be determined accurately, we compute each pixel's road membership value with masks of all plausible widths; similarly, we compute it with masks of different orientations. After local road detection, we obtain the membership value of every pixel in the image. In fact, we could extract part of the road pixels through a hard threshold decision, but some true road pixels with low local road response would be missed because of speckle noise in SAR images. Hence, finding the best pixel candidates belonging to the road and
extracting the curve from them is a very important process. Clearly, extracting the optimum curve from a great number of candidate pixels in 2-dimensional space is an NP-hard problem, and traditional methods such as dynamic programming often yield only locally optimal results. To obtain the global optimum, we implement road extraction with an optimization algorithm. In section 4, we introduce the PSO method and apply it to road extraction.
4 Particle Swarm Optimization (PSO)

4.1 Introduction of PSO

The PSO model was introduced in 1995 by J. Kennedy and R.C. Eberhart, discovered through simulation of simplified social models such as fish schooling or bird flocking [8],[9],[10]. It was originally conceived as a method for optimization of continuous nonlinear functions. In PSO, each potential solution is a point in the d-dimensional space, called a particle. Each particle also has an associated velocity, which takes into account the experience of its companions and of the particle itself. Each particle corresponds to a particular value of the fitness function, and each particle knows its current position and the best position it has reached (particle best, pbest). Additionally, each particle knows the best position reached by the whole population (global best, gbest). Each particle changes its position according to the following information: 1) its current position; 2) its current velocity; 3) the distance between its current position and the particle best position; 4) the distance between its current position and the global best position. Through iteration, PSO finds the optimum solution within a population whose positions and velocities are randomly initialized.

4.2 Mathematical Model of PSO

Suppose there is a population of m potential solutions in the d-dimensional space,
$S = \{\vec{X}_1, \vec{X}_2, \ldots, \vec{X}_m\}$. Each particle $i$ is represented as $\vec{X}_i = (x_{i1}, x_{i2}, \ldots, x_{id})$, $i = 1, 2, \ldots, m$, a vector in the d-dimensional solution space. The vector $\vec{X}_i$ yields a particular value when substituted into the fitness function. The vector $\vec{P}_i = (p_{i1}, p_{i2}, \ldots, p_{id})$, $i = 1, 2, \ldots, m$, records the particle best of the i-th particle, and there is at least one best particle in the population, represented as the vector $\vec{P}_g = (p_{g1}, p_{g2}, \ldots, p_{gd})$. The i-th particle's velocity is represented as $\vec{V}_i = (v_{i1}, v_{i2}, \ldots, v_{id})$. PSO changes the particles' velocities and positions using the following update rules:

$\vec{V}_i^{\,k+1} = \vec{V}_i^{\,k} + c_1 r_1 (\vec{P}_i^{\,k} - \vec{X}_i^{\,k}) + c_2 r_2 (\vec{P}_g^{\,k} - \vec{X}_i^{\,k})$  (4)

$\vec{X}_i^{\,k+1} = \vec{X}_i^{\,k} + \vec{V}_i^{\,k+1}$  (5)
In the above formulas, $i$ is the particle's index and $k$ is the current iteration number. The parameters $c_1$ and $c_2$ are constants, generally set equal to 2; the factors $r_1$ and $r_2$ are random numbers between 0 and 1. In order to keep $\vec{V}_i^{\,k}$ and $\vec{X}_i^{\,k}$ within a reasonable range, we limit them with the parameters $V_{\max}$ and $X_{\max}$. In formula (4), the current velocity of particle $i$ is renewed from its previous velocity, the distance between the current position and the particle best position (pbest), and the distance between the current position and the global best position (gbest); the current position of particle $i$ is renewed according to formula (5). After iterating with formulas (4) and (5), the PSO method can attain the optimum solution. For example, figure 5 illustrates the update of particle $i$'s position in the 2-dimensional solution space.
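The update rules (4) and (5) translate directly into code. The sketch below is illustrative only; the bounds V_MAX and X_MAX stand in for the limits $V_{\max}$ and $X_{\max}$ mentioned above, and their values here are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_step(X, V, pbest, gbest, c1=2.0, c2=2.0, V_MAX=5.0, X_MAX=255.0):
    m, d = X.shape                        # m particles in d dimensions
    r1 = rng.random((m, d))
    r2 = rng.random((m, d))
    V = V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)   # Eq. (4)
    V = np.clip(V, -V_MAX, V_MAX)         # keep velocities in range
    X = np.clip(X + V, 0.0, X_MAX)        # Eq. (5), positions stay bounded
    return X, V
```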
Fig. 5. Update of particle $i$'s position in the 2-dimensional solution space
4.3 Road Extraction with PSO
In section 2 we established that road extraction can be viewed as finding the optimum B-spline curve; in this section we apply the PSO method to road extraction. We assume that the number of B-spline control points is N and the degree of the curve is K (generally and from experience, K = 2 and N = 11). As the extremities are selected manually, we only need to calculate the remaining (N-2) control points. So the dimension of the solution space is d = 2(N-2), and we construct the position vector and velocity vector according to this dimension, as shown in figure 6.
Fig. 6. Particle definition: (a) position vector of a particle; (b) velocity vector of a particle
$X_i^x$ denotes the abscissa of the i-th control point of the B-spline curve, and $X_i^y$ denotes its ordinate. Analogously, $V_i^x$ is the abscissa velocity and $V_i^y$ the ordinate velocity of the i-th control point. The (N-2) control points determined by one particle, together with the two remaining control points (the road extremities), form the N control points of the B-spline. In a word, one particle corresponds to one B-spline curve, and finding the optimum set of control points means extracting the optimum curve. To find the optimum B-spline curve, we define the fitness function as

$\mathrm{Fitness}(f) = \frac{1}{M}\sum_{k=1}^{M} \exp\!\big(d(f(k))\big)$  (6)
In the formula, $f$ is a candidate B-spline curve and $\mathrm{Fitness}(f)$ is the fitness of curve $f$; $f(k)$ denotes the k-th point on the B-spline curve $f$, and $M$ is the total number of points sampled on the curve. $d(\cdot)$ is the measure given by the ratio line detector. After the PSO iterations, we obtain a unique B-spline curve whose fitness is maximum among all possible curves. The process is shown in figure 7.
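As an illustration of Eq. (6), the following sketch assumes the local road detector responses have been precomputed into a 2-D response map and that curve points are given in (row, column) coordinates:

```python
import numpy as np

def curve_fitness(curve_points, response_map):
    # d(f(k)) sampled from the detector response at each curve point.
    rows = np.clip(np.round(curve_points[:, 0]).astype(int),
                   0, response_map.shape[0] - 1)
    cols = np.clip(np.round(curve_points[:, 1]).astype(int),
                   0, response_map.shape[1] - 1)
    d = response_map[rows, cols]
    return float(np.exp(d).mean())        # Eq. (6): sum of exp(d(f(k)))/M
```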
Fig. 7. Procedure of extraction with PSO
5 Experimental Results of Road Extraction

We applied our algorithm to some sample regions of the Netherlands in ERS-1 images. The resolution of the data used in our experiments is about 16 m. The results are shown in figures 8 and 9.
In figures 8(b) and 9(b), the colored points, which are the road extremities, are selected manually; figures 8(c) and 9(c) show the results of local road detection, and the white lines in figures 8(d) and 9(d) are the final road extraction results. Table 1 gives a performance analysis of the road extraction algorithm proposed in this paper. We extracted a total of 867 road points in the two real SAR images, where a road point means a point on the B-spline curves extracted by our method. The errors are defined as the minimum Euclidean distances in pixels between extracted road points and true points manually determined on the road in the SAR images, with the average error computed as the total error in pixels divided by the total number of road points. Tables 2 and 3 give the parameter settings of our algorithm; the parameters in Table 2 depend on the resolution of the SAR images.
Fig. 8. (a) ERS-1 image; (b) road extremities selected; (c) result of detection; (d) result of PSO
Fig. 9. (a) ERS-1 image; (b) road extremities selected; (c) result of detection; (d) result of PSO

Table 1. Performance of road extraction

Total number of extracted points: 867
Number of correct extractions: 853
Correct extraction rate (%): 98.4
Average error of false extractions (pixels): 0.02
Table 2. Parameter settings of road detection

Width of the mask (pixels): 10
Height of the mask (pixels): 10
Width of the center region (pixels): 1 to 3
Number of orientations: 8
Table 3. Parameter settings of PSO

Number of particles: 400
Number of iterations: 100
PSO model: global vision and asynchronization
6 Conclusions

In this paper, we propose a semi-automatic road extraction method for SAR images. Our method models roads in SAR images as B-spline curves and extracts them. From the experimental results, we conclude that PSO can extract the road object well even when its curvature is high, as in figure 8. By applying PSO to our problem, we devise an efficient algorithm, and experimental results show that the proposed method can accurately and efficiently extract roads.
Acknowledgement

This research was supported by NSFC projects (60372057, 40376051) and the SOA Key Laboratory for Polar Science, Polar Research Institute of China (KP2005009). We wish to thank the Signal Processing Laboratory of Wuhan University for providing the facility to test and evaluate our algorithms on their software platform for the interpretation of SAR imagery.
References
1. Hellwich, O., Mayer, H., Winkler, G.: Detection of Lines in Synthetic Aperture Radar (SAR) Scenes. Proceedings of International Archives of Photogrammetry and Remote Sensing, Vienna, Austria (1996) 312-320
2. Samadani, R., Vesecky, J.F.: Finding Curvilinear Features in Speckled Images. IEEE Transactions on Geoscience and Remote Sensing (1990) 669-673
3. Geman, D., Jedynak, B.: An Active Testing Model for Tracking Roads in Satellite Images. IEEE Trans. on PAMI (1996) 1-14
4. Tupin, F., Maître, H., Mangin, J.F., Nicolas, J.M., Pechersky, E.: Detection of Linear Features in SAR Images: Application to Road Network Extraction. IEEE Trans. Geosci. and Remote Sensing, Vol. 36, No. 2 (1998)
5. Dell'Acqua, F., Gamba, P.: Detection of Urban Structures in SAR Images by Robust Fuzzy Clustering Algorithms: The Example of Street Tracking. IEEE Transactions on Geoscience and Remote Sensing (2001)
6. Gruen, A., Li, H.H.: Semiautomatic Linear Feature Extraction with Dynamic Programming and LSB-Snakes. Photogrammetric Engineering and Remote Sensing (1997)
7. Touzi, R., Lopes, A., Bousquet, P.: A Statistical and Geometrical Edge Detector for SAR Images. IEEE Transactions on Geoscience and Remote Sensing (1988) 764-773
8. Kennedy, J., Eberhart, R.C.: Particle Swarm Optimization. Proc. IEEE Int'l Conf. on Neural Networks, Vol. IV, IEEE Service Center, Piscataway, NJ (1995) 1942-1948
9. Xu, Wenbo, Sun, Jun: Adaptive Parameter Selection of Quantum-behaved Particle Swarm Optimization on Global Level. Advances in Intelligent Computing: International Conference on Intelligent Computing, ICIC 2005, Proceedings (2005) 420-428
10. Kim, Dong Hwa, Park, Jin Ill: Intelligent PID Controller Tuning of AVR System Using GA and PSO. Advances in Intelligent Computing: International Conference on Intelligent Computing, ICIC 2005, Proceedings (2005) 366-375
Pattern Recognition Without Feature Extraction Using Probabilistic Neural Network Övünç Polat and Tülay Yıldırım Electronics and Communications Engineering Department Yıldız Technical University Besiktas, Istanbul 34349, Turkey {opolat, tulay}@yildiz.edu.tr
Abstract. This article describes an approach to automatically recognize patterns such as 3D objects and handwritten digits based on a database. The designed system can be used for both 3D object recognition from 2D poses of the object and handwritten digit recognition applications. The system does not require any feature extraction stage before the recognition. Probabilistic Neural Network (PNN) is used for the recognition of the patterns. Experimental results show that pattern recognition by the proposed method improves the recognition rate considerably. The system has been compared to other network structures in terms of speed and accuracy and has shown better performance in simulations.
1 Introduction

Pattern classification is the core task of many applications such as image segmentation, object detection, etc. Many classification approaches are feature-based, which means some features have to be extracted before classification can be carried out. Explicit feature extraction is not easy and is not always reliable in some applications [1]. In this paper, our purpose is to classify patterns without feature extraction using a Probabilistic Neural Network (PNN). The Probabilistic Neural Network, introduced by Specht, is essentially based on the well-known Bayesian classifier technique commonly used in many classical pattern-recognition problems [2]. In the literature, among the artificial neural network (ANN) approaches developed for pixel-based and feature-based pattern recognition are feed-forward, Hopfield, and fuzzy ANNs [3]. A pattern recognition system generally consists of three blocks: the first block is a preprocessing stage, the second block performs feature extraction, and the last block is the classification stage. In the literature, there are some works that use the same object database as ours for object recognition. In these works, feature extraction is realized mainly by using principal component analysis (PCA) [4], by the optical flow of the image evaluated from its rotational poses [5], or by choosing the components of different directions with a Cellular Neural Network and selecting the remaining pixel numbers in each particular direction as features [6].
There are convolutional neural networks in the literature that do not have a feature extraction stage, where basically the input image itself is applied at the input of the network, for applications such as face recognition and handwritten character recognition. In those systems, the input images are convolved with certain masks to obtain feature maps, then a subsampling stage reduces the image size, and multilayer perceptron (MLP) networks are used for classification in the last stage of the network. This network topology has been applied in particular to image classification when sophisticated preprocessing is to be avoided [7-9]. In this study, we do not have a feature extraction stage: the input images are simply resized before being applied to the input of the PNN for object recognition. The proposed system is tested on both object recognition and handwritten digit recognition applications and achieves a high recognition ratio, especially for object recognition, even though very few reference data are used. In this work, object recognition is achieved by comparing unknown object images with a reference set. To this end, each object is placed on a turntable which is rotated through 360 degrees, and poses are taken with a fixed camera at intervals of 5 degrees. This corresponds to 72 poses per object, 720 poses in total for 10 objects. A 128x128 Red-Green-Blue (RGB) input image is converted to grayscale and resized to 16x16; the 16x16 matrix is then converted to a 1x256 input vector. The input images are resized for a faster simulation. The same procedure is repeated for each of the 72 poses; some of these poses are chosen as the reference set, and the PNN is then simulated. The rest of the poses are used for testing, and high test performance is obtained. As stated before, in the handwritten digit recognition application the classification is done by the proposed system without any feature extraction or preprocessing stages for the digits in the data set. Ten samples for each of the digits 0-9, hence 100 samples, are used for training, and a different set of ten samples per digit is used for testing. The training performance of the network is 100%, whereas the average test performance is 67%, although much better test performance is achieved for certain digits.
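The resizing pipeline described above can be sketched as follows (the file name is hypothetical, and PIL/NumPy are merely one possible toolset, not necessarily the authors'):

```python
import numpy as np
from PIL import Image

# "obj6__0.png" is a hypothetical file name for one pose image.
img = Image.open("obj6__0.png").convert("L")   # RGB -> grayscale
img = img.resize((16, 16))                     # 128x128 -> 16x16
vec = np.asarray(img, dtype=np.float64).reshape(1, 256)  # 1x256 input vector
```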
2 Overview of PNN Structure

A PNN, a supervised neural network, may require more neurons than a standard feed-forward backpropagation network, but it requires less set-up time and training time. PNNs work best when an adequate number of samples is available for training and those samples possess good class distinctions [10]. Consider a pattern vector x with m dimensions that belongs to one of two categories K1 and K2. Let F1(x) and F2(x) be the probability density functions (pdf) for the classification categories K1 and K2, respectively. From Bayes' decision rule, x belongs to K1 if
$\frac{F_1(x)}{F_2(x)} > \frac{L_1 P_2}{L_2 P_1}$  (1)
Conversely, x belongs to K2 if

$\frac{F_1(x)}{F_2(x)} < \frac{L_1 P_2}{L_2 P_1}$  (2)
where $L_1$ is the loss or cost function associated with misclassifying the vector as belonging to category K1 while it belongs to category K2, $L_2$ is the loss function associated with misclassifying the vector as belonging to category K2 while it belongs to category K1, $P_1$ is the prior probability of occurrence of category K1, and $P_2$ is the prior probability of occurrence of category K2. In many situations, the loss functions and the prior probabilities can be considered equal. Hence the key to using the decision rules given by (1) and (2) is to estimate the probability density functions from the training patterns [2]. In the PNN, a nonparametric estimation technique known as Parzen windows [11] is used to construct the class-dependent probability density functions for each classification category required by Bayes' theory. This allows determination of the chance that a given vector pattern lies within a given category; combining this with the relative frequency of each category, the PNN selects the most likely category for the given pattern vector. Both Bayes' theory and Parzen windows are theoretically well established, have been in use for decades in many engineering applications, and are treated at length in a variety of statistical textbooks. If the jth training pattern for category K1 is $x_j$, then the Parzen estimate of the pdf for category K1 is

$F_1(x) = \frac{1}{(2\pi)^{m/2}\,\sigma^m\, n} \sum_{j=1}^{n} \exp\!\left[-\frac{(x - x_j)^T (x - x_j)}{2\sigma^2}\right]$  (3)
where $n$ is the number of training patterns, $m$ is the input space dimension, $j$ is the pattern number, and $\sigma$ is an adjustable smoothing parameter [2].
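A minimal sketch of the Parzen estimate of Eq. (3), written directly from the formula:

```python
import numpy as np

def parzen_pdf(x, patterns, sigma):
    """Eq. (3): class pdf at x from the n training patterns of one class."""
    n, m = patterns.shape
    sq = np.sum((patterns - x) ** 2, axis=1)   # (x - x_j)^T (x - x_j)
    norm = (2.0 * np.pi) ** (m / 2.0) * sigma ** m * n
    return float(np.sum(np.exp(-sq / (2.0 * sigma ** 2))) / norm)

# The PNN then assigns x to the category whose estimated pdf is largest.
```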
Fig. 1. Probabilistic Neural Network (PNN) Architecture
Figure 1 shows the basic architecture of the PNN. The first layer is the input layer, which represents the m input variables ($x_1, x_2, \ldots, x_m$); the input neurons merely distribute all of the variables x to all neurons in the second layer. The pattern layer is fully connected to the input layer, with one neuron for each pattern in the training set; the weight values of the neurons in this layer are set equal to the different training patterns. The summation of the exponential terms in (3) is carried out by the summation layer neurons, with one summation neuron for each category. The weights on the connections to the summation layer are fixed at unity, so the summation layer simply adds the outputs from the pattern layer neurons; each neuron in the summation layer sums the outputs from the pattern layer neurons which correspond to the category from which the training pattern was selected. The output layer neuron produces a binary output value corresponding to the highest pdf given by (3), which indicates the best classification for that pattern [2].
3 Grouping of the Database

For the object recognition application, each 16x16 grayscale input image is converted to one 1x256 input vector. Similarly, for the handwritten digit recognition application, each 28x28 grayscale input image is converted to one 1x784 input vector in the designed model and applied to the input of the network. In the simulation of the PNN for object recognition, different numbers of poses from the original data set, at different rotation intervals, are selected as the reference poses; the remaining images in the data set are used to test the network. This procedure is shown in Figure 2. For handwritten digit recognition, 50% of the dataset is defined as reference and the system is tested on the remaining 50%.
Fig. 2. Grouping of the database
4 Simulation and Results

4.1 Object Recognition

For the experiments, we are currently using images from the Columbia image database. The Columbia Object Image Library (COIL-100) is a database of color images of 100 objects. We selected the 10 objects from the dataset shown in Figure 3. The objects were placed on a motorized turntable against a black background. The turntable was rotated through 360 degrees to vary the object pose with respect to a fixed color camera. Images of the objects were taken at pose intervals of 5 degrees, which corresponds to 72 poses per object. The images were size-normalized [12]. Figure 4 shows the frames with rotations 0° to 25° from the object-6 dataset in COIL-100.
Fig. 3. The 10 objects used to simulate the recognition system

Fig. 4. The image sequence of object-6 in the database with rotations 0° to 25°
As described above, different numbers of poses from the original dataset, at different rotation intervals, are selected as reference poses. For the recognition process, a code is given to each object as the target value of the network. As seen in Table 1, 6 and 12 poses from the original dataset, with 60° and 30° rotation intervals respectively, are selected as the reference poses; the remaining images in the dataset are used to test the network. Table 2 gives the test accuracies obtained using the 12 poses taken at 30° rotation intervals for the ten objects considered. As can be seen from Table 2, the recognition rate is quite high for each object.
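The grouping of poses into reference and test sets, e.g. for the 30° case, can be sketched as:

```python
import numpy as np

poses = np.arange(0, 360, 5)          # 72 pose angles, 5-degree steps
ref_mask = (poses % 30 == 0)          # every 6th pose: 12 references
reference, test = poses[ref_mask], poses[~ref_mask]   # 12 vs. 60 poses
```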
Table 1. Number of reference images and corresponding recognition rates

- 6 poses from the original data set at 60° rotation intervals are selected as reference poses; the remaining 66 images per object are used to test the network: 85.76%
- 12 poses from the original data set at 30° rotation intervals are selected as reference poses; the remaining 60 images per object are used to test the network: 96.83%
Table 2. Recognition rates of objects

Obj1: 86.6%   Obj2: 100%   Obj3: 93.3%   Obj4: 100%   Obj5: 100%
Obj6: 98.3%   Obj7: 100%   Obj8: 90%     Obj9: 100%   Obj10: 100%
Average: 96.83%
4.2 Handwritten Digit Recognition

In this study, a total of 200 digits, 20 for each digit, is taken from the MNIST handwritten digit set (a subset of a larger set available from NIST) [13]. The digits have been size-normalized and centered in a fixed-size image. The recognition is done without any feature extraction or preprocessing stages for the digits in the data set. Ten samples for each of the digits 0-9, hence 100 samples, are used for training, and a different set of ten samples per digit is used for testing. Gray-level images of some of the digits 0-9 in the data set are shown in Figure 5.
Fig. 5. Digits from the MNIST-Database
Each 28x28 input image is converted to one 1x784 input vector in the designed model and applied at the input of the network. The vectors selected as references from the data set are applied at the input of the PNN. A code is given to each digit as the target value of the network, and the network is simulated; the network is then tested with the rest of the dataset. The test results for 10 references and 10 test data per digit are presented in Table 3.
Table 3. Recognition rates of handwritten digits

Digit:        0     1    2    3    4    5    6    7    8    9    Average
Test results: 100%  80%  80%  80%  20%  50%  50%  70%  60%  80%  67%
The training rate of the network is 100%. The network response on test data is low for only four digits: it has a high test rate for digits 0, 1, 2, 3 and 9, and an average test rate for the other input digits. These test rates can be increased by using more samples in the reference set. The designed model using the PNN has 100% training and 96.8% test performance for object recognition. The training time of the network on a P4 3 GHz, 256 MB RAM PC is 0.06 seconds; it takes the system 1.34 seconds in total to load the input data, train the network with the reference data, and test it with the remaining data. The same application was carried out using several different networks. The RBF (radial basis function) neural network has high training performance, whereas its test performance is around 60%. The MLP network was also simulated using different training algorithms. A memory shortage problem occurs with 28x28 input images in the handwritten character recognition application; the input image is resized to 16x16 for the object recognition application. The MLP used in this application has one hidden layer with ten neurons, and it takes 8 minutes to train this network to the minimum error rate. The resulting network has high training performance, but the test performance is around 75%. These simulations indicate that the proposed network has both speed and accuracy advantages on test data over MLP and RBF.
5 Conclusions

In this study, a model is designed for pattern recognition without feature extraction using a PNN. The system was tested on 3D object recognition from 2D poses and on handwritten digit recognition. For object recognition, the application was carried out on 10 objects and a high recognition rate was obtained; the ability to recognize unknown objects after training with a low number of samples is an important property of this system. For handwritten digit recognition, the application was carried out on 10 digits and a considerable recognition rate was obtained, although the number of data chosen as references could be increased in this application. The lack of preprocessing and feature extraction stages, other than image resizing, is one of the important attributes of the proposed system. The variety of input patterns handled shows that the system can be used in pattern recognition applications such as handwritten character recognition and signature recognition. This topology can be applied to pattern recognition while avoiding sophisticated preprocessing and feature extraction. Furthermore, the proposed system has much better recognition rates and much faster training times than RBF and MLP networks on the same applications.
References
1. Li, B.Q., Li, B.X.: Building Pattern Classifiers Using Convolutional Neural Networks. IJCNN '99, International Joint Conference on Neural Networks, 5 (1999) 3081-3085
2. Goh, T.C.: Probabilistic Neural Network for Evaluating Seismic Liquefaction Potential. Can. Geotech. J., 39 (2002) 219-232
3. Egmont-Petersen, M., de Ridder, D., Handels, H.: Image Processing with Neural Networks - A Review. The Journal of the Pattern Recognition Society, 35(10) (2002) 2279-2301
4. Zhao, L.W., Luo, S.W., Liao, L.Z.: 3D Object Recognition and Pose Estimation Using Kernel PCA. Proceedings of 2004 International Conference on Machine Learning and Cybernetics, 5 (2004) 3258-3262
5. Okamoto, J., Jr., Milanova, M., Bueker, U.: Active Perception System for Recognition of 3D Objects in Image Sequences. 5th International Workshop on Advanced Motion Control, AMC '98 - Coimbra, 29 (1998) 700-705
6. Polat, O., Tavsanoglu, V.: 3-D Object Recognition Using 2-D Poses Processed by CNNs and a GRNN. Proc. of 2005 Turkish Symposium on Artificial Intelligence and Neural Networks, Cesme, Turkey (June 2005)
7. Nebauer, C.: Evaluation of Convolutional Neural Networks for Visual Recognition. IEEE Transactions on Neural Networks, Vol. 9, Issue 4 (July 1998) 685-696
8. Fasel, B.: Head-pose Invariant Facial Expression Recognition Using Convolutional Neural Networks. Proc. of 2002 Fourth IEEE International Conference on Multimodal Interfaces (2002) 529-534
9. Simard, P.Y., Steinkraus, D., Platt, J.C.: Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis. Proc. of the 2003 Seventh International Conference on Document Analysis and Recognition (2003) 958-963
10. Comes, B., Kelemen, A.: Probabilistic Neural Network Classification for Microarray Data. Proceedings of the International Joint Conference on Neural Networks, 3 (2003) 1714-1717
11. Parzen, E.: On Estimation of a Probability Density Function and Mode. Annals of Mathematical Statistics, 36 (1962) 1065-1076
12. Nene, S.A., Nayar, S.K., Murase, H.: Columbia Object Image Library (COIL-100). Technical Report No. CUCS-006-96, Department of Computer Science, Columbia University, New York, N.Y. 10027
13. Available at http://www.yapay-zeka.org/modules/news/article.php?storyid=3
Power Transmission Towers Extraction in Polarimetric SAR Imagery Based on Genetic Algorithm Wen Yang, Ge Xu, Jiayu Chen, and Hong Sun School of Electronic Information, Wuhan University, Luoyu Road 129#, Wuhan 430079, Hubei Province, China [email protected]
Abstract. This paper addresses a novel approach to extracting power transmission tower groups from fully polarimetric synthetic aperture radar data sets based on a genetic algorithm. The new approach first introduces the polarimetric whitening filter to improve the ability to detect targets in clutter by reducing the effect of speckle without affecting the resolution. Then point-like targets are detected with a multi-resolution statistic energy level method, which adapts to different clutter backgrounds and detects targets efficiently. Finally, a genetic algorithm based search is applied to extract series of power transmission towers from the remaining potential point-like targets. Good experimental results are obtained when our method is applied to single-look, high-resolution, X-band fully polarimetric SAR images.
1 Introduction

As an active microwave remote sensing imaging system, the Synthetic Aperture Radar (SAR) sensor has many advantages over electro-optical (EO) sensors, such as range-independent high resolution and superior poor-weather performance [1]. It plays an increasingly important role in military reconnaissance and in achieving information dominance on the battlefield. The collection capacity for such imagery is growing rapidly, and along with that growth is the expanding need for computer-aided or automated exploitation of SAR images. One important aspect of aided/automated exploitation is automatic target detection/recognition (ATD/R). When SAR is used to detect man-made targets and distinguish them from naturally occurring backgrounds, we require fast and effective discriminators to suppress natural clutter, detect the presence of targets, and classify the type of targets from their radar return. Unlike optical remote sensing images, characterized by very neat and uniform features, SAR images are affected by speckle, which results in a granular aspect with random spatial variations. Strong backscattering from targets is usually assumed in single-channel SAR images, so a target will be lost when the backscattering amplitude from the target is not large enough compared with the background at some aspect angles [2]. To overcome this, fully polarimetric SAR images are used. Current airborne SAR systems and future satellite systems also acquire multi-channel SAR data, i.e. fully polarimetric and/or multi-frequency data, because such data contain much more information [3]. In this paper, we present a genetic algorithm (GA) based method for extracting power transmission towers (PTTs) from fully polarimetric SAR images. Firstly, a polarimetric
whitening filter (PWF) [4], a point-like target detector, and some post-processing operators are applied to the image; these operators are described in section 2. Then, a GA-based search that extracts the power transmission towers from the many candidate point-like targets is described in section 3. Experimental results and analysis of this extraction scheme are presented in Section 4. Finally, some discussion and conclusions are presented in Section 5. X-band Pi-SAR data are used for illustration (the Pi-SAR sensor is an airborne POLSAR system developed by NICT and JAXA of Japan; the resolution of the X-band image is 1.5 m × 1.5 m).
2 Detection of Point-Like Targets

In the extraction of PTTs, point-like target detection is the first and a crucial step; the detection results directly affect the accuracy of the final extraction. Power transmission towers in the four-channel SAR images are shown in Fig. 1, from which we can see that detecting these point-like targets in SAR imagery is a very difficult problem.
Fig. 1. Aspect of power transmission towers (targets in the yellow circles) in the four-channel polarimetric SAR image: (a) HH; (b) HV; (c) VH; (d) VV
An obvious way to detect point-like targets in such multi-channel images is to fuse the results of existing detectors applied to each individual channel. An alternative is to use an algorithm that combines information from the different channels into a single output image, after which detection is applied to it. Here we take the latter approach; Fig. 2 illustrates how it can be used for detecting point-like targets in polarimetric SAR images.
Fig. 2. Flow chart of point-like targets detection in polarimetric SAR images
2.1 Polarimetric Speckle Reduction

A polarimetric SAR image provides a measurement of the backscattering matrix at each pixel of the image, i.e. each pixel is characterised by a value for each of the four polarisations HH, HV, VH and VV. The speckle phenomenon also affects the phase of the scattering coefficients and corrupts polarimetric information, so specific procedures have to be used to preserve the spatial resolution and reduce the randomness of the acquired signals. The value at every pixel of a polarimetric SAR image is a four-element complex vector. As described by Novak et al. [5], polarimetric measurements described in terms of the scattering matrix can be simplified owing to the reciprocity condition (which holds for most targets: $S_{HV} = S_{VH}$) and can thus be described by
$X = [\,S_{HH} \;\; S_{HV} \;\; S_{VV}\,]$  (1)
with its covariance matrix denoted by $\Sigma$. The output image having the least speckle can be constructed from the following equation:

$y = X^{T}\,\Sigma^{-1}\,X$  (2)

The filter $\Sigma^{-1/2}$ is a whitening filter, which we call the polarimetric whitening filter. The output image y not only contains the information from all channels of the scattering matrix image but also has the lowest speckle level, which significantly improves the signal-to-clutter ratio for man-made targets.
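As a sketch of Eq. (2) applied per pixel (the estimation of Σ from homogeneous clutter is assumed and not shown; conjugating the left-hand vector is the usual convention for complex data and an assumption here):

```python
import numpy as np

def pwf(X, Sigma):
    """X: (H, W, 3) complex image of [S_HH, S_HV, S_VV]; Sigma: 3x3 clutter
    covariance (its estimation from homogeneous clutter is assumed)."""
    Sigma_inv = np.linalg.inv(Sigma)
    # Per-pixel quadratic form of Eq. (2).
    y = np.einsum("hwi,ij,hwj->hw", np.conj(X), Sigma_inv, X)
    return np.real(y)                  # speckle-reduced intensity image
```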
2.2 Target Detection Based on Multi-resolution Statistic Energy Level

The speckle noise, the geometrical properties of SAR imaging, and the complicated scattering properties give targets in SAR images some unique characteristics. According to the features of point targets in SAR images, and based on the statistical characteristics of the wavelet coefficients at different scales, an automatic detection approach based on the multi-resolution statistic energy level was proposed to distinguish targets from the clutter background [6].
Fig. 3. Diagram showing the different steps and corresponding sketch map of our detection algorithm
Fig. 3 shows the different steps of this algorithm. Firstly, the filtered data are decomposed by means of a non-orthogonal wavelet transform. As is well known, the information contained in SAR images depends mainly on the local statistical properties of the grayscale rather than on the grayscale itself, so we can obtain local statistical information for every pixel in the SAR image through the time-frequency analysis based on the non-orthogonal wavelet transform. Then the random processes at each pixel position across different scales are defined, and the information at each pixel position is correlated across scales. The multi-level wavelet-coefficient correlation of speckle and of the interfering background is attenuated quickly, but that of the target area changes gradually. So the energy image is constructed based on the energy function, and the different parts of a region converge at their respective energy levels; an acute change is presented if a heterogeneous target appears in the background, as shown in the energy image of Fig. 3. Finally, point targets can be detected through adaptive-scale windows in the energy image. Experimental results show that this method can be applied to different clutter backgrounds and that targets can be detected efficiently without missing dim targets, while the geometrical shapes of the targets are well preserved.
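The exact energy function is defined in [6] and not reproduced above; as a loose stand-in for the cross-scale correlation idea (not the authors' method), one can multiply detail coefficients of an undecimated wavelet transform across scales, so that responses persisting over scales (targets) are reinforced while speckle is attenuated:

```python
import numpy as np
import pywt

def energy_image(img, wavelet="haar", levels=3):
    """Multiscale product of detail coefficients; a stand-in for the
    energy function of [6]. Image dimensions must be divisible by
    2**levels for the stationary wavelet transform."""
    coeffs = pywt.swt2(img, wavelet, level=levels)
    energy = np.ones(img.shape, dtype=np.float64)
    for _, (cH, cV, cD) in coeffs:
        # Responses persisting across scales are reinforced, while
        # isolated speckle responses are attenuated.
        energy *= np.abs(cH) + np.abs(cV) + np.abs(cD)
    return energy
```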
Since isolated pixels have little chance of belonging to a PTT, a pixel suppression step is performed to suppress local false alarms and obtain a "cleaner" binary result using simple heuristic rules: for each retained pixel with an associated direction, we look for other selected pixels with a similar direction in an angular beam around it, and if none is found, the pixel is suppressed. The purpose of this post-processing is to reduce the total number of candidate points.
3 Searching for Power Transmission Towers Based on GA

In Section 2 we obtain many point-like targets, of which only a part are PTTs. In general, selecting the targets of interest from a given target set is a typical NP problem. For example, if the number of candidate points is 30, there are $2^{30}$ possible selections; the computational cost of an exhaustive search is huge. And with classic optimization techniques, such as dynamic programming and linear programming, it is very difficult to obtain an acceptable solution. In this section, a search strategy based on GA is proposed to solve this difficult problem. GA is a class of optimization techniques based on Darwin's principle of natural selection, originally developed by John Holland in 1975 [7]. The basic idea of natural selection is "select the best, discard the rest". GA implements optimization strategies by simulating the evolution of species through natural selection. Firstly, a selection operator replicates the more successful solutions found in a population at a rate proportional to their relative quality. Then crossover decomposes two distinct solutions and randomly mixes their parts to form novel solutions. Finally, mutation randomly perturbs a candidate solution to obtain a new solution, and a satisfactory solution is achieved through iteration. GA has been applied to a variety of function optimization problems and has been shown to be highly effective in large, poorly defined search spaces, even in the presence of difficulties such as high dimensionality, multi-modality, discontinuity and noise [8]. When using GA, attention should be paid to the following three key parts, which directly affect the speed of convergence and the quality of the result:

− Definition of the chromosome;
− Genetic operators: selection, crossover and mutation;
− Definition of the fitness function.

3.1 Definition of Chromosome

Here we take the most common encoding method, binary encoding. Chromosomes are strings of 1s and 0s, and each position in the chromosome represents a particular characteristic of the problem: "1" denotes a point that is a power transmission tower, and "0" represents any other target. For example, as Fig. 4(a) illustrates, there are six point-like targets; the green points represent power transmission towers and the hollow points are non-targets, so the corresponding gene of the link is "101011".

3.2 Genetic Operators

The purpose of genetic operators is to simulate the effect of errors that happen with low probability during duplication [9].
• Selection: extracts k individuals from the population with uniform probability (without re-insertion) and makes them play a "tournament", where the probability for an individual to win is generally proportional to its fitness; selection pressure is directly proportional to the number k of participants. Typical choices for rank-based selection are linear, non-linear, and tournament selection. Tournament selection inherits the advantages of rank selection and has a major advantage over other selection methods: it has the shortest time complexity, does not require global reordering, and is more naturally inspired [10]. So we apply tournament selection as the selection scheme. For example, there are $2^6$ possible solutions for the six points of Fig. 4(a). We randomly create a population of eight potential solutions; through the selection operator, we preserve four solutions for crossover, and the rest are discarded.
• Crossover: two chromosomes (strings) combine their genetic material (bits) to produce new offspring possessing characteristics of both. Here we take single-point crossover: a random point is chosen on the individual chromosomes and the genetic material is exchanged at this point. After the selection operator, four solutions are in the mating pool in the example of Fig. 4(a). We randomly select two solutions as parents, say "100011" and "111001", with random crossover point 3. In a one-point crossover, the chromosomes of the two parents are separated and reconnected at the 3rd position, generating two new chromosomes; the two offspring are "100001" and "111011". Here, the number of generated children is the same as the number of extinguished individuals.
• Mutation: induces a change in the solution, so as to maintain diversity in the population and prevent premature convergence. In our work, we mutate the string by randomly selecting any two positions and interchanging them, thus giving rise to a new selection. In the example of Fig. 4(a), suppose the offspring obtained through crossover is "100001" and the mutation positions are 3 and 6; then the new offspring "101000" is generated after interchanging the 3rd and 6th positions. A minimal code sketch of these three operators is given below.
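The sketch assumes a `fitness` callable that scores a bit-string; the examples in the comments reproduce those given above:

```python
import random

def tournament(population, fitness, k=2):
    # Pick k individuals at random; the fittest wins.
    return max(random.sample(population, k), key=fitness)

def crossover(p1, p2):
    # Single-point crossover, e.g. '100011' x '111001' at point 3
    # gives '100001' and '111011'.
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(gene):
    # Swap two randomly chosen positions, e.g. '100001' with positions
    # 3 and 6 swapped becomes '101000'.
    g = list(gene)
    i, j = random.sample(range(len(g)), 2)
    g[i], g[j] = g[j], g[i]
    return "".join(g)
```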
3.3 Designing of the Fitness Function

A fitness function quantifies the optimality of a solution (chromosome) so that a particular solution may be ranked against all the other solutions; a fitness value is assigned to each solution depending on how close it actually is to solving the problem. An ideal fitness function correlates closely with the goal and is quick to compute [11]. As we can see from Fig. 4(b), the PTTs in SAR images have the following geometrical properties:

− The series of PTTs are long (they should almost never stop);
− The series of PTTs have a low curvature, at least in non-urban areas;
− Any two adjacent PTTs are nearly equidistant, and the distance between them is larger than a fixed threshold.

According to this geometrical relationship of PTTs, the corresponding fitness function can be designed.
Fig. 4. (a) Binary encoding of chromosome; (b) sketch map of the geometrical relationship of PTTs; (c) considerations in designing the fitness function (angles and distances between adjacent linked points)
Suppose the number of total points in the region of interest is $M$. We get a group of $N$ points for a certain gene (the number of PTTs in this gene is $N$) and link them by "the nearest neighbors", as in Fig. 4(c), where angle[i] is the angle between the two line segments joined at the i-th point and distance[i] is the distance between the i-th point and its previous adjacent point. So we make the following definitions:

Mean of angles:
$\mathrm{Meanangle} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{angle}[i]$  (3)

Mean of segments:
$\mathrm{Meanseg} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{distance}[i]$  (4)
Variance of segments:
$\mathrm{Varseg} = \frac{1}{N}\sum_{i=1}^{N} \big(\mathrm{distance}[i] - \mathrm{Meanseg}\big)^{2}$  (5)
If the points selected by a gene are indeed PTTs, the three values above will all be very small, so we design the fitness function as below:
$\mathrm{fitness} = \exp\!\left( \frac{aN}{\mathrm{Meanangle}+b} + \frac{aN}{\mathrm{Meanseg}+b} + \frac{aN}{\mathrm{Varseg}+b} \right)$  (6)
Because the value of Meanseg may tend to zero, the factor aN/Meanseg would tend to infinity, which is inconvenient for computation, so we introduce a bias b to avoid this problem. From experience, a = 0.3 and b = 0.5.
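Eqs. (3)-(6) can be sketched as follows for one gene's selected points; nearest-neighbour linking is assumed already done, the points are assumed distinct, and the means are taken over the available segments and angles:

```python
import numpy as np

def ptt_fitness(points, a=0.3, b=0.5):
    """points: the N points selected by one gene, already ordered by
    nearest-neighbour linking."""
    pts = np.asarray(points, dtype=float)
    N = len(pts)
    v = np.diff(pts, axis=0)                           # linking segments
    seg = np.linalg.norm(v, axis=1)                    # segment lengths
    cosang = np.sum(v[:-1] * v[1:], axis=1) / (seg[:-1] * seg[1:])
    angle = np.arccos(np.clip(cosang, -1.0, 1.0))      # turning angles
    mean_angle = angle.mean()                          # Eq. (3)
    mean_seg = seg.mean()                              # Eq. (4)
    var_seg = ((seg - mean_seg) ** 2).mean()           # Eq. (5)
    return np.exp(a * N / (mean_angle + b)             # Eq. (6)
                  + a * N / (mean_seg + b)
                  + a * N / (var_seg + b))
```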
4 Experimental Results

Fig. 5 gives our whole extraction scheme. Firstly, we get the filtered image by applying the PWF, which combines the information in the four polarimetric channels and reduces the speckle. Then the point-like targets are detected by the multi-resolution statistic energy level approach; only a fraction of them are potential PTTs, so we also apply post-processing to reduce the space of candidate points. Finally, we find the real PTTs using a GA-based search. Two groups of extraction results are given in Fig. 6 and Fig. 7 to illustrate the effectiveness of the extraction algorithm. The PTT groups marked by yellow circles in the two different scenes are well extracted, including
Fig. 5. Flow chart of PTTs extraction in polarimetric SAR images
the very dim targets: the third PTT in Fig. 6 and the fourth PTT in Fig. 7 (counting from top to bottom) are both extracted. It is obvious that the proposed algorithm is not fully automatic and needs some manual intervention; namely, we have to determine the beginning and the end of the PTT group by manual selection.
Fig. 6. (a) Color composite polarimetric image: HH (red), HV (green), VV (blue); (b) filtered image based on PWF; (c) point-like target detection result; (d) extraction result based on GA
Fig. 7. (a) Color composite polarimetric image: HH (red), HV (green), VV (blue); (b) filtered image based on PWF; (c) point-like target detection result; (d) extraction result based on GA
Table 1. Parameter settings of GA

Maximum number of iterations: 60
Probability of crossover: 0.9
Probability of mutation: 0.2
Population size: 200
Table 2. Time consumption for the two test data sets (programmed in Matlab 7 and run on a Pentium 4 based personal computer; CPU: 2.4 GHz, memory: 512 MB)

Data of Fig. 6: 20.6 s
Data of Fig. 7: 22.2 s
Table 1 gives the parameter settings of the GA, and Table 2 gives the computational times for the two data sets in Fig. 6 and Fig. 7. The test environment is a Pentium 4 based personal computer (2.4 GHz) with 512 MB of memory, and the programs run in the Matlab 7 environment. Due to the different numbers of point-like targets, the chromosome lengths differ: there are 32 point-like targets in the region of interest in Fig. 6(c), so the chromosome length is 32, and there are 56 point-like targets in the region of interest in Fig. 7(c), so the chromosome length is 56. Since the number of point-like targets in Fig. 7(c) is larger than in Fig. 6(c), its time consumption is longer.
5 Conclusions

In this paper we present an extraction algorithm for PTT series in polarimetric SAR data, identifying these very specific targets by means of suitable heuristics in the fitness function of a GA. The algorithm can also be extended to extracting other groups of point-like targets. Further work will focus on fully automatic processing and on acquiring more robust performance.
Acknowledgement

The authors would like to thank the anonymous referees for their detailed, valuable suggestions. This work has been supported by the Funds of the National Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), the SOA Key Laboratory for Polar Science (KP200509), and NSFC projects (60372057, 40376051).
References
1. Oliver, C.J., Quegan, S.: Understanding Synthetic Aperture Radar Images. Artech House Inc., 685 Canton Street, Norwood, MA (1998)
2. Chaney, R.D., Burl, M.C., Novak, L.M.: On the Performance of Polarimetric Target Detection Algorithms. IEEE International Radar Conference, Arlington, VA (1990) 520-525
3. Touzi, R., Boerner, W.M., Lee, J.S., Lueneburg, E.: A Review of Polarimetry in the Context of Synthetic Aperture Radar: Concepts and Information Extraction. Can. J. Remote Sensing, 30(3) (2004) 380-407
4. Novak, L.M., Burl, M.C.: Optimal Speckle Reduction in Polarimetric SAR Imagery. IEEE Transactions on Aerospace and Electronic Systems, 26 (1990) 293-305
5. Novak, L.M., Burl, M.C., Irving, W.W.: Optimal Polarimetric Processing for Enhanced Target Detection. IEEE Transactions on Aerospace and Electronic Systems, 29(1) (1993) 234-243
6. Yang, W., Chen, J., Sun, H., Wang, X.J.: Study on Point Targets Detection in SAR Imagery. Chinese Journal of Radio Science, Vol. 19, No. 3 (2004) 362-366
7. Holland, J.H.: Adaptation in Natural and Artificial Systems. The University of Michigan Press, Ann Arbor, Michigan (1975)
8. Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, Berlin (1996)
9. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, New York (1989)
10. Harvey, I.: Evolutionary Robotics and SAGA: The Case for Hill Crawling and Tournament Selection. Proceedings of Workshop on Artificial Life, Santa Fe, NM, Addison-Wesley, XVII (1992) 299-326
11. Sipper, M.: Machine Nature: The Coming Age of Bio-Inspired Computing. McGraw-Hill, New York (2002)
Synthesis Texture by Tiling s-Tiles
Feng Xue1, Yousheng Zhang1, Julang Jiang2, Min Hu1, and Tao Jiang1
1 School of Computer Science, Hefei University of Technology, 230009 Hefei, P.R. China
2 Physics Department of Anqing Normal College, 246011 Anqing, P.R. China
[email protected], [email protected], [email protected], [email protected], [email protected]
Abstract. We present a novel method to generate a set of texture tiles from samples, which can be seamlessly tiled into a texture of arbitrary size in real time. Compared with previous methods, our approach is simpler and better at eliminating the visual seams that may exist in each tile of previous methods, especially for samples with elaborate features or distinct colors. Texture tiles generated by our approach can be regarded as single-colored tiles on each orthogonal border direction, which makes them easier to tile. Experimental results demonstrate the feasibility and effectiveness of our approach.
1 Introduction Textures are often used to represent the realism of surfaces. Texture mapping is an alternative method for that representation, but it may introduce seams and texture distortion. In the past decade, a new technique called texture synthesis has attracted much attention and has been an active research area in computer graphics. The goal of texture synthesis can be stated as: given a sample texture, synthesize a new texture that is perceptually similar to the sample while ensuring that the output has sufficient variation. Early texture synthesis algorithms [2,3,4,5,6,7] are time-consuming processes that use Markov Random Fields (MRF) to model textures. Literature [1] is the pioneering work that uses texture tiles (Wang Tiles) to generate textures of arbitrary size, which makes the synthesis process run in real time. Motivated by [1], this paper presents a new approach to create texture tiles, called s-tiles, and then tiles these s-tiles into a large texture in real time. 1.1 Related Work There are many efforts contributing to texture synthesis. Efros et al. [2] grow a new image outward from an initial seed, one pixel at a time, using a nonparametric method. Wei and Levoy [3] extend this approach to multiple frequency bands and use vector quantization to speed up the processing. Ashikhmin's modification [4] to Wei and Levoy's algorithm [3] remarkably reduces the computational complexity and is better suited to natural textures. These techniques all have in common that they generate
textures one pixel at a time. Image quilting, proposed by Efros et al. [5], aligns adjacent patch boundaries by computing the sum-of-squared differences (SSD) between the synthesized patch and candidate patches, and then performs a minimum-error-boundary-cut (MEBC) within the overlap region to reduce artifacts. Image quilting is fast but is still hard to use in real-time applications due to its dynamic computation of SSD and MEBC. Nealen et al. [6] propose a hybrid approach integrating the image quilting method and a pixel-based method to minimize the overlap artifacts in the result of the image quilting algorithm. The graphcut-based algorithm of Kwatra et al. [7] formulates the patch boundary problem as a max-flow/min-cut problem, generalizing the image quilting algorithm to arbitrarily shaped patches and arbitrary dimension. All the approaches discussed above are time-consuming methods that cannot generate large textures at runtime. Xu [8] proposes a method called chaos mosaic to generate large textures. Liang et al. [9] use an image pyramid, a KD-tree and principal component analysis to accelerate texture synthesis, which makes the algorithm run in near real time, but these additional data structures also make the algorithm more complicated and hard to implement. Zelinka and Garland [10,11] first analyze the sample texture to generate a data structure called a jump map, which stores for each input pixel a set of matching input pixels; texture synthesis then proceeds in real time as a random walk through the jump map. But the synthesized results are not quite desirable for lower-frequency textures, as shown in Figure 6 of their paper [10]. Tiling techniques may be helpful for real-time synthesis. Neyret and Cani [12] texture a mesh by tiling a small set of triangular tiles. In their algorithm, the triangle set must be created interactively or by procedural methods, and the original mesh needs to be optimized to minimize the texture distortion, which is labor-intensive work. Cohen et al. [1] use Wang Tiles to generate textures of arbitrary size in real time. In their paper, texture generation is divided into two passes: first, the image quilting algorithm is used to produce Wang Tiles; then those tiles are placed together according to their edge constraints. But there may be some visible seams along the diagonals of Wang Tiles, which cause apparent diamond-shaped artifacts in the final tiled result. Based on the Wang Tiles method, Wen et al. [13] propose an improved approach, called ω-tile, to reduce the conspicuous seams of Wang Tiles. But there may still be visible seams in the final results even after a Poisson smoothing process, especially for samples with elaborate features or distinct colors. 1.2 Algorithm Outline Like [1][13], we decouple texture synthesis into two steps, as shown in Figure 1. First, we present a novel idea to create s-tiles; second, we tile these s-tiles into an output texture of arbitrary size in real time. Obviously, for a particular texture the s-tile set only needs to be created once, and it can be reused any number of times to synthesize large textures in real time. Compared with the Wang Tiles and ω-tiles methods, our algorithm has the following advantages: • Because we generate s-tiles using a pixel-based texture synthesis algorithm, which will be detailed in Section 2, there is no artificial seam inside, and post-processes in [13] such as graph-cut and Poisson smoothing are no longer needed.
• As can be seen from Figure 1(b), there is only a single color constraint on each orthogonal direction of an s-tile, which makes s-tiles easier to tile. The rest of the paper is organized as follows. Section 2 describes the generation of s-tiles. Section 3 introduces a simple tiling strategy for s-tiles that avoids periodic patterns in the output. Section 4 presents our experimental results, and Section 5 concludes the paper with the strengths and weaknesses of our method and suggestions for future work.
Fig. 1. Illustration of texture synthesis using Wang Tiles and s-tiles
2 s-Tile Generation S-tiles are crucial in our method for acquiring seamless, aperiodic output textures. In general, all s-tiles generated by our algorithm should conform to the following criteria: • They should be "continuous" across their borders so that they can be seamlessly tiled with each other; • There should be sufficient variation inside each tile so that little repetition can be found in the tiled output. We noted that results synthesized with Wei and Levoy's random order synthesis algorithm in [14] partially meet these two criteria. Random order synthesis is a two-pass algorithm, described in Wei and Levoy's work [14]: "During the first pass, we search the input texture using a neighborhood that contains only pixels from the lower resolution pyramid levels (except the lowest resolution where we randomly copy pixels from the input image). In the second pass, we use a neighborhood containing pixels from both the current and lower resolution. In both passes, on each level, the neighborhoods used are symmetric (noncausal). We alternate these two passes for each level of the output pyramid, and within each pass we simply visit the vertices in a random order". Compared with Wei and Levoy's basic multi-resolution synthesis algorithm in [3], there are two modifications in their random order synthesis method:
(1) All the neighborhoods are symmetric and are constructed toroidally in both the horizontal and vertical directions, as shown in Figure 2(a). This kind of neighborhood makes the synthesized output of the random order synthesis algorithm (Figure 2(b)) self-tilable1, as shown in Figure 2(c). (2) Instead of a scan-line synthesis order, a random synthesis order is used, which makes the synthesized results differ from each other after each run of the algorithm. (A short sketch of this wrap-around indexing follows.)
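The wrap-around indexing that makes these neighborhoods toroidal is easy to realize with modular arithmetic. The following Python sketch is our own illustration, not code from the paper; the function name and the numpy dependency are our choices:

import numpy as np

def toroidal_neighborhood(image, r, c, half=2):
    # Symmetric (noncausal) neighborhood around (r, c); indices wrap
    # toroidally, so border pixels see the opposite edge -- the property
    # that makes the synthesized patch self-tilable.
    h, w = image.shape[:2]
    rows = [(r + dr) % h for dr in range(-half, half + 1)]
    cols = [(c + dc) % w for dc in range(-half, half + 1)]
    return image[np.ix_(rows, cols)]

# Example: the neighborhood of a corner pixel wraps to the far edges.
img = np.arange(64).reshape(8, 8)
print(toroidal_neighborhood(img, 0, 0, half=1))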
Fig. 2. The result of random order synthesis using a noncausal neighborhood is self-tilable. (a) sample texture, with neighbors constructed toroidally, (b) synthesis result of Wei and Levoy's random order algorithm, (c) tiled result of (b).
2.1 Generating s-Tiles Based on Wei and Levoy's random order synthesis method, we present a simple approach to create s-tiles. Figure 3 illustrates our s-tile generation algorithm: (1) First, we start from the sample texture (Figure 3(a)) and generate a self-tilable output patch, called the seed patch (Figure 3(b)), using Wei and Levoy's random order synthesis algorithm; (2) Then, a constraint patch (Figure 3(c)) is derived from the seed patch by filling the center square region with white noise, while slim margins (normally 2-4 pixels wide) are reserved. (3) Finally, a constrained texture synthesis is performed to fill the noise region in the constraint patch, again using Wei and Levoy's random order synthesis algorithm, to generate s-tiles. We repeat this noise-filling process within the constraint patch (see Figure 3(c)) to produce any number of s-tiles, which can be seamlessly tiled with each other. In this step, the random synthesis algorithm makes the s-tiles different from each other, which ensures that there is little repetition in the final tiled result.
1 Self-tilable means that tiles can be seamlessly tiled with copies of themselves, both vertically and horizontally.
Fig. 3. Generation of s-tiles. (a) sample, (b) seed patch, (c) constraint patches, (d) s-tiles.
2.2 Seamlessness and Variations In Figure 3, we build the seed patch in (b) from (a) using the random order synthesis algorithm, which makes it self-tilable. From the s-tile generation algorithm described in Section 2.1, it is reasonable to conclude that all s-tiles can be seamlessly tiled together, since they all share the same borders as the seed patch. In Figure 3(d), from the constraint patch we can generate as many s-tiles as we want. Because random order synthesis is used, sufficient variation is guaranteed among the resulting s-tiles, so there is little repetition in the tiled output texture. But there are still some border repetitions in the final output, because all the s-tiles have the same square border. An effective approach to minimize these border repetitions is to reduce the width of the reserved border in Figure 3(c). In our experience, using a border 2-4 pixels wide and a 2-level pyramid produces desirable s-tiles. In most cases we can generate satisfactory s-tiles using Wei and Levoy's random order synthesis algorithm, which results in reasonable output textures in the subsequent tiling phase. Optionally, one can improve these s-tiles with a little manual editing if very high output quality is demanded.
3 Tiling Strategy Once the s-tile set has been created, we can tile the s-tiles into a texture of arbitrary size in real time. A straightforward way is to randomly paste tiles into the output texture. This
suffices in most cases where the quality requirement is not high. We present a simple method in which no identical tiles are placed side by side in the horizontal or vertical direction. As shown in Figure 4, we tile the output in scan-line order; at each step we perform a check, and only those tiles (C2, C3, C4, C6 and C7) that differ from the left and upper neighbors (C1, C5) are allowed to be pasted into the output. We then randomly select an s-tile from C2, C3, C4, C6 and C7 and paste it into the output. Thus, side-by-side repetition of the same s-tile is avoided, which helps minimize texture repetition in the output. A small sketch of this check follows.
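The following Python sketch is our illustration of the neighbor check; the names and the seed choice are assumptions, and the tile pixel content is taken as given from the s-tile generation stage:

import random

def tile_output(num_tiles, rows, cols, seed=0):
    # Scan-line tiling: at each cell choose uniformly among tile ids that
    # differ from both the left and the upper neighbor, so no identical
    # s-tiles are placed side by side.
    rng = random.Random(seed)
    grid = [[None] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            forbidden = {grid[i][j - 1] if j > 0 else None,
                         grid[i - 1][j] if i > 0 else None}
            candidates = [t for t in range(num_tiles) if t not in forbidden]
            grid[i][j] = rng.choice(candidates)
    return grid

print(tile_output(num_tiles=5, rows=3, cols=6))

With at least three tiles there is always a legal candidate, since at most two ids are forbidden at any cell.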
Fig. 4. Our tiling strategy. Before tiling, a check is performed: only those s-tiles (C2, C3, C4, C6, C7) which differ from the left and upper neighbors of the current position are allowed to be pasted. Then we randomly select an s-tile from C2, C3, C4, C6 and C7 to tile.
4 Results and Discussion Some experimental results of our algorithm are shown in Figure 5. They are all generated by tiling pre-computed s-tiles. Most synthesized results in Figure 5 are desirable, except that there is a small texture deformation in the bottom-right example, because Wei and Levoy's algorithm runs into difficulties on strongly structural textures. The size and number of s-tiles play important roles in achieving good results with our algorithm. The size of an s-tile should not be so small that it cannot capture the basic structure of the sample, while too large a size makes the s-tile generation time increase rapidly. In our experience, the s-tile size should be at least 4-5 times the basic texel size to ensure sufficient variation inside each s-tile. Unlike the Wang Tiles or ω-tiles methods, our approach places no explicit limitation on the number of s-tiles for tiling an arbitrarily large area without seams across the tiles' boundaries. The number of tiles depends on the class of the sample texture. Stochastic textures need more tiles to ensure that there is no apparent repetition in the tiled output, while structural textures need relatively fewer tiles to preserve their regularity. We also present synthesized results using the Wang Tiles and ω-tiles methods and our s-tiles method in Figure 6 for comparison.
Fig. 5. Synthesized results of our algorithm. The outputs are tiled by pre-computed s-tiles.
In Figure 6, the first row shows two sample textures, and the remaining two rows show the corresponding tiled results using Wang Tiles (left column), ω-tiles (middle column) and s-tiles (right column), respectively. There are some visible seams (called diamond-shaped artifacts in [1]) in the tiled results of the Wang Tiles and ω-tiles methods, probably because it remains challenging to obtain good cutting curves that avoid prominent seams in the resulting Wang Tiles and ω-tiles, especially for input textures with elaborate or large features (the second sample) or with distinctive colors (the first sample), as discussed in [13]. As shown in Figure 6, the results tiled with s-tiles (right column) are more graceful than those of the previous two methods, without any artificial seams, except for some slight blurring due to the Wei and Levoy algorithm's tendency to blur the output [4]; this kind of blurring is usually acceptable.
Fig. 6. Comparison of results synthesized by Wang Tiles, ω-tiles and our s-tiles method
5 Conclusions and Future Work In summary, we present a simple yet effective method to build texture tiles, s-tiles, which can be seamlessly tiled into large textures in real time. Our texture synthesis approach can eliminate the potential visible seams that may exist in the previous Wang Tiles and ω-tiles methods. Because there is only a single edge color on each orthogonal border of an s-tile, s-tiles are easier to tile, with looser color constraints than Wang Tiles and ω-tiles. Although our approach does well on many stochastic and semi-structural textures, as shown in Figure 5, it has limitations on strongly structural textures such as the bottom-right sample in Figure 5, for which our algorithm cannot produce satisfactory s-tiles due to the inherent weakness of Wei and Levoy's algorithm on that kind of texture. There are two potential directions for future work: (1) find a better method to further improve the quality of s-tiles; (2) extend our method to arbitrary surfaces by tiling s-tiles onto polycube models as in Fu et al. [15]. In fact, s-tiles are more suitable for surface tiling than the Wang Tiles method used in [15], because the complicated sign assignment phase becomes simpler when s-tiles are used.
Acknowledgement We would like to thank Tiow-Seng Tan (one of the authors of [13]) for sharing their experimental results using the Wang Tiles and ω-tiles methods (http://www.comp.nus.edu.sg/~tants/w-tile/resources.html), which we used in Figure 6 for comparison. We also owe our thanks to the anonymous reviewers of this paper. This research is supported by the National Nature Science Foundation of China under Grant No. 60575023 and the Science Research and Development Foundation of Hefei University of Technology of China under Grant No. 060504F.
References 1. Cohen, M. F., Shade, J., Hiller, S., Deussen, O.: Wang Tiles for Image and Texture Generation. Proceedings of SIGGRAPH, (2003) 287-302 2. Efros, A. A., Leung, T. K.: Texture Synthesis by Nonparametric Sampling. Proceedings of International Conference on Computer Vision, Vol. 2, (1999) 1033-1038 3. Wei, L., Levoy, M.: Fast Texture Synthesis using Tree Structured Vector Quantization. Proceedings of SIGGRAPH, (2000) 479-488 4. Ashikhmin, M.: Synthesizing Natural Texture. Proceedings of Symposium on Interactive 3D Graphics, (2001) 217-226 5. Efros, A. A., Freeman, W. T.: Image Quilting for Texture Synthesis and Transfer. Proceedings of SIGGRAPH, (2001) 341-346 6. Nealen, A., Alexa, M.: Hybrid Texture Synthesis. Eurographics Symposium on Rendering, (2003) 97-105 7. Kwatra, V., Schodl, A., Essa, I., Turk, G., Bobick, A.: Graphcut Textures: Image and Video Synthesis using Graph Cuts. Proceedings of SIGGRAPH, (2003) 277-286 8. Xu, Y. Y., Guo, B., Shum, H.: Chaos Mosaic: Fast and Memory Efficient Texture Synthesis. Technical Report MSR-TR-2000-32, Microsoft Research, (2000) 9. Liang, L., Liu, C., Xu, Y. Q., Guo, B., Shum, H. Y.: Real-time Texture Synthesis by Patch-based Sampling. ACM Transactions on Graphics, Vol. 20, No. 3, (2001) 127-150 10. Zelinka, S., Garland, M.: Towards Real-time Texture Synthesis with the Jump Map. Proceedings of the Eurographics Symposium on Rendering (2002) 99-104 11. Zelinka, S., Garland, M.: Jump Map-based Interactive Texture Synthesis. ACM Trans. Graph. 23(4), (2004) 930-962 12. Neyret, F., Cani, M.: Pattern-based Texturing Revisited. SIGGRAPH (1999) 235–242 13. Wen, C. H., Zhang, X. Y., Tan, T. S.: Generating an ω-tile Set for Texture Synthesis. Computer Graphics International 2005, 22-24 June, Stony Brook, New York, USA (2005) 14. Wei, L. Y., Levoy, M.: Texture Synthesis Over Arbitrary Manifold Surfaces. SIGGRAPH 2001, Computer Graphics Proceedings, (2001) 355–360 15. Fu, C. W., Leung, M. K.: Texture Tiling on Arbitrary Topological Surfaces. Proceedings of the Eurographics Symposium on Rendering (2005) 99-104
Relaxation Labeling Using an Improved Hopfield Neural Network Long Cheng, Zeng-Guang Hou, and Min Tan Key Laboratory of Complex Systems and Intelligence Science, Institute of Automation, The Chinese Academy of Sciences, P.O. Box 2728, Beijing 100080, China {long.cheng, zengguang.hou, min.tan}@ia.ac.cn
Abstract. Relaxation labeling is a useful technique for dealing with local ambiguity and achieving consistency. In [1], some useful comments indicate that several common properties exist between the relaxation process and the neural network technique. Neural networks can be used as an efficient tool to optimize the average local consistency function, whose optimal solution yields a compatible label assignment. However, most current investigations in this field are based on the standard Hopfield neural network (SHNN) presented in [2]. In this paper, an improved Hopfield neural network (IHNN) presented in [3] is utilized to perform relaxation labeling. Compared with the SHNN, this approach has several advantages: 1) the IHNN uses fewer neurons than the SHNN; 2) the activation function of the IHNN is easier to implement than that of the SHNN; 3) the IHNN does not contain any penalty parameters and can generate the exact optimal solution. Experimental results illustrate that the IHNN approach can obtain a better labeling performance than the SHNN.
1 Introduction
Relaxation labeling is a useful technique for dealing with local ambiguity and achieving consistency. It has broad applications in areas such as scene labeling, curve detection and enhancement, and image segmentation. There are basically three types of relaxation processes, namely discrete relaxation, fuzzy relaxation, and probabilistic relaxation [4]. In this paper, the neural network technique is utilized to perform probabilistic relaxation, whose general idea is described as follows. The relaxation process involves a set of n objects O_1, O_2, ..., O_n and a set of m labels C_1, C_2, ..., C_m. For each object O_i, a measurement p_{ij} is used as the probability that O_i has the label C_j. For each i ∈ {1, 2, ..., n}, p_{ij} has to satisfy the following condition:

\sum_{j=1}^{m} p_{ij} = 1, \qquad 0 \le p_{ij} \le 1, \quad \forall j = 1, 2, \dots, m. \quad (1)
Suppose further that the label assignment is correlated. For example, label C_j at object O_i can influence label C_k at object O_h. This compatibility of the pair
of label assignments O_i ∈ C_j and O_h ∈ C_k can be quantitatively measured by r(i, j; h, k). If object O_i having label C_j tends to support object O_h having label C_k, then r(i, j; h, k) is positive and large, and vice versa. To apply relaxation labeling, each object is first assigned an initial probability p_{ij}^{(0)} for each of its possible label memberships. The relaxation process then uses the compatibility measurements r(i, j; h, k) to find a set of n classifications of all the objects that are as compatible as possible. Rosenfeld et al. proposed an iterative updating algorithm to find this compatible classification [5]. Specifically, for each object and each label in the rth iteration, one computes q_{ij}^{(r)} by the following equation:

q_{ij}^{(r)} = \frac{1}{n} \sum_{h=1}^{n} \sum_{k=1}^{m} r(i, j; h, k)\, p_{hk}^{(r)}. \quad (2)

q_{ij}^{(r)} denotes the contribution from the other objects to support the assignment p_{ij}^{(r)}. The new assignment value is then adjusted according to the following equation:

p_{ij}^{(r+1)} = \frac{p_{ij}^{(r)} \big[1 + q_{ij}^{(r)}\big]}{\sum_{k=1}^{m} p_{ik}^{(r)} \big[1 + q_{ik}^{(r)}\big]}. \quad (3)

In [7], Hummel and Zucker proposed the following quantitative criterion, named the average local consistency function, to evaluate the performance of relaxation labeling:

Z(p) = \sum_{i=1}^{n} \sum_{j=1}^{m} p_{ij} \sum_{h=1}^{n} \sum_{k=1}^{m} r(i, j; h, k)\, p_{hk}. \quad (4)
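As a concrete illustration of the update rule (2)-(3) and the criterion (4), the following Python sketch (ours, not from the paper; numpy and the random test data are our assumptions) runs a few Rosenfeld iterations on a toy problem:

import numpy as np

def relaxation_step(p, r):
    # One Rosenfeld update: p has shape (n, m); r has shape (n, m, n, m)
    # and holds the compatibilities r(i, j; h, k).
    n, m = p.shape
    q = np.einsum('ijhk,hk->ij', r, p) / n           # equation (2)
    p_new = p * (1.0 + q)                            # numerator of (3)
    return p_new / p_new.sum(axis=1, keepdims=True)  # normalization in (3)

def average_local_consistency(p, r):
    # Average local consistency Z(p) of equation (4).
    return np.einsum('ij,ijhk,hk->', p, r, p)

# Tiny example with random data (illustrative only).
rng = np.random.default_rng(0)
n, m = 4, 3
p = rng.random((n, m)); p /= p.sum(axis=1, keepdims=True)
r = rng.uniform(-1, 1, (n, m, n, m))
for _ in range(10):
    p = relaxation_step(p, r)
print(average_local_consistency(p, r))

Each update keeps every row of p a probability distribution, matching constraint (1).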
They demonstrated that the ad hoc updating formulas (2) and (3) are just an approximate approach to finding the maximum of the average local consistency function. Thus the probabilistic relaxation problem can be converted into a constrained optimization problem. From this point of view, many traditional numerical optimization approaches, such as gradient ascent, can be used to perform the relaxation process [7]. However, these traditional methods suffer from enormous computational cost and cannot satisfy real-time processing requirements when the relaxation problem is complicated. So new approaches are required to reduce algorithmic complexity and increase computational efficiency. In recent years, neural networks have attracted much attention as an efficient optimization tool in scientific and engineering applications. Rather than solving problems by numerical iteration, the fundamental idea behind the neural approach to optimization problems is this: by setting up a neural circuit determined by a set of differential equations, for some initial network state the neural network will eventually evolve to an equilibrium state, and this state will coincide with the optimal solution of the original problem. Since Hopfield and Tank's classic work [2], various neural network approaches have been proposed to solve many kinds of optimization problems [3], [8], [9] and [10]. Some researchers have also utilized the neural network approach to perform the relaxation process [1],
[11] and [12]. In [1], some useful comments indicate that several common properties exist between the relaxation process and the neural network technique. Some advantages of using neural networks are that their natural parallelism enables them to solve complicated problems efficiently, and that their structure can be implemented in VLSI technology at a reasonable cost. So a relaxation process based on the neural network technique can be accomplished in real time. But many investigations applying neural networks to relaxation labeling are based on the standard Hopfield neural network [1] and [11]. The SHNN contains penalty parameters, so it only generates an approximate solution and has an implementation problem when the penalty parameters are very large. In this paper, an improved Hopfield neural network presented in [3] is utilized to perform the relaxation process. The IHNN model uses fewer neurons than the SHNN. The activation function of the IHNN has a "saturation" nonlinearity, which is easier to implement than the sigmoid function used in the SHNN. Moreover, variational inequality techniques show that the IHNN approach can obtain an exact optimal solution [3]. According to the experimental results, the IHNN approach can obtain a better labeling performance than the SHNN. The remainder of this paper is organized as follows. Section 2 provides the dynamical differential equations and architecture of the improved Hopfield neural network. Experimental results are shown and discussed in Section 3. Section 4 concludes this paper with final remarks.
2 Improved Hopfield Neural Network Model
First, the constrained optimization problem that maximizes the average local consistency function (4) is stated as follows:

max Z(p) = \sum_{i=1}^{n} \sum_{j=1}^{m} \sum_{h=1}^{n} \sum_{k=1}^{m} p_{ij}\, r(i, j; h, k)\, p_{hk}, \quad (5)
s.t. \sum_{j=1}^{m} p_{ij} = 1, \quad 0 \le p_{ij} \le 1, \quad i = 1, 2, \dots, n; \; j = 1, 2, \dots, m.
This is a quadratic optimization problem with equality and inequality constraints. In the neural network literature, there are a few NN models for solving quadratic optimization problems. Some of them are based on the penalty function method [2] and [8]. The fundamental idea behind the penalty function method is that, when a constraint violation occurs, the magnitude and direction of the violation are fed back to adjust the states of the neurons. But since the penalty parameters cannot be infinite, the network only generates an approximate solution. Wang et al. presented several Hopfield-type neural network approaches (the Lagrangian neural network, primal-dual neural network, primal neural network and dual neural network) which fulfill the Karush-Kuhn-Tucker (KKT) conditions to obtain an exact optimal solution [3], [9] and [10].
Fig. 1. Functional block diagram for the IHNN model
According to [9] and [10], the primal-dual neural network and the dual neural network usually require convexity of the objective function, which cannot be guaranteed for the quadratic optimization problem defined by (5). An improved continuous-time Hopfield neural network presented in [3] is therefore adopted in this paper. This network model can deal with bound-constrained nonlinear optimization with any continuously differentiable objective function, which is not necessarily quadratic or convex. The quadratic optimization problem considered by the IHNN in [3] is described as follows:

min E(x) = \frac{1}{2} x^{T} Q x + x^{T} \theta, \quad s.t. \; a \le x \le b, \quad (6)

where x = (x_1, x_2, \dots, x_n)^{T} \in \mathbb{R}^{n} is the decision vector; a = (a_1, a_2, \dots, a_n)^{T} \in \mathbb{R}^{n} and b = (b_1, b_2, \dots, b_n)^{T} \in \mathbb{R}^{n} are given constant lower and upper bounds with a_i \le b_i (i = 1, 2, \dots, n); Q \in \mathbb{R}^{n \times n} is a symmetric weight matrix; \theta = (\theta_1, \theta_2, \dots, \theta_n)^{T} \in \mathbb{R}^{n} is a constant vector; and the superscript T denotes the transpose operator. Note that the quadratic problem defined by (5) has equality constraints. In order to utilize the IHNN approach, the equality constraints have to be eliminated by replacing p_{ij_0} (for an arbitrary fixed j_0 \in \{1, 2, \dots, m\}) in (5) with 1 - \sum_{j=1, j \ne j_0}^{m} p_{ij}; the objective function of the optimization problem defined by (5) can then be reformulated as follows:

Z(p) = \sum_{i=1}^{n} \sum_{j=1}^{m} \sum_{h=1}^{n} \sum_{k=1}^{m} p_{ij}\, r(i, j; h, k)\, p_{hk}
     = \sum_{i=1}^{n} \sum_{j=1, j \ne j_0}^{m} \sum_{h=1}^{n} \sum_{k=1, k \ne j_0}^{m} p_{ij} p_{hk} Q_{ij,hk} + \sum_{i=1}^{n} \sum_{j=1, j \ne j_0}^{m} p_{ij} \theta_{ij} + \sum_{i=1}^{n} \sum_{h=1}^{n} r(i, j_0; h, j_0)
     = p^{T} Q p + p^{T} \theta + C. \quad (7)
where p = (p_{11}, \dots, p_{1(j_0-1)}, p_{1(j_0+1)}, \dots, p_{1m}, \dots, p_{n1}, \dots, p_{n(j_0-1)}, p_{n(j_0+1)}, \dots, p_{nm})^{T} \in \mathbb{R}^{n(m-1)} is the vector of decision variables; Q_{ij,hk} = r(i, j; h, k) - r(i, j; h, j_0) - r(i, j_0; h, k) + r(i, j_0; h, j_0), and Q = (Q_{ij,hk}) \in \mathbb{R}^{n(m-1) \times n(m-1)} is the symmetric weight matrix; \theta_{ij} = \sum_{h=1}^{n} r(i, j; h, j_0) + \sum_{h=1}^{n} r(h, j_0; i, j) - \sum_{h=1}^{n} r(h, j_0; i, j_0) - \sum_{h=1}^{n} r(i, j_0; h, j_0), with \theta = (\theta_{ij}) \in \mathbb{R}^{n(m-1)}; and C = \sum_{i=1}^{n} \sum_{h=1}^{n} r(i, j_0; h, j_0). Thus the problem defined by (5) is converted to the following quadratic optimization problem with only bound constraints:

min Z(p) = -(p^{T} Q p + p^{T} \theta + C), \quad s.t. \; 0 \le p \le 1, \quad (8)
where 0 = (0, 0, \dots, 0)^{T} \in \mathbb{R}^{n(m-1)} and 1 = (1, 1, \dots, 1)^{T} \in \mathbb{R}^{n(m-1)}. According to [3], an improved Hopfield neural network model can be designed to solve the above problem. The differential equations that determine the IHNN configuration are

\Gamma \frac{dp}{dt} = -p + g\big(p + \Lambda (2 Q p + \theta)\big), \quad (9)

where \Gamma = diag(\tau_1, \tau_2, \dots, \tau_{n(m-1)}) and \Lambda = diag(\alpha_1, \alpha_2, \dots, \alpha_{n(m-1)}) are a positive definite time-constant matrix and a positive definite scaling matrix, respectively. These parameters can be used to control the convergence rate of the IHNN model. The activation function is g(x) = (g_1(x), g_2(x), \dots, g_{n(m-1)}(x))^{T}, with g_i(x) defined by

g_i(x) = 0 if x < 0; \quad g_i(x) = x if 0 \le x \le 1; \quad g_i(x) = 1 if x > 1.
g(x) can be regarded as the projection operator from \mathbb{R}^{n(m-1)} onto the closed convex feasible region \Omega = [0, 1] \times [0, 1] \times \dots \times [0, 1]. The IHNN model has been shown to have the following features using variational inequality and ordinary differential equation techniques in [3]. 1) The IHNN model is regular, in the sense that any optimal solution of the problem defined by (8) is also an equilibrium point of the IHNN; moreover, if the objective function is convex, the set of equilibrium points of the IHNN equals the set of optima of the optimization problem defined by (8). 2) The IHNN model is quasiconvergent, in the sense that the trajectory of the IHNN cannot escape from the feasible region and will converge to the set of equilibrium points of the IHNN for any initial network state in the feasible region; this guarantees that the IHNN model is stable. 3) If the initial state of the IHNN is in the feasible region, the value of the objective function will decrease along the corresponding solution trajectory. Thus, although the objective function of the optimization problem defined by (8) is not convex, which means that an equilibrium point of the IHNN may not be an optimum of the problem defined by (8), the convergence point of the IHNN can still achieve a better performance than the initial network state. Also, because of the non-convexity of the objective function, there may be more than one
equilibrium point of the IHNN. In order to make the IHNN converge to an equilibrium point that results in a better labeling performance, the initial state of the IHNN cannot be selected randomly in the feasible region. The initial state, which is computed by analyzing the corresponding relaxation labeling problem, can be regarded as prior knowledge, and the solution trajectory of the IHNN is expected to converge to the equilibrium point nearest to the initial state. After the IHNN converges, p_{ij_0} can be obtained from the relationship p_{ij_0} = 1 - \sum_{j=1, j \ne j_0}^{m} p_{ij}. Then the maximum value p_{ij_m} among p_{i1}, p_{i2}, \dots, p_{im} is selected, and object i is assigned to label j_m. The dynamical equations (9) of the IHNN indicate that the network is composed of only one layer of n(m-1) neurons, which is fewer than the nm neurons of the SHNN [1]. The IHNN has only the "saturation" nonlinearity, which is easier to implement with an operational amplifier than the sigmoid function used in the SHNN. The IHNN approach is not based on the penalty function method and yields an exact optimal solution, while the SHNN contains finite penalty parameters and can only generate an approximate optimal solution. The block diagram of the IHNN controller is shown in Fig. 1. The weighted connections of the network and the initial network state have to be determined according to the corresponding relaxation labeling problem.
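To make the dynamics (9) concrete, here is a minimal Python sketch of ours, assuming numpy and a simple forward-Euler integration (the paper's experiments use Matlab's ode45 instead), with made-up toy values for Q and theta:

import numpy as np

def g(x):
    # Saturation activation: the projection onto [0, 1]^{n(m-1)}.
    return np.clip(x, 0.0, 1.0)

def ihnn_solve(Q, theta, p0, tau=1.0, alpha=0.01, dt=1e-3, steps=20000):
    # Forward-Euler integration of equation (9),
    # Gamma dp/dt = -p + g(p + Lambda (2 Q p + theta)),
    # with Gamma = tau*I and Lambda = alpha*I for simplicity.
    p = p0.copy()
    for _ in range(steps):
        dp = (-p + g(p + alpha * (2.0 * Q @ p + theta))) / tau
        p = p + dt * dp
    return p

# Illustrative toy problem: maximize p^T Q p + p^T theta over [0, 1]^2,
# i.e. minimize the objective of (8) up to the constant C.
Q = np.array([[1.0, 0.2], [0.2, 0.5]])
theta = np.array([-0.5, 0.3])
print(ihnn_solve(Q, theta, p0=np.array([0.5, 0.5])))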
3 Experimental Results
To demonstrate the efficiency of the IHNN approach, a typical image processing problem, thresholding, is considered: determining whether each pixel of a noise-corrupted image belongs to the object or to the background. The original gray image is the Chinese character "cheng" shown in Fig. 2. The size of this image is 40 × 40. Most intensity values of the character are 20, while the intensity value of the background is 255. This image is then corrupted by zero-mean white noise. The corrupted image is depicted in Fig. 3. It is clear that thresholding with a single threshold value for the entire image does not work, so the relaxation method is used to deal with this problem.
Fig. 2. The Chinese character “cheng” without noise corruption
Fig. 3. The Chinese character “cheng” with white noise corruption
The objects of this relaxation labeling problem are the pixels of the corrupted image, and one of two labels, character or background, will be assigned to each object. For the pixel A_{x,y}, p_{x,y,1} denotes the probability of this pixel having the character label, and p_{x,y,0} = 1 - p_{x,y,1} denotes the probability of the
background label. Then the initial probability of each pixel and the compatibility coefficients have to be determined. For the initial probabilities, a scheme to determine the likelihood of a pixel being character or background is needed. According to the analysis in Section 2, a good initial probability assignment results in a good labeling performance, so the likelihood determination scheme is very important to the solution of the problem. Here a simple method presented in [4] is adopted. Specifically, let V_{x,y} be the gray level of pixel A_{x,y}, and let V_{max} and V_{min} be the maximum and minimum gray levels of the entire image, respectively. The initial probabilities of pixel A_{x,y} are then obtained by the following equations:

p_{x,y,0}^{0} = \frac{V_{x,y}}{V_{max} - V_{min}}, \quad (10)

p_{x,y,1}^{0} = 1 - \frac{V_{x,y}}{V_{max} - V_{min}}. \quad (11)
For the compatibility coefficients, suppose that each pixel A_{x,y} is correlated only with its nearest 8-neighborhood pixels, so the compatibility coefficient for any pixel not in the 8-neighborhood is zero. For each pixel A_{x+i,y+j} (i, j ∈ {-1, 0, 1}), there are four compatibility coefficients: r(x, y, 1; x+i, y+j, 1), r(x, y, 1; x+i, y+j, 0), r(x, y, 0; x+i, y+j, 1) and r(x, y, 0; x+i, y+j, 0). A general method (12) proposed in [6], based on the mutual information of labels at neighboring pixels, is utilized to compute these compatibility coefficients, where λ, λ̃ ∈ {0, 1} and n is the number of pixels. The computed results are shown in Table 1.

r(x, y, \lambda;\, x+i, y+j, \tilde{\lambda}) = \log \frac{\frac{1}{n} \sum_{(x,y)} p_{x,y,\lambda}\, p_{x+i,y+j,\tilde{\lambda}}}{\big(\frac{1}{n} \sum_{(x,y)} p_{x,y,\lambda}\big)\big(\frac{1}{n} \sum_{(x,y)} p_{x,y,\tilde{\lambda}}\big)} \quad (12)

Table 1. Compatibility coefficients computed by (12); each element in row (λ, λ̃) and column ((x, y), (x+i, y+j)) is the coefficient r(x, y, λ; x+i, y+j, λ̃)

  (λ, λ̃)   (x-1, y-1)  (x-1, y)   (x-1, y+1)  (x, y+1)
  (0, 0)    0.023070    0.027393   0.025787    0.040600
  (0, 1)   -0.029055   -0.034715  -0.032637   -0.051198
  (1, 0)   -0.036887   -0.038016  -0.032475   -0.049311
  (1, 1)    0.044478    0.045800   0.039289    0.058908

  (λ, λ̃)   (x+1, y+1)  (x+1, y)   (x+1, y-1)  (x, y-1)
  (0, 0)    0.029292    0.029778   0.024255    0.037603
  (0, 1)   -0.037177   -0.037809  -0.030661   -0.048042
  (1, 0)   -0.032316   -0.036012  -0.034328   -0.053104
  (1, 1)    0.039101    0.043452   0.041473    0.063259
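The following Python sketch (ours; numpy, the helper names, and the random test image are assumptions, and the compatibility formula follows our reading of the garbled equation (12)) computes the initial probabilities (10)-(11) and mutual-information style coefficients for one neighbor displacement:

import numpy as np

def initial_probabilities(V):
    # Equations (10)-(11) as printed: label 0 is background, label 1 is character.
    p_bg = V / (V.max() - V.min())
    return np.stack([p_bg, 1.0 - p_bg], axis=-1)

def compatibility(p, di, dj):
    # Mutual-information style coefficients for displacement (di, dj):
    # log of the averaged co-occurrence of label probabilities over the
    # product of their averaged marginals, per our reading of (12).
    ps = np.roll(p, shift=(-di, -dj), axis=(0, 1))   # p_{x+i, y+j}
    r = np.empty((2, 2))
    for lam in range(2):
        for lam2 in range(2):
            joint = (p[..., lam] * ps[..., lam2]).mean()
            r[lam, lam2] = np.log(joint / (p[..., lam].mean() * p[..., lam2].mean()))
    return r

V = np.random.default_rng(0).integers(0, 256, (40, 40)).astype(float)
p = initial_probabilities(V)
print(compatibility(p, 1, 0))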
Fig. 4. The thresholding image processed by the IHNN approach
Fig. 5. The thresholding image processed by the SHNN approach
Fig. 6. The Chinese character “cheng” with color noise corruption
Fig. 7. The thresholding result of figure 6 by the IHNN approach
Fig. 8. The Chinese character “long” without noise
Fig. 9. The Chinese character “long” with white noise corruption
Fig. 10. The thresholding image processed by the IHNN approach
Now the IHNN model is simulated on an IBM personal computer. The differential equations (9) that determine the IHNN configuration are solved by the Matlab ode45 method, and the time-constant matrix Γ and scaling matrix Λ in (9) are both set to 100 × I (I is the identity matrix). The thresholded image processed by the IHNN approach is shown in Fig. 4 and is almost the same as the original image. The SHNN approach is then used to process the corrupted image in order to compare it with the IHNN method. The gain λ of the SHNN is set to 0.1, and the initial probabilities and compatibility coefficients are computed by the method used in [1]. The thresholded image processed by the SHNN approach is shown in Fig. 5. According to the experimental results, the IHNN method obtains a better performance than the SHNN. In order to verify the robustness of the IHNN algorithm, another
type of noise is generated to corrupt the original image. This colored noise is the output of the filter 1/(1 - z^{-1}) whose input is the original white noise; a sketch of generating it is given below. The color-noise-corrupted image is shown in Fig. 6, and the thresholded image is depicted in Fig. 7. It is evident that the IHNN algorithm also obtains a good thresholding result. Another thresholding experiment deals with the Chinese character "long" in a different font. The original image is shown in Fig. 8 and the noise-corrupted image in Fig. 9. The initial probability values and compatibility coefficients are computed by the same method as in the first experiment, and Γ and Λ of the IHNN model are again set to 100 × I. According to the experimental result shown in Fig. 10, the IHNN approach still obtains a good labeling performance.
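Reading 1/(1 - z^{-1}) as the discrete integrator y[k] = y[k-1] + x[k], the colored noise can be produced from white noise with a running cumulative sum; the Python lines below are our illustration of that reading, not code from the paper:

import numpy as np

rng = np.random.default_rng(0)
white = rng.normal(0.0, 1.0, size=1600)   # zero-mean white noise, one value per pixel
colored = np.cumsum(white)                # y[k] = y[k-1] + x[k], i.e. the filter 1/(1 - z^{-1})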
4 Concluding Remarks
Relaxation labeling is a useful technique for dealing with local ambiguity and achieving consistency. In [1], some useful comments indicate that several common properties exist between the relaxation process and the neural network technique. Many recent investigations in this field are based on the standard Hopfield neural network. In this paper, an improved Hopfield neural network model presented in [3] is utilized to perform the relaxation labeling process. Compared with the SHNN, the IHNN approach has the following advantages: 1) the IHNN uses fewer neurons than the SHNN; 2) the activation function of the IHNN has a "saturation" nonlinearity, which is easier to implement than that of the SHNN; 3) the IHNN is not based on the penalty function method and can generate an exact optimal solution, while the SHNN contains finite penalty parameters that cause an implementation problem when the penalty parameters are very large. According to the experimental results, the IHNN method obtains a better labeling performance than the SHNN. Because the neural network approach can be used as a guide to design analog or digital circuits, any real-time relaxation application can be accomplished by the IHNN method. Note that the relaxation labeling problem is essentially a non-convex quadratic optimization problem. The IHNN model used here is only regular but not complete (complete means that the set of equilibrium points of the neural network model equals the set of optima of the quadratic optimization problem), so more advanced complete neural network models can be investigated further. Moreover, how to select the initial network state appropriately according to the corresponding relaxation problem is also an interesting topic for future research.
Acknowledgments This research was supported in part by the National Natural Science Foundation of China (Grants 60205004, 50475179 and 60334020), the National Basic Research Program (973) of China (Grant 2002CB312200), and the Hi-Tech R&D Program (863) of China (Grants 2002AA423160 and 2005AA420040).
References 1. Yu, S. S., Tsai, W. H.: Relaxation by the Hopfield Neural Network. Pattern Recognition. 25 (1992) 197-209 2. Tank, D. W., Hopfield, J. J.: Simple Neural Optimization Networks: an A/D Converter, Signal Decision Circuit and a Linear Programming Circuit. IEEE Transactions on Circuits and Systems. 35 (1988) 554-562 3. Liang, X. B., Wang, J.: A Recurrent Neural Network for Nonlinear Optimization with a Continuously Differentiable Objective Function and Bound Constraints. IEEE Transactions on Neural Networks. 11 (2000) 1251-1262 4. Rosenfeld, A., Kak, A. C.: Digital Picture Processing, Volume 2. New York: Academic Press, (1982) 5. Rosenfeld, A., Hummel, R. A., Zucker, S. W.: Scene Labeling by Relaxation Operations. IEEE Transactions on Systems, Man and Cybernetics. 6 (1976) 420-433 6. Peleg, S., Rosenfeld, A.: Determining Compatibility Coefficients for Curve Enhancement Relaxation Processes. IEEE Transactions on Systems, Man and Cybernetics. 8 (1978) 548-555 7. Hummel, R. A., Zucker, S. W.: On the Foundations of Relaxation Labeling Processes. IEEE Transactions on Pattern Analysis and Machine Intelligence. 5 (1983) 267-287 8. Kennedy, M. P., Chua, L. O.: Neural Networks for Nonlinear Programming. IEEE Transactions on Circuits and Systems. 35 (1988) 554-562 9. Zhang, Y. N., Wang, J., Xia, Y. S.: A Dual Neural Network for Redundancy Resolution of Kinematically Redundant Manipulators Subject to Joint Limits and Joint Velocity Limits. IEEE Transactions on Neural Networks. 14 (2003) 658-667 10. Xia, Y. S., Wang, J.: A Recurrent Neural Network for Solving Nonlinear Convex Programs Subject to Linear Constraints. IEEE Transactions on Neural Networks. 16 (2005) 379-386 11. Sang, N., Zhang, T. X.: Segmentation of FLIR Images by Hopfield Neural Network with Edge Constraint. Pattern Recognition. 34 (2001) 811-821 12. Kurugollu, F., Sankur, B., Harmanci, A. E.: Image Segmentation by Relaxation Using Constraint Satisfaction Neural Network. Image and Vision Computing. 20 (2002) 483-497
Adaptive Rank Indexing Scheme with Arithmetic Coding in Color-Indexed Images
Kang Soo You1, Hyung Moo Kim2, Duck Won Seo2, and Hoon Sung Kwak2
1 School of Liberal Arts, Jeonju University, Jeonju, 561-756, Korea [email protected]
2 Department of Computer Engineering, Chonbuk National University, Jeonju, 561-756, Korea [email protected]
Abstract. In this paper, we introduce a re-indexing method for color-indexed images based on ranking, and propose two algorithms to determine the rank of equal count values in a row of the co-occurrence count table. The goal of both algorithms is to improve the performance of lossless image compression. With these algorithms, higher compression efficiency can be expected due to greater data redundancy, because most pixels fall into a few top ranks. Simulations verify that our proposed algorithms improve the compression ratio over existing conventional re-indexing algorithms.
1 Introduction It is well known that indexed images can be re-indexed without any loss of data as long as the color palette can accommodate it. This means that a proper re-indexing can be chosen for better compression performance [1]. One of the most popular indexed-image formats is GIF (Graphics Interchange Format) from CompuServe Inc. GIF images are generally used for the transmission of artificially generated computer game units or online computer graphics rather than for real images. Figure 1 shows the 8-bit values for R, G, and B corresponding to each index [2],[3]. The re-indexing method proposed by Zeng et al. [4] is based on a one-step-lookahead greedy approach, which aims to increase the lossless compression efficiency of color-indexed images. In the re-indexing method proposed by Pinho et al. [5], a theoretical analysis of Zeng's method for the case of Laplacian-distributed differences of neighboring pixels leads to a set of parameters that differs from the one originally suggested in [4]. Similar to the re-indexing methods mentioned above, You et al. [6] proposed a new re-indexing method using ranks, obtained by ordering the co-occurrence count values in descending order. It is shown that a few top ranks contain most of the pixels. An image converted by You's method is called a ranked image. In the histogram of a ranked image, almost all pixels are concentrated in a few top ranks, so a ranked image is well suited to arithmetic coding. In this paper, we introduce adaptive RIAC (ranked re-indexing with arithmetic coding) [6] and propose two algorithms to determine the rank of equal count values in a row of the count matrix. Their goal is to reduce the bits per pixel in lossless compression.
Fig. 1. 8-bit values for R, G, and B corresponding to each index
This paper is organized as follows. We give an overview of Pinho's and You's re-indexing schemes in Section 2. We explain how to convert an original indexed image into a ranked image adaptively in Section 3, where we also present two algorithms to determine the rank of equal counts. We present a performance evaluation against conventional re-indexing schemes in Section 4. Finally, we summarize our results and conclude the paper in Section 5.
2 Conventional Re-indexing Schemes
2.1 Pinho's Scheme
Pinho et al. [5] proposed a modified version of Zeng's scheme for efficient entropy coding. The core of this scheme is also to obtain a smooth distribution of the indices over the image: if the spatial distribution of the indices is smooth, greater compression ratios may be obtained. The following steps describe Pinho's scheme (a code sketch follows the steps).
• Step 1: Calculate the cross-counts C(S_i, S_j) for each pair of symbols S_i and S_j based on the initial color-indexed image; here S denotes a symbol and i and j denote indices. Then calculate the cumulative cross-counts for each symbol S_i.
• Step 2: Take the symbol S_max that has the largest cumulative cross-counts. Denote it as L_0 and put L_0 in a symbol pool P that will consist of spatially ordered symbols, so that P = {L_0}. Denote the size of P by N and set N = 1. A new unassigned symbol enters P only from either the left or the right end. To find the most suitable symbol to put at the current end position of the pool P, the unassigned symbol with maximum potential is chosen; the reason is that differences between neighboring indices become small in the re-indexed image. A candidate symbol entering P is chosen by equations (2) and (3), where the parameter N indicates the weight w(N, j).
D_L(S_i, N) = \sum_{j=0}^{N-1} \log_2 P(j+1)\, C(S_i, L_j) - \sum_{j=0}^{N-1} \log_2 P(j+2)\, C(S_i, L_j) = \sum_{j=0}^{N-1} \log_2 \frac{P(j+1)}{P(j+2)}\, C(S_i, L_j), \quad (2)

D_R(S_i, N) = \sum_{j=0}^{N-1} \log_2 P(N-j)\, C(S_i, L_j) - \sum_{j=0}^{N-1} \log_2 P(N-j+1)\, C(S_i, L_j) = \sum_{j=0}^{N-1} \log_2 \frac{P(N-j)}{P(N-j+1)}\, C(S_i, L_j). \quad (3)
• Step 3: Assign the symbol with the larger potential function value D_i to the corresponding end position in the pool P. Denote it as L_N and set N = N + 1.
• Step 4: If N is less than M, go to Step 2.
• Step 5: Assign the integers 0, 1, ..., M - 1 to the spatially ordered symbols in the pool P in left-to-right or right-to-left order.
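The following Python sketch is our illustration of this greedy ordering, assuming numpy; the weight model P is left as a parameter because its exact form comes from the Laplacian analysis in [5], and the stand-in passed in the example is hypothetical:

import numpy as np

def pinho_reindex(C, P):
    # C[i, j]: cross-counts between symbols i and j; P: decreasing weight
    # model from [5]. Symbols enter the pool at the end with the larger
    # potential D_L or D_R of equations (2)-(3).
    M = C.shape[0]
    pool = [int(np.argmax(C.sum(axis=1)))]   # Step 2: largest cumulative count
    unassigned = set(range(M)) - set(pool)
    while unassigned:                        # Steps 3-4
        N = len(pool)
        wL = np.log2([P(k + 1) / P(k + 2) for k in range(N)])
        wR = np.log2([P(N - k) / P(N - k + 1) for k in range(N)])
        best = max((w @ C[s, pool], s, side)
                   for s in unassigned
                   for w, side in ((wL, 'left'), (wR, 'right')))
        _, s, side = best
        if side == 'left':
            pool.insert(0, s)
        else:
            pool.append(s)
        unassigned.remove(s)
    return {sym: idx for idx, sym in enumerate(pool)}  # Step 5

# Hypothetical usage with a toy count matrix and a stand-in weight model.
C = np.random.default_rng(0).integers(0, 10, (5, 5))
print(pinho_reindex(C + C.T, P=lambda k: 1.0 / k))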
Through the above processing, the re-indexed image is obtained by replacing each initial index value i with the new index value assigned to S_i. 2.2 You's Scheme
You et al. [6] proposed a new re-indexing scheme using ranks derived from co-occurrence frequencies. This scheme consists of the following steps (a code sketch follows the steps).
• Step 1: Count co-occurrence frequencies in the initial color-indexed image and build a co-occurrence count table from the cumulative counts, i.e., the number of times each pair of neighboring color symbols (S_i, S_j) occurs. The table size is M × M, where M is the maximum number of colors in the indexed image. • Step 2: Generate a co-occurrence rank table: order each row of the co-occurrence count table in descending order, then assign a rank to each element in the row. • Step 3: Obtain the new rank image by looking up each pair of neighboring color symbols (S_i, S_j) in the co-occurrence rank table.
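A compact Python sketch of these three steps (our illustration; numpy, the function name, and the tiny sample array are assumptions, and the virtual leading '0' follows the paper's Figure 2 example):

import numpy as np

def rank_image(indices, M):
    # Build the co-occurrence count table from neighbor pairs in raster
    # order, rank each row by descending count, then replace every pair
    # (prev, cur) by the rank of cur in row prev.
    flat = indices.ravel()
    pairs = list(zip(np.concatenate(([0], flat[:-1])), flat))
    C = np.zeros((M, M), dtype=int)
    for a, b in pairs:
        C[a, b] += 1
    # rank[a, b] = position of b when row a is sorted by descending count;
    # ties go to the first occurrence, as in RIAC.
    order = np.argsort(-C, axis=1, kind='stable')
    rank = np.empty_like(C)
    for a in range(M):
        rank[a, order[a]] = np.arange(M)
    return np.array([rank[a, b] for a, b in pairs]).reshape(indices.shape)

sample = np.array([[0, 3, 3, 1], [2, 2, 0, 3]])
print(rank_image(sample, M=4))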
Figure 2 shows how the co-occurrence rank table is used to convert an image to a rank image, and Figure 3 shows the encoder of You's scheme. In the rank image, a few top ranks contain most of the pixels. This scheme therefore enhances arithmetic coding, because its symbols have skewed distributions
Fig. 2. (a) Sample image with four colors; a virtual '0' is placed in front of '3' to get the rank of the first pair (0, 3). (b) Co-occurrence count table. (c) Co-occurrence rank table. (d) Converted rank image obtained by referencing (a) and (c).
Fig. 3. Encoder of You’s scheme
and little variance [7]. But this scheme has a disadvantage: its additional information, the co-occurrence rank table, must be stored and transmitted so that the image can be reconstructed. When the original image is not large, You's scheme, called RIAC (Rank Indexing scheme with Arithmetic Coding), can therefore be inefficient because of this additional information.
3 The Proposed Methods
3.1 Adaptive RIAC
In order to reconstruct original image from rank image completely, as mentioned earlier, the additional information of co-occurrence rank table must be stored and
transmitted when the index image is converted into the ranked image in RIAC [6]. This may be inefficient depending on the size of the image and the number of colors, especially when the original image is small. In this paper, therefore, we introduce adaptive RIAC, an adaptive improvement of the RIAC algorithm, to achieve better compression efficiency in entropy coding. For the case where a row of the co-occurrence count table contains equal count values, we also suggest two methods to determine the ranks of the equal counts, under the assumption that equal counts never share the same rank. The adaptive RIAC scheme consists of the following steps (a code sketch is given below).
• Step 1: Let the indices of the pixels form a one-dimensional sequence P = (P_0, P_1, ..., P_m), where P_0 denotes a virtual pixel index used for processing the first pixel; its value is the lowest index value '0'.
• Step 2: Initialize the co-occurrence count table with '0'. Give a rank to each element in each row of the co-occurrence count table after ordering the row in descending order. Then obtain the rank of P_1 by looking up P^1 = (P_0, P_1) in the co-occurrence rank table.
• Step 3: Update the co-occurrence count table with the pair (P_0, P_1). Again rank each element in each row after ordering the row in descending order, and obtain the rank of P_2 by looking up P^2 = (P_1, P_2).
• Step 4: Iterate Step 3 up to the final pair P^m = (P_{m-1}, P_m).
The complete rank image of the original color-indexed image is obtained after the last index pair (P_{m-1}, P_m) has been processed by Steps 1 to 4. In Figure 4, the source image is that of Figure 2(a). Figure 4(a) displays the initialized co-occurrence table and Figure 4(b) the co-occurrence rank table ordered in descending order in Step 2. The first rank is then obtained by referencing P^1 = (P_0, P_1) = (0, 3) and Figure 4(b). Figure 5 shows the flow chart of the adaptive RIAC encoder.
first rank is obtained by referencing P = ( P0 , P1 ) = (0, 3) and figure 4(b). Figure 5 shows the flow chart of adaptive RIAC scheme’s encoder.
Fig. 4. (a) Initial co-occurrence count table. (b) Co-occurrence rank table from (a). (c) The first rank by referencing (b) and figure 2(a).
Adaptive Rank Indexing Scheme with Arithmetic Coding in Color-Indexed Images
445
Fig. 5. Encoder of adaptive RIAC scheme
3.2 Determining Rank of Equivalent Counted Values
In case two or more co-occurrences in a row of co-occurrence count table have equivalent counted values, the ranking rule gives a higher rank to the first coming entity in a row of co-occurrence count table in RIAC. This determining rank method is called ARIAC, adaptive RIAC scheme in this case. We suggest enhanced rank method. There are distinct phases to get co-occurrence rank table. These phases are considered to set the collision when there is same rank value in the table of neighboring rank index. Also EARIAC-1 and EARIAC-2 indicate determining rank method according to each distinct phase. The two methods at each two phase are following. •
EARIAC-1: This method gives higher rank to the nearest element from main diagonal axis of unit matrix. M denotes the color number of index images. Consequently the co-occurrence frequency value of entity which exists in the i, j
location that corresponds the Kronecker’s delta δ ( i = j ) has the highest and the nearer the higher. In the formula which calculates the distance of element valued ‘1’ from the unit matrix, the max value of M is 256. (4) ^ M −1−|i − j| C ij = C ij + M
446
K.S. You et al.
• EARIAC-2:
This method gives higher rank to the value of equation (5) with weight function. It means that the nearest counted value to the highest one has the priority than the others. For example, let there be the counted values in a row (10, 50, 30, 25, 20, 30), then the third entity has higher rank than the sixth one because the third entity is nearer to the highest counted value ‘50’ than sixth one. Therefore the rank should be (6, 1, 2, 4, 5, 3) M
^
C ij = C ij +
¦ Cin ω ( j − n )|
n =1
.
N
¦ Cij
(5)
l =1
ω ( n ) = (1 −
|n| 2 ) . M −1
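The following Python sketch (ours; numpy and the helper names are assumptions, and equation (5) is implemented under our reading of its garbled layout) applies both adjustments; run on the paper's example row it reproduces the stated ranking (6, 1, 2, 4, 5, 3):

import numpy as np

def eariac1(C):
    # Equation (4): bias equal counts toward the main diagonal (M <= 256).
    M = C.shape[0]
    i, j = np.indices(C.shape)
    return C + (M - 1 - np.abs(i - j)) / M

def eariac2(C):
    # Equation (5), under our reading: bias each count toward the largest
    # count in its row, with w(n) = (1 - |n|/(M-1))^2.
    M = C.shape[1]
    n = np.arange(M)
    w = (1.0 - np.abs(n[:, None] - n[None, :]) / (M - 1)) ** 2  # w[j, k]
    adj = C.astype(float).copy()
    for i in range(C.shape[0]):
        row = C[i].astype(float)
        if row.sum() > 0:
            adj[i] += (w * row).sum(axis=1) / row.sum()
    return adj

# The paper's example row (10, 50, 30, 25, 20, 30) ranks as (6, 1, 2, 4, 5, 3).
row = np.array([[10., 50., 30., 25., 20., 30.]])
order = np.argsort(-eariac2(row)[0], kind='stable')
ranks = np.empty(6, dtype=int); ranks[order] = np.arange(1, 7)
print(ranks)   # -> [6 1 2 4 5 3]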
Figures 6(a) and 6(b) graph the characteristic distance measures of the EARIAC-1 and EARIAC-2 methods, respectively. Figure 6(a) shows that the higher rank goes to the element nearest the diagonal of the table, and Figure 6(b) shows that the higher rank goes to the element nearest the highest count, via the weight function ω. Here the x-axis denotes columns and the y-axis the distance of each element.
Fig. 6. The distance measures of each method: (a) EARIAC-1, (b) EARIAC-2
4 Performance Evaluation To simulate each scheme, we use two sets of images, all based on 8-bit palettes: one set collects photo images of real-life scenes, and the other collects synthetic graphic images. All images have at most 256 colors and were obtained from [8]. We compared the performance of the proposed methods with those of Pinho's and You's methods, using bits per pixel (bpp) as the evaluation measure. Tables 1 and 2 display the simulation results for graphic images and for photo images of real-life scenes, respectively. As can be seen in Tables 1 and 2, the proposed methods achieve lower bit rates than the other schemes. Notably, the bpp of 'cwheel', 'fractal', 'party8' and 'yahoo' among the graphic images and of 'airplane', 'anemone', 'baboon' and 'boat' among the photo images is higher for You's RIAC scheme than for Pinho's scheme, because the RIAC scheme needs additional information to reconstruct the image.
Table 1. Comparison of bpp for the simulation with graphic images (ARIAC, EARIAC-1 and EARIAC-2 are the proposed methods)

  Images    Pinho   RIAC    ARIAC   EARIAC-1  EARIAC-2
  clegg     5.456   4.628   4.175   4.159     4.161
  cwheel    2.878   3.076   2.857   2.843     2.845
  fractal   5.828   5.919   5.268   5.236     5.235
  frymire   3.376   2.540   2.414   2.408     2.409
  party8    0.318   0.355   0.289   0.287     0.287
  serrano   3.273   2.637   2.379   2.369     2.373
  yahoo     1.789   2.786   1.816   1.753     1.761
  Average   3.274   3.134   2.743   2.722     2.724
Table 2. Comparison of bpp for the simulation with photo images (ARIAC, EARIAC-1 and EARIAC-2 are the proposed methods)

  Images    Pinho   RIAC    ARIAC   EARIAC-1  EARIAC-2
  airplane  4.445   4.896   4.282   4.232     4.223
  anemone   4.966   5.050   4.326   4.296     4.296
  baboon    6.496   7.353   6.101   6.058     6.052
  boat      5.823   6.306   5.321   5.273     5.269
  lena      5.049   4.776   4.055   4.036     4.041
  monarch   3.917   3.710   3.233   3.217     3.221
  peppers   5.019   4.647   4.016   3.989     3.994
  Average   5.102   5.248   4.476   4.443     4.442
Fig. 7. Histograms of the original and rank images, top 10 bins ('yahoo')
Fig. 8. Histograms of the original and rank images, top 10 bins ('peppers')
Figures 7 and 8 show the histograms of the original and rank images for the 'yahoo' and 'peppers' images; in particular, each histogram displays the top 10 bins in which most of the pixels are concentrated. Since the rank image concentrates the pixel distribution into a few low rank values, a higher compression efficiency, i.e., a lower bpp, can be expected.
5 Conclusions

In this paper, in order to achieve a better compression ratio with no loss of information on color-indexed images, we proposed an enhanced adaptive rank indexing scheme with arithmetic coding. In particular, to determine the ranking derived from the co-occurrence frequencies, two phases were defined and applied to the adaptive RIAC scheme. The simulations verify that the proposed schemes reduce the bits per pixel compared with the other schemes, achieving on average 2.722 bpp and 2.724 bpp in Table 1 and 4.443 bpp and 4.442 bpp in Table 2, respectively. The proposed scheme can therefore be applied to many other fields such as medical image processing, satellite image processing and military remote sensing; moreover, since the rank image preserves edge information, it can also serve as a feature extractor. For future studies, it is recommended to build content-based image retrieval on top of the proposed scheme.
References

1. Battiato, S., Gallo, G., Impoco, G., Stanco, F.: An Efficient Re-Indexing Algorithm for Color-Mapped Images. IEEE Transactions on Image Processing, Vol. 13, No. 11 (2004) 1419–1423
2. Gonzalez, R.C., Woods, R.E., Eddins, S.L.: Digital Image Processing Using MATLAB. New Jersey: Prentice Hall (2003)
3. Li, Z.N., Drew, M.S.: Fundamentals of Multimedia. New Jersey: Prentice Hall (2004)
4. Zeng, W., Li, J., Lei, S.: An Efficient Color Re-Indexing Scheme for Palette-Based Compression. Proc. IEEE Int. Conf. on Image Processing, ICIP-2000, Vol. 3 (2000) 476–479
5. Pinho, A.J., Neves, A.J.R.: A Survey on Palette Reordering Methods for Improving the Compression of Color-Indexed Images. IEEE Transactions on Image Processing, Vol. 13, No. 11 (2004) 1411–1418
6. You, K.S., Lee, H.J., Jang, E.S., Kwak, H.S.: An Efficient Lossless Compression Algorithm Using Arithmetic Coding for Indexed Color Image. KICS on Communication Theory, Signal Processing, Information Security, Vol. 30, No. 1C (2005) 35–43
7. Sayood, K.: Introduction to Data Compression, 2nd Edition. California: Morgan Kaufmann (2000)
8. Pinho, A.J.: ftp://ftp.ieeta.pt/%7Eap/images
Revisit to the Problem of Generalized Low Rank Approximation of Matrices

Chong Lu 1,2, Wanquan Liu 2, and Senjian An 2

1 Department of Computer Science, YiLi Normal College, Yining, 835000, PR China
2 Department of Computing, Curtin University of Technology, WA 6102, Australia
Abstract. In this paper, we revisit the problem of generalized low rank approximation of matrices (GLRAM) recently proposed in [1]. The GLRAM problem has been investigated by many researchers in the last two years due to its broad engineering applications. However, it is known that the problem may have no closed-form solution, and the convergence of the proposed iterative procedure, Algorithm GLRAM, is actually still open. Motivated by these observations, we derive a necessary condition for the GLRAM problem via manifold theory. Further, the relationship between this derived necessary condition and the Algorithm GLRAM is investigated, which provides some insight for a possible complete analysis of the convergence problem in the future. Finally, some illustrative examples are presented to show convergence and divergence under different initial conditions.
1 Introduction
Dimension reduction has received much attention in recent years in areas such as machine learning, signal processing, and information retrieval [2],[5],[6]. The aim of dimension reduction is to obtain a compact lower-dimensional data representation with little loss of information. There are several important algorithms for dimension reduction based on vector selection, which may be called the vector space model [1]. Under this model, each datum is treated as a vector and the whole data collection becomes a single data matrix, where each column corresponds to a data point. Many algorithms have been proposed based on this model for dimension reduction in different areas such as face recognition [7], machine learning [8] and information retrieval [9]. A well-known technique based on this vector space model is the low rank approximation via the Singular Value Decomposition (SVD). An appealing property of this type of low rank approximation is that it achieves the smallest reconstruction error among all approximations
of the same rank under the Euclidean (Frobenius) distance. There are also low rank approximation algorithms within a given threshold based on the Schur decomposition [3],[4]. However, applying this technique to high-dimensional data such as images and videos quickly meets critical problems of complexity and computational limits [10],[11]. Some incremental algorithms have been proposed to overcome the computational complexity issue [12],[13], but to the best of our knowledge no such existing algorithm can guarantee the quality of the approximations produced. Recently, a new type of low rank approximation was proposed in [1] based on a matrix space model instead of the vector space model. With this technique, each datum is treated as a matrix instead of a vector and the whole data collection becomes a set of data matrices. In this way, the dimension curse of the vector space model can be removed. Since the publication of [1], this technique has been applied successfully in many investigations [10],[14]. However, it is known that there may be no closed-form solution for this type of generalized low rank approximation of matrices. In this case, it is necessary to analyze the convergence of the Algorithm GLRAM, even though it has been used successfully in quite a few applications. Observing that the basic analysis of the Algorithm GLRAM given in [1] is, by the analysis in this paper, not rigorously correct mathematically, we continue to study this issue here. Our main contribution is to derive a necessary condition for this generalized low rank approximation problem based on manifold theory, and then to establish the relationship between this derived necessary condition and the Algorithm GLRAM proposed in [1]. The rest of this paper is organized as follows. Section 2 gives a brief overview of the generalized low rank approximation of matrices based on the matrix space model. A necessary condition for the corresponding optimization problem is derived in Section 3 using manifold theory. In Section 4, the relationship between the derived necessary condition and the algorithm proposed in [1] is investigated. Some examples are given in Section 5 to show convergence and divergence in different cases, and conclusions and directions for future work are summarized in Section 6.
2 Generalized Low Rank Approximation of Matrices
The traditional low rank approximation of matrices is formulated as follows. Given A ∈ R^{n×m}, find a matrix B ∈ R^{n×m} with rank(B) = k such that

    B = arg min_{rank(B)=k} ||A − B||_F ,

where the Frobenius norm ||M||_F of a matrix M = (M_ij) is given by ||M||_F = sqrt( Σ_{ij} M_ij² ). This problem has a closed-form solution through the SVD [11]. However, the computational cost is high when n is large, as stated in [1].
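For concreteness, the closed-form SVD solution just mentioned can be written in a few lines of numpy; this is a generic sketch of the Eckart–Young construction, not code from [1] or [11].

```python
import numpy as np

def best_rank_k(A, k):
    # Closed-form solution of min ||A - B||_F s.t. rank(B) = k:
    # keep the k largest singular triplets of A.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

A = np.random.randn(100, 60)
B = best_rank_k(A, k=5)
print(np.linalg.matrix_rank(B), np.linalg.norm(A - B))
```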
In order to overcome the computational complexity, Ye [1] proposed a matrix space model for low rank approximation of matrices, called the generalized low rank approximation of matrices (GLRAM), which can be stated as follows. Let A_i ∈ R^{n×m}, i = 1, 2, ..., N, be the N data points in the training set. We aim to compute two matrices L ∈ R^{n×r} and R ∈ R^{m×c} with orthonormal columns, and N matrices M_i ∈ R^{r×c}, i = 1, 2, ..., N, such that L M_i R^T approximates A_i well for all i. Mathematically, this can be formulated as the optimization problem

    min_{L^T L = I_r, R^T R = I_c}  Σ_{i=1}^{N} ||A_i − L M_i R^T||_F² .

Until now, no closed-form solution has been found for the proposed GLRAM. In order to find an optimal solution, the author of [1] proved that the GLRAM problem is equivalent to the following optimization problem:

    max_{L,R} J(L, R) = Σ_{i=1}^{N} ||L^T A_i R||_F² = Σ_{i=1}^{N} Tr[ L^T A_i R R^T A_i^T L ] ,    (1)

where L and R must satisfy L^T L = I_r and R^T R = I_c. Instead of solving this optimization problem directly, the author of [1] observed the following fact in Theorem 1 and proposed the Algorithm GLRAM stated below.

Theorem 1. [1] Let L, R and {M_i}_{i=1}^{N} be the optimal solution to the minimization problem in GLRAM. Then
(1) For a given R, L consists of the r eigenvectors of the matrix
    M_L(R) = Σ_{i=1}^{N} A_i R R^T A_i^T
corresponding to the largest r eigenvalues.
(2) For a given L, R consists of the c eigenvectors of the matrix
    M_R(L) = Σ_{i=1}^{N} A_i^T L L^T A_i
corresponding to the largest c eigenvalues.

Based on these observations, the following algorithm for solving the GLRAM, called the Algorithm GLRAM, is proposed in [1].

Algorithm GLRAM [1]
Input: matrices {A_i}, r and c.
Output: matrices L, R and {M_i}
1. Obtain an initial L_0 for L and set i = 1.
2. While not convergent:
3.   Form the matrix M_R(L) = Σ_{j=1}^{N} A_j^T L_{i−1} L_{i−1}^T A_j and compute its c eigenvectors {φ_j^R}_{j=1}^{c} corresponding to the largest c eigenvalues.
4.   Let R_i = [φ_1^R, ..., φ_c^R].
5.   Form the matrix M_L(R) = Σ_{j=1}^{N} A_j R_i R_i^T A_j^T and compute its r eigenvectors {φ_j^L}_{j=1}^{r} corresponding to the largest r eigenvalues.
6.   Let L_i = [φ_1^L, ..., φ_r^L].
7.   i = i + 1.
8. EndWhile
9. L = L_{i−1}
10. R = R_{i−1}
11. M_j = L^T A_j R

It should be noted that the convergence of this algorithm cannot be guaranteed theoretically in general, and therefore the loop from step 2 to step 8 may never terminate. In order to investigate the convergence of the algorithm, the following criterion is defined in [1]:

    RMSRE = sqrt( (1/N) Σ_{i=1}^{N} ||A_i − L M_i R^T||_F² ) ,

where RMSRE stands for the Root Mean Square Reconstruction Error. It is easy to show that each iteration of the Algorithm GLRAM decreases the RMSRE value and that the RMSRE is bounded. Based on these facts, the following convergence result is stated in [1].

Theorem 2. Algorithm GLRAM monotonically decreases the RMSRE value, hence it converges.

Actually, Theorem 2 is not rigorously correct from a mathematical perspective, though it may be regarded as acceptable in engineering applications. To explain this clearly, rewrite RMSRE as RMSRE(L, R), a function of the variables L and R. One sees that the produced sequence {L_i, R_i} decreases the values {RMSRE(L_i, R_i)}, and this can ONLY guarantee that the scalar sequence {RMSRE(L_i, R_i)} converges; one cannot conclude the convergence of the sequence {L_i, R_i} itself. If, however, RMSRE(L, R) has a unique optimal solution (L, R), then Theorem 2 will be true in general. In other words, the convergence analysis of the Algorithm GLRAM in [1] is not sufficient. In order to discuss the convergence issue further, we derive a necessary condition for the GLRAM problem in the next section.
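Before moving on, a direct numpy transcription of the Algorithm GLRAM loop reads as follows. The convergence test on successive RMSRE values and the iteration cap are our own pragmatic choices, since, as discussed above, convergence of the iterates themselves is not guaranteed.

```python
import numpy as np

def glram(As, r, c, L0, max_iter=40, tol=1e-10):
    # As: list of n x m data matrices; returns L (n x r), R (m x c), Ms.
    def top_eigvecs(S, k):
        # eigenvectors of symmetric S for the k largest eigenvalues
        w, V = np.linalg.eigh(S)
        return V[:, np.argsort(w)[::-1][:k]]

    L, prev = np.linalg.qr(L0)[0], np.inf   # orthonormalize the initial guess
    for _ in range(max_iter):
        MR = sum(A.T @ L @ L.T @ A for A in As)   # step 3
        R = top_eigvecs(MR, c)                    # step 4
        ML = sum(A @ R @ R.T @ A.T for A in As)   # step 5
        L = top_eigvecs(ML, r)                    # step 6
        rmsre = np.sqrt(sum(np.linalg.norm(A - L @ (L.T @ A @ R) @ R.T) ** 2
                            for A in As) / len(As))
        if abs(prev - rmsre) < tol:
            break
        prev = rmsre
    Ms = [L.T @ A @ R for A in As]
    return L, R, Ms
```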
3 A Necessary Condition for Optimality of GLRAM
To solve the GLRAM problem directly from eq. (1), we need to consider the optimization problem on manifolds because of the constraints L^T L = I_r and R^T R = I_c. In order to solve this constrained optimization problem, we first derive the corresponding tangent spaces using manifold theory [15]. The constraints on L and R define the Stiefel manifolds

    St(r, n) = { L ∈ R^{n×r} | L^T L = I_r },
    St(c, m) = { R ∈ R^{m×c} | R^T R = I_c }.

With these Stiefel manifolds, we can obtain their tangent spaces

    T_L St(r, n) = { ξ ∈ R^{n×r} | ξ^T L + L^T ξ = 0 },
    T_R St(c, m) = { η ∈ R^{m×c} | η^T R + R^T η = 0 },    (2)

and their orthogonal complements

    T_L St(r, n)^⊥ = { LΛ ∈ R^{n×r} | Λ = Λ^T ∈ R^{r×r} },
    T_R St(c, m)^⊥ = { RΛ_0 ∈ R^{m×c} | Λ_0 = Λ_0^T ∈ R^{c×c} }.    (3)
In order to obtain a necessary condition for the GLRAM optimization problem, we take the derivatives of the objective function J(L, R) with respect to its variables L and R:

    ∂J/∂L : DJ_L(ξ) = 2 Tr( Σ_{i=1}^{N} (L^T A_i R R^T A_i^T) ξ ),
    ∂J/∂R : DJ_R(η) = 2 Tr( Σ_{i=1}^{N} (R^T A_i^T L L^T A_i) η ),

where Tr(M) denotes the trace of the matrix M and D is the derivative operator. Let

    R* = Σ_{i=1}^{N} A_i^T L L^T A_i R   and   L* = Σ_{i=1}^{N} A_i R R^T A_i^T L.

Then the above expressions simplify to

    DJ_L(ξ) = 2 Tr( L*^T ξ ),   DJ_R(η) = 2 Tr( R*^T η ).

Now define DJ_L(ξ) = 2 Tr(∇J_L^T ξ) and DJ_R(η) = 2 Tr(∇J_R^T η). The gradient ∇J(L) must then satisfy the following two conditions according to manifold theory [15]:

    ∇J(L) ∈ T_L St(r, n),    (a)
    DJ(L) = <∇J(L), ξ>   for all ξ ∈ T_L St(r, n).    (b)

It is easy to see that condition (b) is equivalent to (∇J(L) − L*)^T ξ = 0. With T_L St(r, n)^⊥ in equation (3), we obtain

    ∇J(L) = (I − L L^T) L*,

and similarly

    ∇J(R) = (I − R R^T) R*.

Therefore, a necessary condition for the optimality of the GLRAM problem is

    (I − L L^T) L* = 0,
    (I − R R^T) R* = 0.    (4)

The equations in (4) are nonlinear, and it is usually hard or impossible to obtain their closed-form solutions; in practice, we seek numerical iterative solutions instead. One possible way is to use the gradient flow idea and solve the following differential equations, as done similarly in [16]:

    L̇(t) = (L L^T − I) L*,
    Ṙ(t) = (R R^T − I) R*,    (5)

with a given initial condition {L_0, R_0}. The convergence analysis and numerical implementation can be carried out as in [16], and we will explore these issues in the future. Since the Algorithm GLRAM is one approach to solving the GLRAM problem, we next explore the relationship between the derived necessary condition and the Algorithm GLRAM.
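As a practical aside, whether a candidate pair (L, R) satisfies the necessary condition (4) can be checked numerically, and the gradient flow (5) can be integrated by a simple Euler scheme; the step size below is an illustrative assumption.

```python
import numpy as np

def residuals(As, L, R):
    # Residual norms of the necessary condition (4); both should be ~0
    # at a critical point of J(L, R).
    Lstar = sum(A @ R @ R.T @ A.T for A in As) @ L
    Rstar = sum(A.T @ L @ L.T @ A for A in As) @ R
    rL = np.linalg.norm((np.eye(L.shape[0]) - L @ L.T) @ Lstar)
    rR = np.linalg.norm((np.eye(R.shape[0]) - R @ R.T) @ Rstar)
    return rL, rR

def gradient_flow_step(As, L, R, dt=1e-4):
    # One explicit Euler step of the flow (5).
    Lstar = sum(A @ R @ R.T @ A.T for A in As) @ L
    Rstar = sum(A.T @ L @ L.T @ A for A in As) @ R
    L = L + dt * (L @ L.T - np.eye(L.shape[0])) @ Lstar
    R = R + dt * (R @ R.T - np.eye(R.shape[0])) @ Rstar
    return L, R
```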
4 Relationship of the Necessary Condition and Algorithm GLRAM
In the previous section, we derived a necessary condition for the GLRAM optimization problem. Recall that the Algorithm GLRAM in [1] also provides an iterative solution for the GLRAM problem. Though the convergence of Algorithm GLRAM has not been proved mathematically, it works quite well in practice, so it is natural to investigate their relationship. Substituting L* and R*, we rewrite the necessary conditions as

    (I − L L^T) Σ_{i=1}^{N} A_i R R^T A_i^T L = 0,
    (I − R R^T) Σ_{i=1}^{N} A_i^T L L^T A_i R = 0.
By multiplying by L^T and R^T on the right of the above equations, one derives

    (I − L L^T) Σ_{i=1}^{N} A_i R R^T A_i^T L L^T = 0,
    (I − R R^T) Σ_{i=1}^{N} A_i^T L L^T A_i R R^T = 0.    (6)

These equations are equivalent to

    L L^T ( Σ_{i=1}^{N} A_i R R^T A_i^T ) L L^T = ( Σ_{i=1}^{N} A_i R R^T A_i^T ) L L^T,
    R R^T ( Σ_{i=1}^{N} A_i^T L L^T A_i ) R R^T = ( Σ_{i=1}^{N} A_i^T L L^T A_i ) R R^T.    (7)

Using the notations M_R(L) and M_L(R) as in Theorem 1, one obtains

    L L^T M_L(R) L L^T = M_L(R) L L^T,
    R R^T M_R(L) R R^T = M_R(L) R R^T.    (8)

With a given initial value L_0, denote the sequences produced by the Algorithm GLRAM by {L_{i−1}} and {R_i} for i = 1, 2, .... One can easily verify that they satisfy

    L_{i+1} L_{i+1}^T M_L(R_i) L_{i+1} L_{i+1}^T = M_L(R_i) L_{i+1} L_{i+1}^T,
    R_i R_i^T M_R(L_{i−1}) R_i R_i^T = M_R(L_{i−1}) R_i R_i^T.    (9)

This implies that if the sequences {L_{i−1}} and {R_i} converge, their limits satisfy the necessary conditions for the optimality of GLRAM. In addition, every two iterations of the Algorithm GLRAM satisfy the necessary condition as shown in (9). However, the convergence of these sequences cannot be guaranteed theoretically in general, due to their complex nonlinearity. Mathematically speaking, L and R should be treated equally in (8) as two independent variables. In this case, one derives the sequences {L_i} and {R_i} by applying the Algorithm GLRAM steps interchangeably, and we call this implementation procedure the Extended Algorithm GLRAM. Along this line, we can produce different sequences from the necessary conditions by giving L_0, or R_0, or both. We use some examples in the next section to show the convergence or divergence in different cases.
5 Illustrative Examples
In this section, we present several examples showing convergence and divergence in different cases for the Algorithm GLRAM and the Extended Algorithm GLRAM. First, we generated a random matrix A:

    A = [ 61.094 17.502 50.605 34.176 28.594 30.621  1.467 26.182
           7.117 62.103 46.478 40.180 39.413 11.216 66.405 70.847
          31.428 24.596 54.142 30.769 50.301 44.329 72.406 78.386
          60.838 58.736 94.233 41.157 72.198 46.676 28.163 98.616 ]

and let A1 = A(:, 1:4), A2 = A(:, 5:8) and r = 3, c = 3, where the notation is adopted from MATLAB programming. In implementing the two algorithms, the number of iteration steps was set to 40.
5.1 Convergence to Different Limits with Different Initial Conditions
Let one initial condition L_0 be

    L_0 = [ 2 3  3
            2 1  5
            6 9 16
            6 5  7 ]

Then the Algorithm GLRAM converges to the following limit:

    L = [ −0.3124  0.5793  0.6989
          −0.4494 −0.4417  0.4255
          −0.5058 −0.5329 −0.0552
          −0.6668  0.4305 −0.5722 ]

    R = [ −0.4704  0.2239 −0.5451
          −0.3768 −0.0486 −0.5714
          −0.5600 −0.7656  0.2979
          −0.5685  0.6011  0.5363 ]

If we choose a different initial L_0,

    L_0 = [ 2 3  3
            2 1  5
            6 9 15
            6 5  7 ]

the Algorithm GLRAM converges to a different limit:

    L = [ −0.3087  0.6521 −0.1129
          −0.4538 −0.5993  0.4848
          −0.5050 −0.3128 −0.8020
          −0.6661  0.3431  0.3301 ]

    R = [ −0.4685  0.2729 −0.4223
          −0.3797 −0.1728  0.8523
          −0.5625 −0.7238 −0.2993
          −0.5657  0.6097  0.0753 ]

This indicates that different initial conditions may lead to different solutions. For a given initial condition R_0, a similar conclusion can be observed. Further, as discussed previously, we should treat L and R equally in order to interpret the necessary conditions mathematically. In this case, we should give an initial condition {L_0, R_0} and run the Extended Algorithm GLRAM to produce the sequences {L_i, R_i}.
5.2 Convergence/Divergence with Different Initial Conditions
Let one initial condition {L_0, R_0} be

    L_0 = [ 2 3  3          R_0 = [ 1  5  8
            2 1  5                  9  1 12
            6 9 16                 65  6  1
            6 5  7 ]                8 21  9 ]

Then the Extended Algorithm GLRAM converges to

    L = [ −0.3124  0.5793  0.6989       R = [ −0.4704  0.2239 −0.5451
          −0.4494 −0.4417  0.4255             −0.3768 −0.0486 −0.5714
          −0.5058 −0.5329 −0.0552             −0.5600 −0.7656  0.2979
          −0.6668  0.4305 −0.5722 ]           −0.5685  0.6011  0.5363 ]

If we choose another initial condition {L_0, R_0},

    L_0 = [ 2 3  3          R_0 = [ 1  5  8
            2 1  5                  9  1 12
            6 9 15                  6  6  1
            6 5  7 ]                8 21  9 ]

the Extended Algorithm GLRAM converges to

    L = [ −0.3087  0.6521 −0.1129       R = [ −0.4685  0.2729 −0.4223
          −0.4538 −0.5993  0.4848             −0.3797 −0.1728  0.8523
          −0.5050 −0.3128 −0.8020             −0.5625 −0.7238 −0.2993
          −0.6661  0.3431  0.3301 ]           −0.5657  0.6097  0.0753 ]

which differs from the previous case, similar to what was observed for the Algorithm GLRAM. Further, if one chooses {L_0, R_0} as

    L_0 = [ 2 3  3          R_0 = [ 1  5  8
            2 1  5                  9  1 12
            6 9 16                  6  6  1
            6 5  7 ]                8 21  9 ]

one finds that the Extended Algorithm GLRAM does not converge at all. This indicates that the convergence analysis for the Extended Algorithm GLRAM is much more complex, and that the differential equations in (5) may have no solutions for some initial conditions.
6 Conclusions
In this paper, we revisited the problem of the generalized low rank approximation of matrices and derived a necessary condition for the optimality of GLRAM based on manifold theory. We also showed that the limits of the sequences produced by the Algorithm GLRAM satisfy the derived necessary condition whenever those sequences converge. In the future, we will investigate the convergence issue for the Extended Algorithm GLRAM based on the derived necessary condition along the line of the gradient flow idea; in other words, we will investigate the convergence of the gradient flow (5) and find another way to solve the necessary condition. The illustrative examples show that it is necessary to characterize the initial conditions that lead to convergent limits for the Extended Algorithm GLRAM.
References

1. Ye, J.: Generalized Low Rank Approximations of Matrices. Proceedings of the Twenty-First International Conference on Machine Learning, Banff, Alberta, Canada (2004) 887–894
2. Hua, Y., Liu, W.: Generalized Karhunen-Loeve Transform. IEEE Signal Processing Letters, Vol. 5, No. 6 (1998) 141–142
3. van der Veen, A.J.: A Schur Method for Low-Rank Matrix Approximation. SIAM J. Matrix Anal. Appl. (1996) 139–160
4. Gotze, J., van der Veen, A.J.: On-line Subspace Estimation Using a Schur-Type Method. IEEE Trans. Signal Processing, 44(6) (1996) 1585–1589
5. An, S., Liu, W., Venkatesh, S., Tjahyadi, R.: A Fast Dimension Reduction Algorithm for Kernel Based Classification. Proceedings of the International Conference on Machine Learning and Cybernetics 2005, Guangzhou, China (2005) 3369–3376
6. Srebro, N., Jaakkola, T.: Weighted Low-Rank Approximations. In ICML Conference Proceedings (2003) 720–727
7. Turk, M., Pentland, A.: Eigenfaces for Recognition. Journal of Cognitive Neuroscience, 3(1) (1991) 71–86
8. Castelli, V., Thomasian, A., Li, C.-S.: CSVD: Clustering and Singular Value Decomposition for Approximate Similarity Searches in High Dimensional Space. IEEE Transactions on Knowledge and Data Engineering (2003) 671–685
9. Berry, M., Dumais, S., O'Brien, G.: Using Linear Algebra for Intelligent Information Retrieval. SIAM Review, 37 (1995) 573–595
10. Yang, J., Zhang, D., Frangi, A., Yang, J.: Two-Dimensional PCA: A New Approach to Appearance-Based Face Representation and Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (2004) 131–137
11. Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edition. Baltimore, MD, USA: The Johns Hopkins University Press (1996)
12. Brand, M.: Incremental Singular Value Decomposition of Uncertain Data with Missing Values. In ECCV Conference Proceedings (2002) 707–720
13. Kanth, K.V.R., Agrawal, D., Abbadi, A.E., Singh, A.: Dimensionality Reduction for Similarity Searching in Dynamic Databases. In ACM SIGMOD Conference Proceedings (1998) 166–176
14. Wang, L., Wang, X., Zhang, X., Feng, J.: Equivalence of Two-Dimensional PCA to Line-Based PCA. Pattern Recognition Letters (2005) 57–60
15. Helmke, U., Moore, J.B.: Optimization and Dynamical Systems. Springer, London (1994)
16. Yan, W.Y., Lam, J.: An Approximation Approach to H2 Optimal Model Reduction. IEEE Transactions on Automatic Control (1999) 1341–1358
Robust Face Recognition of Images Captured by Different Devices

Guangda Su, Yan Shang, and Baixing Zhang

Research Institute of Image and Graphics, Electronic Engineering Department, Tsinghua University, 100084, Beijing, P.R. China
[email protected]
Abstract. The performance of a face recognition system degrades greatly when the target and test images are captured through different imaging devices, e.g., scanned photos versus images captured through a digital camera or video camera. Excluding the influence of lighting, expression and time difference, this paper analyzes the difference in gray scale distribution between images captured through different devices. Based on this analysis, we decompose the facial feature into a general feature and a local feature: the general facial feature is determined by the imaging device, while the local facial feature represents the individual facial difference and is the one used for face recognition. Before performing face recognition, the general facial feature is excluded through gray scale normalization, carried out here via gray scale sorting. Experiment results under ideal and realistic conditions show the performance improvement of the face recognition system with this technique.
1 Introduction
Face recognition is a hot topic in the pattern recognition community and has found wide use in many areas [3,4,5]. Based on our research on face recognition techniques [1,2,6], we have noticed that the performance of a face recognition system is greatly influenced by the input imaging device. If the input facial image and the target facial images are captured through the same type of imaging device, a high recognition rate can be achieved; the recognition rate decreases significantly when the input image and the target ones come from different imaging devices. In some real applications this influence seriously degrades system performance. For example, in a face surveillance system at an airport, the facial images in the pre-stored database are usually scanned photo images, while the facial image to be recognized is captured on-site through a digital camera or digital video camera. In other applications, facial images captured through mobile phones may be compared with scanned photos, as is frequently encountered in forensics. In the above circumstances, the facial features of the same person are not compared across images captured through the same type of device, and the recognition rate consequently deteriorates.
Nowadays, quite a few face recognition systems with satisfactory laboratory performance encounter difficulties in real-world applications. The reason lies not only in the commonly considered differences in pose, expression, lighting and imaging time; the difference in imaging devices is also an important factor, and it is the topic of this paper. In order to investigate the influence of different imaging devices on a face recognition system, we captured facial images through three types of devices: a traditional camera, a digital camera and a digital camcorder. Frontal images under the same lighting, pose and expression were captured, as shown in Figure 1. From Figure 1 it is obvious that there exists a fundamental difference in gray scale distribution between the image types, which we investigated theoretically in [6]. However, it is difficult to apply a unified theoretical equation to exclude this device-dependent difference because of the huge variance of imaging environments in real-world applications. Some researchers have addressed the exclusion of differences in lighting, pose, expression, etc. through preprocessing algorithms [7,8,9,10]; to our knowledge, there is no dedicated research aimed at excluding the influence of different imaging devices. In this paper, we propose a novel gray scale normalization method based on the device characteristic to solve this problem. In the following section, this gray scale normalization method is explained theoretically, and Section 3 describes the realization procedure. The experiment results are given in Section 4, and the paper is concluded in Section 5.
Fig. 1. Facial images of the same candidates captured through (a) a traditional camera, (b) a digital camcorder, and (c) a digital camera. All images were captured under the same circumstances, so differences in lighting, pose and expression are excluded.
2 Gray Scale Normalization Based on Device Characteristics
Figure 2 shows the gray scale histograms of the facial images in Figure 1. It can be seen that histograms from one type of imaging device have a similar shape, while the histogram shapes of different imaging devices show significant differences.
Fig. 2. Histograms of the corresponding facial images in Figure 1, from (a) a traditional camera, (b) a digital camcorder, and (c) a digital camera, all captured under the same circumstances.
From this characteristic device-dependent histogram distribution, we draw the following conclusion: under the ideal situation in which only the difference of imaging devices is considered, the facial feature can be expressed as

    Facial Feature = General Feature + Individual Feature.    (1)
Here the general feature is determined by the imaging device and can be seen as a carrier wave, while the individual feature is determined by each person's physiological characteristics and can be regarded as a small signal. The total facial image can be seen as these two signals modulated together:

    f_0 ≡ f_g f_i ,    (2)

where f_0 is the facial image to be processed, f_g is the general feature, and f_i is the individual feature. According to information theory, the information contained in the facial image can be calculated as

    H(f_0) = H(f_g f_i) = H(f_i | f_g) + H(f_g),    (3)
where H refers to the entropy of the two-dimensional image matrix. In a face recognition system, the basic operation is to compare the similarity between two facial images f_10 and f_20. For a robust and reliable system it is important to compare only the individual facial features, i.e., f_1g and f_2g should have no influence on the recognition result. Therefore, the system should estimate the general feature of an input facial image and exclude it before submitting the image for similarity comparison:

(1) When f_1g = f_2g = f_g, the feature difference between the two input facial images can be expressed as

    Dif(f_10, f_20) = Dif(f_1i | f_g, f_2i | f_g) = Dif(f_1i, f_2i).    (4)
This is equivalent to comparing two images from the same input device; in this circumstance, no preprocessing is needed to exclude the influence of the input device.
(2) When f_1g ≠ f_2g, the feature difference can be expressed as

    Dif(f_10, f_20) = Dif(f_1i | f_1g, f_2i | f_2g) + H(f_1g f_2g).    (5)
This means the two images come from different imaging devices. Here H(f_1g f_2g) is a constant, which can be derived by training on a large sample; we therefore only need to calculate Dif(f_1i | f_1g, f_2i | f_2g). For the first situation above, many face recognition systems achieve satisfactory results, while for the second situation it is normally difficult to achieve reliable performance without further preprocessing. In the following section, we propose a novel gray scale sorting algorithm which excludes the general feature introduced by the input device while preserving the individual facial feature.
3 Normalization Through Gray Scale Sorting

In order to improve face recognition performance for images captured through different devices, it is important to exclude the general image feature introduced by the input device. Gray scale sorting is adopted in this paper to perform the gray scale normalization for images from different input sources. Standard gray scale distributions are defined, which represent the general characteristic of facial images captured through a certain type of input imaging device. These standard distributions are derived by training on a large number of samples with known input devices. Gray scale sorting is then performed to transform the histogram of the facial image to be recognized to the standard distribution.

3.1 Standard Gray Scale Distribution
The standard gray scale distribution is calculated through a training process: a set of facial images from the same type of imaging device is chosen as the training samples, and the histograms reflecting the gray scale distributions of these images are submitted to a statistical process that computes a weighted average f_g over all images. Note that the training images are gray scale images and that the pixel values of the average image f_g are not truncated to integers; they are kept as positive real numbers throughout. To obtain a standard gray scale distribution for a certain type of imaging device, usable as a reference for normalizing a facial image from a different input device, we first calculate a class center distribution function (CCDF). Based on this CCDF, we then calculate a class center cumulative function, which is used as a lookup table in the gray scale normalization. These two functions are calculated as follows:

(1) First the histogram h_n(g) of every facial image in the training set is calculated (where n indexes the samples, g is the gray scale, n = 1, 2, ..., N and g = 0, 1, ..., 255).
(2) Define B(g) as the class center distribution function for the specified input class. For every gray scale, B(g) is calculated recursively. Take B(0) as an example: let the class center for the first two images in the training set be k_1 and the precision radius be r_1, with

    k_1 = ( h_1(0) + h_2(0) ) / 2   and   r_1 = | h_1(0) − h_2(0) | / 2.    (6)

If |h_3(0) − k_1| ≤ r_1, then the class center and precision radius remain the same, k_2 = k_1 and r_2 = r_1; otherwise

    k_2 = ( k_1 + h_3(0) ) / 2   and   r_2 = | k_1 − h_3(0) | / 2.    (7)

The same calculation is carried out over all N training samples, giving B(0) = k_{N−1}. The class center values for the other gray scales are calculated in the same way.

(3) Based on B(g), the class center cumulative function P(g) is calculated as

    P(g) = ∫_0^g B(t) dt,   g ∈ [0, 255].    (8)
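The recursive class-center computation can be sketched directly from steps (1)–(3). In this minimal Python version, the per-image histograms are assumed to be stacked into an N × 256 array, and a discrete cumulative sum stands in for the integral in (8); both are our own reading of the procedure.

```python
import numpy as np

def class_center(values):
    # Recursive class center over one gray scale g, per Eqs. (6)-(7):
    # values = [h_1(g), ..., h_N(g)] from the training histograms.
    k = (values[0] + values[1]) / 2.0
    r = abs(values[0] - values[1]) / 2.0
    for h in values[2:]:
        if abs(h - k) > r:           # outside the precision radius: update
            k, r = (k + h) / 2.0, abs(k - h) / 2.0
    return k

def standard_distribution(hists):
    # hists: N x 256 array of per-image histograms for one device class.
    B = np.array([class_center(hists[:, g]) for g in range(hists.shape[1])])
    P = np.cumsum(B)                 # discrete version of Eq. (8)
    return B, P
```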
Figure 3 shows the class center cumulative functions for facial images from scanned photos, a digital camera and a digital camcorder. There are altogether 10,000 images in the training set, and each of the three cumulative functions has its own distinctive shape.
Fig. 3. Class center cumulative functions for facial images from (a) scanned photos, (b) a digital camera and (c) a digital camcorder
3.2 Gray Scale Sorting

The aim of gray scale sorting is to normalize a facial image to the standard gray scale distribution of a certain image type. We first give the following definition.

Definition 1. N_δ((i, j), (m, n), D), where (i, j) and (m, n) are pixel coordinates in a two-dimensional facial image, D is the gray scale matrix, and δ is a neighborhood region of pixel (m, n), denotes the ascending gray scale order of pixel (i, j) within the neighborhood region of pixel (m, n) in the gray scale matrix D. The neighborhood region δ can be a 3 × 3 region or the entire image; it can also be a region containing a facial part such as an eye, the nose or a brow.
Let D_s be the gray scale distribution matrix of the standard image f_g, D_o be the gray scale distribution of the image to be processed, and X be the target image after gray scale sorting, of the same size as the original. For every pixel in the target, its gray scale is computed as

    X(i, j) = D_s(m, n),    (9)

where N_δ((i, j), (i, j), D_o) = N_δ((m, n), (i, j), D_s). It is easy to prove that this assignment is unique and complete. In a specified neighborhood δ, the gray scales of the target pixels are assigned from the gray scales of the standard image, so the target image inherits the statistical characteristics of the standard image, such as mean and variance. At the same time, the gray scale order of the original image is preserved in the target, so the individual facial feature remains intact while the general feature is normalized to the standard device. Figure 4 shows sample images and their corresponding sorted images, using the whole image as the neighborhood δ.
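With the whole image as the neighborhood δ, the sorting rule reduces to matching the global gray-scale order of the input image to that of the standard image. A minimal numpy sketch, assuming both images have the same number of pixels:

```python
import numpy as np

def gray_scale_sort(Do, Ds):
    # X(i, j) takes the gray value whose ascending order in Ds equals the
    # ascending order of pixel (i, j) in Do (Eq. (9) with delta = whole image).
    out = np.empty_like(Do)
    out.flat[np.argsort(Do, axis=None, kind="stable")] = np.sort(Ds, axis=None)
    return out
```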
Fig. 4. Sample images and their sorted images: (a) destination image from a scanned photo, (b) image from a CCD camcorder, (c) image from a digital camera, (d) histogram of (a), (e) normalized image from (b), (f) normalized image from (c)
4 Experiment Results
We have tested the effect of the proposed gray scale sorting algorithm on a face recognition system based on multimodal part-based PCA (MMP-PCA) [1]. Construction of all the eigenspaces is performed on a selected training set. First,
the pure face, brow+eye, eye, nose and mouth regions are automatically extracted from a normalized face image, as shown in Figure 5. PCA is applied to construct the desired eigenspaces: eigenface (1), eigenbrow+eye (2), eigeneye (3), eigennose (4), eigenmouth (5). The projection vectors B_i of the extracted facial parts q_i of the facial image to be recognized are calculated through

    B_i = u_i^T × q_i ,   i = 1, 2, 3, 4, 5,    (10)
Fig. 5. Extracted pure face, brow+eye, eye, nose, and mouth from a geometrically normalized face image
Altogether 31 modalities can be formed through different combination of the above 5 eigenspaces, which can be used freely for simulated portrait recognition based on the similarity evaluation of differnt facial parts from the witness or the operator. When a global modal is chosen, which means the combination of all the eigenspaces together, the total similarity is calculated through a weighting scheme of 6 : 5 : 4 : 3 : 2 for the individually calculated similarities of pure face, brow+eye, eye, nose, mouth. Table 4.1 illustrates the recognition rate of the partial and global modalities for face images of different ages in a database of 100,000 face images. 500 test images are used, and the age difference between the text image to be recognized and the target image stored in the database is above three years. It is shown from table ?? that the global recognition modality with the proposed weighting scheme has the highest recognition rate, and eye+brow has the best performance among all the facial parts. Test of the proposed algorithm is carried out using the above explained face recognition system. We have designed two experiments: First, 20 test images of 20 individuals were inserted in the database of 100000 images. These test samples are scanned from photos taken under ideal situation which excludes the difference in lighting, expression, pose and time. Another 20 images from the same persons
468
G. Su, Y. Shang, and B. Zhang
Table 4.1. Recognition rate of the partial and global recognition modalities for face images of different ages in a database of 100,000 facial images. First column shows the position of the target image in the descending similarity list.
first first 5 first 10 first 50
pure face brow+eye eye nose mouth global 30.6% 18.3% 13.4% 6.2% 14.0% 71.8% 66.3% 32.4% 23.8% 11.5% 14.0% 71.8% 71.4% 38.4% 28.2% 15.0% 17.0% 78.9% 79.5% 51.2% 37.8% 24.3% 26.6% 87.6%
Table 4.2. Recognition rate under ideal situation with/without gray scale normalization. First column shows the position of the target image in the descending similarity list.
first first 5 first 10 first 50
without normalization after normalization 60% 75% 85% 90% 90% 90% 95% 95%
Table 4.3. Recognition rate under realistic situation with/without gray scale normalization. First column shows the position of the target image in the descending similarity list.
first first 5 first 10 first 50
without normalization after normalization 27.1% 40% 45.3% 57.1% 50.0% 61.8% 68.2% 75.9%
as the test group were captured under the same condition using digital camcorder, they were processed with the proposed gray scale sorting algorithm, which normalized their gray scale distribution to the standard distribution of scanned photos. The second experiment was carried out in the same scenario as the first one, except that the images were not captured under the same condition, there exist difference in lighting, pose, as well as half year time difference and the number of test images were increased to 170. This represents a more realistic scenario. Under the global recognition mode, the face recognition performance for the above two experiments are given in table 4.2 and 4.3 respectively. It is shown from the data in these two tables that the recognition performance is increased around 5% in ideal situation and around 10% in a more realistic situation.
5
Conclusion
One of the factors that influence the system performance of a face recognition system is that the images to be compared are often from different imaging de-
Robust Face Recognition of Images Captured by Different Devices
469
vices. For example, in many cases the images stored in the database are scanned photos, while the facial images to be recognized are captured through digital camcorder. Therefore in order to increase the system performance in this situation, it is important to normalize the gray scale distribution of the images before comparing them. In this paper, a gray scale sorting algorithm is proposed to perform the gray scale normalization, and the experiment results showed the effectiveness of the proposed algorithm. The image data in our system were collected under the same illumination environment, therefore it is ensured that the difference in gray scale distribution comes only from different imaging devices. It is worth to mention here that although the proposed gray scale normalization method is tested on a face recognition system with PCA, it is also applicable to face recognition systems that adopt LDA, SVM or etc. In the future, it is worth to investigate further the influence of the neighborhood size, which in this paper we have set as the entire image. Another choice which may perform better is to set the neighborhood to important facial regions. It is also worth to investigate the influence of the proposed algorithm on face recognition systems not based on eigenface technique as the one adopted in this paper.
References 1. Su, G.D., Zhang, C.P., Ding, R., Du, C.: MMP-PCA Face Recognition Method. Electronic Letters, Vol. 38, No. 25, (2002) 415–438 2. Ding, R., Su, G.D., Lin, X.G.: Face Recognition Algorithm Using Both Local Information and Global Information. Electronic Letters, Vol. 38, No. 8, (2002) 363–364 3. Zhao, W., Chellappa, R., Phillips, J., Rosenfeld, A.: Face Recognition: A Literature Survey. ACM Computing Surveys, (2003) 399–458 4. Turk, M., Pentland, A.:Face Recognition Using Eigenfaces. Proc. IEEE CVPR ’91, (1991) 586–591 5. Pentland, A., Moghaddam, B., Starner, T.: View-Based and Modular Eigenspaces for Face Recognition. Proc. IEEE CVPR ’94, (1994) 84–91 6. Zhang, B.X., Su, G.D.: Research on Characteristic of Facial Imaging and Goals of Facial Image Normalization. Optic Electronic · Laser , Vol. 14, No. 4, (2003) 406–410 7. Hezeltine, T.: Evaluation of Image Pre-processing Techniques for Eigenface Based Face Recognition. the Second International Conference on Image and Graphics (ICIG2002), Hefei, China, SPIE, Vol. 4875, (2002) 677-685 8. Gross, R., Baker, S., Matthews, I., Kanade, T.: Face Recognition Across Pose and Illumination. In: Li, S.Z., Jain, Anil K. (eds.): Handbook of Face Recognition. Springer-Verlag, June, 2004 9. Georghiades, A.S., Belhumeur P.N., Kriegman D.J.: From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 23, No. 6, (2001) 643–660 10. Tian, Y.L., Kanade, T., Cohn, J.F.: Recognizing Action Units for Facial Expression Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 23, No. 2, (2001) 97–115
Robust Feature Extraction for Mobile-Based Speech Emotion Recognition System Kang-Kue Lee, Youn-Ho Cho, and Kyu-Sik Park Dankook University Division of Information and Computer Science San 8, Hannam-Dong, Yongsan-Ku, Seoul Korea, 140-714 {lk_sun, adminmaster, kspark}@dankook.ac.kr
Abstract. In this paper, we propose a robust feature extraction method for mobile-based speech emotion recognition system. A query speech signal is captured by a cellular phone in the real mobile environment. A major problem in this environment is distortions contained in the features of the query sound due to the mobile network and environmental noise. In order to alleviate these noises, a signal subspace noise reduction algorithm is applied. Then a robust feature extraction method called SFS feature optimization is implemented to improve and stabilize the system performance. The proposed system has been tested with cellular phones in the real world and it shows about 73% of average classification success rate with Fuzzy SVM classifier.
1 Introduction Recently, the problem of the speech emotion recognition has gained increased attention, because the human-machine interface grows its importance in intelligent communication services. Besides human facial expressions, speech has proven as one of the most promising media for the automatic recognition of human emotions. With regards to the problem of speech emotional classification, most previous research has used prosodic features such as pitch, energy and spectral information such as MFCC (Mel Frequency Cepstral Coefficient) as their acoustic cues. Dellaert et al. [1] used 17 features and compared different classification algorithm. They achieved 79.5% accuracy with four emotion categories. Scherer [2] extracted 16 features by the jack-knifing procedure and achieved an overall accuracy 40.4% for fourteen emotional states. G. Zhou et al. [3] used nonlinear teager energy operator (TEO) feature for stressed/neutral classification and compared the feature performance with the traditional pitch and MFCC feature. Other good works on speech emotional classification can be found in [4],[5],[6]. Although the most of studies in speech emotion recognition are mainly based on the PC based system with no serious noise condition, little attention has been paid to the automatic speech emotion recognition system in mobile service environment. Previous methods are tend to fail when the query speech signal contains background noises and network errors as in mobile environment. Speech emotion recognition D.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNCIS 345, pp. 470 – 477, 2006. © Springer-Verlag Berlin Heidelberg 2006
Robust Feature Extraction for Mobile-Based Speech Emotion Recognition System
471
system by a cellular phone has not yet been fully investigated. The essential problem is that in cellular phone based speech emotion recognition the noise characteristics greatly vary with the user’s environment. In contrast to previous works, this paper focuses on the issues of the mobile based speech emotion recognition system that classifies the speech emotion into five categories- neutral, joy, sadness, anger, and annoyance. The proposed system accepts query speech captured by a cellular phone in real mobile environment. In order to release noises due to a mobile network and environment, a signal subspace noise reduction algorithm is applied. Further effort to extract a noise robust feature is performed by SFS (sequential forward selection) feature optimization method. Various classification algorithms were made use of in recent studies about speech emotion recognition. Aim to our implement we choose Fuzzy SVM [7] as our classification algorithm and compared the performance with simple k-NN [8] algorithm. This paper is organized as follows. Section 2 describes proposed mobile-based speech emotion recognition system. Section 3 explains the methods of noise-robust feature extraction and optimization. Section 4 compares experimental results of the proposed system. Finally, a conclusion is given in section 5.
2 Proposed Mobile-Based Emotion Recognition System The proposed system is illustrated in Fig. 1. The system consist 4 stages – speechsignal acquisition, mobile noise reduction, robust feature extraction, and speech emotion
Fig. 1. Proposed mobile-based speech emotion recognition system
472
K.-K. Lee, Y.-H. Cho, and K.-S. Park
classification and SMS service to the user request. Firstly, a query speech signal is picked up by the single microphone of the cellular phone and then transmitted to the emotion recognition server. Then the signal is acquired by the INTEL dialogic D4PCI-U board in 8 kHz ssampling rate, 16 bit, MONO. Secondly, a signal subspace noise reduction algorithm is applied to the query signal. This stage is required to enhance speech signal by reducing mobile noises. Thirdly, pre-defined set of features are extracted from the enhanced query signal. At this moment, SFS feature optimization is applied to extract robust features against mobile noise. Finally, the queried speech is classified using Fuzzy SVM and the classification result will be transmitted to the user request via SMS server.
3 Emotional Feature Extraction and Optimization in Mobile Environment 3.1 A Signal Subspace Noise Reduction Before feature extraction, a well known signal subspace noise reduction algorithm [8] is applied to the query speech acquired by the cellular phone to reduce the mobilenoises. The signal subspace approach for enhancing speech signal is to decompose the vector space of the noisy signal into a signal plus noise subspace and a noise subspace. Enhancement is performed by removing the noise subspace and estimating the clean signal from the remaining signal subspace. So the speech emotional feature extraction is performed on the clean signal space. 3.2 Emotional Feature Extraction At the sampling rate of 8 kHz, the speech signals are divided into 20ms frames with 50% overlapped hamming window at the two adjacent frames. Two types of features are computed from each frame: One is the prosodic features such as pitch and energy. The other is speech phoneme feature such as six mel-frequency cepstral coefficients (MFCC). The means and standard deviations of these original features and their delta values are computed over each frame for each music file to form a total of 32dimensional feature vector. Followings are short descriptions of the features used in this paper. 3.2.1 Pitch A few pitch detection algorithms including HPS[9], AMDF[10], SHR[11] are tested and compared in noisy speech environment. Among them, SHR algorithm shows good robust characteristics against mobile noise and so the SHR is used in this paper. SHR (Subharmonic-to-Harmonic Ratio) transforms the speech signal into the FFT domain, and then decides two candidate pitches by detecting the peak magnitude of the log spectrum. The final pitch is determined by comparing SHR to a certain threshold. If SHR is less than a threshold value, then f 2 is chosen as the final pith. Otherwise
f1 is selected. Here DA( ) is difference function [11].
Robust Feature Extraction for Mobile-Based Speech Emotion Recognition System
SHR = 0.5
DA(log f1 ) − DA(log f 2 ) . DA(log f 1 ) + DA(log f 2 )
473
(1)
3.2.2 Energy Short time energy of the speech signal provides a convenient representation that reflects amplitude variations. The short time energy is defined as ∞
En =
¦x
2
( m) ⋅ h ( n − m) .
(2)
m = −∞
Another significance of short time energy is that it can provide a basis for distinguishing voiced speech segment from the silent speech segment. By analyzing the speech signal in voiced speech segment only excluding the silent period, it can allow more reliable feature extraction process. 3.2.3 Mel Frequency Cepstral Coefficient MFCC is the most widely used feature in speech recognition. It captures short-term perceptual features of human hearing system. Six coefficients MFCC are used in this paper. 3.3 Feature Optimization Against Mobile Noise In order to extract noise robust features in mobile environment, an efficient feature optimization method is desired. As described in paper [12], a sequential forward selection (SFS) method is used to meet this need. SFS also helps to reduce the computational burden and so speed up the search process. In this paper, we adopt the same SFS method for feature selection to reduce dimensionality of the features and to enhance the classification accuracy in mobile noise environment. Firstly, the best single feature is selected and then one feature is added at a time which in combination with the previously selected features to maximize the classification accuracy [12]. This process continues until all 32 dimensional features are selected. After completing the process, we pick up the best feature lines that maximize the emotion classification accuracy.
4 Experimental Results 4.1 Speech Database and Experimental Setup We used the emotional database in Korean speech recorded at 8 kHz by Yonsei University [13]. The database includes short utterances by fifteen semi-professional actor and actress and it consists of total of 5400 utterances in 5 emotional states, i.e, neutral, joy, sadness, anger and annoyance. Among them 1500 utterances (300 utterances for each motional state) were used as the training data set and the 200 utterances (40 utterances for each emotional state) as the test data set. The system was tested 100 times for each noisy query to yield the average classification accuracy.
474
K.-K. Lee, Y.-H. Cho, and K.-S. Park
Fig. 2 shows the block diagram of experimental setup. In order to demonstrate the system performance, two sets of experiment have been performed. One is the system with a proposed signal subspace noise reduction technique and SFS feature optimization (dashed line). The other is the system without any noise reduction technique and the feature optimization. The proposed mobile emotion recognition system works as follows. Firstly, a test speech is acquired through actual cellular phone through Intel NTEL dialogic D4PCI-U board in 8 kHz sampling rate, 16 bit, MONO. Secondly, a feature extraction with a signal subspace noise reduction algorithm is applied to the acquired speech signal. This stage is required to enhance speech signal by reducing mobile noises. Thirdly, pre-defined set of features are extracted from the enhanced speech signal and they are further optimized using SFS method. Finally, two pattern recognition algorithms including k-NN, and Fuzzy SVM are tested and compared to classify the speech emotional state.
Fig. 2. Two sets of experimental set up
4.2 Emotional State Classification Results Fig. 3 shows average classification results of the proposed system with respect to speech query captured by cellular phone in real world. A signal subspace noise reduction and SFS feature optimization is adopted with two pattern classifiers – k-NN and Fuzzy SVM. As seen from the figure, Fuzzy SVM classifier shows fast convergence speed with higher classification accuracy than k-NN. From the figure, we see that the classification performance increases with the increase of features up to certain number of features, while it remains almost constant after that. Thus based on the observation of these boundaries, we can select first 10 features up to the boundary and ignore the rest of them. In this way, we can determine the number of best feature sets for each classifier. As we intuitively know, the less number of feature set is always desirable. Table 1 compares the classification results between the proposed system and the one without the noise reduction (NR) technique and SFS feature optimization. As seen on the table 1, the proposed method with NR and SFS technique achieves more than 20% higher accuracy than the one without them even with less number of feature dimension.
Robust Feature Extraction for Mobile-Based Speech Emotion Recognition System
475
Accurace Rate
1 0.8 0.6 0.4 0.2 k- NN
0 1
3
5
7
SVM
9 11 13 15 17 19 21 23 25 27 29 31 Feature Dimension
Fig. 3. SFS Feature selection procedure with noise reduction algorithm Table 1. Average emotion classification results
Pattern Classifier       k-NN (Feature Dim.)    Fuzzy SVM (Feature Dim.)
Noisy Query              44.5% (32)             51.5% (32)
Query with NR and SFS    63.5% (10)             72.5% (10)
Table 2 shows the Fuzzy SVM classification performance of the proposed system in the form of a confusion matrix. For comparison, the classification results without the NR and SFS methods, using the original 32-dimensional feature vector, are included in parentheses. The numbers of correct classifications lie on the diagonal of the confusion matrix.

Table 2. Classification accuracy in the form of a confusion matrix (Fuzzy SVM)
             Neutral    Joy       Sadness   Anger     Annoyance
Neutral      37 (40)    0 (0)     1 (0)     2 (0)     0 (0)
Joy          3 (11)     24 (23)   4 (4)     2 (0)     7 (2)
Sadness      4 (14)     2 (1)     25 (24)   4 (0)     5 (1)
Anger        2 (16)     4 (2)     2 (5)     25 (3)    7 (4)
Annoyance    1 (5)      4 (8)     0 (14)    1 (0)     34 (13)
Average Classification Accuracy: 72.5% (51.5%)

From Table 2, we see a large improvement in classification performance with the proposed NR algorithm and SFS feature optimization; it achieves more than 20% improvement even with a smaller feature set. The proposed method
works fairly well for the neutral and annoyance emotional states, while the average number of correct classifications is a little lower for the joy, sadness and anger states. The joy and anger states are often misclassified as annoyance. The confusion matrix also shows some notable misclassifications between anger and annoyance for the system without the NR and SFS techniques.
5 Conclusion
In this paper, we propose a mobile-based speech emotion recognition system. A query speech signal is captured by a cellular phone in the real world. To alleviate noise from the mobile network and environment, a signal subspace noise reduction algorithm is applied. Then a robust feature extraction method, SFS feature optimization, is implemented to improve and stabilize the system performance. The proposed system has been tested with cellular phones in a real mobile environment and shows about 72.5% average classification accuracy with Fuzzy SVM, more than 20% improvement over the system without the NR and SFS techniques. Future work will involve the development of new emotional features and further analysis of the system for practical implementation.
Acknowledgment
This work was supported by grant No. R01-2004-000-10122-0 from the Basic Research Program of the Korea Science & Engineering Foundation.
References
1. Dellaert, F., Polzin, T., Waibel, A.: Recognizing Emotion in Speech. In: Proc. International Conf. on Spoken Language Processing (1996) 1970-1973
2. Scherer, K.R.: Adding the Affective Dimension: A New Look in Speech Analysis and Synthesis. In: Proc. International Conf. on Spoken Language Processing (1996) 1808-1811
3. Zhou, G., Hansen, J.H.L., Kaiser, J.F.: Nonlinear Feature Based Classification of Speech Under Stress. IEEE Transactions on Speech and Audio Processing 9(3) (2001)
4. Kostov, V., Fukuda, S.: Emotion in User Interface, Voice Interaction System. IEEE Intl. Conf. on Systems, Man, and Cybernetics, no. 2 (2000) 798-803
5. Moriyama, T., Ozawa, S.: Emotion Recognition and Synthesis System on Speech. IEEE Intl. Conference on Multimedia Computing and Systems (1999) 840-844
6. Lee, C.M., Narayanan, S., Pieraccini, R.: Classifying Emotions in Human-machine Spoken Dialogs. ICME'02 (2002)
7. Abe, S., Inoue, T.: Fuzzy Support Vector Machines for Multiclass Problems. ESANN'2002 Proceedings - European Symposium on Artificial Neural Networks, Bruges, Belgium (2002) 113-118
8. Jain, A.K., Duin, R.P.W., Mao, J.: Statistical Pattern Recognition: A Review. IEEE Trans. Pattern Analysis and Machine Intelligence 22(1) (2000)
9. Ephraim, Y.: A Signal Subspace Approach for Speech Enhancement. IEEE Transactions on Speech and Audio Processing 3(4) (1995) 251-266
10. Ross, M.J., Shaffer, H.L., Cohen, A., Freudberg, R., Manley, H.J.: Average Magnitude Difference Function Pitch Extractor. IEEE Trans. Acoustics, Speech, and Signal Processing ASSP-22 (1974) 353-362
11. Noll, M.: Pitch Determination of Human Speech by the Harmonic Product Spectrum, the Harmonic Sum Spectrum, and a Maximum Likelihood Estimate. In: Proceedings of the Symposium on Computer Processing in Communications (1969) 779-797
12. Sun, X.: A Pitch Determination Algorithm Based on Subharmonic-to-Harmonic Ratio. ICSLP (2000) 676-679
13. Liu, M., Wan, C.: A Study on Content-based Classification and Retrieval of Audio Database. In: Proc. of the International Database Engineering & Applications Symposium (2001) 339-345
14. Kang, B.-S.: A Text-independent Emotion Recognition Algorithm Using Speech Signal. Yonsei University (2000)
Robust Segmentation of Characters Marked on Surface

Jong-Eun Ha1, Dong-Joong Kang2, Mun-Ho Jeong3, and Wang-Heon Lee4

1 Automotive Engineering, Seoul National University of Technology, Korea, [email protected]
2 Mechanical Engineering, Pusan National University, Korea, [email protected]
3 Intelligent Robotics Research Center, Korea Institute of Science and Technology, [email protected]
4 Department of Information Technology, Hansei University, Korea, [email protected]
Abstract. Optical character recognition (OCR) is widely used for automation. A typical OCR algorithm runs on each character and therefore includes a preprocessing step that separates each character from the input image. Most segmentation algorithms run well on good-quality images of machine-printed text; a barcode on the surface would also be a good candidate. However, there are industrial applications where a barcode cannot be adopted; in such cases, an identification code is marked directly on the surface of the product. Characters produced by marking have a height difference between the character and background regions, which makes it difficult to devise an illumination system that guarantees good image quality. A new algorithm targeting robust segmentation of characters marked on a surface is proposed. The proposed algorithm is based on the consistent use of two profiles of accumulated edge magnitude, both for finding the rectangular region containing the identification code in the input image and for the final segmentation. The final segmentation position of each character is found by dynamic programming, which guarantees a global minimum. The feasibility of the proposed algorithm is tested under various lighting conditions.
1 Introduction
Optical character recognition is widely adopted in factories for tracking each product in process. Most OCR systems operate on individual characters, so it is an essential step to segment out each character in the input image. Traditional OCR algorithms deal with characters printed on paper, which yield rather good image quality. Typical approaches for character segmentation first binarize the input image and then find the position of each character by analyzing a histogram. Lu [1] reviews various machine-printed character segmentation methods. In [1], character segmentation algorithms using vertical projection [2], pitch estimation or character size [3,4,5], contour analysis [6], or segmentation-recognition coupled techniques [7,8,9] are reviewed. Most algorithms first binarize the input image, and it is noted that the binarization preprocess can affect segmentation; it is also suggested that segmentation directly on the gray image is a promising direction. In [10], character segmentation using projection and topographic features from the gray image is presented. The final solution is determined using a multi-stage graph
searching algorithm. Video OCR is presented in [11], dealing with low-resolution characters and complex backgrounds; character segmentation there is done by adopting a recognition-segmentation paradigm. In industrial applications, characters are marked on the surface of a product so that there is a noticeable height difference between the character and background regions. One good direction is to devise an illumination system that gives good image quality, but it is very difficult for an illumination system to cover the whole inspection area with homogeneous brightness because of the height difference between background and characters and the irregularities on the surface. Fig. 1 shows an image acquired under direct lighting by a typical illumination, where characters are marked on a metal surface. We can see that it is very difficult to guarantee a homogeneous distribution of light over the entire image. This is because the characters are marked by machine tools, so the area corresponding to characters is more irregular than machine-printed text. In industrial applications there are many constraints, so it is important for the system to be simple and robust. A typical OCR algorithm first extracts each character from the input image and then runs a recognition algorithm, such as a neural network, on each character. Commercial OCR packages include a segmentation routine as a preprocessing step, but they work well only on good-quality images such as characters printed on paper. The proposed algorithm targets robust segmentation of characters in poor-quality images, such as characters marked on a metal surface. In this paper, we propose a simple and robust character segmentation algorithm using edge histograms that can be used directly as preprocessing for OCR.
Fig. 1. Characters marked by machine tools on metal surface
2 Finding Region of Interest (ROI) of Characters
Typical OCR algorithms run on each segmented character, so it is necessary to extract the position of each character in the image before applying the OCR algorithm. They also deal with machine-printed characters, where the background and character regions have rather high contrast. High contrast between background and character regions allows a simple threshold-based method to separate them; each character is then usually segmented out by analyzing a histogram. Fig. 2 shows the intensity histogram of Fig. 1. We can see that there are many peaks and valleys due to the inhomogeneous background and the height difference between characters and background material. This makes it difficult for the conventional approach to segment out each character. Fig. 3 shows the result of binarization obtained by applying various threshold values to the image of Fig. 1. It is difficult for a threshold-based approach to obtain a meaningful binarization under illumination variation.
Fig. 2. Histogram of intensity of two images in Fig. 1
Fig. 3. Result of binarization using the image of Fig. 1 with threshold value 200 and 220
We overcome this problem by analyzing an edge profile obtained by summing edge magnitudes along the vertical and horizontal directions. Intensity is directly affected by lighting variation, so thresholding the intensity image gives unpredictable results as the lighting changes. The proposed method focuses on the fact that the characters are marked on the surface, so there is a height difference between the character and background regions. Although this makes it difficult to devise an illumination system that guarantees homogeneous image quality, there is an intensity difference between character and background due to the height difference. We use the edge profile within a rectangular region consistently, both for finding the character region in the image and for segmenting the characters. First, we apply the Sobel edge operator to the input image and let E(i, j) be defined as
E(i, j) = I_x(i, j) + I_y(i, j)    (1)
where I_x(i, j) and I_y(i, j) represent the edge magnitudes of the Sobel operator in the x and y directions. The accumulated edge magnitude along the horizontal and vertical directions is defined as

AE_x(i, j) = \sum_{k=0}^{i} E(k, j),    AE_y(i, j) = \sum_{k=0}^{j} E(i, k)    (2)

where AE_x(i, j) and AE_y(i, j) are the sums of edge magnitude along the x and y directions, i is the row index, and j is the column index.
We can find the profile of edge magnitude of a rectangular region as follows:

EP_x(i, j) = AE_x(i + h, j) − AE_x(i − h, j),    EP_y(i, j) = AE_y(i, j + h) − AE_y(i, j − h)    (3)

where EP_x(i, j) is the profile of edge magnitude along the row direction with height 2h + 1 and EP_y(i, j) is the profile along the column direction with width 2h + 1. As shown in Eq. (3), the profile of edge magnitude within a rectangular region can be obtained by simply subtracting two accumulated values. The two profiles of accumulated edge magnitude are shown at the left and bottom of Fig. 4. The four positions of the rectangular region that contains the identification code are found by checking the two peaks on each profile. In Fig. 4, the inner white rectangle represents the found region. A sketch of this computation is given after Fig. 4.
Fig. 4. Result of finding the region that contains identification code using vertical and horizontal profile
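A minimal sketch of Eqs. (1)-(3), assuming an 8-bit grayscale image as a NumPy array; the Sobel kernels are written out directly and the band width convention is an approximation of the 2h+1 window, so this is illustrative rather than the authors' code.

```python
import numpy as np

def edge_profiles(img, h):
    """Edge-magnitude profiles of Eqs. (1)-(3) via cumulative sums."""
    img = img.astype(np.float64)
    # Sobel responses in x and y (interior pixels only, zero-padded border).
    gx = np.zeros_like(img); gy = np.zeros_like(img)
    gx[1:-1, 1:-1] = (img[:-2, 2:] + 2*img[1:-1, 2:] + img[2:, 2:]
                      - img[:-2, :-2] - 2*img[1:-1, :-2] - img[2:, :-2])
    gy[1:-1, 1:-1] = (img[2:, :-2] + 2*img[2:, 1:-1] + img[2:, 2:]
                      - img[:-2, :-2] - 2*img[:-2, 1:-1] - img[:-2, 2:])
    e = np.abs(gx) + np.abs(gy)          # Eq. (1): E(i, j)
    ae_x = np.cumsum(e, axis=0)          # Eq. (2): AE_x, sums down each column
    ae_y = np.cumsum(e, axis=1)          # Eq. (2): AE_y, sums along each row
    # Eq. (3): band profiles obtained by differencing two accumulated values.
    ep_x = ae_x[2*h:, :] - ae_x[:-2*h, :]
    ep_y = ae_y[:, 2*h:] - ae_y[:, :-2*h]
    return ep_x, ep_y
```

Peak detection on ep_x and ep_y then yields the four sides of the rectangle containing the identification code.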
3 Character Segmentation Using Dynamic Programming
The previous section found the position of the characters in the input image. At the bottom of Fig. 4, the profile of accumulated edge magnitude along the vertical direction is shown. There are peaks and valleys in the interior region of a character, so a conventional segmentation algorithm based on finding peaks and valleys would require further processing to deal with them. We adopt dynamic programming to find the segmentation positions. First, initial search positions are set by even division of the rectangular region containing the identification code. These positions may fall inside characters because the characters have different widths. We can find the segmentation positions by searching neighboring positions starting from the initial positions. This can be formulated as a one-dimensional search, as shown in Fig. 5, and finally as a search problem on a two-dimensional grid: columns correspond to the initial positions and rows to the horizontal search direction in Fig. 5.
This search problem can be solved using dynamic programming, which has been used successfully in computer vision for stereo matching [12] and for the snake algorithm for contour tracking [13,14].
Fig. 5. Initial position and search grid along horizontal direction
The energy of each control point is defined as

E(i, j) = α·E_int + E_ext    (4)

where α is a coefficient that adjusts the relative weighting between the internal and external energy terms. E_int is defined as

E_int(i, j) = dis(i, j)    (5)

where dis(i, j) is the distance between the two positions i and j. E_ext is defined as

E_ext(i, j) = EP_x(i, j)    (6)

where EP_x(i, j) is defined in Eq. (3). E_int corresponds to the internal energy of the Snake model and E_ext to its external energy. The solution that minimizes the total energy is found by dynamic programming [14] and can be implemented using the backward-looking algorithm shown in Fig. 6. In the backward-looking algorithm, lines 1-2 initialize the array of costs; c(r, c) is the cost at row r and column c, and p(r, c) saves the path to the previous column that gives the minimum cost. d_s is the width of the search and d_c is the number of segmentation positions. After processing the entire grid, the solution can be traced backward using p(r, c).
1   for r ← 0 to d_s
2       c(r, 0) ← E_ext(r, 0)
3   for c ← 1 to d_c
4       for cr ← 0 to d_s
5           c_ext ← E_ext(cr, c)
6           c_min ← ∞
7           for ir ← 0 to d_s
8               c_int ← E_int(cr, ir)
9               c_new ← c_ext + c_int + c(ir, c − 1)
10              if c_new < c_min
11                  c_min ← c_new
12                  p(cr, c) ← (ir, c − 1)
13          c(cr, c) ← c_min
Fig. 6. Backward-looking algorithm
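For concreteness, here is a small Python rendering of the backward-looking dynamic program of Fig. 6; the array shapes and the two energy inputs are illustrative assumptions, not the authors' interface.

```python
import numpy as np

def backward_looking_dp(e_ext, e_int):
    """Dynamic programming of Fig. 6.

    e_ext: (d_s+1, d_c+1) external energies, e_ext[r, c] = E_ext(r, c).
    e_int: (d_s+1, d_s+1) internal energies, e_int[cr, ir] = E_int(cr, ir).
    Returns the minimum-cost row index per column, traced backward via p.
    """
    d_s = e_ext.shape[0] - 1
    d_c = e_ext.shape[1] - 1
    c = np.full((d_s + 1, d_c + 1), np.inf)
    p = np.zeros((d_s + 1, d_c + 1), dtype=int)
    c[:, 0] = e_ext[:, 0]                      # lines 1-2: initialize costs
    for col in range(1, d_c + 1):              # line 3
        for cr in range(d_s + 1):              # line 4
            cand = e_ext[cr, col] + e_int[cr, :] + c[:, col - 1]  # lines 5-9
            ir = int(np.argmin(cand))          # lines 10-12: keep best path
            c[cr, col] = cand[ir]              # line 13
            p[cr, col] = ir
    # Backward trace of the globally optimal segmentation path.
    path = [int(np.argmin(c[:, d_c]))]
    for col in range(d_c, 0, -1):
        path.append(p[path[-1], col])
    return path[::-1]
```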
4 Experimental Results
Fig. 7 shows some of the 14 test images acquired under various illumination conditions (fluorescent lamp, LED, and halogen lamp). In Fig. 7, each image is the sub-part of the input image corresponding to the automatically found rectangular region containing the identification code. Black vertical lines are the initial segmentation positions obtained by even spacing of the rectangular region; white vertical lines are the segmentation positions found by the proposed algorithm. Note that some initial positions lie in the middle of characters owing to the mixture of characters and numbers of different widths. The distribution of brightness on characters and background varies widely with the illumination condition; nevertheless, the proposed algorithm correctly segments out each character. In all experiments the same parameter values are used: the search width is set to 50 pixels and α in Eq. (4) to 0.1. The number of characters is provided manually, and this is the only parameter the user must set. The proposed algorithm can also cope with scale changes because all processing is done using the two profiles of accumulated edge magnitude. Segmentation accuracy is evaluated by comparison with ground truth obtained manually. Table 1 shows the accuracy of the proposed algorithm: it segments out each character with roughly one-pixel accuracy, which is sufficient to run a conventional OCR algorithm. In Fig. 7, three different lightings (LED, fluorescent lamp, and halogen lamp) are used, and the proposed algorithm gives consistent segmentation results under lighting variation with fixed parameters. Fig. 8 shows the error according to the variation of the coefficient in Eq. (4); for values between 0.1 and 100, a segmentation accuracy of 1 pixel is obtained.
Fig. 7. Experimental results under various lighting conditions: (a) LED, (b) fluorescent lamp, and (c) halogen lamp. Black vertical lines are the initial positions and white vertical lines are the found character segmentation positions.
Table 1. The error of character segmentation; true values were obtained by hand (in pixels)

Mean error            0.795
Standard deviation    0.696
Fig. 8. The error of segmentation according to the variation of coefficient in Eq. (4)
Fig. 9. Images used in the noise experiment (from top to bottom: original image, Gaussian noise of 2%, 4%, and 6%)

Table 2. The error of character segmentation according to the variation of noise (in pixels)
                      Gaussian Noise Level
                      2%        4%        6%
Mean error            1.1187    1.5125    1.5688
Standard deviation    0.9068    1.0815    1.1526
Fig. 9 shows images with Gaussian noise, and Table 2 shows the results of the proposed algorithm according to the noise level. Up to a noise level of 6%, the proposed algorithm keeps the segmentation error under 2 pixels, which is sufficient for the preprocessing step of an optical character recognition algorithm. The proposed algorithm not only operates robustly under illumination variation but is also simple, requiring only two threshold values and one user input. It can be used effectively as preprocessing for the segmentation of characters marked on a surface.
5 Conclusion
A new algorithm targeting robust segmentation of characters marked on a surface has been proposed. Characters produced by marking have a height difference between the character and background regions, which makes it difficult to devise an illumination system that guarantees good image quality. The proposed algorithm is based on the consistent use of two profiles of accumulated edge magnitude, both for finding the rectangular region containing the identification code in the input image and for the final segmentation. The final segmentation position of each character is found by dynamic programming, which guarantees a global minimum. The proposed algorithm can be used effectively as preprocessing in OCR applications targeting poor-quality images.
References
1. Lu, Y.: Machine Printed Character Segmentation - An Overview. Pattern Recognition 28(1) (1995) 67-80
2. Lu, Y., Haist, B., Harmon, L., Trenkle, J., Vogt, R.: An Accurate and Efficient System for Segmenting Machine-printed Text. U.S. Postal Service 5th Advanced Technology Conference, 3 (1992) 93-105
3. Tsuji, Y., Asai, K.: Character Image Segmentation. SPIE Vol. 504, Applications of Digital Image Processing VII (1984) 2-10
4. Tsuji, Y., Asai, K.: Adaptive Character Segmentation Method Based on Minimum Variance Criterion. Systems and Computers, Japan, 17(7) (1986)
5. Tsuji, Y., Tsukumo, J., Asai, K.: Document Image Analysis for Reading Books. SPIE Advances in Image Processing, 804 (1987) 237-244
6. Hoffman, R.L., McCullough, J.W.: Segmentation Methods for Recognition of Machine-printed Characters. IBM J. Res. Dev. (1971) 153-165
7. Casey, R.G., Nagy, G.: Recursive Segmentation and Classification of Composite Character Patterns. Proc. 6th International Conference on Pattern Recognition (1982) 1023-1026
8. Casey, R.G.: Coarse-fine OCR Segmentation. IBM TDB 23(8) (1981)
9. Kovalevsky, V.: Image Pattern Recognition. Springer (1980)
10. Lee, S.W., Lee, D.J., Park, H.S.: A New Methodology for Gray-scale Character Segmentation and Recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence 18(10) (1996) 1045-1050
11. Sato, T., Smith, M., Satoh, S., Kanade, T., Hughes, E.: Video OCR: Indexing Digital News Libraries by Recognition of Superimposed Caption. ACM Multimedia Systems, Special Issue on Video Libraries 7(5) (1999) 385-395
12. Birchfield, S., Tomasi, C.: Depth Discontinuities by Pixel-to-pixel Stereo. International Journal of Computer Vision 35(3) (1999) 269-293
13. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active Contour Models. International Conference on Computer Vision (1987) 259-268
14. Amini, A.A., Weymouth, T.E., Jain, R.C.: Using Dynamic Programming for Solving Variational Problems in Vision. IEEE Trans. Pattern Analysis and Machine Intelligence 12 (1990) 855-867
Screening of Basal Cell Carcinoma by Automatic Classifiers with an Ambiguous Category

Seong-Joon Baek, Aaron Park, Daejin Kim, Sung-Hoon Hong, Dong Kook Kim, and Bae-Ho Lee

The School of Electronics and Computer Engineering, Chonnam National University, Gwangju, South Korea, 500-757
Abstract. Raman spectroscopy is known to have strong potential for providing noninvasive dermatological diagnosis of skin cancer. According to previous work, various well-known methods, including maximum a posteriori probability (MAP), multilayer perceptron networks (MLP), and support vector machines (SVM), showed competitive results. Since even small errors often lead to a fatal result, we investigated a method that removes classification error entirely by screening out some ambiguous patterns; those patterns can then be examined by routine biopsy. We incorporated an ambiguous category into MAP, a linear classifier using minimum squared error (MSE), and MLP. Experiments involving 216 confocal Raman spectra showed that every method could classify basal cell carcinoma (BCC) perfectly by screening out some ambiguous patterns. The best results were obtained with MSE, which gave perfect classification while screening out 8% of the test patterns.
1 Introduction
Skin cancer is one of the most common cancers in the world. Recently, its incidence has increased dramatically due to excessive exposure of the skin to UV radiation caused by ozone layer depletion, environmental contamination, and so on. If detected early, skin cancer has a cure rate of 100%. Unfortunately, early detection is difficult because diagnosis is still based on morphological inspection by a pathologist [1]. There are two common skin cancers: BCC and squamous cell carcinoma (SCC). Among them, BCC is the most common skin neoplasm and is difficult to distinguish from the surrounding noncancerous tissue, so clinical dermatologists have requested accurate detection of BCC. The routine diagnostic technique for BCC is pathological examination of biopsy samples. It involves rather complex treatments and relies upon subjective judgment, which depends on the experience of the individual pathologist and can lead to excessive biopsy of tissue. Thus, a fast and accurate diagnostic technique for the initial screening and selection of lesions for further biopsy is needed [2]. Raman spectroscopy has the potential to resolve this problem. It can be applied to provide an accurate medical diagnosis to distinguish BCC tissue from
surrounding normal (NOR) tissue. Recently, some researchers have carried out BCC detection using Raman spectroscopy. By far the most widely used method is Fourier transform (FT) Raman spectroscopy [3][4]. However, FT Raman spectra from a long-wavelength excitation laser give a poor signal-to-noise ratio and suffer from so-called background noise, so complicated statistical treatments are required to eliminate it. More recently, a direct observation method based on the confocal Raman technique, using a shorter-wavelength argon ion laser, was presented for the dermatological diagnosis of BCC [2]. According to that study, confocal Raman spectra provided promising results for the detection of precancerous and noncancerous lesions without special treatments. Based on that result, we investigated various classification methods, including MAP, probabilistic neural networks, k-nearest neighbor, MLP, and SVM, for BCC detection [5]. According to those results, MAP and MLP gave a classification error rate of 4-5%. Even though these error rates are fairly low, there will be cases where a more accurate decision is needed. Hence, we investigated a method that further reduces classification error by introducing an ambiguous category into the classifiers. Three types of classifiers, i.e., MAP, MSE, and MLP, were inspected and modified to accommodate an ambiguous category. We then compared the classification error rates of the methods through experiments involving 216 confocal Raman spectra.
2 Sample Preparation and Preprocessing
The tissue samples were prepared with the conventional treatment, exactly as in [2]. Details of the biological and chemical processes and the preprocessing procedures are described in [2][5]. Some confocal Raman spectra of the skin samples are drawn in Fig. 1. Most of the spectra show a clear distinction between BCC and NOR tissue, while this is not so evident for Fig. 1 B(g) and Fig. 1 B(i). Of these, Fig. 1 B(g) is considered an outlier because it is very different from the surrounding spectra, whereas Fig. 1 B(i) looks similar to a BCC spectrum and cannot be considered an outlier; it was probably obtained from the vicinity of the boundary between BCC and NOR. A skin biopsy and spectral measurements were carried out in the direction perpendicular to the skin surface, i.e., from the epidermis to the dermis in Fig. 1 A and B. Confocal Raman spectra of BCC tissue were measured at different spots with an interval of 30-40 μm. In this way, 216 Raman spectra were collected from 10 patients. After the measurements, the spectra were normalized to the interval [-1, 1]. We then applied the same clipping window as in [5] to discard unnecessary data. For dimension reduction, the well-known principal component analysis (PCA) was applied. Since PCA identifies orthogonal bases on which projections are uncorrelated, it is the most preferred method for data reduction. Principal components can be obtained via eigenvalue decomposition of the following scatter matrix S.
[Figure: two panels (A, B) of confocal Raman spectra (a)-(j) plotted against Raman shift (cm^-1), with the NOR and BCC regions of the skin section marked]
Fig. 1. Confocal Raman profiles of skin tissue with an interval of 30-40 μm
S = \sum_k (x_k − μ)(x_k − μ)^T    (1)
where x_k is an input pattern and μ is the mean of the x_k. If we let D be a diagonal matrix of eigenvalues in descending order and E an orthogonal matrix whose columns are the corresponding eigenvectors, we can obtain the principal components y_k as follows:

S = E D E^T,    y_k = E^T x_k    (2)
Data reduction is accomplished by discarding the unimportant elements of y_k. Fig. 2 shows the first and second principal components of BCC and NOR. From the figure, we can see that the features are highly discriminating, even though there are some confusing features near the boundary. The figure also shows that more than two features would be needed to improve classifier performance; however, once an ambiguous category is introduced, even two features give satisfactory results. We therefore used only the first two principal components in the experiments. A sketch of this projection is given below.
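As an illustration of Eqs. (1)-(2), here is a minimal PCA sketch; the two-component truncation matches the experiments, but the spectra matrix and function name are stand-ins, not the authors' code.

```python
import numpy as np

def pca_features(spectra, n_components=2):
    """Project patterns onto the leading principal components (Eqs. 1-2)."""
    mu = spectra.mean(axis=0)
    centered = spectra - mu
    s = centered.T @ centered                 # scatter matrix S of Eq. (1)
    eigvals, eigvecs = np.linalg.eigh(s)      # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]         # descending eigenvalues (matrix D)
    e = eigvecs[:, order[:n_components]]      # leading eigenvectors (matrix E)
    return centered @ e                       # y_k = E^T x_k for every pattern
```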
3 Classification Methods and Experimental Results
In this section, we describe the classification methods with an ambiguous category. Three types of classifiers, MAP, MSE, and MLP, were examined. Among them, MSE is the only one with a linear decision region; since the input patterns are not linearly separable, as shown in Fig. 2, other forms of linear discriminant classifier are not applicable except MSE. To introduce an ambiguous category, the decision rules of MAP, MSE, and MLP must be modified. In a practical application, a pattern classified into the ambiguous category can be further examined through routine biopsy.
Fig. 2. The first and the second principal components of NOR and BCC
3.1 Classifiers with an Ambiguous Category
The reduced Coulomb energy (RCE) network classifier inherently incorporates an ambiguous category, so we used RCE as a reference. In RCE training, each pattern in the training set has a parameter equivalent to a radius in the d-dimensional space, adjusted to be as large as possible without enclosing any points from a different category [6]. As new patterns are presented, each radius is decreased so that no sphere encloses a pattern of a different category; thus each sphere encloses only patterns with the same category label. During classification, a test pattern that lies in the overlapping region of different classes is considered ambiguous. Let λ_i be the radius of a training pattern y_i and L the set of labels of the training patterns in whose hyperspheres the test pattern x lies. Assume that the class label of y_i is l_i and that the following Euclidean distance is used.
D(x, y_i) = \sqrt{(x − y_i)^T (x − y_i)}    (3)

The RCE classification algorithm can be summarized as follows:

initialization: L = {}
for all y_i: if D(x, y_i) < λ_i then L = L ∪ {l_i}
decision: the corresponding label if all labels in L are the same; the ambiguous label otherwise.    (4)
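A compact sketch of the RCE decision rule (3)-(4); the radius training is omitted, the variable names are illustrative, and a test pattern falling in no sphere is treated as ambiguous here, which is one reasonable reading of the rule.

```python
import numpy as np

def rce_classify(x, prototypes, radii, labels):
    """RCE decision: collect the labels of all hyperspheres containing x."""
    d = np.sqrt(((prototypes - x) ** 2).sum(axis=1))   # Eq. (3)
    inside = set(labels[d < radii])                    # the set L of Eq. (4)
    if len(inside) == 1:
        return inside.pop()      # unanimous label
    return "ambiguous"           # empty or conflicting spheres
```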
Nonparametric methods like RCE suffer from the requirement that all of the patterns must be stored. Because a large number of samples are needed to obtain
good estimates, the memory requirements can be severe, and considerable computation time may be required. Thus a parametric method is preferable when the data are insufficient and hard to obtain, as in our case. In MAP classification, we select the class ω_i that maximizes the posterior probability density p(ω_i|x). Given equal prior probabilities, this is equivalent to selecting the class that maximizes the class-conditional probability density. Let ω_1, ω_2 be the BCC and NOR classes respectively, μ_i, Σ_i a mean vector and a covariance matrix, and n_i the number of training patterns in ω_i. Assuming the class-conditional probability density is multivariate Gaussian, the MAP classification rule can be described with a discriminant function g_i: decide ω_1 if g_1(x) ≥ g_2(x) and ω_2 otherwise, where

g_i(x) = −(1/2) x^T Σ_i^{-1} x + μ_i^T Σ_i^{-1} x + r_i,
r_i = −(1/2) μ_i^T Σ_i^{-1} μ_i − (1/2) ln|Σ_i|,
μ_i = (1/n_i) \sum_{k=1}^{n_i} x_k,    Σ_i = (1/n_i) \sum_{k=1}^{n_i} (x_k − μ_i)(x_k − μ_i)^T    (5)
To introduce an ambiguous class, we take a simple approach. Let ω_3 denote the ambiguous class and θ a threshold. The modified classification rule with an ambiguous class is:

decide ω_1 if g_1(x) ≥ g_2(x) + θ;  ω_2 if g_2(x) ≥ g_1(x) + θ;  ω_3 otherwise    (6)

The problem of minimizing the sum of squared errors in MSE training is a classical one. Define an augmented training vector y_a = [1, y^T]^T, where y is a training vector. The problem can be solved with the pseudoinverse of the matrix Y whose rows are the y_a. Once we have the solution vector w_a, normal to the decision hyperplane, an input pattern x can be classified according to the sign of the inner product between the augmented vector x_a and w_a. This decision rule must be modified to introduce an ambiguous class: instead of the sign of the inner product, we use the signed distance from the test pattern to the decision hyperplane. Let σ be a small positive number, 1 the all-ones target vector, and d_x the distance from x to the decision hyperplane. Assuming that the signs of the augmented training vectors of the NOR class are flipped beforehand for normalization, the modified MSE algorithm can be summarized as

w_a = σ(Y^T Y)^{-1} Y^T 1 = σ Y^+ 1,    d_x = w_a^T x_a / ||w||

decide ω_1 if d_x > +θ;  ω_2 if d_x < −θ;  ω_3 otherwise    (7)
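A sketch of the modified decision rules (6)-(7); g1 and g2 are assumed to be the Gaussian discriminants of Eq. (5), and the variable names are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def map_decide(g1, g2, theta):
    """Rule (6): ambiguous unless one discriminant wins by margin theta."""
    if g1 >= g2 + theta:
        return "BCC"        # omega_1
    if g2 >= g1 + theta:
        return "NOR"        # omega_2
    return "ambiguous"      # omega_3

def mse_decide(w_a, x, theta):
    """Rule (7): signed distance from the MSE hyperplane with a dead zone."""
    x_a = np.concatenate(([1.0], x))           # augmented pattern
    d_x = w_a @ x_a / np.linalg.norm(w_a[1:])  # distance to the hyperplane
    if d_x > theta:
        return "BCC"
    if d_x < -theta:
        return "NOR"
    return "ambiguous"
```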
MLP is the most powerful and flexible classifier since it can adapt to arbitrarily complex posterior probability functions [7]. Each layer of MLP has several
processing units, called nodes or neurons, which are generally nonlinear, except for the input nodes, which are simple bypass buffers. The unit operation is characterized by the equation o_k = f(net_k). The input to the unit, net_k, and the bipolar sigmoid function f(·) are given by the following equations [8]:

net_k = \sum_i w_{ik} o_i + bias_k,    f(net_k) = 2 / (1 + exp(−2·net_k)) − 1    (8)
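A one-liner for the bipolar sigmoid of Eq. (8), included only to make the unit operation concrete; the argument names are assumptions.

```python
import numpy as np

def unit_output(o_prev, w_k, bias_k):
    """o_k = f(net_k) with the bipolar sigmoid of Eq. (8)."""
    net_k = np.dot(w_k, o_prev) + bias_k
    return 2.0 / (1.0 + np.exp(-2.0 * net_k)) - 1.0
```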
Fig. 3. Structure of the feed forward neural networks
Since there are two distinct classes, only one output unit was used in the experiments. The number of hidden units was set to 5 according to preliminary experiments. Fig. 3 shows the structure of the network. The MLP models were trained with the back-propagation algorithm to output −1 for the NOR class and +1 for the BCC class. Since the performance of an MLP varies with the initial conditions, the experiments were carried out 20 times and the results averaged. If we let the output value of the MLP be z, the decision rule, also depicted in Fig. 3, is

ω_1 if z > +θ;  ω_2 if z < −θ;  ω_3 otherwise    (9)

To see the distribution of the feature vectors of the three different classes, we plotted them in Fig. 4, where the x and y axes correspond to the first and second principal components. The figure clearly shows that ambiguous data lie in the vicinity of the decision boundary. With the aid of this ambiguous category,
Fig. 4. Distribution of BCC, NOR, and ambiguous patterns in case of MAP classification
input feature vectors in BCC and NOR are well separated. Hence we can expect the classifiers to show enhanced performance.

3.2 Experimental Results
The classification results without the ambiguous category are summarized in Table 1. In the table, we can see that the average sensitivity is about 96.9% while the average specificity is about 95.4%. Even with two principal components, the classification results are good, which convinces us that the confocal Raman spectra give discriminating features.

Table 1. Classification results without ambiguous category (%). Stars indicate the decision of an expert pathologist.

        MAP          MSE          MLP
        BCC   NOR    BCC   NOR    BCC   NOR
BCC*    96.1  3.9    96.1  3.9    98.7  1.3
NOR*    3.6   96.4   3.6   96.4   6.4   93.6
The experimental results with an ambiguous class are summarized in Table 2. The thresholds of the four methods are 0.75, 6, 3.5, and 0.89, respectively, adjusted so that the number of ambiguously classified patterns does not exceed
10% of the test patterns. According to the results, the average sensitivity is about 99.0% and the average specificity about 99.5% when RCE is excluded. Though RCE is the only classifier with an inherent ambiguous class, it gave poor classification results in terms of specificity, because RCE is the only nonparametric method and not enough data are available.

Table 2. Classification results with ambiguous category (%). Stars indicate the decision of an expert pathologist.

        RCE          MAP          MSE          MLP
        BCC   NOR    BCC   NOR    BCC   NOR    BCC   NOR
BCC*    100   0      97.1  2.9    100   0      100   0
NOR*    3.3   96.7   0.8   99.2   0     100    0.8   99.2
Among the four classifiers, MSE shows the best results, i.e., perfect classification for BCC and NOR, with about 8% of the data declared ambiguous. This can be inferred from Fig. 2: the patterns can be made linearly separable with some margin by screening out some patterns in the vicinity of the decision boundary, which is consistent with the MSE results. The other methods can also be made perfect by allowing more ambiguous data. Table 3 shows the classification error rates of MAP as the number of ambiguous data is increased. According to the results, the test patterns are perfectly classified when the ambiguously classified data exceed 16.2% of the test features. Compared with MSE, more ambiguous data are needed for perfect classification due to the nonlinear decision boundary of MAP. The same tendency was observed for MLP.

Table 3. Classification results of MAP as the number of ambiguous data increases (%). Stars indicate the decision of an expert pathologist.

        Percentage of ambiguous data
        9.2%         10.5%        12.7%        16.2%
        BCC   NOR    BCC   NOR    BCC   NOR    BCC   NOR
BCC*    97.1  2.9    98.5  1.5    100   0      100   0
NOR*    0.8   99.2   0.8   99.2   0.8   99.2   0     100
4 Conclusion
In this paper, we investigated enhancing the performance of the classifiers from our previous work by introducing an ambiguous category. Three types of classifiers, MAP, MSE, and MLP, were modified to accommodate an ambiguous category and compared with RCE. Our experiments show that classification error rates can be reduced by increasing the number of ambiguous data. According to
the experimental results, perfect classification is possible with a reasonable number of ambiguous data, i.e., about 8% of the test patterns with the MSE classifier. With these promising results, we are currently inspecting which peaks or regions are the most discriminating for BCC detection.
Acknowledgement
This work was supported by grant No. RTI-04-03-03 from the Regional Technology Innovation Program of the Ministry of Commerce, Industry and Energy (MOCIE) of Korea.
References
1. Nijssen, A., Schut, T.C.B., Heule, F., Caspers, P.J., Hayes, D.P., Neumann, M.H., Puppels, G.J.: Discriminating Basal Cell Carcinoma from its Surrounding Tissue by Raman Spectroscopy. Journal of Investigative Dermatology 119 (2002) 64-69
2. Choi, J., Choo, J., Chung, H., Gweon, D.-G., Park, J., Kim, H.J., Park, S., Oh, C.H.: Direct Observation of Spectral Differences between Normal and Basal Cell Carcinoma (BCC) Tissues using Confocal Raman Microscopy. Biopolymers 77 (2005) 264-272
3. Sigurdsson, S., Philipsen, P.A., Hansen, L.K., Larsen, J., Gniadecka, M., Wulf, H.C.: Detection of Skin Cancer by Classification of Raman Spectra. IEEE Trans. on Biomedical Engineering 51 (2004) 1784-1793
4. Nunes, L.O., Martin, A.A., Silveira Jr., L., Zampieri, M., Munin, E.: Biochemical Changes between Normal and BCC Tissue: a FT-Raman Study. Proceedings of the SPIE 4955 (2003) 546-553
5. Baek, S.-J., Park, A., Kim, J.-Y., Na, S.Y., Won, Y., Choo, J.: Detection of Basal Cell Carcinoma by Automatic Classification of Confocal Raman Spectra. (to appear in ICIC 2006 Proceedings) (2006)
6. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. John Wiley & Sons, Inc. (2001)
7. Gniadecka, M., Wulf, H., Mortensen, N., Nielsen, O., Christensen, D.: Diagnosis of Basal Cell Carcinoma by Raman Spectra. Journal of Raman Spectroscopy 28 (1997) 125-129
8. Kecman, V.: Learning and Soft Computing. The MIT Press (2001)
Segmentation of Mixed Chinese/English Documents Based on Chinese Radicals Recognition and Complexity Analysis in Local Segment Pattern

Yong Xia, Bai-Hua Xiao, Chun-Heng Wang, and Yao-Dong Li

Institute of Automation, Chinese Academy of Sciences, Beijing, 100080, China
{yong.xia, baihua.xiao, chunheng.wang, yaodong.li}@ia.ac.cn
Abstract. Segmentation based on character recognition is one of the most popular methods for segmenting mixed Chinese/English documents. However, the rejection of outliers has always been the bottleneck of this method. A new method to alleviate the problem is provided in this paper: we assign a language attribute to each segment whenever possible and then merge or split segments according to that attribute. First, we construct a mixed OCR engine for Chinese radicals, English characters, and some English character-pairs. Furthermore, English negative samples are trained to improve the capability of rejecting outliers. Finally, language determination of segments is conducted based on the mixed OCR engine and complexity analysis of local patterns. Encouraging performance has been obtained in the test results.
1 Introduction
The segmentation and recognition of multi-language document images attract many researchers' interest [1]-[3], and mixed documents including Chinese stand out [4]-[8]. The reason is that mixed Chinese/English documents are very common and are also difficult to segment because of the strong similarity between components of Chinese characters and English characters. Besides, the number of Chinese character classes is very large, so special methods must be considered for segmenting mixed Chinese/English documents. Language determination is necessary to segment mixed documents. There are primarily two approaches: adopting global pattern features or adopting local pattern features. A global pattern is a large text block including many segments [2][3][7][8], while a local pattern is a small text block including only a few segments [4]-[6][9][10]. Obviously, higher accuracy of language determination can be achieved with global pattern features than with local ones, but for a common mixed document, the variety of mixing types invalidates global pattern features, so language determination based on local pattern features is more generally applicable. Traditionally, when recognizing a mixed document, segmentation comes before recognition and language determination before segmentation. However, this creates an open-loop system, and excellent results cannot be achieved. At present, integrating language determination, segmentation, and recognition is an effective
way to recognize mixed multi-language documents, as most researchers agree, and the method of integration is the key to solving the problem. In particular, problems such as how to evaluate recognition reliability and how to reject outliers are the bottleneck of exploiting recognition feedback. In the next section, we discuss these problems and give our solution to the segmentation of mixed Chinese/English documents, for which inspiring performance is achieved. The rest of this paper is organized as follows. Section 2 discusses some methods for rejecting outliers and analyzes the primary sources of segmentation error. Section 3 demonstrates the pre-segmentation method. The most critical parts of this paper are Sections 4, 5 and 6: mixed OCR for Chinese radicals, English characters and English character-pairs is described in Section 4, complexity analysis of patterns is illustrated in Section 5, and language determination in local patterns in Section 6. The experimental results are then given in Section 7. Finally, Section 8 makes a summary.
2 Rejection of Outliers and Segmentation Error Analysis
It is rational to segment a document image based on recognition, which somewhat resembles the way humans recognize characters. The intention of adopting recognition feedback is to correct errors from segmentation: when segmentation errors occur, outliers also emerge. How can they be rejected? Is recognition confidence enough, and how can it be obtained? These problems are very important but very difficult, and some researchers have tried to solve them. In [11], two rules are provided. Presume that the distances of the two top-ranked candidates are d1 and d2 respectively, with d1 ≤ d2. The first rule is that rejection occurs when d1 > D1, where D1 is a threshold obtained from training. The second rule is that the input pattern is rejected if d2 − d1 ≤ δ, where δ is a threshold obtained from training. In [11], Liu et al. point out that it is still difficult to reject all outliers reliably whatever the threshold is. Therefore, training negative samples is widely adopted to enhance the capability of rejecting outliers [11][12][13]. In [12], 1000 outlier models are constructed to reject garbage patterns; furthermore, the valid sample space is enlarged by adding 44 character-pair classes that are prone to causing segmentation or recognition errors. These methods can improve the performance of OCR for English or numeric text, but special problems arise in OCR for mixed Chinese/English documents: the number of Chinese character classes is large, and training negative samples is hardly applicable, because a very limited set of dummy negative classes cannot cover the whole negative sample space. There are primarily two kinds of segmentation errors: a Chinese character segmented into several parts, and several English characters merged into one unit. The main reason is that components of a Chinese character are confusable with some English characters; in addition, touching English characters cause much trouble for Chinese/English language determination. Based on these sources of error and some features of Chinese characters'
construction, we give a new general way to segment mixed Chinese/English documents. Chinese radicals and some confusable English character-pairs are added to the valid sample classes, negative English samples are also trained as in [12], and local pattern features and recognition comparison are considered simultaneously to resolve the ambiguity.
3 Pre-segmentation
We presume that the input document image has been segmented into text lines. Connected component analysis is then conducted on a text line, and some conservative merging of the connected components is done based on their mutual positions. The set of connected components is defined as C, and we define a factor reflecting the possibility of merging two segments as follows:
d = ( w(x_i) + w(x_j) − w(x_i ∪ x_j) ) / min{ w(x_i), w(x_j) }    (1)

where x_i, x_j ∈ C are neighbors, w(x) is the width of segment x, and x_i ∪ x_j is the segment obtained by merging x_i and x_j. If the following relation holds, the two segments are merged:
d(x_i, x_j) ≥ T    (2)
where T is a threshold, equal to 0.5 in this paper. We call the merged segments basic segments and define the set containing them as BC. We then merge those basic segments again with the threshold set to zero; the newly created set is defined as CC. In general, the number of Chinese characters is much larger than that of English characters in mixed Chinese/English documents, so we can segment the text line into many small blocks, each probably made up of only one Chinese character or several English characters; the characters in each block then have the same language attribute. The average width and height of Chinese characters in a line are estimated from the set CC. First, a new set RC is created as follows:
RC = { x | x ∈ CC, h(x) > 0.9·H_LR, 0.7 ≤ w(x)/h(x) ≤ 1.3 }    (3)

where w(x) is the width of the segment, h(x) its height, and H_LR the height of the text line. Second, presuming the number of segments in RC is p, we can estimate the average width and height of a Chinese character as follows:
W_A = (1/p) \sum_{i=1}^{p} w(x_i),    H_A = (1/p) \sum_{i=1}^{p} h(x_i)    (4)

where x_i ∈ RC, W_A is the average width and H_A the average height. Furthermore, the average width-to-height ratio is defined as

R_A = W_A / H_A    (5)
Based on some tolerance for error, we can merge the basic segments in BC into small blocks; the set of small blocks is defined as VC. The merging procedure is shown in Fig. 1. To show the procedure more clearly, the basic segments are separated before merging.
Fig. 1. Pre-mergence of segments in a text line: (a) the original text line; (b) the procedure of pre-mergence of segments
In addition, some special symbols such as '[', '(', '{' and some punctuation marks such as commas and quotation marks cause trouble for pre-mergence. But the features of these special symbols and punctuation marks are very distinctive when they stand next to a Chinese character, and we can detect them with high precision before pre-mergence. A sketch of the merging test is given below.
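A small sketch of the pre-mergence test of Eqs. (1)-(2), with segments represented as (left, right) column extents; the representation and function names are illustrative assumptions.

```python
def merge_factor(seg_a, seg_b):
    """Overlap factor d of Eq. (1) for two segments given as (left, right)."""
    la, ra = seg_a
    lb, rb = seg_b
    w_a, w_b = ra - la, rb - lb
    w_union = max(ra, rb) - min(la, lb)       # width of the merged segment
    return (w_a + w_b - w_union) / min(w_a, w_b)

def pre_merge(segments, t=0.5):
    """Greedily merge neighboring segments while d >= T (Eq. 2)."""
    segs = sorted(segments)
    i = 0
    while i < len(segs) - 1:
        if merge_factor(segs[i], segs[i + 1]) >= t:
            merged = (min(segs[i][0], segs[i + 1][0]),
                      max(segs[i][1], segs[i + 1][1]))
            segs[i:i + 2] = [merged]
            i = max(i - 1, 0)                 # re-check the previous neighbor
        else:
            i += 1
    return segs
```

Disjoint neighbors give a negative d (no overlap), so only horizontally overlapping components, such as the parts of a split Chinese character, are merged.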
4 Mixed Recognition of Chinese Radicals and English Characters
In general, a basic segment may be a Chinese character, an English character, a Chinese radical, touching English characters, a part of a Chinese character other than a radical, touching Chinese characters, or broken characters. After the pre-mergence of the previous section, each segment is probably composed of a Chinese character, an English character, or an English string, i.e., of characters of a single language. If the segment is Chinese, we recognize it directly; if it is English, we split the segment into basic segments and then recognize them. Therefore, the most important thing is the language determination of each segment. A recognition engine for Chinese radicals, English characters and character-pairs is constructed, and negative English character samples are also trained in order to
enhance the capability of rejecting outliers. Chinese radicals have about 200 classes, and only the 96 of them that are prone to causing segmentation errors are extracted; they are shown in Fig. 2. These radicals are located at the left or right of a Chinese character and are prone to being classified as English characters. The English character samples comprise digits, English letters, punctuation marks and some symbols. Some special punctuation marks and symbols, such as the comma, quotation mark and dash, are not included in these sample classes because they are very distinctive in pattern or typographic features; a simple classifier is used for them. The English character-pair samples have 45 classes: 44 classes from [12], as shown in Fig. 3, plus a class for 'ffi'. The negative samples come from a collection of segmentation errors, similar to [12]. Allowing for the characteristics of mixed Chinese/English documents, we also add some artificial negative samples composed of invalid character-pairs.
Fig. 2. Chinese radical classes
Fig. 3. English character-pair classes
Two kinds of classifiers are considered: a nearest neighbor classifier based on Euclidean distance, similar to [12], and one based on an ANN. For the former, a multi-model technique is adopted; for the latter, BP is used to train the samples. As demonstrated in [11][13][15][16], the rejection of garbage patterns is worse with an ANN than with a nearest-distance classifier, so we select the nearest neighbor classifier.
5 Complexity Analysis in Local Segment Pattern
In general, Chinese characters are more complicated than English characters in their pattern features, but this complexity is difficult to describe. We attempt it based on the black and white runs of a pattern, where a run is a string of continuous pixels of identical gray value in a line or column; in this paper the gray value is only black or white. These runs are obtained by scanning the pattern horizontally or vertically. Some set definitions follow: A_FH is the set of horizontal black runs, A_FV the vertical black runs, A_BH the horizontal white runs and A_BV the vertical white runs, and A is the union of the four sets.
We define the complexity function of pattern as:
f(x) = max{ f_1(x_1), f_2(x_2), ..., f_s(x_s) }    (6)
f i ( xi ),1 ≤ i ≤ s is sub-function of f ( x) and s is the number of sub-function and x, xi ∈ A . In this paper s is equal to 4. Next we will explain these sub-functions
where
in detail. The sub-function 1 is based on the normalized number of vertical black runs in each column, which is defined as:
g_FV(x) = \sum_{i=1}^{col\_num} α_i · g_FVi(x),    x ∈ A_FV    (7)
where
g_FVi(x) is the number of black runs in the i-th column and α_i is the coefficient of each column, holding the relation

\sum_{i=1}^{col\_num} α_i = 1,    α_i ≥ 0    (8)
Because some English characters in serif fonts cause interference, we allocate different coefficients to different columns to decrease the negative effect: the left and right sides of the pattern are given lower coefficients than the middle. Sub-function 1 is defined as follows.
f_1(x) = 0 if g_FV(x) < T_1;  1 if T_1 ≤ g_FV(x) < T_2;  2 otherwise    (9)

where T_1 and T_2 are thresholds.
The sub-function 2 is based on the normalized number of horizontal black runs in each row, which is defined as:
g_FH(x) = \sum_{i=1}^{row\_num} α_i · g_FHi(x),    x ∈ A_FH    (10)
where
g_FHi(x) is the number of black runs in the i-th row and α_i is the coefficient of each row, holding the relation

\sum_{i=1}^{row\_num} α_i = 1,    α_i ≥ 0    (11)
The values of α_i at the top and bottom of the pattern should be smaller than the others. So we give sub-function 2 as follows:
f_2(x) = 0 if g_FH(x) < T_3;  1 otherwise    (12)

where T_3 is a threshold.
The sub-function 3 is based on the number of strokes and forks. It is well known that stroke extraction is very difficult, so in this paper we consider only horizontal and vertical strokes. First we evaluate the width of a vertical stroke, W_V, and the width of a horizontal stroke, W_H. Then the numbers of vertical and horizontal strokes are defined as N_V and N_H respectively. If a horizontal stroke and a vertical stroke share a common black pixel, there exists a fork; the number of forks is defined as N_F. So we give sub-function 3 as follows.
f_3(x) = 0 if N_V < T_4, N_H < T_5 and N_F < T_6;  1 otherwise    (13)

where T_4, T_5 and T_6 are thresholds.
The sub-function 4 is based on the number of connected components. Considering the set A_FV or A_FH, connected black runs are merged into one component, and the number of components is counted as N_F. For an English character there is only one connected component, except for a few special characters such as 'i' and 'j'; the condition F1 is defined as holding the features of those special characters. For a touching English string there is also one connected component if it does not include those special characters; the condition F2 is defined as holding the features of touching characters including those special characters. So we give sub-function 4 as follows.
f_4(x) = 0 if N_F < 2;  0 if N_F = 2 and F1;  1 if N_F ≥ 2 and not F1 and F2;  2 otherwise    (14)
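Putting the run-based sub-functions together, here is an illustrative sketch of the complexity test on a binary pattern (True = black); the uniform weights, thresholds, and the omission of the stroke/fork and connected-component tests are simplifying assumptions over Eqs. (7)-(14).

```python
import numpy as np

def vertical_run_counts(binary):
    """Black-run count per column: a run starts where 0 -> 1 going down."""
    padded = np.vstack([np.zeros((1, binary.shape[1]), dtype=bool), binary])
    return (np.diff(padded.astype(np.int8), axis=0) == 1).sum(axis=0)

def horizontal_run_counts(binary):
    """Black-run count per row: a run starts where 0 -> 1 going right."""
    padded = np.hstack([np.zeros((binary.shape[0], 1), dtype=bool), binary])
    return (np.diff(padded.astype(np.int8), axis=1) == 1).sum(axis=1)

def complexity(binary, t1=1.5, t2=2.5, t3=2.0):
    """Simplified f(x) = max(f1, f2) over weighted run counts (Eqs. 7-12)."""
    cols = vertical_run_counts(binary).astype(float)
    rows = horizontal_run_counts(binary).astype(float)
    w_c = np.ones(cols.size) / cols.size      # uniform alpha_i for brevity
    w_r = np.ones(rows.size) / rows.size
    g_fv = float(w_c @ cols)                  # Eq. (7) with uniform weights
    g_fh = float(w_r @ rows)                  # Eq. (10) with uniform weights
    f1 = 0 if g_fv < t1 else (1 if g_fv < t2 else 2)   # Eq. (9)
    f2 = 0 if g_fh < t3 else 1                          # Eq. (12)
    return max(f1, f2)
```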
6 Language Determination of Chinese/English in Local Segment Pattern
Language determination of the segments in the set VC is performed. We presume a segment v ∈ VC whose basic segments are x_1, x_2, ..., x_t ∈ BC, where t is the number of basic segments in v. We demonstrate the language determination for two situations, t = 1 and t > 1.
When $t$ is equal to 1, $v$ can be a Chinese character, an English character, or touching English characters. A mixed OCR engine, called Engine-1, for Chinese characters, English characters, and some English character-pairs is applied to get the final recognition results. When $t$ is larger than 1, the situation becomes more complicated: a segment $x_i$ can be a Chinese radical, the remainder of a Chinese character, an English character, or touching English characters. The complexity functions of the $t$ segments are $f(x_1), f(x_2), \ldots, f(x_t)$.
If the relation $f(x_i) = 2, 1 \le i \le t$ holds, $x_i$ is a Chinese radical or a non-radical part of a Chinese character. If $f(x_i) = 1$, $x_i$ may be a Chinese radical, a non-radical part of a Chinese character, or touching English characters. If $f(x_i) = 0$, $x_i$ can be any class. Based on the complexity function, if the language of $x_i$ is Chinese, $v$ is recognized by Engine-1. Otherwise, each $x_i$ is recognized by a mixed OCR engine called Engine-2 for Chinese radicals, English characters, and English character-pairs. If a segment $x_i$ is recognized as a Chinese radical, $v$ is recognized by Engine-1, and verification based on the match between the Chinese radical and the Chinese character is necessary. If a segment $x_i$ is recognized as an English character or character-pair, verification is also needed, based on the typographic features of English characters and the complexity function.
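The dispatch logic above can be summarized in a short sketch. Everything here is a hypothetical skeleton: `engine1`, `engine2`, the complexity scorer, and the two verification helpers stand in for the components described in the text.

```python
def recognize_segment(v, basic_segments, complexity, engine1, engine2,
                      verify_radical_match, verify_typography):
    if len(basic_segments) == 1:
        return engine1(v)                    # Chinese char, English char, or pair
    scores = [complexity(x) for x in basic_segments]
    if all(s == 2 for s in scores):          # every part looks Chinese
        return engine1(v)
    results = [engine2(x) for x in basic_segments]
    if any(r.kind == 'radical' for r in results):
        out = engine1(v)                     # re-recognize the whole segment
        return out if verify_radical_match(out, results) else results
    return results if verify_typography(results) else engine1(v)
```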
7 Experimental Results
The OCR engines are very important, and three primary engines are used in this paper. The first is the mixed OCR engine for Chinese characters, English characters, and some English character-pairs mentioned in the previous section. The extracted feature has 440 dimensions and comprises several different types of features, such as directional line elements and mesh features. LDA is adopted to enhance the discriminability of the features and to reduce their dimension. A nearest-neighbor classifier using Euclidean distance is used, and LVQ-based training is adopted to further improve classification accuracy. With these measures, a classification accuracy of 99.13% is obtained. The second engine is critical: it distinguishes Chinese radicals, English characters, and English character-pairs, as demonstrated in detail in the sections above. Its classifier design is similar to that of the first engine, and its classification accuracy is 99.08%. The number of models for each class is 8, and 500 for the negative dummy class. The number of training samples per class is about 1000, with about 400 testing samples per class. The third engine detects some special symbols and punctuation; these characters are few, and some structural and typographic features are adopted. Both the first and second engines are mixed Chinese/English OCR engines, and experimental results for them are shown in Table 1. The method proposed in this paper is evaluated on 180 document images taken from various sources such as magazines, journals, and books. The total number of characters is 64230, of which 46625 are Chinese and 17605 are English. We test these document samples on a PC with a Pentium IV 3.2 GHz CPU, and the results are shown in Table 1.
Table 1. The result of test

Types      CRL(%)   CRS(%)   CRR(%)
Engine-1   99.88    —        99.13
Engine-2   99.43    —        99.08
System     99.32    99.15    99.11
In Table 1, the results of Engine-1 and Engine-2 are measured on character samples, and those of the system on document samples. The correct ratio of language determination is abbreviated CRL, that of segmentation CRS, and that of recognition CRR.
8 Summary
In this paper, we discuss methods of integrated segmentation and recognition and point out several existing problems. After analyzing those problems and the sources of errors, we give a new method that adopts complexity analysis of patterns and a mixed OCR engine for Chinese radicals, English characters, and English character-pairs. This method improves the capability of rejecting garbage patterns, and high segmentation and recognition accuracy is obtained. Semantic analysis of neighboring segments is not considered in this paper; it should be very useful, and we will explore it in the future to further improve the segmentation and recognition of mixed Chinese/English documents.
References
1. Wen, D., Ding, X.Q.: A General Framework for Multi-Character Segmentation and Its Application in Recognizing Multilingual Asian Documents. Proc. SPIE Conference on Document Recognition and Retrieval XI, SPIE, Vol. 5296 (2004) 147-154
2. Spitz, A.L.: Determination of the Script and Language Content of Document Images. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19 (1997) 235-245
3. Tan, C.L., Leong, P.Y., He, S.: Language Identification in Multilingual Documents. Proc. ISIMADE-1999, Baden-Baden, Germany (1999) 59-64
4. Guo, H., Ding, X.Q., Zhang, Z., Guo, F.X., Wu, Y.S.: Realization of a High-Performance Bilingual Chinese-English OCR System. Proc. ICDAR-1995, Vol. 2, Montreal, Canada (1995) 978-981
5. Kuo, H.H., Wang, J.F.: A New Method for the Segmentation of Mixed Handprinted Chinese/English Characters. Proc. ICDAR-1993, Tsukuba Science City, Japan (1993) 810-813
6. Feng, Z.D., Huo, Q.: Confidence Guided Progressive Search and Fast Match Techniques for High Performance Chinese/English OCR. Proc. ICPR-2002, Vol. 3, Quebec, Canada (2002) 89-92
7. Wang, K., Jin, J.M., Pan, W.M., Shi, G.S., Wang, Q.R.: Mixed Chinese/English Document Auto-Processing Based on the Periodicity. Proc. ICMLC-2004, Vol. 6, Shanghai, China (2004) 3616-3619
8. Wang, K., Wang, Q.R.: Research on Chinese/English Mixed Document Recognition. Journal of Software, Vol. 16 (2005) 786-798 (in Chinese with English abstract)
9. Hwang, Y.S., Moon, K.A., Chi, S.Y., Jang, D.G., Oh, W.G.: Segmentation of a Text Printed in Korean and English Using Structure Information and Character Recognizers. Proc. ICSMC-2000, Vol. 3, Nashville, USA (2000) 1586-1591
10. Kim, J.H., Kim, K.K., Chien, S.I., Choi, H.M.: Segmentation of Touching Characters in Printed Korean/English Document Recognition. Proc. ICSMC-1996, Vol. 1, Beijing, China (1996) 438-443
11. Liu, C.L., Sako, H., Fujisawa, H.: Performance Evaluation of Pattern Classifiers for Handwritten Character Recognition. International Journal on Document Analysis and Recognition, Vol. 4 (2002) 191-204
12. Huo, Q., Feng, Z.D.: Improving Chinese/English OCR Performance by Using MCE-based Character-Pair Modeling and Negative Training. Proc. ICDAR-2003, Vol. 1 (2003) 364-368
13. Kim, H.Y., Lim, K.T., Nam, Y.S.: Handwritten Numeral String Recognition Using Neural Network Classifier Trained with Negative Data. Proc. IWFHR-2002 (2002) 395-400
14. Liu, C.L., Marukawa, K.: Handwritten Numeral String Recognition: Character-Level vs String-Level Classifier Training. Proc. ICPR-2004, Vol. 1 (2004) 405-408
15. Gori, M., Scarselli, F.: Are Multilayer Perceptrons Adequate for Pattern Recognition and Verification? IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20 (1998) 1121-1132
16. Cordella, L.P., Stefano, C.D., Tortorella, F., Vento, M.: A Method for Improving Classification Reliability of Multilayer Perceptrons. IEEE Transactions on Neural Networks, Vol. 6 (1995) 1140-1147
Sigmoid Function Activated Blocking Artifacts Reduction Algorithm Zhi-Heng Zhou and Sheng-Li Xie College of Electronic & Information Engineering, South China University of Technology 510641, Guangzhou, China [email protected], [email protected]
Abstract. The block-based DCT compression methods usually result in discontinuities called blocking artifacts at the boundaries of blocks due to the coarse quantization of the coefficients. In this paper, a new blocking artifacts reduction algorithm is proposed to reduce blocking artifacts and preserve edges adequately. We introduce the definition of homogeneous block and the measurement with an alterable threshold for judging whether a shifted block is homogeneous or inhomogeneous. For each row of the homogeneous blocks, an adaptive Sigmoid function is used to replace the step function to reduce blocking artifacts. For the inhomogeneous blocks, Sigma filter is used to smooth two adjacent blocks’ boundary. Simulation results evaluated by both visual and numerical quality metrics show that the proposed blocking artifacts reduction algorithm outperforms the original algorithms.
1 Introduction
The block-based discrete cosine transform (BDCT) scheme is a foundation of many image and video compression standards, including JPEG, H.263, MPEG-1, MPEG-2, MPEG-4 and so on. The BDCT scheme takes advantage of the local spatial correlation property of the image. It usually divides the image into 8 × 8 blocks, transforms each block from image pixels to DCT coefficients, and then quantizes the DCT coefficients. The blocks are coded separately, and the correlation among spatially adjacent blocks is not taken into account in encoding. As a result, block boundaries can become visible when the decoded image is reconstructed: two adjacent blocks with a smooth change of luminance across the border can produce a step shape in the decoded image if they fall into different quantization intervals. This kind of degradation is the so-called blocking artifact. Zeng (1997) [1] first proposed that the shifted block between two adjacent blocks suffering from blocking artifacts could be modeled by a 2D step function, and presented his zero-masking algorithm. This idea significantly influenced post-processing techniques based on the DCT domain. Since then, many improvements have been proposed because of the unsatisfactory results of the zero-masking algorithm. Liu (2002) [2] proposed a linear function to replace the step function. This idea is straightforward.
But it costs a lot of computation on detecting blocking artifacts and edges, and does not obtain good results under visual evaluation. Luo (2003) [3] combined blocking artifacts detection with edge detection, and used weighted average values to modify the transformed coefficients of the shifted block. This method costs less computation and obtains better results under visual evaluation, but it does not justify the values of the thresholds and weighting coefficients used in detection and coefficient modification. Many creative works [8-11] in this area have been presented recently. In this paper, a new efficient reduction algorithm for blocking artifacts is proposed. We first introduce the definition of homogeneous blocks and classify the shifted blocks into two kinds: homogeneous and inhomogeneous. For the homogeneous blocks, an adaptive Sigmoid function is used to replace the step function in a row-wise manner. For the inhomogeneous blocks, a Sigma filter is used to smooth the boundary of the two adjacent blocks. We intend to reduce the blocking artifacts while avoiding blurring the edges. Because of symmetry, we only describe the proposed algorithm for horizontal blocking artifacts.
2 Blocks Classification In order to avoid mistaking the edges as blocking artifacts, we first introduce the definition of homogeneous blocks, and then separate homogeneous blocks from inhomogeneous ones.
Considering two adjacent 8 × 8 blocks $b_1$ and $b_2$, we form a shifted block $b$ composed of the right half of $b_1$ and the left half of $b_2$. Let $b(x, y)$, $b_{xx}(x, y)$ and $b_{yy}(x, y)$ be the pixel value and the spatial second-order gradients at coordinate $(x, y)$ in block $b$, respectively. We also denote the average second-order gradients over the complete block by $\bar{b}_{xx}$ and $\bar{b}_{yy}$.
Thus, we define the shifted block $b$ as a homogeneous block if it satisfies the following condition:

$$\bar{b}_{xx} + \bar{b}_{yy} \le T \qquad (1)$$

where $T$ is a given threshold. Otherwise, we define it as an inhomogeneous block. We construct this condition using the idea of the Prewitt operator, which is good at edge detection. In fact, computing $\bar{b}_{xx}$ and $\bar{b}_{yy}$ directly costs a lot of computation. Enlightened by Coimbra's work [7], condition (1) is approximated by

$$|C_1| + |C_2| \le T_a \qquad (2)$$
where

$$C_1 = \frac{2}{8} \sum_{x=0}^{7} \sum_{y=0}^{7} b(x, y) \cos\frac{(2x+1)\pi}{16}, \qquad C_2 = \frac{2}{8} \sum_{x=0}^{7} \sum_{y=0}^{7} b(x, y) \cos\frac{(2y+1)\pi}{16} \qquad (3)$$

and $T_a$ is taken to be the edge threshold in edge detection.
However, a fixed threshold is not suitable for a whole image or for different images. In relatively smooth regions, if the threshold is too small, homogeneous blocks will be mistaken as inhomogeneous ones; in regions with plenty of texture, if the threshold is too large, inhomogeneous blocks will be mistaken as homogeneous ones. So, we design an alterable threshold
$$T_a = \left(1 - \frac{\sigma}{m}\right) \times T_0 \qquad (4)$$

where $m$ and $\sigma$ are respectively the mean and standard deviation of the two adjacent blocks $b_1$ and $b_2$, and $T_0$ is the initial threshold value. In the simulations of this paper, $T_0$ is set to 60. Considering the property of gray-scale images, we generally have $0 \le \sigma/m < 1$. Let us analyze (4). If the two adjacent blocks $b_1$ and $b_2$ belong to a relatively smooth region, $\sigma/m$ tends to 0, so the threshold $T_a$ becomes larger lest homogeneous blocks be mistaken as inhomogeneous ones. On the contrary, if $b_1$ and $b_2$ belong to a region with plenty of texture, $\sigma/m$ tends to 1, so the threshold $T_a$ becomes smaller lest inhomogeneous blocks be mistaken as homogeneous ones.
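Putting (2)-(4) together, a sketch of the block classification might look as follows. $T_0 = 60$ comes from the paper; the array layout and the guard against division by zero are our assumptions.

```python
import numpy as np

def is_homogeneous(b1, b2, T0=60.0):
    """b1, b2: adjacent 8x8 gray blocks; returns True if the shifted block is homogeneous."""
    b = np.hstack([b1[:, 4:], b2[:, :4]]).astype(float)   # shifted block
    ang = (2 * np.arange(8) + 1) * np.pi / 16
    c1 = (2.0 / 8.0) * np.sum(b * np.cos(ang)[:, None])   # cosine over the x (row) index
    c2 = (2.0 / 8.0) * np.sum(b * np.cos(ang)[None, :])   # cosine over the y (column) index
    both = np.hstack([b1, b2]).astype(float)
    m, sigma = both.mean(), both.std()
    Ta = (1.0 - sigma / max(m, 1e-9)) * T0                # alterable threshold, eq. (4)
    return abs(c1) + abs(c2) <= Ta
```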
3 Blocking Artifacts Reduction Algorithm
In the last section, homogeneous block detection was introduced to avoid edges being mistaken as blocking artifacts. So we must use different methods to reduce the blocking artifacts for homogeneous and inhomogeneous blocks, respectively.

3.1 Blocking Artifacts Reduction for Homogeneous Block
1) Model of Blocking Artifacts
The criterion for evaluating blocking artifacts reduction algorithms generally has two sides: the ability of reduction, and the ability of edge preserving. Many former algorithms blur the edges. We intend to propose a
new algorithm to not only reduce blocking artifacts efficiently, but also preserve edges adequately. For each row of the shifted block, we define a 1D step function s
$$s(j) = \begin{cases} -1/\sqrt{8} & j = 0, \ldots, 3 \\ 1/\sqrt{8} & j = 4, \ldots, 7 \end{cases} \qquad (5)$$
Given a BDCT coded image, we will model each row of the block as a constant row distorted by independent identically distributed white noise. Then the shifted block b can be modeled by
$$b(i, j) = \beta(i) \cdot s(j) + \mu(i) + r(i, j), \quad i, j = 0, \ldots, 7 \qquad (6)$$

where $\beta(i)$ is the amplitude of the 1D step function $s$ in the $i$th row of $b$, $\mu(i)$ is the mean of the $i$th row of $b$, and $r$ is the residual block. On the other hand, for two adjacent blocks $b_1$, $b_2$ and their shifted block $b$, we define two matrices $q_1$ and $q_2$
$$q_1 = \begin{bmatrix} 0 & 0_{4\times4} \\ I_{4\times4} & 0 \end{bmatrix}, \qquad q_2 = \begin{bmatrix} 0 & I_{4\times4} \\ 0_{4\times4} & 0 \end{bmatrix} \qquad (7)$$
where $I$ is an identity matrix and $0$ is a zero matrix. Suppose the $i$th rows of $b$, $b_1$ and $b_2$ are denoted by $b(i)$, $b_1(i)$ and $b_2(i)$, respectively. According to the definition of the shifted block, we have

$$b(i) = b_1(i)\, q_1 + b_2(i)\, q_2, \quad i = 0, \ldots, 7 \qquad (8)$$
Using the linear and distributive properties of the DCT, we easily obtain
$$B(i) = B_1(i)\, Q_1 + B_2(i)\, Q_2, \quad i = 0, \ldots, 7 \qquad (9)$$

where $B(i)$, $B_1(i)$, $B_2(i)$, $Q_1$ and $Q_2$ are the DCTs of $b(i)$, $b_1(i)$, $b_2(i)$, $q_1$ and $q_2$. Let the vector $v_i = [v_{i0}\ v_{i1}\ \cdots\ v_{i7}]$ be the DCT of the step function $s$ in the $i$th row of $b$. Then $v_{i0} = v_{i2} = v_{i4} = v_{i6} = 0$, and we have
$$\|v_i\|^2 = \sum_{j=0}^{7} v_{ij}^2 = \sum_{m=0}^{7} s^2(m) = 1 \qquad (10)$$

So, the parameter $\beta$ in (6) can be computed as follows:

$$\beta(i) = \sum_{j=0}^{7} v_{ij}\, B(i,j) = v_{i1} B(i,1) + v_{i3} B(i,3) + v_{i5} B(i,5) + v_{i7} B(i,7) \qquad (11)$$
In (11), it can be found that the larger the value of $|\beta(i)|$, the more serious the blocking effect is taken to be. Furthermore, from (5) and (6), we find that if $|\beta(i)| \le 1$, the step distance of the step function $s$ satisfies $2 \cdot |\beta(i)| \cdot 1/\sqrt{8} \le 0.7$. It means that there is no visible blocking effect in this case, so blocking artifacts reduction is not needed if $|\beta(i)| \le 1$.
2) Sigmoid Function Activated Reduction
We know that the step function causes the blocking effect, so we can replace it with a monotone smooth function to reduce the blocking artifacts. In [2], a linear function is proposed as the smooth function. The linear function smooths every pixel of the whole row, but the severity of the blocking artifacts varies across rows and shifted blocks, so the step function should not be replaced with a fixed smooth function. We intend to couple the severity of the blocking artifacts to the transition range of the smooth function: the more serious the blocking effect, the more pixels need to be smoothed, and the smooth function should tend to the linear function. On the contrary, if the blocking effect is not serious, we should only smooth the pixels in the middle of the shifted block, and the smooth function should tend to the original step function.
Fig. 1. Illustration of the adaptive smooth function $g_i(x)$
For each row of the shifted block, we design an adaptive Sigmoid function
$$g_i(x) = \frac{1}{1 + \exp[-x \log_{10}|\beta(i)|]} - \frac{1}{2}, \quad |\beta(i)| > 1, \quad i = 0, \ldots, 7 \qquad (12)$$

where $\log_{10}|\beta(i)|$ controls the shape variation of the Sigmoid function within a given range.
The adaptive Sigmoid function is illustrated in Fig. 1. If the blocking effect is serious, $|\beta(i)|$ becomes larger, $g_i(x)$ tends to the linear function, and more pixels are smoothed. If the blocking effect is not serious, $|\beta(i)|$ becomes smaller, $g_i(x)$ becomes steeper and tends to the step function, and only the pixels in the middle of the shifted block are smoothed. So, the designed function $g_i(x)$ is well fit for the demand of adaptive smoothing. Thus, eight pixel values on the smooth function can be obtained:
$$d(i) = [d(i,0), \ldots, d(i,7)] \qquad (13)$$

where $d(i,0) = -1/\sqrt{8}$, $d(i,7) = 1/\sqrt{8}$, and

$$d(i,j) = g_i\left(\left(\frac{2j}{7} - 1\right) \frac{\ln\left((\sqrt{2}+1)/(\sqrt{2}-1)\right)}{\log_{10}|\beta(i)|}\right), \quad i = 0, \ldots, 7,\ j = 1, \ldots, 6 \qquad (14)$$
Let $\hat{b}$ be the new block after replacing the step function with the adaptive smooth function. Then

$$\hat{b}(i,j) = b(i,j) + \beta(i) \cdot [d(i,j) - s(j)], \quad i, j = 0, \ldots, 7 \qquad (15)$$
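A row-wise sketch of the whole reduction (11)-(15) is given below. It relies on two readings of ours: that the DCT used is orthonormal, so $\beta(i)$ can be computed in the pixel domain via Parseval's relation, and that the argument of $g_i$ in (14) is scaled so that the endpoints equal $\pm 1/\sqrt{8}$. It is a sketch, not the authors' code.

```python
import numpy as np

S = np.array([-1, -1, -1, -1, 1, 1, 1, 1]) / np.sqrt(8.0)   # step function, eq. (5)
Z_END = np.log((np.sqrt(2) + 1) / (np.sqrt(2) - 1))         # endpoint argument in (14)

def smooth_row_values(beta):
    """Eight samples d(i,0..7) of the adaptive Sigmoid function (12),(14)."""
    k = np.log10(abs(beta))                  # shape parameter of g_i
    j = np.arange(8)
    x = (2.0 * j / 7.0 - 1.0) * Z_END / k    # arguments spanning [-Z/k, +Z/k]
    return 1.0 / (1.0 + np.exp(-x * k)) - 0.5

def deblock_shifted_block(b):
    """b: 8x8 shifted block; returns b_hat per (15), processed row by row."""
    out = b.astype(float).copy()
    for i in range(8):
        beta = float(S @ b[i])               # beta(i), eq. (11), via Parseval
        if abs(beta) <= 1.0:                 # step below ~0.7: no visible blocking
            continue
        d = smooth_row_values(beta)
        out[i] += beta * (d - S)             # eq. (15)
    return out
```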
3.2 Blocking Artifacts Reduction for Inhomogeneous Block
In the inhomogeneous blocks, there may exist not only blocking artifacts but also edges, so they are processed differently from homogeneous ones. Paying attention to both computational cost and edge preserving, we use the 5 × 5 Sigma filter on the block boundaries. Lee [5] proposed the original Sigma filter. Assume that all the pixels in the same object of the image are subject to a Gaussian distribution with mean $\mu$ and variance $\sigma^2$. Given a certain pixel $x$ and an interval $C$ whose center is $\mu$ and radius is $2\sigma$, we have the probability

$$P\{x;\ |x - \mu| \le 2\sigma\} \approx 95\% \qquad (16)$$
It represents that if pixel $x$ belongs to the same object as the other pixels in interval $C$, then the probability of $x \in C$ is about 95%, and vice versa. The 5 × 5 Sigma filter, as shown in Fig. 2, applies a 5 × 5 sliding window across the image with current center pixel $x_{ij}$, and replaces $x_{ij}$ by the mean of those pixels in the window whose difference from the center pixel is no more than $2\sigma$. It can be formulated as follows:
$$\hat{x}_{ij} = \frac{\sum_{k=-2}^{2} \sum_{l=-2}^{2} \delta_{kl}\, x_{i+k,\,j+l}}{\sum_{k=-2}^{2} \sum_{l=-2}^{2} \delta_{kl}} \qquad (17)$$
where

$$\delta_{kl} = \begin{cases} 1 & |x_{i+k,\,j+l} - x_{ij}| \le 2\sigma \\ 0 & |x_{i+k,\,j+l} - x_{ij}| > 2\sigma \end{cases} \qquad (18)$$

and $\sigma$ reflects the edge measurement in the sliding window.
Fig. 2. The 5 × 5 Sigma filter
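A direct sketch of (17)-(18) in Python; the estimation of $\sigma$ per block or window is left to the caller, and borders are skipped for simplicity.

```python
import numpy as np

def sigma_filter(img, sigma):
    """5x5 Sigma filter: replace each pixel by the mean of similar window pixels."""
    out = img.astype(float).copy()
    h, w = img.shape
    for i in range(2, h - 2):
        for j in range(2, w - 2):
            win = img[i - 2:i + 3, j - 2:j + 3].astype(float)
            mask = np.abs(win - img[i, j]) <= 2.0 * sigma   # delta_kl of (18)
            out[i, j] = win[mask].mean()                    # mean of (17)
    return out
```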
4 Simulation Results
The simulations are performed on several 512 × 512 images coded at low bit rates with block size 8 × 8. Two quantitative metrics are used for image quality evaluation: peak signal-to-noise ratio (PSNR) and the mean square difference of slope (MSDS) [6]. PSNR is a common metric for image quality but is ineffective for images with blocking effects; MSDS is much better in this case. MSDS is calculated from the pixels' mean square differences in the horizontal, vertical and diagonal directions between blocks. Note that the smaller the MSDS, the better the image quality. The visual evaluation of the proposed technique is shown in Fig. 3, which presents the original image Peppers, an enlarged part of it coded at 0.207 bpp, and the results post-processed by different techniques. In the image processed by the zero-masking method of [1], blocking artifacts are still obvious. The visual quality of the image processed by Liu's method [2] is inferior to those of Luo's method [3] and our proposed technique, but the visual comparison between Luo's results and ours is difficult, so we appeal to the quantitative comparison. In Table 1, both PSNR and MSDS are used to evaluate the performance of the different techniques on different images coded at different bit rates. At low bit rates, the proposed method is superior to the other methods under both metrics. At higher bit rates this is not always the case under each individual metric, but combining the two metrics, the proposed method is still generally superior to the other methods, including Luo's.
Fig. 3. Visual quality comparison of the image Peppers coded at 0.207 bpp and post-processed by different methods: (a) original image, (b) decoded image, (c) zero-masking method in [1], (d) Liu's method in [2], (e) Luo's method in [3], (f) proposed method
Finally, we compare the computational complexity of the different techniques. In the detection part, our method costs much less than Liu's method in [2], and slightly more than Luo's method in [3]. In the reduction part, eight row-wise DCTs cost the same computation as one block-wise DCT; our method needs no inverse DCT, and the cost of the adaptive smooth function is very low because we merely need eight discrete values. The Sigma filter is much simpler than the post-filter used in Liu's method. Altogether, the total cost of our proposed method is less than Liu's method and equivalent to Luo's method.

Table 1. Quantitative quality comparison of different methods

Image        Bit Rate (bpp)  Metric  Decoded image  Zero-masking  Liu's method  Luo's method  Proposed method
Lena         0.216           PSNR    29.92          29.83         30.20         30.22         30.45
                             MSDS    5693.5         3441.9        2980          3413.2        2831.7
             0.274           PSNR    31.65          31.15         31.54         31.39         31.54
                             MSDS    4316.9         2812.4        2659.6        2838.6        2217.2
             0.338           PSNR    32.77          31.93         32.47         32.22         32.3
                             MSDS    3663.5         2553.7        2595.9        2518.3        1968.9
Peppers      0.207           PSNR    30.22          30.14         30.31         30.67         30.82
                             MSDS    4666.6         2369.8        2226.8        3014.7        2818.8
             0.265           PSNR    31.75          31.29         31.5          31.9          32.06
                             MSDS    3697.5         2130.6        2073.2        2510.2        2322.9
             0.329           PSNR    33.15          32.28         32.4          32.84         33.01
                             MSDS    2942           1812.2        1844.2        2206.1        2062.5
Fishingboat  0.244           PSNR    27.74          27.1          27.63         27.7          28.09
                             MSDS    9262.9         4781.4        6250          5489.8        4623.1
             0.325           PSNR    29.05          27.96         28.71         28.6          29.16
                             MSDS    8150           4546.8        6204.7        5228.2        4209.2
             0.431           PSNR    30.32          28.7          29.72         29.44         30.13
                             MSDS    7215.9         4356.2        6014.5        4839.9        3774.3
5 Conclusions
The BDCT compression methods usually result in discontinuities, called blocking artifacts, at the boundaries of blocks due to the coarse quantization of the coefficients. We propose a new efficient algorithm not only to reduce blocking artifacts but also to preserve edges adequately. There are three novelties in this paper: (i) introducing the definition of homogeneous blocks and a measurement with an alterable threshold for judging whether a shifted block is homogeneous or inhomogeneous;
(ii) proposing row-wise instead of block-wise processing for the shifted block to avoid over-smoothing; (iii) replacing the step function with an adaptive Sigmoid function to reduce the blocking artifacts. Simulation results evaluated by both visual and quantitative quality metrics show that the proposed blocking artifacts reduction algorithm outperforms the original algorithms.
References
1. Zeng, B.: Reduction of Blocking Effect in DCT-coded Images Using Zero-masking Techniques. Signal Processing, Vol. 79, No. 2 (1999) 205-211
2. Liu, S.Z., Bovik, A.C.: Efficient DCT-domain Blind Measurement and Reduction of Blocking Artifacts. IEEE Trans. Circuits Syst. Video Technol., Vol. 12, No. 12 (2002) 1139-1149
3. Luo, Y., Ward, R.K.: Removing the Blocking Artifacts of Block-based DCT Compressed Images. IEEE Trans. Image Processing, Vol. 12, No. 7 (2003) 838-842
4. Gao, W.F., Mermer, C., Kim, Y.: A De-Blocking Algorithm and a Blockiness Metric for Highly Compressed Images. IEEE Trans. Circuits Syst. Video Technol., Vol. 12, No. 12 (2002) 1150-1159
5. Lee, J.S.: Digital Image Smoothing and the Sigma Filter. Comput. Vis., Graph., Image Processing, Vol. 24, No. 2 (1983) 255-269
6. Triantafyllidis, G.A., Tzovaras, D., Strintzis, M.G.: Blocking Artifact Detection and Reduction in Compressed Data. IEEE Trans. Circuits Syst. Video Technol., Vol. 12, No. 10 (2002) 877-890
7. Coimbra, M.T., Davies, M.: Approximating Optical Flow within the MPEG-2 Compressed Domain. IEEE Trans. Circuits Syst. Video Technol., Vol. 15, No. 1 (2005) 103-107
8. Xu, Z.L., Xie, S.L.: A Deblocking Algorithm Using Anisotropic Diffusion Based on HVS. Computer Engineering, Vol. 32, No. 4 (2006) 10-12
9. Xu, Z.L., Xie, S.L.: A Deblocking Algorithm Based on HVS. Journal of Electronics & Information Technology, Vol. 27, No. 11 (2005) 1717-1721
10. Xie, S.L., Xu, Z.L.: An Adaptive De-blocking Algorithm Based on MRF. IEEE Seventeenth International Conference on Tools with Artificial Intelligence (2005) 710-711
11. Xie, S.L., He, Z.S., Gao, Y.: Adaptive Theory of Signal Processing. 1st ed. Chinese Science Press, Beijing (2006) 264-273
Simulation of Aging Effects in Face Images Junyan Wang, Yan Shang, Guangda Su, and Xinggang Lin Research Institute of Image and Graphics, Dept of Electronic Engineering, Tsinghua University, Beijing, China [email protected]
Abstract. In this paper a method for automatically simulating aging effects in face images is proposed. The first step is to represent a facial image by a pair of shape and texture vectors, extracted by projecting its shape and texture onto separate shape and texture eigenspaces. The second is to estimate the age of the face image with an Intelligence Increasing Neural Network. Then the synthesized shape and texture vectors at the target age are generated using the typical vector creating function, the estimated age, and the feature vectors of the original test image. Finally, the synthesized shape and texture vectors at the target age are reconstructed in the eigenspaces and combined to synthesize the facial image at the target age. Experiments show that the proposed method can effectively "change" the age of face images.
1 Introduction
A human face conveys much information. For example, facial signals such as skin color, shape and bone structure indicate race, sex, character and health condition. Permanent wrinkles and muscles represent age, while facial expressions and head motions depict emotion, mood and mental state [1]. Many methods exist for analyzing and synthesizing such facial information for practical applications. This paper focuses on age information. Age is among the most important information that a face image conveys. Automatic age estimation and simulation are very useful in human-computer interaction, computer animation, face recognition robust to age variation, and so on. Researchers have developed many methods to analyze age information in face images, most of which focus on age estimation and age simulation. Burt and Perrett [2] first simulated the effect of age in facial images by using a cartoon method to exaggerate age: mean faces of different ages are calculated and blended with the test image to obtain new images, whose age is a blend of the ages of the mean face and the test image. This method can add aging effects to face images, but the images with age variation were not natural. Choi [3] used PCA and a 3-D face shape model to extract age-change components from 3-D facial images, and then added these components to a test image to synthesize facial images at different ages; some synthesized images with age variation were shown in that paper. Kwon and da Vitoria Lobo [4] researched how faces change with age. They located many facial parts, including eyes and nose, calculated ratios between different facial parts, and extracted wrinkles with snakelets. Their work showed that when a person grows from child to adult, the skull changes largely,
and when he ages from young adult, the main change is wrinkles. A face recognition system robust to age variation was proposed by Lanitis and Taylor [5][6]. They build a face model and an age function [7][8] to isolate age change. The face model uses a feature vector, obtained by PCA, to represent a face; an age function, polynomial in structure, is then trained to estimate the age of a test image and to get its feature vector at different ages. Some simulated face images and a recognition rate better than traditional PCA are shown in those papers. This method obtained better results than the former methods, but because of the limitations of the age function, it cannot get precise results. In this paper, an automatic age simulation method using only one example image is proposed. First, an improved ASM [9] approach is applied to align the face image and get the facial shape of the training samples, and a triangle-based affine transform is used to get the pure texture image of the training samples. We then train a shape eigenspace and a texture eigenspace by PCA. After this, we project the facial shape and texture of the test image into the eigenspaces separately. An Intelligence Increasing Neural Network [10] is used to estimate the age of the input image. A typical vector creating function generates the shape and texture vectors at the target age from the original image and the estimated age. The shape and texture vectors are reconstructed in the eigenspaces to synthesize the shape and texture at the target age, which are finally combined to generate the face image. The paper is organized as follows. In the next section, the age simulation system is briefly introduced. In the third section, we describe how we train the shape and texture eigenspaces, and how we represent the test image by a texture vector and a shape vector. In the fourth section, the age estimation method and the synthesis of the face image at the target age are described. The experimental results and conclusions are given in the last section.
2 Age Simulation System
The age simulation system includes three parts: facial feature vector extraction, age estimation, and age simulation. In the first part, a shape feature vector and a texture feature vector are extracted as model parameters to represent a face, and a face image can be reconstructed from such a pair of vectors. In the age estimation part, the feature vectors form the input layer of an artificial neural network whose output is the estimated age. The other main part of the system is face image synthesis at different ages: the pair of shape and texture vectors generated in the first part is changed to that of the target age, and face images at different ages are synthesized. Fig. 1 shows the flowchart of the age simulation system. Some work can be done offline, such as building the eigenspaces and training the artificial neural network for age estimation. Before extracting facial features, in order to get good simulation results, some preprocessing is done to improve image quality, including illumination normalization, scale adjustment, and so on.
Fig. 1. Flow chart of age simulation system
3 Face Representation
A face image includes two parts: shape information and a pure texture image, so a facial image can be separated into shape information and texture information, and reconstructed from the two.

3.1 Shape and Texture Information Extraction
The shape information is represented by the coordinates of 101 landmarks on the face (as shown in Fig. 2), and the Active Shape Model (ASM) is used to extract the landmarks. After the landmarks are positioned, a triangle-based affine transform is used to warp the face image to a pure texture image at a standard shape. We use Delaunay Triangulation [11] to get a set of triangles such that no data points are contained in any triangle's circumcircle. This pure texture image is the texture information of the face image. Usually the mean shape of a series of faces is used as the standard shape.

Fig. 2. Shape and texture information (the left is the source image with landmarks extracted by ASM; the middle is the mean shape and triangles; the right is the pure texture image at the mean shape)
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} O_x \\ O_y \end{bmatrix} \qquad (1)$$

Equation (1) shows the triangle-based affine transform, where $(x, y)$ is a coordinate in the source image, $(x', y')$ the corresponding coordinate in the target image, $a, b, c, d$ are transform coefficients, and $(O_x, O_y)$ is the offset.

3.2 Eigenspace Training
We apply Principal Component Analysis (PCA) to the coordinates of the landmarks and to the pure texture images respectively, obtaining a shape eigenspace and a texture eigenspace.
$$X = \bar{X} + Pb \qquad (2)$$
$$b = P^{-1}(X - \bar{X}) \qquad (3)$$

Equations (2) and (3) show the PCA procedure, where $X$ is a given sample, $\bar{X}$ is the mean of the training samples, $P$ is the eigen matrix composed of eigenvectors, and $b$ is the projected vector in the eigenspace.

3.2.1 Shape Eigenspace Training
To train the shape eigenspace, let $X$ in equation (2) be the coordinates of the 101 positioned landmarks. Let $c_i$ be the mean square variance of the projection values of the training samples in the $i$th dimension of the eigenspace. The effect of the $i$th dimension of the shape eigenspace can be illustrated by $X$ in equation (4).
$$b_i = (0, \ldots, 0, c_i, 0, \ldots, 0), \quad i = 1, \ldots, m$$
$$X = \bar{X}_s \pm P(3 \times b_i) \qquad (4)$$

where $\bar{X}_s$ is the mean shape of the training samples and $P$ is the eigen matrix.
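The eigenspace machinery of (2)-(4) can be sketched compactly. We assume an orthonormal eigen matrix $P$ (so $P^{-1} = P^T$) and obtain it via an SVD of the centered data, which is one standard way to do so; the code is a sketch, not the authors' implementation.

```python
import numpy as np

def train_eigenspace(samples, n_components):
    """samples: (num_samples, dim) rows of shape or texture vectors."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    # eigenvectors of the covariance via SVD, most significant first
    _, svals, vt = np.linalg.svd(centered, full_matrices=False)
    P = vt[:n_components].T                           # dim x n_components
    c = svals[:n_components] / np.sqrt(len(samples))  # per-dimension std dev
    return mean, P, c

def project(X, mean, P):
    return P.T @ (X - mean)    # eq. (3), with P orthonormal

def reconstruct(b, mean, P):
    return mean + P @ b        # eq. (2)
```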
Fig. 3 shows the effect of the first 4 dimensions of shape eigenspace. We can see that most of these dimensions have clear meanings and some of them show the change of age. Each row is the effect of one dimension; the middle column is the mean shape. The first and second dimensions include the change of age and pose. The third dimension is mainly about pose variation. The fourth dimension includes the variation of age, pose, and expression.
Fig. 3. The effect of the first 4 dimensions of the shape eigenspace
3.2.2 Texture Eigenspace Training
To train the texture eigenspace, let $X$ in equation (2) be the pure texture image generated in Section 3.1 (see Fig. 2). Let $c_i$ be the mean square variance of the projection values of the training samples in the $i$th dimension of the eigenspace. The effect of the $i$th dimension of the texture eigenspace can be illustrated by $X$ in equation (5).
$$b_i = (0, \ldots, 0, c_i, 0, \ldots, 0), \quad i = 1, \ldots, m$$
$$X = \bar{X}_t \pm P(3 \times b_i) \qquad (5)$$

where $\bar{X}_t$ is the mean texture of the training samples and $P$ is the eigen matrix.
Fig. 4 shows the effect of the first 4 dimensions of the texture eigenspace. Most of these dimensions have clear meanings, and some of them show the change of age. Each row is the effect of one dimension; the middle column is the mean texture image. Each row shows a change of illumination, and the second and third rows include the variation of age. For a given image, after projection into the eigenspaces, we get a shape feature vector and a texture feature vector; conversely, given such a pair of vectors, we can reconstruct a face shape and a pure texture image and combine them into a face image.
Fig. 4. The effect of the first 4 dimensions of the texture eigenspace
4 Age Estimation
For a given image, after feature extraction, the image is represented by a pair of vectors that contain its main information, including age information. Age estimation tries to build a relationship between features and age. An age function is one way to establish the map from feature vector to age, but because we do not know the effect of each dimension on age, and the form of the map is unknown, we cannot establish an appropriate structure for the age function. An artificial neural network can be trained to accomplish this task: given a set of face images with known ages, we train a network of given structure so that the error between the real and estimated ages of the training samples is minimized. Fig. 5 shows the framework of the Intelligence Increasing Neural Network (IINN) used to estimate age. The input layer takes the shape and texture feature vectors, and the integrated output is the estimated age. The system performs self-learning to adjust its coefficients and improve performance. IINN is a new neural network architecture that obtains good classification results with clearer meaning than other neural networks, using a discrimination principle based on Bayesian maximum posterior probability, a good data extraction method, and a reasonable optimization algorithm. IINN has a database that increases
Fig. 5. The structure of IINN
dynamically, whose scale grows logarithmically with the number of classes. The belief degree of the database converges in probability, and the problem of overtraining does not exist.
5 Age Simulation
For a given image, two feature vectors are extracted to represent it. If we can get the feature vectors at a new age (the target age), the facial image at the target age can be reconstructed.

5.1 Reconstruct Feature Vector at Target Age
After estimating the age of the given image, we simulate the aging process and calculate the feature vector at the target age. What we know is the feature vector of the given image, $b_{cur}$, and the estimated age, $cur$. Assume that the feature vector at the target age $new$ can be calculated by a function $g$:
$$b_{new} = g(b_{cur}, new) = b_{cur} + \Delta b \qquad (6)$$

So, we can get the following equation:

$$\Delta b = b_{new} - b_{cur} \approx B_{new} - B_{cur} \qquad (7)$$

where
$B_{new}$ and $B_{cur}$ are the typical feature vectors of the target age and the estimated age, which do not depend on the given image. The feature vector at the target age, $b_{new}$, is:

$$b_{new} \approx b_{cur} + (B_{new} - B_{cur}) \qquad (8)$$
where $b_{cur}$ is the feature vector of the given image, $cur$ is the estimated age, $new$ is the target age, and $B_{new}$ and $B_{cur}$ are the typical feature vectors. By varying the feature vector, we can get a series of vectors corresponding to the same age. The traditional method is to calculate the mean vector as the typical feature vector $B$ for every age and build a lookup table. That method cannot create a typical feature vector for an age not present in the training samples. Here we propose our method: the typical vector creating function.

5.2 Typical Vector Creating Function
The typical feature vector
$B_a$ of age $a$ should satisfy the following requests:

(1) $B_a$ is concerned with every training sample: the more similar the age of a training sample is to $a$, the more strongly its feature vector contributes to $B_a$. We use a Gauss function to represent this relativity.

(2) The sum $E$ of the squared errors between $B_a$ and the feature vectors of the training samples is minimal:

$$E = \sum_{i=1}^{n} N_\sigma(a - a_i)\, \| B_a - b_i \|^2 \qquad (9)$$

where $n$ is the number of training samples, $b_i$ is the feature vector of the $i$th training sample, $a_i$ is the age of the $i$th training sample, and $N_\sigma(\cdot)$ is the Gauss function with square variance $\sigma$.

(3) $B_a$ is continuous in $a$.

Based on request (2), $B_a$ minimizes $E$, so

$$\frac{\partial E}{\partial B_a} = 0 \qquad (10)$$
Resolving these requests, we get:

$$B_a = \frac{\sum_{i=1}^{n} N_\sigma(a - a_i)\, b_i}{\sum_{i=1}^{n} N_\sigma(a - a_i)} \qquad (11)$$
Equation (11) is the typical vector creating function. Compared with the lookup-table method [6], this function can create a typical vector for ages not present in the training samples, and for continuous age values; the lookup-table method can do neither. When $\sigma$ tends to zero, this method reduces to using the mean feature vector of each age as the typical vector, i.e., the lookup table.
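A sketch of (8) and (11) in Python; the Gauss width of 5 years is an illustrative value of ours, not one reported in the paper.

```python
import numpy as np

def typical_vector(a, ages, feats, sigma=5.0):
    """Eq. (11): Gaussian-weighted average of training feature vectors.
    ages: (n,) sample ages; feats: (n, d) feature vectors."""
    w = np.exp(-0.5 * ((a - ages) / sigma) ** 2)    # N_sigma(a - a_i)
    return (w[:, None] * feats).sum(axis=0) / w.sum()

def vector_at_target_age(b_cur, cur_age, new_age, ages, feats, sigma=5.0):
    """Eq. (8): shift the given vector by the difference of typical vectors."""
    B_new = typical_vector(new_age, ages, feats, sigma)
    B_cur = typical_vector(cur_age, ages, feats, sigma)
    return b_cur + (B_new - B_cur)
```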
5.3 Synthesize the Facial Image at Target Age The shape and texture at target age can be reconstructed in eigenspace using equation (12).
$$X_{new} = \bar{X} + P b_{new} \qquad (12)$$

where $X_{new}$ is the shape or texture at the target age, $\bar{X}$ is the mean shape or texture, $P$ is the eigen matrix, and $b_{new}$ is the feature vector at the target age. After the shape and texture at the target age are reconstructed, the triangulation-based affine transform is used to generate the facial image at the target age.
Fig. 6. Shape and texture at target age are combined to generate face image at target age
6 Experiments and Conclusions We use TH face database for our experiments. There are 400 images from 60 persons at different ages in the database. The training set includes 300 images. Fig. 7 shows some samples in the face database.
Fig. 7. Samples in face database (Each row contains images of the same subject.)
6.1 Age Estimation Result
In this experiment, we use 100 test images with known real ages. The estimation results are shown in Table 1. In paper [6], Lanitis A. used an age function for age estimation. We implemented that method and ran the experiment with the same training and test sets; the age function is quadratic. Table 1 shows that the error of IINN is lower.

Table 1. Age estimation result

Method          Age function   IINN
Error (years)   2.7            1.9
6.2 Age Simulation Result
Fig. 8 shows the age simulation results. The simulated images carry the age information; the method effectively changes the apparent age of the face images.
Fig. 8. Age simulation result 1 (the left column contains the source images; the second to fifth columns are simulated images at ages 20, 30, 40, and 50)
6.3 Conclusions
In this paper, an automatic method is proposed to synthesize face images at different ages. We first locate the feature points using ASM; a triangle-based affine transformation is applied to get the pure texture image; shape and texture vectors are obtained by PCA; IINN is used to estimate the age of the input test image; the typical vector creating function is applied to get the vectors at the target age, from which the shape and texture are reconstructed; finally we combine the synthesized texture and shape to get the final synthesized image. Experiments show that the proposed method estimates age precisely and simulates face images at different ages very well.
References
1. Phillips, P.J., Grother, P., Micheals, R.J., et al.: FRVT 2002: Evaluation Report. March (2003) http://www.frvt.org/FRVT2002/documents.htm
2. Burt, D.M., Perrett, D.I.: Perception of Age in Adult Caucasian Male Faces: Computer Graphic Manipulation of Shape and Colour Information. Proceedings of the Royal Society (1995) 137-143
3. Choi, C.: Age Change for Predicting Future Faces. Proceedings of IEEE International Fuzzy Systems Conference (1999) 1603-1608
4. Kwon, Y.H., da Vitoria Lobo, N.: Age Classification from Facial Images. Computer Vision and Image Understanding (1999) 1-21
5. Lanitis, A., Taylor, C.J.: Towards Automatic Face Identification Robust to Ageing Variation. Proc. of 4th IEEE International Conference on Automatic Face and Gesture Recognition (2000) 391-396
6. Lanitis, A., Taylor, C.J.: Robust Face Recognition Using Automatic Age Normalization. Proceedings of Electrotechnical Conference (2000) 478-481
7. Lanitis, A., Taylor, C.J., Cootes, T.F.: Modeling the Process of Aging in Face Images. Proceedings of IEEE ICCV99 (1999) 131-136
8. Lanitis, A., Taylor, C.J., Cootes, T.F.: Toward Automatic Simulation of Aging Effects on Face Images. IEEE Trans. on PAMI (2002) 442-456
9. Cootes, T.F., Taylor, C.J., Lanitis, A.: Multi-resolution Search with Active Shape Models. Pattern Recognition (1994) 610-612
10. Zheng, N., Guangda, S., Junyan, W.: A New Neural Network – Intelligence Increasing Neural Network. Int. C. Robotics, Intelligent Systems and Signal Processing (2003) 1224-1229
11. Boissonnat, J.: Representing 2-D and 3-D Shapes with the Delaunay Triangulation. Int. C. Pattern Recognition (1984) 745-748
Synthetic Aperture Radar Image Segmentation Using Edge Entropy Constrained Stochastic Relaxation Yongfeng Cao, Hong Sun, and Xin Xu Signal Processing Laboratory, School of electronics information, Wuhan University, Wuhan 430079, China {cyf, sh, xx}@eis.whu.edu.cn http://www.dsp.whu.edu.cn
Abstract. A synthetic aperture radar (SAR) image segmentation method using the multi-level logistic (MLL) model and edge entropy constrained stochastic relaxation is proposed. Edge entropy is developed and combined with a stochastic relaxation process to get the expected segmentation. A Gamma distribution is used for the SAR intensity data and an MLL model for the underlying label image. The parameters of the Gamma distribution are estimated with the EM method. The proposed method is an iterative scheme consisting of two alternating steps: approximating the estimation of the pixel class labels, and estimating the gamma distribution parameters. The weight of the prior part in the goal energy function is increased slowly with the iteration count until the edge entropy of the segmentation reaches an experiential threshold. The segmentation results for synthetic and real SAR images show that the proposed method performs well.
1 Introduction
This paper addresses synthetic aperture radar (SAR) image segmentation with a method combining a Markov random field (MRF) model and edge entropy constrained stochastic relaxation. Contextual constraints are ultimately necessary in a capable segmentation method for SAR images with strong multiplicative noise [1]. The MRF model provides a convenient and consistent way of modeling context-dependent entities such as image pixels by characterizing the mutual influences among them. The prior contextual knowledge brought by the MRF model turns image segmentation into a constrained optimization problem that amounts to minimizing a global energy function [2,3]. We usually do not know the proper weight of the prior part in the global energy, although it has an important effect on the quality of the segmentation results. Most MRF-based segmentation methods simply give all parts of the global energy function equal weight, or give the prior part an experiential weight [2-5]. Constrained stochastic relaxation (CSR) [6] is a good method for constrained optimization problems. It deals with the weight of the prior part of the global energy with an intuitive strategy: slowly increasing it from a very small value to a large value. But the
experiential maximum value for one image is not necessarily suitable for other images, so CSR cannot guarantee consistent performance. In this paper, we introduce an edge entropy constrained stochastic relaxation (EECSR) method. A new measure, edge entropy, is used to assess how well the segmentation results accord with our prior knowledge. During the segmentation process of EECSR, the weight of the prior energy is increased slowly from a very small value while the edge entropy of the segmentation result is monitored; the final segmentation result is the one whose edge entropy first reaches an expected threshold. EECSR is better than CSR on two points: first, the edge entropy threshold relates to the prior knowledge of segmentation more directly than the maximum weight of the prior energy in the global energy; second, EECSR can guarantee consistent performance in the edge-entropy sense. In the SAR image segmentation method proposed in this paper, a multi-level logistic (MLL) model is used for the underlying label image and a gamma distribution for the SAR intensity data. The EM method is used to estimate the parameters of the Gamma distribution; the hyper-parameters of the MLL model are supposed to be known a priori. The segmentation method is an iterative scheme consisting of two alternating steps: approximating the maximization of the posterior marginals (MPM) estimation of the pixel class labels, and estimating the gamma distribution parameters. The weight of the prior energy in the goal energy function is increased slowly with the iteration count until the edge entropy of the segmentation reaches an experiential threshold.
2 Image Models
In this section, we consider a rectangular pixel lattice $S$. The label field is denoted $X = \{X_s, s \in S\}$ and the observed image field $Y = \{Y_s, s \in S\}$. Throughout the paper, $x = \{x_s, s \in S\}$ and $y = \{y_s, s \in S\}$ represent sample realizations of $X$ and $Y$. We suppose there are $K$ different classes in the SAR image, so $x_s \in \{1, 2, \ldots, K\}$. It is well known that the Gamma distribution provides a good model for SAR intensity data [1]. We suppose, therefore, that each pixel $y_s$ of the SAR image is conditionally independent of all other pixels given its segmentation label $x_s$, and follows a gamma distribution. Formally,

$$P(y \mid x) = \prod_{s \in S} P(y_s \mid x_s) = \prod_{s \in S} \frac{L^L y_s^{L-1}}{\Gamma(L)\,\sigma_{x_s}^L} \exp\left(-\frac{y_s L}{\sigma_{x_s}}\right) \qquad (1)$$

where $L$ is the number of looks of the SAR image and $\sigma_{x_s}$ is the gamma distribution parameter associated with segmentation label $x_s$. For the label image we use an MLL model and consider only the 8-neighborhood and cliques of two pixels (see Fig. 1). The local conditional distribution at site $s$ can be represented by a Gibbs distribution as follows:
1 Z ( xs )
exp{− ȕ, H ( xs , N s ) }
(2)
530
Y. Cao, H. Sun, and X. Xu
where ȕ = [ β1 , β 2 , β 3 , β 4 ] are hyper parameters, N s is the neighborhood of site s, Z(xs) is a normalizing constant. H( xs , N s ) = [u (a 0 , xs ) + u (a1 , xs ), u (a 2 , xs ) + u (a 3 , xs ), u (a 4 , xs ) + u (a 5 , xs ), u (a 6 , xs ) + u (a 7 , xs )] (3)
where u(i,t)=0 for t=i, u(i,t)=1 otherwise. The MLL model for label image gives a prior regularity constraint to the solution space of segmentation.
Fig. 1. 8-neighborhood and cliques of two pixels
The posterior distribution of x given y is P ( x | y , ș) =
P ( x | ș ) P ( y | x , ș) = P ( y | ș)
exp{−a ⋅ ¦ ȕ, H ( xs , N s ) } s∈S
P ( y | ș) ⋅ Z
⋅∏ s∈S
LL y s L −1 Γ( L )σ xLs
exp(−
ys L
σ xs
° ª LL y s L −1 y L º½ ° = 1 exp ®− a ⋅ ¦ ȕ, H ( xs , Ns ) + ln «∏ exp( − s ) » ¾ L σ Z′ σ Γ ( L ) s S ∈ ∈ s S xs ¼ » °¿ xs °¯ ¬«
)
(4)
where, ș =[ ı , ȕ ]= [σ 1 , , σ K , β1 , β 4 ] . Z and Z ′ are normalizing constants. The exponential part is Gibbs energy function
ª LL ys L −1 yL º U ( x y ) = aU ( x ) + U ( y x) = − a ⋅ ¦ ȕ, H ( xs , Ns ) + ln «∏ exp( − s ) » L σ xs » s∈S ¬« s∈S Γ( L)σ xs ¼
(5)
The parameter a controls the weight of prior energy contributed by MLL model. In (4)(5), the parameter a equals unit, but we will increase it slowly from a little value during our segmentation process.
3 Edge Entropy Constrained Stochastic Relaxation(EECSR) EECSR slowly increases the weight of prior energy from a very small value as CSR does, but along with monitoring edge entropy of segmentation result, and will stop the increasing when the edge entropy reaches an expected threshold. We describe CSR in section 3.1, and develop the edge entropy measure in section 3.2. 3.1 Constrained Stochastic Relaxation The Gibbs energy function (5) is a special form of below expression U ( x y ) = t −1 ª¬ a ⋅ U ( x) + U ( y x) ¼º
(6)
SAR Image Segmentation Using EECSR
531
Segmentation is finding a label image x that minimizes (6). In this case, discrete-time Markov chain X(t)={X(1),X(2),…, X(Ts)} is usually created from the Gibbs energy function (6) by Gibbs sampler and is made to converge in distribution to its Gibbs distribution such as (4). In minimizing Gibbs energy (6), Simulated annealing (SA)[3] fixes the parameter a, and decrease the temperature t slowly. Iterated Conditional Mode (ICM) is a simpler SA with the temperature t fixed. As shown in [6], CSR fixes parameter t, and increase parameter a with a suitable rate from a very little value to the infinite. The strategy of dealing with a, the weight of prior part of global energy, is intuitive. Let
Ω∗ = { x : U ( x) = u} u = min U ( x) , Π ∗ ( x y ) = δ Ω∗ ( x) x
exp {−U ( x y )}
¦ exp {−U ( x′ | y)}
(7)
x ′∈Ω∗
It is easy to check that [6,7] lim Π ( x y, t = 1, a) = Π ∗ ( x y ) a →∞
(8)
In practice, parameter a is increased to a very large value. But the experiential maximum value for one image is not surely suitable for other images, so CSR could not guarantee consistently performance.
3.2 Edge Entropy We are concerned with developing a measure by which to assess the regularity of segmentation edges. It is supposed that there are K different labels in label image. We consider the eight nearest spatial neighbors (ENSN) (see Fig.1) of each edge pixel. Then, there are total K8 different ENSN configures. We know that more irregular the edges are, more different ENSN configures or more information we may find in label image. So, information entropy can be introduced to measure the regularity of edges. Formally K8
K8
i =1
i =1
H = −¦ pi log( pi ) = −¦
Mi M log( i ) M M
(9)
where, M is the total number of distinct 3×3 blocks in image lattice, Mi is the appearance frequency of the ith ENSN configure in label image. The value of H relate to parameter K. When K8 is too big, using Mi/M as the approximation of pi is statistically unreliable based on a small amount of ENSN samples. On the other hand, using H is inconvenient when comparing the regularity of label images with different K. Aiming at these drawbacks, we introduce edge entropy as (10). In a label image, we think xs is a edge pixel between classes n and m, if xs {m,n} and there is one pixel xt {m,n} and xtxs in ENSN of xs. The ENSN of each edge pixel xs between two classes n and m, is dealt as below (see Fig.2): for every pixel xt in the ENSN of xs, xt=0 for xt=xs, xt=1 for xtxs and xt {m,n}, xt =2 otherwise. Then the ENSN theoretically have 38 different configures, which is not a very big number. Let M ni , m stand for the appearance frequency of the ith ENSN
532
Y. Cao, H. Sun, and X. Xu
configure for edge pixels between classes n and m. Let M n , m stand for the number of edge pixels between classes n and m. The edge entropy is defined as such 38
i i Eedge = −¦ M log( M ) log ( 38 ) M i =1 M
K −1
where, M i = ¦
K
¦
n =1 m = n +1
K −1
M ni , m , M = ¦
(10)
K
¦
M n,m .
n =1 m = n +1
Fig. 2. Dealing with the ENSN of contour pixels (gray pixels) between class 1 and class 2
The edge entropy will be used in EECSR for monitoring the regularity of medium segmentation results and deciding when to stop the segmentation process.
4 SAR Image Segmentation 4.1 Description of Segmentation Method
We get the segmentation based on maximization of the posterior marginals (MPM) criterion. As shown in [4,8], finding the MPM estimation of X is equal to maximizing each pixel’s posterior marginal probability mass function
P( X s = k | y, ș) = ¦ P( x | y, ș)
(11)
x:xs = k
over all k {1,2,…,K}, for every s ∈ S . Then the MPM estimation of X is
xˆ = {xˆs : xˆs = arg max P( X s = k | y, ș) } k∈{1,2,K }
(12)
Because exact computation of these marginal probability mass functions as in (11) is computationally infeasible, we use the approximation method proposed by Marroquin et al.[8]: use Gibbs sampler [3] to generate a discrete-time Markov chain X(t)={X(1),X(2),…, X(Ts)} which converges in distribution to a random field with probability mass function (4), and then approximate function (11) by
$$P(X_s = k \mid y, \theta) \approx \frac{1}{T_s} \sum_{t=1}^{T_s} u_{k,s}(t), \quad \forall k, s \qquad (13)$$
where u_{k,s}(t) = 1 if X_s(t) = k and u_{k,s}(t) = 0 otherwise, and T_s is the number of visits made to pixel s by the Gibbs sampler. The MPM estimate of the label image can be obtained from (13) and (12) when the model parameter θ and the data y are known. We suppose the hyperparameters of the MLL model are known, but the gamma parameter σ is unknown and must be estimated during the segmentation process. The segmentation method is an iterative scheme that begins with an initial x̂^(0) and θ^(0). At the ith iteration, x̂^(i) is first obtained, as shown above, from (13) and (12); σ^(i) is then obtained by the EM algorithm. EECSR is combined with MPM by slowly increasing the parameter a in (5) during the iterations until the edge entropy of the segmentation falls below a threshold T_ee. The threshold T_ee must be selected for segmentation. This may be done "in the blind," but performance certainly improves if the choice is data-driven, either from prior experience or from training samples. We make x̂^(0) a random realization of the label field and estimate σ^(0) by
$$\sigma_k^{(0)} = \frac{1}{N_k} \sum_{s : \hat{x}_s^{(0)} = k} y_s, \quad \forall k \in \{1, 2, \ldots, K\} \qquad (14)$$
where N_k is the number of pixels with label k in x̂^(0). Section 4.2 shows how to obtain σ^(i).
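Putting the pieces together, the iterative scheme of this section can be sketched as follows in Python. The schedule a(k) = a(k−1)/0.93 and Ts = 20 are the settings reported in Section 5; gibbs_sample_chain is a hypothetical helper standing in for the Gibbs sampler of [3] (returning the approximate marginals of (13) as a K×H×W array), em_update_sigma is sketched in Section 4.2 below, and edge_entropy is the function given in Section 3.2.

```python
import numpy as np

def eecsr_mpm_segment(y, beta, K, T_ee, a0=0.05, rate=0.93,
                      Ts=20, max_iter=200, rng=None):
    """EECSR-controlled MPM segmentation loop (our sketch of Sec. 4.1)."""
    rng = rng if rng is not None else np.random.default_rng()
    x_hat = rng.integers(1, K + 1, size=y.shape)          # random x^(0)
    sigma = np.array([y[x_hat == k].mean()
                      for k in range(1, K + 1)])          # Eq. (14)
    a = a0
    for _ in range(max_iter):
        # Gibbs sampler for Ts sweeps at prior weight a; returns the
        # (K, H, W) array of approximate posterior marginals of Eq. (13).
        marginals = gibbs_sample_chain(y, sigma, beta, a, Ts, rng)  # hypothetical
        x_hat = 1 + marginals.argmax(axis=0)              # MPM estimate, Eq. (12)
        sigma = em_update_sigma(y, marginals)             # EM step, Eq. (17)
        if edge_entropy(x_hat) < T_ee:                    # EECSR stopping rule
            break
        a /= rate                                         # a(k) = a(k-1) / 0.93
    return x_hat, sigma
```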
4.2 EM Method for Estimation of the Gamma Parameter

EM is a well-known algorithm for parameter estimation in incomplete-data problems; here we use it to estimate the gamma parameters of the observed SAR image. The EM algorithm is an iterative algorithm that consists of two alternating steps: the expectation step and the maximization step. If θ^(i−1) is the estimate of θ at the (i−1)th iteration, the expectation function at the ith iteration is
$$Q(\theta, \theta^{(i-1)}) = E\left(\log P(x \mid \theta) \mid y, \theta^{(i-1)}\right) + E\left(\log P(y \mid x, \theta) \mid y, \theta^{(i-1)}\right) \qquad (15)$$
Since the first term of (15) does not depend on σ, we use only the second term. Let N denote the number of pixel sites in S. Substituting (1) into (15), we get
$$\begin{aligned}
E\left(\log P(y \mid x, \theta) \,\middle|\, y, \theta^{(i-1)}\right) &= \sum_{x} \sum_{j=1}^{N} \log P(y_j \mid x_j, \theta) \cdot P(x \mid y, \theta^{(i-1)}) \\
&= \sum_{x_1=1}^{K} \cdots \sum_{x_N=1}^{K} \sum_{j=1}^{N} \sum_{k=1}^{K} \delta_{k,x_j} \cdot \log P(y_j \mid x_j, \theta) \cdot P(x \mid y, \theta^{(i-1)}) \\
&= \sum_{k=1}^{K} \sum_{j=1}^{N} \log P(y_j \mid x_j, \theta) \sum_{x_1=1}^{K} \cdots \sum_{x_N=1}^{K} \delta_{k,x_j} \cdot P(x \mid y, \theta^{(i-1)}) \\
&= \sum_{k=1}^{K} \sum_{j=1}^{N} \log\!\left( \frac{L^L y_j^{L-1} \exp(-y_j L / \sigma_k)}{\Gamma(L)\, \sigma_k^L} \right) P(X_j = k \mid y, \theta^{(i-1)})
\end{aligned} \qquad (16)$$
Differentiating, setting the result to zero, and solving for σ gives [9]
$$\hat{\sigma}_k^{(i)} = \sum_{j=1}^{N} y_j\, P(X_j = k \mid y, \theta^{(i-1)}) \Big/ \sum_{j=1}^{N} P(X_j = k \mid y, \theta^{(i-1)}), \quad \forall k \in \{1, \ldots, K\} \qquad (17)$$
where P(X_j = k | y, θ^(i−1)) can be approximated by (13).
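In code, the update (17) is a weighted average of the observations, with the approximate marginals as weights. A minimal sketch, assuming marginals is the K×H×W array produced by (13):

```python
import numpy as np

def em_update_sigma(y, marginals):
    """M-step update, Eq. (17): sigma_k is the weighted mean of y under
    the posterior marginals P(X_j = k | y, theta^(i-1))."""
    y_flat = y.reshape(-1)                          # observations y_j, j = 1..N
    w = marginals.reshape(marginals.shape[0], -1)   # shape (K, N)
    return (w @ y_flat) / w.sum(axis=1)             # one sigma per class
```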
5 Segmentation Results

The proposed method is applied to synthetic and real SAR images. All computations were carried out using programs written in C, running on a Pentium-based PC (2.4 GHz).
5.1 Synthetic SAR Images

Fig. 3 (left) shows a synthetic three-look SAR intensity image containing three classes with gamma distribution parameters σ = [32400, 57600, 90000]. The hyperparameters of the MLL model are β = [0.79, 1.05, 0.28, 0.06], estimated from the true label image of Fig. 3 (left) using the method in [10]. The edge entropy threshold T_ee for segmenting Fig. 3 (left) is 0.45, also estimated from the true label image. The parameter a, the weight of the prior energy in the Gibbs energy, starts at 0.05 and increases following a(k) = a(k−1)/0.93, where k is the iteration number. T_s, the length of the Markov chain X(t) at each iteration, is 20. We obtained the segmentation result (Fig. 3 (middle)) after 56 iterations, when the edge entropy of the segmentation reached the threshold T_ee = 0.45. Continuing the iterations and stopping only when the parameter a reached the large value 72 gave another segmentation result (Fig. 3 (right)), i.e., the result obtained by using CSR instead of EECSR; this second result took 100 iterations. The first result clearly preserves more detail than the second.
Fig. 3. Synthetic SAR image with 3 classes (left); segmentation result using EECSR (56 iterations) (middle) and using CSR (100 iterations) (right)
The percentage of misclassified pixels (PMP) and the edge entropy of all intermediate segmentation results during the 100 iterations were calculated and are shown in Fig. 4. We can see that the edge entropy decreases throughout the 100 iterations. When the
edge entropy reaches T_ee = 0.45, after 56 iterations, the PMP is at its minimum. This implies that edge entropy is an effective measure for controlling the segmentation process and obtaining good segmentation results.
Fig. 4. The edge entropy and the percentage of misclassified pixels (PMP) versus the number of iterations
5.2 Real SAR Images

Fig. 5 shows segmentation results using EECSR for four single-look intensity SAR images (first row), which contain mainly three classes: tank, tank shadow, and background. The hyperparameters of the MLL model are β = [0.5, 0.5, 0.5, 0.5].
Fig. 5. Real single-look SAR tank images (first row); segmentation results with Tee = 0.45 (second row); segmentation results with Tee = 0.2 (third row)
The parameter a, the weight of the prior energy in the Gibbs energy, starts at 0.05 and increases following a(k) = a(k−1)/0.93, where k is the iteration number. T_s, the length of the Markov chain X(t) at each iteration, is 20. The edge entropy threshold is T_ee = 0.45 for the results in Fig. 5 (second row) and T_ee = 0.2 for the results in Fig. 5 (third row). It can be seen that Fig. 5 (second row) preserves more detail than Fig. 5 (third row), while Fig. 5 (third row) is more regular than Fig. 5 (second row). Fig. 6 shows a real single-look SAR intensity image (farmland) and its segmentation results using the proposed method. The SAR data were acquired by ERS-1/2 over Paris and its surroundings. We segmented the image into three farmland types. The edge entropy thresholds for the two results are T_ee = 0.25 and T_ee = 0.18; the other parameters are the same as for Fig. 5.
Fig. 6. Real single-look SAR farmland image (left); segmentation result with Tee = 0.25 (middle); segmentation result with Tee = 0.18 (right)
6 Conclusions

We have proposed a SAR image segmentation method combining the MLL model and EECSR. The prior contextual knowledge brought in by the MRF model turns image segmentation into a constrained optimization problem equivalent to minimizing a global energy function. The proper weight of the prior part in the global energy is usually unknown, although it strongly affects the quality of the segmentation results. We proposed EECSR to handle this weight: the weight of the prior energy is increased while the edge entropy of the segmentation result is monitored, and the process stops when the edge entropy reaches an expected threshold. EECSR improves on CSR in two respects: first, the edge entropy threshold relates to prior knowledge of the segmentation more directly than the maximum prior-energy weight in the global energy; second, EECSR guarantees consistent performance in the edge-entropy sense. Segmentation results for real and synthetic SAR images show that our method is effective.
Acknowledgments

We would like to thank the National Natural Science Foundation of China (projects No. 60372057 and No. 40376051) and the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing of China for financial support.
References
1. Oliver, C.J., Quegan, S.: Understanding Synthetic Aperture Radar Images. Artech House Publishers, Boston, London (1998)
2. Li, S.Z.: Markov Random Field Modeling in Computer Vision. Springer-Verlag, New York Berlin Heidelberg Tokyo (1995)
3. Geman, S., Geman, D.: Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images. IEEE Trans. Pattern Analysis and Machine Intelligence 6(6) (1984) 721-741
4. Comer, M.L., Delp, E.J.: The EM/MPM Algorithm for Segmentation of Textured Images: Analysis and Further Experimental Results. IEEE Trans. Image Processing 9(10) (2000) 1731-1744
5. Dina, E.M., Simon, P.W.: Double Markov Random Fields and Bayesian Image Segmentation. IEEE Trans. Image Processing 50(2) (2002) 357-365
6. Geman, D., Geman, S., Graffigne, C., Dong, P.: Boundary Detection by Constrained Optimization. IEEE Trans. Pattern Analysis and Machine Intelligence 12(7) (1990) 609-629
7. Geman, D.: Random Fields and Inverse Problems in Imaging. Lecture Notes in Mathematics, Vol. 1427 (1990) 113-193
8. Marroquin, J., Mitter, S., Poggio, T.: Probabilistic Solution of Ill-posed Problems in Computational Vision. J. Amer. Statist. Assoc. 82 (1987) 76-89
9. Yongfeng, C., Hong, S., Xin, X.: An Unsupervised Segmentation Method Based on MPM for SAR Images. IEEE Geoscience and Remote Sensing Letters 2(1) (2005) 55-58
10. Derin, H., Elliot, H.: Modeling and Segmentation of Noisy and Textured Images Using Gibbs Random Fields. IEEE Trans. Pattern Analysis and Machine Intelligence 9(1) (1987) 39-55
The Influence of Channel Coding on Information Hiding Bounds and Detection Error Rate

Fan Zhang¹,² and Xinhong Zhang³

¹ Institute of Advanced Control and Intelligent Information Processing, Henan University, Kaifeng 475001, P. R. China
[email protected]
² College of Computer & Information Engineering, Henan University, Kaifeng 475001, P. R. China
³ Department of Computer Center, Henan University, Kaifeng 475001, P. R. China
[email protected]
Abstract. This paper analyzes the influence of channel coding on the payload capacity and the detection error rate of information hiding in digital images. According to our results, the detection error rate of information hiding is mainly influenced by the average information energy and the payload capacity. The error rate rises with the increase of the payload capacity of information hiding. When channel coding is used, the information hiding error rate drops with the decrease of the payload capacity.
1 Introduction
Information hiding techniques have recently become important in a number of application areas. Digital images, audio and video are increasingly furnished with distinguishing but imperceptible marks, which may contain a hidden copyright notice or serial number, or even help to prevent unauthorized copying directly. The concept of information hiding was proposed to solve the problems of illicit interception and unauthorized copying of digital media. Information hiding embeds secret data into a host medium so that the hidden data are imperceptible but known to the intended recipient. The host medium in an information hiding system may be a digital image, audio, video, or another type of media. It is a useful technique for secretly sending significant information, such as military maps, business secrets, and personal financial information, during multimedia communication. The first academic conference on information hiding was organized in 1996. Information hiding has several developed branches, including steganography and watermarking [1]. The information hiding capacity of a digital image is the number of bits that can be hidden in a given host image. Information hiding can be considered a communication process, and the Gaussian probability distribution is a popular model for the information-hiding channel. This model gives rise to closed-
form solutions for the information hiding capacity. The image in which the information messages are embedded is the communication channel; the information messages are transmitted over this channel [2,3]. The information hiding capacity corresponds to the communication capacity of the "information hiding channel". A block diagram illustrating the process of information hiding is shown in Fig. 1.
[Figure 1 block diagram: Information message (01011) → EMBED (with Host data) → Stego message → PROCESSING & ATTACKS → DETECT → Received message → Reconstructed message (01011); a Blind/Non-blind switch connects the Host data to DETECT]
Fig. 1. The process of information hiding
In Fig. 1, the stego image is the information hidden-data. A switch indicates the different conditions of blind and non-blind information hiding. Blind information hiding means that the information messages are detected or extracted without the original image, while non-blind information hiding means that the original image is needed when detecting or extracting the information messages. Assume that the original image is an independent additive white Gaussian noise (AWGN) channel. Then, according to the well-known Shannon capacity formula, the information hiding capacity in the non-blind scenario is as follows [4,5]:
$$C = W \log_2\left(1 + \frac{P_S}{P_N}\right), \qquad (1)$$
where W is the channel bandwidth, P_S denotes the signal power constraint, P_N denotes the noise power constraint, and P_S/P_N is the signal-to-noise power ratio. For the communication process of information hiding, the signal power constraint P_S can be calculated from the variance σ_w² of the hidden information amplitude, and the noise power constraint P_N from the noise variance σ_n² [6,7]. Assume the size of the image is N × N; the number of pixels is then M = N × N. According to the Nyquist sampling theorem, at least 2W sampling points are needed to express all the pixels correctly, so the bandwidth of an image is W = M/2.
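As a quick numerical illustration of (1) and W = M/2 (the power values below are made-up, not taken from the paper):

```python
import math

def hiding_capacity(n, signal_var, noise_var):
    """Non-blind AWGN hiding capacity of an n-by-n image, Eq. (1):
    bandwidth W = M/2 with M = n*n pixels."""
    W = (n * n) / 2
    return W * math.log2(1 + signal_var / noise_var)

# e.g. a 256x256 image with sigma_w^2 = 16 and sigma_n^2 = 4
print(hiding_capacity(256, 16.0, 4.0))  # capacity in bits
```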
The study of the information hiding detection bit error rate helps us find ways to hide more information while keeping an acceptable detection bit error rate. The derived formulae do not depend on specific information hiding algorithms and are useful for the design of general algorithms for information hiding and detection. The rest of this paper is organized as follows. In Section 2, we discuss the information hiding detection error rate problem. In Section 3, we analyze the relationship between the detection error rate and the payload capacity. Experimental results are shown in Section 4. The conclusions of this paper are drawn in Section 5.
2 Information Hiding Detection BER
The information hiding capacity is the number of bits that can be hidden in a given host image. Although this definition is simple, we must distinguish between two different concepts of information hiding capacity:

1. Payload capacity C_PL: the size (in bits) of the information messages actually hidden, associated with a certain decoding error rate.
2. Theoretical capacity C: a theoretical limit on the amount of error-free embeddable information, or, inversely, on the minimum probability of error attainable for the given messages.

Therefore, capacity is the maximum possible message length for errorless decoding. An input message longer than the capacity can be used, but a zero error rate will not then be attainable. Moreover, because of involuntary attacks or noise, a zero error rate is not attainable even when the input message length (payload capacity) is less than the channel capacity. The performance of information hiding detection is measured by the bit error rate (BER), or probability of error P_B: the number of error bits over the total length of the information message in bits.
3 Detection Error Rate and Payload Capacity
According to Shannon's second coding theorem, when the information transfer rate R is less than the channel capacity C, we can find a code such that the average decoding error is arbitrarily small. If the information transfer rate R is greater than the channel capacity C, then no matter what the codeword length is, we cannot find a code that makes the average decoding error arbitrarily small. Channel coding can be used to reduce the error rate P_B. The channel coding theorem does not tell us how to find good codes, but it gives us the direction, and the confidence that suitable codes exist. Turbo codes are good recent candidates; RS codes and convolutional codes are also good codes [8,9]. The bit error rate of channel decoding is the quotient of the number of erroneous bits divided by the total number of bits transmitted in a channel. We denote the bit error rate of channel decoding as P_b. When no channel
coding is used, the error rate is decided by the information hiding detection error rate P_B. The watermark messages are embedded in images after channel coding is applied. In watermark detection, the probability of error for transmission is P_B; channel coding will decrease it. Thus, P_B and P_b together decide the total error rate of information hiding. Channel coding means an increase of redundant data, but this increase requires a higher information transfer rate, i.e., a greater channel bandwidth. According to Shannon's second coding theorem, no matter what channel code is used, the bit error rate P_b cannot reach zero when the input energy ratio E_b/N_0 ≤ −1.6 dB. We should decrease the payload capacity C_PL to decrease P_b, but if the payload capacity C_PL is too small, we can hide only a few information messages in the images. Our goal is to hide more information while keeping an acceptable detection bit error rate. When the information transfer rate is less than the channel capacity, we can decrease the bit error rate by increasing the codeword length n. When R < C, P_b can be expressed as
$$P_b = \exp[-n \cdot E_r(R)], \qquad (2)$$
where R is the information transfer rate and E_r(R) is the random-coding exponent [10],
$$E_r(R) = \max_{0 \le \rho \le 1} \max_{P(x)} \left\{E_0[\rho, P(x)] - \rho R\right\}, \qquad (3)$$
where ρ is a correction coefficient, 0 ≤ ρ ≤ 1. The function E_0[ρ, P(x)] is
$$E_0[\rho, P(x)] = -\ln \sum_{Y} \left[\sum_{X} p(x_i)\, p(y_j \mid x_i)^{\frac{1}{1+\rho}}\right]^{1+\rho}. \qquad (4)$$
According to Eq. (3), the random-coding exponent E_r(R) is the maximum of {E_0[ρ, P(x)] − ρR} over ρ, under the condition
$$\frac{\partial E_r(R)}{\partial \rho} = \frac{\partial \left[E_0(\rho, P(x)) - \rho R\right]}{\partial \rho} = 0. \qquad (5)$$
From the above equation,
$$R = \frac{\partial E_0[\rho, P(x)]}{\partial \rho}. \qquad (6)$$
If ρ = 0, then R = C; when ρ = 1, R = R_CR, where R_CR is called the critical rate:
$$R_{CR} = \left.\frac{\partial E_0[\rho, P(x)]}{\partial \rho}\right|_{\rho=1}. \qquad (7)$$
According to Eq. (3),
$$\frac{\partial E_r(R)}{\partial R} = -\rho. \qquad (8)$$
From the above equations, the random-coding exponent E_r(R) decreases as the information transfer rate R rises. According to Eq. (2), the bit
error rate P_b rises with the rise of the information transfer rate R when the codeword length is fixed. On the other hand, E_r(R) increases as the information transfer rate R drops, and the bit error rate P_b decreases. In the information-hiding channel, the bit error rate P_b rises with the increase of the payload capacity C_PL. If the payload capacity exceeds the bound of the channel capacity, the errors cannot be corrected. In a general communication channel, if we want to decrease the bit error rate by using channel coding, we should increase the codeword length, and the added redundancy can be accommodated by enlarging the bandwidth or increasing the transfer time. In the information hiding channel, however, the bandwidth is fixed and no extra transfer time is available. So we should analyze Eq. (2) according to the properties of the information-hiding channel. Assume that the maximum-length codeword is used in order to get the optimum error rate; the maximum length of the codeword is decided by the capacity of information hiding. In this case, according to Eq. (2), the random-coding exponent E_r(R) decides the error rate. The information messages are hidden in images after channel coding is applied, and the detection error rate is P_B. We view the process of information message hiding and detection as communication over a binary symmetric channel (BSC). Assume that the hidden information messages consist of two kinds of letters with the same probability, p(0) = p(1) = 1/2; the cross transition probability is then P_B. According to Eq. (4), we can calculate the function E_0[ρ, P(x)] under these conditions:
$$\begin{aligned}
E_0[\rho, P(x)] &= -\ln \sum_{Y} \left[\sum_{X} p(x_i)\, p(y_j \mid x_i)^{\frac{1}{1+\rho}}\right]^{1+\rho} \\
&= -\ln \left\{ \left[\tfrac{1}{2} p(y_0 \mid x_0)^{\frac{1}{1+\rho}} + \tfrac{1}{2} p(y_0 \mid x_1)^{\frac{1}{1+\rho}}\right]^{1+\rho} + \left[\tfrac{1}{2} p(y_1 \mid x_0)^{\frac{1}{1+\rho}} + \tfrac{1}{2} p(y_1 \mid x_1)^{\frac{1}{1+\rho}}\right]^{1+\rho} \right\} \\
&= -\ln \left\{ 2 \times \left[\tfrac{1}{2}(1 - P_B)^{\frac{1}{1+\rho}} + \tfrac{1}{2} P_B^{\frac{1}{1+\rho}}\right]^{1+\rho} \right\} \\
&= \rho \ln 2 - (1+\rho) \ln\left[P_B^{\frac{1}{1+\rho}} + (1 - P_B)^{\frac{1}{1+\rho}}\right]. \qquad (9)
\end{aligned}$$
When ρ = 1,
$$E_0[1, P(x)] = \ln 2 - 2 \ln\left(\sqrt{P_B} + \sqrt{1 - P_B}\right). \qquad (10)$$
And the random-coding exponent E_r(R) is
$$E_r(R) = E_0[1, P(x)] - R = \ln 2 - 2 \ln\left(\sqrt{P_B} + \sqrt{1 - P_B}\right) - R. \qquad (11)$$
Combining Eq. (11) with Eq. (2), we get the channel decoding error rate of the information hiding channel as
$$P_b = \exp\left\{-n_{max} \times \left[\ln 2 - 2 \ln\left(\sqrt{P_B} + \sqrt{1 - P_B}\right) - R\right]\right\}. \qquad (12)$$
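The behavior described by (2)-(3) and (9)-(12) can be evaluated numerically as in the following sketch; rates are in nats because E_0 uses the natural logarithm, and n_max, P_B, and the rate values are illustrative choices of ours. The maximization over the input distribution in (3) is fixed at the equiprobable p = 1/2 of the BSC model.

```python
import numpy as np

def E0(rho, PB):
    """E_0[rho, P(x)] for the equiprobable BSC with crossover PB, Eq. (9)."""
    a = PB ** (1 / (1 + rho)) + (1 - PB) ** (1 / (1 + rho))
    return rho * np.log(2) - (1 + rho) * np.log(a)

def Pb(R, PB, n_max, rhos=np.linspace(0.0, 1.0, 101)):
    """Channel decoding error rate, Eqs. (2)-(3); the maximization over
    rho is done on a grid, and R is in nats."""
    Er = max(E0(rho, PB) - rho * R for rho in rhos)
    return np.exp(-n_max * max(Er, 0.0))

# e.g. detection BER PB = 0.1 and maximum codeword length 1000 bits:
for R in (0.1, 0.3, 0.5):
    print(R, Pb(R, 0.1, 1000))
```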
Fig. 2. Original Lena image (a) and its stego image (b); the PSNR is 31.87 dB
Fig. 3. The detection error decreases with the increase of average information energy (C/W = 2, σ_n² = 4)
4 Experimental Results
In the experiments, 256×256 gray-scale images are used. A higher average energy of the hidden information messages means that higher-intensity messages can be embedded in the images; such messages are easier to extract, and the error rate is smaller. The information hiding payload
Fig. 4. The detection error rises with the increase of information capacity (E(w²)/σ_n² = 16)
Fig. 5. Channel decoding bit error rate for information hiding in the Lena image
capacity also affects the error rate. In the simplest case, if we assume that an information-hiding algorithm adds a given value to the amplitude of each pixel, the hidden information is only one bit. In this scenario, information hiding detection is very easy and the error rate is the smallest. If we want to hide more information messages, the modification to the host image must be more complex, and the detection error rate rises with the increase of the information hiding payload capacity. Fig. 2 shows the original Lena image and its stego image (PSNR 31.87 dB). Fig. 3 shows that the detection error decreases with the increase of the average information energy (C/W = 2, σ_n² = 4). Fig. 4 shows that the detection error rises with the increase of the information capacity (E(w²)/σ_n² = 16). Fig. 5 shows the relationship between the channel decoding error rate and the information transfer rate (payload capacity) for the Lena image. From the figure we can see that when the payload capacity is less than the critical rate R_CR, the channel decoding error rate is small, and when the payload capacity is greater than the critical rate R_CR, the channel decoding error rate increases rapidly.
5 Conclusions
This paper has analyzed the information-hiding problem based on the channel capacity and error rate theories of communication systems, and derived the relationship between the detection error rate and the information capacity and payload capacity. The derived formulae do not depend on specific information hiding algorithms, and they are useful for the design of general algorithms for information hiding and detection. From the above analysis and the experimental results, we can draw the following conclusions:
1. The detection error rate of information hiding is mainly influenced by the average information energy and the payload capacity. The error rate rises with the increase of the payload capacity of information hiding.
2. When no channel coding is used, the information hiding error rate is decided by the detection error rate. When the payload capacity C_PL is less than the channel capacity, the error rate drops with the increase of payload capacity.
3. Because the signal-to-noise ratio of the information hiding channel is small, the detection error rate P_B is high. It is therefore difficult to obtain an arbitrarily small error rate if no channel coding is used.
4. If channel coding is used, the information hiding error rate is decided by P_B and by the channel decoding bit error rate P_b. The error rate drops with the decrease of the payload capacity.
Acknowledgements

This work was supported by the Natural Science Foundation of Henan University under Grant No. 05YBZR009.
References
1. Petitcolas, F., Anderson, R., Kuhn, M.: Information Hiding - A Survey. Proceedings of the IEEE 87(7) (1999) 1062-1078
2. Cox, I., Kilian, J., Leighton, T., Shamoon, T.: Secure Spread Spectrum Watermarking for Multimedia. IEEE Transactions on Image Processing 6(12) (1997) 1673-1687
3. Cox, I., Miller, M., McKellips, A.: Watermarking as Communications with Side Information. Proceedings of the IEEE 87(7) (1999) 1127-1141
4. Moulin, P., Mihcak, M.: A Framework for Evaluating the Data-Hiding Capacity of Image Sources. IEEE Transactions on Image Processing 11(6) (2002) 1029-1042
5. Lin, C., Chang, S.: Zero-error Information Hiding Capacity of Digital Images. In: Proceedings of the IEEE International Conference on Image Processing, Greece (2001) 1007-1010
6. Zhang, F., Zhang, H.: Digital Watermarking Capacity and Reliability. In: Proceedings of e-Commerce Technology, San Diego, California, USA (2004) 295-298
7. Zhang, F., Zhang, H.: Image Watermarking Capacity and Reliability Analysis in Wavelet Domain. Journal of Imaging Science and Technology 49(5) (2005) 481-485
8. Huang, J., Shi, Y.: Reliable Information Bit Hiding. IEEE Trans. on Circuits and Systems for Video Technology 12(10) (2002) 916-920
9. Perez-Gonzalez, F., Hernandez, J., Felix, B.: Approaching the Capacity Limit in Image Watermarking: A Perspective on Coding Techniques for Data Hiding Applications. Signal Processing 81(6) (2001) 1215-1238
10. Gallager, R.: A Simple Derivation of the Coding Theorem and Some Applications. IEEE Trans. Inform. Theory 11 (1965) 3-17
Wavelet Thinning Algorithm Based Similarity Evaluation for Offline Signature Verification

Bin Fang¹, Wen-Sheng Chen², Xinge You¹, Tai-Ping Zhang¹, Jing Wen¹, and Yuan Yan Tang¹

¹ College of Computer Science, Chongqing University, Chongqing 400030, China
{fb,xyou,yytang}@cqu.edu.cn
² Department of Mathematics, Shenzhen University, Shenzhen 518060, China
[email protected]
Abstract. Structure distortion evaluation allows us to measure the similarity between signature patterns directly, without classification using feature vectors, which usually suffers from limited training samples. In this paper, we incorporate the merits of both global and local alignment algorithms to define structure distortion using signature skeletons identified by a robust wavelet thinning technique. A weak affine model is employed to globally register two signature skeletons, and the structure distortion between the two signature patterns is determined by applying an elastic local alignment algorithm. The similarity measurement is evaluated as the Euclidean distance over all found corresponding feature points. Experimental results show that the proposed similarity measurement provides sufficient discriminatory information, with an equal error rate of 18.6% using four training samples.
1 Introduction

It is well known in pattern recognition that when the ratio of the number of training samples to the feature dimensionality is small, the estimates of the statistical model parameters are not reliable and accurate, and the classification results may therefore not be satisfactory. This problem is especially significant in off-line signature verification, where usually only a few samples are available for training, such as the 2-4 signatures given when a person opens a bank account [1-9]. Most of the existing work employs various kinds of features to represent signature patterns for the purpose of off-line verification [10-12]. However, two important issues must be addressed. First, how discriminatory are the extracted features in representing the signature for the purpose of verification? While many different algorithms have been proposed, most of them are ad hoc and assume the built models are stable. Secondly, in any case, usually only a limited number of samples can be obtained to train an off-line signature verification system, which further makes off-line signature verification a formidable task. Hence, it is desirable to study straightforward means of measuring similarity between signatures by employing the 2-D signature graphs. The problem is how to define and compute a similarity measurement
between two signature patterns. Since no two signatures are identical, there is no linear or non-linear parametric model that we can employ to register or match two signature patterns; see Figure 1. If we apply a non-parametric scheme such as an elastic matching algorithm to find corresponding strokes or feature points between two signature patterns that have not been globally aligned in terms of shift and rotation, the computed similarity quantity is simply misleading. Our idea for tackling this problem is to incorporate the merits of both global and local alignment methods. We propose algorithms to register two signatures using a weak affine transformation and to measure the similarity based on a match list of feature points produced by an elastic local alignment algorithm. Experimental results show that the computed similarity measurement is able to evaluate the structure distortion between two signatures with sufficient discriminatory information for verification; the equal error rate was 18.6% with four training samples in our experiments. Both the global and local alignment methods use the identified skeleton patterns of the signatures for matching. The reason is that for the weak affine transform, distances between the two patterns are accumulated as the matching accuracy measure for the optimal parameter search; if we took all pixels of the signature into account, not only would the computational complexity be high, but the computed distance would usually not accurately reflect the matching degree either. The same holds for the local elastic matching, which uses feature points of short segments of the signature skeletons. Therefore, we adopt a robust skeleton extraction approach based on the wavelet transform of a B-spline function. The inherent properties of the modulus minima of the wavelet transform make it capable of presenting the skeleton of signature patterns effectively, almost without any side-effects, through iterative operations.
Fig. 1. (a) Original image of signature Template; (b) original image of signature Input
2 Signature Skeleton Identified by Wavelet Thinning Algorithm

Generally speaking, thinning algorithms can be classified into direct and indirect computing methods. Direct computing methods suffer from drawbacks which badly affect performance: the generated skeleton pixels may not be connected and
the resulting skeleton may not be centred inside the underlying shape. Alternatively, skeletons can be computed indirectly, where the skeleton is taken to be the locus of the symmetric points of the shape contour. Different local symmetry analyses often lead to different symmetric points, and hence different skeletons are produced, such as the Symmetric Axis Transform, Smoothed Local Symmetry, and Process-Inferring Symmetry Analysis. However, the major problem of the indirect computing techniques lies in the difficulty of accurately identifying the local symmetries of the underlying shape, and even if a pair of symmetric contour pixels can be accurately found, their corresponding symmetry center may not exist. In addition, both direct and indirect thinning methods are valid only for binary images and are computationally costly. We have proposed a robust technique [15] of low computational complexity that deals with gray-level images, based on the maximum modulus symmetry of the wavelet transform (MMSWT). The wavelet function we employ to detect edge points for centerline identification is the quadratic B-spline function, which is expressed in Equation (1) and shown in Figure 2:
$$\psi^1(x, y) = \psi\!\left(\sqrt{x^2 + y^2}\right) \frac{x}{\sqrt{x^2 + y^2}}, \qquad \psi^2(x, y) = \psi\!\left(\sqrt{x^2 + y^2}\right) \frac{y}{\sqrt{x^2 + y^2}} \qquad (1)$$
where, writing $r = \sqrt{x^2 + y^2}$,
$$\psi(r) = \begin{cases} 24r^3 - 16r^2, & r \le 0.5 \\ -8r^3 + 16r^2 - 8r, & 0.5 \le r \le 1 \\ 0, & r \ge 1 \end{cases}$$
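For reference, a small sketch evaluating the two wavelet components of Eq. (1) as reconstructed above (the piecewise coefficients follow our reading of the formula):

```python
import numpy as np

def psi_radial(r):
    """Radial profile of the quadratic-B-spline-derived wavelet, Eq. (1),
    as reconstructed above (our reading)."""
    return np.where(r <= 0.5, 24 * r**3 - 16 * r**2,
           np.where(r <= 1.0, -8 * r**3 + 16 * r**2 - 8 * r, 0.0))

def psi_xy(x, y):
    """The two wavelet components psi^1 and psi^2 at (x, y)."""
    r = np.hypot(x, y)
    p = psi_radial(r)
    p1 = np.where(r > 0, p * x / np.where(r > 0, r, 1.0), 0.0)
    p2 = np.where(r > 0, p * y / np.where(r > 0, r, 1.0), 0.0)
    return p1, p2
```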
We have proved mathematically that, for different scales of the wavelet transform, the wavelet minima are located at the mid position of the signature strokes. In particular, if the scale matches the width of the stroke well, the local minimum points are unique and form the skeleton lines of the ridge segments, which consist of a series of single pixels. Otherwise, all local minimum points form a thinned ribbon consisting of multiple pixels in the middle of the signature strokes. This property implies that the locations of the wavelet minima cover exactly the inherent central line of the shape. One practical issue is how to determine the scale of the wavelet transform: it is difficult to evaluate the width of strokes beforehand, and the strokes of one signature pattern are of different widths. Our answer is to adopt a scale much larger than the existing strokes. However, this results in multiple modulus minimum points, all of which form a continuous region of some bandwidth, called a skeleton ribbon. In order to obtain a one-pixel skeleton of the signature patterns, a multi-scale-based approach is used. Our basic idea is as follows: for each input image, we randomly choose a scale of the wavelet transform and extract the corresponding skeleton of the underlying strokes by computing all wavelet minima. All these local minimum points produce the primary skeleton ribbon of the underlying ridge, which consists of
Fig. 2. Graphical descriptions of the 2-D wavelet functions: (a) function ψ¹ and (b) function ψ²
multiple pixels. Obviously, these primary skeleton ribbons are apparently thinner than the original shapes and preserve exactly the topological properties of the original strokes. Likewise, we choose a smaller scale than the previous one to perform a second wavelet transform on the image containing the generated skeleton ribbons, and compute the second-level skeletons. This procedure is iterated until the central curves are a single pixel wide.
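The iterative multi-scale procedure can be outlined as below; wavelet_modulus_minima and is_single_pixel_wide are hypothetical helpers standing in for the minima detection and the stopping test described in the text, and the scale schedule is our illustrative choice.

```python
def wavelet_thin(image, scales=(8, 4, 2, 1)):
    """Iterative multi-scale thinning (a sketch of Sec. 2): extract the
    modulus-minima ribbon at each scale, feeding each ribbon into the
    next, smaller scale until the skeleton is one pixel wide."""
    ribbon = image
    for s in scales:                      # coarse -> fine
        # Wavelet transform of the current ribbon at scale s, then
        # extraction of the local modulus-minima points (hypothetical).
        ribbon = wavelet_modulus_minima(ribbon, s)
        if is_single_pixel_wide(ribbon):  # hypothetical width test
            break
    return ribbon
```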
3 Similarity Measure on Corresponding Stroke Feature Points

In order to measure the similarity between two signatures in terms of some distance computation, such as the Euclidean distance, it is important to accurately find the corresponding strokes or feature points of the signatures. Here, we employ a fast global registration algorithm based on a weak affine transform to align the signature patterns; it also functions as the normalization of the traditional pre-processing stage. An elastic local alignment algorithm is then applied to produce a match list of corresponding feature points between the signatures to facilitate the computation of the similarity measure.

3.1 Fast Global Registration

Signature patterns cannot simply be overlapped to find corresponding feature points, since shift and rotation exist between them. Although no two signatures from the same person are identical, which means that no parametric registration model, linear or non-linear, is applicable, considering the two major factors we propose the use of a weak affine registration of translations and rotation to roughly align the two signature patterns, in order to facilitate the elastic local alignment that finds corresponding feature points for similarity measurement. The model can be expressed mathematically as follows:
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix} \qquad (2)$$
In order to evaluate the goodness of fit between two signature patterns, a distance measure is computed in terms of the corresponding transformation. A search for the
optimal transformation is to find the global minimum of the defined distance function. The search process typically starts from a number of initial positions in the parameter space using a multi-resolution strategy. The idea behind multi-resolution matching is to search for the local optimal transformation at a coarse resolution with a large number of initial positions. Only a few promising locally optimal positions with acceptable centreline mapping errors are selected as initial positions before proceeding to the next level of finer resolution. The assumption is that at least one of them is a good approximation to the globally optimal match. The algorithm is detailed as follows. One of the two signatures to be registered is called the Template and the other the Input. Thinning is performed for both the Template and the Input so that the resulting patterns consist of lines of one-pixel width only. A sequential distance transformation (DT) is applied to create a distance map for the Template by propagating local distances [13]. The Input, at different positions corresponding to the candidate transformations, is superimposed on the Template distance map. A centreline mapping error (CME) evaluating the matching accuracy is defined as the average feature-point distance of the Input as follows:
$$CME = \frac{1}{N} \sum_{p(i,j) \in Input} DM_{Template}\big(p(i,j)\big)^2 \qquad (3)$$
where N is the total number of feature points p(i,j) of the Input signature pattern and DM is the distance map created for the Template signature pattern. Obviously, a perfect match between the Template and Input images results in a minimum value of CME. The search for the minimum CME starts from a number of combinations of initial model parameters. For each start point, the CME function is searched at neighboring positions in a sequential process by varying only one parameter at a time while keeping all the other parameters constant. If a smaller distance value is found, the parameter value is updated and a new search of the possible neighbors with smaller distance continues. The algorithm stops after all neighbors have been examined and there is no change in the distance measure. After all start points have been examined, transformations with locally minimal CME below a prefixed threshold are selected as initial positions on the next level of finer resolution. The optimal position search for maximum similarity between the signatures proceeds from coarse resolution towards fine resolution with fewer and fewer start points. The final optimal match is determined by the transformation with the smallest centreline mapping error at level 0 (the finest resolution). Once the parameters of the global transformation model have been computed, the registration between the two signatures is complete, as illustrated in Fig. 3.
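A minimal sketch of the CME of Eq. (3), assuming the distance map is computed with SciPy's Euclidean distance transform (the paper uses a sequential chamfer-style DT [13]; the EDT is our stand-in):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def centreline_mapping_error(template_skel, input_points):
    """CME of Eq. (3): mean squared distance-map value of the Input
    feature points over the Template's distance map."""
    # Distance from every pixel to the nearest Template skeleton pixel.
    dm = distance_transform_edt(~template_skel.astype(bool))
    d = np.array([dm[i, j] for i, j in input_points])
    return float((d ** 2).mean())
```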
552
B. Fang et al.
(a)
(b)
Fig. 3. (a) Overlapped image of the signature Template and signature Input in Fig. 1. before matching.(b) Overlapped image of signature Template (black, Fig. 1(a)) and signature Input (grey, Fig. 1(b)) after global registration.
(a)
(b)
Fig. 4. (a) Overlapped image of the approximated skeletons of signature Template (black) and signature Input (grey) after global registration. (b) Overlapped images after elastic local alignment by deforming the Template.
registered signature patterns. Lines and curves are approximated by fitting a set of straight lines. Each resulting straight line is then divided into smaller segments of approximately equal lengths referred as an 'element' which is represented by its slope and the position vector of its midpoint. Both signature patterns are, in turn, represented by a set of elements. Hence, the matching problem is equal to matching two sets of elements. Note that the number of elements in the two patterns need not be equal. The Template is elastically deformed in order to match the Input locally until the corresponding elements of both Input and Template meet, as illustrated in Fig. 4. The objective is to achieve local alignment while to maintain the regional structure as much as possible. We elaborately create an energy function whose original format can be found in [14] to guide the deformation process. NI
NT
i =1
j =1
(
E1 = − K 12 ¦ ln ¦ exp − T j − I i NT NT
(
+ ¦¦ w jk d Tj ,Tk − d j =1 k =1
2
)
2 K 12 f (θ Tj , Ii )
)
2 0 Tj ,Tk
(4)
Wavelet Thinning Algorithm
553
where NT = number of Template elements, NI = number of Input elements, Tj = position vector of the midpoint of the jth Template element, θTj = direction of the jth Template element, Ii = position vector of the midpoint of the ith Input element, θIi = direction of the ith Input element, θTj,Ii = angle between Template element Tj and Input element Ii, restricted within 0-90o , f(θTj,Ii) = max(cos θTj,Ii, 0.1), dTj,Tk = current value of |Tj − Tk|,
d 0Tj,Tk = initial value of |Tj − Tk|, w jk =
(
exp − Tj − Tk
¦ exp (− T NT
j
2
− Tn
n =1
2 K 22 2
)
2 K 22
)
K_1 and K_2 are the size parameters of the Gaussian windows which establish the neighbourhoods of influence; they are decreased monotonically in successive iterations. The first term of the energy function is a measure of the overall distance between the elements of the two patterns. As the size K_1 of the Gaussian window decreases monotonically in successive iterations, each I_i should have at least one T_j attracted to it in order for the energy E_1 to attain a minimum. The second term is a weighted sum of all relative displacements between each Template element and its neighbors within the Gaussian-weighted neighborhood of size parameter K_2. A Template element normally does not move towards its nearest Input element but tends to follow the weighted mean movement of its neighbors, in order to minimize the distortions within the neighborhood. E_1 is minimized by a gradient descent procedure. The movement ΔT_j applied to T_j is equal to −∂E_1/∂T_j and is given by
$$\Delta T_j = \sum_{i=1}^{N_I} u_{ij} (I_i - T_j) + 2 \sum_{m=1}^{N_T} (w_{mj} + w_{jm}) \left[(T_m - T_m^0) - (T_j - T_j^0)\right] \qquad (5)$$
where T_j^0 = initial value of T_j and
$$u_{ij} = \exp\left(-\left|I_i - T_j\right|^2 \big/ 2K_1^2\right) f(\theta_{I_i, T_j}) \Big/ \sum_{n=1}^{N_T} \exp\left(-\left|I_i - T_n\right|^2 \big/ 2K_1^2\right) f(\theta_{I_i, T_n})$$
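One gradient-descent sweep of Eq. (5) can be written compactly with NumPy, as below; T and I are (n, 2) arrays of element midpoints, theta_T and theta_I their directions in radians, and the annealing loop that shrinks K1 and K2 between sweeps is left to the caller.

```python
import numpy as np

def elastic_step(T, T0, I, theta_T, theta_I, K1, K2):
    """One update of Eq. (5): attraction toward Input elements plus the
    neighborhood-smoothness term (a sketch of Sec. 3.2)."""
    d2_TI = ((I[:, None, :] - T[None, :, :]) ** 2).sum(-1)   # |I_i - T_j|^2
    ang = np.abs(theta_I[:, None] - theta_T[None, :]) % np.pi
    ang = np.minimum(ang, np.pi - ang)                        # restrict to 0..90 deg
    f = np.maximum(np.cos(ang), 0.1)                          # f(theta), Eq. (4)
    u = np.exp(-d2_TI / (2 * K1 ** 2)) * f
    u /= u.sum(axis=1, keepdims=True)                         # u_ij, normalized over j
    attract = u.T @ I - u.sum(axis=0)[:, None] * T            # sum_i u_ij (I_i - T_j)
    d2_TT = ((T[:, None, :] - T[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2_TT / (2 * K2 ** 2))
    w /= w.sum(axis=1, keepdims=True)                         # w_jk
    disp = T - T0                                             # (T_j - T_j^0)
    A = w + w.T                                               # (w_mj + w_jm)
    smooth = 2 * (A @ disp - A.sum(axis=1)[:, None] * disp)
    return T + attract + smooth
```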
3.3 Similarity Evaluation

At the end of the iterations of the elastic local alignment, the corresponding elements of the two signature patterns should be nearest to each other, as shown in Fig. 4(b), and a match list of feature points has been produced. It is then trivial to compute the Euclidean distance between two matched feature points by referring to their original coordinate positions. We define the mean of the squared Euclidean
distances of all found corresponding feature points as the similarity measure between two signatures (SQ):
$$SQ = \frac{1}{N} \sum_{i=1}^{N} \left(T_i - S_i'\right)^2 \qquad (6)$$
where T_i is the position vector of a feature point of the Template, S_i' is the position vector of the corresponding feature point of the Input, and N is the total number of feature points in the Template. As we wanted to test the robustness and effectiveness of the proposed algorithms, only one sample signature was used as the Template in the global and local alignment. Another three genuine samples were employed to determine the threshold that decides whether a test signature is genuine or not. Any test signature with an SQ value larger than the threshold is rejected as a forgery.
4 Experimental Results

The database used to test the proposed algorithms was collected from 55 authors, each contributing 6 genuine signature samples, and 12 forgers who produced 6 forgeries for each author. Four of the 6 genuine signatures were used as training samples: one was arbitrarily selected as the Template, and the remaining three were used to determine a suitable threshold. A cross-validation strategy was adopted to compute the performance in terms of the EER (Equal Error Rate), the point at which the FAR equals the FRR. To globally register two signature patterns using the weak affine model through the multi-resolution approach, the depth of the multi-resolution is set to 2, resulting in 2nd-level images of size 32×32. There are 54 initial positions at the lowest resolution, namely 3×3 translation points, separated by 3 pixels, and 6 equidistant rotation angles. In applying the elastic local alignment method to find corresponding feature points between two signature patterns, each line or curve is approximated by fitting a sequence of short straight lines ('elements') about 10 pixels long. The neighbourhood size parameters were set to 20 pixels. On average, the EER was 18.6%, which is comparable to other existing methods [2-5] for off-line signature verification. The EER was computed by varying the threshold from 1.0 to 2.5 standard deviations from the mean of the SQ values of the three training samples.
5 Conclusion

Feature-based signature verification suffers from the lack of training samples, which leads to unstable and inaccurate statistical models. In order to avoid building a statistical model and to measure the similarity between two signature patterns directly, we propose the use of a global registration algorithm to roughly align two signatures, facilitating the computation of a similarity measurement between the two signature patterns. After applying an elastic local alignment algorithm, we are able to produce a match list from which we compute the similarity quantity (SQ) for verification. In the experiment, four sample signatures were employed as training samples to determine the threshold,
and the results were promising. Using more training samples and incorporating a statistical approach should achieve better performance; this is our future work.
References
1. Plamondon, R., Srihari, S.N.: On-Line and Off-Line Handwriting Recognition: A Comprehensive Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(1) (2000) 63-84
2. Ammar, M.: Progress in Verification of Skillfully Simulated Handwritten Signatures. Int. J. Pattern Recognition and Artificial Intelligence 5(1) (1991) 337-351
3. Sabourin, R., Genest, G., Prêteux, F.: Off-line Signature Verification by Local Granulometric Size Distributions. IEEE Trans. Pattern Analysis and Machine Intelligence 19(9) (1997) 976-988
4. Sabourin, R., Genest, G.: An Extended-shadow-code-based Approach for Off-line Signature Verification, Part I: Evaluation of the Bar Mask Definition. In: Proc. Int. Conference on Pattern Recognition (1994) 450-453
5. Raudys, S.J., Jain, A.K.: Small Sample Size Effects in Statistical Pattern Recognition: Recommendations for Practitioners. IEEE Trans. Pattern Analysis and Machine Intelligence 13(3) (1991) 252-264
6. Fukunaga, K.: Introduction to Statistical Pattern Recognition. Second Edition. Academic Press, Boston (1990)
7. Murshed, N., Sabourin, R., Bortolozzi, F.: A Cognitive Approach to Off-line Signature Verification. In: Automatic Bankcheck Processing. World Scientific Publishing Co., Singapore (1997)
8. O'Sullivan, F.: A Statistical Perspective on Ill-posed Inverse Problems. Statistical Science 1 (1986) 502-527
9. Fang, B., Tang, Y.Y.: Reduction of Feature Statistics Estimation Error for Small Training Sample Size in Off-line Signature Verification. In: First International Conference on Biometric Authentication. Lecture Notes in Computer Science, Vol. 3072. Springer-Verlag, Berlin Heidelberg New York (2004) 526-532
10. Qi, Y., Hunt, B.R.: Signature Verification Using Global and Grid Features. Pattern Recognition 27(12) (1994) 1621-1629
11. Sabourin, R., Genest, G., Prêteux, F.: Off-line Signature Verification by Local Granulometric Size Distributions. IEEE Trans. Pattern Analysis and Machine Intelligence 19(9) (1997) 976-988
12. Fang, B., Leung, C.H., Tang, Y.Y., Tse, K.W., Kwok, P.C.K., Wong, Y.K.: Offline Signature Verification by the Tracking of Feature and Stroke Positions. Pattern Recognition 36(1) (2003) 91-101
13. Borgefors, G.: Hierarchical Chamfer Matching: A Parametric Edge Matching Algorithm. IEEE Trans. Pattern Analysis and Machine Intelligence 10(6) (1988) 849-865
14. Leung, C.H., Suen, C.Y.: Matching of Complex Patterns by Energy Minimization. IEEE Transactions on Systems, Man and Cybernetics, Part B 28(5) (1998) 712-720
15. Tang, Y.Y., You, X.G.: Skeletonization of Ribbon-like Shapes Based on A New Wavelet Function. IEEE Trans. Pattern Analysis and Machine Intelligence 25(9) (2003) 1118-1133
When Uncorrelated Linear Discriminant Analysis Are Combined with Wavelets

Xue Cao and Jing-Yu Yang

Department of Computer Science, Nanjing University of Science and Technology, Nanjing 210094, P.R. China
[email protected]
Abstract. This paper presents a novel and interesting combination of uncorrelated linear discriminant analysis and wavelets for extracting features for face recognition. Compared with the conventional Fisherface method and the ULDA+PCA method, the proposed algorithm has an improved recognition rate and a reduced computational load for face images of high resolution. In the proposed technique, face images are divided into smaller sub-images by the 2-D DWT, and uncorrelated linear discriminant analysis is applied to the approximation sub-images. The time cost of the proposed method is greatly reduced, and recognition rates between 95% and 97.5% are obtained on the ORL database. An average error rate of 1.4% is obtained in the experiments on the NUST603 database. In addition, the effect of the number of discriminant vectors on the recognition system is systematically discussed.
1 Introduction

Face recognition has received extensive attention within the past 20 years because of its potential applications in many fields, such as access control, electronic commerce, surveillance, commodity marketing and fund dealing, and PC and network security. Research work on face recognition addresses three main problems: segmentation, feature extraction, and recognition. Extracting efficient features of face images is the key to the task of face recognition. Fisher discriminant analysis (LDA or FDA) is a very important method for feature extraction. Foley-Sammon linear discriminant analysis (FSLDA) and uncorrelated linear discriminant analysis (ULDA) are two well-known kinds of Fisher discriminant analysis. The basic idea of Fisher discriminant analysis is to maximize the Fisher criterion function and then project the high-dimensional feature vector onto the obtained optimal discriminant vector, constructing a 1-D feature space [1]. In 1970, Sammon proposed the optimal discriminant plane based on the Fisher linear discriminant. In 1975, Foley and Sammon extended Sammon's method and presented the discriminant method for two-class problems [2]. Duchene and Leclercq solved the problem of finding the set of Foley-Sammon optimal discriminant vectors for multiclass problems [3]. In 2001, Jin and Yang proposed a method to extract optimal discriminant features using the uncorrelated discriminant transformation [4]. LDA-based face recognition systems have also been developed, and encouraging results have been achieved. However, the LDA approach suffers from a
small sample size (SSS) problem. This problem occurs when the number of training samples is less than the total number of pixels in an image. In this situation, the within-class scatter matrix is singular, and its inverse cannot be calculated. The best-known techniques proposed to overcome the SSS problem are the Fisherface method [5] and the ULDA+PCA method [4]. The Fisherface method combines Principal Component Analysis (PCA) with LDA (it is also known as PCA+LDA), and the ULDA+PCA method combines PCA and ULDA. Although PCA has been proved an effective approach for human face recognition, it suffers from two limitations. First, the accuracy decreases when the face alignment deviates [6]. Second, finding the eigenvectors incurs a high computational load [7]. In this paper, we propose a novel method (termed ULDA-Wavelet) for recognizing frontal views of faces. The proposed method is based on multi-resolution analysis (MRA) and ULDA. Each face image is described by a subset of band-filtered images containing wavelet coefficients. From these wavelet coefficients, which characterize the face texture, we build uncorrelated and meaningful face feature vectors using ULDA. Experimental results are presented using images from the ORL database and the NUST603 database. The efficiency of the proposed approach is analyzed, and our results are compared with those obtained using the well-known Fisherface method and the ULDA+PCA method. The outline of this paper is as follows. In Section 2, MRA and the discrete wavelet transform (DWT) are briefly reviewed. The ideas of FSLDA and ULDA are explained briefly and the ULDA-Wavelet method is proposed in Section 3. Experimental results and discussion are reported in detail in Section 4. Finally, conclusions are given in Section 5.
2 Multi-resolution Analysis (MRA) and Discrete Wavelet Transform (DWT)

Multi-resolution analysis (MRA) was first published in 1989 by Mallat, and subsequent research and development in wavelet analysis has found numerous applications in areas such as signal processing, image processing, and pattern recognition, with many encouraging results [8,9]. Since that time, it has become a very important tool in these and related fields. Two basic properties, space-frequency localization and MRA, make it a very attractive tool in signal and image analysis. Taking the wavelet transform of an image involves a pair of filters, one high-pass and one low-pass. This is followed by
Fig. 1. The structure of the 2-D DWT: (a) one level; (b) two levels
Fig. 2. Sample one-level wavelet decomposed image
decimation by two, repeated for as many scales as desired. The image is thus decomposed, i.e., divided into four sub-bands and critically sub-sampled, by applying the 2-D discrete wavelet transform, as shown in Fig. 1 and Fig. 2.
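For illustration, a one-level 2-D DWT decomposition with PyWavelets (our choice of library and wavelet; the paper does not specify either):

```python
import numpy as np
import pywt

img = np.random.rand(92, 112)              # stand-in for a face image
cA, (cH, cV, cD) = pywt.dwt2(img, 'db2')   # approximation + 3 detail bands
print(cA.shape)                            # roughly half-size sub-image
```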
3 ULDA-Wavelet Feature Extraction

3.1 Outline of FSLDA and ULDA

FDA is a very important method for feature extraction, and FSLDA and ULDA are two well-known kinds of Fisher discriminant analysis. Our discussion of discriminant vectors is based on the Fisher criterion:
$$F(\varphi) = \frac{\varphi^T S_b \varphi}{\varphi^T S_w \varphi} \qquad (1)$$
where S_b and S_w are the between-class scatter matrix and the within-class scatter matrix, respectively. The vector φ₁ corresponding to the maximum of F(φ) is the Fisher optimal discriminant direction, i.e., Fisher's vector. This means that the set of samples projected onto the direction φ₁ has the minimal within-class scatter and the maximal between-class scatter in the one-dimensional subspace spanned by φ₁. Fisher's vector φ₁ is the eigenvector corresponding to the maximum eigenvalue of the following eigen-equation:
$$S_b \varphi_1 = \lambda S_w \varphi_1 \qquad (2)$$
Suppose r directions φ₁, φ₂, …, φ_r (r ≥ 1) have been obtained. We can obtain the (r+1)th direction φ_{r+1}, which maximizes the Fisher criterion function F(φ) under the following orthogonality constraints:
$$\varphi_{r+1}^T \varphi_i = 0 \quad (i = 1, 2, \ldots, r) \qquad (3)$$
Based on Jin's algorithm [4], the eigenvector corresponding to the maximum eigenvalue of Eq. (4) can be taken as the ith FSLDA discriminant vector (i > 1):
$$M S_b \varphi = \lambda S_w \varphi \qquad (4)$$
where M = I − Dᵀ(D S_w⁻¹ Dᵀ)⁻¹ D S_w⁻¹ and D = [φ₁ φ₂ ⋯ φ_{i−1}]ᵀ; I is the identity matrix and φ₁, φ₂, …, φ_{i−1} are the previous i−1 FSLDA discriminant vectors.
Based on the optimal discriminant vectors φ₁, φ₂, …, φ_k, we can define the following linear transform:
$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{bmatrix} = \begin{bmatrix} \varphi_1^T \\ \varphi_2^T \\ \vdots \\ \varphi_k^T \end{bmatrix} X \qquad (5)$$
However, any two features y_i and y_j (j ≠ i) obtained from the Foley-Sammon discriminant vectors are statistically correlated. The accuracy of statistical pattern classifiers increases as the number of features increases, but decreases when the number becomes too large [10]. ULDA obtains uncorrelated feature components. The eigenvector corresponding to the maximum eigenvalue of Eq. (6) can be taken as the ith ULDA discriminant vector (i > 1):
$$M S_b \varphi = \lambda S_w \varphi \qquad (6)$$
T T −1 T −1 −1 Where M = I − S t D ( DS w S t D ) DS t S w , D = [φ1φ2 φi −1 ] . I is the identity
matrix. 1 2 i-1 are the previous i-1 ULDA discriminant vectors [4]. The generalized eigen equation (2) has no solutions for FSLDA and ULDA when the within-class scatter matrix Sw is singular. Generally speaking, the Sw is nonsingular in the case of a large number of samples, and singular in the case of a small number of samples. In Jin’s research [4], PCA is suggested to employ for dimension reduction. 3.2 ULDA-Wavelet Feature Extraction
The above two approaches both use PCA. Using PCA, the high dimensional face data is projected to a low dimensional space and then LDA or ULDA is performed in this PCA subspace. However, the discarded subspace may also encode some information helpful for recognition, this removal may introduce a loss of discriminative information. The second problem in PCA-based method is the high computational load in finding the eigenvectors. The computational complexity of PCA is O(r3),where r=min(N,d), d is the number of pixels in the training images and N is the number of training images [11]. To overcome these problems mentioned above, we resort to 2-D DWT. Wavelet transform is chosen in image analysis and image decomposition because:(1) By decomposing an image using DWT, the resolutions of the sub-images are reduced. In turn, the computational complexity will be reduced dramatically by working on a lower resolution image. (2) Wavelet decomposition provides local information in both space domain and frequency domain. It is well known that the 2-D DWT of a discrete image f(m,n) represents the image in terms of 3J+1 sub-images [12]:
A2 f , {D2(1) , D2( 2 ) , D2( 3) } j −J
Where
D2(3) −j
A2 f −J
−j
−j
−j
= 1, 2 , , J
.
(7) (1)
( 2)
is the approximation of the image at resolution 2-J, D2 − j , D2− j and
are the wavelet sub-images containing the image details at resolution 2-J. If the
560
X. Cao and J.-Y. Yang
A2
original image has a size of n×n, then sub-image (2− J ⋅ n) × (2− J ⋅ n) ,sub-image
D2(1) f , D2( 2 ) f −j
−j
and
−J
has a size of
D2(3) f have a size of −j
( 2 − j ⋅ n) × ( 2 − j ⋅ n ) .
We employ a J-level wavelet transform on the images. We get approximation subimages A 2 − J . Lai et al. [6] have demonstrated that (1) the effect of different facial expressions can be attenuated by removing the high-frequency components and (2) the low-frequency components are sufficient for recognition. As such, we proposed to use the low- frequency subband (approximation sub-image) for feature extraction. For those approximation sub-images, we extract features using ULDA as introduced in section 3.1. We can obtain the optimal uncorrelated discriminant vectors ĭ1, ĭ2,…, ĭk, and then perform the linear transform according to Eq. (5). In our method, the extraction features are more effective because wavelet has locality both in space domain and frequency domain. And the extraction features on wavelet sub-image are statistically uncorrelated. This makes the accuracy of statistical pattern classifiers stable as the number of features becomes too large. Moreover, as fast algorithm of wavelet transform exists, the complexity of wavelet transform is linear proportion to the size of the image.
4 Experiments and Discussion In this section, in order to test the algorithm presented above, we conducted a series of experiments using two different sets of test images. The first set is extracted from ORL database. There are ten different images of 40 distinct subjects. All the images were taken against a dark homogeneous background with the subjects in an upright frontal position with tolerance for some tilting and rotation of up to about 20. There is some variation in scale of up to about 10%. The images are grey scale with a resolution of 92×112. By reducing the resolution of 92×112 images, we can obtain 46×56 images in which any elements is the average value of a corresponding 2×2 sub-image of 92×112 images. In the same way, we can obtain 23×28 images and 12×14 images. The second set is extracted from the NUST603 database, and contains a set of face images taken in 1997 at Nanjing University of Science and Technology, China. There are 10 different images of 96 distinct subjects. All the images were taken against some moderately cluttered backgrounds with the subjects in an upright frontal position, with tolerance for some tilting and rotation. By normalizing method, 960 32×32 normalized faces can be extracted form 960 original images. By reducing the resolution of 32×32 images, we can obtain 16×16 images. Sample images from the two sets are displayed in figure 3 and 4. All experiments were carried out under the same programming environment: P 1.7G and MATLAB 6.5.A nearest neighbor classifier is used for classification. The nearest neighbor classifier makes decision with the minimum rule after the distances between a given test sample and anyone of mean vectors of training samples for each class are computed.
When Uncorrelated Linear Discriminant Analysis Are Combined with Wavelets
Fig. 3. Images from the ORL database
561
Fig. 4. Images from the NUST603 database
4.1 Experiment 1
This experiment aims to compare the face recognition performances obtained from Fisherface method, the ULDA+PCA method and our method, ULDA-Wavelet. Using the ORL database, the first five images are for training and the other five images are for test per class. For the four different resolutions mentioned above, we can obtain a 42-dimensional vectors after applying J-level wavelet decomposition (J=4,3,2,1). Then, by performing the uncorrelated discriminant transformation, we can obtain 39dimensional uncorrelated discriminant feature. The results of these experiments are displayed in Table 1. Using the NUST603 database, the first four images are for training and the other six images are for test per class. For the two different resolutions mentioned above, Table 1. Comparison of recognition performance obtained for ULDA-Wavelet method, Fisherface and ULDA+PCA method for the four different resolutions using ORL database, J refers to the level of wavelet decomposition, T time-cost (s), R recognition rate (%) Method Resolution
92×112 46×56 23×28 14×12
Fisherface
ULDA+PCA
ULDA-Wavelet
T
R
T
R
T
R
J
1658.93 456.03 128.16 35.187
89.5 89.5 90 89.5
1651.3 460.95 117.83 32.13
88 88.5 88.5 94
58.59 38.95 32.44 27.72
95 95 95 91.5
4 3 2 1
Table 2. Comparison of recognition performance obtained for ULDA-Wavelet method, Fisherface and ULDA+PCA method for the two different resolutions using NUST603 database, J refers to the level of wavelet decomposition, T time-cost (s), E error rate (%) Method Resolution
32×32 16×16
Fisherface
ULDA+PCA
ULDA-Wavelet
T
E
T
E
T
E
375.91 125.3
2.6 2.8
377.74 125.14
1.7 0.9
90.296 80.766
1.6 1.7
J 2 1
562
X. Cao and J.-Y. Yang
we can obtain 64-dimensional vectors after applying J-level wavelet decomposition (J=2,1) and then obtain 64-dimensional uncorrelated discriminant feature by performing the uncorrelated discriminant transformation. The results of these experiments are displayed in Table 2. According to Table 1 and Table 2, we have the following conclusions: (1) Since the discrete wavelet transform is an effective tool for reducing the dimension of images, the computation of the feature extraction using the proposed ULDA-Wavelet method is much more efficient than that of the Fisherface and ULDA+PCA methods. As discussed in section 3.2, the computation complexity of wavelet transform is O(d),where d is the number of pixels in the training images and the computation complexity of PCA is O(N3), where N is the number of training images. (2) In general, the recognition rate of the present method is much higher than that of Fisherface and ULDA+PCA methods for the high resolutions .The reason may be that the level of wavelet decomposition increases with increasing the images resolution. Therefore, the locality of wavelet both in space domain and frequency domain becomes more apparent. 4.2 Experiment 2
In order to test the performance of Fisherface, ULDA+PCA and ULDA-Wavelet methods for varying number of discriminant vectors, we perform the following experiments. With the number of discirminant vectors increasing, the recognition rate increases gradually. However, excessive features induce the decreasing of the recognition rate owing to the correlation of the features. But the recognition rate will not decrease with the increasing of uncorrelated features. Figure 5 shows the recognition rates of Fisherface, ULDA+PCA and ULDA-Wavelet for varying number of discriminant vectors. It can be seen from Fig.5 that the recognition rates increase for all three methods with increasing number of discriminant vectors. For Fisherface method, there is a decrease in recognition rate when N>29 for the four resolutions. N refers to the number of discirminant vectors. For ULDA+PCA and ULDA-Wavelet, however, the recognition rates do not decrease even as N becomes too large because the discriminant features are uncorrelated. Especially, for ULDAWavelet, the recognition rate is higher than 90% and increases steadily with the number of discriminant vectors increasing when N>13 for high resolutions. Figure 6 shows the error rates of Fisherface, ULDA+PCA and ULDA-Wavelet for varying number of discriminant vectors (N). It can be seen from Fig.6 that the error rates decrease for all three methods with increasing number of discriminant vectors. For ULDA+PCA and ULDA-Wavelet, the error rates do not increase even as N becomes very large because the discriminant features are uncorrelated. For Fisherface method, however, there is an increase in error rates when N>15 for the two resolutions. 4.3 Experiment 3
Now let us have a look at the influence of the number of training samples on the performance of ULDA-Wavelet method by changing the number of training samples for each class. When using the ORL database, the first k (k=4,5,6) images are arranged as training samples and the other images are test samples for each class.
When Uncorrelated Linear Discriminant Analysis Are Combined with Wavelets
(a) Resolution 92×112
(c) Resolution 23×28
563
(b) Resolution 46×56
(d) Resolution 12×14
Fig. 5. Comparison of recognition rates of Fisherface, ULDA+PCA and ULDA-avelet with varying number of discriminant vectors for the four resolutions using ORL database
(a) Resolution 32×32
(b) Resolution 16×16
Fig. 6. Comparison of error rates of Fisherface, ULDA+PCA and ULDA-Wavelet with varying number of discriminant vectors for the two resolutions using NUST603 database
564
X. Cao and J.-Y. Yang
ULDA-Wavelet can be regarded as the most optimal for high resolution .The average recognition rate of the ULDA-Wavelet is 95.8% for the resolution 92×112, 96.7% for the resolution 46×56 and 96% for the resolution 23×28. For the three resolutions, all the average recognition rates of the ULDA-Wavelet are superior to those of the ULDA+PCA and Fisherface methods. Table 3 gives a summary of the experiments. When using the NUST603 database, let the first k (k=3,4,5) images be training samples and the other images are test samples for each class. The results of the experiments are displayed in Table 4. It can be seen from Table 4 that, for the resolution 32×32, the lowest average error rate is 1.43% obtained by ULDA-Wavelet method and the average error rate of the Fisherface and ULDA+PCA is 1.6% and 2.83%, respectively. Table 3. Comparison of recognition rate (%) obtained from ULDA-Wavelet, Fisherface and ULDA+PCA for the four different resolutions and different number of training samples per class (N) using ORL database. R refers to the average recognition rate, F fisherface method, UP ULDA+PCA method, UW ULDA-Wavelet method. 92×112
46×56
N F
UP
UW
F
UP
UW
4 5 6
89 89.5 94.4
86.6 88 93.7
95 95 97.5
90.4 89.5 94.3
90.4 88.5 93.7
97.5 95 97.5
R
91
89.4
95.8
91.4
90.9
96.7
23×28
12×14
N F
UP
UW
F
UP
UW
4 5 6
90 90 94.4
90 88.5 94.3
95.4 95 97.5
88.8 89.5 91.3
92.9 94 96.3
93.3 91.5 98.1
R
91.5
91
96
89.9
94.4
94.3
Table 4. Comparison of error rates (%) obtained from ULDA-Wavelet, Fisherface and ULDA+PCA for the two different resolutions and different number of training samples per class (N) using NUST603 database. E refers to the average error rate, F fisherface method, UP ULDA+PCA method, UW ULDA-Wavelet method. 32×32
16×16
N F
UP
UW
F
UP
UW
3 4 5
4 2.6 1.9
2.3 1.7 0.8
2.1 1.6 0.6
4.6 2.8 1.7
2.2 0.9 0.8
2.1 1.7 0.6
E
2.83
1.6
1.43
3.03
1.3
1.47
When Uncorrelated Linear Discriminant Analysis Are Combined with Wavelets
565
From these results, the ULDA-Wavelet method deals better with different numbers of training samples for the high resolutions, which may mean that the proposed method is stable and tolerant to changes in number of training faces as well as in high resolutions.
5 Conclusions This paper deals with the important problem of extracting discriminant features for pattern classification. A novel method (termed ULDA-Wavelet) incorporating the basic idea of classical ULDA and wavelet was proposed to reduce the computational load and increase recognition rate for those images with high resolution. Experimental results on the ORL databases have shown that the ULDA-Wavelet features outperform the features extracted by the Fisherface and ULDA+PCA method in the performance of face recognition for images with high resolution. The recognition rate of ULDA-Wavelet method increases with the number of discriminant vectors increasing. Moreover, the recognition rate keeps stable when the number of discriminant vectors becomes very large. As an extension of this work, we are well aware that lots of other factors may affect the performance, and there is still large room to improve the accuracy. For instance, other distance classifier and wavelet basis should be experimented with.
References 1. Fisher, R. A.: The Use of Multiple Measurements in Taxonomic Problems. Ann. Eugenics, 7 (1936) 178-188 2. Foley, D. H., Sammon, J. W.: An Optimal Set of Discriminant Vectors. IEEE Trans. Computation, 24 (1975) 281-289 3. Duchene, J., Leclercq, S.: An Optimal Transformation for Discriminant and Principal Component Analysis. IEEE Trans. Pattern Anal. Mach. Intell. 10 (1988) 978-983 4. Jin, Z., Yang, J. Y.: Face Recognition Based on Uncorrelated Discriminant TransFormation. Pattern Recognition, 34 (2001) 1405-1416 5. Belhumeur, P. N., Hespanha, J. P., Kriegman, D. J.: Eigenfaces vs.fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Trans. Pattern Anal. Machine Intell. 19 (1997) 711–720 6. Lai, J. H., Yuen, P. C., Feng, G. C.: Face Recognition Using Holistic Fourier Invariant Features. Pattern Recognition, 34 (2001) 95-109 7. Feng, G. C., Yuen, P. C., Dai, D. Q.: Human Face Recognition Using PCA on Wavelet Subband. Journal of Electronic Imaging 19 (2000) 226-233 8. Mallat, S.: A Theory for Multiresolution Signal Decomposition: The Wavelet Representation. IEEE Trans. Pattern Anal. Mach.Intell. 11 (1989) 674-693 9. Daubechies, I., Mallat, S. G., Willsky, A.S.: Introduction to the Special Issue on Wavelet Transforms and Multiresolution Signal Analysis. IEEE Trans. Inform. Theory, 38 (1992) 529-531 10. Hughes, G .F.: On the Mean Accuracy of Statistical Pattern Recognizers. IEEE Trans. Inform. Theory, 14 (1968) 55-63 11. Kirby, M., Sirovich, L.: Application of Karhunen-Loeve Procedure for The Characterization of Human Faces. IEEE trans. on Pattern Anal. Mach. Intell. 12 (1990) 103-108 12. Castleman, K. R.: Digital Image Processing. Englewood Cliffs, Prentice Hall, NJ (1996)
2D Direct LDA for Efficient Face Recognition Un-Dong Chang, Young-Gil Kim, Dong-Woo Kim, Young-Jun Song, and Jae-Hyeong Ahn Chungbuk National University, 12 Gaeshin-dong, Heungduk-gu, Chungbuk, Korea {[email protected], [email protected], [email protected], [email protected], [email protected]}
Abstract. In this paper, a new feature representation technique called 2dimensional direct LDA(2D-DLDA) is proposed. 2D-DLDA directly extracts the image scatter matrix from 2D image and uses direct LDA for face recognition. The proposed method alleviates computational difficulty and effectively solves the so-called small sample size problem which exists in most face recognition. The ORL face database is used to evaluate the performance of the proposed method. The experimental results indicate that the proposed method is effective and feasible.
1 Introduction Over the past 20 years, face recognition (FR) has been an active research. Various methods have been proposed for FR. Especially, the appearance-based methods have been successfully employed. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are well known methods among them. PCA seeks directions that have the largest variance associated with it. On the other hand, LDA seeks directions that are efficient for discrimination between classes. PCA based methods have been developed since the eigenfaces methods[1,2] for face recognition was presented. Recently, Yang et al. [3] proposed 2D-PCA . While previous methods use 1D image vector, 2D-PCA makes directly the scatter matrix from 2D image matrices. 2D-PCA deals with the small size scatter matrix than traditional PCA-based methods and evaluates the scatter matrix accurately. For example, an image vector of 112×92 forms 10304 dimensional vector and the size of the scatter matrix is 10304×10304. On the other hand, the covariance of 2D-PCA forms only 92×92 matrix. Also, 2D-PCA is more suitable for small sample size problems because its scatter matrix is small. But 2D-PCA requires more coefficients for image representation than PCA. It needs more storage and more time for recognition. Belhumeur et al. [4] proposed Fisherfaces method based on LDA. In general, it is believed that LDAbased pattern classification methods outperform PCA-based ones. However, Traditional LDA has small sample size (SSS) problem. Also, it is difficult to directly apply to high dimensional matrix because of computational complexity. To solve the problem, Belhumer et al proposed dimensionality reduction using PCA before LDA. A potential problem is that PCA step may discard dimensions that contain important discriminative information. Chen et al. [5] have proved that the null space of within-class scatter D.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNCIS 345, pp. 566 – 571, 2006. © Springer-Verlag Berlin Heidelberg 2006
2D Direct LDA for Efficient Face Recognition
567
matrix contains the most discriminative information. In reality, PCA discards the null space of the within-class scatter matrix. To prevent the null space from discarding, Yu et al. [6] proposed direct LDA (D-LDA) method. D-LDA directly processes data in the original high dimensional vectors. By simultaneous diagonalization, D-LDA is able to discard the null space of between-class scatter matrix and to keep the null space of withinclass scatter matrix, which contains very important discriminative information. J. Lu et al. [7] proposed kernel direct discriminant analysis (KDDA). While KDDA provides better performance, it is computationally more than D-LDA. In this paper, we introduce a new low-dimensional feature representation method, 2DDLDA. The method uses the image scatter matrix and uses D-LDA for face recognition. The image scatter matrix can evaluate the image scatter matrix accurately. And then DLDA method is used for obtaining the feature matrix. It maximizes Fisher's criterion. The remainder of this paper is organized as follows. In Section 2, the proposed 2DDLDA algorithm is described. Experimental results and comparisons with D-LDA and KDDA are presented in Section 3. Finally, conclusions are offered in Section 4.
2 2D-DLDA 2.1 Feature Extraction Let X denotes a m×n image, and W is an n-dimensional column vector. X is projected onto W by the following linear transformation
Y = XW
(1)
Thus, we get an m-dimensional projected vector Y , called the feature vector of the image X . Suppose there are C known pattern classes in the training set, and M denotes the size of the training set. The j th training image is denoted by a m ×n matrix X j ( j = 1, 2, , M ) and the mean image of all training sample is denoted by X and X i (i = 1, 2, , c) denoted the mean image of class Ti and N i is the number of samples in class Ti , the projected class is Pi . After the projection of training image onto W , we get the projected feature vector Y j = X jW ,
j = 1, 2, , M
(2)
LDA attempts to seek a set of optimal discriminating vectors to form a transform W by maximizing the Fisher criterion denoted as J (W ) =
~ tr ( S b ) ~ tr ( S w )
(3)
where tr(ǜ) denotes the trace of matrix, S~b denotes the between-class scatter matrix of ~ projected feature vectors of training images, and S w denotes the within-class scatter matrix of projected feature vectors of training images. So,
568
U.-D. Chang et al. C C ~ S b = ¦ N i (Yi − Y )(Yi − Y ) T = ¦ N i [( X i − X )W ][( X i − X )W ]T , i =1
C ~ Sw = ¦
i =1
(4)
C
¦ (Y
k
i =1 Yk ∈Pi
− Yi )(Yk − Yi ) T =¦
¦ [( X
k
− X i )W ][( X k − X i )W ]T
i =1 X k ∈Ti
So, ~ § C · tr ( S b ) = W T ¨ ¦ N i ( X i − X ) T ( X i − X ) ¸W , © i =1 ¹
(5)
§ C · ~ tr ( S w ) = W T ¨¨ ¦ ¦ ( X k − X i ) T ( X k − X i ) ¸¸W © i =1 X k ∈Ti ¹ Let us define the following matrix C
C
Gb = ¦ N i ( X i − X )T ( X i − X ), Gw = ¦ i =1
¦(X
k
− X i )T ( X k − X i )
(6)
i =1 X k ∈Ti
The matrix Gb is called the image between-class scatter matrix and Gw is called the image within-class scatter matrix. Alternatively, the criterion can be expressed by J (W ) =
W T GbW W T G wW
(7)
Now, we try to find a matrix that simultaneously diagonalizes both Gb and Gw . AG w AT = I , AGb AT = Λ
(8)
Where Λ is a diagonal matrix with diagonal elements sorted in decreasing order. First, we find eigenvectors V that diagonalizes Gb V T GbV = Λ
(9)
Where V V = I . Λ is a diagonal matrix sorted in decreasing order, i.e. each column of V is an eigenvector of Gb , and Λ contains all the eigenvalues. T
Let Y be the first l columns ( l ≤ n ) of V (a n × n matrix, n being the column numbers of image). Y T Gb Y = Db
(10)
where Db is the l × l principal sub-matrix of Λ . Further let Z = YDb−1 / 2 to unitize Gb , (YDb−1 / 2 ) T Gb (YDb−1 / 2 ) = I Z T Gb Z = I
(11)
Next, we find eigenvectors U to diagonalize Z T Gw Z . U T Z T G w ZU = Dw
(12)
2D Direct LDA for Efficient Face Recognition
569
Where U T U = I . Dw may contain zeros in its diagonal. To maximize J (W ) , we can sort the diagonal elements of Dw and discard some high eigenvalues with the corresponding eigenvectors. Let the optimal projection matrix, W W = ( Dw−1 / 2U T Z T ) T
(13)
Also, W unitizes Gw [6,8]. The low dimensional transformed vector X * is X * = ( X − X )W . 2.2 Classification The distance measure two matrices is the nearest neighbor, The distance between X 1* and X 2* is adopted by Frobenius norm. the Forbenius norm as follows D F ( X 1* , X 2* ) = X 1* − X 2*
(14) F
3 Experimental Results The proposed method is tested on the ORL face image database (http://www.camorl.co.uk/facedatabase). The ORL database consists of 40 distinct persons. There are 10 images per person. The images are taken at different times and contain various facial expressions (open/closed eyes, smiling/ not smiling) and facial details (glasses or no glasses). The size of image is 92×112 pixels with 256 gray levels. Fig. 1 depicts some sample images in the ORL database. Five sets of experiments are conducted. In all cases the five images per class are randomly chosen for training from each person and the other five images are used for testing. Thus the total number of training images and testing images are both 200. Table 1 compares the average recognition rates obtained using 2D-DLDA, D-LDA and KDDA. In the 112x92 image matrix, the size of image scatter matrix is 92x92 and
Fig. 1. Some face samples of ORL face database
570
U.-D. Chang et al. Table 1. Comparison of average recognition rates of different methods
Methods Average Recognition rate(%)
2D-DLDA
D-LDA
KDDA
94.1
90.6
93.3
Table 2. The average recognition rates of different image size
Image size 112x92 56x46 39x30 28x23
Average recognition rate(%) 94.1 93.9 94.1 94.5
the average recognition rate of 2D-DLDA is 94.1%, while the average recognition rate of D-LDA is 90.6% and the average recognition rate of KDDA which are adopted the Gaussian kernel and 28x23 down-sampled image is 93.3%. In 2D-DLDA, Image scatter matrix is smaller than scatter matrix of D-LDA so that we can avoid computational complexity of feature extraction. But 2D image based methods have a weak point. The extracted feature matrix of 2D-DLDA is larger than D-LDA. For instance, the extracted feature matrix forms 112×87 to get the best performance when the size of image matrix is 112×92. Therefore it needs the dimensional reduction of image. Table 2 shows the average recognition rates for different image sizes. The results of the experiments show that the dimensional reduction has little influence on the performance.
4 Conclusions In this paper, 2D-DLDA algorithm is proposed. The method combines the merits of the image scatter matrix and D-LDA approaches. Since the size of the image scatter matrix is smaller than the conventional method, SSS problem can be avoided and eigenvectors can be efficiently computed. Furthermore it achieves the better performance by using DLDA since D-LDA preserves the null space of within-class scatter matrix, which contains very important discriminative information. And the experimental results show that the dimensional reduction has little influence on the recognition rate. In addition, 2D-DLDA is better computing performance than KDDA.
References 1. Pentland, A., Moghaddam, B., Starner, T.: View-Based and Modular Eigenspaces for Face Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (1994) 84-91 2. Turk, M. Pentalnd, A.: Eigenfaces for Recognition. J. Cognitive Neurosci. Vol. 3. No. 1. (1991) 71-86
2D Direct LDA for Efficient Face Recognition
571
3. Yang, J., Zhang, D., Frangi, A.F., Yang, J.: Two-Dimensional PCA: A New Approach to Appearance-Based Face Representation and Recognition. IEEE Trans. Pattern Analysis and Machine Intelligence. Vol. 26. No. 1. (2004) 131-137 4. Belhumeour, P.N., Hespanha, J.P., Kriegman, D.J.: Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Trans. Pattern Analysis and Machine Intelligence. Vol. 19. No. 7. (1997) 711-720 5. Chen, L., Liao, H., Ko, M., Lin, J., Yu, G.: A New LDA-Based Face Recognition System Which Can Solve the Small Sample Size Problem. Pattern Recognition. Vol. 33. No. 10. (2000) 1713-1726 6. Yu, H. Yang, J.: A Direct LDA Algorithm for High-Dimensional Data with Application to Face Recognition. Pattern Recognition. Vol. 34. No. 10. (2001) 2067-2070 7. Lu, J., Plataniotis, K. N., Venetsanopoulos, A. N.: Face Recognition Using Kernel Direct Discriminant Analysis Algorithms. IEEE Trans. Neural Networks. Vol. 14. No. 1. (2003) 117-126 8. Fukunaga, K.: Introduction to Statistical Pattern Recognition. 2nd edn. Academic Press, New York (1990)
3-D Curve Moment Invariants for Curve Recognition Dong Xu1,2,3,4 and Hua Li1,2,3 2
1 Key Laboratory of Intelligent Information Processing, Chinese Academy of Science Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Science 3 National Research Center for Intelligent Computing Systems, Institute of Computing Technology, Chinese Academy of Science Beijing P. O. Box 2704, 100080, PR China {xudong, lihua}@ict.ac.cn 4 Graduate University of Chinese Academy of Sciences
Abstract. 3-D curve moments and curve moment invariants under similarity transformation are defined in this paper. 3-D curve moment invariants can be used as shape descriptors for the representation of parametric curves. We use them to describe some commonly used parametric curves and test the invariance of them in the experiment.
1 Introduction Curve matching is an important research area in computer vision and pattern recognition, which can help us determine what category the given test curve belongs to. The usual way is to use features in the curve which are independent of orientations, and find correspondences of these features between test curve and model curves. Transformation parameters also need to be recovered for pose estimation of the test curve. The invariant features mainly can be divided into two classes: differential invariants and integral invariants. Curvature and torsion of the curve are commonly used differential invariants. Moment invariants are integral invariants. Curves can be represented by various structures for matching. Keren et al. [1] described curves with implicit higher degree polynomials, and compared the coefficients of polynomials by Mahalanobis distance for object recognition. Taubin [2] also used implicit representations of curves for position estimation and edge segmentation. Bribiesca [3] represent 3-D digitalized curves and surfaces with chain codes, which can be invariant under rotation and translation. Many other literatures concern sampling points in the curve. Transformation parameters can be obtained by Iterative Closest Points (ICP) algorithm [4],[5], Least-Squares Fitting [6], or genetic algorithm [7], or other semi-local invariants in the curve [8],[9]. 2-D moment invariants were firstly proposed by Hu [10] in 1962 for character recognition. In 1980, Sadjadi and Hall [11] first extended moment invariants from 2-D to 3-D. Lo and Don [12] constructed 3-D moment invariants with complex moments and group-theoretic technique. Moment invariants can also be used to represent shape characteristics of curves. Huang and Cohen presented a new class of B-spline weighted curve moments that were used to obtain the closed-form estimations for the affine D.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNCIS 345, pp. 572 – 577, 2006. © Springer-Verlag Berlin Heidelberg 2006
3-D Curve Moment Invariants for Curve Recognition
573
transformation between the original curve and the transformed one [13]. Zhao and Chen defined curve moments on a parameterized boundary description of an object and derived affine moment invariants on the theory of algebraic invariants [14]. Lo and Don have mentioned the moments of 3-D curves in [15]. In this paper, we generalize curve moments from 2-D to 3-D Euclidean space, and use geometrical method to derive 3-D curve moment invariants of different orders under similarity transformation. First, 3-D curve moments are defined in Part 2, and 3-D curve moment invariants are constructed by invariant geometric primitives. In Part 3, we use some 3-D curve moment invariants to represent 3-D parametric curves for the experiment. We conclude the paper and open perspectives for future work in Part 4.
2 3-D Curve Moments and 3-D Curve Moment Invariants for Curve Representation 2.1 Definition of 3-D Curve Moments Suppose P (t ) = ( x(t ), y (t ), z (t )) is a parametric curve in R 3 , T is definition domain of parameter t, and ρ ( x, y , z ) is the density distribution function of the curve. 3-D curve moments of order l+m+n are defined by the path integrals defined in the path L of P (t ) : M lmn = ³ x l y m z n ρ ( x, y, z )dl L
= ³ x(t ) l y (t ) m z (t ) n ( T
.
dx(t ) 2 dy (t ) 2 dz (t ) 2 ) +( ) +( ) ρ ( x(t ), y (t ), z (t ))dt dt dt dt
(1)
The centroid of the 3-D curve can be determined from the zeroth and the first-order moments by
x=
M M M 100 , y = 010 , z = 001 . M 000 M 000 M 000
(2)
Then central moments are defined as
M lmn = ³ ( x − x ) l ( y − y ) m ( z − z ) n ρ ( x, y, z )dl .
(3)
L
The central curve moments are invariants under translation. Assume that the curve is scaled by factor Ȝ, the new parametric 3-D curve becomes: P ' (t ) = ( x ' (t ), y ' (t ), z ' (t )) = (λx (t ), λy (t ), λz (t )) , then M ' lmn = ³ x' (t) l y' (t) m z' (t ) n ( T
dx' (t ) 2 dy' (t ) 2 dz' (t ) 2 ) +( ) +( ) ρ ' (x' (t ), y' (t ), z' (t ))dt . dt dt dt
= λl +m+n+1 M lmn
So μ lmn =
M lmn is an invariant curve moment under scaling. 1+ l + m + n M 000
(4)
574
D. Xu and H. Li
Moment invariants are expressions of moments which are invariant under a kind of transformation group. In this paper, we only discuss 3-D curve moment invariants under similarity transformation. This transformation can be decomposed into translation, scaling and rotation parts. Suppose P ' (t ) = ( x ' (t ), y ' (t ), z ' (t )) is the new curve of P after similarity transformation. The relationship between P and
P' can be expressed as
ªa11 a12 a13 º ª x(t ) º ªα º ª x' (t ) º ª x(t ) º ªα º « y ' (t )» = λM « y (t )» + « β » = λ «a a a » « y (t ) » + « β » . « 21 22 23 » « « » « » « » » « » «a a a » « z (t ) » «γ » «¬ z ' (t ) »¼ «¬ z (t ) »¼ «¬γ »¼ ¼ ¬ ¼ ¬ 31 32 33 ¼ ¬
(5)
Here M is an orthogonal matrix which has the property that MM T = M T M = I where M T is the transpose of M. Since translation and scaling invariance has been achieved in section 2.1, we only consider the rotation under orthogonal matrix M. We have proposed a geometrical method to derive moment invariants under rotation in [16], which is based on the invariant geometric primitives such as distance, area, volume and angle under similarity transformation. This construction method of moment invariants only needs one step by multiple integrals of the invariant primitives. It has the advantage that it can easily get higher order moment invariants and endows moment invariants with geometrical meanings. 2.2 3-D Parametric Curves Representation We have introduced that 3-D curves can be represented by algebraic polynomials, chain codes, sampling points etc. Density distribution function ρ ( x, y, z ) determines the shape of a curve uniquely, and it can be recovered by inverse Fourier transform if all orders of moments have been got [10]. 3-D curve moment invariants also can be treated as shape descriptors to represent a curve. Each curve can be uniquely determined by an infinite sequence of curve moment invariants theoretically. Higher order curve moment invariants are not very stable and cost expensive computational time, even they can be derived easily from geometric primitives now. We can approximately use a certain set of low order curve moment invariants as characters to represent the curve.
3 Experiment Here, we use nine low-order 3-D moment invariants by the multiple integrals of R 2 (1) , A 2 (O,1,2) , V 2 (O,1,2,3) , R 4 (1) , A 4 (O,1,2) , An 4 (O,1,2) , An 3 (O,1,2) ,
An(O,1,2) R 2 (1) R 2 ( 2) and An 2 (O,1,2) R 2 (1) for the experiment. The explicit expressions of these invariants can be seen in [11] and [16]. They are then divided by certain powers of the zeroth order moment for normalization to achieve scaling invariance. Four parametric 3-D curves are presented here for the invariance test of the 9 curve moment invariants. They are C1 —circular helix, C 2 —conical spiral, C3 — polynomial curve of degree 3,
C 4 —spherical spiral, shown in Fig. 1.
3-D Curve Moment Invariants for Curve Recognition
575
Fig. 1. Four parametric 3-D curves for the experiment
Explicit parametric expressions of these four curves are given below, where the range of parameter t is [0,10π ] .
x(t ) = 5π sin(t ) ° C1 : ® y (t ) = 5π cos(t ) ° z (t ) = t ¯
x(t ) = 100π 2 t ° C3 : ® y (t ) = 10πt 2 ° z (t ) = t 3 ¯
x(t ) = (t / 2) sin(t ) ° C 2 : ® y (t ) = (t / 2) cos(t ) ° z (t ) = t ¯
x(t ) = cos(t ) / 1 + t 2 ° ° C 4 : ® y (t ) = sin(t ) / 1 + t 2 ° 2 °¯ z (t ) = −t / 1 + t
Each moment is computed by its discrete form M lmn =
N
¦x
l i
m n y i z i li , where
i =1
( xi , yi , z i ) is one sampling point in the curve and
li is the length of one line segment
between two sampling points. Number of sampling points N=501 here. Each curve is transformed into 125 new positions under different similarity transformations. Ratios of Standard Deviation/Mean of the nine curve moment invariants of these four parametric curves are very low, which certificate the invariance of the nine curve moment invariants.
576
D. Xu and H. Li
Next, we simply give the distance matrix of the four curves based on the Euclidean distance of the nine-dimensional feature vectors. See Table 1. It shows that C1 and
C 2 have some similarities between each other. This can be explained that they are both spiral-like curves. Table 1. Distance matrix of the four parametric curves
C1 C1 C2 C3
0
C4
1.315× 10
8.403× 10 7.813× 10
C2 −4 −2 −2
8.403× 10 0 7.730× 10 1.231× 10
C4
C3 −4
−2 −2
7.813× 10 7.730× 10 0 6.507× 10
−2 −2
−2
1.315× 10 1.231× 10 6.507× 10 0
−2 −2 −2
4 Conclusions and Future Work In this paper, we define 3-D curve moment and derive 3-D curve moment invariants by geometric primitives. It can be considered as a kind of shape descriptor for the representation of 3-D curves for object recognition. Experiment certificates the correctness of the given 3-D curve moment invariants and shows that it can be used for curve recognition. In the future, 3-D curve moment invariants can be adopted to applications in computer vision after 3-D reconstruction process. Furthermore, we can try to recover the pose of test curve to model curve for exactly matching. An elaborated set of curve moment invariants should be chosen to enhance their recognition ability. Other similarity measurements between curves could also be investigated to make the result best coincident with the judgment of human vision.
Acknowledgements This work is supported by National Key Basic Research Plan (Grant No. 2004CB318006) and National Natural Science Foundation of China (Grant No. 60573154).
References 1. Keren, D., Subrahmonia, J., Cooper, D. B.: Robust Object Recognition Based on Implicit Algebraic Curves and Surfaces. Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition (1992) 791-794 2. Taubin, G.: Estimation of Planar Curves, Surfaces, and Nonplanar Space Curves Defined by Implicit Equations with Applications to Edge and Range Image Segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence. 13 (1991) 1115-1138
3-D Curve Moment Invariants for Curve Recognition
577
3. Bribiesca, E.: A Chain Code for Representing 3D Curves. Pattern Recognition. 33 (2000) 755-765 4. Besl, P. J., McKay, N. D.: A Method for Registration of 3-D Shapes. IEEE Trans. Pattern Analysis and Machine Intelligence. 14 (1992) 239-256 5. Zhang, Z.: Iterative Point Matching for Registration of Free-form Curves and Surfaces. Int. Journal of Computer Vision. 13 (1994) 119-152 6. Arun, K. S., Huang T. S., Blostein S. D.: Least-Squares Fitting of Two 3-D Point Sets. IEEE Trans. Pattern Analysis and Machine Intelligence. 9 (1987) 698-700 7. Yamany, S. M., Ahmed M. N., Farag, A. A.: A New Genetic-based Technique for Matching 3-D Curves and Surfaces. Pattern Recognition. 32 (1999) 1817-1820 8. Pajdla, T., Cool L. V.: Matching of 3-D Curves using Semi-differential Invariants. Proc. Fifth Int. Conf. Computer Vision. (1995) 390-395 9. Li, S. Z.: Invariant Representation, Matching and Pose Estimation of 3D Space Curves under Similarity Transformations. Pattern Recognition. 30 (1997) 447-458 10. Hu, M. K.: Visual Pattern Recognition by Moment Invariants. IRE Trans. Information Theory, 8 (1962) 179-187 11. Sadjadi, F. A., Hall, E. L.: Three-Dimensional Moment Invariants. IEEE Trans. Pattern Analysis and Machine Intelligence. 2 (1980) 127-136 12. Lo, C. H., Don, H. S.: 3-D Moment Forms: Their Construction and Application to Object Identification and Positioning. IEEE Trans. Pattern Analysis and Machine Intelligence. 11 (1989) 1053-1064 13. Huang, Z., Cohen, F. S.: Affine-invariant B-spline Moments for Curve Matching. IEEE Trans. Image Processing. 5 (1996) 1473-1480 14. Zhao, D., Chen, J.: Affine Curve Moment Invariants for Shape Recognition. Pattern Recognition. 30 (1997) 895-901 15. Lo, C. H., Don, H. S.: Pattern Recognition using 3-D Moments. Int. Conf. Pattern Recognition. (1990) 540-544 16. Xu, D., Li, H.: 3-D Surface Moment Invariants. Int. Conf. Pattern Recognition. (2006)
3D Ear Reconstruction Attempts: Using Multi-view Heng Liu1,2, Jingqi Yan1, and David Zhang3 1
Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, 200030, P.R. China {hengliu, jqyan}@sjtu.edu.cn 2 Southwest University of Science and Technology, Mianyang, 621000, P.R. China 3 Department of Computing, The Hong Kong Polytechnic University, Hong Kong, P.R. China [email protected]
Abstract. 3D ear reconstruction based on multi-view method is explored in this paper. Our approach do not depends on special facilities and it applies multiview epipolar geometry and motion analysis principles to reconstruct 3D ear model. Ear feature points selecting and matching based on ear contour division method in user interactive way are proposed in this paper. Detail experimental results and comparisons with existing reconstruction methods are provided. Potential 3D ear feature extraction and recognition ways are discussed in the final part.
1 Introduction Ear holds obvious advantages in biometrics that it has ample and stable spatial structure which does not change from eight to seventies and does not suffer from the variation of skin-color, facial expressions, cosmetics and hairstyle. In order to extract ear adequate feature information for recognition, we should reconstruct 3D ear model. There are several ways to reconstruct 3D ear model [1] [2] [3]. All approaches are high-cost and computationally expensive and they all need special facilities. In order to obtain 3D ear model in a convenient and low-cost way, we use ear multi-view images for reconstruction task. Multi-view epipolar geometry and motion analysis principles are applied in the procedure of model reconstruction, and ear structure information will be recovered from 3D points cloud. After meshed 3D points cloud, to get finer ear model, surface refining and subdivision technology can be utilized. Feature points selecting and matching are important step for object reconstruction based on multi-view style. Thus, in this work we compare two ways for selecting and matching feature points: automatic way which is based on corner detection and RANSC searching; user interactive way which is based on auricle contour division and features observation. This paper is organized in the following: Section 2 describes the principles of 3D ear reconstruction based on multi-view and gives a detail discussion on feature points selection and matching. Section 3 presents experimental results and comparisons with other reconstruction methods. Section 4 gives a short conclusion. D.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNCIS 345, pp. 578 – 583, 2006. © Springer-Verlag Berlin Heidelberg 2006
3D Ear Reconstruction Attempts: Using Multi-view
579
2 3D Ear Reconstruction Based on Multi-view Since present methods [1] [2] [3] to reconstruct 3D models need professional facilities and complex computation whereas the effect is not perfect, in this section, we will introduce the principles of multi-view reconstruction. As two-view reconstruction is the basic case of multi-view, we show reconstruction principles in two-view form firstly. Then we discuss ear feature points selection and matching methods in detail. 2.1 Two-View Epipolar Geometry
= [X ,Y , Z ], X ∈ R 3 , its' imaging point in the 2 image plane is a 2D point x = [x , y ], x ∈ R . Let motion parameters is Π = [R, T ]∈ R 3×4 , a sketch-map of two-view epipolar geometry is shown in the Fig.1.
If we assume a 3D scene point is X
Fig. 1. Two-view epipolar geometry model.
l1 ,l 2 are epipolar lines; x1 , x 2
O1 ,O2
are lens center;
e1 , e2 are
epipoles;
are point correspondences in l1 ,l 2 .
From Fig.1, we can easily get:
λ2 x 2 = Rλ 1x1 + T where
λ1 and λ2
(1)
are corresponding points depth value (Z coordinate value). In
equation (1), if one uses vector representation and notices the orthogonally: x 2 • T
→ §→ T · 3×3 ¨ T × x2 ¸ = 0 and defines E = T × x 2 ∈ R (E is called Essential Matrix), we © ¹
can get that:
x 2T Ex1 = 0 We may assume that the first camera matrix is decomposition, E can be represented as
E = Udiag{1,1,0}V T
(2)
P = [I
0] , using SVD (3)
580
H. Liu, J. Yan, and D. Zhang
There are four possible choices for the second camera matrix P' [4]:
[
]
[
P' = UWV T | u 3 orP' = UWV T | −u 3
[
]
[
]
P = UW V | u 3 orP = UW T V T | −u 3 '
or
T
T
'
]
(4)
ª0 − 1 0 º » « where u 3 is the last column of U, and W = 1 0 0 . » « «¬0 0 1»¼ Testing with some points to determine if they are in front of both cameras is sufficient to decide between the four different solutions for the camera matrix P'.
[
]
For P = K R | T , if R and T are gotten, based on Eq. (4), we can easily to recover scene structure. Thus, once every feature point of ear is recovered, then the 3D ear points cloud will be reconstructed well. '
2.2 Ear Feature Points Selection and Matching Ear feature points selection and feature points matching across different view images are essential steps in multi-view reconstruction. For example, we can use some scene feature points to calibrate cameras and after camera calibration, we can use ear feature points to recover 3D ear structure information. Traditionally, multi-view reconstructions take the following step to finish feature selection and matching task: firstly, using corner detector (such as Harris corner detector [5]) to extract interesting points of images; then using robust RANSAC algorithm [6] as a "searching engine" to eliminate the mismatches of the initial point correspondence sets; finally, applying the multi-resolution algorithm [7], the intensity correlation and the epipolar constraints to correct the matching result for interest points in the image. We apply such series methods to select ear feature points and match them.
ª ¦ I x2 Let I be intensity of an image, and then defining C = « ¬«¦ I x I y
¦I I ¦I x
2 x
y
º » to ¼»
be a 2*2 matrix that depends on certain window, then candidate features can be measured by improved Harris corner criterion:
t = del ( C ) − k * trace 2 ( C ) t >= T
(5)
Where T is a threshold of feature measure quality. The results of ear features selection and matching are shown in Fig.2. From Fig.2, we see automatic ear feature points detection and matching results are not good. There are only a few right matching feature points in the auricle. Such results show the automatic way to reconstruct 3D ear model directly will fail possibly.
3D Ear Reconstruction Attempts: Using Multi-view
(a)
(b)
(c)
581
(d)
Fig. 2. Automatic two-view ear feature points detection and matching results using Harris corner detector, RANSAC searching with transform motion assumption. (a) (b) two views corner detection results. (c) Two views with interesting points superpose together. (d)RANSC with transform motion assumption feature points matching results.
Thus, we take a user interactive way to select and match ear feature points across multi-view images. Scene feature points for camera calibration are selected and matched by pure observation. After calibration, we could use Eq.(1),Eq.(2) and Eq.(3) with epipolar matching criterion to determine other feature points. We select and match ear feature points interactively based on ear contour division. The concrete ways are described in the following: 1) Dividing the whole auricle into four interesting region: helix, antihelix, concha, and triangular fossa. Selecting feature points in every division by such order. 2) Based on the same order, from sparse to density, manually selecting feature point almost in a unified distribution in one view image. 3) In the corresponding view, using RANSC algorithm and epipolar constrains to draw the epipolar line. And deducing the corresponding feature point position and draw such point according to the intensity correlation. 4) If the obtained feature point is obviously observed to be in the wrong opposition or departures original corresponding point position, then giving up the obtained point and replace it with the right observation point. 5) Repeating steps (2) (3) (4) until the number of selected feature points is over 300 (original image size is 1280*1024). By this way, some results of feature points selection and matching are shown in Fig.3.
(a)
(b)
Fig. 3. User interactive feature points selection and matching. (a) scene feature points selection and matching. (b) ear feature points selection and matching.
In practice, in order to take a great deal of feature points, we should create feature points density matching by automatic approach with user interactive way. Such work style is similar to the way of the modeling software PhotoModeler [8]. After density points cloud are meshed, 3D ear model can be meshed immediately.
582
H. Liu, J. Yan, and D. Zhang
3 Experimental Results and Comparisons Once ear feature points cloud is obtained, we may apply triangulation criterion to obtain a surface model. In general, due to those errors and outliers, this mesh is highly irregular and it needs to be refined and simplified continually (for detail, please refer to [9] [10] [11]). Once a surface model is available, we can use texture-mapping technology to visualize the model. Some 3D ear wireframe and texture-mapping models reconstructed by mult-view are shown in Fig. 4.
Fig. 4. Multi-view 3D ear reconstruction
A comparison with those present methods stated in Section 1 is made in Table 1. Multi-view ear methods are a potential convenient and low cost ear reconstruction approach. Table 1. Ear Reconstruction complexity comparison methods Range scan
vertices 7000-9000
meshes 20000-40000
quality Fine
complexity High
Structure Light
2000-4000
5000-10000
Good
Moderate
MRI/CT
4000-8000
10000-120000
Finest
Highest
Multi-view
300-600
600-1200
Acceptable
Low
To acquire ear full spatial 3D pose information on the head, we can model the more profile face part closed up to the ear. Such two ear models are shown in Fig. 5 which contain more profile face part and reflect multi-view spatial pose information.
Fig. 5. 3D ear models that contain adequate spatial ear pose information
3D Ear Reconstruction Attempts: Using Multi-view
583
4 Conclusions Our main contribution in this work is this is the first work which applies multi-view epipolar geometry for 3D ear reconstruction. It will provide profitable attempts for the small and hollow objects: ear multi-view reconstruction. Ear feature points selecting and matching method based on ear contour division is proposed in this work. Many Experimental results and comparisons have demonstrated the potential validity and effectiveness of such ear reconstruction methods. For future work, we focus on finishing ear features selection and matching task accurately and automatically.
Acknowledgement This work is supported by National Natural Science Foundation of China ((No.60402020). In addition, we would like to thank for all those people who have contributed to this work by providing their data and comments.
References 1. Bhanu, B., Chen, H: Human ear recognition in 3D. Workshop on Multimodal User Authentication, (2003) 91–98 2. Slobodan Ilic, Pascal Fua: Using Dirichlet Free Form Deformation to Fit Deformable Models to Noisy 3-D Data. In European Conference on Computer Vision, (2002) 3. HAN Qiang, ZHANG Fu-qiang, JIAO Ting, YE Ming, WEI Bin, LANG Wei-jun: Establishment of Auricular Three-dimensional Image Database. Chinese Journal of Prosthodontics, Vol. 5(2) (2004) 4. Hartley R. I., Zisserman. A: Multiple View Geometry in Computer Vision. Cambridge University Press (2000) 5. Harris C., Stepphens M.: A Combined Corner and Edge Detector. Fourth Alvey Vision Conference, (1998) 147-151 6. Fischer M.A., Bolles R.C.: Random Sample Consensus: A Paradigm for Model Fitting with Application to Image Analysis and Automated Cartography. Comm. Assoc. Comp, Vol. 24(6) (1981) 381-395. 7. Hannah M.J.: A System for Digital Stereo Image Matching. Photogrammetric Engineering and Remote Sensing, Vol. 55 (12) (1989)1765-1770 8. Http://www.photomodeler.com 9. George P. L: Improvements on Delaunay-based Three-dimensional Automatic Mesh Generator. Finite Elements in Analysis and Design, Vol. 25(1997)297-317 10. Peter Schroder: Subdivision as a Fundamental Building Block of Digital Geometry Processing Algorithms. Journal of Computational and Applied Mathematics, 149(1) (2002)207-219 11. M. Garland and Y. Zhou: Quadric-based Simplification in any Dimension. Transactions on Graphics, 24(2) (2005)271-292
A Class of Multi-scale Models for Image Denoising in Negative Hilbert-Sobolev Spaces Jun Zhang1,2 and Zhihui Wei2 1
School of Science, Nanjing University of Science and Technology, 210094, Nanjing, Jiangsu Province, China 2 Department of Computer Science and Engineering, Nanjing University of Science and Technology, 210094, Nanjing, Jiangsu Province, China {phil zj, gswei}@mail.njust.edu.cn
Abstract. In this paper, we propose a class of multi-scale variational models for image denoising. Our models decompose a given image into two parts: geometric component representing the objects in the image and oscillatory component representing the noise or texture. Considering different components belong to different scale spaces and oscillatory components have small norm in negative Sobolev spaces, we propose multi-scale models image denoising in negative Sobolev space. Numerical results show that our models are flexible and efficient for preserving texture when denoising image.
1
Introduction and Motivations
Image denoising and decomposition are of important interests in image processing. Many well-known models work by decomposition the observed image f into a sum of two components, e.g.f = u + v, where u is a piece-wise smooth geometric component representing the objects in the image and v is the oscillatory component representing the noise and texture. One classical model is proposed by Rudin, Osher, and Fatemi [4] (ROF): 1 |∇u| + |f − u|2 dxdy} (1) inf {F (u) = λ 2 Ω u∈BV (Ω) Ω ROF model (1) performs very well for removing noise while preserving edges. However, it fails to separate well oscillatory components from high-frequencies components. To remedy this situation, Y.Mayer suggested replacing the L2 -norm in the second term of (1) with a weaker norm more appropriate for modeling texture or oscillatory patterns. He proposed a model in [5], but it is difficult to be solved. For practical alternative, Vese and Osher [6] have introduced a model as an approximation to Mayer’s model, and Osher, Sol´e and Vese [3] further modified it to a new problem as following (OSV): |∇u| + λ|f − u|2H −1 } (2) inf {E(u) = u∈BV (Ω)
Ω
D.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNCIS 345, pp. 584–592, 2006. c Springer-Verlag Berlin Heidelberg 2006
A Class of Multi-scale Models for Image Denoising
585
2 Where |v|2H −1 (Ω) = Ω |∇Δ−1 v| dxdy. Numerical experiments show that this model separates texture from objects better than the ROF model. Motivated by these works, L. Lien and L .Vese [1] extend (2) to the negative space H −s (s > 0). They proposed model as following: |∇u| + λ||f − u||2−s } (3) inf {F (u) = u∈BV (Ω)
Ω
where||f ||−s = ( (1 + |ξ|2 )−s f)f)dξ)1/2 , f) is the Fourier Transform of f . Model (3) is more flexible than OSV model, because we can choose different parameter s for different image, which characterizes the ”smoothness” of function f in H s . Numerical results show that OSV model and model(3)are efficient for preserving texture when denoising image. Considering different oscillatory components belong to different scale spaces. It is more appropriate for modeling different components in different negative Sobolev spaces. Motivated by this idea, we propose a class of multi-scale models in negative Sobolev spaces. In Section 2, we will describe our models. In Section 3, we present the result of our models for image denoising.
2
Description of the New Models
We decompose the noisy image f into a sum f = u + v where u is a piece-wise smooth geometric component and v is the oscillatory component. To obtain different components with different scales, we decompose v = f − u into different scale spaces by wavelet decomposition. Mayer [7] pointed out that some orthonormal wavelets can construct orthonormal Riesz bases in H s ,−r < s < r, where r is the Sobolev regularity of wavelet scale function ϕ. Using orthogonal wavelet system(ϕ, ψ), we have f −u= c0k ϕ0k + djk ψjk k
j≥1 k
Where c0k =< f − u, ϕ0k >L2 ,djk =< f − u, ψjk >L2 , and the equivalent norm ||f − u||2s ∼ 22sj ||(f − u)j ||2L2 , −r < s < r j≥0
where (f − u)0 =
c0k ϕ0k , and(f − u)j =
k
djk ψjk .
k
Replacing the second term of (3), we propose a new multi-scale model as following: 1 |∇u| + λj 22jsj ||(f − u)j ||2L2 , −r < sj ≤ 0} (4) inf {F (u) = u∈BV (Ω) 2 Ω j≥0
When sj = 0, λj = λ, j ≥ 0, it is ROF model; when sj = −1, λj = λ, j ≥ 0, it is an approximate to OSV model and when sj = −s(s > 0), λj = λ, j ≥ 0, it is
an approximation to model (3). This model is more flexible because we can choose different parameters s_j for the different scale components (f-u)_j, j \ge 0. Since the total variation is not differentiable at 0, applying the gradient descent method to minimize (4) we arrive at the following nonlinear PDE:

\frac{\partial u}{\partial t} = \mathrm{div}\Big( \frac{\nabla u}{\sqrt{|\nabla u|^2 + \varepsilon^2}} \Big) + \sum_{j \ge 0} \lambda_j 2^{2js_j} (f-u)_j \quad \text{in } \Omega,
\frac{\nabla u}{\sqrt{|\nabla u|^2 + \varepsilon^2}} \cdot \vec n = 0 \quad \text{on } \partial\Omega,
u = 0 \quad \text{outside } \Omega.   (5)

The solution of (5) approximates the solution of (4). To proceed with the discrete numerical algorithm for solving (5), we assume the initial discrete image f has M × M pixels (f_{i,j}, i, j = 0, 1, ..., M-1) and use the following notation:

(u_x)_{i,j} = \frac{1}{2}(u_{i+1,j} - u_{i-1,j}), \quad (u_y)_{i,j} = \frac{1}{2}(u_{i,j+1} - u_{i,j-1}),
(u_{xx})_{i,j} = u_{i+1,j} + u_{i-1,j} - 2u_{i,j}, \quad (u_{yy})_{i,j} = u_{i,j+1} + u_{i,j-1} - 2u_{i,j},
(u_{yx})_{i,j} = (u_{xy})_{i,j} = \frac{1}{4}(u_{i+1,j+1} + u_{i-1,j-1} - u_{i+1,j-1} - u_{i-1,j+1}).

Then

\mathrm{div}_{i,j}\Big( \frac{\nabla u}{\sqrt{|\nabla u|^2 + \varepsilon^2}} \Big) = \frac{(u_{xx})_{i,j}\big((u_y)_{i,j}^2 + \varepsilon^2\big) + (u_{yy})_{i,j}\big((u_x)_{i,j}^2 + \varepsilon^2\big) - 2(u_x)_{i,j}(u_y)_{i,j}(u_{xy})_{i,j}}{\big((u_x)_{i,j}^2 + (u_y)_{i,j}^2 + \varepsilon^2\big)^{3/2}}
and, using the wavelet decomposition, we have

\sum_{j \ge 0} \lambda_j 2^{2js_j} (f-u)_j = \lambda_0 \sum_k c_{0k} \varphi_{0k} + \sum_{j > 0} \lambda_j 2^{2js_j} \sum_k d_{jk} \psi_{jk}
So we solve (5) with the following iterative scheme:
1. u^0 is arbitrarily given (we can take u^0 = f).
2. Once u^n is calculated, decompose f - u^n by orthonormal wavelet decomposition and obtain the coefficients c_{0k} and d_{jk}.
3. Modify the coefficients as \tilde c_{0k} := \lambda_0 c_{0k}, \tilde d_{jk} := \lambda_j 2^{2js_j} d_{jk}, and reconstruct the image F^n = \sum_k \tilde c_{0k} \varphi_{0k} + \sum_{j>0} \sum_k \tilde d_{jk} \psi_{jk}.
4. Compute u^{n+1}_{i,j} = u^n_{i,j} + \Delta t \big( \mathrm{div}_{i,j}\big( \nabla u^n / \sqrt{|\nabla u^n|^2 + \varepsilon^2} \big) + F^n_{i,j} \big) for i, j = 1, 2, ..., M-2, with the boundary conditions u^{n+1}_{i,j} = 0 for i, j \le 0 and i, j \ge M-1.
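A minimal sketch of this scheme is given below, assuming NumPy and PyWavelets. The step size, the wrap-around boundary handling, and the mapping between PyWavelets' level ordering and the scale index j are illustrative choices, not the authors' implementation.

```python
import numpy as np
import pywt

def tv_step(u, eps=1.0):
    # regularized TV term div(grad u / sqrt(|grad u|^2 + eps^2)),
    # discretized with the central differences given above (periodic borders)
    ux  = 0.5 * (np.roll(u, -1, 1) - np.roll(u, 1, 1))
    uy  = 0.5 * (np.roll(u, -1, 0) - np.roll(u, 1, 0))
    uxx = np.roll(u, -1, 1) + np.roll(u, 1, 1) - 2 * u
    uyy = np.roll(u, -1, 0) + np.roll(u, 1, 0) - 2 * u
    uxy = 0.25 * (np.roll(np.roll(u, -1, 0), -1, 1) + np.roll(np.roll(u, 1, 0), 1, 1)
                  - np.roll(np.roll(u, 1, 0), -1, 1) - np.roll(np.roll(u, -1, 0), 1, 1))
    num = uxx * (uy**2 + eps**2) + uyy * (ux**2 + eps**2) - 2 * ux * uy * uxy
    return num / (ux**2 + uy**2 + eps**2) ** 1.5

def multiscale_denoise(f, s, lam, n_iter=100, dt=0.1, wavelet='db4', levels=3):
    # s[j], lam[j]: regularity and weight per scale j (j = 0 is the approximation);
    # all parameter values here are illustrative
    u = f.copy()
    for _ in range(n_iter):
        coeffs = pywt.wavedec2(f - u, wavelet, level=levels)
        coeffs[0] = lam[0] * coeffs[0]                  # c_0k := lam_0 * c_0k
        for j in range(1, len(coeffs)):
            w = lam[j] * 2.0 ** (2 * j * s[j])          # lam_j * 2^{2 j s_j}
            coeffs[j] = tuple(w * d for d in coeffs[j])
        F = pywt.waverec2(coeffs, wavelet)[:u.shape[0], :u.shape[1]]
        u = u + dt * (tv_step(u) + F)                   # step 4 of the scheme
    return u
```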
Our model (4) is more flexible and appropriate for modeling the oscillatory components. In this paper, we are especially interested in two models based on (4): an H^0 + H^s model and an H^\alpha + H^\beta + H^\gamma model. The results in [1], [3] and [4] show that the L^2-norm is appropriate for modeling piecewise smooth geometric components and that norms in negative Sobolev spaces are appropriate for modeling the oscillatory components. Our model can mix them freely. We consider the following two-level model (the H^0 + H^s model):

\inf_{u \in BV(\Omega)} \Big\{ \int_\Omega |\nabla u| + \lambda_1 \sum_{0 \le j < J} \|(f-u)_j\|_{L^2}^2 + \lambda_2 \sum_{j \ge J} 2^{2sj} \|(f-u)_j\|_{L^2}^2 \Big\}   (6)
where -r < s \le 0. The second term models the low-frequency part of f - u and the third term the high-frequency part of f - u. When s = 0 and \lambda_1 = \lambda_2, it becomes the ROF model, and when s = -1, it can be considered a mixture of the ROF and OSV models. In model (6), we do not distinguish noise from texture. In the wavelet decomposition of an image, the wavelet coefficients of noise lie mainly at the first level, and those of texture mainly at the middle levels. Considering this phenomenon, we can modify model (6) as follows (the H^\alpha + H^\beta + H^\gamma model):

\inf_{u \in BV(\Omega)} \Big\{ \int_\Omega |\nabla u| + \lambda_1 \sum_{0 \le j < J_1} 2^{2\alpha j} \|(f-u)_j\|_{L^2}^2 + \lambda_2 \sum_{J_1 \le j < J_2} 2^{2\beta j} \|(f-u)_j\|_{L^2}^2 + \lambda_3 \sum_{j \ge J_2} 2^{2\gamma j} \|(f-u)_j\|_{L^2}^2 \Big\}   (7)

In this model, the second term models the low-frequency part of f - u, the third term models the textures, and the fourth term mainly models edges and high-frequency noise. The parameters s in (6) and \alpha, \beta, \gamma in (7) are the Sobolev regularities of the different components, which characterize the "smoothness" of these components. They should be chosen according to the image. Mumford and Gidas [10], taking a statistical approach, pointed out that Gaussian white noise lives in \cap_{\varepsilon>0} H^{-1-\varepsilon}_{loc}, where H^{-s}_{loc} (for any s > 0) is the Hilbert-Sobolev space of negative degree. On the other hand, when s = 0, H^0 = L^2, and we know geometric components can be modeled appropriately in L^2. The "smoothness" of texture should lie between that of the geometric component and that of the noise. In this paper, we only consider noisy images with Gaussian white noise. So we can choose -1 \le s \le 0 for model (6), and in model (7) we can choose -1 \le \gamma \le \beta \le \alpha \le 0.
3 Numerical Experiments
In this section, we present some numerical results of our models for image denoising. In the first experiment, we consider a geometric image of 256 × 256 pixels and the noisy image with Gaussian white noise of σ = 20, which are shown in Fig. 1.
Fig. 1. The original image (left) and the noisy image (right) with Gaussian white noise
Table 1. Comparison of PSNR for the denoised images of the noisy image in Fig. 1

Image:  Noisy image | ROF model | OSV model | Model (6) | Model (7)
PSNR:   22.4012     | 28.7830   | 29.3012   | 29.6705   | 29.7787
Fig. 2. The wavelet decomposition of the noisy image using the Daubechies wavelet Db4: the first level (left), the second level (middle) and the third level (right)
Fig. 3. The denoised image u (left) and the residual f − u + 100 (right) of the ROF model
Fig. 4. The denoised image u (left) and the residual f − u + 100 (right) of the OSV model
The comparison of the PSNR of the denoised images is presented in Table 1; it shows our models are efficient at increasing the PSNR. As mentioned in Section 2, we choose the parameters -1 \le s \le 0 and -1 \le \gamma \le \beta \le \alpha \le 0. In Fig. 2, we can see the wavelet coefficients of texture are mainly at the second level. So in this experiment we choose s = -1 for the wavelet coefficients at the first level in model (6). In model (7), we choose \gamma = -1 for the wavelet coefficients at the first level, \beta = -0.5 for the wavelet coefficients at the second level, and \alpha = 0 for the other wavelet coefficients.
Fig. 5. The denoised image u (left) and the residual f − u + 100 (right) of our model (6) with parameters s = −1, λ1 = 0.1, λ2 = 0.02
Fig. 6. The denoised image u (left) and the residual f − u + 100 (right) of our model (7) with parameters α = 0, β = −0.5, γ = −1, λ1 = 0.03, λ2 = 0.4, λ3 = 0.001
Fig. 7. The original image (left) and the noisy image (right) with Gaussian white noise
We present the results of our model (6) in Fig. 5 and of model (7) in Fig. 6. The results show our models are efficient at preserving textures while denoising the image, and the denoised images of our models are better than those of the ROF and OSV models. Model (7) is better than model (6) because we choose a different parameter for the wavelet coefficients at the second level. This shows that modeling texture in negative Sobolev spaces is more appropriate than in L^2. In the second experiment, we denoise the standard image Barbara, which has abundant textures, with Gaussian white noise of σ = 20. In Fig. 8, we can see the wavelet coefficients of texture are mainly at the first and second levels. The first experiment shows that modeling texture in negative spaces is appropriate. So in this experiment, we choose s = -1 for the wavelet coefficients at the first and second levels in model (6). In model (7), we choose \gamma = -1 for the wavelet coefficients at the first level, \beta = -0.8 for the wavelet coefficients at the second level, and \alpha = -0.5
Fig. 8. The wavelet decomposition of the noisy image using the Daubechies wavelet Db4: the first level (left), the second level (middle) and the third level (right)
Fig. 9. The denoised image u (left) and the residual f − u + 100 (right) of the ROF model
Fig. 10. The denoised image u (left) and the residual f − u + 100 (right) of the OSV model

Table 2. Comparison of PSNR for the denoised images of the noisy image in Fig. 7

Image:  Noisy image | ROF model | OSV model | Model (6) | Model (7)
PSNR:   22.1689     | 27.5589   | 27.6652   | 27.6924   | 27.7078
for the other wavelet coefficients. In Table 2, the PSNR values of our models are larger than those of the ROF and OSV models. The residual in Fig. 11 shows that model (6) removes more texture than the OSV model but is still much better than the ROF model. In Fig. 12, the result is better than both the ROF and OSV models. Numerical results show our models are flexible and efficient at preserving texture while denoising images, especially model (7). In our models, we must choose the proper parameters such as s, \alpha, \beta, \gamma. In this paper, we only consider
Fig. 11. The denoised image u (left) and the residual f − u + 100 (right) of our model (6) with parameters s = −1, λ1 = 0.15, λ2 = 1
Fig. 12. The denoised image u (left) and the residual f − u + 100 (right) of our model (7) with parameters α = −0.3, β = −0.5, γ = −1, λ1 = 0.3, λ2 = 0.6, λ3 = 1.5
Gaussian white noise and only give approximate bounds for these parameters. The relation between the regularity parameters and the components, and the adaptive choice of the regularity parameters according to the image, will be studied in future work.
Acknowledgment. The work of this paper is supported by the NNSF of China (Grant No. 60574015). The first author is grateful to Wenze Shao for many enlightening discussions.
References
1. Lieu, L., Vese, L.: Image Restoration and Decomposition via Bounded Total Variation and Negative Hilbert-Sobolev Spaces. UCLA CAM Report 05-33 (2005)
2. Oswald, P.: Multilevel Frames and Riesz Bases in Sobolev Spaces. Lectures. http://www.faculty.iu-bremen.de/poswald/ (2005)
3. Osher, S., Solé, A., Vese, L.: Image Decomposition and Restoration Using Total Variation Minimization and the H^{-1} Norm. Multiscale Model. Simul. 1(3) (2003) 349-370
4. Rudin, L., Osher, S., Fatemi, E.: Nonlinear Total Variation Based Noise Removal Algorithms. Phys. D 60 (1992) 259-268
5. Meyer, Y.: Oscillating Patterns in Image Processing and Nonlinear Evolution Equations. Univ. Lecture Ser. 22, AMS, Providence, RI (2002)
6. Vese, L., Osher, S.: Image Denoising and Decomposition with Total Variation Minimization and Oscillatory Functions. J. Math. Imaging Vision 20 (2004) 7-18
7. Meyer, Y.: Wavelets and Operators. Cambridge Studies in Advanced Mathematics, Vol. 37, Cambridge Univ. Press, Cambridge (1992)
8. Daubechies, I.: Ten Lectures on Wavelets. SIAM, Philadelphia (1992)
9. Daubechies, I., Teschke, G.: Variational Image Restoration by Means of Wavelets: Simultaneous Decomposition, Deblurring, and Denoising. Appl. Comput. Harmon. Anal. 19 (2005) 1-16
10. Mumford, D., Gidas, B.: Stochastic Models for Generic Images. Quart. Appl. Math. 59 (2001) 85-111
A Detection Algorithm of Singular Points in Fingerprint Images Combining Curvature and Orientation Field

Xiaolong Zheng, Yangsheng Wang, and Xuying Zhao
Institute of Automation, Chinese Academy of Sciences, Beijing, China [email protected]
Abstract. The detection of singular points (core and delta) is very important for fingerprint classification and for fingerprint matching based on reference points. This paper proposes a new algorithm for extracting the singular points using a combination of the curvature and orientation fields of fingerprint images, which provides a more accurate description of the characteristics of singular points. The effectiveness of our algorithm is shown by experiments on the public databases DB2 and DB3 of FVC2002.
1 Introduction

Fingerprints are widely researched currently due to the uniqueness and invariability of fingerprint features and the feasibility of fingerprint verification systems [1,2]. Generally, a fingerprint has two kinds of features: the local features, called minutiae in conventional fingerprint verification/identification methods, provide the basis for exact matching, while the global features give a rough profile of a fingerprint, which can be used to classify fingerprints into several classes. Singular points (SPs), the most important global feature, contain significant global information which serves not only as the foundation of fingerprint pattern classification [3,4] but also as the reference points in some fingerprint matching methods [5,6,7]. A fingerprint image is the pattern of valleys and ridges on the human fingertip [6]. This flow-like pattern forms an orientation field extracted from the trend of the valleys and ridges. In the larger part of a fingerprint the orientation field is quite smooth, while in some areas the orientation appears abruptly discontinuous; the SPs, including core and delta, are defined as the centers of those areas. The core is the topmost or bottommost point of the most curved ridge, and the delta is the center of the intersecting range of three or more orientations [8]. Fig. 1 presents typical SPs. Until now, several detection methods have been given in the literature [8,9,10]. Masahiro Kawagoe et al. [11], as well as Lin Hong [3], provide a detection method based on the Poincaré Index. The detection approach adopted by Lin Wang [12] uses the coherence derived from the distribution of Gaussian-Hermite moments of different orders of the fingerprint image. Anil K. Jain et al. [6] propose a method based on multi-resolution analysis of the orientation fields. However, the accurate estimation of SPs is still a challenging task because of the various noises in fingerprint images.
Fig. 1. An example of a fingerprint image and its singular points (core and delta)
In this paper, we propose a novel approach to detect the SPs in fingerprint images. For the purpose of detecting the position and type of SPs in raw fingerprint images, we first combine the curvature and orientation fields of a fingerprint; second, we propose an improved Poincaré Index method to judge the type of the candidate points. The second step can also remove spurious SPs, which makes our algorithm more robust to the unavoidable noise in fingerprint images. This paper is organized as follows. In Section 2, the characteristics of local fingerprint regions are discussed as the foundation of SP detection. The detection algorithm is presented in Section 3. Some experimental results are presented in Section 4. Finally, Section 5 gives the conclusion of this paper.
2 Analysis of Local Fingerprint Images

As mentioned before, a fingerprint is a typical flow-like pattern, while the singular points are defined as the centers of the areas where the directional field is abruptly discontinuous and where the curvature of the ridges or valleys is larger than in other areas [8,9]. From this viewpoint, we can locate the SPs by using two intrinsic properties: the orientation field and the curvature field. Fig. 2 shows the orientation field of SP regions and of a normal region.
Fig. 2. Various typical fingerprint regions: (a) normal region, (b) the orientation of (a), (c) core region, (d) the orientation of (c), (e) delta region, (f) the orientation of (e)
According to differential geometry [13], curvature can be derived from the orientation field. Obviously, in order to locate the position of SPs more accurately, we should make full use of both of these properties. From Fig. 2, we can observe that the orientation field of SPs has a common characteristic: it appears disordered in the area around SPs, while in normal areas it appears quite smooth. So we can represent the characteristic of the orientation field in the area around SPs by a measure of the disorder of the orientation field.
First, we quantify the orientation field; in this paper, we use 8 directions. Second, for each point in the fingerprint image, we take a w × w block around the point and calculate the direction histogram of the block. We represent the degree of disorder of the orientation field by

D = \sum_{i=1}^{8} [P(i)]^2,   (1)

where P(i) = N(i)/N, N(i) is the number of pixels whose direction number is i in the w × w block, and N is the total number of pixels in the block (i.e., w × w). According to (1), D reaches its maximum if the local orientation field has only one direction, and its minimum if the local orientation field has a uniform distribution. In other words, D is large in normal areas, while in the area around SPs, D becomes small. On the other hand, the area around SPs has large curvature, and curvature can be expressed as the directional difference of points along the tangent: assume the orientation of a point P1 in the fingerprint image is θ1, the next point along θ1 is P2, and its orientation is θ2 (see Fig. 3); the curvature at point P1 is defined as follows [13]:

k(s) = \frac{d\theta}{ds},   (2)

where θ is the tangent angle of a point on the curve and s is the arc length. In fingerprint images, we define the curvature as

k = \theta_1 - \theta_2,   (3)

where θ1 and θ2 are shown in Fig. 3.
Fig. 3. Illustration of the local characteristic of curvature
In the area around SPs, the curvature of the ridge is larger than in the normal region, while the degree of disorder D is smaller than in the normal region. So, in order to get an accurate location by combining the orientation and curvature fields, the curvature of each point in the fingerprint image is transformed as

C = f(k).   (4)
where the function f(·) is monotone decreasing; as a result, the value of C in the area around SPs is smaller than in the normal region. We can weight (1) by (4):

D' = \sum_j C(j) \cdot [P(j)]^2,   (5)

where C(j), calculated by (4), is the curvature measure of the jth pixel in the fingerprint image and P(j), calculated as in (1), is the direction probability of the jth pixel. It is obvious that D' reaches a local minimum at SPs.
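The sketch below (Python/NumPy, not from the paper) computes this combined measure for every pixel. It reads the sum in (5) as weighting the block's squared direction probabilities by the curvature measure of the block's centre pixel, which is one plausible interpretation of the notation; the window handling is likewise illustrative.

```python
import numpy as np

def combined_measure(orient_idx, curv, w=16):
    """orient_idx: quantized direction index in {0,...,7} per pixel;
    curv: curvature k per pixel (Eq. (3)); w: block size."""
    C = 1.0 / (1.0 + np.abs(curv))               # Eq. (6): monotone decreasing f(k)
    H, W = orient_idx.shape
    Dp = np.full((H, W), np.inf)                 # unvisited border stays +inf
    h = w // 2
    for y in range(h, H - h):
        for x in range(h, W - h):
            block = orient_idx[y - h:y + h, x - h:x + h]
            hist = np.bincount(block.ravel(), minlength=8)
            P = hist / float(block.size)         # direction probabilities P(i), Eq. (1)
            Dp[y, x] = C[y, x] * np.sum(P ** 2)  # curvature-weighted disorder, Eq. (5)
    return Dp                                    # SPs show up as local minima of D'
```

Candidate singular points are then the minima of this measure restricted to the segmented foreground, as described in the next section.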
3 Singular Points Detection Algorithm

To implement our detection algorithm, the orientation field of the fingerprint image must first be calculated. Many methods for calculating the orientation field have been proposed in the literature [2,14]; in this paper, we adopt the method proposed in [2] to calculate and smooth the orientation field. Our detection algorithm consists of two steps: the first step estimates the location of the SPs, and the second step judges the type of the SPs. The first step proceeds as follows:
1. Calculate the block orientation field using the method in [2], derive the curvature field from the orientation field according to (3), and then convert the curvature k to the measure C using (4). In this paper, we define the f(·) of (4) as

f(k) = \frac{1}{1 + |k|},   (6)

where k is the curvature at each pixel and |·| is the modulus operator.
2. Quantize the orientation field into 8 directions: 0, π/8, π/4, 3π/8, π/2, 5π/8, 3π/4, 7π/8. Calculate the directional probability of each pixel using the direction histogram in a v × v block.
3. Calculate the measure D' using (5) to obtain the measure image of the fingerprint. Usually the global minimum of the measure field appears in the background, so before searching for SPs we must obtain the foreground of the fingerprint image; in this paper, we employ the segmentation algorithm proposed in our previous work [15].
4. Search for the global minimum in the measure image of the fingerprint foreground, then judge the type of this point in the second step. Since the measures D' of pixels near an SP are smaller than in the normal region, we assign those pixels a large value in order to avoid re-locating SPs around previously found ones. Continue searching for the global minimum in the measure image; if the type of a candidate point is neither core nor delta, stop the whole algorithm.
As many researchers have pointed out in the literature [8,11], the Poincaré Index can not only find the location of SPs but also judge their type. According to the Poincaré Index defined in [3]:

\mathrm{Poincare}(i,j) = \frac{1}{2\pi} \sum_{k=1}^{N_\psi} \Delta(k),   (7)
where

\Delta(k) = \begin{cases} \delta(k) & \text{if } |\delta(k)| < \pi/2 \\ \pi + \delta(k) & \text{if } \delta(k) < -\pi/2 \\ \pi - \delta(k) & \text{otherwise} \end{cases}

\delta(k) = O(\psi_x(i'), \psi_y(i')) - O(\psi_x(i), \psi_y(i)), \quad i' = (i+1) \bmod N_\psi,

where O(i, j) is the orientation field of the fingerprint, and \psi_x(i) and \psi_y(i) are the coordinates of the pixels on the closed contour around the candidate point. According to (7), the Poincaré Index has the value 1/2 at a core and the value -1/2 at a delta, while at a normal point the Poincaré Index is zero. Nevertheless, apart from the difficulty of parameter selection, the calculation of the Poincaré Index is easily affected by noise in the orientation field. In order to judge the type of the candidate SPs and overcome this noise sensitivity, we improve the Poincaré Index method: we calculate the Poincaré Index over a disk instead of over a closed contour around the candidate point. We divide the disk into 8 sectors (see Fig. 4), calculate the dominant direction in each sector using the directional histogram, and obtain the modified, sector-based Poincaré Index by accumulating the orientation changes of the dominant directions of the eight sectors using (7).
Fig. 4. 8 sectors around a singular point
The second step of our algorithm proceeds as follows: draw a disk centered on the candidate point with a radius of 15 pixels, divide the disk into 8 sectors as in Fig. 4, calculate the dominant direction of each sector, and compute the Poincaré Index using (7). If the index is 1/2, classify the SP as a core; if the index is -1/2, the SP is regarded as a delta; otherwise, the SP is spurious.
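A sketch of this second step is given below (Python/NumPy, not the authors' code). The doubled-angle averaging used for the dominant direction and the wrapping of direction differences follow the usual conventions for orientation fields defined modulo π, and are illustrative choices.

```python
import numpy as np

def sector_poincare_index(orient, cy, cx, radius=15):
    """orient: orientation field in radians (defined modulo pi).
    Returns ~0.5 for a core, ~-0.5 for a delta, ~0 for a normal point.
    Assumes the disk lies fully inside the image."""
    H, W = orient.shape
    yy, xx = np.mgrid[0:H, 0:W]
    inside = (xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2
    pos_angle = np.arctan2(yy - cy, xx - cx)          # position angle in [-pi, pi]
    dominant = []
    for k in range(8):                                # the 8 sectors of Fig. 4
        lo = -np.pi + k * np.pi / 4.0
        th = orient[inside & (pos_angle >= lo) & (pos_angle < lo + np.pi / 4.0)]
        # dominant direction of the sector via the mean of the doubled angles
        dominant.append(0.5 * np.arctan2(np.sin(2 * th).mean(),
                                         np.cos(2 * th).mean()))
    index = 0.0
    for i in range(8):                                # accumulate Delta(k) as in Eq. (7)
        d = dominant[(i + 1) % 8] - dominant[i]
        if d <= -np.pi / 2:                           # wrap into (-pi/2, pi/2]
            d += np.pi
        elif d > np.pi / 2:
            d -= np.pi
        index += d
    return index / (2.0 * np.pi)
```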
4 Experiment Results

In order to evaluate the performance of our detection algorithm, we chose the public fingerprint databases DB2 and DB3 from FVC2002; each database provides 51 random images for our experiment. Since the true positions of the SPs are not known to the computer, we manually estimated them. Fig. 5 shows two samples, from DB2 and DB3 respectively. According to Fig. 5, a large number of spurious SPs are located in the background region and at the edge between the background and foreground; however, we
Fig. 5. Examples of our detection method: (a) the original image, (b) the image of the proposed measure, (c) the locations of SPs including the background, (d) the measure image of the foreground, (e) the detected SPs of (a)
can discard these spurious points by segmentation and morphological operators. Fig. 5(b) and Fig. 5(d) show the image of the proposed combined measure D' of Eq. (5); for the convenience of visual inspection, we represent the measure by pixel brightness: the smaller the measure, the brighter the pixel. In practice, we obtain the positions of the SPs by finding the global minimum in the foreground of the measure image. Fig. 5(c) shows the positions of SPs including the spurious SPs in the background, while Fig. 5(e) shows the genuine SPs. The summary performance of our algorithm is shown in Table 1: the second column gives the number of fingerprints with spurious or missed SPs in the tested fingerprint databases, the third column gives the percentage of fingerprints with spurious SPs and with missed SPs produced by our detection algorithm, and the comparative performance of a classical algorithm is given in the fourth column.

Table 1. Summary experiment results and comparison of algorithms
                                            No.    Percent    Algorithm [11]
The core:  fingerprints with spurious SPs    2      4.9%          7.5%
The delta: fingerprints with spurious SPs    3
The core:  fingerprints with missed SPs      5      7.8%         10.4%
The delta: fingerprints with missed SPs      3
5 Conclusion

In this paper, a new detection algorithm for singular points in fingerprint images is proposed. The proposed method consists mainly of two steps: first, we find the accurate locations of singular points using a measure combining the orientation field and the curvature field; second, we verify their category using an improved Poincaré Index method, which can also discard some spurious points found by the first step. The experimental results show the performance of our detection algorithm.
References
1. Jain, A. K., Hong, L., Bolle, R. M.: On-Line Fingerprint Verification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19 (1997) 302–314
2. Hong, L., Yifei, W., Jain, A. K.: Fingerprint Image Enhancement: Algorithm and Performance Evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20 (1998) 777–789
3. Hong, L., Jain, A. K.: Classification of Fingerprint Images. Proceedings of the 11th Scandinavian Conference on Image Analysis (1999)
4. Qinzhi, Z., Hong, Y.: Fingerprint Classification Based on Extraction and Analysis of Singularities and Pseudo Ridges. Pattern Recognition, 37 (2004) 2233–2243
5. Chan, K. C., Moon, Y. S., Cheng, P. S.: Fast Fingerprint Verification Using Subregions of Fingerprint Images. IEEE Transactions on Circuits and Systems for Video Technology, 14 (2004) 95–101
6. Jain, A. K., Salil, P., Hong, L., Sharath, P.: Filterbank-Based Fingerprint Matching. IEEE Transactions on Image Processing, 9 (2000) 846–859
7. Weiwei, Z., Yangsheng, W.: Core-Based Structure Matching Algorithm of Fingerprint Verification. In: 16th International Conference on Pattern Recognition, 1 (2002) 70–74
8. Bazen, A. M., Gerez, S. H.: Systematic Methods for the Computation of the Directional Fields and Singular Points of Fingerprints. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24 (2002) 905–919
9. Srinivasan, V. S., Murthy, N. N.: Detection of Singular Points in Fingerprint Images. Pattern Recognition, 25 (1992) 139–153
10. Klimanee, C., Trinh, A. V., Nguyen, D. T.: Accurate Determination of Singular Points and Principal Axes on a Fingerprint. CD-Proceedings of the TENCON 2004 Conference on Analog and Digital Techniques in Electrical Engineering, Chiang Mai, Thailand (2004) 159–162
11. Masahiro, K., Akio, T.: Fingerprint Pattern Classification. Pattern Recognition, 17 (1984) 295–303
12. Lin, W., Mo, D.: Extraction of Singular Points in Fingerprints by the Distribution of Gaussian-Hermite Moments. First International Conference on Distributed Frameworks for Multimedia Applications (2005) 206–209
13. Mengdao, J.: Differential Geometry. Science Press (Chinese) (2003)
14. Jinwei, G., Jie, Z., David, Z.: A Combination Model for the Orientation Field of Fingerprints. Pattern Recognition, 37 (2004) 543–553
15. Zhongchao, S., Yangsheng, W., Jin, Q., Ke, X.: A New Segmentation Algorithm for Low Quality Fingerprint Images. Third International Conference on Image and Graphics (2005) 314–317
A Mathematical Framework for Optical Flow Computation Xiaoxin Guo, Zhiwen Xu, Yueping Feng, Yunxiao Wang, and Zhengxuan Wang Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street 2699#, Changchun, 130012, P.R.China [email protected], [email protected]
Abstract. In this paper, we propose a mathematical framework under which we present the necessary condition for the minimization of the optical flow problem. The framework is a straightforward spatiotemporal extension, which can be used to accommodate most of the known differential-based optical flow computations. In addition, this paper presents an alternative formulation of the optical flow problem that relies on spline functionals. With the framework, we describe some typical optical flow methods.
1 Introduction

Optical flow is the velocity field or displacement field associated with each pixel in an image sequence. Such a displacement field results from the apparent motion of the image intensity in time. Optical flow provides information for analyzing motion in a sequence of frames. Estimating the optical flow is a fundamental problem in low-level vision and can serve many applications in image sequence processing. There are many different methods of estimating the optical flow. The differential framework methods start with an intensity constraint equation which forms a single linear equation for each pixel, constraining its motion vector [1][2]. Such linear constraints are posed over all the pixels in the current image, forming an ill-posed estimation problem. The various differential-based methods thus vary in the way they add constraints in order to ensure a single and accurate solution to the estimation problem. This paper generalizes the method proposed by Horn and Schunck for the estimation of optical flow and presents a mathematical framework used to derive a regularized solution to the optical flow problem. The framework benefits the formulation of the smoothness constraint. At the same time, we prove the necessary condition for the minimization of the estimation problem; the proof provides a basis for optical flow computation. A spline functional used as a smoothness constraint can be regarded as a special case of the framework. Finally, we also demonstrate applications of the framework; in particular, the framework includes the formulation that led to the Horn-Schunck method as a special case [2]. Section 2 begins with the presentation of a mathematical framework for the derivation of regularized solutions to the optical flow problem. This is followed in Section 3
by a formulation of the optical flow problem that relies on smoothness constraints involving spline functionals. Some classical examples expressed with the framework are presented in Section 4 and the paper is concluded in Section 5.
2 A Mathematical Framework

Consider an image sequence g(x, y, t), where (x, y) denotes the location within a rectangular image domain Ω and t ∈ [0, T] denotes time. Many differential methods for optical flow are based on the assumption that the grey values of image objects in subsequent frames do not change over time:
g_x u + g_y v + g_t = 0,   (1)
where the displacement field (u, v) is called the optical flow, and subscripts denote partial derivatives. In general, the optical flow can be computed by determining the functions u = u^*(\alpha) and v = v^*(\alpha) that minimize the functional

E(u,v) = \int_{\Omega \times [0,T]} \big( K_\rho * (g_x u + g_y v + g_t) \big)^2 \, dx\,dy\,dt + \alpha \big( E_{c1}(u) + E_{c2}(v) \big),   (2)

where Ω is a rectangular image domain, K_\rho is a Gaussian kernel with standard deviation ρ, and * is a convolution operator. E_{c1}(u) is defined in terms of F_1(\cdot), a twice continuously differentiable function of u and its partial derivatives \partial^{p+q+r} u / \partial x^p \partial y^q \partial t^r = u_{x^p y^q t^r} up to order (K, L, N), as

E_{c1}(u) = \int_{\Omega \times [0,T]} F_1\big( u, u_x, u_y, u_t, \ldots, u_{x^K y^L t^N} \big) \, dx\,dy\,dt,   (3)

and E_{c2}(v) is defined in terms of F_2(\cdot), a twice continuously differentiable function of v and its partial derivatives \partial^{p+q+r} v / \partial x^p \partial y^q \partial t^r = v_{x^p y^q t^r} up to order (K, L, N), as

E_{c2}(v) = \int_{\Omega \times [0,T]} F_2\big( v, v_x, v_y, v_t, \ldots, v_{x^K y^L t^N} \big) \, dx\,dy\,dt.   (4)

In Eq. (2), the sum represents the smoothness constraint, whereas α is the regularization parameter, which determines the relative effect of the smoothness constraint on the computation of the motion velocity field. The necessary conditions for solutions of the minimization problem established above were obtained by relying on the calculus of variations and are summarized by the following theorem:

Theorem: Let C^{(K,L,N)}[\mathcal{R}] be the space of all continuous smooth functions in a region of interest \mathcal{R}, that is, the space of continuous functions with continuous partial derivatives up to order (K, L, N) in \mathcal{R}. Let Γ be the boundary of the region of interest \mathcal{R}. The space C_0^{(K,L,N)}[\mathcal{R}] consists of continuous smooth functions in \mathcal{R} whose partial derivatives up to order (K, L, N) vanish at the boundary Γ of \mathcal{R}. The functional E = E(u,v) defined by (2)-(4) is minimized by the functions u = u^*(\alpha) \in C_0^{(K,L,N)}[\mathcal{R}] and v = v^*(\alpha) \in C_0^{(K,L,N)}[\mathcal{R}] that satisfy the conditions
K_\rho * \big( g_x (g_x u + g_y v + g_t) \big) + \frac{\alpha}{2} \sum_{p=0}^{K} \sum_{q=0}^{L} \sum_{r=0}^{N} (-1)^{p+q+r} \frac{\partial^{p+q+r}}{\partial x^p \partial y^q \partial t^r} \Big( \frac{\partial F_1}{\partial u_{x^p y^q t^r}} \Big) = 0,   (5)

K_\rho * \big( g_y (g_x u + g_y v + g_t) \big) + \frac{\alpha}{2} \sum_{p=0}^{K} \sum_{q=0}^{L} \sum_{r=0}^{N} (-1)^{p+q+r} \frac{\partial^{p+q+r}}{\partial x^p \partial y^q \partial t^r} \Big( \frac{\partial F_2}{\partial v_{x^p y^q t^r}} \Big) = 0.   (6)
Proof: The functional E(u,v) defined in Eq. (2) can also be written as

E(u,v) = E_0(u,v) + \alpha E_{c1}(u) + \alpha E_{c2}(v),   (7)

where E_{c1}(u) and E_{c2}(v) are defined in Eq. (3) and (4), respectively, and

E_0(u,v) = \int_{\Omega \times [0,T]} \big( K_\rho * (g_x u + g_y v + g_t) \big)^2 \, dx\,dy\,dt.   (8)
Let \delta_u E(h,v) and \delta_v E(u,h) be the first variations of E(u,v) with respect to u and v, respectively. Then the necessary conditions for a minimum of Eq. (7) are [4][5]

\delta_u E(h,v) = \delta_u E_0(h,v) + \alpha \delta_u E_{c1}(h) = 0, \quad \forall h \in C_0^{(K,L,N)}[\mathcal{R}],   (9)

\delta_v E(u,h) = \delta_v E_0(u,h) + \alpha \delta_v E_{c2}(h) = 0, \quad \forall h \in C_0^{(K,L,N)}[\mathcal{R}].   (10)
The first variation of E_0(u,v) with respect to u can be determined in terms of the difference

\Delta_u E_0(h,v) = E_0(u+h,v) - E_0(u,v) = 2 \int_{\Omega \times [0,T]} K_\rho * h g_x (g_x u + g_y v + g_t) \, dx\,dy\,dt + \int_{\Omega \times [0,T]} K_\rho * h^2 g_x^2 \, dx\,dy\,dt.   (11)
The first variation \delta_u E_0(h,v) is the term of the difference \Delta_u E_0(h,v) that is linear in h. According to Eq. (11),

\delta_u E_0(h,v) = 2 \int_{\Omega \times [0,T]} K_\rho * h g_x (g_x u + g_y v + g_t) \, dx\,dy\,dt.   (12)
The first variation of any functional E_{c1}(u) defined in Eq. (3) is given by [6]

\delta_u E_{c1}(h) = 2 \int_{\Omega \times [0,T]} h \, d_{u(x,y,t)} E_{c1}(u) \, dx\,dy\,dt,   (13)

where h \in C_0^{(K,L,N)}[\mathcal{R}] and

d_{f(x,y,t)} E_{ci}(f) = \frac{1}{2} \sum_{p=0}^{K} \sum_{q=0}^{L} \sum_{r=0}^{N} (-1)^{p+q+r} \frac{\partial^{p+q+r}}{\partial x^p \partial y^q \partial t^r} \Big( \frac{\partial F_i}{\partial f_{x^p y^q t^r}} \Big), \quad i = 1, 2.   (14)
Using Eq. (12) and (13), the first variation \delta_u E(h,v) of the functional E(u,v) defined in Eq. (7) takes the form

\delta_u E(h,v) = 2 \int_{\Omega \times [0,T]} h \, d_u E(u,v) \, dx\,dy\,dt,   (15)
where

d_u E(u,v) = K_\rho * g_x (g_x u + g_y v + g_t) + \frac{\alpha}{2} \sum_{p=0}^{K} \sum_{q=0}^{L} \sum_{r=0}^{N} (-1)^{p+q+r} \frac{\partial^{p+q+r}}{\partial x^p \partial y^q \partial t^r} \Big( \frac{\partial F_1}{\partial u_{x^p y^q t^r}} \Big).   (16)

The condition \delta_u E(h,v) = 0 is satisfied for all h \in C_0^{(K,L,N)}[\mathcal{R}] if d_u E(u,v) = 0 or, equivalently, if

K_\rho * g_x (g_x u + g_y v + g_t) + \frac{\alpha}{2} \sum_{p=0}^{K} \sum_{q=0}^{L} \sum_{r=0}^{N} (-1)^{p+q+r} \frac{\partial^{p+q+r}}{\partial x^p \partial y^q \partial t^r} \Big( \frac{\partial F_1}{\partial u_{x^p y^q t^r}} \Big) = 0.   (17)
It can similarly be shown that the condition \delta_v E(u,h) = 0 is satisfied for all h \in C_0^{(K,L,N)}[\mathcal{R}] if

K_\rho * g_y (g_x u + g_y v + g_t) + \frac{\alpha}{2} \sum_{p=0}^{K} \sum_{q=0}^{L} \sum_{r=0}^{N} (-1)^{p+q+r} \frac{\partial^{p+q+r}}{\partial x^p \partial y^q \partial t^r} \Big( \frac{\partial F_2}{\partial v_{x^p y^q t^r}} \Big) = 0.   (18)
3 The Spline Functional with a Penalizer

Suppose F_1(\cdot) and F_2(\cdot) correspond to a spline functional of order M with a penalty function, that is,

F_i\big( f, f_x, f_y, \ldots, f_{x^K y^L t^N} \big) = \psi\big( S_{M(x,y,t)}(f) \big), \quad i = 1, 2,   (19)

where \psi(\cdot) is a penalty function and

S_{M(x,y,t)}(f) = \sum_{\substack{p+q \le M \\ p \ge 0, q \ge 0}} \Big( C_M^{p,q} \frac{\partial^M f}{\partial x^p \partial y^q \partial t^{M-p-q}} \Big)^2,   (20)
where C_M^{p,q} = M! / (p! q! (M-p-q)!). We require that the penalty function satisfies \psi(\cdot) \ge 0 and is increasing:

\psi'(\cdot) > 0,   (21)

so that the functional is an increasing function of the smoothness of the field as measured by S_{M(x,y,t)}(f). Therefore, the minimization of the functional is equivalent to smoothing the optical flow field. The use of the penalty function enables discontinuity preservation. For instance, \psi(\cdot) can adopt a Huber function to preserve the discontinuities in the a priori model. Using Eq. (19),

d_{f(x,y,t)} E_{ci}(f) = (-1)^M \sum_{\substack{p+q \le M \\ p \ge 0, q \ge 0}} C_M^{p,q} D_x^p D_y^q D_t^{(M-p-q)} \big( \psi'( S_{M(x,y,t)}(f) ) \, D_x^p D_y^q D_t^{(M-p-q)} f \big), \quad i = 1, 2,   (22)
where D_x^p D_y^q D_t^r f = \partial^{p+q+r} f / \partial x^p \partial y^q \partial t^r. In this case, conditions (5) and (6) become

K_\rho * g_x (g_x u + g_y v + g_t) + \alpha \, d_{u(x,y,t)} E_{c1}(u) = 0,   (23)

K_\rho * g_y (g_x u + g_y v + g_t) + \alpha \, d_{v(x,y,t)} E_{c2}(v) = 0.   (24)
4 Applications

More methods can be incorporated into the same framework with similar formal expressions. For the spatial version of the first-order spline functional (M = 1, \psi(x) = x), Eq. (3) and (4) correspond to, respectively,

E_{c1}(u) = \int_\Omega (u_x^2 + u_y^2) \, dx\,dy \quad \text{and} \quad E_{c2}(v) = \int_\Omega (v_x^2 + v_y^2) \, dx\,dy.   (25)
The above functional leads to the Horn-Schunck method [2]: if ρ = 0 in E_0(u,v), then

E(u,v) = \int_\Omega \big( (g_x u + g_y v + g_t)^2 + \alpha (\|\nabla u\|^2 + \|\nabla v\|^2) \big) \, dx\,dy,   (26)

where ∇ is the gradient operator. Spatiotemporal versions of the Horn-Schunck method have been considered by Elad et al. [7]. The term F_1(u_x, u_y) + F_2(v_x, v_y) = u_x^2 + u_y^2 + v_x^2 + v_y^2 measures the smoothness of the velocity field by quantifying the pixel-to-pixel variation of the velocity vectors. Thus, the Horn-Schunck method seeks a motion field that satisfies the optical flow equation with minimum pixel-to-pixel variation among the velocity vectors. The formulation that relies on a first-order spline functional can be interpreted as a special case of the minimization problem. In this case, conditions (23) and (24) become
(27)
K ρ ∗ g y ( g x u + g y v + gt ) + αΔv = 0 ,
(28)
where Δ = ∂ 2 ∂x 2 + ∂ 2 ∂y 2 denotes a Laplacian operator. The Lucas–Kanade method [3] is a special case of the framework when α = 0 , which minimizes the quadratic form
(
)
E (u , v) = ³ K ρ ∗ ( g x u + g y v + gt ) . Ω
2
(29)
The spatiotemporal Lucas–Kanade method is similar to the approach of Bigün et al. [8]. Robust variants of the Lucas–Kanade method have been investigated by Black et al. [9] and by Yacoob et al. [10], respectively. From a statistical viewpoint a penalizer can be regarded as applying methods from robust statistics which penalize outliers less severely than those methods without penalizer. In general, nonlinear methods give better results at locations with flow
A Mathematical Framework for Optical Flow Computation
605
discontinuities. The numerous discontinuity preserving global methods with spatiotemporal regularizers have been proposed in different formulations.
5 Conclusions This paper presents the development of regularized optical flow computation methods, and introduces a general formulation of optical flow computation and outlines a mathematical framework for computing optical flow. The framework may adopt a spline functional as a smoothness constraint, and generalize approaches to the formulation of various known smoothness constraints. In general, the significance of the framework lies in the fact that it provides the possibility of the description of different methods.
References
1. Singh, A.: Optic Flow Computation: A Unified Perspective. IEEE Computer Society Press (1992)
2. Horn, B. K. P., Schunck, B. G.: Determining Optical Flow. Artificial Intelligence, 17 (1981) 185–203
3. Lucas, B., Kanade, T.: An Iterative Image Registration Technique with an Application to Stereo Vision. In: Proceedings, DARPA Image Understanding Workshop (1981) 121–130
4. Gelfand, I. M., Fomin, S. V.: Calculus of Variations. Prentice-Hall, Englewood Cliffs, NJ (1963)
5. Sagan, H.: Introduction to the Calculus of Variations. McGraw-Hill, New York (1969)
6. Karayiannis, N. B., Venetsanopoulos, A. N.: Regularization Theory in Image Restoration: The Stabilizing Functional Approach. IEEE Transactions on Acoustics, Speech, and Signal Processing, 38(7) (1990) 1155–1179
7. Elad, M., Feuer, A.: Recursive Optical Flow Estimation - Adaptive Filtering Approach. Journal of Visual Communication and Image Representation, 9(2) (1998) 119–138
8. Bigün, J., Granlund, G. H.: Optical Flow Based on the Inertia Matrix in the Frequency Domain. In: Proc. SSAB Symposium on Picture Processing, Lund, Sweden (1988)
9. Black, M. J., Anandan, P.: The Robust Estimation of Multiple Motions: Parametric and Piecewise Smooth Flow Fields. Computer Vision and Image Understanding, 63(1) (1996) 75–104
10. Yacoob, Y., Davis, L. S.: Temporal Multi-Scale Models for Flow and Acceleration. International Journal of Computer Vision, 32(2) (1999) 1–17
A Method for Camera Pose Estimation from Object of a Known Shape Dong-Joong Kang1,*, Jong-Eun Ha2, and Mun-Ho Jeong3 1 School of Mechanical Eng., Pusan National University, San 30, Jangjeon-dong, Kumjung-gu, Busan 609-735, Korea [email protected] 2 Dept. of Automotive Eng., Seoul National University of Technology, 138, Gongrung-gil, Nowon-gu, Seoul 139-743, Korea [email protected] 3 Intelligent Robotics Research Center, Korea Institute of Science and Technology, 39-1, Hawolgok-dong, Seongbuk-gu, Seoul 136-791, Korea [email protected]
Abstract. Pose estimation between a camera and an object is a central element of computer vision and its applications. In this paper, we present an approach to the problem of estimating the camera's 3-D location and orientation from a matched set of 3-D model and 2-D image features. We derive an error equation using roll-pitch-yaw angles to represent the rotation matrix, and we directly calculate the partial derivatives of the Jacobian matrix, without resorting to numerical methods, to estimate the parameters of the nonlinear error equation. Because the proposed method does not use a numerical method to derive the partial derivatives, it is very fast and hence adequate for real-time pose estimation, and it is also insensitive to the selection of initial values for solving the nonlinear equation. The method is validated on real images, and a comparison with a numerical estimation method is presented.
1 Introduction

Solving model-based object recognition in computer vision is related to pose estimation between the camera and the object. The main elements of object position initialization in computer vision for multimedia imaging systems are feature detection, correspondence analysis, and pose estimation. This paper describes a fast algorithm that provides a new solution to the problem of estimating camera location and orientation from a set of recognized features appearing in the image. If the correspondence between 3-D model features and 2-D features found in the image is given, and the camera parameters providing the geometric transformation between 3-D world space and its projection into image pixels are known, the goal of pose estimation is to find the rotation and translation matrices which map the world coordinate system of the objects to the camera coordinate system. Camera *
This work was supported by Pusan National University Research Grant, 2006.
calibration for exact registration between the camera and objects should be solved beforehand, and a pattern box of known shape is used to provide the camera parameters. There have been several approaches to the object and camera pose problems; for a more detailed review, refer to Kumar [1] and Ishii [2]. Because using roll-pitch-yaw angles to represent rotation is very simple and intuitive, we derive an error equation based on the three angles as a nonlinear function. The error equation minimizes a point-to-plane distance, defined as the dot product of the unit normal to the projected plane of an image line and a 3-D point on the model line transformed to camera coordinates. We directly extract the partial derivatives with respect to the estimation parameters from the nonlinear error equation. After solving the correspondence between the 3-D model and 2-D image line sets, the Levenberg-Marquardt technique is applied to minimize the defined error function. The partial derivatives of the error equation are analytically derived to form the Jacobian matrices providing the linearized form of the nonlinear equation. Experimental results using real images of a 3-D polyhedral object are presented. The method is very fast, and hence adequate for real-time processing, because the partial derivatives in the Jacobian matrix required to optimize the nonlinear equation are calculated directly without any use of numerical methods.
2 Camera Calibration

Camera calibration is the process that solves for the numerical values of the geometrical and optical parameters of the camera and the external 3-D position and orientation of the camera frame relative to an external coordinate system. Two sets of parameters define the camera model. First, there are the intrinsic parameters, which do not depend on the position and orientation of the camera; they consist of five values, \alpha_u, \alpha_v, \gamma, u_0 and v_0, which describe the two axis scaling factors, the skew (distortion) coefficient, and the center position of the image plane, respectively. In addition, there are six extrinsic parameters, three defining the position of the camera and the remaining three its orientation. In general form, we can express the projective projection matrix P in terms of these parameters. In this paper, we need the calibration matrix of intrinsic parameters to transform object features projected onto the CCD plane of the camera to the image pixel plane:

\begin{pmatrix} u_i \\ v_i \\ 1 \end{pmatrix} = C [R \mid t] \begin{pmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{pmatrix} = \begin{pmatrix} \alpha_u & \gamma & u_0 \\ 0 & \alpha_v & v_0 \\ 0 & 0 & 1 \end{pmatrix} [R \mid t] \begin{pmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{pmatrix} = \begin{pmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \end{pmatrix} \begin{pmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{pmatrix} = P \begin{pmatrix} M \\ 1 \end{pmatrix}   (1)
Fig. 1 shows an example of obtaining the calibration matrix. A total of twelve points are used to solve for the projection matrix P; we can then obtain the intrinsic parameter matrix C and the extrinsic parameter matrix [R | t] from a decomposition of the projection matrix. Eq. (1) shows the mathematical relation between a 3-D world point M and its projected coordinate u on the image plane.
Fig. 1. Pattern box for camera calibration. Twelve points are used to solve for the camera parameters: (a) original image; (b) the twelve input points, shown as small red boxes.
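The paper does not spell out its solver for P, but a standard linear (DLT-style) estimate from the twelve point correspondences can be sketched as follows in Python/NumPy; C and [R | t] can then be recovered from the left 3×3 block of P, e.g. by an RQ decomposition.

```python
import numpy as np

def estimate_projection_matrix(X_world, x_img):
    """X_world: (n, 3) world points; x_img: (n, 2) pixel coordinates; n >= 6.
    Returns the 3x4 projection matrix P of Eq. (1), up to scale."""
    rows = []
    for (X, Y, Z), (u, v) in zip(X_world, x_img):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    A = np.asarray(rows, dtype=float)
    # homogeneous least squares: right singular vector of the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)
```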
3 Camera Pose Estimation from a Known Shape Object

Perspective projection between a camera and three-dimensional objects defines a plane in 3-D space formed by a line in the image and the focal point of the camera. If no errors were introduced during image feature extraction and the 3-D models were perfect, then model lines in 3-D space projecting onto this image line would lie exactly in this plane. This observation is the basis for the fit measure used in a few previous works [1-2]. We propose a pose decision method that solves for the rigid transformation minimizing the sum of squared distances between points on the 3-D model lines and this plane.
Fig. 2. Geometric configurations. (a) The relationship between two coordinate systems; (b) The perpendicular distance by point-to-plane fit measure.
First of all, we describe the relationship between the two coordinate systems. The camera and world coordinate systems are O_c-xyz and O_w-XYZ, respectively. Fig. 2(a) shows the two coordinate systems, where X_w, Y_w, Z_w and x_C, y_C, z_C represent the axes of
the world and camera coordinate systems, respectively. The relationship between the two coordinate systems is given by the following vector-matrix equation:

\begin{pmatrix} x_{P/O_c} \\ y_{P/O_c} \\ z_{P/O_c} \end{pmatrix} = R^t \cdot \begin{pmatrix} X_{P/O_w} - t_1 \\ Y_{P/O_w} - t_2 \\ Z_{P/O_w} - t_3 \end{pmatrix},   (2)
where R is the rotation matrix from the world coordinate system to the camera coordinate system and T_{O_c/O_w} = (t_1, t_2, t_3)^t is the translation vector from O_w to O_c. The upper index t denotes the transpose of a matrix. A point in 3-D space is represented by the 3-D vector P. We use roll-pitch-yaw angles to represent the rotation matrix R [5]. Due to noise in extracting image lines, the segments usually will not lie exactly in the projection plane, as shown in Fig. 2(b). The point-to-plane distance may be defined as the dot product of the unit normal to the plane and a 3-D point on the model line transformed to camera coordinates. We can therefore form an error equation giving the sum of squared perpendicular distances over the line segments:

e = \sum_{i=1}^{l} \sum_{j=1}^{m} e_{ij}^2 = \sum_{i=1}^{l} \sum_{j=1}^{m} \big( N_i \cdot ( R^t ( P_{ij} - T ) ) \big)^2.   (3)
The summation is over l pairs of corresponding 3-D and 2-D line segments. A point P on the 3-D model line in Fig. 2(b) may be one of the two endpoints or the center point of the line; the index m is the number of points selected on the 3-D line. N_i is the unit normal vector to the plane formed by each 2-D segment. The pose of the 3-D segments relative to the 2-D segments is expressed as a rotation R and translation T applied to the 3-D points. The best-fit 3-D pose for a set of corresponding 3-D and 2-D line segments is defined by the rotation R^* and translation T^* which minimize Eq. (3). Solving for R^* and T^* is a nonlinear optimization problem. Eq. (3) for a specific point P on the 3-D model line can be rewritten as

e_{ij} = (n_1 \; n_2 \; n_3)_i \cdot \begin{pmatrix} r_{11} & r_{21} & r_{31} \\ r_{12} & r_{22} & r_{32} \\ r_{13} & r_{23} & r_{33} \end{pmatrix} \begin{pmatrix} X_{ij} - t_1 \\ Y_{ij} - t_2 \\ Z_{ij} - t_3 \end{pmatrix},   (4)
where the n_i are the normal vector components of the plane obtained from a corresponding 2-D image line and 3-D model line. The point P_{ij} relative to the world coordinate system is (X_{ij}, Y_{ij}, Z_{ij})^t, and the translation vector T between the two coordinate systems from O_w to O_c is (t_1, t_2, t_3)^t. The number of unknown parameters in Eq. (4) is six: (t_1, t_2, t_3) for translation and (\alpha, \beta, \gamma) for rotation. From Eq. (4), we can create an equation that expresses this error as the sum of the products of its partial derivatives:

\frac{\partial e}{\partial t_1}\delta t_1 + \frac{\partial e}{\partial t_2}\delta t_2 + \frac{\partial e}{\partial t_3}\delta t_3 + \frac{\partial e}{\partial \alpha}\delta\alpha + \frac{\partial e}{\partial \beta}\delta\beta + \frac{\partial e}{\partial \gamma}\delta\gamma = \delta e.   (5)
For example, we can obtain six equations from three lines using the two endpoints of each line, and hence produce a complete linear system which can be solved for all six camera-model corrections. In typical cases, several line segments give an over-constrained linear system. The Levenberg-Marquardt method provides a solution for the linearized form of the nonlinear equation [6]. The small displacement vector \delta x, consisting of \delta t_1, \delta t_2, \delta t_3, \delta\alpha, \delta\beta, and \delta\gamma, represents the errors of the parameters, and the partial derivatives of e define the Jacobian matrix J. The partial derivatives of e with respect to each of the six parameters are analytically derived; Eq. (6) shows examples:

\frac{\partial e}{\partial t_1} = -(n_1 r_{11} + n_2 r_{12} + n_3 r_{13}),   (6a)

\frac{\partial e}{\partial \alpha} = n_1 \Big[ (X_{ij}-t_1)\frac{\partial r_{11}}{\partial \alpha} + (Y_{ij}-t_2)\frac{\partial r_{21}}{\partial \alpha} + (Z_{ij}-t_3)\frac{\partial r_{31}}{\partial \alpha} \Big] + n_2 \Big[ (X_{ij}-t_1)\frac{\partial r_{12}}{\partial \alpha} + (Y_{ij}-t_2)\frac{\partial r_{22}}{\partial \alpha} + (Z_{ij}-t_3)\frac{\partial r_{32}}{\partial \alpha} \Big] + n_3 \Big[ (X_{ij}-t_1)\frac{\partial r_{13}}{\partial \alpha} + (Y_{ij}-t_2)\frac{\partial r_{23}}{\partial \alpha} + (Z_{ij}-t_3)\frac{\partial r_{33}}{\partial \alpha} \Big].   (6b)

Therefore, the Jacobian matrix is

J = \Big[ \frac{\partial e}{\partial \alpha} \; \frac{\partial e}{\partial \beta} \; \frac{\partial e}{\partial \gamma} \; \frac{\partial e}{\partial t_1} \; \frac{\partial e}{\partial t_2} \; \frac{\partial e}{\partial t_3} \Big]^t.   (7)
The Levenberg-Marquardt method is an iterative variation on the Newton method for nonlinear estimation. The normal equations H\delta x = J^t J \delta x = J^t e are augmented to H'\delta x = J^t e, where H' = (1 + \lambda I)H. The value λ is initialized to a small value. If the value obtained for \delta x reduces the error, the increment of x to x + \delta x is accepted and λ is divided by 10 before the next iteration. On the other hand, if the error increases, then λ is multiplied by 10 and the augmented normal equations are solved again, until an increment that reduces the error is obtained.
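To make the optimization concrete, here is a compact sketch in Python/NumPy. The roll-pitch-yaw axis convention is one common choice and may differ from the paper's; a numeric central-difference Jacobian and additive damping are used for brevity, whereas the paper's contribution is precisely the analytic derivatives of Eq. (6).

```python
import numpy as np

def rot_rpy(a, b, c):
    # R = Rz(c) Ry(b) Rx(a): one common roll-pitch-yaw convention [5]
    Rx = np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
    Ry = np.array([[np.cos(b), 0, np.sin(b)], [0, 1, 0], [-np.sin(b), 0, np.cos(b)]])
    Rz = np.array([[np.cos(c), -np.sin(c), 0], [np.sin(c), np.cos(c), 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def residuals(p, normals, points):
    # point-to-plane errors e_ij = N_i . R^t (P_ij - T) of Eqs. (3)-(4);
    # normals: (l, 3) unit normals N_i, points: (l, m, 3) model points P_ij
    R, T = rot_rpy(*p[:3]), p[3:]
    return np.einsum('ik,ijk->ij', normals, (points - T) @ R).ravel()

def lm_pose(p, normals, points, n_iter=50, lam=1e-3, h=1e-6):
    def jacobian(p):
        cols = []
        for k in range(6):                       # numeric central differences
            dp = np.zeros(6); dp[k] = h
            cols.append((residuals(p + dp, normals, points) -
                         residuals(p - dp, normals, points)) / (2 * h))
        return np.stack(cols, axis=1)
    for _ in range(n_iter):
        e, J = residuals(p, normals, points), jacobian(p)
        # standard additive damping, a common variant of the augmentation above
        step = np.linalg.solve(J.T @ J + lam * np.eye(6), -J.T @ e)
        if np.sum(residuals(p + step, normals, points) ** 2) < np.sum(e ** 2):
            p, lam = p + step, lam / 10.0        # accept step, relax damping
        else:
            lam *= 10.0                          # reject step, increase damping
    return p
```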
4 Experiments

First, we performed a convergence test using synthetic images. A CAD model was made, and its transformation under a perspective projection assumption provides the corresponding image lines. Experiments show rapid convergence even with significant errors in the initial estimates; several iterations are enough to obtain convergence of the parameters. Fig. 3 shows, for a synthetic image, intermediate object positions as a false pose viewed from the initial camera position converges to the exact object position while the camera pose is updated during the iterations. The convergence time is under about 0.02 sec on a Pentium-IV 2 GHz. In Fig. 3(a), (b), and (c), three different initial poses converge to the exact object position as the pose parameters between the camera and world coordinate systems are updated. The pose values are calculated by using the derivatives of the nonlinear error equation (3) from twelve edge lines in the synthetic image.
Fig. 3. Change of object pose viewed from the camera coordinate system during iterations. (a) Intermediate object poses viewed by the camera during iteration; (b), (c) intermediate poses from two different initial locations.
For the experiment with a real image, we have to extract the calibration matrix of the camera for use in pose estimation. We used a calibration box such as that in Fig. 1 to obtain the camera parameters. A total of twelve points are used to solve for the projection matrix P, and we can then obtain the intrinsic parameter matrix C from a decomposition of the projection matrix. In Fig. 1(b), the gray points on the pattern box are the calibration points used as the world reference. The image size in the test is 640 × 480 pixels, and the five parameters in C, shown in Eq. (8), represent the two scale factors, the distortion ratio, and the two image center coordinates, respectively. The matrix C of intrinsic parameters is used to solve for the pose between the model objects and the camera system:

C = \begin{pmatrix} 1080.7 & -0.6 & 328.9 \\ 0 & 1080.2 & 255.4 \\ 0 & 0 & 1 \end{pmatrix}   (8)
After we calculate the camera matrix C, it is possible to determine the camera pose by observing any object of known dimensions. Fig. 4 shows an experiment using a real image. Fig. 4(a) shows the original image to be tested; after discarding the shorter lines, Fig. 4(b) presents the extracted lines. To solve the correspondence problem between model and image lines, we can use a method based on topological line group extraction, as described in Kang [3]. The thick lines in Fig. 4(c) are the corresponding lines used for pose estimation. Fig. 4(c) also shows the model overlapped using the 3-D pose determination algorithm of Section 3; the matched thick lines are enough to guide an initial hypothesis for 3-D object recognition. Fig. 4(d) presents the model overlapped on the original image. In this experiment, we set m = 3, corresponding to the two endpoints and the center point of a model line. Arbitrary translation values and uniformly sampled rotation values in the \alpha-\beta-\gamma angle space are selected as initial values of the six pose parameters. Convergence is reached in a few iteration steps. If the error function (3) for a given initial value of the six pose parameters is not reduced during a few iterations, that initial candidate is discarded and the next initial value, specifically a different angle from the uniformly sampled angle space, is tried. Convergence is very
Fig. 4. Model alignment and pose for a real image. (a) Original image; (b) longer lines after discarding shorter lines; (c) pose estimation; (d) model overlapped on the original image.

Table 1. A comparison of our method with a conventional numerical approach
Initial value (t; angle, deg)   | Numerical method: pose value (t; angle), error          | Our method: pose value (t; angle), error
(0, 0, 0); (0, 0, 0)            | (2.4, 0.39, 2.5); (-0.17, -0.46, -1.91), 84.4           | (-338.7, -224.9, 480.6); (-9.8, -48.7, 48.6), 0.07
(10, 10, 300); (0, 0, 0)        | (-205, -420, 433); (23.7, -52.2, 34), 9.10              | (-338.7, -224.9, 480.6); (-9.8, -48.7, 48.6), 0.07
(10, 10, 300); (-10, -50, 50)   | (-263.6, -369.2, 451.2); (-3.1, -2.2, 3.99), 2.24       | (-338.7, -224.9, 480.6); (-9.8, -48.7, 48.6), 0.07
(-340, -220, 480); (-10, -50, 50) | (-338.7, -224.9, 480.6); (-9.8, -48.7, 48.6), 0.07    | (-338.7, -224.9, 480.6); (-9.8, -48.7, 48.6), 0.07
value of eq.(3). The error value in Table 1 is calculated from ei of eq.(3) for each line. The Nelder-Mead method is used for the numerical optimization [7].
A Method for Camera Pose Estimation from Object of a Known Shape
613
5 Conclusions This paper presents a new method to estimate the camera 3-D location and orientation from a matched set of 3-D model and 2-D image lines. If the correspondence is given from the relation between 3D model lines and 2D lines found in image, a key step of the 3-D object recognition is to find the rotation and translation matrices that map the world coordinate system to the camera coordinate system. We propose a method using roll-pitch-yaw angle to present rotation. We derive a nonlinear error equation based on the roll-pitch-yaw angles. The error equation is designed to minimize a point-to-plane distance that is defined as the dot product of unit normal to the projected plane of image line and a 3-D point on the model line transformed to camera coordinates. Levenberg-Marquardt method minimizes the error equation with uniform sampling strategy of rotation space to avoid stuck in local minimum. From experiments using real images, the proposed method is proved to be stable to initial values of estimating parameters. From corresponding line sets between 3-D model and 2-D real images, the method converses to good pose solutions in only a few iterations.
References

1. Kumar, R., Hanson, A. R.: Robust Methods for Estimating Pose and a Sensitivity Analysis. CVGIP: Image Understanding, 60 (1994) 313-342
2. Ishii, M., Sakane, S., Kakikura, M., Mikami, Y.: A 3-D Sensor System for Teaching Robot Paths and Environments. Int. J. Robotics Research, 6 (1987) 45-59
3. Kang, D. J., Ha, J. E., Kweon, I. S.: Fast Object Recognition Using Dynamic Programming from Combination of Salient Line Groups. Pattern Recognition, 36 (2003) 79-90
4. Lowe, D. G.: Three-Dimensional Object Recognition from Single Two-Dimensional Images. Artificial Intelligence, 31 (1987) 355-395
5. Craig, J. J.: Introduction to Robotics: Mechanics and Control. 2nd Ed. Addison-Wesley, New York (1989)
6. Press, W. H., Teukolsky, S. A., Vetterling, W. T., Flannery, B. P.: Numerical Recipes in C. Cambridge University Press, London (1992)
7. Nelder, J. A., Mead, R.: A Simplex Method for Function Minimization. Computer Journal, 7 (1965) 308-313
A Method of Radar Target Recognition Based on Wavelet Packets and Rough Sets

Hong Wang1 and Shanwen Zhang2

1 College of Mathematics and Computer Science, Shanxi Normal University, Linfen, Shanxi 041004, P. R. China
[email protected]
2 Missile Institute of Air Force Engineering University, Xi'an, Shaanxi 713800, P. R. China
[email protected]
Abstract. In target recognition and classification, extracting effective classification features from the original target signals is very important. Effective target features can be extracted by the wavelet transform (WT). Rough set theory (RST) is used to process the features for target identification, and a mining algorithm is established which stores the original information in a table, deletes redundant information by simplifying the table, and finally mines the useful information. Based on this method, the generated rules are chosen. The application of WT and RST in information processing meets the need for original information processing in target identification. A new recognition method is presented in this paper. Firstly, the content of WT and RST is reviewed; the target features are obtained by WT; then identification algorithms are given using RST, and the radar target is recognized by correlation matching. Finally, recognition experiments using the data of three kinds of aircraft models are performed and demonstrate that the method achieves quite satisfactory results.
1 Introduction

In radar target recognition, feature extraction is important. Feature selection is the process of finding a subset of features, from the original feature set forming the patterns in a given data set, that is optimal according to the given processing goal and criterion. Optimal feature selection is a process of finding a subset $A_1$ of $A$ which guarantees accomplishment of a processing goal while minimizing a defined feature selection criterion; the solution of an optimal feature selection need not be unique. One can distinguish two paradigms in data model building, and potentially in optimal feature selection. In the first, a robust processing algorithm, with its associated set of features (reflecting complexity), is a trade-off between the ability to process a given data set and generalization ability. The second general paradigm, mainly used in classifier design, relates to selecting a feature subset which guarantees maximal between-class separability for the reduced data sets; this relates to the discriminatory power of features. A selected feature subset is evaluated by using, as a criterion $J_{feature}$, a
performance evaluation $J_{predictor}$ of the whole prediction algorithm applied to the reduced data set containing patterns with the selected features as elements. Wavelet analysis plays an important role in radar target recognition [1]. It is more effective than existing methods in data compression and edge detection due to its predominant properties in local analysis, and effective target features can be extracted by wavelet transforms. Rough set theory may find concealed relationships and regularities among data [2]. It is applied in the domains of artificial intelligence, pattern recognition, intelligent information processing and so on, and it has become a theory and method to which the international academic community attaches importance for dealing with uncertain data. Target recognition is an important link in the chain of information processing for air defense. There are many recognition results for radar targets, but there is still no mature theory for air target type recognition. This paper presents a radar target recognition method using wavelet packets and rough sets.
2 Wavelet Packets

The following wavelet packet basis functions $\{W_n\}$ $(n = 0, 1, \ldots)$ are generated from a given function $W_0$:

$$W_{2n}(l) = \sqrt{2}\sum_{k} h(k)\, W_n(2l-k), \qquad W_{2n+1}(l) = \sqrt{2}\sum_{k} g(k)\, W_n(2l-k) \quad (1)$$
where the function $W_0(l)$ can be identified with the scaling function $\varphi$ and $W_1(l)$ with the mother wavelet $\psi$, and $h(k)$ and $g(k)$ are the coefficients of the low-pass and high-pass filters, respectively. Fig. 1 shows the wavelet decomposition and reconstruction. The wavelet packet functions as well as the corresponding decomposition coefficients can be organized as a binary tree, as shown in Fig. 2. Each node corresponds to a frequency band. The leaf nodes of any connected subtree that has the same root node as the full tree form an orthonormal basis and can represent a signal of finite energy completely. The best basis algorithm developed by Coifman and Wickerhauser [3] provides an approach to choose the so-called "best basis" to represent a given signal adaptively. For a discrete signal X of unit energy, the Shannon entropy of the decomposition coefficient sequence $X_{n,j} = \{x^k_{n,j}\}$ of the nth node at level j of the wavelet packet tree is computed by [4,5]
$$M(X_{n,j}) = -\sum_{k} (x^k_{n,j})^2 \log (x^k_{n,j})^2 \quad (2)$$

where $(x^k_{n,j})^2 \log (x^k_{n,j})^2$ is taken as 0 if $x^k_{n,j} = 0$. $M(X_{n,j})$ reflects the degree of energy concentration of the coefficient sequence: it is large if the elements of the sequence are roughly the same and small if all but a few elements are negligible.
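As a quick numeric illustration of Eq. (2), the sketch below computes the Shannon entropy of a unit-energy coefficient sequence; the random test vector is a placeholder.

```python
# Entropy of a coefficient sequence, with 0*log 0 taken as 0 per Eq. (2).
import numpy as np

def shannon_entropy(x):
    p = np.square(x)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

x = np.random.randn(256)
x /= np.linalg.norm(x)        # normalize to unit energy
print(shannon_entropy(x))     # large: energy spread over many coefficients
```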
The wavelet packet method is a generalization of the wavelet decomposition that offers a richer range of possibilities for signal analysis. In wavelet analysis, a signal is split into an approximation and a detail. The approximation is then itself split into a second-level approximation and detail, and the process is repeated. For an n-level decomposition, there are n+1 possible ways to decompose or encode the signal. In wavelet packet analysis, the details as well as the approximations can be split.
Fig. 1. Wavelet decomposition and reconstruction
Fig. 2. A binary tree of wavelet packet decomposition
The wavelet decomposition tree is a part of this complete binary tree. For instance, wavelet packet analysis allows the signal S to be represented as A1 + AAD3 + DAD3 + DD2, a representation that is not possible with ordinary wavelet analysis. Choosing one out of all these possible encodings presents an interesting problem. In this paper, we use an entropy-based criterion to select the most suitable decomposition of a given signal: we look at each node of the decomposition tree and quantify the information to be gained by performing each split.
3 Rough Set Theory

Let X denote a subset of elements of the universe U (X ⊆ U), let R be an equivalence relation, and let U/R denote the set of equivalence classes of R. X is said to be R-definable if it is a union of some R-basic categories; if X cannot be expressed as such a union, it is R-indefinable, i.e., X is rough with respect to R. With every set X we can associate a lower and an upper approximation. The lower approximation of X is the set of all elements that surely belong to X, whereas the upper approximation of X is the set of all elements that possibly belong to X [4]. Let R be a family of equivalence relations over U and r ∈ R. We say that r is indispensable if ind(R) ≠ ind(R − {r}); otherwise r is dispensable. The family R is independent if each r ∈ R is indispensable in R; otherwise it is dependent. Q ⊆ R is a reduct of R if Q is independent and ind(R) = ind(Q). The set of all indispensable relations in R is called the core of R and is denoted core(R) = ∩ red(R), where red(R) is the family of all reducts of R. In a data table, it is well known that not all condition attributes are necessary to describe the decision attribute before decision rules are generated. To acquire brief
decision rules from decision systems, knowledge reduction is needed. Knowledge reduction for a database aims to search for particular subsets of condition attributes that preserve the same properties as the full condition attribute set.
4 Radar Target Identification by Wavelet Packets and Rough Set

In using rough sets for feature selection, two cases can be distinguished, namely the global and the local feature selection scheme. In the former case the relevant attributes for the whole data table are selected, while in the latter case descriptors of the form (a, v), where a ∈ A and v ∈ V, are selected for a given object. In both cases we are searching for relevant features for object classification [3,4]. The problem of radar target identification is complicated by uncertainty; in many conditions, we only obtain cursory information at the beginning of an investigation. The key problem is therefore how to generate the final rules from useful features and original data. The decision information is processed by wavelet packets to obtain the feature vectors, and the acquired knowledge and rules are expressed in the form of a relation table and logic operations. The approach of radar target identification is as follows:

1) Preprocess the original radar target information, e.g., fill in missing data and merge repeated objects.
2) Select a suitable wavelet function to carry out the decomposition.
3) Determine the suitable level of wavelet decomposition.
4) Perform wavelet packet decomposition on the preprocessed data, calculate the energy of each wavelet packet node to obtain a feature vector, and form the relation table consisting of the condition attribute set and the decision attribute set.
5) Reduce the relation table: delete dispensable attributes, merge repeated objects in turn, reduce each object, and delete redundant attributes.
6) Calculate the core and the possible reductions of the condition attributes, and generate the relevant rules. Since the identification rules are not unique, we choose the attribute table according to a definite criterion and finally obtain concise, high-quality rules.
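The sketch below illustrates steps 2)-4), computing a normalized wavelet-packet energy vector per echo. It assumes the PyWavelets (pywt) API; the wavelet name, decomposition level and test signal are placeholders rather than the authors' settings.

```python
# Wavelet-packet energy features for one radar echo (steps 2-4).
import numpy as np
import pywt

def wp_energy_features(echo, wavelet='sym4', level=3):
    wp = pywt.WaveletPacket(data=echo, wavelet=wavelet,
                            mode='symmetric', maxlevel=level)
    nodes = wp.get_level(level, order='freq')        # leaf nodes, one per band
    energies = np.array([np.sum(np.square(n.data)) for n in nodes])
    return energies / energies.sum()                 # normalized feature vector

# Rows of the decision table: one energy vector (condition attributes)
# plus a class label (decision attribute) per training echo.
echo = np.random.randn(256)                          # stand-in for a real echo
print(wp_energy_features(echo))
```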
5 Results

The experimental data are obtained from a stepped-frequency millimeter-wave signal. The frequency bandwidth is 1 GHz and the frequency step is 2 MHz. The targets to be identified are scale models of a bomber, a fighter plane and a helicopter. The data of the three planes are echo waves obtained by intercepting, mean removal and normalization, varying over a range of ±10° azimuth angle (angle step 1°). These data form the database. The echo data of the recognition target are the radar echoes combined with Gaussian white noise. The radar target 1-D range profiles, wavelet decomposition and maximum coefficients are shown in Fig. 3.
Fig. 3. The radar signal of three kinds of aircraft
First, the wavelet packet decomposition of the radar echo is computed and the target centers are obtained. Table 1 lists the energies of the wavelet packet decomposition. The rough set is used to select the best basis of the wavelet packets; the features of the radar target are then obtained, and the recognition result is obtained by correlation matching.

Table 1. Energy of wavelet packet decomposition
Based on this method, the recognition rates for the three targets are 95%, 93% and 90% (the wavelet function is 'Sym' and the SNR is 5 dB). The method was also compared with the WT, PCA and SPCT methods [3,7,8] in terms of time consumed and classification accuracy. The time consumed by this method was the least, and the method was better than WT and SPCT in terms of classification accuracy. According to the experiments described above, this method can effectively extract useful features and its execution speed is very fast.
6 Conclusion

The traditional method of recognizing targets by correlation matching cannot identify targets at all attitude angles because of the long identification time. Moreover, for a great number of non-stationary or time-varying signals, such as speech, radar and earthquake signals, the classification features are often localized both in time and frequency, so extracting effective features from them by general transformation methods is very difficult. WT can provide an arbitrary time-frequency decomposition of the signals. The application of wavelet packets and rough set theory in information processing meets the need of original information processing in target identification. Recognition experiments using the data of three kinds of aircraft models were performed and a satisfying result was obtained. The practical value is great for information processing and target identification in air-defense weapon systems.
References

1. Weiss, G.: Wavelets and Wideband Correlation Processing. IEEE Signal Processing Magazine, 1 (1994) 13-32
2. Swiniarski, R. W., Skowron, A.: Rough Set Methods in Feature Selection and Recognition. Pattern Recognition Letters, 6 (2003) 833-849
3. Laine, A., Fan, J.: Texture Classification by Wavelet Packet Signatures. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15 (1993) 1186-1191
4. Bonikowski, Z.: Algebraic Structures of Rough Sets. In: Ziarko, W. (ed.): Rough Sets, Fuzzy Sets and Knowledge Discovery. Springer-Verlag, Berlin Heidelberg New York (1994)
5. Jain, A., Zongker, D.: Feature Selection: Evaluation, Application, and Small Sample Performance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2 (1997) 153-158
6. Buf, J. M., Du, H., Heitkaemper, P.: Texture Features Based on Gabor Phase. Signal Processing, 23 (1991) 227-244
7. Jia, X., Richards, J. A.: Segmented Principal Components Transformation for Efficient Hyperspectral Remote Sensing Image Display and Classification. IEEE Transactions on Geoscience and Remote Sensing, 37 (1999) 538-542
8. Gu, Y. F., Zhang, Y., Quan, T. N.: A Kernel-based Nonlinear Subspace Projection Method for Hyperspectral Image Data. Chinese Journal of Electronics, 4 (2003) 203-207
A Multi-resolution Image Segmentation Method Based on Evolution of Local Variance

Yan Tian1, Yubo Xie1, Fuyuan Peng1, Jian Liu1, and Guobo Xing2

1 Department of Electronics and Information Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
{tianyan2000, xieyubo2000}@126.com
2 Department of Science and Technology, Construction University of Shandong, Shandong 250101, China
Abstract. In this paper, a new segmentation method for remote sensing images is presented. The proposed method consists of three steps: 1) multi-resolution image sequences are constructed by a multi-resolution model, in which pixels at two neighboring resolutions are linked by computing a parent-child distance; 2) the evolution of local variance across resolutions is studied, based on which the image sequence is segmented; and 3) the actual segmentation result corresponding to the original image is obtained by using the parent-child link. The experimental results show that favorable segmentation performance is obtained with this method.
1 Introduction

Image segmentation plays a crucial role in image processing. Traditional segmentation approaches include region- and edge-based methods [1]. Because of the complexity of practical scenes and the limits of sensors, no algorithm can work well on every type of image; several recently studied methods are given in [2, 3]. In practice, images often have a low signal-to-noise ratio (SNR) or information redundancy for various reasons. For example, since underwater images lack sufficient texture and structure information, it is difficult for classical methods to obtain satisfying performance. In this paper, we propose a multi-resolution segmentation methodology based on local variance. The algorithm introduces a multi-resolution projection model, which is used to produce relatively accurate edge location and boundary tracking, and the evolution of local variance across resolutions is studied. Since each object in an image has an inner resolution, this relationship varies from object to object and is used to segment images.
2 Multi-resolution Model

Generally, multi-resolution images $I_0, I_1, \ldots, I_L$ can be obtained by an image pyramid as follows [4, 5]. Levels $I_0$ and $I_L$ correspond to the highest and the lowest
resolution images, respectively. As Fig. 1(a) illustrates, the gray-level of pixel (k, l) in level $I_m$ can be obtained by averaging the four pixels corresponding to pixel (k, l) in level $I_{m-1}$. $I_m(k, l)$ denotes the gray-level of pixel (k, l) in image level $I_m$.
Fig. 1. (a) Image pyramid; (b) parent-child link
Defining an up operator $\gamma$, $\gamma s$ denotes the pixel corresponding to $s$ at the adjacent lower resolution, and $\gamma^{-1} s$ denotes the pixels corresponding to $s$ at the adjacent higher resolution. Therefore, the gray-level of each pixel can be written as

$$I_m(k, l) = \frac{1}{4} \sum \left[\gamma^{-1} I_m(k, l)\right] \quad (1)$$
In order to locate edges accurately when mapping from low resolution to high resolution, we propose a projection method between different resolutions. Assume there are two adjacent resolutions $I_m$ and $I_{m+1}$ and we need to find a father-pixel in $I_{m+1}$ for each pixel in $I_m$. Take the pixel $I_m(x_0, y_0)$ as an example. We obtain the coordinates $(x_1, y_1)$ in $I_{m+1}$ by projecting its coordinates $(x_0, y_0)$ with the bilinear method. The search area $\Phi$ of the father-pixel is defined as

$$\Phi = \{ I_{m+1}(x, y) \mid (x_1 - x)^2 + (y_1 - y)^2 \le 1 \} \quad (2)$$

In area $\Phi$, we define the parent-child distance as

$$D = \left| I_{m+1}(x_1, y_1) - I_{m+1}(x, y) \right|, \quad I_{m+1}(x, y) \in \Phi \quad (3)$$
When D reaches its minimum, the corresponding pixel $I_{m+1}(x, y)$ is the father-pixel of $I_m(x_0, y_0)$; we call this the parent-child link. In principle, D could use another definition of distance. Defining a father operator $\vartheta$, $\vartheta s$ denotes the father-pixel of pixel $s$, and $\vartheta^{-1} s$ denotes the child-pixels of pixel $s$ at the adjacent higher resolution.
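A minimal sketch of the parent-child link of Eqs. (2)-(3) follows, assuming a dyadic pyramid so that projection halves the coordinates; rounding the reference pixel and the array names are simplifying assumptions.

```python
# Find the father-pixel of I_m(x0, y0) in the coarser level I_{m+1}.
import numpy as np

def father_pixel(I_next, x0, y0):
    x1, y1 = x0 / 2.0, y0 / 2.0                     # bilinear projection
    h, w = I_next.shape
    ref = float(I_next[int(round(y1)), int(round(x1))])  # ~ I_{m+1}(x1, y1)
    best, best_d = None, np.inf
    for y in range(max(0, int(y1) - 1), min(h, int(y1) + 2)):
        for x in range(max(0, int(x1) - 1), min(w, int(x1) + 2)):
            if (x1 - x) ** 2 + (y1 - y) ** 2 <= 1.0:       # search area Phi
                d = abs(ref - float(I_next[y, x]))          # Eq. (3)
                if d < best_d:
                    best, best_d = (x, y), d
    return best
```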
3 The Evolution of Local Variance Across Resolution

Local variance and resolution are two crucial characteristics of an image. Local variance is defined as the average of the variances within a moving window passing through the entire image. The rationale of the relation is as follows. Every object has an inner resolution [6]. If the spatial resolution is considerably finer than the objects in the scene, most of the measurements will be highly correlated with their neighbors and the local variance will be low. If the objects being studied approximate the size of the resolution cells, the values of neighboring pixels tend to differ from each other and the local variance therefore increases. However, as the size of the pixels increases further, more objects are contained in a single pixel and the local variance starts decreasing, as Fig. 2 shows.
Fig. 2. Local variance variability across resolution
Fig. 3. Images of three objects and analyses of local variance: (a) seawater; (a1) the evolution of local variance of seawater; (b) smoke; (b1) the evolution of local variance of smoke; (c) rock; (c1) the evolution of local variance of rock
In the following experiments, we analyzed the variation of local variance at different resolutions. Figs. 3(a), (b) and (c) are three images corresponding to seawater, smoke and rock in an underwater image. The local variances at different resolutions of the three images are shown in Figs. 3(a1)-(c1), respectively, where the x-axis represents the size of the image as an exponent of 2 (for example, 512 = 2^9 is denoted by 9) and the y-axis represents the local variance of the image. Figs. 3(a1)-(c1) show that the local variances of seawater and smoke decrease as the resolution increases, while the local variance of rock increases with resolution at the beginning. However, when the local variance of rock reaches the
A Multi-resolution Image Segmentation Method
623
maximum value, it begins to decrease gradually as the resolution increases. As each object has an inner resolution, these results accord with the rationale shown in Fig. 2. Since the trend of the local variance of rock is quite different from that of seawater and smoke, the residuals of local variance are also different as the resolution changes, and they reach a maximal value at a certain resolution. So we can segment images using the evolution of local variance across resolution.
4 Segmentation Approach with Evolution of Local Variance

We formulate the image segmentation problem with the evolution of local variance across resolution. Firstly, the local variances of the original image $I_0$ and of the adjacent-resolution image $I_1$ are computed. Secondly, the difference between the two local variance images is computed, and a segmentation is obtained by thresholding the residual with the Otsu algorithm; this is a segmentation at the low resolution. Finally, we obtain the segmentation result at the original resolution by projecting it to the high resolution with the parent-child link. In practice we use only the two highest resolutions $I_0$ ($2^m \times 2^n$, $m, n \in N$) and $I_1$ of the image pyramid. According to the above idea, we first acquire the local variance images of $I_0$ and $I_1$. The corresponding local variance images $V_0$ and $V_1$ are obtained by computing the local variance within a moving window passing through the image:

$$V_f(i, j) = \frac{1}{9} \sum_{k=i-1}^{i+1} \sum_{l=j-1}^{j+1} \left[ I_f(k, l) - \frac{1}{9} \sum_{w=i-1}^{i+1} \sum_{s=j-1}^{j+1} I_f(w, s) \right]^2 \quad (4)$$

where $f \in \{0, 1\}$, $i = 1, 2, \ldots, 2^{m-f}$ and $j = 1, 2, \ldots, 2^{n-f}$.
Since the difference between images $V_0$ and $V_1$ will be used later, we need to establish a mapping between them. The resolution of image $V_0$ is reduced with formula (1), giving the result $V = (V_{i,j})$, whose resolution is the same as that of image $V_1$. Thus, the difference Q can be obtained as

$$Q = V - V_1 \quad (5)$$
From the evolution of local variance discussed in Section 3, we know that the local variance of smoke and seawater decreases steadily and changes only slightly as the resolution increases. The local variance of rock increases with resolution at the beginning; however, when it reaches its maximum value, it begins to decrease gradually as the resolution increases, and it changes more sharply than that of smoke and seawater. Thus, the object can easily be segmented from the residual image Q by the Otsu algorithm. To reduce noise, a simple averaging mask is first applied to image Q. The segmentation result $Q'$ ($2^{m-1} \times 2^{n-1}$) is a binary image.
The result $Q'$ is projected to the original resolution with the parent-child link introduced in Section 2. The mapping formula is

$$H(i, j) = \begin{cases} 1, & \vartheta H(i, j) = 1 \\ 0, & \text{otherwise} \end{cases} \quad (6)$$

where H is the segmentation result corresponding to the original image $I_0$. As a summary, the proposed segmentation algorithm is described as follows:

Step 1: Compute the level $I_1$ from $I_0$ by formula (1).
Step 2: Find the local variances $V_0$ and $V_1$ of $I_0$ and $I_1$ by formula (4).
Step 3: Obtain the low resolution image V of $V_0$ by formula (1), and calculate the residue $Q = V - V_1$.
Step 4: Smooth the residue Q by a 3 × 3 filter; the result is $Q_1$.
Step 5: Segment $Q_1$ by the Otsu algorithm; the result is $Q'$.
Step 6: Project $Q'$ to the original resolution according to formula (6); the result, labeled H, is the segmentation corresponding to the original image.
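The six steps can be sketched compactly as below, assuming a $2^m \times 2^n$ grayscale array; Step 6 is simplified to plain 2×2 block upsampling instead of the full parent-child link, and the Otsu threshold is re-implemented here rather than taken from a library.

```python
# Compact sketch of Steps 1-6 of the proposed segmentation algorithm.
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance(img):                      # Eq. (4) with a 3x3 window
    img = img.astype(float)
    mean = uniform_filter(img, size=3)
    return uniform_filter((img - mean) ** 2, size=3)

def reduce2(img):                             # Eq. (1): average 2x2 blocks
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] +
                   img[0::2, 1::2] + img[1::2, 1::2])

def otsu(x, bins=256):                        # maximize between-class variance
    hist, edges = np.histogram(x, bins=bins)
    p = hist / hist.sum()
    w = np.cumsum(p)
    mu = np.cumsum(p * edges[:-1])
    var = (mu[-1] * w - mu) ** 2 / (w * (1 - w) + 1e-12)
    return edges[np.argmax(var)]

def segment(I0):
    I1 = reduce2(I0.astype(float))            # Step 1
    V0, V1 = local_variance(I0), local_variance(I1)   # Step 2
    Q = reduce2(V0) - V1                      # Step 3: residue
    Q1 = uniform_filter(Q, size=3)            # Step 4: 3x3 smoothing
    Qp = Q1 > otsu(Q1)                        # Step 5: binary result Q'
    return np.kron(Qp, np.ones((2, 2), dtype=bool))   # Step 6, simplified
```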
Fig. 4. Experimental results (512 × 512): (a) original image 1; (b) algorithm proposed in this paper; (c) Otsu method; (d) local variance method; (e) original image 2; (f) algorithm proposed in this paper; (g) Otsu method; (h) local variance method
5 Experimental Results and Analyses

In order to test the efficiency of the proposed algorithm, two groups of experiments were carried out. Fig. 4(a) and Fig. 4(e) are the images to be segmented. There are three objects in the original images: seawater, smoke and rock. Our objective is to segment rock from the image to plan the route of an advancing robot. Since the smoke appears quite irregular and the gray-level of some rock is very close to that of the background, it is difficult to segment the image by common gray-level-based methods. We compared our method with the classical Otsu algorithm and with the method using local variance directly; the results are shown in Fig. 4. From the first group of experiments it is easy to see that, to some extent, the Otsu method and the local variance method are over-sensitive to noise and their segmentation
results are incomplete or polluted, while the result of the method provided in this paper is much better. For the second group, the segmentation results of the Otsu and local variance methods are incorrect: some smoke is mis-classified as part of the rock. The new method is more robust and distinguishes the shades of smoke and rock.
6 Conclusions

This paper presents an effective approach to the image segmentation problem using the evolution of local variance across resolutions. The local variance evolution curves of visually similar targets such as seawater, smoke and rock behave differently, so the evolution can act as a reliable feature for image segmentation. Our experimental results demonstrate that our method performs much better than the Otsu algorithm, which is based solely on gray information, and than the method considering local variance at a single resolution.
Acknowledgments

The work described in this paper is supported by NSFC (No. 60572048 & 60475024).
References

1. Gonzalez, R. C., Woods, R. E.: Digital Image Processing, 2nd ed. Addison-Wesley, Reading, MA (2002)
2. Wong, W. C. K., Chung, A. C. S.: Bayesian Image Segmentation Using Local Iso-Intensity Structural Orientation. IEEE Transactions on Image Processing, 14(10) (2005) 1512-1523
3. Costantini, M., Zavagli, M., Milillo, G.: A Novel Approach for Image Segmentation. Geoscience and Remote Sensing Symposium, IEEE International, 3 (2002) 1603-1605
4. Rezaee, M. R., van der Zwet, P. M. J., Lelieveldt, B. P. F., van der Geest, R. J., Reiber, J. H. C.: A Multiresolution Image Segmentation Technique Based on Pyramidal Segmentation and Fuzzy Clustering. IEEE Transactions on Image Processing, 9(7) (2000) 1238-1248
5. Vincken, K. L.: Probabilistic Multiscale Image Segmentation by the Hyperstack. PhD thesis, Utrecht Univ., The Netherlands (1995)
6. Cao, C., Lam, N. S. N.: Understanding the Scale and Resolution Effects in Remote Sensing and GIS. In: Scale in Remote Sensing and GIS. Lewis Publishers, New York (1997)
A New Denoising Method with Contourlet Transform

Gangyi Jiang1,3, Mei Yu1,2, Wenjuan Yi1,2, Fucui Li1, and Yong-Deak Kim4

1 Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China
2 Institute of Computer Technology, CAS, Beijing 100080, China
3 National Key Laboratory of Machine Perception, Peking University, Beijing 100871, China
4 Division of Electronics Engineering, Ajou University, Suwon 442-749, Korea
Abstract. The contourlet coefficients of natural images are highly correlated across scales, especially around boundaries, whereas the contourlet coefficients of Gaussian white noise are hardly correlated. Therefore, signal coefficients can be sorted out according to the different correlation characteristics of signal and noise. In this paper, a contourlet-based denoising algorithm using interscale correlations is proposed, combined with thresholding functions. Experimental results show that the proposed method outperforms the corresponding wavelet-based method, especially for images containing many edges and fine textures.
1 Introduction

The wavelet transform has been widely used in image processing. Although it optimally approximates one-dimensional piecewise smooth signals with singularities, it is less effective for images, where the singularities are localized both in space and in direction. The contourlet transform is a new analysis tool that makes up for this limitation of wavelets [1]. With its two key features of anisotropy and directionality, edges and boundaries are well approximated in subbands at multiple scales and multiple directions [2]. Boundaries and edges carry dominant information in images, so many image denoising methods based on edge detection have been proposed, such as the wavelet-based spatially selective noise filter [3]. The contourlet coefficients of natural images are highly correlated across scales, especially around boundaries, whereas the contourlet coefficients of Gaussian white noise are hardly correlated [4], so signal coefficients can be sorted out according to the different correlation characteristics of signal and noise. In this paper, a new contourlet-based denoising algorithm using interscale correlations combined with thresholding functions is proposed. Experimental results show that the proposed method outperforms the corresponding wavelet-based method.
2 A Contourlet Based Denoising Algorithm

The contourlet transform uses a structure similar to the curvelet [5], that is, a stage of subband decomposition followed by a directional transform. After the contourlet transform, coefficients with large modulus cluster around contours and boundaries in images; moreover, there is a certain continuity and dependency among these coefficients.
2.1 Threshold Denoising

The decorrelation property of both the wavelet transform and the contourlet transform ensures that the power and information concentrate on a limited number of transformed coefficients while the rest of the coefficients are close to zero. The case of Gaussian white noise is completely different: its power distributes evenly over all the transformed coefficients. Threshold denoising theory builds on this phenomenon: coefficients whose moduli are larger or smaller than a threshold T are processed separately, and the denoised image is then obtained through the inverse transform of the remaining coefficients. There are two commonly used threshold functions, the hard threshold function (HT) and the soft threshold function (ST). Let c and y be contourlet coefficients of the noisy image and the denoised image, let T be the threshold, and let I(x) be a logical function,

$$I(x) = \begin{cases} 1, & x \text{ is true} \\ 0, & x \text{ is false} \end{cases}$$

Then HT is defined as $y(c) = c \cdot I(|c| > T)$, while ST is defined as $y(c) = (c - \mathrm{sgn}(c)\,T) \cdot I(|c| > T)$. Local edges and boundaries are well kept in HT denoising, but there may be some artifacts, usually thread-shaped, in the denoised image due to the Gibbs-like phenomenon. By contrast, images are smoother after ST noise removal, but the edges and boundaries may be blurred. However, the blur is restrained to some extent in contourlet-based ST denoising owing to the multiple directions, and edges and boundaries are well restored.
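The two threshold functions transcribe directly into code; a minimal sketch, assuming c is an array of transform coefficients:

```python
# Hard and soft thresholding as defined above.
import numpy as np

def hard_threshold(c, T):
    return c * (np.abs(c) > T)                        # y(c) = c * I(|c| > T)

def soft_threshold(c, T):
    return np.sign(c) * np.maximum(np.abs(c) - T, 0)  # y(c) = (c - sgn(c)T) * I(|c| > T)
```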
2.2 Image Denoising Using Interscale Correlation

Though the wavelet transform is decorrelating, there are still some correlations among wavelet coefficients; for example, wavelet coefficients at the same position across scales are highly correlated. The spatially selective noise filtering technique (SSNF) [3] is based on the fact that sharp edges have large signal over many wavelet scales, while noise dies out swiftly with increasing scale. SSNF is suitable not only for the wavelet transform but also for the contourlet transform: contourlet coefficients of signal and noise can be distinguished in a similar way, based on the fact that sharp edges have large signal over many contourlet scales while noise attenuates quickly with increasing scale [6]. Here, a new denoising algorithm based on the contourlet transform is proposed, in which edges are detected using the correlations combined with threshold functions. The steps are as follows.

1) Compute correlations. For a contourlet coefficient $C^{(j)}$ at position (m, n) at scale j, the correlation is defined as the multiplication of contourlet coefficients at adjacent scales:

$$Corr_L(j, m, n) = \prod_{i=0}^{L-1} C^{(j+i)}(m, n) \quad (1)$$
Here, L denotes the number of scales in the multiplication.

2) Power normalization, so as to compare the correlation with the contourlet coefficients:

$$NewCorr_L(j, m, n) = Corr_L(j, m, n)\sqrt{\frac{P_C(j)}{P_{Corr}(j)}} \quad (2)$$
where $P_C(j)$ is the power of the contourlet coefficients and $P_{Corr}(j)$ is the power of the correlations at scale j.

3) Classification and processing. Noise and signal are classified by correlations and thresholding. If $|NewCorr_L(j, m, n)| < k_1 |C^{(j)}(m, n)|$ and $|C^{(j)}(m, n)| < k_2\delta_n$, then let $C^{(j)}(m, n) = 0$. Here, $k_1$ and $k_2$ are multiplicative factors changing as the scale increases, and $\delta_n$ is the standard deviation of the noise in the different contourlet subbands. If the above condition is not satisfied, that is, $|NewCorr_L(j, m, n)| \ge k_1 |C^{(j)}(m, n)|$ or $|C^{(j)}(m, n)| \ge k_2\delta_n$, then $C^{(j)}(m, n)$ is determined by the threshold functions.

4) Iteration. If j > 2, let j = j − 1 and redo steps (1) to (3).

As the center positions of many types of edges do not occur at the same location over a wide range of scales in the contourlet domain, the number of scales L involved in the direct multiplication is usually chosen as 2 or 3. In the experiments, L is chosen as 2, the multiplicative factor $k_1$ varies from 0.9 to 1.3 while $k_2$ varies from 1 to 3.7, and the thresholds of ST are usually chosen smaller than those of HT. The improvements of the proposed method over SSNF are discussed as follows.

(1) The wavelet transform is replaced by the contourlet transform. There is only limited directional information (vertical, horizontal and diagonal) in the separable wavelet domain, whereas boundaries and edges in natural images are distributed in arbitrary directions, so edges and details may be blurred after SSNF. The contourlet transform has multiple directions and multiple scales, which helps to capture texture and detail information well.

(2) It is necessary to adjust the subbands at adjacent scales to the same size to compute the correlations. For the contourlet transform, the interscale coefficient relationship depends on the number of directional decompositions in the contourlet subbands. As illustrated in Fig. 1, for two successive scales with the same number of directional decompositions, the contourlet coefficients are extended in the same way as wavelet coefficients. But for scales where there are twice as many directional subbands in the coarser level as in the finer level, a coefficient in the coarser level corresponds to 4 coefficients located in two different subbands in the finer level, and the contourlet coefficients are copied and expanded into blocks of size 1×2 or 2×1.

(3) An iterative process is adopted in SSNF: wavelet coefficients are filtered by the normalized correlations in every loop until the remaining power of the wavelet coefficients equals the estimated power of the noise. The original intention of the loop is to extract the edges gradually, but the experimental results show that too many noisy coefficients are mixed into the signal during the loop, so that it is hard to estimate the noise power accurately; additionally, the high computation cost is a huge burden. By contrast, the proposed method computes the correlations only once and adopts the threshold functions to ensure the quality of the denoised images.
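Steps 1)-3) can be sketched as below for L = 2, assuming the two coefficient arrays from adjacent scales have already been resized to the same shape as described in improvement (2); k1, k2 and the noise deviation are placeholder values.

```python
# Interscale correlation, power normalization and noise suppression.
import numpy as np

def correlate_and_suppress(C_j, C_j1, k1=1.0, k2=2.0, sigma_n=1.0):
    corr = C_j * C_j1                                   # Eq. (1) with L = 2
    scale = np.sqrt(np.sum(C_j ** 2) / np.sum(corr ** 2))
    new_corr = corr * scale                             # Eq. (2): power matching
    noise = (np.abs(new_corr) < k1 * np.abs(C_j)) & (np.abs(C_j) < k2 * sigma_n)
    out = C_j.copy()
    out[noise] = 0.0                                    # step 3): zero noise coefficients
    return out                                          # survivors go to HT/ST
```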
Fig. 1. Interscale relationships in the wavelet and contourlet domains: (a) interscale relationship in the wavelet domain; (b) interscale relationships in the contourlet domain with the same number of directions in successive scales; (c) interscale relationships in the contourlet domain with twice as many directions in the coarser level as in the finer level
Fig. 2. Parts of the denoised "Barbara": (a) SSNF, 24.34 dB; (b) WH, 25.13 dB; (c) WS, 26.16 dB; (d) CH, 25.71 dB; (e) CS, 27.28 dB

Table 1. PSNRs (dB) of the noisy and denoised images
| Image | Noisy | SSNF | WH | WS | CH | CS |
|---|---|---|---|---|---|---|
| Lena | 20.00 | 27.69 | 28.02 | 28.06 | 27.94 | 28.56 |
| Lena | 21.94 | 28.49 | 29.04 | 29.15 | 29.35 | 29.76 |
| Lena | 24.43 | 29.58 | 30.56 | 30.22 | 30.88 | 31.12 |
| Lena | 27.96 | 31.43 | 31.95 | 32.53 | 32.65 | 33.11 |
| Barbara | 20.02 | 23.33 | 23.43 | 24.78 | 24.04 | 25.98 |
| Barbara | 21.96 | 24.34 | 25.13 | 26.16 | 25.71 | 27.28 |
| Barbara | 24.46 | 26.73 | 27.29 | 27.84 | 27.41 | 28.61 |
| Barbara | 27.98 | 29.24 | 30.08 | 30.17 | 30.34 | 31.23 |
3 Experimental Results and Conclusions

White noise with various standard deviations is added to the 512×512 "Lena" and "Barbara" images, and five methods are used to denoise the noisy test images: SSNF; the wavelet-based denoising algorithm using correlations combined with the hard threshold (WH); the wavelet-based algorithm using correlations and the soft threshold (WS); and the contourlet-based algorithms using correlations and the hard or soft threshold (CH or CS), which are named contourlet-based denoising algorithms using interscale correlations. For the contourlet transform, the
numbers of directional subbands in the 3 levels are 4, 8 and 16, respectively. Only the coefficients in the finest two levels are denoised. Table 1 gives the experimental results, and parts of the denoised images are shown in Fig. 2. From the experimental results, the following conclusions may be drawn. Firstly, both the wavelet-based and the contourlet-based denoising algorithms using correlations and threshold functions outperform SSNF. The reason is that SSNF is an iterative process in which too many noisy coefficients are wrongly judged as effective signal, so the quality of the restored images is not satisfactory; these loops are avoided in the proposed method. Secondly, denoising algorithms using correlations and ST usually outperform those using correlations and HT in PSNR, i.e., WS outperforms WH and CS outperforms CH. CS achieves the highest PSNR. The PSNR improvement of CS over WS for images with many edges and fine textures is usually larger than for images with little detail: for example, the PSNR improvement for "Lena" varies in 0.5~0.7 dB, while for "Barbara", which contains more details, it varies in 0.8~1.2 dB. Finally, details and boundaries, such as the brim of the hat and the hair in "Lena" and the textures on the trousers in "Barbara", are well restored by the contourlet-based denoising method using interscale correlations. CH achieves the visually clearest edges and boundaries; however, there may be some thread-shaped artifacts, which can be removed by the cycle-spinning contourlet transform [4].
Acknowledgments

This work was supported by the Natural Science Foundation of China (grant 60472100), the Natural Science Foundation of Zhejiang Province (grants RC01057, 601017, Y105577), the Ningbo Science and Technology Project of China (grants 2004A610001, 2004A630002), and the Zhejiang Science and Technology Project of China (grant 2004C31105).
References

1. Do, M. N., Vetterli, M.: Contourlets. In: Stoeckler, J., Welland, G. V. (eds.): Beyond Wavelets. Academic Press, New York (2003)
2. Do, M. N., Vetterli, M.: Contourlets: A Directional Multiresolution Image Representation. Proc. of IEEE International Conference on Image Processing, 1 (2002) 357-360
3. Xu, Y.: Wavelet Transform Domain Filters: A Spatially Selective Noise Filtration Technique. IEEE Transactions on Image Processing, 3(6) (1994) 217-237
4. Eslami, R., Radha, H.: The Contourlet Transform for Image De-noising Using Cycle Spinning. Proc. of Asilomar Conference on Signals, Systems, and Computers, Pacific Grove (2003) 1982-1988
5. Candes, E. J., Donoho, D.: Curvelets: A Surprisingly Effective Nonadaptive Representation for Objects with Edges. In: Cohen, A., Rabut, C., Schumaker, L. L. (eds.): Curve and Surface Fitting. Vanderbilt University Press, Nashville (1999)
6. Po, D., Do, M. N.: Directional Multiscale Statistical Modeling of Images. SPIE Conference on Wavelet Applications in Signal and Image Processing X, San Diego, USA (2003)
A Novel Authentication System Based on Chaos Modulated Facial Expression Recognition

Xiaobin Luo, Jiashu Zhang, Zutao Zhang, and Hui Chen

Signal & Information Processing Key Lab of Sichuan Province, Southwest Jiaotong University, Chengdu 610031, P. R. China
[email protected]
[email protected]
Abstract. We consider in this paper that the use of personal attributes can improve the robustness and reliability of facial authentication, and we describe a new authentication technique based on the integration of human face-based authentication. Face recognition under varying illumination and expression is a challenging problem. An Improved Census Transform (ICT) uses a 3×3 nonlinear filter window to transform the original facial images into pattern images; the ICT deals well with illumination variation. A chaos neural network based facial expression recognition algorithm is proposed, with the pattern image chaos modulation method and training method given in detail. The algorithm was tested on a PC (P4, 1.8 GHz, 256 MB DDR): for a single 50×50-pixel facial image, the recognition time was less than 20 ms and the recognition rate was more than 91%. We integrated the above approaches and developed an authentication system based on facial expression recognition.
1 Introduction

Proper user authentication is essential for reliable access control. For decades, authentication has traditionally been based on something a user has or knows. These traditional systems do not identify the user as such; rather, they use objects that can be lost, stolen, forgotten, or disclosed. Passwords, for example, are often easily accessible to colleagues or even occasional visitors [1]. Biometrics are automated methods of authentication based on measurable human physiological or behavioral characteristics such as a fingerprint, iris pattern, or voice sample. Biometric characteristics should be unique and not duplicable or transferable [2]. It is evident that a more convenient and secure solution to user authentication is necessary. Biometric measurements are receiving increasing interest for security applications. Often, however, attackers can copy a sample that a biometric system will accept as valid; recent investigations confirm that such attacks are much easier than generally accepted. In cooperative environments, face recognition is well accepted by individuals, but it still suffers from limited performance. In this paper, to avoid illumination effects, an Improved Census Transform uses a nonlinear filter window to transform the original images into pattern images, and a chaos neural network based facial expression recognition algorithm is proposed with the detailed pattern image chaos modulation method and training method.
2 Biometric Authentication

2.1 Authentication in Network Environments

Biometric application systems combined with ID cards have been studied and developed, because an ID card can store much information and realize strong security. So far, all information, including facial image data, has been stored in the ID card. But it is easy to impersonate someone when the genuine face image is replaced with another one [3]. So we consider a network-oriented method; there are three differences compared with the original system:

1. All personal information is put on the network instead of being carried on hand.
2. We get the required information from the network depending on the situation.
3. We keep encrypted information associated with personal information on the network.

In this case, it becomes difficult to falsify the information in the ID card, and security can be improved by means of encryption of personal information. This paper presents a system of biometric encryption based on chaos modulated facial expression recognition for use in secured electronic commerce applications. Our proposed system satisfies the factors considered above.

2.2 Web-Enabled Authentication System

Whether locally or over a distance in a distributed environment, there is transmission. We depict our proposed system in Fig. 1. Web services enable systems to communicate with one another by sending XML messages. Clients can be stand-alone applications that interact with a web service and display results to a user, and they can invoke services from another system. It is important that biometric data is encrypted before being passed across a public network, to reduce the risk of the data being intercepted and misused. As with the storage of templates, the storage of policies containing access rights must be carefully designed, and the process of enforcing these policies via the user interface (UI) must also be secure.
Fig. 1. Simplified model of biometric authentication web services
3 Chaos Modulated Facial Expression Authentication

During the past two decades, face recognition has been widely studied, and face authentication has become a focus in security. Since a face is moving and changing, we need to consider invariant features for both approaches. Two kinds of factors, biological and environmental, have a considerable impact on facial authentication, so it is necessary to construct authentication systems accordingly.

3.1 Face Recognition System

Face recognition requires considerable computation for estimating the scale and orientation of the face template, and many face recognition approaches cannot meet these requirements. We present a face recognition algorithm that achieves both computational efficiency and accuracy by using a chaos neural network based facial expression recognition algorithm. We design a prototypical system as in Fig. 2, where the solid lines denote on-line processing and the dashed lines denote off-line learning. The face recognition procedure is as follows: preprocessing contains background separation and sub-sampling; image segmentation contains face tracking and feature extraction.
Fig. 2. Overview of the face recognition system
3.2 Improved Census Transform

In this work, the features are defined as structure kernels of size 3×3 which summarize the local spatial image structure. Within the kernel, structure information is coded as binary information {0, 1}, and the resulting binary patterns can represent oriented edges, line segments, junctions, ridges, saddle points, etc. On a local 3×3 lattice there exist $2^9 - 1$ such kernels. The Census Transform (CT) is a non-parametric local transform [4]. It is defined as an ordered set of comparisons of pixel intensities in a local neighborhood, representing which pixels have lower intensity than the center. In general the size of the local neighborhood is not restricted; we assume a 3×3 surrounding as motivated in the last section. Let N(x) define a local spatial neighborhood of the pixel at x so that $x \notin N(x)$. The CT then generates a bit string representing which pixels in N(x) have an intensity lower than I(x). With the assumption that pixel intensities are always zero or positive, the formal definition of the process is as follows. Let a comparison function S(I(x), I(y)) be 1 if I(x) < I(y), and let ⊗ denote the concatenation operation; the census transform [5] at x is then defined as
$$C(x) = \bigotimes_{y \in N'(x)} S(A(x), I(y)) \quad (1)$$

where $N'(x) = N(x) \cup \{x\}$ and A(x) is the intensity average over the neighborhood. Ideally, only the region of interest should determine the image features for a recognition system. In a simple model of image formation, the image intensity I(x) is regarded as the product of the object reflectance R(x) and the illumination L(x) at each point X = (x, y). The CCD influence can be modeled by a gain factor g and a bias term b, which are assumed to be constant over the image plane. Thus a simple image formation model is

$$I(x) = g\,L(x)\,R(x) + b \quad (2)$$
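A literal (and deliberately unoptimized) 3×3 implementation of Eq. (1) might look as follows; border handling and the bit order are simplifying assumptions.

```python
# 3x3 modified census transform: compare each neighborhood pixel, including
# the center, against the local mean A(x) and concatenate the bits.
import numpy as np

def improved_census_transform(img):
    img = img.astype(float)
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.uint16)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = img[y - 1:y + 2, x - 1:x + 2]
            mean = patch.mean()                      # A(x) over the 3x3 kernel
            code = 0
            for b in (patch.ravel() > mean):         # S(A(x), I(y)) per pixel
                code = (code << 1) | int(b)
            out[y, x] = code                         # one of the 2**9 - 1 patterns
    return out
```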
In Fig. 3, although the illumination varies considerably, the distribution of the ICT result is almost the same.
Fig. 3. Feature extraction results under different illumination: (a) original image; (b) ICT transform; (c) Prewitt edge; (d) binary
3.3 Facial Expression Chaos Modulated Bio-recognition

In this section, chaos modulated bio-recognition is proposed, with the pattern image chaos modulation and training methods given in detail. The image chaos modulation and corresponding detection method modulate image information with chaos signals; a holographic associative memory is built from image convolution and correlation operations. An associative noise-like coding memory model is presented: it avoids learning and training processing, since it forms the memory network by sampling images directly. Moreover, this model has a holographic property, so it can perform association from fragmentary images. Discrete-time chaotic systems are generally described by a set of nonlinear difference equations [6]. Let $P\{e\} = \{P(1), P(2), \ldots, P(q)\}$ be a sampling set, where $p(e) = (p_1(e), \ldots, p_n(e))^T$ is an n-dimensional vector and $e \in Q = \{1, \ldots, q\}$. Let $d(x, \tau)$ be a digital discrete chaos signal, where x stands for the input modulation message (it equals a chaos system parameter or state initial value) and $\tau$ is the discrete time. Suppose the chaos signals satisfy orthogonality, viz. when T is large enough we get:
$$\frac{1}{T}\sum_{\tau=0}^{T-1} d(x_1, \tau)\, d(x_2, \tau) = \begin{cases} \sigma > 0, & x_1 = x_2 \\ \varepsilon(x),\ |\varepsilon(x)| \le \delta \ll \sigma, & x_1 \ne x_2 \end{cases} \quad (3)$$
Hence, we can embed the image P(e) into the chaos system, or modulate it, by controlling the chaos system parameter or state initial value. Namely, $x_i(e) = p_i(e)$ creates the corresponding chaos vector

$$D(P(e), \tau) = (d(p_1(e), \tau), \ldots, d(p_n(e), \tau))^T, \quad e \in Q \quad (4)$$
All modulated chaos messages are summed with weights to form the chaos memory vector

$$M(\tau) = (m_1(\tau), \ldots, m_n(\tau))^T \quad (5)$$

where the weight coefficient is

$$w(i, e) = \begin{cases} 1, & i \in I(e) = \{i \mid p_i(e) \ge v(e),\ i \in N\} \\ 0, & i \notin I(e) \end{cases} \quad (6)$$
D ( p ( h ), τ ) = { d ( p 1 ( h ), τ ), , d ( p n ( h ), τ } T e
(7)
And then let it correlation operator with chaos memory vector in different I e , Q, respectively, namely
R (h, e) =
¦
{
i∈ I ( e )
T −1
1 T
¦ τ
m i ( τ ) d ( p i ( h ), τ )}
According to orthogonal chaos signal Eq.(1) R (h, e) =
(8)
=0
q
¦ ( n ′ ( e , r )σ
we can get
+ n ′′ ( e , r ) ε ( p ))
(9)
r =1
Where n′(e, r) is the unit number of Γ (e, r ) = {i | pi ( r ) = pi ( h), i ∈ I (e) ΛI ( r )} .
n ′′(e, r ) is the unit number of Γ′(e, r ) = {i | pi (r) ≠ pi (h),i ∈ I (e)ΛI (r)} . n(e, r ) = n ′(e, r ) + n ′′(e, r ) is the tot unit of the set I(e) I(r ).From Eq. (9),we can obtain: A:
q
A. $e = h$:
$$R(h, h) = \Big\{ n(h, h) + \sum_{r=1, r \ne h}^{q} n'(h, r) \Big\}\sigma + \sum_{r=1, r \ne h}^{q} n''(h, r)\,\varepsilon(p).$$
Denote its minimum as
$$R(h, h)_{\min} = \Big\{ n(h, h) + \sum_{r=1, r \ne h}^{q} n'(h, r) \Big\}\sigma - \Big\{ \sum_{r=1, r \ne h}^{q} n''(h, r) \Big\}\delta \quad (10)$$
X. Luo et al. q
q
r =1
r =1
B: e ≠ h : R ( h, e) = {¦ n′(e, r )}σ + ¦ n′′(e, r )ε ( p ) . Denote its maximum as q
q
r =1
r =1
R(h, e) max = {¦ n′(e, r )}σ + {¦ n′′( e, r )}δ If e
(11)
Q ,we can recognize image P(h) by
R ( h , h ) min − R ( h , e ) max > 0
(12)
or
σ > δ
q
q
r =1 r ≠h
r =1
¦n′′(h, r) + ¦n′′(e, r) q
q
r =1 r ≠h
r =1
= K (e)
{n(h, h) + ¦ n′(h, r )}− ¦ n′(e, r )
(13)
then get the maximum
R ( h , e ∗ ) = max R ( h, e ) e∈Q
.
3.4 Experimental Results This paper, human facial expression is presented by many feature blocks. Because in range feature block can simplify point information, facial expressing can regard as point distributing classifying question. In expression authentication experiment, faces displaying used seven basic facial emotional expressions, such as neutral , happy , surprised, sad ,disgusted ,angry and afraid. A boosting learning method is adopted. After training feature points of the input facial expressing image feature points one by one, the weight of every point in which local feature area can be computer. And then look for the best classifying model kernel of structure. Fig. 4 illustrates the flowing chart. In the paper, the local 27 feature points distributed on the corresponding shapes are obtained. Their topology forms facial express space structure. In order to test the algorithm described in the previous sections, we use two different databases, a database collected by us and the Cohn-Kanade AU code facial expression database [7]. Some of the expression sequence list on Fig. 4. We describe the face authentication procedure as follows. Step 1. Step 2. Step 3. Step 4.
Face image is detected and located, then segmented face area from the image , locate eye’s center. Face image normalize based on the center of two eyes. Improved Census Transform. Face feature information as vectors input to chaos neural network [8], and then modulation the face authentication.
Fig. 4. Flow chart of the training system composition
The algorithm was tested on a PC (P4, 1.8 GHz, 256 MB DDR) with single 50×50-pixel facial images; the recognition time was less than 20 ms and the recognition rate was more than 91%. Table 1 compares the facial expression recognition results with those of other methods [9].

Table 1. Comparison of facial expression recognition results
| Classification method | Recognition rate |
|---|---|
| Linear discriminant rule based on PCA | 74% |
| Personalized galleries and elastic graph matching | 81% |
| 2D emotion space (PCA) & minimum distance | 84.5% |
| PCA and LDA of the labeled-graph vectors | 75%-92% |
| BP learning neural network | 85%-90% |
| Our algorithm | 91% |
4 Conclusion

We have considered how the use of personal attributes can improve the robustness and reliability of facial authentication, and proposed a chaos modulated facial expression recognition method as the authentication method. As in our former work, this leads to an efficient real-time detector with high recognition rates and very few false positives. Experimental results showed the improvement of the discriminating power. We integrated the classifiers and a face recognition system to build a real-time facial expression authentication system. The security of the system is improved through the online input of face images. A combined Web access control scheme has been implemented, and the security of remote Web access has been improved.
Acknowledgment

The project is supported by the National Natural Science Foundation of China under Grant No. 60572027, by the Outstanding Young Researchers Foundation of Sichuan Province under Grant No. 03ZQ026-033, and by the Program for New Century Excellent Talents in University of China under Grant No. NCET-05-0794. We would also like to thank J. Cohn for kindly providing the facial expression database used in this paper.
References

1. Canavan, J. E.: Fundamentals of Network Security. Artech House, Boston (2001)
2. Ortega-Garcia, J., Bigun, J., Reynolds, D., Gonzalez-Rodriguez, J.: Authentication Gets Personal with Biometrics. IEEE Signal Processing Magazine, 21(2) (2004) 50-62
3. Marchany, R. C., Tront, J. G.: E-Commerce Security Issues. In: Sprague, R. H. (ed.): Proceedings of the 35th Hawaii International Conference on System Sciences. IEEE Computer Soc., Los Alamitos, CA, USA (2002) 2500-2508
4. Zabih, R., Woodfill, J.: Non-Parametric Local Transforms for Computing Visual Correspondence. In: Eklundh, J. O. (ed.): Proceedings of the 3rd European Conference on Computer Vision. Lecture Notes in Computer Science, Vol. 801. Springer-Verlag, Stockholm, Sweden (1994) 151-158
5. Froba, B., Ernst, A.: Face Detection with the Modified Census Transform. In: Azada, D. (ed.): Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition. IEEE Computer Soc., Los Alamitos, USA (2004) 91-96
6. Ling, X. T., Zhou, H.: Chaotic Moderation and Correlative Detection Method for Image Classification and Recognition. Acta Electronica Sinica, 25(1) (1997) 54-57
7. Kanade, T., Cohn, J., Tian, Y.: Comprehensive Database for Facial Expression Analysis. In: Crowley, J. (ed.): Fourth IEEE International Conference on Automatic Face and Gesture Recognition. IEEE Computer Soc., Los Alamitos, CA, USA (2000) 46-53
8. Rowley, H. A.: Neural Network-Based Face Detection. PhD thesis, Carnegie Mellon University, Pittsburgh (1999)
9. Pantic, M., Valstar, M., Rademaker, R., Maat, L.: Automatic Analysis of Facial Expression: The State of the Art. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12) (2000) 1424-1445
A Novel Computer-Aided Diagnosis System of the Mammograms*
Weidong Xu1,2, Shunren Xia1, and Huilong Duan1
1 The Key Laboratory of Biomedical Engineering of Ministry of Education, Zhejiang University, Hangzhou 310027, China
2 Automation College, Hangzhou Dianzi University, Hangzhou 310018, China
[email protected]
Abstract. Breast cancer is one of the most dangerous tumors for middle-aged and older women in China, and mammography is the most reliable detection method. To assist radiologists in reading mammograms, a novel computer-aided diagnosis (CAD) system is proposed in this paper. It employs a new algorithm using optimal thresholding and Hough transform to suppress the pectoral muscle, applies an adaptive method based on wavelets and filling dilation to extract the microcalcifications (MCs), uses a model-based location and segmentation technique to detect the masses, and utilizes an MLP to classify the MCs and the masses. A high diagnosis precision with a low false positive rate was achieved, validating the proposed system.
1 Introduction
Breast cancer has recently become one of the most dangerous tumors for middle-aged and older women in China, and mammography is the most reliable technique for detecting it. In mammograms, the most important lesions are masses and microcalcifications (MCs). Detecting these symptoms usually costs radiologists so much time and energy that they often tire and miss important lesions, which typically appear indistinct. Many computer-aided diagnosis (CAD) techniques have therefore been developed to assist radiologists in reading mammograms [1,2]. These methods achieve a high detection precision, but their adaptability and robustness are often not emphasized, so when lesions with special features are processed, the precision drops sharply and the false positive (FP) rate can hardly be suppressed. In this paper, a novel CAD system is proposed, which uses models to represent the symptom features, applies appropriate algorithms and adjustable parameters to the targets, and overcomes the defects of the conventional methods, realizing a high diagnosis precision with a low FP rate. In our experiments, all the mammograms were taken from the First Affiliated Hospital of Zhejiang University, with a gray-level resolution of 12 bits and a spatial resolution of 1500×2000.
* Supported by the Natural Science Foundation of China (No. 60272029) and the Natural Science Foundation of Zhejiang Province of China (No. M603227).
Fig. 1. Primary parts of the mammogram
Fig. 2. Thresholding result (a) and segmentation result (b) of the pectoral muscle
2 Pectoral Muscle Suppression
The pectoral muscle is the triangular region at the corner of the breast region in MLO (medio-lateral oblique) mammograms, where breast cancer lesions cannot occur. The detection region can therefore be reduced by removing the pectoral muscle, and a model-based method was applied to do so [3]. Firstly, a series of ROIs (regions of interest) of different sizes were placed at the corner of the breast region. In each ROI, an iterative thresholding technique was used to compute the optimal threshold, and all of these thresholds were combined into a curve. Then, the local mean square deviation (MSD) at each point of the threshold curve was computed, giving an MSD curve. Each peak of the MSD curve marks an inflection point of the threshold curve, i.e., a violent change of the gray-level distribution. From the MSD curve, the optimal threshold of the pectoral muscle could be determined and the corresponding region segmented. Each point on the edge of the thresholded region was extracted, and from these points a zonal Hough transform was applied to detect the direction of the region edge. Unlike the standard Hough transform, the zonal Hough transform registers the number of all points lying on the parallel straight lines of the current direction in the current zone, instead of the number of points on a single straight line; it is used to detect low-radian curves that approach straight lines. Based on this direction, two straight lines were used to fit the pectoral muscle boundary, and elastic thread and polygon approaching techniques were carried out for refinement. Thus, the pectoral muscle was segmented and removed accurately.
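For concreteness, the per-ROI iterative thresholding and the MSD curve can be sketched as follows. This is a minimal illustration under our own assumptions (an isodata-style threshold update and a fixed MSD window); the function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def iterative_threshold(roi, tol=0.5, max_iter=100):
    """Iterative (isodata-style) optimal threshold of a gray-level ROI:
    start from the mean intensity, then repeatedly set the threshold to
    the midpoint of the means of the two classes it induces."""
    t = roi.mean()
    for _ in range(max_iter):
        low, high = roi[roi <= t], roi[roi > t]
        if low.size == 0 or high.size == 0:
            break
        t_new = 0.5 * (low.mean() + high.mean())
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    return t

def threshold_curve(image, corner_rois):
    """Optimal thresholds for a series of corner ROIs of increasing size,
    combined into the threshold curve described in the text."""
    return np.array([iterative_threshold(image[r0:r1, c0:c1])
                     for (r0, r1, c0, c1) in corner_rois])

def local_msd(curve, half_window=2):
    """Local mean square deviation at each point of the threshold curve;
    peaks mark inflection points of the gray-level distribution."""
    msd = np.zeros_like(curve, dtype=float)
    for i in range(len(curve)):
        w = curve[max(0, i - half_window): i + half_window + 1]
        msd[i] = ((w - w.mean()) ** 2).mean()
    return msd
```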
3 Microcalcifications Detection
MCs are the early lesions of breast cancer. They appear as small pieces with high intensity and contrast (Fig. 3(a)) and can be represented by the high-frequency (HF) information of the mammogram. Wavelet analysis developed rapidly in the 1990s. It decomposes a signal into HF and low-frequency (LF) domains level by level, a scheme called MRA (multi-resolution analysis), and owing to its smoothness and locality it has been widely applied in many research fields.
A common wavelet-based technique is the discrete wavelet transform (DWT). At each resolution, the image is decomposed into four subbands: the LF subband LL and the HF subbands LH, HL and HH. The three HF subbands are combined into a uniform HF domain, i.e. |LH| + |HL| + |HH|, and the HF signal denoting the MCs usually lies in the 2nd and 3rd levels of the wavelet domain. Thresholding with hysteresis was then applied to extract the high-intensity wavelet coefficients in the HF domain. Firstly, the signals in the HF domain were processed with a global threshold: if the modulus was < T0, the coefficient was deleted. Secondly, the reserved signals were processed with another global threshold: if the modulus was > T1, the signal was assured to be an MC. Finally, a local threshold was applied to the neighborhood around each assured MC, and the remaining signals near the assured MCs were also accepted as MCs if their modulus was > T2. In this way, useful information with a comparatively low HF modulus was extracted while noise with a similar HF modulus was suppressed. By reconstructing the assured signals in the HF domain, all the MCs were located accurately (Fig. 3(b)). Next, filling dilation was used to segment the MCs. The regions R0 reconstructed above were taken as the original regions, and the contrast in their neighborhoods was enhanced with an intensity-remapping method. Then R0 expanded outwards through an iterative dilation process based on a cross-shaped structure element B, i.e. R1 = R0 ⊕ B, ..., R(n+1) = Rn ⊕ B. A newly reached point was not accepted into the MC region unless its gray-level intensity f(x, y) satisfied |f(x, y) − f_ker| ≤ T3 and |f(x, y) − f̄| ≤ T4, where f_ker is the mean intensity of R0, f̄ is the mean intensity of the accepted points in the neighborhood, and T3, T4 are two thresholds.
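The three-stage thresholding with hysteresis can be sketched as below, operating on a precomputed HF modulus map hf = |LH| + |HL| + |HH| of one wavelet level. The connectivity-based growth and the ordering T0 ≤ T2 ≤ T1 are our assumptions, not details from the paper.

```python
import numpy as np
from scipy import ndimage

def hysteresis_mc_detect(hf, t0, t1, t2):
    """Three-stage thresholding with hysteresis on the combined HF
    modulus hf of one wavelet level.

    1) discard coefficients with modulus < t0;
    2) coefficients with modulus > t1 are assured MCs;
    3) surviving coefficients connected to an assured MC are kept
       if their modulus > t2 (assuming t0 <= t2 <= t1)."""
    candidate = hf >= t0
    assured = hf > t1
    # grow assured seeds through candidate points whose modulus > t2
    grow = candidate & (hf > t2)
    labels, _ = ndimage.label(grow, structure=np.ones((3, 3)))
    keep = np.unique(labels[assured])
    return np.isin(labels, keep[keep > 0])
```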
Fig. 3. Original MCs (a), located MCs (b) and segmented MCs (c)
Fig. 4. Different appearances of the masses
To make the detection more accurate and adaptive, an adaptive-network-based fuzzy inference system (ANFIS) is used to adjust the detection parameters (T0, T1, T2, T3, T4) automatically according to the background features. ANFIS is an artificial neural network (ANN) technique based on the Sugeno fuzzy model; it has high approximation precision and good generalization ability and can avoid local minima [4], so it can be applied to auto-control the MC detection process. In experiments, the optimal values of the parameters for different backgrounds were measured, and three features of the neighborhood (mean intensity, contrast and fractal dimension) were extracted simultaneously. Using ANFIS, the relation between those optimal values and the background features could be learned. When a new mammogram is processed, its background features in each region are extracted first, and the appropriate parameter values are then determined by ANFIS accordingly.
4 Masses Detection
Masses are the most important lesions of breast cancer. In mammograms, a mass usually appears as a high-intensity lump with a certain area inside the breast tissue. Some masses appear as solid blocks, some as roundish pies, and some as flocky starfish shapes; in some cases there are MCs within the mass region. Accordingly, two models are proposed to represent all kinds of masses [5]. Model A represents masses in denser tissue (Fig. 4(a), 4(b)). In this model, there is a solid central part in the mass region whose pixels have nearly the same gray-level intensity; the other pixels of the mass region have different intensities, increasing toward the center, and the intensity on the edge of the mass is close to the background. Model B represents masses in fatty tissue (Fig. 4(c), 4(d)). In this model, the mass appears distinct and is easy to segment, but there is no obvious solid part in the region; the variance of the pixel intensities inside the mass region is much lower than that on the edge. Whatever a mass looks like, it can be represented by one of these two models. The suspicious regions were extracted first by peeling off the fatty tissue around the denser tissue and the masses. Iterative thresholding was applied to fulfill this task, because suspicious regions have high intensity and contrast. In this way, not only the suspicious regions but also the masses matching Model B, which appear as isolated lumps, were extracted from the breast. To locate the masses matching Model A, which are buried deep in the denser tissue, the DWT was used to decompose the suspicious regions with high intensity. If a mass with a solid central part lies in these regions, the modulus of the HF information at the corresponding position must be very low. Hence, in the 2nd and 3rd levels of the wavelet domain, the black-hole positions where the modulus of the HF signals in a neighborhood was close to zero were registered into a map, which usually indicates the solid central part of a mass. Then, a region registration process was carried out on the position map to remove minor structures and label the black-hole regions where masses probably lie.
Fig. 5. Iterative thresholding result
Fig. 6. Segmentation results of the masses
Afterward, filling dilation was applied to extract the masses matching Model A, whose central parts had been located above. For the sake of extraction precision, a Canny edge detector, based on the local edge normal directions and the zero-crossings of the 2nd derivatives, was used to restrict the segmentation process. In this way, the gradients of the boundaries inside the breast region could be extracted and used as a segmentation restriction: the gradient extracted by the Canny detector acts as a barrier preventing the dilated mass region from crossing it. During the segmentation process, besides the acceptance criteria used for the MCs, another criterion was imposed: I_grad(x, y) ≤ T_grad, where I_grad(x, y) is the modulus of the gradient and T_grad is a threshold. Simultaneously, ANFIS was utilized to adjust the detection parameters (T3, T4, T_grad) adaptively according to the background features (mean intensity, contrast and fractal dimension), just like the auto-control introduced in Section 3. Thus, the regions of the masses matching Model A could be segmented accurately (Fig. 6).
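A rough sketch of the gradient-restricted filling dilation might look as follows. Approximating the neighborhood mean f̄ by the mean of all accepted pixels, and the fixed iteration cap, are simplifications of ours, not details from the paper.

```python
import numpy as np
from scipy import ndimage

def filling_dilation(image, seed, grad, f_ker, t3, t4, t_grad, max_iter=200):
    """Grow a seed region with a cross-shaped structure element; a newly
    reached pixel joins only if it passes the intensity criteria and the
    Canny-gradient barrier."""
    cross = ndimage.generate_binary_structure(2, 1)  # cross-shaped element
    region = seed.copy()
    for _ in range(max_iter):
        ring = ndimage.binary_dilation(region, cross) & ~region
        if not ring.any():
            break
        # mean intensity of already accepted pixels (stand-in for the
        # neighborhood mean f-bar of the paper)
        f_bar = image[region].mean()
        ok = (np.abs(image - f_ker) <= t3) & \
             (np.abs(image - f_bar) <= t4) & \
             (grad <= t_grad)
        new = ring & ok
        if not new.any():
            break
        region |= new
    return region
```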
5 Classification and Experiments
With the algorithms in Sections 3 and 4, the MCs and the masses in the mammograms were located and segmented, along with a number of FPs. Finally, a multi-layer perceptron (MLP) was used for classification, reducing the FPs while retaining the true lesions. The MLP is a conventional ANN technique with high approximation precision and good generalization ability. Compared with local-approximation networks, the MLP requires fewer training samples at the same precision and can deal with high-dimensional problems, so it is well suited to medical image processing. In this experiment, ten features were selected to represent the MCs: area, mean intensity, contrast, coherence, compactness, ratio of pits, number of hollows, elongatedness, fractal dimension, and clustering number. Here, coherence is defined as the MSD of the region, compactness is the roundness, ratio of pits means the ratio of the number of pits on the boundary to the circumference, elongatedness means the ratio of length to width, and clustering number means the number of MCs around the current one. Another ten features were used to represent the masses:
area, mean intensity, contrast, coherence, compactness, elongatedness, fractal dimension, edge contrast, boundary gradient intensity, and boundary direction entropy. Here, edge contrast is the MSD near the edge, boundary gradient intensity is the mean modulus of the gradients on the boundary, and boundary direction entropy means the entropy of the gradient direction distribution histogram on the boundary. 60 MLO mammograms were used to test the pectoral muscle segmentation method of Section 2. Of the 52 samples in which the pectoral muscle exists, 49 were detected; of the 8 samples without any pectoral muscle, 6 were identified as non-pectoral-muscle mammograms while 2 were mistaken. 60 mammograms were used to test the MC detection method of Section 3. Of the 163 true MCs, 162 were detected, while 511 FPs were extracted at the same time. The true MC regions were segmented manually by radiologists and regarded as the ground truth, so the extraction quality of the MCs could be evaluated by computing the ratio of the common area (the overlap of the auto-extracted region and the ground-truth region) to the ground-truth area; the mean value was 94.7%. 60 mammograms were used to test the mass detection algorithm of Section 4. Of the 78 true masses, 75 were detected, while 449 FPs were extracted simultaneously, and the mean extraction quality of the masses was 94.2%. The MLP classifier introduced above was finally configured as: 3 layers, 10 input nodes, 20 hidden nodes, and 1 output node. The segmented MCs and masses were input to the classifier, with the result that 158 true MCs were identified with 12 FPs, and 73 true masses were identified with 38 FPs. Combining the segmentation and classification results, the true positive rates for the MCs and the masses were 96.9% (158/163) and 93.6% (73/78) respectively, with only 0.2 and 0.63 FP per image. The performance of this system was much better than that of the conventional methods. A series of new, effective techniques were utilized, with emphasis on their adaptability and robustness: modeling was applied to represent the MCs and the masses so that appropriate methods could be carried out adaptively for problems with different features, and ANFIS was used for auto-adjustment of the detection. Even when faced with lesions having special features and backgrounds, this system can still obtain satisfying results.
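A classifier with the reported 10-20-1 structure could be set up as in the following hedged sketch, with scikit-learn as an illustrative stand-in; the solver, iteration budget, and the hypothetical arrays X_train/y_train are our assumptions, not details from the paper.

```python
from sklearn.neural_network import MLPClassifier

# 3-layer perceptron matching the structure reported above:
# 10 input features, 20 hidden nodes, 1 sigmoid output node.
clf = MLPClassifier(hidden_layer_sizes=(20,), activation='logistic',
                    solver='adam', max_iter=2000, random_state=0)

# X_train: (n_samples, 10) feature matrix of segmented MC/mass candidates,
# y_train: 1 for true lesion, 0 for false positive (hypothetical arrays).
# clf.fit(X_train, y_train)
# y_pred = clf.predict(X_test)
```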
References
1. Xia, S.R., Lv, W.X.: Advances in the Research of Computer-aided Diagnosis on Mammograms. Foreign Medical Science: Biomedical Engineering, Vol. 23 (2000) 24-28
2. Thangavel, K., Karnan, M., Sivakumar, R., Mohideen, A.K.: Automatic Detection of Microcalcification in Mammograms - a Review. ICGST International Journal on Graphics, Vision and Image Processing, Vol. 5 (2005) 31-61
3. Xu, W.D., Wang, X.Y., Xia, S.R., Yan, Y.: Study on Model-based Pectoral-Muscle Segment Algorithm in Mammograms. J. of Zhejiang Univ. (Eng. Sci.), Vol. 39 (2005) 437-432
4. Xu, W.D., Xia, S.R., Xie, H.: Application of CMAC-based Networks on Medical Image Classification. Lecture Notes in Computer Science, Vol. 3173 (2004) 953-958
5. Xu, W.D., Xia, S.R., Duan, H.L., Xiao, M.: Segmentation of Masses in Mammograms Using a Novel Intelligent Algorithm. International Journal of Pattern Recognition and Artificial Intelligence, Vol. 20 (2006) 255-270
A Partial Curve Matching Method for Automatic Reassembly of 2D Fragments
Liangjia Zhu1, Zongtan Zhou1, Jingwei Zhang2, and Dewen Hu1
1 Department of Automatic Control, College of Mechatronics and Automation, National University of Defense Technology, Changsha, Hunan, 410073, P.R. China
[email protected]
2 Hunan Supreme People's Court, Changsha, Hunan, 410001, P.R. China
Abstract. An important step in automatic reassembly of 2D fragments is to find candidate matching pairs for adjacent fragments. In this paper, we propose a new partial curve matching method to find the candidate matches. In this method, the fragment contours are represented by their turning functions. The matching segments between two fragment contours are found by analyzing the difference curve between two turning functions directly. The performance of our method is illustrated with randomly shredded document fragments.
1 Introduction
Automatic reassembly of 2D fragments to reconstruct original objects is an interesting problem with applications in forensics [1], archaeology [2,3], and other disciplines. The fragments are often represented by their boundary curves, and candidate matches between different fragments are usually found by curve matching. Since a match between two fragments usually covers only a fraction of their boundaries, partial curve matching is needed. The 2D fragment reassembly problem is similar to the automatic reassembly of jigsaw puzzles, which has been widely studied [4,5]. However, those solutions exploit specific features or a priori knowledge, e.g. that puzzle pieces have smooth edges and well-defined corners, and are impractical in many real applications. More generally, the fragment reassembly problem can be considered a special case of the partial curve matching problem, for which researchers have proposed many solutions in different applications. These solutions can be roughly divided into two kinds according to whether the fragment contour is sampled uniformly or not. One kind is string-matching based methods that represent fragment contours with uniformly sampled points. In [2], curvature-encoded fragment contours are compared, at progressively increasing scales of resolution, using an incremental dynamic programming sequence-matching algorithm. Wolfson [6] proposed an algorithm that converts the curves into shape signature strings and applies string matching techniques to find the longest matching substrings; this is also a curvature-like algorithm. However, the calculation of numerical curvature is not as trivial a task as expected when noise exists [7]. The other kind is
feature-based matching methods. In [3], fragment contours are re-sampled using polygonal approximation and the potential matching pairs are found by optimizing an elastic energy. However, a difference in the relative sampling rate of aligned contour segments can affect the optimal correspondence and the match cost [8]. In this paper, we propose a partial curve matching method to find the candidate matching fragment pairs. The fragment contours are represented by their turning functions, and the matching segments are found by analyzing the difference curve between two turning functions directly. The curve similarity is evaluated as the residual distance of corresponding points after the optimal transformation between two matching segments. This paper is organized as follows: Section 2 presents our partial curve matching method, Section 3 presents some experimental results, and Section 4 draws our conclusions.
2 Partial Curve Matching Based on Turning Functions
We assume that the fragment contours have been extracted successfully from the scanned fragment image. The method of comparing two fragment contours can be formulated as follows.

2.1 Contour Representation
We first build the turning function θ(s) for each fragment contour, as in [6]. Then all θ_i(s), i = 1 : N, are sampled with the same spacing δ and stored as character strings C_i, i = 1 : N, in clockwise order. Note that the common segments of two matched fragments traverse in opposite directions.
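A turning function of a closed contour might be computed as in the sketch below. This is our own minimal version: starting the cumulative angle at the first edge's direction and resampling by linear interpolation are implementation choices, not details from the paper.

```python
import numpy as np

def turning_function(points, spacing):
    """Turning function theta(s) of a closed contour.

    points: (N, 2) array of contour points in clockwise order.
    Returns the cumulative tangent angle sampled every `spacing`
    units of arc length."""
    d = np.diff(np.vstack([points, points[:1]]), axis=0)
    seg_len = np.hypot(d[:, 0], d[:, 1])
    angle = np.arctan2(d[:, 1], d[:, 0])
    # accumulate signed turning angles so theta is continuous
    turn = np.mod(np.diff(angle, prepend=angle[0]) + np.pi, 2 * np.pi) - np.pi
    theta = angle[0] + np.cumsum(turn)
    s = np.concatenate([[0.0], np.cumsum(seg_len)[:-1]])
    samples = np.arange(0.0, s[-1], spacing)
    return np.interp(samples, s, theta)
```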
2.2 Histogram Analysis on Δθ
Suppose the two fragment contours to be compared are C_A = (a_1, a_2, ..., a_m) and C_B = (b_1, b_2, ..., b_n) with m ≤ n. At a given moment, C_A is shifted by d positions (d an integer) to become C_A^d = (a_{1+d}, a_{2+d}, ..., a_{m+d}) = (a_1^d, a_2^d, ..., a_m^d), and the corresponding turning function becomes θ_A^d = θ_A(s_i + dδ), i = 1 : m. The difference between θ_A^d and θ_B is defined as

Δθ_AB^d = θ_B − θ_A^d = (b_1 − a_1^d, b_2 − a_2^d, ..., b_m − a_m^d)    (1)
At this moment, if there exist two sufficiently similar segments on C_A^d and C_B, the corresponding part of Δθ_AB^d will be almost constant. Drawing the histogram of Δθ_AB^d to count the number of points lying in each sampling interval [iλ, (i+1)λ], i = 0 : t_n, there must be a peak on the histogram corresponding to the matching segments. Here t_n is the number of sampling intervals, determined by

t_n = (Δθ_AB^d(m) − Δθ_AB^d(1)) / λ    (2)
Denote the indices of the start and end points of each segment by start and end respectively. For candidate pairs of start and end points we only check the peaks with height H > H_max/2 and end − start > m/t_l, where H_max is the maximum of the histogram and t_l is an integer; m/t_l is the parameter controlling the minimum length of the permitted matching segments between contours A and B. An example of the relation between Δθ(s) and the histogram is given in Figure 1, where the dash-dot lines mark the mean value of the selected segment.
Fig. 1. The relation between Δθ(s) and the histogram
This is just a primary selection for finding the correct pairs of start and end points; the candidate match pairs are then selected according to the following decision rule.

Decision rule: For a segment (Δθ_start, ..., Δθ_end) on Δθ_AB^d, compute the standard deviation std, average deviation avd and angle change number acn as

std = sqrt( Σ_{i=start}^{end} (Δθ_i − mean)² / (end − start) )    (3)

avd = Σ_{i=start}^{end} sin(Δθ_i − mean) / (end − start)    (4)

acn = Σ_{i=start}^{end−1} acn_i    (5)

where

mean = Σ_{i=start}^{end} Δθ_i / (end − start),    acn_i = 1 if |Δθ_{i+1} − Δθ_i| > t_0, and 0 otherwise    (6)

If (1) std < t_1, (2) avd < t_2, and (3) acn > t_3, then the corresponding segments are selected as candidate matches.
Conditions (1) and (2) reflect the fact that if two segments are sufficiently similar, their overall angle turning tendencies will be almost the same; condition (2) means that the difference curve of two well-matched segments should be distributed nearly uniformly around its mean value; and condition (3) is used to avoid matching an almost straight segment with another segment. Other constraints can also be added to these conditions. One or more segments may be found each time the shorter contour is shifted one step further. For comparing any two different fragment contours, we have to shift the shorter contour C_B n times, where n is the number of samples on contour C_A. Computing Δθ_AB^d for each shift d takes m comparisons, where m is the number of samples on contour C_B. Hence, the complexity of the histogram analysis is O(mn).
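The decision rule translates directly into code. The sketch below uses the parameter values reported in Section 3 as defaults and interprets the condition on avd as a bound on its magnitude, which is our reading of Eq. (4).

```python
import numpy as np

def segment_statistics(dtheta, t0=0.05):
    """std, avd and acn of a candidate segment of the difference curve
    (Eqs. (3)-(6)); dtheta holds the Delta-theta values from start to end."""
    mean = dtheta.mean()
    n = len(dtheta)                      # plays the role of end - start
    std = np.sqrt(((dtheta - mean) ** 2).sum() / n)
    avd = np.sin(dtheta - mean).sum() / n
    acn = int((np.abs(np.diff(dtheta)) > t0).sum())
    return std, avd, acn

def is_candidate(dtheta, t0=0.05, t1=0.3, t2=0.1, t3=3):
    std, avd, acn = segment_statistics(dtheta, t0)
    # abs(avd) is our interpretation of condition (2)
    return std < t1 and abs(avd) < t2 and acn > t3
```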
2.3 Recovering the Transformation and Similarity
Given a pair of start points and end points, we compute the corresponding matching contour segments in the (x, y) plane. Denote these contour segments by X and Y; the optimal transformation E_opt between these two segments minimizes the l2 distance between EX and Y:

|E_opt X − Y|² = min_E |EX − Y|²    (7)

As in [9], X is transformed with E_opt in the (x, y) plane to get the transformed segment X′. Then X′ and Y are evenly sampled and represented by two sequences {u_i} and {v_j}. The curve similarity is evaluated by

S = ( Σ_{i=1}^{m} d(u_i, Y) + Σ_{j=1}^{n} d(v_j, X′) ) / (min(l_1, l_2))²,    d(u_i, Y) = min_{v_j ∈ Y} |u_i − v_j|    (8)

Here, m and n are the numbers of points in X′ and Y, and l_1 and l_2 are the lengths of the two segments respectively.
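Equations (7) and (8) can be realized, for a rigid transformation E, with the SVD-based least-squares alignment sketched below. Using the Kabsch construction for E_opt is our choice, since the paper does not spell out how E_opt is computed.

```python
import numpy as np

def optimal_transform(X, Y):
    """Least-squares rigid transform (rotation R, translation t) mapping
    point set X onto Y (Eq. (7)), via the SVD-based Kabsch method."""
    cx, cy = X.mean(axis=0), Y.mean(axis=0)
    U, _, Vt = np.linalg.svd((X - cx).T @ (Y - cy))
    R = (U @ Vt).T
    if np.linalg.det(R) < 0:             # keep a proper rotation
        Vt[-1] *= -1
        R = (U @ Vt).T
    return R, cy - R @ cx

def similarity(Xp, Y, l1, l2):
    """Curve similarity S of Eq. (8): symmetric sum of nearest-point
    distances between the transformed segment Xp and Y, normalized by
    the squared shorter segment length."""
    d = np.linalg.norm(Xp[:, None, :] - Y[None, :, :], axis=2)
    return (d.min(axis=1).sum() + d.min(axis=0).sum()) / min(l1, l2) ** 2
```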
3 Experimental Results
We used randomly shredded document fragments to test the algorithm. The algorithm was implemented on a Windows platform in C#. An AGFA e50 scanner was used as the image acquisition device, and the fragments were digitized at 150 dpi. Figure 2(a) shows the image of the scanned fragments; its size is 730 × 953. The scanned image was thresholded in RGB space to get a binary image, from which the contour of each fragment was extracted. Figure 2(b) shows the extracted contours. In the test, the number of fragments was N = 16, and the parameters were set as δ = 3.57, λ = 0.2, t_l = 15, t_0 = 0.05, t_1 = 0.3, t_2 = 0.1, t_3 = 3 and t_s = 1. When comparing any two different fragment contours, we may get several possible matches with curve similarity smaller than t_s; in this case, we only select the most similar one as the candidate match.
Fig. 2. (a) The image of scanned fragments, (b) extracted contours
Fig. 3. The first 24 candidates returned by our partial curve matching method. The similarity S of each candidate match is shown at the bottom left of each grid. The true matches are marked with a star (*).

Table 1. Comparison between our method and Stolfi's method [2]

Method    Object    Resolution  T   R   Recognition rate
Ours      Document  150 dpi     24  16  66.7%
Stolfi's  Ceramic   300 dpi     73  46  63.0%
In this test, there were 24 true matches in the original document; let T denote this set and R denote the recognized true matches from T. The algorithm started with 128 initial possible matches and returned 30 matches with S < 1, of which 16 were true. Figure 3 shows the first 24 candidate matches, in order of increasing S. Note that candidates 1-10 and 12-13, 15, 17, 18, 20 are all correct. Table 1 shows the comparison between our method and Stolfi's method. It is hard to make a strict comparison between the performance of these two
methods because the test fragments are different. However, one thing to note is that our method depends much less on the scan resolution.
4 Conclusions and Future Work
A turning function based partial curve matching method has been proposed to find candidate matches for the automatic reassembly of 2D fragments. The accuracy of the method was verified by our experiment. Finding the candidate matches is only the first step in reassembling the original objects; we are now working on the global reconstruction problem to eliminate the ambiguities resulting from partial curve matching, and will report our recent results in the near future.
Acknowledgement This work is supported by the Distinguished Young Scholars Fund of China (60225015), National Science Foundation (60575044), Ministry of Education of China (TRAPOYT Project), and Specialized Research Fund for the Doctoral Program of Higher Education of China (20049998012).
References
1. De Smet, P., De Bock, J., Corluy, E.: Computer Vision Techniques for Semi-automatic Reconstruction of Ripped-up Documents. Proceedings of SPIE, 5108 (2003) 189-197
2. Leitão, H.C.G., Stolfi, J.: A Multiscale Method for the Reassembly of Two-dimensional Fragmented Objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24 (2002) 1239-1251
3. Kong, W., Kimia, B.B.: On Solving 2D and 3D Puzzles Using Curve Matching. Proceedings of Computer Vision and Pattern Recognition, 2 (2001) 583-590
4. Burdea, C., Wolfson, H.J.: Solving Jigsaw Puzzles by a Robot. IEEE Transactions on Robotics and Automation, 5 (1989) 752-764
5. Yao, F.H., Shao, G.F.: A Shape and Image Merging Technique to Solve Jigsaw Puzzles. Pattern Recognition Letters, 24 (2003) 1819-1835
6. Wolfson, H.J.: On Curve Matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12 (1990) 483-489
7. Calabi, E., Olver, P., Shakiban, C., Tannenbaum, A., Haker, S.: Differential and Numerically Invariant Signature Curves Applied to Object Recognition. International Journal of Computer Vision, 26 (1998) 107-135
8. Sebastian, T.B., Klein, P.N., Kimia, B.B.: On Aligning Curves. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25 (2003) 116-125
9. Pajdla, T., van Gool, L.: Matching of 3-D Curves Using Semi-differential Invariants. Proceedings of the International Conference on Computer Vision (1995) 390-395
A Split/Merge Method with Ranking Selection for Polygonal Approximation of Digital Curve
Chaojian Shi1,2 and Bin Wang2,*
1 Merchant Marine College, Shanghai Maritime University, Shanghai, 200135, P.R. China
2 Department of Computer Science and Engineering, Fudan University, Shanghai, 200433, P.R. China
[email protected], [email protected]
Abstract. Polygonal approximation of digital curves is an important problem in image processing and pattern recognition. The traditional split-and-merge method (SM) suffers from dependence on the given initial solution. To solve this problem, a novel split-and-merge method (RSM), which applies the ranking selection scheme of genetic algorithms to the split and merge process, is proposed. Experiments on two benchmark curves show its good performance.
1 Introduction
Polygonal approximation of digital curves is a hot topic in pattern recognition and image processing and has found wide practical application in vectorization, map services, CAD and GIS. The polygonal approximation problem can be stated as follows: given a digital curve with N points, approximate it by a polygon with a given total number of segments M so that the total approximation error is minimized. The polygonal approximation problem is NP-hard, and the size of the search space is C(N, M) [1]. In the past decades, many approaches have been proposed to solve it. Some are based on local search strategies such as sequential tracing [2], the split-and-merge method [3] and dominant point detection [4]; others are based on global search techniques such as genetic algorithms [5,1] and ant colony methods [6]. The local-search-based methods are very fast, but since their results depend on the selection of the starting point or on the given arbitrary initial solution, they usually lack optimality. The approaches based on genetic algorithms, tabu search and ant colony methods can obtain better results but require more computation time, so they are hardly suitable for real applications. In this paper, we propose a novel split-and-merge method (RSM). Different from SM, RSM applies the ranking selection scheme of genetic algorithms to the split and merge process and effectively solves the problem of the final solution's
* Corresponding author.
dependence on the initial solution. Experiments using two benchmark curves are conducted and show good performance.
2 Problem Statement
A closed digital curve C can be represented by a clockwise ordered sequence of points C = {p_1, p_2, ..., p_N}, where N is the number of points on the curve and p_{i+N} = p_i. We define the arc arc(p_i, p_j) as the consecutive points p_i, p_{i+1}, ..., p_j, and the chord p_i p_j as the line segment connecting points p_i and p_j. The approximation error between arc(p_i, p_j) and p_i p_j is defined as

e(arc(p_i, p_j), p_i p_j) = Σ_{p_k ∈ arc(p_i, p_j)} d²(p_k, p_i p_j)    (1)
where d(p_k, p_i p_j) is the perpendicular distance from point p_k to the line segment p_i p_j. The polygon V approximating the digital curve C is defined as a set of ordered line segments V = {p_{t1} p_{t2}, p_{t2} p_{t3}, ..., p_{tM−1} p_{tM}, p_{tM} p_{t1}}, such that t_1 < t_2 < ... < t_M and {p_{t1}, p_{t2}, ..., p_{tM}} ⊆ {p_1, p_2, ..., p_N}, where M is the number of vertices of the polygon V. The approximation error between the curve C and its approximating polygon V is defined as

E(V, C) = Σ_{i=1}^{M} e(arc(p_{ti}, p_{ti+1}), p_{ti} p_{ti+1})    (2)
The polygonal approximation problem is then formulated as follows: given a closed digital curve C = {p_1, p_2, ..., p_N} and an integer 3 ≤ M ≤ N, let SP be the set of all polygons which approximate the curve C, and let SSP = {V | V ∈ SP ∧ |V| = M}, where |V| denotes the cardinality of V. Find a polygon P ∈ SSP such that

E(P, C) = min_{V ∈ SSP} E(V, C)    (3)
3 The Traditional Split-and-Merge Method
The traditional split-and-merge method (SM) is a recursive method starting with an initial polygon V = {p_{t1} p_{t2}, p_{t2} p_{t3}, ..., p_{tM−1} p_{tM}, p_{tM} p_{t1}} which approximates the curve. At each iteration, a split process is performed first: among all the curve's points, select the point p_k farthest from its corresponding edge p_{ti} p_{ti+1}, then remove the edge p_{ti} p_{ti+1} and add two new edges p_k p_{ti} and p_k p_{ti+1} to the polygon. We regard this as splitting the edge p_{ti} p_{ti+1} at point p_k and call p_k the splitting point. Secondly, a merge process is performed: among all the vertices of the polygon, select the vertex p_{tj} with the minimum distance from the line segment connecting its two adjacent vertices p_{tj−1} and p_{tj+1}, then remove the edges p_{tj−1} p_{tj} and p_{tj} p_{tj+1} and add the edge p_{tj−1} p_{tj+1} to the polygon.
Fig. 1. Split-and-merge process
We regard this as merging the edges p_{tj−1} p_{tj} and p_{tj} p_{tj+1} at vertex p_{tj} and call p_{tj} the merging point. Fig. 1 gives an example illustrating the split and merge processes. The above processes are repeated until the number of iterations equals a pre-specified number. The disadvantage of this method is that, given a bad initial polygon, the final solution obtained may be far from the optimal one; SM is therefore not stable and depends on the given initial solution.
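One split step and one merge step of SM can be sketched as follows. Representing the polygon as a sorted list of vertex indices into the curve is our own convention, not a detail from the paper.

```python
import numpy as np

def point_segment_dist(p, a, b):
    """Perpendicular distance from point p to line segment ab."""
    ab, ap = b - a, p - a
    t = np.clip(np.dot(ap, ab) / max(np.dot(ab, ab), 1e-12), 0.0, 1.0)
    return np.linalg.norm(ap - t * ab)

def split_once(curve, vert_idx):
    """Split step: insert the curve point farthest from its covering
    polygon edge as a new vertex. `vert_idx` is a sorted list of indices
    into `curve` defining the current polygon."""
    best_d, best_i = -1.0, None
    m = len(vert_idx)
    for k in range(m):
        i, j = vert_idx[k], vert_idx[(k + 1) % m]
        arc = (range(i + 1, j) if i < j
               else list(range(i + 1, len(curve))) + list(range(0, j)))
        for p in arc:
            d = point_segment_dist(curve[p], curve[i], curve[j])
            if d > best_d:
                best_d, best_i = d, p
    return sorted(vert_idx + [best_i])

def merge_once(curve, vert_idx):
    """Merge step: delete the vertex closest to the chord joining its
    two adjacent vertices."""
    m = len(vert_idx)
    dists = [point_segment_dist(curve[vert_idx[k]],
                                curve[vert_idx[k - 1]],
                                curve[vert_idx[(k + 1) % m]])
             for k in range(m)]
    worst = int(np.argmin(dists))
    return vert_idx[:worst] + vert_idx[worst + 1:]
```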
4 The Proposed Method
In this section, a novel split-and-merge method (RSM), which applies the ranking selection scheme of genetic algorithms to the split-and-merge process, is proposed.

4.1 Splitting Strength and Merging Strength
Let C = {p_1, p_2, ..., p_N} be a digital curve and V = {p_{t1} p_{t2}, ..., p_{tM−1} p_{tM}, p_{tM} p_{t1}} be its approximating polygon. In the following, we define the splitting strength at a point of the curve C and the merging strength at a vertex of the polygon V.

Definition 1. Suppose p_i ∈ arc(p_{tk}, p_{tk+1}) and p_{tk} p_{tk+1} ∈ V. The splitting strength at the point p_i is defined as

S(p_i) = d(p_i, p_{tk} p_{tk+1}) / (1 + d(p_i, p_{tk} p_{tk+1}))    (4)

Definition 2. Let p_{tk} be a vertex of the polygon V, and let p_{tk−1} and p_{tk+1} be its two adjacent vertices. The merging strength of the vertex p_{tk} is defined as

M(p_{tk}) = 1 / (1 + d(p_{tk}, p_{tk−1} p_{tk+1}))    (5)
4.2 Ranking Selection Strategy
Selection is an important phase of genetic algorithms (GA). A wide variety of selection strategies have been proposed; most are based on fitness-proportionate selection and may lead to premature convergence. To avoid premature convergence, Baker proposed a ranking selection scheme in [9]. The idea of this strategy is that, at each generation, all the individuals in the population are sorted according to their fitness value and each individual is assigned a rank in the sorted population: for N individuals, the best gets rank 1, whereas the worst receives rank N. The selection probabilities of the individuals are given by some function of their rank. Let P = {x_1, x_2, ..., x_N} denote the sorted population with f(x_1) ≥ f(x_2) ≥ ... ≥ f(x_N), where f(·) is the fitness function. Then the selection probability p(x_i) must satisfy the following conditions: (1) p(x_1) ≥ p(x_2) ≥ ... ≥ p(x_N), and (2) Σ_{i=1}^{N} p(x_i) = 1.
Inspired by the above selection strategy, we apply it to the traditional split-and-merge method for the selection of splitting and merging points. A function for calculating the selection probabilities is developed here. Let C = {x_1, x_2, ..., x_M} be an ordered set of points. We let the ordered set C correspond to a sorted population, with each point of C corresponding to an individual, so that the above ranking selection strategy can be used to select points of C. To each point x_i we assign a selection probability p(x_i), calculated via the following equations:

p(x_i) = p(x_{i−1}) · e^{−t/(i−1)}, i = 2, ..., M;    Σ_{i=1}^{M} p(x_i) = 1    (6)
where t is a parameter used to adjust the probability distribution. In general, we empirically set t in [1.4, 2.4].
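Equation (6) and the rank-based draw can be sketched as below. The default t = 1.8 is the value used in the experiments, and sorting by strength in descending order follows the algorithm flow; the function names are ours.

```python
import numpy as np

def ranking_probabilities(m, t=1.8):
    """Selection probabilities of Eq. (6) for m rank-sorted points:
    p(x_i) = p(x_{i-1}) * exp(-t/(i-1)), normalized to sum to 1."""
    p = np.ones(m)
    for i in range(1, m):                 # 0-based i corresponds to rank i+1
        p[i] = p[i - 1] * np.exp(-t / i)
    return p / p.sum()

def ranking_select(rng, strengths, t=1.8):
    """Sort by splitting/merging strength (descending) and draw one
    index according to its rank probability."""
    order = np.argsort(-np.asarray(strengths))
    p = ranking_probabilities(len(order), t)
    return order[rng.choice(len(order), p=p)]

# usage: rng = np.random.default_rng(0); idx = ranking_select(rng, strengths)
```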
4.3 Algorithm Flow
The proposed algorithm has two parameters: the parameter t for adjusting the probability distribution, and the number G of iterations.

input. The digital curve C and the number of polygon sides M.
output. The polygon B with M edges which approximates C.
step 1. Generate an initial polygon V with M edges by randomly selecting M points from C as the vertices of the polygon. Set B = V and k = 0.
step 2. For those points of C which are not vertices of the polygon, calculate their splitting strength using Eq. (4).
step 3. Sort these points by their splitting strength in descending order and select a point by the ranking selection strategy. Then perform the splitting process at the selected point.
step 4. For each vertex of V, calculate its merging strength using Eq. (5).
step 5. Sort these vertices by their merging strength in descending order and select a vertex using the ranking selection strategy. Then perform the merging process at the selected vertex.
step 6. Compute the approximation error of the polygon V using Eq. (2). If it is smaller than the approximation error of polygon B, replace B with V.
step 7. Set k = k + 1; if k ≤ G, go to step 2.
step 8. Output B.
5 Experimental Results and Discussions
In [5], Yin proposed a genetic algorithm (YinGA) to solve the polygonal approximation problem; it has been shown to outperform methods based on local search techniques, including the traditional split-and-merge method, but its computational load is relatively high. Chen and Ho [1] proposed another genetic algorithm (EEA) for polygonal approximation and showed empirically that EEA outperforms YinGA in solution quality and convergence speed. Therefore, we only compare the proposed RSM with EEA.
Fig. 2. Two digital curves: (a) Chromosome, (b) Semicircle

Table 1. Experimental results for EEA and RSM

Results for Fig. 2(a) (chromosome curve):
K       Worst (EEA/RSM)   Average (EEA/RSM)   Variance (EEA/RSM)
8       17.8 / 16.4       15.5 / 14.4         2.3 / 1.4
9       15.9 / 14.0       13.5 / 12.7         1.6 / 0.3
12      7.8 / 6.7         6.8 / 5.9           0.9 / 0.1
14      6.2 / 4.8         5.1 / 4.5           0.6 / 0.0
15      5.4 / 4.1         4.3 / 4.1           0.3 / 0.0
17      4.0 / 3.2         3.6 / 3.2           0.2 / 0.0
18      3.4 / 2.9         3.0 / 2.9           0.1 / 0.0
Total   60.5 / 52.1       51.8 / 47.7         5.9 / 1.8
RSM/EEA 86.1%             92.1%               30.5%

Results for Fig. 2(b) (semicircle curve):
K       Worst (EEA/RSM)   Average (EEA/RSM)   Variance (EEA/RSM)
10      61.6 / 52.7       44.1 / 47.6         76.5 / 12.6
12      33.2 / 31.4       29.5 / 28.5         5.1 / 1.6
14      24.3 / 18.5       20.1 / 17.9         4.7 / 0.7
17      17.7 / 13.9       14.6 / 13.4         2.2 / 0.1
18      15.6 / 12.8       12.9 / 12.1         1.6 / 0.1
19      13.2 / 11.7       11.5 / 11.0         0.9 / 0.1
22      9.9 / 7.9         8.5 / 7.6           0.6 / 0.0
27      6.1 / 4.0         5.0 / 4.0           0.5 / 0.0
30      4.4 / 2.8         3.6 / 2.7           0.3 / 0.0
Total   186.0 / 155.7     149.8 / 144.8       92.4 / 15.2
RSM/EEA 83.7%             96.7%               16.5%
Two benchmarks, a chromosome curve with 60 points and a semicircle curve with 102 points, shown in Fig. 2(a) and (b) respectively, are used to test the performance of RSM. The parameters of RSM are set as t = 1.8 and G = 1500. All experiments were conducted on a PC with a Pentium III 400 CPU under Windows 2000. Ten independent runs were conducted for both RSM and EEA; the worst solution, average solution and variance of the solutions over the ten runs are listed in Table 1. The average computation times of RSM and EEA are about 0.09 and 0.27 seconds for the chromosome curve, and 0.41 and 0.87 seconds for the semicircle curve, respectively. We can see that RSM outperforms EEA in the quality of the worst solution, the average solution, the variance of solutions and the convergence speed.
6 Conclusions
A split-and-merge method with ranking selection (RSM) has been proposed for the polygonal approximation of digital curve. With this method, the problem of dependence on the initial solution of the traditional split-and-merge method has been successfully solved. The experimental results demonstrate the good performance of RSM.
Acknowledgement The research work in this paper is partially sponsored by Shanghai Leading Academic Discipline Project, T0603.
References
1. Ho, S.Y., Chen, Y.C.: An Efficient Evolutionary Algorithm for Accurate Polygonal Approximation. Pattern Recognition, 34 (2001) 2305-2317
2. Kurozumi, Y., Davis, W.A.: Polygonal Approximation by the Minimax Method. Computer Graphics and Image Processing, 19 (1982) 248-264
3. Wu, J.S., Leou, J.J.: New Polygonal Approximation Schemes for Object Shape Representation. Pattern Recognition, 26 (1993) 471-484
4. Teh, H.C., Chin, R.T.: On Detection of Dominant Points on Digital Curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 11(8) (1989) 859-872
5. Yin, P.Y.: Genetic Algorithms for Polygonal Approximation of Digital Curves. Int. J. Pattern Recognition and Artificial Intelligence, 13 (1999) 1-22
6. Vallone, U.: Bidimensional Shapes Polygonalization by ACO. Proceedings of the 3rd International Workshop on Ant Algorithms, Brussels, Belgium (2002) 296-297
7. Ray, B.K., Ray, K.S.: Determination of Optimal Polygon from Digital Curve Using L1 Norm. Pattern Recognition, 26 (1993) 505-509
8. Phillips, T.Y., Rosenfeld, A.: An ISODATA Algorithm for Straight Line Fitting. Pattern Recognition Letters, 7 (1988) 291-297
9. Baker, J.F.: Adaptive Selection Methods for Genetic Algorithms. In: Grefenstette, J.J. (ed.): Proc. of the 1st Int'l Conf. on Genetic Algorithms, Lawrence Erlbaum Associates, Hillsdale, NJ (1985) 101-111
A Training Strategy of Class-Modular Neural Network Classifier for Handwritten Chinese Character Recognition*
Xue Gao
Department of Electronics and Communication Engineering, South China University of Technology, Guangzhou 510641, P.R. China
[email protected]
Abstract. The convergence time of MLP-based neural network classifiers is one of the important factors affecting their successful application to large-set handwritten Chinese character recognition. In this paper, we propose an improved training strategy to improve the convergence of the neural network: instead of using all samples, the neural network is optimized with a simplified sample set obtained through a sample selection process. Experimental results on 470 and 1034 categories of handwritten Chinese characters indicate that the training strategy can significantly reduce the convergence time without degrading recognition performance, showing its effectiveness.
1 Introduction
The recognition of handwritten Chinese characters (HCC) has been a very active research area. Although much research has been done, developing a real system remains a challenging problem, especially for large-set offline HCC recognition [1, 2]. One promising way to improve recognition performance is to combine different individual classifiers with diversity and complementariness [3, 4]. Neural network classifiers, including the multilayer perceptron (MLP), have been recognized as powerful tools for implementing the base classifier due to their capacity to learn the classification function. This paper addresses MLP-based HCC recognition. Although it has been proven in theory that an MLP can approach the Bayes-optimal classification function under the least mean square error criterion, and MLPs have been successfully applied in small-set pattern classification problems such as handwritten numeral recognition [5], there are not yet experimental results showing their superiority over non-neural-network approaches for large-set HCC recognition when the conventional single large network structure is adopted [3]. The main reason is that the time required for convergence is prohibitive for the MLP to reach the optimal connection weights, and in many cases it does not converge at all. To speed up convergence, a class-modular neural network structure was proposed in [6], where the complex neural network is decomposed into multiple sub-neural networks, each dealing with
* This paper was supported by the Natural Science Foundation of Guangdong (No. 04300098).
one specific character class. Experimental results on 352 categories of handwritten Korean characters showed its superiority, in both convergence and recognition performance, over the conventional neural network structure. However, the approach still faces the problem of too many, unbalanced training data for each sub-neural network in large-set HCC recognition. In this paper, we propose an improved training strategy for the class-modular neural network for large-set HCC recognition. The basic idea is that, instead of using all samples for training each subnet classifier, we find a simplified sample set for each class-specific subnet through a sample selection process based on pre-classification results. This strategy greatly reduces the number of training data and the time for convergence while keeping competitive recognition performance. In the following section, we first introduce the class-modular neural network structure briefly and describe the training strategy in detail. Then, experimental results are given to show the effectiveness of the proposed training strategy. Finally, some concluding remarks are made.
2 Class-Modular Neural Network Structure
In the conventional neural network framework, a K-class classification problem is implemented by one large network with K output nodes, each corresponding to one class. In the class-modular neural network structure, by contrast, the K-class classification task is decomposed into K 2-class subtasks; each subtask corresponds to the discrimination of one specific class (the positive class) from the rest (the negative class) and is solved by a sub-neural network classifier. The entire network therefore consists of K independent subnets, each of which needs only two output nodes, one for the positive class and one for the negative class. In the training phase, each of the K subnet classifiers is optimized independently of the other subnets, and the error-backpropagation algorithm is applied in the same way as for a conventional MLP. The only difference is that all training samples must be relabeled into two groups: the positive sample set Ω1 (samples from the positive class) and the negative sample set Ω2 (samples from the negative class). In this way, the classification boundary of each subtask is much simpler than that of the original classification problem, so each subnet is simpler and converges faster than the conventional network structure [5]. In the recognition phase, the K independent subnets work cooperatively and their outputs are combined to give the decision.
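A minimal sketch of this decomposition is given below, using scikit-learn MLPs as stand-ins for the subnets. The relabeling and the max-score combination follow the description above, while the training settings are our assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_class_modular(X, y, num_classes):
    """Train K independent 2-class subnets; subnet t separates class t
    (positive) from all other classes (negative)."""
    subnets = []
    for t in range(num_classes):
        target = (y == t).astype(int)   # relabel into positive/negative
        net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000)
        subnets.append(net.fit(X, target))
    return subnets

def classify(subnets, x):
    """Combine the K subnet outputs: pick the class whose subnet gives
    the largest positive-class score."""
    scores = [net.predict_proba(x.reshape(1, -1))[0, 1] for net in subnets]
    return int(np.argmax(scores))
```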
3 Training Strategy
In the class-modular neural network structure for large-set HCC recognition, when each subnet is trained on samples from all character classes, the training set is still too large for practical application, especially the negative sample set Ω2, which can be K−1 times as large as the positive sample set Ω1. However, considering the distribution of character classes in feature space for each 2-class classification task, only the samples close to the classification boundary actually contribute to building the boundary. The training sample set can therefore be reduced effectively by deleting the samples far from the classification boundary.
In this paper, we propose to use a minimum Euclidean distance classifier as a pre-classifier to select the training samples for each sub-neural network. Usually, if the number n of top candidates is chosen properly, the pre-classifier can guarantee that the HCC recognition rate within the top n candidates is very close to 100%. Here the top n candidates are the first n character classes with the minimum distance from the input sample. Samples can thus be deleted from the training sets of the subnets corresponding to classes outside the top n candidates while keeping the recognition rate. Table 1 shows the pre-classification performance for different candidate numbers n on 1034 categories of handwritten Chinese characters.

Table 1. Recognition rates of the top n candidates by the minimum distance classifier (1034 categories of handwritten Chinese characters)
Candidate number      5      10     15     25     50
Recognition rate (%)  97.87  98.62  98.94  99.31  99.58
From Table 1, we can see that the recognition rate (the ratio of correctly recognized samples in the test sample set) within the top 15 candidates is already close to 99% and can satisfy the pre-classification requirement in many practical applications. For a K-class character classification task, we use the following criteria to select the training samples for each sub-neural network. Given a pattern sample x_i, suppose C_{i1}, ..., C_{in} are the top n candidate classes given by the minimum Euclidean distance classifier, and Δ1, Δ2, ..., ΔK are the K subnet classifiers corresponding to the K character classes C1, ..., CK. Then, for the subnet Δt:
1. All the pattern samples x whose corresponding character class is Ct constitute the positive sample set of subnet Δt.
2. If x_i satisfies t ∈ {i1, ..., in} and the true class of x_i is not Ct, then x_i belongs to the negative sample set of subnet Δt.
In the recognition phase, an unknown input pattern x is no longer applied to all the sub-neural networks; instead, only the subnets corresponding to the top n candidate classes given by the pre-classifier are used to make the decision. Obviously, through the proposed sample selection process, the training sample set for each sub-neural network is reduced significantly, from K·h to (n+1)·h samples (n << K, as shown in Table 1), where h is the number of training samples for each character category. This effectively speeds up the convergence of the neural network and partially solves the problem of unbalanced training samples, as the experimental results in the following section show. However, to guarantee that the selected negative sample set of each sub-neural network well represents the distribution of all its possible negative samples in feature space, it is better to keep some sample redundancy in the sample selection process.
In this paper, we use an increment Δn of the candidate number to control the sample redundancy in the training set. Let Ω2^{jn} denote the negative sample set of subnet Δj selected using the top n candidate classes; then the incremental negative sample set will be Ω2^{j(n+Δn)}, while the positive sample set stays the same. Figure 1 illustrates the sample selection process with sample redundancy.
Fig. 1. Graphical depiction of x_i belonging to the training sets of subnets Δ_{in} and Δ_{i(n+Δn)} (+: prototype vectors in pre-classification; *: pattern sample x_i)
4 Experimental Results
To evaluate the performance of the proposed training strategy for the class-modular neural network, experiments were carried out on a Pentium 4 2.8 GHz PC. The experimental data are taken from the China 863 National Handwriting Database HCL2000, in which the character patterns are scanned at 300 DPI and normalized into 64×64 binary images. Before the training and recognition process, each handwritten Chinese character is represented by a 256-dimensional directional meshing feature using the feature extraction approach proposed in [7]. A fixed 3-layer neural network structure with 256 input nodes and 8 hidden nodes is used in all the experiments, and the anti-symmetric sigmoid of equation (1) is adopted as the neuron transfer function:

f(v) = 2a / (1 + exp(−bv)) − a    (1)
where a = 1.716 and b = 2/3. In the error-backpropagation algorithm, we compute the MSE (mean square error) of the positive training samples and of the negative training samples at the network output layer separately, and take the larger of the two to determine the termination of training according to the following criteria: if the MSE is less than ε or the number of training epochs exceeds T, the training process terminates. The values of the training parameters used in our experiments are shown in Table 2. In the first experiment, we compare the performance of neural network classifiers optimized with the training samples from all character classes (denoted TS1) and with the selected training samples (denoted TS2).
470 categories of characters (Sections 16-20 in GB2312-80), with 120 handwritten samples per character, are used as experimental data, of which 100 samples are used for training and the remaining 20 for testing. The experimental results are shown in Table 3. From Table 3, it can be seen that there is no significant difference in recognition rate between the two training strategies, while the average convergence time and the number of negative samples are greatly reduced by the proposed strategy. Considering large-set HCC recognition with more than 6000 character categories, this is quite important for making neural-network-based applications more practical.

Table 2. The training parameters of the sub-neural network classifiers
ε     T     Learning rate  Momentum coefficient
0.01  5000  0.1            0.9
Table 3. The performance of the neural network classifier using different training strategies (n = 20, Δn = 10)

Training  Average positive  Average negative  Average           Recognition
strategy  sample number     sample number     convergence time  rate
TS1       100               46900             1473 s            93.5%
TS2       100               2900              82.3 s            93.3%
Table 4. The recognition performance of different increments Δn of the candidate number (n = 30)

Δn   Average negative sample number  Average convergence time (s)  Recognition rate (%)
-10  1900.7                          122.7                         85.73
0    2900.0                          176.2                         86.45
10   3900.4                          270.0                         86.74
15   4400.0                          291.8                         86.70
From Table 4, it can be seen that keeping some redundancy in the sample selection process improves the recognition rate, and that there is no further improvement once the redundancy reaches a certain degree; for example, the best Δn for the 1034-category HCC recognition task is 10. On the other hand, insufficient sample redundancy decreases the recognition performance significantly.
5 Conclusions
The convergence time of the conventional MLP-based neural network classifier has become one of the important factors affecting its successful application to large-set HCC recognition, although its recognition performance for handwritten numerals is quite promising. To improve the convergence of the neural network classifier, a
class-modular neural network structure has been proposed and proven effective on a 352-category Korean character recognition problem [6]. However, each subnet classifier still faces the problem of too many training data in large-set HCC recognition. In this paper, we proposed an improved training strategy for each subnet based on a sample selection process driven by pre-classification results. Experimental results on 470 and 1034 categories of handwritten Chinese characters indicate that the training strategy reduces the convergence time significantly without decreasing recognition performance, showing its effectiveness. In future research, we will concentrate on selecting class-specific character features for each subnet classifier to further improve the training process.
References
1. Hildebrand, T.H., Liu, W.: Optical Recognition of Handwritten Chinese Characters: Advances since 1980. Pattern Recognition, 2 (1993) 205-225
2. Kimura, Y., Wakahara, T.: Toward Robust Handwritten Kanji Character Recognition. Pattern Recognition Letters, 10 (1999) 979-990
3. Dai, R.W., Hao, H.W., Xiao, X.H.: System and Integration of Chinese Character Recognition. Zhejiang Science & Technology Publishing House, Hangzhou (1998)
4. Liu, C.L., Fujisawa, H.: Classification and Learning for Character Recognition: Comparison of Methods and Remaining Problems. Int. Workshop on Neural Networks and Learning in Document Analysis and Recognition, Seoul (2005)
5. Cho, S.B.: Neural-Network Classifiers for Recognizing Totally Unconstrained Handwritten Numerals. IEEE Trans. Neural Networks, 1 (1997) 43-53
6. Oh, I.S., Suen, C.Y.: A Class-Modular Feedforward Neural Network for Handwriting Recognition. Pattern Recognition, 1 (2002) 229-244
7. Gao, X., Jin, L.W., Yin, J.X.: A Stroke-Density Based Elastic Meshing Features Extraction Method. Pattern Recognition & Artificial Intelligence, 3 (2002) 351-354
Active Set Iteration Method for New L2 Soft Margin Support Vector Machine

Liang Tao1 and Juan-juan Gu2

1 School of Computer Science and Technology, Anhui University, Hefei 230039, P. R. China
[email protected]
2 Dept. of Electronics and Electrical Engineering, Hefei University, Hefei 230022, P. R. China
[email protected]
Abstract. This paper introduces a new L2 soft margin support vector machine (new L2 SVM). The dual problem for the constrained optimization of the SVM is a convex quadratic problem with simple bound constraints. The active set iteration method for this optimization problem is applied as a fast learning algorithm for the SVM, and the selection of the initial active/inactive sets is discussed. For incremental learning and large-scale learning problems, a fast incremental learning algorithm for the SVM is presented. Computational experiments show the efficiency of the proposed algorithms.
1 Introduction

The support vector machine (SVM), developed by V. Vapnik and his team at AT&T Bell Labs, can be seen as a new way to train polynomial, neural network, or radial basis function classifiers, based on the idea of structural risk minimization rather than empirical risk minimization. Some classical problems in neural networks, such as multiple local minima, the curse of dimensionality and overfitting, seldom occur in support vector machines. In the past few years, support vector machines (SVMs) have generated great interest in the machine learning community due to their excellent generalization performance in a wide variety of learning problems. However, training support vector machines is still a bottleneck, especially for large-scale learning problems. Therefore, it is important to develop fast training algorithms for SVMs to facilitate their application to various engineering problems. In this paper, a new L2 soft margin support vector machine (new L2 SVM) is introduced. What is unusual for this SVM is that the dual problem for its constrained optimization is a convex quadratic problem with simple bound constraints, which enables the active set iteration method [1] to be applied as a fast learning algorithm for the SVM. Moreover, for incremental learning and large-scale learning problems, a fast incremental learning algorithm for the SVM is presented. Computational experiments are carried out to show the efficiency of the proposed algorithms.
2 New L2 SVM and Its Fast Learning Algorithms

2.1 New L2 SVM

Given training samples $\{x_i, y_i\}_{i \in N_{all}}$, $N_{all} := \{1, 2, \cdots, N\}$, $x_i \in R^d$, $x_i = [x_i^1\; x_i^2 \cdots x_i^d]^T$, $y_i \in \{+1, -1\}$, where $y_i$ is the class label. In a soft margin support vector machine, we consider the linear decision function

$$D(x) = \sum_{j=1}^{M} w_j \varphi_j(x) + b = w^T \varphi(x) + b \qquad (1)$$

in the feature space, where $w = [w_1\; w_2 \cdots w_M]^T$ is the weight vector, $\varphi(x) = [\varphi_1(x)\; \varphi_2(x) \cdots \varphi_M(x)]^T$ is the mapping function that maps the input $x$ into the feature space, and $b$ is a scalar. The L2 soft margin support vector machine (L2 SVM) presented in [2] requires the solution of the following optimization problem:
$$\min_{w}\ \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{N} \xi_i^2 \quad \text{s.t.} \quad y_i \left( w^T \varphi(x_i) + b \right) - 1 + \xi_i \ge 0, \quad \xi_i \ge 0, \ i = 1, 2, \cdots, N \qquad (2)$$
where $\xi = [\xi_1\; \xi_2 \cdots \xi_N]^T$, whose elements are the positive slack variables, $N$ is the number of training samples, and $C$ is the margin parameter. A new L2 soft margin support vector machine (new L2 SVM) was presented in [3] and [4] via a small change of the cost function in (2). Denote $W = [w_1\; w_2 \cdots w_M\; b]^T$ and $\Phi(x) = [\varphi_1(x)\; \varphi_2(x) \cdots \varphi_M(x)\; 1]^T$. The linear decision function (1) can be rewritten as:

$$D(x) = W^T \Phi(x) \qquad (3)$$
We replace $w$ in the cost function in (2) with $W$, and (2) becomes:

$$\min_{w}\ \frac{1}{2} \left( w^T w + b^2 \right) + \frac{C}{2} \sum_{i=1}^{N} \xi_i^2 \quad \text{s.t.} \quad y_i \left( w^T \varphi(x_i) + b \right) - 1 + \xi_i \ge 0, \quad \xi_i \ge 0, \ i = 1, 2, \cdots, N \qquad (4)$$
The SVM described by (4) is called the new L2 soft margin support vector machine (new L2 SVM). The dual problem for the new L2 SVM can be written as:

$$\min_{\alpha}\ \frac{1}{2} \alpha^T Q \alpha - e^T \alpha \quad \text{s.t.} \quad \alpha \ge 0 \qquad (5)$$
where $e$ is an $N \times 1$ vector with all elements equal to 1, $\alpha = [\alpha_1\; \alpha_2 \cdots \alpha_N]^T$ is the Lagrange multiplier vector, and $Q = [Q_{ij}]_{i,j \in N_{all}}$,

$$Q_{ij} = y_i y_j \left( K(x_i, x_j) + 1 \right) + \delta_{ij} / C \qquad (6)$$
where $\delta_{ij} = 1$ if and only if $i = j$, otherwise $\delta_{ij} = 0$, and $K(x_i, x_j)$ is the kernel function,

$$K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j) = \sum_{m=0}^{M} \varphi_m(x_i)\, \varphi_m(x_j), \quad \text{for } i, j = 1, 2, \cdots, N \qquad (7)$$
Note that Q is a positive definite matrix, and the dual problem (5) for the new L2 SVM is a convex quadratic minimization problem with simple bound constraints. Research and experiments in [3] and [4] have shown that the generalization performance of the new L2 SVM is very close to that of traditional SVMs although they differ in many respects; therefore, in the remainder of this paper, we will focus on the fast learning algorithm for the new L2 SVM.

2.2 Fast Learning Algorithm for New L2 SVM
For convex quadratic minimization problems with simple bound constraints like (5), reference [1] presented a simple and fast algorithm based on the active set iteration method. We will apply it to the dual problem (5) of the new L2 SVM. The following notations will be used throughout. For a subset $A \subseteq N_{all} := \{1, 2, \cdots, N\}$, we write $\alpha_A$ for the components of $\alpha$ indexed by $A$, i.e. $\alpha_A := (\alpha_i)_{i \in A}$. The complement of $A$ will be denoted by $\bar{A}$. If $Q$ is a matrix and $A$ and $B$ are subsets of $N_{all}$, then $Q_{A,B}$ is the submatrix of $Q$ with rows indexed by $A$ and columns indexed by $B$. The Karush-Kuhn-Tucker (KKT) system for (5) is given by

$$Q\alpha - e - z = 0 \qquad (8)$$
$$z \bullet \alpha = 0 \qquad (9)$$
$$z \ge 0 \qquad (10)$$
$$\alpha \ge 0 \qquad (11)$$
where $z = [z_1\; z_2 \cdots z_N]^T$ is the Lagrange multiplier vector, and we write $z \bullet \alpha$ to denote the vector of element-wise products, i.e. $z \bullet \alpha := (z_i \alpha_i)_{i \in N_{all}}$. The crucial step in solving (5) is to identify those inequalities which are active, i.e. the active set $A \subseteq N_{all}$ where the solution to (5) satisfies $\alpha_A = 0$. Then, with the inactive set $I := \bar{A} = N_{all} \setminus A$, we must have $z_I = 0$, and the elements of $I$ are just the indexes of the support vectors in $\{x_i\}_{i \in N_{all}}$. To compute the remaining elements $\alpha_I$ and $z_A$ of $\alpha$ and $z$, reference [1] proposed an active set iteration algorithm using (8), partitioning the equations and variables according to the active set $A^k$ and the inactive set $I^k$ at the $k$-th iteration:

$$\begin{bmatrix} Q_{A^k, A^k} & Q_{A^k, I^k} \\ Q_{I^k, A^k} & Q_{I^k, I^k} \end{bmatrix} \begin{bmatrix} 0 \\ \alpha_{I^k} \end{bmatrix} - \begin{bmatrix} e_{A^k} \\ e_{I^k} \end{bmatrix} - \begin{bmatrix} z_{A^k} \\ 0 \end{bmatrix} = 0 \qquad (12)$$
The second set of equations can be solved for $\alpha_{I^k}$, because $Q_{I^k, I^k}$ is by assumption positive definite:

$$\alpha_{I^k} = Q_{I^k, I^k}^{-1} e_{I^k} \qquad (13)$$

and the first set of equations can be solved for $z_{A^k}$:

$$z_{A^k} = Q_{A^k, I^k}\, \alpha_{I^k} - e_{A^k} \qquad (14)$$

If $\alpha_{I^k} \ge 0$ and $z_{A^k} \ge 0$, then stop the iteration; otherwise, let

$$A^{k+1} = \{\, j \mid \alpha_j < 0 \ \text{or} \ z_j > 0 \,\} \qquad (15)$$
and the $(k+1)$-th iteration is continued. Reference [1] provided sufficient conditions for the iterations to converge in a finite number of steps to an optimal solution. The experiments in [1] indicated that this algorithm often requires only a few iterations to find the optimal solution.
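To make the iteration (12)-(15) concrete, the following sketch shows one way it could be coded. This is our illustration rather than code from [1]: the NumPy formulation and the function name active_set_qp are assumptions, and the initial inactive set I0 comes from the selection rule of Section 2.3 below.

```python
import numpy as np

def active_set_qp(Q, e, I0, max_iter=50):
    """Active set iteration (13)-(15) for min 0.5*a^T Q a - e^T a, a >= 0,
    with Q positive definite. Assumes the inactive set stays non-empty
    (which the selection rule of Section 2.3 is designed to ensure)."""
    N = len(e)
    I = sorted(I0)                            # inactive set: alpha_i free
    for _ in range(max_iter):
        A = sorted(set(range(N)) - set(I))    # active set: alpha_i fixed at 0
        a, z = np.zeros(N), np.zeros(N)
        # Eq. (13): alpha_I = Q_{I,I}^{-1} e_I (solve, do not form the inverse)
        a[I] = np.linalg.solve(Q[np.ix_(I, I)], e[I])
        # Eq. (14): z_A = Q_{A,I} alpha_I - e_A
        if A:
            z[A] = Q[np.ix_(A, I)] @ a[I] - e[A]
        # Stop when alpha_I >= 0 and z_A >= 0, i.e. the KKT system holds
        if (a[I] >= 0).all() and (not A or (z[A] >= 0).all()):
            return a, z
        # Eq. (15): the new active set collects indices with alpha_j < 0 or z_j > 0
        A_new = [j for j in range(N) if a[j] < 0 or z[j] > 0]
        I = sorted(set(range(N)) - set(A_new))
    return a, z
```

Solving the linear system instead of explicitly inverting $Q_{I^k, I^k}$ in (13) is numerically preferable and does not change the iteration.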
2.3 Selection of the Initial Active/Inactive Sets A0 and I0

From (13), we note that the inverse of $Q_{I^k, I^k}$ has to be computed at each iteration, which requires heavy computation when the size of $I^k$ becomes very large. In fact, the size of $I^k$ is usually small when the iteration algorithm applied to the SVM converges, because at the final iteration the elements of $I^k$ are just the indexes of the support vectors in $\{x_i\}_{i \in N_{all}}$, and the number of support vectors is usually much less than $N$. Therefore, a specific selection method for the initial active/inactive sets A0 and I0 is given as follows in order to control the size of $I^k$ and, especially, to avoid the occurrence of the worst cases $I^k = N_{all}$ or $I^k = \emptyset$ in the iteration process. A sketch of this rule follows the list below.

• Suppose that there are two classes of samples, Ca and Cb, with a total of N samples, indexed from 1 to N, and that the number of samples in Ca is less than or equal to that in Cb.
• If the sample number of Ca is less than 0.25N, combine all the samples of Ca and some samples of Cb into a total of 0.5N samples, select the indexes of these 0.5N samples as the initial inactive set I0, and the complement of I0 in Nall as the initial active set A0.
• If the sample number of Ca is more than or equal to 0.25N, combine 0.25N samples of Ca and 0.25N samples of Cb into a total of 0.5N samples, select the indexes of these 0.5N samples as the initial inactive set I0, and the complement of I0 in Nall as the initial active set A0.
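A minimal sketch of this selection rule, under the assumption of 0-based indices and a hypothetical array labels of ±1 class labels:

```python
import numpy as np

def initial_sets(labels):
    """Pick the initial inactive set I0 with about 0.5*N indices,
    balanced over the two classes as described above."""
    N = len(labels)
    Ca = np.where(labels == +1)[0]
    Cb = np.where(labels == -1)[0]
    if len(Ca) > len(Cb):                  # ensure |Ca| <= |Cb|
        Ca, Cb = Cb, Ca
    half = N // 2
    if len(Ca) < N // 4:
        # all of Ca plus enough samples of Cb to reach 0.5*N in total
        I0 = np.concatenate([Ca, Cb[:half - len(Ca)]])
    else:
        # 0.25*N samples from each class
        I0 = np.concatenate([Ca[:N // 4], Cb[:half - N // 4]])
    A0 = np.setdiff1d(np.arange(N), I0)
    return I0.tolist(), A0.tolist()
```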
2.4 Fast Incremental Learning Algorithm for New L2 SVM

In this section, a simple and fast incremental learning algorithm is presented based on the fast learning algorithm of the new L2 SVM. The incremental learning problem
can be described as follows: given an existing training sample set U0 and a set of support vectors U0sv obtained by learning on U0, together with an incremental training sample set H with U0 ∩ H = ∅, the problem is to quickly find the new set of support vectors Usv corresponding to the new training sample set U = U0 ∪ H, based on the given support vector set U0sv. Because the fast learning algorithm of the new L2 SVM is based on simple linear algebra, a simple and fast incremental learning algorithm can be obtained as follows:

• In line with the selection method for the initial active and inactive sets, combine half of the samples of H and all the samples of U0sv into a new sample set T.
• Select the indexes of T as the initial inactive set I0, and the complement of I0 as the initial active set A0. Apply the fast learning algorithm of the new L2 SVM to the training sample set U; as a result, the new set of support vectors Usv is obtained.

In the incremental learning process, because the selected initial inactive set I0 contains all the indexes of the given support vector set U0sv, I0 is much closer to the indexes of Usv, which leads to fast convergence of the learning algorithm of the new L2 SVM. For large-scale learning problems, we can separate the training sample set U into p subsets, i.e., U = U0 ∪ H1 ∪ H2 ∪ ⋯ ∪ Hp−1. First, we apply the learning algorithm of the new L2 SVM to the initial training sample set U0, and then apply the incremental learning algorithm in series to the training sample sets Ui = U0 ∪ H1 ∪ H2 ∪ ⋯ ∪ Hi, until i = p−1. We call this the batch-incremental learning algorithm; a sketch is given below.
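The following sketch outlines this batch-incremental loop. It is illustrative only: it assumes the full matrix Q of (6) fits in memory with samples ordered as U0, H1, H2, ..., reuses the hypothetical active_set_qp solver sketched in Section 2.2, and the support-vector threshold sv_eps is our own choice.

```python
import numpy as np

def batch_incremental(Q, u0_size, chunk_sizes, sv_eps=1e-8):
    """Learn on U0, then fold in H1, H2, ... as described above."""
    n = u0_size
    I0 = list(range(n // 2))              # a simple initial inactive set for U0
    a, _ = active_set_qp(Q[:n, :n], np.ones(n), I0)
    sv = list(np.where(a > sv_eps)[0])    # indices of current support vectors
    for h in chunk_sizes:
        # new inactive set: all previous SVs plus half of the incoming chunk
        I0 = sv + list(range(n, n + h // 2))
        n += h
        a, _ = active_set_qp(Q[:n, :n], np.ones(n), I0)
        sv = list(np.where(a > sv_eps)[0])
    return sv
```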
3 Computational Experiments

We use a face database with 2240 face images (224 people, 10 face images per person) to evaluate the performance of the proposed algorithms. The training sample set U consists of 2240 facial feature vectors extracted from the 2240 face images. All the algorithms in the experiments are implemented in Matlab 6.1 and run on a Pentium III/450 MHz personal computer. The learning algorithms are designed to distinguish the 10 images of one specified person from the others. Suppose
$C = 10^5$ in $Q$, and the Gaussian function is used as the kernel function ($\sigma^2 = 0.1$):

$$K(x_i, x_j) = \exp\left( -\| x_i - x_j \|_2^2 / \sigma^2 \right) \qquad (16)$$
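For illustration, the matrix Q of (6) with this kernel can be assembled as follows (a sketch with our own function name; X is an N×d matrix of feature vectors and y the vector of ±1 labels):

```python
import numpy as np

def build_Q(X, y, C=1e5, sigma2=0.1):
    """Q_ij = y_i y_j (K(x_i, x_j) + 1) + delta_ij / C, with the Gaussian
    kernel K(x_i, x_j) = exp(-||x_i - x_j||_2^2 / sigma^2) of Eq. (16)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq / sigma2)
    return np.outer(y, y) * (K + 1.0) + np.eye(len(y)) / C
```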
In the first experiment, the learning algorithm of the new L2 SVM is applied using a commonly used optimization method, the "fmincon" function in the optimization toolbox of Matlab 6.1. Unfortunately, it was not able to reach the final result after one day of running. In the second experiment, the learning algorithm of the new L2 SVM is applied using the active set iteration method [1]. We test the algorithm under 3 different selections of the initial inactive sets: 1) I0 = ∅; 2) I0 = Nall; 3) the specific selection of I0 presented in Section 2.3. Table 1 shows the results for iteration time and iteration number. We can see that the specific selection achieved the best result. In the third experiment, U is separated into 3 subsets: U0 with 720 feature vectors, H1 with 720 feature vectors, and H2 with 800 feature vectors. The learning
algorithm of the new L2 SVM based on the active set iteration method is applied to the initial training subset U0, and then the incremental learning algorithm is applied in series to U1 = U0 ∪ H1 and U = U2 = U1 ∪ H2 respectively. Table 2 shows the results for iteration time and iteration number. The total iteration time taken by the batch-incremental learning algorithm is 0.61s + 1.48s + 7.85s = 9.94s, which is much less than the 64.92s taken by the learning algorithm in Table 1. Therefore, for large-scale learning problems, using the incremental learning algorithm can save a lot of learning time. In the fourth experiment, for comparison with the proposed algorithms, the well-known SVM learning algorithm SVMLight [5] is applied to the classification problem using the Matlab code provided by Schwaighofer [6]. Its running time is 17.50s, which is longer than the time (9.94s) taken by the proposed batch-incremental learning algorithm. Note that the KKT conditions are strictly satisfied in the proposed SVM algorithms, while they are met only within a tolerance in the SVMLight algorithm. In other words, the solution obtained by the SVMLight algorithm is only approximately optimal.

Table 1. Iteration time and number under 3 different selections of initial inactive sets (size of I k at each iteration)

Iteration No. k   I0 = ∅    I0 = Nall   Specific selection
0                 0         2240        1120
1                 2240      1201        1719
2                 1201      702         974
3                 702       426         582
4                 426       277         368
5                 277       193         238
6                 193       154         177
7                 154       139         146
8                 139       135         139
9                 135       134         134
10                134       133         133
11                133       133         133
12                133       --          --
Iteration time    130.33s   120.35s     64.92s
Table 2. Iteration time and number for the incremental learning algorithm

Iteration No. k   Learning, U0    Incr. learning, U1   Incr. learning, U
                  (size of I k)   (size of I k)        (size of I k)
0                 360             402                  504
1                 371             568                  801
2                 217             343                  463
3                 133             226                  301
4                 83              168                  219
5                 54              133                  166
6                 45              113                  142
7                 42              108                  133
8                 42              104                  133
9                 --              104                  133
Iteration time    0.61s           1.48s                7.85s
4 Conclusions

We have introduced a new L2 soft margin support vector machine in this paper. Because the dual problem for the constrained optimization of the SVM is a convex quadratic problem with simple bound constraints, the active set iteration method can
be applied as a fast learning algorithm for the SVM. To reduce the computation of the inverse of $Q_{I^k, I^k}$, we presented a selection method for the initial active/inactive sets in order to control the size of I k and, especially, to avoid the occurrence of the worst cases I k = Nall or I k = ∅ in the learning iteration process. For incremental learning and large-scale learning problems, we proposed a simple and fast batch-incremental learning algorithm. Computational experiments showed the efficiency of the proposed algorithms.
Acknowledgments

The authors would like to acknowledge the support of the National Natural Science Foundation of China under Grant No. 60572128, the Anhui Provincial Talent Development Foundation under Grant No. 2005Z029, and the Anhui University Program for Creative Teams.
References

1. Kunisch, K., Rendl, F.: An Infeasible Active Set Method for Quadratic Problems with Simple Bounds. SIAM Journal on Optimization, 14 (2003) 35-52
2. Abe, S.: Analysis of Support Vector Machines. Proceedings of the 2002 12th IEEE Workshop on Neural Networks for Signal Processing (2002) 89-98
3. Tao, L.: Researches on Face Recognition Algorithms for Human Identification. Ph.D. Dissertation, University of Science and Technology of China (2003)
4. Wu, G.M.: Machine Learning Based on Kernels. Ph.D. Dissertation, University of Science and Technology of China (2002)
5. Joachims, T.: SVMLight Support Vector Machine, http://svmlight.joachims.org/
6. Schwaighofer, A.: Software that Might be of Interest, http://www.cis.tugraz.at/igi/aschwaig/software.html
Adaptive Eigenbackground for Dynamic Background Modeling

Lei Wang, Lu Wang, Qing Zhuo, Huan Xiao, and Wenyuan Wang

Department of Automation, Tsinghua University, Beijing, 100084, P.R. China
{wlei04, l-wang02}@mails.tsinghua.edu.cn
Abstract. Background modeling is an essential and important task for many computer vision applications. In this paper, we propose an effective and adaptive background modeling approach based on the eigenbackground [1]. Instead of learning the eigenbackground from the training set off-line and fixing it during the detection procedure, the proposed approach updates the eigenspace on-line and employs a linear prediction model to detect moving objects. Experimental results demonstrate that the proposed approach is able to model the background and detect moving objects under various types of background scenarios and with close to real-time performance.
1 Introduction

Background modeling is one of the most fundamental tasks for many computer vision applications such as real-time tracking and video surveillance. The obvious way to segment foreground and background is to select a reference image representing all the static structure within the scene, and to subtract the observed image from this reference. The resulting difference image is thresholded to extract the moving objects. This task looks fairly simple, but in real-world applications this approach rarely works. Usually the background is never static and varies over time due to several factors such as lighting changes, moving background objects and non-stationary scenes. To overcome these problems, it is crucial to build a stochastic representation of the background and continuously adapt it to the current environmental conditions. A popular approach to background modeling is to model the probability distribution of the pixel intensity values using a mixture of Gaussians. In [2] a mixture of three Gaussian distributions corresponding to road, vehicle and shadow is used to model pixel intensity values for a traffic surveillance application. In [3], pixel intensity values are modeled by a mixture of K Gaussians (typically K is three to five) and the model parameters are updated using an online Expectation Maximization (EM) algorithm. [4] uses a nonparametric background model, estimating the probability of each observed pixel intensity value from many recent samples over time using kernel density estimation. Although these pixel-based techniques seem to be reasonable choices for background modeling, they ignore the correlation of neighboring pixels and they are
not computationally feasible for real-time applications. In [1], the eigenbackground is presented to model the background based on subspace analysis. This method exploits the correlation of pixels and offers a lower computational load. However, it fails to deal with dynamic background because the eigenspace of the eigenbackground is learned from the training set off-line and fixed during the detection procedure. In order to model dynamic background, we extend the work of [1] by proposing an efficient and adaptive background approach. The proposed approach updates the eigenspace online with the sequential Karhunen-Loeve algorithm and employs a linear prediction model to detect moving objects. The remainder of the paper is organized as follows. In Section 2, we briefly present the concept of the eigenbackground, while in Section 3 we describe the proposed approach. Experimental results are shown in Section 4, followed by Section 5, which concludes the paper.
2 Eigenbackground
Eigenbackground was first described by Oliver and Pentland [1] as an approach for detecting moving objects. It builds an eigenspace to model the background. This eigenspace model describes the range of appearances (e.g., lighting variations over the day, weather variations, etc.) that have been observed. Let $\{I_i\}_{i=1 \cdots N}$ be a given set of column-vector representations of the previous N observations. We can compute the mean vector $\mu_b$ and subtract it from each input image to get the zero-mean vectors $\{X_i\}_{i=1,2,\cdots,N}$ where $X_i = I_i - \mu_b$. Then we can obtain the covariance matrix $C_b = E\{X_i X_i^T\} \approx \frac{1}{N} X X^T$, where $X = [X_1, X_2, \cdots, X_N]$. This covariance matrix can be diagonalized as $L_b = \Phi_b^T C_b \Phi_b$, where $\Phi_b$ is the eigenvector matrix of $C_b$ and $L_b$ is the diagonal matrix of eigenvalues. $\Phi_b$ can be calculated through the singular value decomposition (SVD) of $X$: $X = U S V^T$. The eigenvectors of $C_b$ are the columns of the matrix $U$, while the elements of the diagonal matrix $S$ are the square roots of the corresponding eigenvalues. In order to reduce the dimensionality of the space, only $M$ eigenvectors are stored, corresponding to the $M$ largest eigenvalues, to give a matrix $\Phi_M$. Once a new image $I$ is available, it is first projected onto the $M$-eigenvector subspace as $I' = \Phi_M^T (I - \mu_b)$ and then reconstructed as $I'' = \Phi_M I' + \mu_b$. Because the moving objects do not appear in the same location in the N sample images, they do not contribute significantly to this background model. Consequently, the eigenspace is a good model for the static parts of the scene, but not for the moving objects. Finally, by computing and thresholding the Euclidean distance between the input image $I$ and the reconstructed image $I''$, we can detect the foreground points at the locations $|I - I''| > T$, where $T$ is a given threshold. Although the eigenbackground model exploits the correlation of pixels and offers a lower computational load compared to pixel-based methods, it fails to deal with dynamic background because the eigenspace is learned from the training set off-line and is not updated during the detection procedure.
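The following NumPy sketch (our own illustration of the pipeline just described, not code from [1]) summarizes training and detection with the eigenbackground:

```python
import numpy as np

def train_eigenbackground(frames, M):
    """frames: d x N matrix with one column vector per training image."""
    mu = frames.mean(axis=1, keepdims=True)   # mean background mu_b
    X = frames - mu                           # zero-mean observations
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :M], mu.ravel()               # Phi_M and mu_b

def detect_foreground(I, Phi_M, mu, T):
    """Project onto the eigenspace, reconstruct, threshold |I - I''| > T."""
    p = Phi_M.T @ (I - mu)                    # I'  = Phi_M^T (I - mu_b)
    I_rec = Phi_M @ p + mu                    # I'' = Phi_M I' + mu_b
    return np.abs(I - I_rec) > T              # boolean foreground mask
```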
3 Adaptive Background Modeling
In order to model dynamic background, we propose an incremental method that updates the eigenspace of the background model using a variant of the sequential Karhunen-Loeve algorithm, which in turn is based on the classic R-SVD method. In addition, a linear prediction model is employed to make the detection more robust.

3.1 Incremental Update of Eigenspace
Let the SVD of the $d \times n$ matrix $X$ be $X = U S V^T$. The R-SVD algorithm provides an efficient way to carry out the SVD of the larger matrix $X^* = [X, E]$, where $E = [I_{n+1}, I_{n+2}, \cdots, I_{n+k}]$ is a $d \times k$ matrix containing $k$ incoming observations, as follows [5]:

1. Use an orthonormalization process (e.g., the Gram-Schmidt algorithm) on $[U, E]$ to obtain an orthonormal matrix $U' = [U, \tilde{E}]$.

2. Let $V' = \begin{bmatrix} V & 0 \\ 0 & I_k \end{bmatrix}$ be an $(n+k) \times (n+k)$ matrix, where $I_k$ is the $k$-dimensional identity matrix. It follows that

$$S' = U'^T X^* V' = \begin{bmatrix} U^T \\ \tilde{E}^T \end{bmatrix} [X, E] \begin{bmatrix} V & 0 \\ 0 & I_k \end{bmatrix} = \begin{bmatrix} U^T X V & U^T E \\ \tilde{E}^T X V & \tilde{E}^T E \end{bmatrix} = \begin{bmatrix} S & U^T E \\ 0 & \tilde{E}^T E \end{bmatrix}. \qquad (1)$$

3. Compute the SVD of $S' = \tilde{U} \tilde{S} \tilde{V}^T$, and the SVD of $X^*$ is

$$X^* = U' (\tilde{U} \tilde{S} \tilde{V}^T)\, V'^T = (U' \tilde{U})\, \tilde{S}\, (\tilde{V}^T V'^T), \qquad (2)$$

where $\tilde{S}$ is a diagonal $(n+k) \times (n+k)$ matrix, $U' \tilde{U}$ is a $d \times (n+k)$ column-orthonormal matrix and $V' \tilde{V}$ is an $(n+k) \times (n+k)$ column-orthonormal matrix. Based on the R-SVD method, the sequential Karhunen-Loeve algorithm is able to perform the SVD computation of the larger matrix $X^*$ efficiently using the smaller matrices $U$, $V$ and the SVD of the smaller matrix $S'$. Note that this algorithm enables us to store the background observations for a number of previous frames and perform a batch update instead of updating the background model every frame.
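A compact sketch of steps 1-3 (our illustration; np.linalg.qr stands in for the Gram-Schmidt orthonormalization of step 1):

```python
import numpy as np

def rsvd_update(U, S, Vt, E):
    """Given X = U diag(S) Vt and k new observation columns E (d x k),
    return the SVD factors of X* = [X, E] via Eqs. (1)-(2)."""
    n, k = Vt.shape[1], E.shape[1]
    # Step 1: orthonormalize E against U to get E~ and U' = [U, E~]
    E_tilde, _ = np.linalg.qr(E - U @ (U.T @ E))
    Up = np.hstack([U, E_tilde])
    # Step 2: form S' = [[S, U^T E], [0, E~^T E]] as in (1)
    Sp = np.block([
        [np.diag(S),        U.T @ E],
        [np.zeros((k, n)),  E_tilde.T @ E],
    ])
    # Step 3: SVD of the small matrix S', then combine as in (2)
    Us, Ss, Vst = np.linalg.svd(Sp, full_matrices=False)
    U_new = Up @ Us                           # U' U~
    Vp = np.block([
        [Vt.T,             np.zeros((n, k))],
        [np.zeros((k, n)), np.eye(k)],
    ])
    return U_new, Ss, Vst @ Vp.T              # (U' U~, S~, V~^T V'^T)
```

Truncating the returned factors to the leading M components keeps the background model at a fixed size between batch updates.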
3.2 Detection
We use linear prediction [6,7] to detect the foreground. This method employs a Wiener filter to estimate the pixel intensity values of each pixel using the latest P frames. Let $I'(t-1), I'(t-2), \cdots, I'(t-P)$ denote the projections of the latest P frames onto the eigenspace, i.e. $I'(t-i) = \Phi_M^T (I(t-i) - \mu_b)$, $i = 1, 2, \cdots, P$. The projection of the current frame onto the eigenspace can be predicted as

$$I'_{pred}(t) = \sum_{i=1}^{P} a_i\, I'(t-i). \qquad (3)$$
The prediction of the current frame can then be computed as

$$I_{pred}(t) = \Phi_M\, I'_{pred}(t) + \mu_b. \qquad (4)$$

The differences between the predicted frame and the current frame are computed and thresholded; the foreground points are detected at the locations $|I(t) - I_{pred}(t)| > T$, where $T$ is a given threshold.
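A sketch of the prediction-based detection of (3)-(4) (our illustration; estimation of the Wiener coefficients a_i from past data is omitted and they are assumed given):

```python
import numpy as np

def predict_and_detect(I, history, a, Phi_M, mu, T):
    """history holds the last P eigenspace projections I'(t-1)...I'(t-P),
    newest first; a holds the P prediction coefficients a_i."""
    p_pred = sum(ai * h for ai, h in zip(a, history))   # Eq. (3)
    I_pred = Phi_M @ p_pred + mu                        # Eq. (4)
    mask = np.abs(I - I_pred) > T                       # foreground points
    # slide the window: push the projection of the current frame
    history.insert(0, Phi_M.T @ (I - mu))
    history.pop()
    return mask
```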
3.3 The Proposed Method
Putting the initialization, detection and eigenspace update modules together, we obtain the adaptive background modeling algorithm as follows:

1. Construct an initial eigenspace: from a set of N training images of the background $\{I_i\}_{i=1 \cdots N}$, the average image $\mu_b$ is computed and the mean-subtracted images X are obtained; then the SVD of X is performed and the best M eigenvectors are stored in an eigenvector matrix $\Phi_M$.
2. Detection: for an incoming image I, the predicted projection $I'_{pred}$ is first computed, then it is reconstructed as $I_{pred}$; foreground points are detected at locations where $|I - I_{pred}| > T$.
3. Update the eigenspace: store the background observations for a number of previous frames and perform a batch update of the eigenspace using the sequential Karhunen-Loeve algorithm.
4. Go to step 2.
Fig. 1. Detection results of the first image sequence: (a) input images; (b) detection results
4 Experiments
In order to confirm the effectiveness of the proposed method, we conduct experiments using three different image sequences. The first is a scene of an ocean front, which involves a waving water surface. The second is a scene of a fountain, which involves long-term changes due to the fountaining water and illumination changes. The third is a scene of a lobby where the lights are switched. In order to reduce complexity, the images are divided into equal-size blocks, and each block is updated and detected individually in our experiments. Experimental results are shown in Fig. 1, Fig. 2 and Fig. 3. We can see from the results that the proposed method is able to give good performance when the appearance of the background changes dramatically. Our current implementation of the proposed method in MATLAB runs at about six frames per second on a Pentium IV 2.4 GHz processor and can certainly be improved to operate in real time.

Fig. 2. Detection results of the second image sequence: (a) input images; (b) detection results

Fig. 3. Detection results of the third image sequence: (a) input images; (b) detection results
5 Conclusion

In this paper, we extend the eigenbackground by proposing an effective and adaptive background modeling approach that 1) updates the eigenspace on-line using the sequential Karhunen-Loeve algorithm, and 2) employs a linear prediction model for object detection. The advantage of the proposed approach is its ability to model dynamic background. The experiments show that the proposed method is able to model the background and detect moving objects under various types of background scenarios and with close to real-time performance.
References

1. Oliver, N.M., Rosario, B., Pentland, A.P.: A Bayesian Computer Vision System for Modeling Human Interactions. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (2000) 831–843
2. Friedman, N., Russell, S.: Image Segmentation in Video Sequences. In: Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (1997) 175–181
3. Stauffer, C., Grimson, E.: Adaptive Background Mixture Models for Real-time Tracking. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Volume 2 (1999) 246–252
4. Mittal, A., Paragios, N.: Motion-based Background Subtraction using Adaptive Kernel Density Estimation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Volume 2 (2004) 302–309
5. Ross, D., Lim, J., Yang, M.H.: Adaptive Probabilistic Visual Tracking with Incremental Subspace Update. In: Proceedings of the Eighth European Conference on Computer Vision. Volume 2 (2004) 470–482
6. Monnet, A., Mittal, A., Paragios, N., Ramesh, V.: Background Modeling and Subtraction of Dynamic Scenes. In: Proceedings of the Ninth IEEE International Conference on Computer Vision (2003) 1305–1312
7. Toyama, K., Krumm, J., Brumitt, B., Meyers, B.: Wallflower: Principles and Practice of Background Maintenance. In: Proceedings of the Seventh IEEE International Conference on Computer Vision. Volume 1 (1999) 255–261
Adaptive Content-Based Image Retrieval Using Optimum Fuzzy Weight Value

Dong-Woo Kim, Young-Jun Song, Un-Dong Chang, and Jae-Hyeong Ahn

Chungbuk National University, 12 Gaeshin-dong, Heungduk-gu, Chungbuk, Korea
[email protected], [email protected], [email protected], [email protected]
Abstract. With the development of the internet and the increase of digital contents, management of image information has become an important field, and the appearance of content-based image retrieval has greatly advanced the systematic management of image information. Existing methods use several features such as color, shape and texture with fixed weight values, which causes large precision differences from image to image. This study uses the fuzzy-integral method to improve this problem, producing an optimum weight value for each image. Applied to 1,000 color images, the proposed method shows better precision than the existing one.
1 Introduction

Today, the development of computer technology and digital contents has made it possible to easily acquire and store various images as well as text. Such image information has become easy to store and increasingly used, but more difficult to manage. At the early stage, image retrieval used text-based methods [1], but for data as varied as images, retrieval methods using text or keywords have inherent limits. Effective image management therefore needed new retrieval methods, and CBIR (content-based image retrieval) [2] appeared, which makes objective and automatic image retrieval possible by automatically extracting and matching features from the image itself. The major problems of this approach are extracting features and finding the similarity between the queried image and the images within the database. This study proposes an adaptive content-based image retrieval method that extracts features from color, texture, and shape, and uses fuzzy-integral image retrieval. Among content-based image retrieval methods, first, the typical technique for retrieving color information uses the color histogram proposed by Swain [3]. Second, techniques using texture mostly take advantage of the frequency transform domain; Wu et al. [4] used the DCT (discrete cosine transform), and Yuan et al. [5] proposed a method using wavelets. Third, as a method using shape, Jain et al. [6] proposed a retrieval method used in limited applications like logo or trademark retrieval. More recently, methods [7] using a mixture of two or three features, rather than each of the three features alone, and methods [8] using neural networks have been proposed.
On the other hand, the fuzzy set proposed by Zadeh [9] in 1965 treats the degree of ambiguity resulting from human subjective judgment as fuzziness, and handles this degree quantitatively. Fuzzy measures and fuzzy integrals, mathematical concepts proposed by Sugeno [10] in 1972, try to overcome the limitation of such ambiguity through fuzzy evaluation, which moves from the usual additive to a non-additive setting. In comparing similarity, such a fuzzy integral increases precision by giving the optimum weight value to each of several features. The rest of the paper is organized as follows. Section 2 describes the proposed method, and Section 3 shows the experimental results. Finally, Section 4 concludes.
2 Feature Extraction and Similarity Comparison

2.1 Feature-Region Extraction

Extraction of feature regions mostly uses color, texture, and shape information. Among them, color and texture features are used after dividing the image into regions as in the existing method [11]. The color information acquired for each region is the RCFV (region color feature vector); it is expressed as the following eq. (1), where $r_k$ is the region number and $p_i$ the histogram ratio:

$$RCFV = [r_k, p_i], \quad (k = 1, 2, \cdots, N,\ i = 1, 2, \cdots, M) \qquad (1)$$
Here, M is the number of quantization color levels and N is the total number of blocks dividing the image; M in this study is 12, and N is 16 (an experimental value). Texture information compensates for using only the DC coefficient of the DCT by adding AC information. As using all AC coefficients increases the computational complexity, the AC coefficients are reduced to one component per direction. The size of each region in the proposed method is 64×64, so DCT transformation into 8×8 blocks yields 64 DC coefficients and synthesized AC coefficients. The average of the DC coefficients and the averages of the AC coefficients, denoted $d_k$ and $a_{kj}$ respectively, are used as the texture feature vector of each region. The acquired texture feature is expressed as a vector as in eq. (2), where RTFV is the region texture feature vector, $r_k$ the region number, $d_k$ the DC value, and $a_{kj}$ the averages of the horizontal, vertical, diagonal and remaining AC coefficients:

$$RTFV = [r_k, d_k, a_{kj}], \quad (k = 1, 2, \cdots, N,\ j = 1, 2, 3, 4) \qquad (2)$$
Shape information uses edges. An edge pixel is selected only when its value exceeds 128, the central lightness value, in order to detect only important edges. To exclude minute edges, selected edge pixels are recognized as an edge only when at least 3 of them are linked consecutively. The edge histogram extracted from the image is computed per region and used as the shape feature. The per-region edge histogram (RSFV: region shape feature vector) is expressed as a vector as in eq. (3), where $r_k$ is the region number and $e_k$ the region edge histogram:

$$RSFV = [r_k, e_k], \quad (k = 1, 2, \cdots, N) \qquad (3)$$
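For illustration, the three per-region vectors can be assembled as below (a sketch with our own function name; the merge into the single region feature vector follows in eq. (4)):

```python
import numpy as np

def region_vectors(rk, p, dk, ak, ek):
    """Eqs. (1)-(3) for one region r_k: p is the 12-bin color histogram
    ratio vector, dk the DC average, ak the 4 directional AC averages,
    and ek the region edge histogram value."""
    rcfv = np.concatenate([[rk], p])           # Eq. (1): color
    rtfv = np.concatenate([[rk], [dk], ak])    # Eq. (2): texture
    rsfv = np.array([rk, ek])                  # Eq. (3): shape
    return rcfv, rtfv, rsfv
```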
Color feature vector (RCFV), texture feature vector (RTFV), and shape feature vector (RSFV) can be merged when same-size regions are used; that is, merging color, shape, and texture in each region can raise precision. Equation (4) expresses the RFV (region feature vector), which merges RCFV, RTFV, and RSFV:

$$RFV = [r_k, p_i, d_k, a_{kj}, e_k] \qquad (4)$$

The acquired RFV has 1 shape feature, 5 texture features, and 12 color features (one per quantization level) for each of the 16 regions.

2.2 Comparison of Fuzzy-Integral Similarity

Various methods for similarity comparison have been proposed [12]. Among them, this study uses the histogram intersection function, which requires less calculation than the others. When several features are used, the weight value of each feature is usually fixed arbitrarily or set manually. Therefore, setting the weight values using the fuzzy integral can raise the efficiency of retrieval. In the proposed method, the fuzzy measure is set over the item set X = {x1, x2, x3}, where x1 denotes color, x2 texture, and x3 shape. H, the power set of these items, is ∅, {x1}, {x2}, {x3}, {x1, x2}, {x1, x3}, {x2, x3}, {x1, x2, x3}. The fuzzy measure g(xi) of each set is shown in Table 1 as the precision obtained when retrieving 30 arbitrary images up to the 10th rank using the corresponding feature subset. The values of the fuzzy measures are experimental values.

Table 1. Fuzzy measures
H            Means                    g(xi)
∅            ∅                        0.00
x1           Color                    0.80
x2           Texture                  0.35
x3           Shape                    0.20
x1, x2       Color, Texture           0.92
x1, x3       Color, Shape             0.85
x2, x3       Texture, Shape           0.43
x1, x2, x3   Color, Shape, Texture    1.00
The weight values are applied to the fuzzy measures of the database images and of each queried image. Equation (5) expresses the normalized measure $Nx_i$ when a single measure $x_i$ is considered, and equation (6) expresses $Wx_i$, the weight value applied to the fuzzy measure, when the chosen measure is $x_m$:

$$Nx_i = \frac{g(x_i)}{\sum_{j=1}^{3} g(x_j)}, \quad i \in \{1, 2, 3\} \qquad (5)$$

$$Wx_i = \begin{cases} Nx_i + (Nx_m \times Nx_m), & x_i = x_m \\ Nx_i - \left( Nx_m \times Nx_m \times \dfrac{Nx_i}{\sum_{j \notin \{m\}} Nx_j} \right), & x_i \ne x_m \end{cases}, \quad i \in \{1, 2, 3\} \qquad (6)$$
The weight value of each feature, obtained by substituting the fuzzy measures of Table 1 into eqs. (5) and (6), is shown in Table 2.

Table 2. The weighting value of features

Selected feature   Color (Wx1)   Texture (Wx2)   Shape (Wx3)
x1                 0.96          0.03            0.01
x2                 0.56          0.32            0.12
x3                 0.59          0.25            0.16
x1, x2             0.48          0.43            0.09
x1, x3             0.60          0.14            0.26
x2, x3             0.39          0.41            0.20
As a result, the final similarity, as in eq. (7), is obtained by multiplying the original per-feature similarities by the weight values:

$$\sum_{i=1}^{3} (x_i \times Wx_i) \qquad (7)$$
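The weight computation of (5)-(7) is small enough to spell out directly (a sketch with our own function name; the 0-based index m marks the selected single measure):

```python
def fuzzy_weights(g_single, m):
    """g_single = [g(x1), g(x2), g(x3)] from Table 1; m selects x_m."""
    total = sum(g_single)
    Nx = [g / total for g in g_single]                 # Eq. (5)
    W = []
    for i in range(len(Nx)):
        if i == m:                                     # case x_i = x_m
            W.append(Nx[i] + Nx[m] * Nx[m])
        else:                                          # case x_i != x_m
            rest = sum(Nx[j] for j in range(len(Nx)) if j != m)
            W.append(Nx[i] - Nx[m] * Nx[m] * Nx[i] / rest)
    return W

# With g_single = [0.80, 0.35, 0.20] and m = 0 (color selected), this yields
# approximately [0.94, 0.04, 0.02], matching the first row of Table 2 up to
# rounding; the final similarity of Eq. (7) is then sum(x_i * W[i]) over the
# three per-feature similarities x_i.
```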
Fig. 1 shows the whole block diagram of the proposed method. In Fig. 1, the solid lines denote the creation of feature vectors from the input image, and the dotted lines denote query processing.

Fig. 1. The whole block diagram of the proposed method (input/query image → feature extraction of color, texture and shape → fuzzy integral with fuzzy measures → similarity comparison against the feature DB → retrieval)
3 Experimental Results

The study evaluated the performance of the proposed content-based image retrieval system on 1,000 natural images. The 1,000 images were divided into 10 groups of 100 each, with each group composed of similar images. Each image is a 24-bit color JPEG of size 256×384 or 384×256, of the kind often used in content-based image retrieval [13]. The study used precision and recall to evaluate the efficiency of the system [12]. The experiment compared two methods: one, the existing method [11] with shape information added, using fixed weight values for the three features; the other, the proposed method, applying the optimum weight values obtained by the fuzzy integral.
The whole performance of each method is shown in Table 3, giving the average precision up to the 10th rank. Overall, it was found that the proposed method performs better than the existing method using fixed weight values. According to Table 3, highly color-dependent image groups like horses, flowers and buses showed good precision even when compared by the color-centered fixed weights; indeed, a highly color-dependent group like flowers showed slightly better precision than with the proposed method. But it was found that less color-dependent groups like Africa and the ruins showed much lower precision with the existing method. The proposed method, by contrast, increases precision by decreasing the weight of color and increasing the weights of texture and shape information.

Table 3. The precision of each method

Image group   Existing method   Proposed method
Horse         0.93              0.94
Flower        0.94              0.92
Bus           0.88              0.91
Africa        0.70              0.79
Ruins         0.61              0.72
Fig. 2 shows the retrieval results comparing the existing and the proposed method when querying ruins images. Ruins images are comparatively hard to retrieve, but because ruins are mostly buildings, taking good advantage of shape information can mitigate the problems of retrieving with color alone.

Fig. 2. The result of a query (ruins), where (a) is the result of the existing method and (b) is the result of the proposed method
According to the retrieval results, Fig. 2(a), using the existing fixed-weight combination of color, texture, and shape information, wrongly retrieved a mountain image at the 5th rank and an elephant image at the 9th rank. In Fig. 2(b), the weight values were applied adaptively by the proposed method, which gave a better retrieval result than the existing method even though an image was wrongly retrieved at the 10th rank. Therefore, the proposed method showed better precision. Fig. 3 shows a graph of recall and precision for the ruins images. The proposed method, taking optimum advantage of texture and shape information, showed better performance than the existing method.
Fig. 3. Precision vs. recall (ruins), comparing the proposed and existing methods
4 Conclusions

Today, content-based image retrieval systems use several features rather than just one. When comparing similarities between extracted features, these methods assign feature weights by human subjective judgment or set them manually. In this case, weight values wrongly established for an image decrease precision. This study has proposed a method that uses the fuzzy integral to set the weight value of each feature, in order to improve on the existing method. As a result of experiments on 1,000 color images, the weighted similarity retrieval method using the fuzzy integral proposed here proves superior in objective performance (precision and recall) to the existing method.
Acknowledgments This work was supported by the Regional Research Centers Program of the Ministry of Education & Human Resources Development in Korea.
References

1. Chang, S.K., Yan, C.W., Dimitroff, D.C., Arndt, T.: An Intelligent Image Database System. IEEE Trans. Software Eng. Vol. 14, No. 5 (1988) 681–688
2. Saha, S.K., Das, A.K., Chanda, B.: CBIR using Perception Based Texture and Colour Measures. Proceedings of Pattern Recognition ICPR 2004, Vol. 2 (2004) 985–988
3. Swain, M.J., Ballard, D.H.: Color Indexing. International Journal of Computer Vision, Vol. 7, No. 1 (1991) 11–32
4. Wu, Y.G., Liu, J.H.: Image Indexing in DCT Domain. Proceedings of ICITA 2005, Vol. 2 (2005) 401–406
5. Yuan, H., Zhang, X.P., Guan, L.: A Statistical Approach for Image Feature Extraction in the Wavelet Domain. Proceedings of IEEE CCECE 2003, Vol. 2 (2003) 1159–1162
6. Jain, A.K., Vailaya, A.: Shape-based Retrieval: A Case Study with Trademark Image Databases. Pattern Recognition, Vol. 31, No. 9 (1998) 1369–1390
7. Besson, L., Costa, A.D., Leclercq, E., Terrasse, M.N.: A CBIR-Framework using Both Syntactical and Semantical Information for Image Description. Proceedings of Database Engineering and Applications Symposium 2003 (2003) 385–390
8. Han, J.H., Huang, D.S., Lok, T.M., Lyu, M.R.: A Novel Image Retrieval System based on BP Neural Network. Proceedings of IJCNN 2005, Vol. 4 (2005) 2561–2564
9. Zadeh, L.A.: Fuzzy Sets. Information and Control, Vol. 8 (1965) 89–102
10. Sugeno, M.: Fuzzy Measures and Fuzzy Integrals: A Survey. Fuzzy Automata and Decision Processes (1977) 89–102
11. Kim, D.W., Kwon, D.J., Kwak, N.J., Ahn, J.H.: A Content-based Image Retrieval using Region based Color Histogram. Proceedings of ICIC 2005 (2005)
12. Vittorio, C., Lawrence, D.B.: Image Database. John Wiley & Sons Inc. (2002) 379–385
13. Wang, J.Z., Li, J., Wiederhold, G.: Simplicity: Semantics-Sensitive Integrated Matching for Picture Libraries. IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 23, No. 9 (2001) 947–963
An Adaptive MRF-MAP Motion Vector Recovery Algorithm for Video Error Concealment*

Zheng-fang Li1, Zhi-liang Xu2,1, and De-lu Zeng1

1 College of Electronic & Information Engineering, South China University of Technology, Guangzhou, 510641, China
2 Department of Electronic Communication Engineering, Jiangxi Normal University, Nanchang, 330027, China
[email protected]
Abstract. Error concealment is an attractive approach to combating channel errors in video transmission. A motion vector recovery algorithm for temporal error concealment is proposed. The motion vector field is modeled as a Gauss-Markov Random Field (GMRF), and the motion vectors of damaged image macroblocks are recovered adaptively by Maximum a Posteriori (MAP) estimation. Simulation results show that the proposed method offers significant improvement in both objective PSNR measurement and subjective visual quality of the restored video sequences.
1 Introduction

Most international video coding standards obtain high image quality at low bit rates based on the block discrete cosine transform (BDCT), motion compensation (MC), and variable length coding (VLC) techniques. However, highly compressed video data are more sensitive to channel errors. The loss of a single bit often results in the loss of a whole block or several consecutive blocks, which seriously affects the visual quality of the decoded images at the receiver. Error concealment (EC) is an attractive technique that takes advantage of spatial or temporal information from the current frame or neighboring frames to recover the corrupted areas of the decoded image. EC requires neither additional bit rate nor modification of the standard coding algorithms. Traditional EC methods include BMA [1], AVMV [2], MMV [3], and so on. Recently, many creative works [4-7,10] in this field have been presented. In this paper, we focus on temporal EC to conceal missing image blocks belonging to inter-coded frames.
The work is supported by the National Natural Science Foundation of China for Excellent Youth (60325310), the Guangdong Province Science Foundation for Program of Research Team (04205783), the Specialized Prophasic Basic Research Projects of Ministry of Science and Technology, China (2005CCA04100), the Growing Foundation of Jiangxi Normal university for Youth (1336).
D.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNCIS 345, pp. 683 – 688, 2006. © Springer-Verlag Berlin Heidelberg 2006
684
Z.-f. Li, Z.-l. Xu, and D.-l. Zeng
2 Motion Vector Recovery Based on MRF-MAP 2.1 Motion Vectors Field Model of MRF The motion vectors field was modeled as MRF by Salama [9]. The potential functions are chosen such that N1 −1 N 2 −1 3
¦Vc (v) = ¦ c∈C
i =0
¦ ¦b j = 0 m =0
m i, j
§ ρ ( Dm (vi , j ) · ¨ ¸ σ © ¹
(1)
where N1 and N2 are the number of MBs on vertical and horizontal direction of a frame image respectively, vi , j is the motion vector of MB(i,j), ρ is a cost function,
bim, j is weighting coefficients, σ is a scaling factor, c is cliques, the set of cliques is C={{(i,j-1), (i,j)}, {(i-1,j+1), (i,j)}, {(i-1,j), (i,j)},{ (i-1,j-1), (i,j)}}.
Dm (⋅) has the
following form:
D0 (vi , j ) = vi , j −1 − vi , j , D1 (vi , j ) = vi −1, j +1 − vi , j D2 (vi , j ) = vi −1, j − vi , j , D3 (vi , j ) = vi −1, j −1 − vi , j
(2)
The MAP estimate of the motion vector $\hat{v}_{i,j}$ of MB(i,j), given its neighboring motion vectors, is

$$\hat{v}_{i,j} = \arg\min_{v_{i,j}} \sum_{l=i}^{i+1} \sum_{k=j}^{j+1} \sum_{m=0}^{3} b_{l,k}^m\, \rho\!\left( \frac{D_m(v_{l,k})}{\sigma} \right) \qquad (3)$$

and its minimum can be obtained by means of the iterative conditional modes (ICM) algorithm. The parameters $b_{l,k}^m$ and $\sigma$ were set to unit value by Salama [9], and $\rho$ was chosen as the Huber function. The estimate of the motion vector of the damaged MB can thus be obtained from equation (3). However, the spatial correlation among neighboring MBs is not considered in this algorithm (MRF-MAP) proposed by Salama [9]. In order to improve the precision of the estimated motion vector based on MRF-MAP, we propose an adaptive MRF-MAP (AMRF-MAP) algorithm to estimate the motion vector of the damaged MB. Considering a GMRF model, $\rho$ is a quadratic function, so the minimization of equation (3) yields a unique global solution. With $\sigma$ set to unit value, the estimated motion vector
vˆi , j =
vˆi , j of MB (i, j ) is given by
¦
( k ,l )∈U
where
bi , j →k ,l ⋅ vk ,l
¦
( k ,l )∈U
bi , j →k ,l
(4)
bi , j →k ,l is the weight assigned to the difference between the values of the mo-
(i, j ) and the motion vector of MB (k , l ) , U is the set of neighboring MBs of MB (i, j ) . tion vector of MB
2.2 Adaptive Weight Selection

In our adaptive MRF model, the weight is selected adaptively, based on spatial information (neighboring pixels) and temporal information (neighboring motion vectors).
Fig. 1. Missing block and neighboring blocks
Let the size of an MB be N × N. The dark block is the damaged MB, and the light blocks are its neighboring MBs, as shown in Fig. 1. The damaged MB is enlarged to (N+2) × (N+2). Let the motion vector of the damaged MB be V. When motion compensation is performed, there is a single-pixel overlap (the grid area in Fig. 1) between the concealed MB and the neighboring MBs. In order to measure the degree of smoothness of the overlapping area, a function $S^v$ is defined as follows:

$$S_L^v = \sum_{i=0}^{N-1} \left| f(x_0+i,\, y_0-1,\, n) - f(x_0+v_x+i,\, y_0-1+v_y,\, n-1) \right|$$
$$S_R^v = \sum_{i=0}^{N-1} \left| f(x_0+i,\, y_0+N,\, n) - f(x_0+i+v_x,\, y_0+N+v_y,\, n-1) \right|$$
$$S_T^v = \sum_{i=0}^{N-1} \left| f(x_0-1,\, y_0+i,\, n) - f(x_0-1+v_x,\, y_0+v_y+i,\, n-1) \right|$$
$$S_B^v = \sum_{i=0}^{N-1} \left| f(x_0+N,\, y_0+i,\, n) - f(x_0+N+v_x,\, y_0+v_y+i,\, n-1) \right|$$
$$S^v = S_L^v + S_R^v + S_T^v + S_B^v \qquad (5)$$
where $(x_0, y_0)$ is the upper-left coordinate of the enlarged damaged MB, $n$ denotes the current frame and $n-1$ the reference frame. The motion vector of the damaged MB is V, with x and y components $v_x$ and $v_y$ respectively; L, R, T, B denote the left, right, top and bottom directions respectively. If $V_{k,l}$ is one of the motion vectors of the neighboring MBs, $(k,l) \in U$, and $V = V_{k,l}$, the corresponding $S^{v_{k,l}}$ can be obtained by equation (5). The smaller the value of $S^{v_{k,l}}$, the bigger the probability that V equals $V_{k,l}$.
Fig. 2. Classification of the motion vectors
In addition, since neighboring MBs in one frame often move in a similar fashion, we can group motion vectors that have similar motions. The motion vectors are sorted into 9 classes according to direction and magnitude information [8], as shown in Fig. 2. Let G1, G2, ..., G9 denote the 9 classes. There is a counter $C_i$ (i = 1, 2, ..., 9) for each of the nine classes, used to store the number of motion vectors belonging to the corresponding $G_i$. The larger the value of $C_i$, the larger the probability that the motion vector V of the damaged MB belongs to $G_i$. According to the above analysis, we define $b_{i,j \to k,l}$ as follows:
$$b_{i,j \to k,l} = C_{k,l} \cdot \frac{\min_{(m,n) \in U}\left( S^{v_{m,n}} \right)}{S^{v_{k,l}}} \qquad (6)$$

where $(m,n) \in U$ and $C_{k,l} \in \{C_i\}$ (i = 1, 2, 3, ..., 9).
vˆi , j =
¦
Ck , l ⋅
( k ,l )∈U ( m , n )∈U
min( S
S
vm,n
vk ,l
)
⋅ vk ,l
vˆi , j of MB (i, j ) becomes:
¦
( k ,l )∈U ( m , n )∈U
Ck , l ⋅
min( S
S
vm,n
vk ,l
)
(7)
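A sketch of the complete recovery step (5)-(7) (our illustration; image boundary checks and the degenerate case S^v = 0 are omitted, counts holds the class counters C_{k,l} of Fig. 2, and frames are float arrays indexed as f[x, y]):

```python
import numpy as np

def side_match(frame, ref, x0, y0, N, v):
    """S^v of Eq. (5): sum of absolute differences along the one-pixel
    border between the v-compensated block and its neighbors."""
    vx, vy = v
    s = 0.0
    for i in range(N):
        s += abs(frame[x0 + i, y0 - 1] - ref[x0 + vx + i, y0 - 1 + vy])  # L
        s += abs(frame[x0 + i, y0 + N] - ref[x0 + i + vx, y0 + N + vy])  # R
        s += abs(frame[x0 - 1, y0 + i] - ref[x0 - 1 + vx, y0 + vy + i])  # T
        s += abs(frame[x0 + N, y0 + i] - ref[x0 + N + vx, y0 + vy + i])  # B
    return s

def recover_mv(frame, ref, x0, y0, N, neighbor_mvs, counts):
    """Eqs. (6)-(7): weighted average of the neighboring motion vectors."""
    S = [side_match(frame, ref, x0, y0, N, v) for v in neighbor_mvs]
    s_min = min(S)
    w = [c * s_min / s for c, s in zip(counts, S)]     # weights of Eq. (6)
    num = sum(wi * np.asarray(v, float) for wi, v in zip(w, neighbor_mvs))
    return num / sum(w)                                # Eq. (7)
```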
3 Simulation Results

Four YUV (144×176) grayscale video sequences are used to evaluate the performance of the proposed algorithm. The size of the missing MBs is 8×8, and both isolated block loss and consecutive block loss are considered. Fig. 3(a) is the 92nd frame of Foreman with 20.2% isolated block loss. Figures 3(b)-(f) show the results of the BMA [1], AVMV [2], MMV [3], MRF-MAP [9], and the proposed AMRF-MAP algorithms respectively. From Fig. 3(b) and (c), the images recovered by the BMA and AVMV algorithms are not smooth and still have serious blocking artifacts. Comparing Fig. 3(d), (e) and (f), we can see that the proposed AMRF-MAP algorithm recovers images with edges more successfully than the MMV and MRF-MAP algorithms.
Fig. 3. Visual quality comparison by different error concealment methods for the Foreman sequence with 20.2% isolated block loss rate: (a) corrupted frame; (b) BMA; (c) AVMV; (d) MMV; (e) MRF-MAP; (f) AMRF-MAP

Table 1. Multi-frame average PSNR (dB) comparison for different video sequences by different methods with block loss rate 20.2%

Video sequences   BMA    AVMV   MMV    MRF-MAP   AMRF-MAP
Carphone          26.8   28.0   30.2   30.7      32.2
Foreman           27.2   28.7   30.3   28.9      30.9
Claire            31.1   35.3   37.5   39.1      39.4
Coastguard        21.8   28.5   29.1   28.6      29.8
Table 2. Multi-frame average CPU time (s) comparison for different video sequences by different methods with block loss rate 20.2%

Video sequences   BMA    AVMV   MMV    MRF-MAP   AMRF-MAP
Carphone          2.86   0.63   1.27   1.85      2.33
Foreman           2.98   0.67   1.48   2.00      2.50
Claire            3.07   0.64   1.40   2.10      2.44
Coastguard        3.11   0.69   1.37   1.94      2.40
Fifteen consecutive frames of each video sequence are simulated with 20.2% of isolated MBs missing; the size of the damaged MBs is 8×8. Table 1 compares the average PSNR of the images recovered by the different methods, and Table 2 compares the average CPU time. From Table 1, it is observed that the proposed algorithm clearly outperforms the other algorithms, while Table 2 demonstrates that the complexity of the proposed algorithm is moderate.
4 Conclusion

Aiming to recover damaged MBs belonging to inter-coded frames, an effective temporal EC method is proposed in this paper. The motion vector field is modeled as a GMRF, and the weights are selected adaptively based on spatial and temporal information. The simulation results show that the proposed method outperforms the existing error concealment methods.
References

1. Lam, W.M., Reilbman, A.R., Liu, B.: Recovery of Lost or Erroneously Received Motion Vectors. IEEE Proceeding ICASSP, 5 (1993) 417-420
2. Sun, H., Challapali, K., Zdepski, J.: Error Concealment in Digital Simulcast AD-HDTV Decoder. IEEE Trans. Consumer Electron., 38 (3) (1992) 108-116
3. Haskell, P., Messerschmitt, D.: Resynchronization of Motion Compensated Video Affected by ATM Cell Loss. Proceeding ICASSP'92, San Francisco, CA, 3 (1992) 545-548
4. Zhou, Z.H., Xie, S.L.: New Adaptive MRF-MAP Error Concealment of Video Sequences. Acta Electronica Sinica, 34 (4) (2006) 29-34
5. Zhou, Z.H., Xie, S.L.: Error Concealment Based on Robust Optical Flow. IEEE International Conference on Communications, Circuits and Systems (2005) 547-550
6. Zhou, Z.H., Xie, S.L.: Video Sequences Error Concealment Based on Texture Detection. International Conference on Control, Automation, Robotics and Vision (2004) 1118-1122
7. Zhou, Z.H., Xie, S.L.: Selective Recovery of Motion Vectors in Error Concealment. Journal of South China University of Technology, 33 (7) (2005) 11-14
8. Ghanbari, S., Bober, M.Z.: A Cluster Based Method for the Recovery of the Lost Motion Vectors in Video Coding. International Workshop on Mobile and Wireless Communications Network (2002) 583-586
9. Salama, P., Shroff, N.B., Delp, E.J.: Error Concealment in MPEG Video Streams over ATM Networks. IEEE J. Select. Areas Commun., 18 (2000) 1129-1144
10. Xie, S.L., He, Z.S., Gao, Y.: Adaptive Theory of Signal Processing. 1st ed. Chinese Science Press, Beijing (2006) 255-262
An Efficient Segmentation Algorithm Based on Mathematical Morphology and Improved Watershed

Ge Guo, Xijian Ping, Dongchuan Hu, and Juanqi Yang

Information Science Department, Zhengzhou Information Science and Technology Institute, Zhengzhou, Henan 450002, Mailbox 1001, 837#
[email protected]
Abstract. Image segmentation is a critical step toward the recognition and analysis phases of many image processing tasks. This paper describes an efficient segmentation algorithm based on mathematical morphology and an improved watershed, which applies the immersion-based watershed transform to a fusion of the multi-scale morphological gradient image and the distance image to notably decrease the known drawback of the watershed, oversegmentation. Furthermore, oversegmentation is subsequently reduced by a region merging strategy to obtain meaningful results. The presented segmentation technique is tested on a series of images and numerical validation of the results is provided, demonstrating the strength of the algorithm for image segmentation.
1 Introduction

Image segmentation is one of the most popular subjects of research activity of the last four decades. Many segmentation algorithms have been elaborated and were extensively reviewed by Clarke et al. [1], who noted that fully automated segmentation remains a difficult task and that fully automatic segmentation procedures are far from satisfactory in many realistic situations. The watershed transform is a popular segmentation tool based on morphology and has been widely used in many fields of image segmentation due to the advantages it possesses: it is simple; it finds continuous thin watershed lines quickly; it can be parallelized; and it produces a complete partition, which avoids the need for any kind of contour joining. However, it also has notable drawbacks that seriously affect its practicality. Among these, the most important are oversegmentation, resulting from its sensitivity to noise, and poor detection of significant areas with low-contrast boundaries. To solve the above problems, an image fusion method is presented in which both geometry and intensity information are considered to obtain a satisfactory partition. Furthermore, an automatic merging method is proposed to reduce the overly separated regions. The actual segmentation procedure consists of three parts: 1) image fusion of the multi-scale morphological gradient and the distance image; 2) segmentation by the immersion simulation approach described by Vincent and Soille; 3) reduction of the oversegmentation by a small-region merging method.
2 Image Fusion

The segmentation result based on the watershed transform depends greatly on the quality of the reference image. The difficulty for the watershed is to check whether the objects and their background are each marked by a minimum and whether the crest lines outline the objects. If not, a transform of the original image is needed so that the contours to be extracted correspond to watershed lines and the objects to catchment basins. Gang Lin [2] gives a gradient-weighted distance transform to obtain a more suitable reference image; however, problems arise when abundant noise exists or the intensity contrast at boundaries is low. A modified fusion method is proposed to improve this situation.

2.1 Multiscale Morphological Gradient

The morphological gradient is defined as the arithmetic difference between the dilation and the erosion with a structuring element B. It emphasizes pixel changes much more than other gradient operators. However, the main problem of the morphological gradient is the selection of the structuring element size. So a multiscale morphological gradient, which combines the respective advantages of large and small structuring elements [3], is adopted to enhance blurred edges. It is described as follows:
$$M(f) = \frac{1}{n}\sum_{i=1}^{n}\left\{\left[(f \oplus B_i) - (f \ominus B_i)\right] \ominus B_{i-1}\right\} \qquad (1)$$
where Bi (1 ≤ i ≤ n) is a group of square structuring elements of size (2i+1) × (2i+1). The multi-scale morphological gradient is less sensitive to noise than the traditional morphological gradient because it adopts the average value over all scales. Besides, such a gradient has a stronger ability to resist the interaction between two close contours.
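As a minimal sketch, Eq. (1) can be computed with scipy's grayscale morphology; the function name and the default number of scales are our own illustration:

    import numpy as np
    from scipy.ndimage import grey_dilation, grey_erosion

    def multiscale_morph_gradient(f, n=3):
        """Multiscale morphological gradient of Eq. (1), square elements B_i."""
        acc = np.zeros(f.shape, dtype=float)
        for i in range(1, n + 1):
            k = 2 * i + 1                      # B_i is a (2i+1)x(2i+1) square
            g = grey_dilation(f, size=(k, k)) - grey_erosion(f, size=(k, k))
            k_prev = 2 * (i - 1) + 1           # erosion by B_{i-1}; B_0 is 1x1 (identity)
            if k_prev > 1:
                g = grey_erosion(g, size=(k_prev, k_prev))
            acc += g
        return acc / n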
2.2 Morphological Filter for Gradient Image

The purpose here is to smooth the gradient image in order to reduce the oversegmentation caused by noise while retaining the salient image edges. Morphological filters [4], composed of morphological opening and closing, have proved attractive for this task. They possess the property of simplifying an image by producing flat zones while efficiently preserving sharp edges thanks to the connectivity of the flat zones. For an image f, the opening and closing operations are defined as follows:
$$\gamma_B(f) = \delta_B(\varepsilon_B(f)) \ \text{(morphological opening)}, \qquad \varphi_B(f) = \varepsilon_B(\delta_B(f)) \ \text{(morphological closing)} \qquad (2)$$
where δB and εB denote the dilation and erosion with structuring element B. An opening (closing) operation can only preserve (fill) the small structures that have the same shape as the structuring element. In order to preserve the useful
parts and fill the small holes as much as possible, a series of opening and closing operations with different shapes is constructed to suit different demands, and the output of the filter group is taken as:
$$\Gamma(f) = \max\{\gamma_{B_1}(f), \gamma_{B_2}(f), \ldots, \gamma_{B_n}(f)\}, \qquad \Psi(f) = \min\{\varphi_{B_1}(f), \varphi_{B_2}(f), \ldots, \varphi_{B_n}(f)\} \qquad (3)$$
Here each Bi denotes one structuring element. Clearly, the more structuring elements are taken, the more details (holes) will be preserved (filled). Considering both noise removal and detail preservation, the structuring elements in Fig. 1 are adopted.
Fig. 1. Different structuring elements
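A sketch of the filter group of Eq. (3) follows; since the exact elements of Fig. 1 are not reproduced here, the directional 3-pixel line elements below are hypothetical stand-ins:

    import numpy as np
    from scipy.ndimage import grey_opening, grey_closing

    def directional_elements():
        # hypothetical stand-ins for the elements of Fig. 1:
        # short lines in the horizontal, vertical and diagonal directions
        h = np.array([[0, 0, 0], [1, 1, 1], [0, 0, 0]], dtype=bool)
        d = np.eye(3, dtype=bool)
        return [h, h.T, d, np.fliplr(d)]

    def filter_group(f):
        elems = directional_elements()
        gamma = np.max([grey_opening(f, footprint=e) for e in elems], axis=0)  # Gamma(f)
        psi = np.min([grey_closing(f, footprint=e) for e in elems], axis=0)    # Psi(f)
        return gamma, psi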
Fig. 2(a) is a rice image corrupted by salt noise, which produces a large number of minima in the gradient image and causes serious oversegmentation after watershed segmentation (Fig. 2(b)). Fig. 2(c) shows the watershed result on the multi-scale morphological gradient smoothed by the method described in Sect. 2.2, where the number of regions is greatly reduced.
Fig. 2. (a) Rice image with added noise; (b) watershed on the traditional gradient; (c) watershed on the multiscale morphological gradient after morphological filtering
2.3 Distance Transform
To separate touching objects one must find the connecting points of two different regions. The distance transform is an operation on binary images which converts position information into intensity information by assigning to every point (both those in objects and those in the background) the minimum distance from that particular point to the nearest point on the border of an object. In general the Chamfer algorithm is used to approximate the Euclidean distance, and the detailed steps are as follows: 1) Binarization of the original image: To reduce computing time while keeping the procedure automatic, conventional histogram-based thresholding methods can be adopted. Here we choose the iterative algorithm proposed by Ridler and Calvard, which possesses good properties such as stability, speed and consistency.
2) Region connection: Thresholding sometimes produces small isolated objects due to dense or uneven intensity distribution. To remove these artificial objects, a minor-region removal algorithm based on region area is used. After thresholding, all connected components are identified and the sizes of all isolated components are calculated. Any object smaller than a set threshold is considered an artificial region and its intensity is changed to the value of its biggest neighboring object. 3) Chamfer distance transform: The 5×5 masks (Fig. 3) are used to realize the Chamfer distance transform [5]; a sketch of the two-pass scan is given after Fig. 3. Two scans are taken in order: the forward one from left to right, top to bottom; the backward one from right to left, bottom to top. As the mask moves, at each position the sum of the local distance at each mask point and the value of the pixel it covers is computed, and the new value of the pixel under the mask's 0 entry is the minimum of these sums.
Fig. 3. (a) Forward pass template; (b) backward pass template
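A minimal sketch of the two-pass transform; the 5-7-11 local distances are an assumption (the standard 5×5 Chamfer weights of Borgefors [5]), since the numeric entries of Fig. 3 are not reproduced here:

    import numpy as np

    def chamfer_distance(binary):
        """Two-pass 5x5 Chamfer transform: for each object pixel (binary True),
        the approximate distance to the nearest background pixel (scaled by 5)."""
        INF = 10 ** 9
        d = np.where(binary, INF, 0).astype(np.int64)
        h, w = d.shape
        # forward-pass neighbourhood (dy, dx, weight); the backward pass
        # uses the point-mirrored offsets
        fwd = [(-1, -1, 7), (-1, 0, 5), (-1, 1, 7), (0, -1, 5),
               (-2, -1, 11), (-2, 1, 11), (-1, -2, 11), (-1, 2, 11)]
        for y in range(h):                       # left-to-right, top-to-bottom
            for x in range(w):
                for dy, dx, wt in fwd:
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        d[y, x] = min(d[y, x], d[yy, xx] + wt)
        for y in range(h - 1, -1, -1):           # right-to-left, bottom-to-top
            for x in range(w - 1, -1, -1):
                for dy, dx, wt in fwd:
                    yy, xx = y - dy, x - dx
                    if 0 <= yy < h and 0 <= xx < w:
                        d[y, x] = min(d[y, x], d[yy, xx] + wt)
        return d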
2.4 Fusion Method
The multi-scale gradient reflects the intensity information, which is very sensitive to noise and usually results in oversegmentation. The Chamfer distance reflects the position information, which is geometrical and is good at separating objects with regular shapes. If a suitable method can be found to combine the above transforms to represent each pixel's character, edge detection by the watershed will certainly become easier. Let M be the multi-scale gradient, D the Chamfer distance and g the fusion result; the fusion formula is given by:
$$g(i,j) = \max\big(g^{(2)}(i,j)\big) - g^{(2)}(i,j) \qquad (4)$$
where
$$g^{(1)}(i,j) = D(i,j)\left[(1+\alpha) - \alpha\,\frac{M(i,j) - M_{\min}}{M_{\max} - M_{\min}}\right] \qquad (5)$$
$$g^{(2)}(i,j) = \frac{255 \cdot g^{(1)}(i,j)}{g^{(1)}_{\max}} \qquad (6)$$
Equation (6) is used to prevent g(i,j) from exceeding 255. α is a gradient-weight controlling factor that is determined empirically according to the degree of edge blur: the fainter the edge, the bigger α; when the edge is strong, α becomes smaller. The fusion image represents two characters of each point, position information
and intensity information. It is clear that g(i,j) is low when (i,j) is close to the center of an object, where the gradient is low, and high when (i,j) is close to a boundary, where the gradient is high.
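A compact sketch of Eqs. (4)-(6), assuming M and D are arrays of equal shape; the small epsilon guards against division by zero are our own addition:

    import numpy as np

    def fuse(M, D, alpha):
        """Fusion of the multiscale gradient M and Chamfer distance D, Eqs. (4)-(6)."""
        Mn = (M - M.min()) / (M.max() - M.min() + 1e-12)   # normalized gradient
        g1 = D * ((1.0 + alpha) - alpha * Mn)              # Eq. (5)
        g2 = 255.0 * g1 / (g1.max() + 1e-12)               # Eq. (6), keeps values <= 255
        return g2.max() - g2                               # Eq. (4): low at centers, high at edges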
3 Immersion-Based Watershed

The fusion image of the multi-scale gradient and the distance image is considered as a topographic relief where the brightness value of each pixel corresponds to a physical elevation. Of all watershed transforms, the immersion technique developed by Vincent and Soille [6] was shown to be the most efficient in terms of edge detection accuracy and processing time. The operation of their technique can simply be described by imagining that holes are pierced in each local minimum of the topographic relief. Then the surface is slowly immersed into a 'lake', thereby filling all the catchment basins, starting from the basin associated with the global minimum. As soon as two catchment basins tend to merge, a dam is built. The procedure results in a partitioning of the image into many catchment basins whose borders define the watersheds.
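The transform itself need not be re-implemented; a minimal usage sketch with scikit-image (our assumption, not the authors' implementation), where g is the fusion image:

    import numpy as np
    from scipy import ndimage as ndi
    from skimage.segmentation import watershed

    def immersion_watershed(g):
        """Flood the relief g from its regional minima, in the Vincent-Soille style."""
        minima = g == ndi.minimum_filter(g, size=3)   # crude local-minimum detector
        markers, _ = ndi.label(minima)
        return watershed(g, markers)                  # label image of catchment basins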
4 Region Merging

After the watershed segmentation has been carried out on the fusion image, oversegmentation is nearly eliminated, but there still remain a small number of regions whose merging would yield a more meaningful segmentation. In the next step, the partition is further reduced by a proper region merging process, which merges neighboring regions with similar characteristics. Suppose i is the current region, with size Ri and k neighboring regions recorded as Rj (j = 1, 2, …, k, j ≠ i). Let Li,j be the mean strength of the shared boundary between the two adjacent regions i and j. If j is one of i's neighboring regions, the adjudication function Pi,j used in this work is defined as:
$$P_{i,j} = \frac{R_i \times R_j}{R_i + R_j} \cdot \frac{|\mu_i - \mu_j|}{L_{i,j}} \qquad (j = 1, 2, \ldots, k) \qquad (7)$$
Clearly, the smaller Pi,j is, the more similar the two regions are. The merging process starts by joining the two regions with the smallest P value. During the merging, all the information of the two regions, such as area and mean intensity, is combined and the P values are updated. The process then continues by again merging the two regions with the smallest P value, and it stops when the adjudication functions of all pairs of adjacent regions satisfy Pi,j > Threshold (a set threshold).
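A sketch of this merging loop; the data layout (dictionaries keyed by region id) and the omission of boundary re-linking after each merge are our own simplifications:

    import heapq

    def merge_regions(regions, boundaries, threshold):
        """Greedy merging driven by Eq. (7).
        regions:    {id: (area, mean_intensity)}
        boundaries: {(i, j): mean boundary strength L_ij}, with i < j
        """
        def P(i, j):
            (Ri, mi), (Rj, mj) = regions[i], regions[j]
            return (Ri * Rj) / (Ri + Rj) * abs(mi - mj) / boundaries[(i, j)]

        heap = [(P(i, j), i, j) for (i, j) in boundaries]
        heapq.heapify(heap)
        while heap:
            p, i, j = heapq.heappop(heap)
            if p > threshold:
                break                       # all remaining pairs are dissimilar enough
            if i not in regions or j not in regions:
                continue                    # stale entry: one side was already merged away
            (Ri, mi), (Rj, mj) = regions[i], regions.pop(j)
            regions[i] = (Ri + Rj, (Ri * mi + Rj * mj) / (Ri + Rj))
            # a full implementation would re-link j's boundaries to i and push
            # the updated P values of i's neighbours back onto the heap
        return regions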
5 Experiments and Discussion

Evaluation of the proposed algorithm was carried out using a cell image of 500 × 375 pixels. Fig. 4 shows the segmentation process of the cell image with a complex background. Several strategies introduced above, including the multi-scale morphological gradient and the morphological filter, are employed to reduce noise, which visibly decreases the oversegmentation. Fig. 4(b) is the reference image obtained by fusing the multi-scale morphological gradient and the distance image, which brings out the reasonable result shown in Fig. 4(c). The final output after the region merging step is shown in Fig. 4(d), where each meaningful region corresponds to a single cell.
Fig. 4. (a) Original cell image; (b) Fusion image; (c) Segmentation result by our method; (d) Final result after small region merging
For comparison, Fig. 5 shows the watershed result on the morphological gradient image. It can be seen that better segmentation performance is achieved by the method described above.
Fig. 5. Segmentation result on morphological gradient image
6 Conclusion

In this paper, an improved watershed algorithm is introduced in which a modified fusion method and a region merging strategy are applied in turn to reduce oversegmentation. The proposed algorithm was tested on a series of images of different types. The results prove that our algorithm is well suited to segmenting objects with correspondingly regular shapes. However, just as all papers on segmentation observe, no algorithm fits all types of images; when our method is applied to objects with greatly irregular shapes, the segmentation may be less satisfactory.
References

1. Clarke, L.P., Velthuizen, R.P., Camacho, M.A.: MRI Segmentation: Methods and Applications. Magnetic Resonance Imaging (1995) 343-368
2. Lin, G., Adiga, U., Olson, K.: A Hybrid 3D Watershed Algorithm Incorporating Gradient Cues and Object Models for Automatic Segmentation of Nuclei in Confocal Image Stacks. Cytometry (2003) 23-36
3. Lu, G.M., Li, S.H.: Multiscale Morphological Gradient Algorithm and Its Application in Image Segmentation. (2001) 37-40
4. Ma, L., Zhang, Y.: A Skeletonization Algorithm Based on EDM and Modified Retinal Model. Journal of Electronics (China) (2001) 272-276
5. Borgefors, G.: Distance Transformations in Digital Images. Comput. Vis. Graph. Image Process. (1986) 344-371
6. Vincent, L., Soille, P.: Watersheds in Digital Spaces: An Efficient Algorithm Based on Immersion Simulations. IEEE Trans. Pat. Anal. Machine Intell. (1991) 583-598
An Error Concealment Based on Inter-frame Information for Video Transmission*

Youjun Xiang1, Zhengfang Li1, and Zhiliang Xu2,1

1 College of Electronic & Information Engineering, South China University of Technology, Guangzhou 510641, China
[email protected]
2 College of Physics and Electronic Communication Engineering, Jiangxi Normal University, Nanchang 330027, China
Abstract. Transmission of encoded video signals in error-prone environments usually suffers packet erasures, which often result in a number of missing image blocks at the decoder. In this paper, an efficient error concealment algorithm for video transmission is proposed which is based on inter-frame information. The missing blocks are classified into low-activity and high-activity ones by using the motion vector information of the surrounding correctly received blocks. The low-activity blocks are concealed by the simple average motion vector (AVMV) method. For the high-activity blocks, several closed convex sets are defined, and the method of projections onto convex sets (POCS) is used to recover the missing blocks by combining frequency and spatial domain information. Experimental results show that the proposed algorithm achieves improved visual quality of the reconstructed frames with respect to other classical algorithms, as well as better PSNR results.
1 Introduction

With the development of multimedia technology, the demand for real-time video transmission is increasing rapidly. When video images are transmitted over error-prone channels, the loss of a single bit often results in the loss of a whole block or several consecutive blocks, which seriously affects the visual quality of the decoded images at the receiver. Error concealment (EC) is an attractive technique that takes advantage of the spatial or temporal information from the current frame or the neighboring frames to recover the corrupted areas of the decoded image. When the missing blocks belong to the inter-coded mode, they can be recovered by temporal error concealment methods. The classical temporal error concealment methods are the average motion vector (AVMV) [1] and the boundary match algorithm (BMA) [2]. These methods have the advantage of low computational complexity. However, when the estimated motion vector of the missing block is
The work is supported by the National Natural Science Foundation of China (60274006), the Natural Science Key Fund of Guang Dong Province, China (020826) and the National Natural Science Foundation of China for Excellent Youth (60325310).
unreliable, there will be serious blocking artifacts in the reconstructed image, which degrade its quality. In [3], [9-10], some effective error concealment algorithms are presented that successfully recover the lost motion vectors and image blocks. Some new ideas based on adaptive MRF-MAP are also presented in [4-5] to address this problem. In order to overcome the deficiency of the AVMV algorithm, a combined temporal error concealment algorithm is proposed in this paper.
2 Restoration Algorithm Using Projections Method

2.1 Iterative Approach Based on the Theory of POCS

The main inspiration behind this approach is the technique employed by Hirani and Totsuka [6] for the removal of wires and scratches from still images. In order to restore the missing blocks in video images, we improve that technique. The first step of the algorithm consists in selecting a subimage which is a neighborhood of the missing block (called the repair subimage) and a same or similar subimage matched from a neighboring frame (called the sample subimage). The repair subimage provides a hint about the local spatial information and the sample subimage about the frequency information. Examples of these subimages can be seen in Fig. 1: r is the missing block, f is the repair subimage and s is the sample subimage. f and s have the same dimension.
Fig. 1. Selection of subimages: (a) repair subimage; (b) sample subimage
The second step is to formulate the desired properties in terms of convex constraints. To characterize such properties, the following constraints and projections are considered. 1) The first projection operation that we use,
$$P_{\min\text{-}DC}(f) = \mathrm{IFFT}\big(M e^{i\,\mathrm{phase}(F)}\big) \qquad (1)$$
where
$$M(f) = \begin{cases} \min(|F(u,v)|, |S(u,v)|) & \text{if } (u,v) \neq (0,0) \\ |F(0,0)| & \text{if } (u,v) = (0,0) \end{cases} \qquad (2)$$
with F = FFT(f) and S = FFT(s), is a projection onto the underlying set
$$C_{\min\text{-}DC} = \{f : |F(u,v)| \le |S(u,v)|,\ (u,v) \neq (0,0)\} . \qquad (3)$$
Generally, the observed signal can be modeled as a multiplication of the unknown signal by a time-limited binary window function. In the frequency domain, the convolution of the unknown signal spectrum with the window spectrum leads to a blurred and spread spectrum of the observed signal, in general of increased magnitude. In order to eliminate the influence of the known window spectrum, we use the sample spectrum as a template for improving the repair spectrum by correcting the spectrum magnitude. M defined in Eq. (2) is a kind of minimum-taking operation on |F(u,v)| and |S(u,v)|. The only exception is at DC, (u,v) = (0,0), where the value of |F(0,0)| is retained. The motivation for not modifying the DC value of the repair spectrum is that it carries the overall intensity of the repair subimage. While reshaping the spectrum magnitude we leave the phase of the repair spectrum untouched for automatic alignment of global features. 2) A constraint for continuity within the surrounding neighborhood of a restored block is imposed for smooth reconstruction of a damaged image. The projection onto the smooth constraint set is
$$P_{smooth}(f) = \Theta(f) \qquad (4)$$
where Θ(x) denotes the median filtering operator applied to image x. 3) The third projection operator P_clip(f) imposes constraints on the range of the restored pixel values. It operates in the spatial domain. The convex set corresponding to clipping to the feasible range [s_min, s_max] is
$$C_{clip} = \{f : s_{\min} \le f \le s_{\max}, \ \text{for } f(k,l) \in r\} . \qquad (5)$$
4) Since the foregoing operations affect even the pixels outside the missing block r, these must now be corrected in the spatial domain. This is done simply by copying the known pixel values around r from the original repair subimage. The convex set corresponding to known-pixel replacement is
$$C_{replace} = \{f : f(i,j) = f_0(i,j),\ (i,j) \notin r\} . \qquad (6)$$
The appropriate projection onto C_replace is
$$P_{replace}(f) = f(1-w) + f_0 w , \qquad (7)$$
where w is the binary mask which is 0 at missing pixel locations and 1 otherwise. Missing pixels are restored iteratively by alternately projecting onto the specified constraint sets. Thus the algorithm can be written as
$$f_{k+1} = P_{replace} \cdot P_{clip} \cdot P_{\min\text{-}DC} \cdot P_{smooth} \cdot f_k , \qquad k = 0, 1, \ldots \qquad (8)$$
where k is the iteration index. The scheme is presented in Fig. 2.
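A minimal sketch of the iteration of Eq. (8); the median window size, iteration count and the [0, 255] clipping range are our assumptions:

    import numpy as np
    from scipy.ndimage import median_filter

    def pocs_restore(f0, s, w, n_iter=20):
        """Restore missing pixels by alternating projections, Eq. (8).
        f0: repair subimage, s: sample subimage,
        w:  binary mask, 0 at missing pixel locations and 1 otherwise.
        """
        f = f0.astype(float).copy()
        S_mag = np.abs(np.fft.fft2(s))
        for _ in range(n_iter):
            f = median_filter(f, size=3)                    # P_smooth, Eq. (4)
            F = np.fft.fft2(f)
            M = np.minimum(np.abs(F), S_mag)                # magnitude reshaping, Eq. (2)
            M[0, 0] = np.abs(F[0, 0])                       # keep the DC term
            f = np.real(np.fft.ifft2(M * np.exp(1j * np.angle(F))))  # P_min-DC, Eq. (1)
            f = np.clip(f, 0, 255)                          # P_clip, Eq. (5)
            f = f * (1 - w) + f0 * w                        # P_replace, Eq. (7)
        return f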
2.2 Proposed Method

To combine the low computational complexity of AVMV with the significantly better performance of POCS, a combined temporal error concealment algorithm is proposed in this paper.
Fig. 2. Scheme of the algorithm based on POCS
Fig. 3. Flow Chart of the Proposed Algorithm
Fig. 3 gives a flowchart of the algorithm. The missing blocks are classified into low-activity and high-activity blocks by using the motion vector information of the surrounding correctly received blocks. A missing low-activity block can be concealed by the simple average motion vector (AVMV) method. For the missing high-activity blocks, several closed convex sets are defined, and the method of projections onto convex sets (POCS) is used to recover the missing blocks by combining frequency and spatial domain information. While global features and large textures are captured in the frequency domain, local continuity and sharpness are maintained in the spatial domain. In the algorithm, we define the block activity criterion as
$$\text{the block is} \begin{cases} \text{a high-activity block}, & \text{for } \frac{1}{N}\sum_{i=1}^{N} |vx_i - \overline{vx}| \ge \alpha \ \text{ or } \ \frac{1}{N}\sum_{i=1}^{N} |vy_i - \overline{vy}| \ge \alpha \\ \text{a low-activity block}, & \text{otherwise} \end{cases} \qquad (9)$$
where (vx_i, vy_i), i = 1, 2, …, N, are the motion vectors of the surrounding correctly received blocks, (v̄x, v̄y) is the average of these MVs, and α is a predetermined value.
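The classification of Eq. (9) as a small sketch (the array layout and names are ours):

    import numpy as np

    def is_high_activity(mvs, alpha):
        """mvs: (N, 2) array of neighbouring motion vectors (vx_i, vy_i)."""
        dev = np.abs(mvs - mvs.mean(axis=0)).mean(axis=0)   # mean |v_i - v_bar| per component
        return dev[0] >= alpha or dev[1] >= alpha           # Eq. (9)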
3 Simulation Results

The video sequences "Carphone" and "Foreman" are used to evaluate the performance of the proposed algorithm. The size of the missing blocks is 8 × 8 or 16 × 16, and both isolated block loss and consecutive block loss are considered. Fig. 4(a) is the 52nd frame of "Carphone" with 20.2% isolated block loss and a missing-block size of 8 × 8. Figs. 4(b)-(f) show the results of the MRF [7], SR [8], BMA [2], AVMV [1] and our proposed algorithms, respectively. In Fig. 4, it can be noticed that the corrupted components of the edge of the car's window are recovered more faithfully by our algorithm and the SR algorithm than by the BMA [2] and AVMV [1] algorithms. There are still serious blocking artifacts in Fig. 4(d) and Fig. 4(e). There is an obvious discontinuity between the recovered missing components and the undamaged blocks in Fig. 4(c). It can also be noticed that the proposed algorithm recovers the missing block in the eye region more faithfully than the MRF [7] and SR [8] algorithms do.
We provide a comparison of the PSNR of the images recovered by the different algorithms in Table 1. When the missing blocks are isolated and their size is 8 × 8, the SR's PSNR is higher than that obtained by BMA and AVMV. From Table 1, it is observed that the proposed algorithm obviously outperforms the other algorithms in every block-loss situation.
Fig. 4. Recovered 52nd frame of the "Carphone" sequence with 20.2% isolated blocks missing: (a) corrupted frame; (b) MRF; (c) SR; (d) BMA; (e) AVMV; (f) proposed algorithm
Table 1. Comparison of the PSNR in different situations

Corrupted video sequences                                    PSNR (dB)
                                                             MRF    SR     BMA    AVMV   Ours
Foreman (92nd)   8×8 discrete missing (20% loss rate)        29.1   29.5   27.5   28.8   30.0
                 8×8 consecutive missing (20% loss rate)     23.0   22.8   26.5   27.4   27.7
                 16×16 discrete missing (20% loss rate)      25.7   25.3   27.5   31.5   32.1
                 16×16 consecutive missing (20% loss rate)   20.3   22.8   27.3   31.8   33.2
Carphone (52nd)  8×8 discrete missing (20% loss rate)        25.0   28.7   27.2   26.0   33.2
                 8×8 consecutive missing (20% loss rate)     21.2   20.8   26.6   27.6   29.7
                 16×16 discrete missing (20% loss rate)      23.9   24.5   27.4   30.5   32.7
                 16×16 consecutive missing (27% loss rate)   20.8   21.4   28.3   32.6   32.3
4 Conclusions

In this paper, an efficient error concealment algorithm for video transmission based on inter-frame information is proposed. In our approach, AVMV and POCS are combined to fully exploit the advantages of each method. The missing blocks are classified into low-activity and high-activity blocks by using the motion vector information of the surrounding correctly received blocks. A low-activity block can be concealed by the simple AVMV. For a high-activity block, several closed convex sets (C_min-DC, C_clip, C_smooth and C_replace) are defined, and POCS is used to recover the missing block by combining frequency and spatial domain information, which solves the problem that the MVs estimated by AVMV are unreliable for areas with fast motion and object boundaries. Experimental results show that the proposed algorithm achieves improved visual quality of the reconstructed frames with respect to other classical error concealment algorithms, as well as better PSNR results.
References

1. Sun, H., Challapali, K., Zdepski, J.: Error Concealment in Digital Simulcast AD-HDTV Decoder. IEEE Trans. Consumer Electron., Vol. 38, No. 3 (1992) 108-116
2. Lam, W.M., Reibman, A.R., Liu, B.: Recovery of Lost or Erroneously Received Motion Vectors. Proceeding ICASSP, Vol. 5 (1993) 417-420
3. Zhou, Z.H., Xie, S.L.: Selective Recovery of Motion Vectors in Error Concealment. Journal of South China University of Technology, Vol. 33, No. 7 (2005) 11-14
4. Zhou, Z.H., Xie, S.L.: New Adaptive MRF-MAP Error Concealment of Video Sequences. Acta Electronica Sinica, Vol. 34, No. 4 (2006) 29-34
5. Zhou, Z.H., Xie, S.L.: Error Concealment Based on Adaptive MRF-MAP Framework. Advances in Machine Learning and Cybernetics, Lecture Notes in Artificial Intelligence 3930 (2006) 1025-1032
6. Hirani, A., Totsuka, T.: Combining Frequency and Spatial Domain Information for Fast Interactive Image Noise Removal. Proceeding SIGGRAPH'96 Conf. (1996) 269-276
7. Shirani, S., Kossentini, F., Ward, R.: A Concealment Method for Video Communications in an Error-prone Environment. IEEE J. Select. Areas Commun., Vol. 18, No. 6 (2000) 1122-1128
8. Li, X., Orchard, M.T.: Edge-directed Prediction for Lossless Compression of Natural Images. IEEE Trans. on Image Processing, Vol. 10, No. 6 (2001) 813-817
9. Zhou, Z.H., Xie, S.L.: Error Concealment Based on Robust Optical Flow. IEEE International Conference on Communications, Circuits and Systems, Hong Kong (2005) 547-550
10. Xie, S.L., He, Z.S., Gao, Y.: Adaptive Theory of Signal Processing. 1st ed. Chinese Science Press, Beijing (2006) 255-262
An Integration of Topographic Scheme and Nonlinear Diffusion Filtering Scheme for Fingerprint Binarization

Xuying Zhao, Yangsheng Wang, Zhongchao Shi, and Xiaolong Zheng

Institute of Automation, Chinese Academy of Sciences, No. 95 Zhongguancun East Road, Beijing, P.R. China
{xuying.zhao, yangsheng.wang, zhongchao.shi, xiaolong.zheng}@ia.ac.cn
Abstract. This paper proposes an approach to fingerprint binarization integrating a nonlinear diffusion filtering scheme and a topographic scheme, in which the properties of the essential flow-like patterns of fingerprints are deliberately analyzed from different points of view. The filtering scheme is based on the coherent structures, while the topographic scheme is based on the analysis of the underlying 2D surface. The fingerprint image is smoothed along the coherent structures and binarized according to the sign of the trace of the Hessian matrix. The integration method is tested with a series of experiments and the results reveal the good performance of our algorithm.
1 Introduction

Although several schemes have been proposed to extract features directly from the grey-level fingerprint image [1,2], the extraction process is generally intractable because of the noise generated by such factors as the presence of scars, variations of the pressure between the finger and the acquisition sensor, wear artifacts, the environmental conditions during the acquisition process, and so forth. Therefore, an input gray-scale fingerprint image is first transformed by the enhancement algorithm into a binary representation of the ridge pattern, called the binary ridge-map image [3], to reduce the noise present in the image and detect the fingerprint ridges. Fingerprint image binarization, which classifies each pixel into ridge or valley regions, heavily influences the performance of the feature extraction process and hence of the overall automated fingerprint identification system. The binary image obtained is then used by subsequent processes to extract features such as detecting and classifying the minutiae points. Most of the proposed fingerprint binarization methods [4,5,6] require a global or local threshold to discriminate between ridge and valley regions, where the threshold is more or less arbitrarily chosen based on a restricted set of images. Wang made the observation that if we consider the grey-scale image to be a surface, then its topographical features correspond to shape features of the original image. He investigated the properties of geometric features in the
context of OCR and gave an analysis of the practicality and effectiveness of using geometric features for text character recognition [7]. Similarly, Tico proposed a method of fingerprint binarization based on the topographic properties of the fingerprint image [8]. In Tico's scheme, the discrete image is treated as a noisy sampling of an underlying continuous surface, and ridge and valley regions are discriminated by the sign of the maximum normal curvature of this surface. In fact, we observed that a point of a fingerprint can be classified as a ridge point or a valley point by a property of the surface with no need to calculate the normal curvature. Also, the assumption of a continuous surface is often invalid for fingerprint images of poor quality. Consequently, fingerprint image enhancement is the first step in our recognition algorithm, to reduce noise and increase the contrast between ridges and valleys in the gray-scale fingerprint images. The essential question is how to enhance flow-like patterns so as to improve the quality of the fingerprint without destroying semantically important singularities such as the minutiae. The problem has been addressed by a multiscale simplification of the original image, embedding it into a scale-space in order to obtain a subsequently coarser and more global impression of the main flow-like structures. The idea of scale-space filtering, derived from the multiscale description of images, has been introduced, well developed and widely used in computer vision [9,10,11,12,13,14]. As far as fingerprint images are concerned, such a scale-space should take into account the coherence of the structures by smoothing mainly along their preferred orientation instead of perpendicular to it [14]. The technique of coherence-enhancing anisotropic diffusion filtering combines ideas of nonlinear diffusion filtering with orientation analysis by means of the structure tensor. Weickert [15] also pointed out that the direction sensitivity constitutes an additional problem for the design of appropriate algorithms for diffusion filtering that had not been addressed in the computer vision literature before. The difficulty can be handled by use of specific first-order derivative filters that have been optimized with respect to rotation invariance [16]. In this paper, we first present the approach for nonlinear diffusion filtering with optimized rotation invariance in Section 2. In Section 3, we introduce our method of fingerprint binarization based on the properties of geometric features. Experimental results for the integration of the two schemes are given in Section 4. Finally, we present some concluding remarks in Section 5.
2 Scheme for Filtering

The essential idea of the approach to scale-space filtering can be briefly described as follows. Embed the original image in a family of derived images u(x, y, t) obtained by convolving the original image u0(x, y) with a Gaussian kernel G(x, y; t) of variance t:
$$u(x, y, t) = u_0(x, y) * G(x, y; t) . \qquad (1)$$
Larger values of t, the scale-space parameter, correspond to images at coarser resolutions.
The one-parameter family of derived images may equivalently be viewed as the solution of the heat diffusion equation:
$$\begin{cases} \frac{\partial u(x,y,t)}{\partial t} = \Delta u(x,y,t) & (x,y) \in \Omega,\ t > 0 , \\ u(x,y,0) = u_0(x,y) & (x,y) \in \Omega , \\ \frac{\partial u(x,y,t)}{\partial n} = 0 & (x,y) \in \partial\Omega,\ t > 0 . \end{cases} \qquad (2)$$
Upon analyzing flow-like patterns, numerous nonlinear diffusion filters have been proposed, most of which use a scalar diffusivity. Weickert surveyed structure-tensor methods for describing coherence in images and constructed a coherence-enhancing diffusion which smooths along coherent flow-like structures [14]. This approach to nonlinear diffusion filtering enables truly anisotropic behaviour by adapting the diffusion process not only to the location, but also allowing different smoothing in different directions.

2.1 Nonlinear Diffusion Filtering
Denote the fingerprint image as I, with pixels I(x, y). The principle of nonlinear diffusion filtering is as follows. We calculate a processed version u(x, y, t) of I(x, y) with a scale parameter t ≥ 0 as the solution of the diffusion equation with I as initial condition and reflecting boundary conditions:
$$\begin{cases} \partial_t u = \mathrm{div}(D\nabla u) & I(x,y) \in I,\ t > 0 , \\ u(x,y,0) = I(x,y) & I(x,y) \in I,\ t = 0 , \\ \langle D\nabla u, n \rangle = 0 & I(x,y) \in \Gamma,\ t > 0 . \end{cases} \qquad (3)$$
Hereby, n denotes the outer normal and ⟨·,·⟩ the usual inner product, while Γ is the boundary of the image I and D is the symmetric positive definite diffusion tensor. For the purpose of fingerprint enhancement, we should choose the diffusion tensor D as a function of the local image structure, i.e. the structure tensor Jρ(∇uσ), to adapt the diffusion process to the image itself. The structure tensor can be obtained by convolving the tensor product of the vector-valued structure descriptor ∇uσ with a Gaussian Kρ:
$$J_\rho(\nabla u_\sigma) = \begin{bmatrix} j_{11} & j_{12} \\ j_{21} & j_{22} \end{bmatrix} = K_\rho * (\nabla u_\sigma \otimes \nabla u_\sigma) , \qquad (4)$$
where the parameter σ is called the local scale, and the integration scale ρ reflects the characteristic size of the fingerprint image. The symmetric matrix Jρ is positive semidefinite and possesses orthonormal eigenvectors w1, w2 with
$$w_{1,2} \parallel \begin{pmatrix} 2 j_{12} \\ j_{22} - j_{11} \pm \sqrt{(j_{11} - j_{22})^2 + 4 j_{12}^2} \end{pmatrix} \quad \text{(normalized to unit length)} \qquad (5)$$
if j11 ≠ j22 or j12 ≠ 0. The corresponding eigenvalues are
$$\mu_{1,2} = \frac{1}{2}\left( j_{11} + j_{22} \pm \sqrt{(j_{11} - j_{22})^2 + 4 j_{12}^2} \right) , \qquad (6)$$
where the + sign belongs to μ1. The difference
$$\mu_1 - \mu_2 = \sqrt{(j_{11} - j_{22})^2 + 4 j_{12}^2} \qquad (7)$$
measures the coherence within a window of scale ρ and plays an important role in the construction of the diffusion filter. To adapt to the local structure, the diffusion tensor D should possess the same eigenvectors w1, w2 as the structure tensor Jρ(∇uσ). So it can be given by
$$D(J_\rho(\nabla u_\sigma)) = \begin{bmatrix} a & b \\ b & c \end{bmatrix} = (w_1 \,|\, w_2) \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} \begin{bmatrix} w_1^T \\ w_2^T \end{bmatrix} . \qquad (8)$$
The eigenvalues of D are chosen as:
$$\lambda_1 = \alpha , \qquad \lambda_2 = \begin{cases} \alpha & \text{if } \mu_1 = \mu_2 , \\ \alpha + (1 - \alpha)\, e^{\frac{-\beta}{(\mu_1 - \mu_2)^{2m}}} & \text{else} . \end{cases} \qquad (9)$$
Herein, λ1 is given empirically by λ1 = α = 0.01, which defines the diffusion in the direction orthogonal to the ridge. λ2 is an increasing function of (μ1 − μ2)² with the restriction parameter β = 3, while m decides the speed of the diffusion process.
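As a sketch, Eqs. (4)-(9) can be assembled per pixel as follows; the parameter defaults, the epsilon guards and the isotropic fallback at structureless pixels are our assumptions:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def diffusion_tensor(u, sigma=0.5, rho=4.0, alpha=0.01, beta=3.0, m=1):
        """Entries a, b, c of D(J_rho(grad u_sigma)), Eqs. (4)-(9)."""
        ux = gaussian_filter(u, sigma, order=(0, 1))       # smoothed derivatives
        uy = gaussian_filter(u, sigma, order=(1, 0))
        j11 = gaussian_filter(ux * ux, rho)                # structure tensor, Eq. (4)
        j12 = gaussian_filter(ux * uy, rho)
        j22 = gaussian_filter(uy * uy, rho)
        disc = np.sqrt((j11 - j22) ** 2 + 4 * j12 ** 2)    # mu1 - mu2, Eq. (7)
        lam2 = np.where(disc > 0,
                        alpha + (1 - alpha) * np.exp(-beta / np.maximum(disc, 1e-12) ** (2 * m)),
                        alpha)                             # Eq. (9)
        # eigenvector w1 (for mu1): direction (2 j12, j22 - j11 + disc), Eq. (5)
        vx, vy = 2 * j12, j22 - j11 + disc
        nrm = np.sqrt(vx ** 2 + vy ** 2) + 1e-12
        vx, vy = vx / nrm, vy / nrm
        # D = alpha * w1 w1^T + lam2 * w2 w2^T with w2 = (-vy, vx), Eq. (8)
        a = alpha * vx ** 2 + lam2 * vy ** 2
        b = (alpha - lam2) * vx * vy
        c = alpha * vy ** 2 + lam2 * vx ** 2
        flat = disc < 1e-12                                # no preferred direction
        return (np.where(flat, alpha, a),
                np.where(flat, 0.0, b),
                np.where(flat, alpha, c))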
2.2 Filtering with Optimized Rotation Invariance
The first derivative operators with optimized rotation invariance [16] can be described as:
$$F_x = \frac{1}{32}\begin{bmatrix} -3 & 0 & 3 \\ -10 & 0 & 10 \\ -3 & 0 & 3 \end{bmatrix} \quad \text{and} \quad F_y = \frac{1}{32}\begin{bmatrix} 3 & 10 & 3 \\ 0 & 0 & 0 \\ -3 & -10 & -3 \end{bmatrix} . \qquad (10)$$
It has been shown that they approximate rotation invariance significantly better than related popular operators such as the Sobel operator. Now we can calculate the structure tensor Jρ(∇uσ) of Eq. (4) using the optimized derivative operators Fx, Fy of Eq. (10), and assemble the diffusion tensor D(Jρ(∇uσ)) of Eq. (8) as a function of the structure tensor. Decompose and rewrite the divergence operator in (3) as
$$j_1 = a\,\partial_x u + b\,\partial_y u , \qquad j_2 = b\,\partial_x u + c\,\partial_y u , \qquad \mathrm{div}(D\nabla u) = \partial_x j_1 + \partial_y j_2 . \qquad (11)$$
Thereby, the flux components j1, j2 and div(D∇u) are each calculated by means of the optimized derivative operators. Updating in an explicit way until the result is stable, or for a limited number of steps, we obtain the enhanced fingerprint image as the input of binarization.
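A sketch of one explicit update, using the operators of Eq. (10) for every derivative as the text prescribes; the time step tau is an assumption (it must be small enough for stability):

    import numpy as np
    from scipy.ndimage import convolve

    FX = np.array([[-3, 0, 3], [-10, 0, 10], [-3, 0, 3]]) / 32.0   # Eq. (10)
    FY = np.array([[3, 10, 3], [0, 0, 0], [-3, -10, -3]]) / 32.0

    def diffusion_step(u, a, b, c, tau=0.2):
        """One explicit step of Eq. (3) via the flux decomposition of Eq. (11)."""
        ux = convolve(u, FX, mode='nearest')
        uy = convolve(u, FY, mode='nearest')
        j1 = a * ux + b * uy
        j2 = b * ux + c * uy
        div = convolve(j1, FX, mode='nearest') + convolve(j2, FY, mode='nearest')
        return u + tau * div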
3 Binarization Based on Geometric Properties
An enhanced fingerprint image can be approximately regarded as a continuous two-dimensional surface defined mathematically by the equation z = u(x, y). The geometric properties at a certain point (x, y) are determined by the gradient vector ∇u and the Hessian matrix H computed at that point. The gradient vector ∇u is oriented in the direction of maximum change in the value of the image, i.e. of the two-dimensional function u(x, y), which is physically the same as in Section 2. The Hessian matrix is defined in terms of second-order partial derivatives:
$$H = \begin{bmatrix} \frac{\partial^2 u}{\partial x^2} & \frac{\partial^2 u}{\partial x \partial y} \\ \frac{\partial^2 u}{\partial y \partial x} & \frac{\partial^2 u}{\partial y^2} \end{bmatrix} . \qquad (12)$$
Let ω1, ω2 be the unit eigenvectors of H, and λ1, λ2 the corresponding eigenvalues with |λ1| ≥ |λ2|. λ1, λ2 are real and ω1, ω2 are orthogonal to each other because H is symmetric. H determines the normal curvature, the value of the second-order derivative of u(x, y) along a given direction ω, as follows:
$$\frac{\partial^2 u}{\partial \omega^2} = \omega^T H \omega , \qquad (13)$$
where the direction vector ω is expressed as a two-dimensional column vector. Consequently, the second directional derivative is extremized along the two directions defined by the Hessian eigenvectors ω1 and ω2, and λ1, λ2 are the corresponding extreme values of the normal curvature. The orthogonal directions ω1, ω2 are also called principal directions, whereas the normal curvatures λ1, λ2 are also called principal curvatures of the surface. Detailed mathematical descriptions of various topographic properties of two-dimensional surfaces are given in [7], based on the concepts of the gradient vector, Hessian eigenvectors and Hessian eigenvalues. In a fingerprint image, neighboring ridges and valleys have the same orientation in most of the image area and the gray level in every transversal section exhibits a rather sinusoidal shape. So we can conclude that the maximum principal curvature is given by λ1, due to the relationship between λ1 and λ2. Accordingly, the sign of λ1 can be used to discriminate between ridge and valley regions, i.e., a point in the fingerprint is classified as a ridge point if λ1 is positive or as a valley point if λ1 is negative. The trace of the 2 × 2 matrix H is defined by
$$\mathrm{Tr}(H) = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} . \qquad (14)$$
The following is a useful property of the trace:
$$\mathrm{Tr}(H) = \lambda_1 + \lambda_2 . \qquad (15)$$
Considering the relation |λ1| ≥ |λ2|, it is not hard to verify that the sign of λ1 is equal to that of Tr(H). Hence the fingerprint image can be binarized according to the sign of the trace of the Hessian.
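This yields a very short binarization rule; a sketch with finite differences (the derivative scheme is our choice, not the paper's):

    import numpy as np

    def binarize_by_trace(u):
        """Classify ridge/valley by the sign of Tr(H) = u_xx + u_yy (Eqs. 12-15)."""
        uxx = np.gradient(np.gradient(u, axis=1), axis=1)
        uyy = np.gradient(np.gradient(u, axis=0), axis=0)
        return (uxx + uyy) > 0        # True on ridge points (lambda_1 > 0)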
4 Experimental Results and Analysis
The method proposed in this paper has been tested on a public-domain collection of fingerprint images, DB3 Set A of FVC2002, and on our own database V20. The former contains 800 fingerprint images of size 300×300 pixels captured by a capacitive sensor from 100 fingers (eight impressions per finger). The latter consists of 4000 images from 200 fingers of people of different ages and job categories. The fingerprint images in our own database V20 were collected using a Veridicom capacitive sensor at a resolution of 500 dpi and quantized into 256 gray levels. Fig. 1 shows the results obtained with our scheme on some typical images from DB3 Set A and V20. We can see that our method is able to connect interrupted ridges effectively and eliminate most burrs and smudges.
Fig. 1. Fingerprint binarization examples. From left to right: the original images, the enhanced images and the binarized images obtained with our scheme.
5 Conclusions

In this paper, we introduce an approach to fingerprint binarization integrating a nonlinear diffusion filtering scheme and a topographic scheme, both of which build on an analysis of the flow-like patterns of fingerprint images. A series of experiments validates our algorithm, which takes advantage of both the nonlinear
diffusion process and geometric features. Additionally, the fingerprint enhancement can be iterated in an explicit way and stopped after very few steps in most cases, since the sign of the trace of the Hessian is already enough to discriminate ridge and valley regions. Therefore, the algorithm is computationally efficient and can be applied in on-line fingerprint verification systems.
References

1. Maio, D., Maltoni, D.: Direct Gray-Scale Minutiae Detection in Fingerprints. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 1 (1997) 27-40
2. Jiang, X., Yau, W.: Detecting the Fingerprint Minutiae by Adaptive Tracing the Gray-level Ridge. Pattern Recognition (2001) 999-1023
3. Tico, M., Onnia, V., Kuosmanen, P.: Fingerprint Image Enhancement Based on Second Directional Derivative of the Digital Image. EURASIP Journal on Applied Signal Processing (2002) 1135-1144
4. Moayer, B., Fu, K.S.: A Tree System Approach for Fingerprint Pattern Recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence (1986) 376-387
5. Wahab, A., Chin, S.H., Tan, E.C.: Novel Approach to Automated Fingerprint Recognition. IEE Proc. Vis. Image Signal Process (1998) 160-166
6. Nalini, K., Chen, S., Jain, K.: Adaptive Flow Orientation-Based Feature Extraction in Fingerprint Images. Pattern Recognition (1995) 1657-1672
7. Wang, L., Pavlidis, T.: Direct Gray-Scale Extraction of Features for Character Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 15, No. 10 (1993) 1053-1067
8. Tico, M., Kuosmanen, P.: A Topographic Method for Fingerprint Segmentation. Proceedings of 1999 International Conference on Image Processing, ICIP99, Kobe, Japan (1999) 36-40
9. Babaud, J., Witkin, A., Baudin, M., et al.: Uniqueness of the Gaussian Kernel for Scale-Space Filtering. IEEE Trans. Pattern Anal. Machine Intelligence, Vol. 8 (1986) 309-320
10. Yuille, A., Poggio, T.: Scaling Theorems for Zero Crossings. IEEE Trans. Pattern Anal. Machine Intelligence, Vol. 8 (1986) 150-158
11. Koenderink, J.: The Structure of Images. Biological Cybernetics, Vol. 50 (1984) 363-370
12. Hummel, A.: Representations Based on Zero-crossings in Scale-Space. Proc. IEEE Computer Vision and Pattern Recognition Conf. (1987) 204-209
13. Weickert, J.: Multiscale Texture Enhancement. Computer Analysis of Images and Patterns, Lecture Notes in Computer Science, Vol. 970 (1995) 230-237
14. Weickert, J.: Coherence-Enhancing Diffusion Filtering. International Journal of Computer Vision, Vol. 31, No. 2/3 (1999) 111-127
15. Weickert, J., Scharr, H.: A Scheme for Coherence-Enhancing Diffusion Filtering with Optimized Rotation Invariance. Journal of Visual Communication and Image Representation, Vol. 13 (2002) 103-118
16. Jähne, B., Scharr, H., Körkel, S.: Principles of Filter Design. Handbook on Computer Vision, Vol. 2: Signal Processing and Pattern Recognition, Academic Press, San Diego (1999) 125-152
An Intrusion Detection Model Based on the Maximum Likelihood Short System Call Sequence

Chunfu Jia1,2 and Anming Zhong1

1 College of Information Technology and Science, Nankai University, Tianjin 300071, China
2 College of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
[email protected], [email protected]
Abstract. The problem of intrusion detection based on sequences of system calls is studied. Using a Markov model to describe the transition rule of the system calls of a process, an intrusion detection model based on the maximum likelihood short system call sequence is proposed. During the training phase, the Viterbi algorithm is used to obtain the maximum likelihood short system call sequences, which form the normal profile database of a process. During the detecting phase, the system call sequence generated by a process is compared with the maximum likelihood sequences in its normal profile database to detect intrusions. Experiments reveal the good detection performance and quick computation speed of this model.
1 Introduction

With the rapid development of computer networks, intrusion detection systems (IDS) draw more and more attention from researchers and begin to take a critical role in many real systems. The first problem an intrusion detection system faces is the selection of source data. Previous research reveals that the sequence of system calls can reflect the essential action characteristics of a process and can be used as an effective type of source data. Research by Forrest [1] and Kosoresow [2] shows that the short sequences of system calls generated by a process at a certain length are stable and reliable, so the behaviour pattern of a process can be described by its short sequences of system calls. In the sequence time-delay embedding (STIDE [1]) model, the profile of normal behaviour is built by enumerating all unique, contiguous sequences of a predetermined, fixed length T in the training data. The Markov model was first introduced into the field of IDS by Denning [3]. Ye [4] used a Markov model to calculate the occurrence probability of a certain short system call sequence; if the probability is smaller than a threshold, anomalies are assumed to occur in the current process. The HMM (Hidden Markov Model) is also used to detect intrusions in the same manner. Zhong [5] studied the performance of HMMs through experiments and concluded that the first-order HMM has better intrusion detection performance than the second-order HMM.
Based on the Markov model of Ye [4] and the short system call sequence model of Forrest [1], we present a new intrusion detection model using maximum likelihood short system call sequences. In this model, a Markov chain is used to describe the transition rule of system calls. Two phases are involved: the training phase and the detecting phase. During the training phase, the Viterbi algorithm is used to obtain the maximum likelihood short system call sequences, from which the normal profile database of a process is built. During the detecting phase, the system call sequences generated by a process are compared with the maximum likelihood sequences in its normal database, and the difference is used to judge whether the current process is normal. The remaining parts of this paper are organized as follows: Part 2 introduces the model and main algorithms, Part 3 introduces our experiments on this model, and in Part 4 the features of our model are discussed and some analyses of the experiment results are presented.
2 The Model and Main Algorithms

We use a Markov chain to describe the transition rule between system calls. The Markov chain is defined as:
$$\lambda = (S, \pi, A, N) , \qquad (1)$$
where S = {1, 2, …, N} is the set of states, each system call corresponding to a state; π = (πi)N with πi = P(s1 = i), i ∈ S, is the distribution vector of the initial states; A = (aij)N×N with aij = P(st+1 = j | st = i), i, j ∈ S, is the transition probability matrix; and N = |S| is the number of states.

Our model has five modules: the System Call Collecting Module, Pre-processing Module, Markov Training Module, Markov Testing Module and Outputting Module.

• System Call Collecting Module: Collects the system calls generated by a process to be used as the data source of intrusion detection. This module can be implemented by different technologies on different operating system platforms; for example, BSM can be used on Solaris and LKM on Linux.

• Pre-processing Module: Constructs the state set of the Markov chain from the system calls. Research by Ye [4] shows that taking every system call as a state of the Markov chain can obtain good detection performance, but in this way we would get too many states. Matthias [6] reported that the unpopular system calls are valuable for intrusion detection, so we cannot simply drop those unpopular system calls. In this model, we construct the Markov state set as follows (a small sketch of these steps is given below): 1) Scan through the system call sequence and count the occurrences of every system call. For each system call s, compute its frequency P(s) in the system call sequence. 2) Sort the system calls in descending order of frequency and give a serial number to each system call. 3) Compute the least integer N so that $\sum_{i=1}^{N-1} P(i) \ge c$ (where i is the serial number of a system call, and c is a preset probability value near 1, such as 0.99), then take every system call whose number is between 1 and N−1 as a state of the Markov
chain, and take all the other system calls (i.e., the unpopular system calls) as one state of the Markov chain. The Markov chain with N states is thus constructed. In these steps we do not discriminate between different unpopular system calls, but treat them all as one state of the Markov chain, so the state number of the Markov chain and the computation cost are both reduced. After pre-processing, the sequence of system calls is converted to a sequence of Markov states, denoted s1 s2 … st ….
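A sketch of the pre-processing steps above; the data layout (a list of system call names) is our assumption:

    from collections import Counter

    def build_state_sequence(calls, c=0.99):
        """Map system calls to Markov states: the most frequent calls get their
        own state; all remaining (unpopular) calls share the single state N."""
        freq = Counter(calls)
        total = float(len(calls))
        ranked = [s for s, _ in freq.most_common()]
        cum, popular = 0.0, []
        for s in ranked:                     # smallest N-1 with cumulative P >= c
            if cum >= c:
                break
            cum += freq[s] / total
            popular.append(s)
        state = {s: i + 1 for i, s in enumerate(popular)}
        N = len(popular) + 1                 # one shared state for the rest
        return [state.get(s, N) for s in calls], N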
• Markov Training Module: Responsible for establishing the normal rule database of each process, which is composed of the maximum likelihood short state sequences. This module works as follows: 1) Compute the transition matrix A = (aij)N×N of the Markov chain:
$$a_{ij} = \frac{n_{ij}}{n_i} , \qquad (2)$$
where aij is the transition probability from state i to state j; nij is the number of observation pairs st, st+1 with st in state i and st+1 in state j; and ni is the number of observation pairs st, st+1 with st in state i and st+1 in any of the states 1, 2, …, N. 2) Use the Viterbi algorithm to calculate the maximum likelihood short state sequence for each process. The maximum likelihood sequence starting with state s, of length T, can be denoted Os = s, s2, …, sT. By the Markov property, we have
$$P(O_s \mid \lambda) = P(s, s_2, \ldots, s_T \mid \lambda) = a_{s s_2}\, a_{s_2 s_3} \cdots a_{s_{T-1} s_T} . \qquad (3)$$
3) Since each term $a_{s_{t-1} s_t}$ in (3) is less than 1 (generally significantly less than 1), as T gets bigger (e.g., 10 or more), P(Os | λ) heads exponentially to zero. For sufficiently large T (e.g., 20 or more) the dynamic range of the P(Os | λ) computation will exceed the precision range of essentially any machine. A reasonable way of performing the computation is by logarithms. By defining
$$U(s, s_2, \ldots, s_T) = -\left[\ln a_{s s_2} + \sum_{t=3}^{T} \ln a_{s_{t-1} s_t}\right] , \qquad (4)$$
we can get
$$P(O_s \mid \lambda) = \exp\big(-U(s, s_2, \ldots, s_T)\big) . \qquad (5)$$
The maximum likelihood state sequence (starting with state s, of length T) Os = s, s2, …, sT should satisfy the following equation:
$$O_s = \arg\max_{\{s_t\}_{t=2}^{T}} P(s, s_2, \ldots, s_T \mid \lambda) = \arg\min_{\{s_t\}_{t=2}^{T}} U(s, s_2, \ldots, s_T) , \qquad (6)$$
We define ωij (the weight from state i to state j) as ωij = −ln(aij). Then the problem of finding the maximum likelihood state sequence is converted to the problem of finding the shortest state path through a directed weighted graph, which can be solved by the Viterbi algorithm. To discuss the Viterbi algorithm in detail, we introduce two parameters δt(j) and ψt(j) (j ∈ S), where
$$\delta_t(j) = \max_{s_2, \ldots, s_{t-1}} P(s, s_2, s_3, \ldots, s_{t-1}, s_t = j \mid \lambda) , \qquad (7)$$
i.e., δt(j) is the best score (least accumulated weight) along a single path at time t, which accounts for the first t states and ends in state j. By induction we have
$$\delta_t(j) = \min_{1 \le i \le N} \big( \delta_{t-1}(i) + \omega_{ij} \big) . \qquad (8)$$
To actually retrieve the state sequence, we need to keep track of the argument which minimized (8), for each t and j. We do this via the array ψt(j). The complete procedure for finding the maximum likelihood state sequences can now be stated in pseudocode as follows:

    for each s ∈ S {
        for (i = 1; i <= N; i++) {                    // Initialization
            δ2(i) = ωsi;  ψ2(i) = s;
        }
        for (t = 3; t <= T; t++)                      // Recursion
            for (j = 1; j <= N; j++) {
                δt(j) = min_{1≤i≤N} (δt−1(i) + ωij);
                ψt(j) = argmin_{1≤i≤N} (δt−1(i) + ωij);
            }
        sT = argmin_{1≤i≤N} (δT(i));                  // Termination
        for (t = T−1; t >= 2; t--)                    // Path backtracking
            st = ψt+1(st+1);
        Os = s, s2, …, sT
    }
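A runnable Python rendering of this pseudocode, vectorized over states (a sketch; the tiny additive constant guarding log(0) is our addition, and states are numbered 0..N−1):

    import numpy as np

    def max_likelihood_db(A, T):
        """For each start state s, the maximum likelihood sequence of length T
        under transition matrix A."""
        W = -np.log(A + 1e-300)                 # weights w_ij = -ln a_ij
        N = A.shape[0]
        db = {}
        for s in range(N):
            delta = W[s].copy()                 # delta_2(i) = w_si
            psi = []                            # psi[k] holds the array psi_{k+3}
            for _ in range(3, T + 1):           # recursion for t = 3..T
                cand = delta[:, None] + W       # delta_{t-1}(i) + w_ij
                psi.append(cand.argmin(axis=0))
                delta = cand.min(axis=0)
            seq = [int(delta.argmin())]         # s_T: termination
            for back in reversed(psi):          # backtracking: s_t = psi_{t+1}(s_{t+1})
                seq.append(int(back[seq[-1]]))
            seq.append(s)                       # the start state s_1 = s
            db[s] = seq[::-1]
        return db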
• Markov Testing Module: Using a sliding window of length T to scan the testing sequence of system calls generated by a process, we obtain a list of short sequences of system calls. Each short sequence is compared with the maximum likelihood short sequence in the normal rule database starting with the same state. If the number of differing states is over a set threshold V (between 1 and T), the short sequence is considered an anomaly; if it is under V, the short sequence is considered normal. Define the supporting probability of a testing sequence as the ratio of the number of normal short sequences to the number of all short sequences. If the supporting probability of a testing sequence is smaller than a threshold, its corresponding process is considered an intrusion.

• Outputting Module: Outputs the testing result, i.e., whether the process under test is an intrusion.
3 Experiments

To test the performance of our model, we performed experiments on the UNM dataset, a testing bed for intrusion detection used by Forrest [1] at the University of New Mexico, which can be downloaded from http://www.cs.unm.edu/~immsec/
systemcalls.htm. The UNM dataset is composed of sequences of system calls generated by privileged processes, including different kinds of programs (e.g., programs that run as daemons and those that do not), programs that vary widely in their size and complexity, and different kinds of intrusions (e.g., buffer overflows, symbolic link attacks, and Trojan programs). Our experiments only use the data of CERT sendmail (a subset of the UNM dataset), which was collected at UNM on Sun SPARC stations running unpatched SunOS 4.1.1 and 4.1.4 with the included sendmail; it comprises a normal trace file, four syslogd intrusion trace files and two unsuccessful-intrusion trace files. All the trace files are composed of sequences of system calls. There are about 1.5 million system calls in the normal trace file.
Fig. 1. Testing Result of Normal Trace
Fig. 2. Testing Result of Abnormal Trace
Table 1. Comparisons of the maximum likelihood model with Forrest's STIDE

                           Time to build          Number of records in    Testing
                           normal rule database   normal rule database    time
Maximum likelihood model   280 min                203                     7.5 min
STIDE                      126 min                948                     36 min

Fig. 3. ROC (detection rate vs. false positive rate) for the optimum N and T
714
C. Jia and A. Zhong
concluded that the smaller the supporting probability of an abnormal process, the better performance of our model. We calculated the supporting probability of all processes in the normal trace file and in the six abnormal trace files with different T and N. The average supporting probability of processes in the normal trace file is shown in Fig. 1, and the average supporting probability of processes in the abnormal trace files is shown in Fig. 2. The running time and the number of rules in normal database of our model are compared with Forrest’s STIDE [1] in table 1. The experiments are performed on an Intel Pentium 2.0G computer with 256M RAM.
4 Discussions and Conclusions It can be seen in Fig. 1. that our maximum likelihood model have stable supporting probability for normal processes, generally over 0.80, only declines a bit when T is smaller than 4. The supporting probability for abnormal process varies sensitively according to T and N (smaller than 0.22 when N=8 and T is between 7 and 9). The experiment results with N=8 and T varies form 7 to 9 are shown in ROC[4] curve in Fig. 3, from which we can see that our model can effectively distinguish abnormal process from normal process with optimally selected parameters N and T. As in shown in table 1, the maximum likelihood model has much smaller normal profile database compared with STIDE, so the memory consumption is reduced and the testing speed is greatly accelerated. Although building the normal profile database is computationally expensive, we can do it offline. Once the normal profile database has been built, the testing phase is fast and could be acceptable for online detection.
Acknowledgments This work is supported by the Tianjin Science and Technology Developing Plan Projects of China (Grant 033800611, 05YFGZGX24200).
References

1. Forrest, S., Hofmeyr, S.A., Longstaff, T.A.: A Sense of Self for UNIX Processes. Proc. IEEE Symposium on Security and Privacy, Los Alamitos, CA (1996) 120-128
2. Kosoresow, A.P., Hofmeyr, S.A.: Intrusion Detection via System Call Traces. IEEE Software 11 (1997) 35-42
3. Denning, D.: An Intrusion Detection Model. IEEE Transactions on Software Engineering 13 (1981) 118-132
4. Ye, N., Li, X., Chen, Q.: Probabilistic Techniques for Intrusion Detection Based on Computer Audit Data. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans 31 (2001) 266-274
5. Zhong, A.M., Jia, C.F.: Study on the Applications of Hidden Markov Model to Computer Intrusion Detections. Proc. of the 5th WCICA, Hangzhou, China (2004) 4352-4356
6. Schonlau, M., Du, M.W., Ju, W.H.: Computer Intrusion: Detecting Masquerades. Statistical Science 16 (2001) 1-17
Analysis of Shell Texture Feature of Coscinodiscus Based on Fractal Feature

Guangrong Ji1, Chen Feng1, Shugang Dong2, Lijian Zhou1, and Rui Nian1

1 College of Information Science and Engineering, Ocean University of China, Qingdao, 266003, China
[email protected], [email protected], [email protected], [email protected]
2 Division of Life Science and Technology, Ocean University of China, Qingdao, 266003, China
[email protected]
Abstract. In this paper, we propose a method for extracting the shell texture feature of the coscinodiscus. According to the characteristics of these textures, we use the local fractal dimension (LFD) matrix based on the extended fractional Brownian motion (FBM) model as the texture feature to help recognize the species of the coscinodiscus. Experiments show that the method is effective.
1 Introduction

Coscinodiscus, a genus of algae, includes many species. At present, different species are recognized manually under a microscope according to the cell figure and the size and arrangement of the lecolus on the shell [1]. But because of the complex structure of the coscinodiscus, manual recognition is very time-consuming. By observing the images, we find that the lecolus' arrangement on the shell of different species of the coscinodiscus presents different texture features, so we propose a new method using the shell texture as the feature to help the recognition of the coscinodiscus. Texture refers to the intrinsic properties that represent the surface of an object. Texture analysis has played an important role in many areas such as medical imaging and remote sensing, and its tasks are mainly classification, segmentation, and synthesis. The methods for extracting texture features can be divided into four categories: statistical methods (GLCM, RLM, SVD), structural methods, model-based methods (AR, MRF, fractal), and transform-based methods (FFT, wavelet) [2]. According to the characteristics of the shell texture of the coscinodiscus, we choose the fractal dimension (FD) to characterize the texture features. Fractal refers to entities, especially sets of pixels, which display a degree of self-similarity at different scales. The fractal dimension, used in this paper to discriminate different textures, is a parameter that can depict this self-similarity. There are many methods available to estimate the FD of a texture image. Peleg et al. proposed the blanket method for estimating the FD [3]. Pentland first regarded the gray distribution of the image as fractional Brownian motion (FBM) to estimate the FD [4].
FD estimation by box-counting has been used by Keller et al. to segment texture [5]. Chaudhuri et al. modified the box-counting method to make texture segmentation more efficient [6]. The FD estimated by the blanket method, the box-counting method and FBM cannot reflect the directional information of the image. Literature [7] extended FBM to the horizontal, vertical and diagonal directions and estimated the FD in these three directions. Because most of the shell textures of the coscinodiscus radiate from the shell center to the shell edge, we choose the FD based on the extended FBM model to depict the shell texture features. In Section 2, we explain the method for estimating the FD based on the extended FBM model. In this paper, we additionally add the FD in the asymmetric-diagonal direction, which makes the directional information more complete. At the same time, a new method for the extraction of the shell texture features is proposed, in which the LFD matrix is introduced to describe the shell texture features better. We present the new method in detail in Section 3. In Section 4 the shell texture images are used in experiments and the results are analyzed. In the last section we reach a conclusion and put forward the further work that needs to be done.
2 FD Estimation Based on the Extended FBM Model

The 2-D discrete FBM model can represent the gray field of a natural texture image. Let $I(x, y)$ be the gray value of the image at pixel $(x, y)$. According to the characteristics of FBM, the increment of the gray value is a strictly stationary process, which can be described as follows:

$$E\left[\,\left|I(x_2, y_2) - I(x_1, y_1)\right|\,\right] \propto \left[\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}\,\right]^{H}, \tag{1}$$

where $H$ is the Hurst coefficient, whose value ranges from 0 to 1. If we define $\Delta r = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$ and $\Delta I_{\Delta r} = \left|I(x_2, y_2) - I(x_1, y_1)\right|$, we can use the following equation to estimate the FD:

$$E(\Delta I_{\Delta r}) = K \Delta r^{H}, \tag{2}$$

where $E(\cdot)$ is the expectation operator, $\Delta r$ is the spatial distance, $\Delta I_{\Delta r}$ is the intensity variation, and $K$ is a scaling constant. By applying the log function to both sides of equation (2), we can deduce

$$\log\left(E(\Delta I_{\Delta r})\right) = H \log(\Delta r) + \log K. \tag{3}$$

$H$ can be obtained from a least-squares linear regression estimating the slope of $\log(E(\Delta I_{\Delta r}))$ versus $\log(\Delta r)$ between chosen bounds $\Delta r_{\min}$ and $\Delta r_{\max}$ [7]. Given the topological dimension $T_d$ (for an image, $T_d = 3$), the fractal dimension $D$ can be estimated from

$$D = T_d - H. \tag{4}$$
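As a rough illustration of equations (2)-(4), the sketch below estimates H by regressing log E(ΔI_Δr) on log Δr for horizontally displaced pixel pairs; the sampling scheme and parameter values are assumptions, not the authors' implementation. Restricting the displacement direction in this way is also how the directional FDs discussed next can be obtained.

```python
# Illustrative estimate of H, and D = T_d - H per equation (4), by
# least-squares regression of log E(ΔI_Δr) against log Δr.
import numpy as np

def fractal_dimension(img, r_min=1, r_max=8, td=3):
    img = np.asarray(img, dtype=float)
    radii, mean_dI = [], []
    for dr in range(r_min, r_max + 1):
        # Average absolute intensity increment at horizontal distance dr.
        dI = np.abs(img[:, dr:] - img[:, :-dr]).mean()
        if dI > 0:
            radii.append(dr)
            mean_dI.append(dI)
    H, log_K = np.polyfit(np.log(radii), np.log(mean_dI), 1)  # slope = H
    return td - H

rng = np.random.default_rng(0)
print(fractal_dimension(rng.normal(size=(64, 64))))  # rough texture: D near 3
```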
In the above method, we estimate only the FD of the whole image, considering the texture features of all directions synthetically. By estimating the FD in the horizontal, vertical, diagonal and asymmetric-diagonal directions respectively, we can get another four values which reveal the texture features in these four directions.
3 How to Choose the Shell Texture Features

We use $D_h$, $D_v$, $D_d$ and $D_{ad}$ to denote the FD in each direction, and define $\vec{D} = (D, D_h, D_v, D_d, D_{ad})$ as the feature vector of the texture image. If we simply adopt $\vec{D}$ to depict the texture features, it is impossible to recognize the texture images effectively, because only five values cannot reflect the complex variety of the texture. So how to choose the shell texture features becomes a key question that we have to face.
Fig. 1. The shell texture images of 10 species of the coscinodiscus. (a) C. apiculatus var. ambigus (b) C. radiatus (c) C. oculus-iridis (d) C. nodulifer (e) C. debilis (f) C. jonesianus (g) C. argus (h) C. subtilis var. subtilis (j) C. asteromphalus var. asteromphalus (l) C. centralis.
Figure 1 shows the shell texture images of 10 species of the coscinodiscus. The holes shown in these figures are called lecolus, whose arrangement forms the shell texture. By observing all of the texture images of the coscinodiscus, we find that the lecolus' arrangement on the shell of every species radiates from the center in every direction. According to this characteristic, we adopt a new method for choosing the shell texture features. This method considers an image of size M × M, which is divided into four M/2 × M/2 quadrants. In every quadrant we then partition windows of different sizes in the upper-left, upper-right, lower-left and lower-right directions, using the central pixel (M/2, M/2) as the starting point. The size of these windows is s × s, where s = 2, 4, …, M/2. Estimating every window's FD, called the local fractal dimension (LFD), we get a set of feature vectors $\vec{D}$ that form a v × 5 feature matrix,
where $v = 4 \times \log_2(M/2)$. In general, we select window sizes from 8 or 16 up to M/2, because the 2 × 2 and 4 × 4 windows are too small to estimate the FD based on the extended FBM model effectively.
Fig. 2. The partition of the shell texture image
Using the 256 × 256 texture image shown in Figure 2 as an example, the size of windows 1, 5, 9, 13 is 16 × 16; the size of windows 2, 6, 10, 14 is 32 × 32; the size of windows 3, 7, 11, 15 is 64 × 64; and the size of windows 4, 8, 12, 16 is 128 × 128. We get the feature matrix of this shell texture as follows:

$$V = \begin{bmatrix} \vec{D}_1 \\ \vdots \\ \vec{D}_{16} \end{bmatrix}. \tag{5}$$
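A minimal sketch of how such a feature matrix might be assembled is given below; the `directional_fd` placeholder stands in for the extended-FBM estimator of Section 2 and simply returns dummy values here.

```python
# Sketch of building the LFD feature matrix V of equation (5): windows of
# doubling size grow from the image centre into the four quadrants, and each
# window contributes one five-element FD vector (D, Dh, Dv, Dd, Dad).
import numpy as np

def directional_fd(window):
    return np.full(5, 2.5)  # placeholder for the extended-FBM estimator

def lfd_feature_matrix(img, min_size=16):
    m = img.shape[0]
    c = m // 2
    rows = []
    s = min_size
    while s <= m // 2:
        rows.append(directional_fd(img[c - s:c, c - s:c]))  # upper-left
        rows.append(directional_fd(img[c - s:c, c:c + s]))  # upper-right
        rows.append(directional_fd(img[c:c + s, c - s:c]))  # lower-left
        rows.append(directional_fd(img[c:c + s, c:c + s]))  # lower-right
        s *= 2
    return np.vstack(rows)

V = lfd_feature_matrix(np.zeros((256, 256)))
print(V.shape)  # (16, 5): sizes 16, 32, 64, 128 in each of 4 quadrants
```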
4 Experiments

In this section, we do some recognition experiments to verify that the method for choosing the shell texture features is effective. We have 61 shell texture images (256 × 256) belonging to 17 species of coscinodiscus. For convenience, we number every image m (m = 1, …, 61). Every species of coscinodiscus has $N_h$ (2 ≤ $N_h$ ≤ 7, h = 1, …, 17) sample images. In the experiment, we first estimate the feature matrix ($V_1$, …, $V_{61}$) of every texture image. Then we choose texture image n as the recognized image, whose feature matrix is $V_n$, and compute the Euclidean distances between $V_n$ and the other images' feature matrices. The distance is denoted by $e_m = \|V_n - V_m\|$. We rank $e_1, \ldots, e_{61}$ from the smallest value to the largest and get a group of serial numbers that mark the location of $e_m$ in the arranged sequence, denoted by $num_m$. For every
species of coscinodiscus, we can get a value $S_h$ that reflects the similarity of every species to the species shown in image n. $S_h$ can be deduced by the following equation:

$$S_h = \frac{1}{N_h}\sum_{i} num_i, \tag{6}$$
where i runs over the indices of the images showing the textures of the h-th species of the coscinodiscus. We also rank $S_1, \ldots, S_{17}$ in ascending order. The smaller $S_h$ is, the earlier $S_h$ appears in the sequence, and the bigger the probability that the species shown in image n is the h-th species of the coscinodiscus. So if the species shown in image n is the j-th species, $S_j$ should be the first one in the sequence. The location of $S_j$ in the arranged sequence therefore indicates the validity of the method for choosing the shell texture features. We call the location of $S_j$ the accurate location. Using every texture image as the recognized image, we process them in the same way and find their accurate locations. Then we count the number of every accurate location, as shown in Table 1.

Table 1. The number of every accurate location

Accurate location            first   second  third   fourth  fifth   sixth   seventh  eighth  others
Number of accurate location  29      11      6       6       3       2       2        2       0
Ratio (%)                    47.53   18.03   9.84    9.84    4.92    3.28    3.28     3.28    0
From Table 1 we can observe that, when each of the 61 texture images is used as the recognized image, for 90% of them the index of the species they belong to appears among the subscripts of the first five $S_h$ in the arranged sequence. So for most recognized images we can reduce the range of species to be compared from 17 to 5, thereby saving much time. We also observe that the accurate locations of 52.47% of the images are not the first. The reason is that there are impurities in the seawater which influence the recognition. Figure 3 shows 7 shell texture sample images of C. oculus-iridis; this phenomenon is visible in some of the 7 images, especially in the fourth and fifth ones.
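For concreteness, the ranking procedure around equation (6) can be sketched as follows; the feature matrices and species labels are toy data, and the Frobenius norm is assumed as the matrix distance.

```python
# Sketch of the recognition procedure: distances between feature matrices are
# ranked, per-species scores S_h are formed as in equation (6), and the species
# with the smallest S_h is the best candidate.
import numpy as np

def species_scores(feature_mats, labels, query_idx):
    dists = [np.linalg.norm(feature_mats[query_idx] - V) for V in feature_mats]
    order = np.argsort(dists)                  # ascending distance
    num = np.empty(len(dists), dtype=int)
    num[order] = np.arange(1, len(dists) + 1)  # rank num_m of each image
    scores = {}
    for h in set(labels):
        idx = [m for m, lab in enumerate(labels) if lab == h]
        scores[h] = sum(num[m] for m in idx) / len(idx)  # equation (6)
    return sorted(scores.items(), key=lambda kv: kv[1])  # ascending S_h

rng = np.random.default_rng(1)
mats = [rng.normal(size=(16, 5)) for _ in range(8)]
labels = [0, 0, 1, 1, 1, 2, 2, 2]
# The species containing the query tends to rank first.
print(species_scores(mats, labels, query_idx=0))
```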
Fig. 3. 7 shell texture sample images of C. oculus-iridis
5 Conclusions

The results of the experiments show that the features extracted by our new method can reveal the characteristics of the shell texture of the coscinodiscus effectively. The method introduced in this paper is the first step of our task. In further work we will enhance the accuracy of the recognition by combining it with other features of the coscinodiscus.
Acknowledgements This research was fully supported by the National Natural Science Foundation of P. R. China (Grant No. 60572064).
References
1. Guo, Y. J., Qian, S. B.: Flore Algarum Marinarum Sinicarum Tomus V Bacillariophyta No. 1 Centricae. Science Press, Beijing (2003) 13-14
2. Bharati, M. H., Liu, J. J., MacGregor, J. F.: Image Texture Analysis: Methods and Comparisons. Chemometrics and Intelligent Laboratory Systems 72 (2004) 57-71
3. Peleg, S., Naor, J., Hartley, R., Avnir, D.: Multiple Resolution Texture Analysis and Classification. IEEE Trans. Pattern Anal. Mach. Intell. 6 (1984) 518-523
4. Pentland, A. P.: Fractal-Based Description of Natural Scenes. IEEE Trans. Pattern Anal. Mach. Intell. 6 (1984) 661-674
5. Keller, J., Crownover, R., Chen, S.: Texture Description and Segmentation Through Fractal Geometry. Comput. Vision Graphics Image Process. 45 (1989) 150-160
6. Chaudhuri, B. B., Sarkar, N., Kundu, P.: Improved Fractal Geometry Based Texture Segmentation Technique. Proc. IEE Part E 140 (1993) 223-241
7. Hu, J. Y., Zhang, T. Y., Zhang, C. M.: Texture Classification Using Fractional Brownian Motion and Probabilistic Neural Network. Journal of Electronics & Information Technology 26.3 (2004) 389-393
Associative Classification Approach for Diagnosing Cardiovascular Disease∗

Kiyong Noh1, Heon Gyu Lee2,∗∗, Ho-Sun Shon2, Bum Ju Lee2, and Keun Ho Ryu2

1 Korea Research Institute of Standards and Science, Korea
[email protected]
2 Database Laboratory, Chungbuk National University, Cheongju, Korea
{hglee, khryu}@dblab.chungbuk.ac.kr
Abstract. ECG is a test that measures the heart's electrical activity and provides valuable clinical information about the heart's status. In this paper, we propose a classification method for diagnosing heart disease patterns that extracts multi-parametric features by analyzing HRV from ECG, together with data preprocessing. The proposed method is an associative classifier based on the efficient FP-growth method. Since the volume of patterns produced can be large, we offer a rule cohesion measure that allows strong pruning of patterns in the pattern-generating process. We conduct an experiment for the associative classifier, which utilizes multiple rules, pruning, and biased confidence (the cohesion measure), on a dataset consisting of 670 participants distributed into two groups: normal people and patients with coronary artery disease.
1 Introduction

The most widely used signal in clinical practice is the ECG (electrocardiogram), which is frequently recorded and widely used for the assessment of cardiac function [1, 2]. ECG processing techniques have been proposed to support pattern recognition [1, 3, 4], parameter extraction, spectro-temporal techniques for the assessment of the heart's status [5], denoising, baseline correction and arrhythmia detection [6]. Control of the heart rate is known to be affected by the sympathetic and parasympathetic nervous systems. It is reported that Heart Rate Variability (HRV) is related to autonomic nerve activity and is used as a clinical tool to diagnose cardiac autonomic function in both health and disease. This paper provides a classification technique that can automatically diagnose Coronary Artery Disease (CAD) within the framework of ECG patterns and clinical investigations. Through the ECG pattern we are able to recognize features that well reflect either the existence or non-existence of CAD. Such features can be perceived through HRV analysis based on the following knowledge [7, 8]: 1. In patients with CAD, the reduction of cardiac vagal activity evaluated by spectral HRV analysis was found to correlate with the angiographic severity.
∗ This work was supported by the Regional Research Centers Program of the Ministry of Education & Human Resources Development in Korea, and the Korea Science and Engineering Foundation (#1999-2-303-006-3). ∗∗ Corresponding author.
2. The reduction of variance (standard deviation of all normal RR intervals) and of the low-frequency component of HRV seems related to an increase in chronic heart failure. The aim of this paper is to investigate the effectiveness and accuracy of a classification method for diagnosing ECG patterns. To achieve this purpose, we introduce an associative classifier that is extended from CMAR [13] by using a cohesion measure for pruning redundant rules. Our classification method uses multiple rules to predict the highest-probability classes for each record. The proposed associative classifier can also relax the independence assumption of some classifiers, such as NB (Naive Bayes) [9] and DT (Decision Tree) [10]. For example, NB makes the assumption of conditional independence; that is, given the class label of a sample, the values of the attributes are conditionally independent of one another. When the assumption holds true, NB is the most accurate in comparison with all other classifiers. In practice, however, dependences can exist between variables of real data. Our classifier can consider the dependences between the linear characteristics of HRV and clinical information. Finally, we implement our classifier and several different classifiers to validate their accuracy in diagnosing heart disease.
2 Feature Extractions for HRV Analysis

The ECG signals are recorded by electrocardiography and transmitted immediately to a PC, recording for 5 minutes. The sampling frequency for the ECG signals is 500 Hz. The direct-current component is excluded in the calculation of the power spectrum to remove the non-harmonic components in the very low frequency region (<0.04 Hz). The area of spectral peaks within the whole range of 0 to 0.4 Hz was defined as total power (TP), the area of spectral peaks within the range of 0.04 to 0.15 Hz as low-frequency power (LF), and the area of spectral peaks within the range of 0.15 to 0.4 Hz as high-frequency power (HF). The normalized low-frequency power (nLF = 100·LF/(TP−VLF)) is used as an index of sympathetic modulation, the normalized high-frequency power (nHF = 100·HF/(TP−VLF)) as an index of vagal modulation, and the low/high-frequency power ratio (LF/HF) as an index of sympathovagal balance. Table 1 shows the results of the extraction of HRV features from ECG, and an example of the ECG monitoring program and the feature extraction process from raw ECG is shown in Figure 1.
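A rough sketch of how the spectral features of Table 1 could be computed from a resampled RR series is shown below; the periodogram estimator, the 4 Hz resampling rate and the toy series are assumptions rather than the authors' exact pipeline.

```python
# Sketch of the HRV features of Table 1 from a uniformly resampled RR series.
import numpy as np

def band_power(freqs, psd, lo, hi):
    band = (freqs >= lo) & (freqs < hi)
    return psd[band].sum()

def hrv_features(rr_ms, fs=4.0):
    x = rr_ms - np.mean(rr_ms)                  # drop the DC component
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * len(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    tp = band_power(freqs, psd, 0.0, 0.4)
    vlf = band_power(freqs, psd, 0.0, 0.04)
    lf = band_power(freqs, psd, 0.04, 0.15)
    hf = band_power(freqs, psd, 0.15, 0.4)
    return {"TP": tp, "VLF": vlf, "LF": lf, "HF": hf,
            "LF/HF": lf / hf, "nLF": 100 * lf / (tp - vlf),
            "nHF": 100 * hf / (tp - vlf), "SDNN": np.std(rr_ms)}

t = np.arange(0, 300, 0.25)                     # 5 min resampled at 4 Hz
rng = np.random.default_rng(0)
rr = 800 + 20 * np.sin(2 * np.pi * 0.1 * t) + rng.normal(0, 1, t.size)
print(hrv_features(rr)["LF/HF"] > 1)            # LF-dominated series: True
```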
3 Associative Classifier Using FP-Growth

This section describes the classifier-building phase, which consists of discovering the set of all frequent class association rules (CARs). For the efficient discovery of CARs, we use an FP-growth-like method introduced in [12, 13]. The process of rule generation and pruning has much in common with that described in [13], but with the following modifications:
• The FP-growth method is extended to perform CAR generation.
• A rule cohesion measure is defined and used for rule ranking and pruning.
Table 1. Features of HRV analysis

TP      The variance of normal RR intervals in HRV over the 5 min (≤0.4 Hz)
VLF     Power in very low frequency range (≤0.04 Hz)
LF      Power in low frequency range (0.04~0.15 Hz)
HF      Power in high frequency range (0.15~0.4 Hz)
LF/HF   Ratio LF/HF
nLF     Normalized low frequency power (= 100·LF/(TP−VLF))
nHF     Normalized high frequency power (= 100·HF/(TP−VLF))
SDNN    Standard deviation of all NN (RR) intervals

Table 2. The result of feature selection

Data            Selected Features
HRV             TP, VLF, LF, HF, nLF, nHF, LF/HF, SDNN
Clinical Info.  Age, Hyper Blood Pressure, Diabetes Mellitus, Smoking, Old Myocardial Infarction, Ejection Fraction, Blood Glucose, Total Cholesterol, Triglyceride, Hyperlipidemia, Systolic Blood Pressure, Diastolic Blood Pressure
Fig. 1. Development of HRV analysis program
3.1 Construction of the FP-Tree

The extension of the FP-growth method to CAR generation was introduced in [13]. Our work in this direction is very similar but implements several optimizations.
1. The count value of every node in the FP-tree is replaced with the class distribution, where each element of the distribution stores the number of transactions of the current class containing the pattern from this node to the root.
2. Every item count value in the frequent-item header table of the FP-tree is likewise replaced with the item class distribution.
For a ruleitem to be inserted into the FP-tree, we require it to be a frequent ruleitem satisfying the class minimum support threshold $S_{min}$. Let D be a database containing |D| task-relevant transactions and $C = \{C_1, \ldots, C_n\}$ be a finite set of classes.
Definition 1. Class-support of pattern A in dataset D:

$$\mathrm{Class\text{-}Support}(A.\mathrm{sup}_i) = \frac{count(A, C_i)}{|C_i|} = P(A \cup C_i \mid C_i). \tag{1}$$
A ruleitem is class frequent for class C if its Class-Support ≥ $S_{min}$. Consider the database in Table 3. Let $S_{min}$ = 0.5. During the first data scan we find a sorted set L = {e:3, b:4, c:5} of items that are class frequent for at least one class. The sorting is in frequency-increasing order. The first transaction {C1: b, c, d} contains items b and c, which are class frequent with C1. By sorting them in L order we get a ruleitem {C1: b, c} and insert it into the FP-tree by creating new nodes for items b and c. The second transaction {C1: a, b, c} has item a, which is not class frequent with class C1. Since the ruleitem {C1: b, c} has a common prefix with the prefix path already in the tree, no new nodes are inserted and only the counts are updated. The third transaction is likewise sorted into {C2: e, b, c}. Item b is not frequent for class C2, so the ruleitem {C2: e, c} is inserted by creating new nodes. For the fourth transaction {C2: c, e, f}, we insert a ruleitem {C2: e, c} by just updating the counts. The last transaction is not class frequent with class C3. Let the minimum confidence be 0.5. We consider items starting from the most frequent item, c. For this item two potential rules are {c → C1} and {c → C2}, but the confidence of both rules is less than 0.5. The conditional prefix paths for item c are (b:{C1:2}) and (e:{C2:2}). The class distribution of each prefix path is the same as that of the corresponding node for item c. Both b and e are included in the conditional frequency table (e:{C2:2}), (b:{C1:2}). Both items are class frequent and hence are included in the corresponding conditional FP-tree {<(e:{C2:2})>, <(b:{C1:2})>}. Since the tree contains two branches, we consider all items in the conditional frequency table of item c in turn. We first verify whether the rule corresponding to ruleitem (b,c:{C1:2}) is confident. Its confidence is 0.5, so it is output. The prefix path of b is empty. Next we consider item e and the ruleitem (e,c:{C2:2}). The confidence of the rule (e,c → C2) is 2/3 and the rule is output. The prefix path of e is empty.
Table 3. Training dataset

ID   Attributes     Class
1    {b, c, d}      C1
2    {a, b, c}      C1
3    {b, c, e}      C2
4    {c, e, f}      C2
5    {a, b, c, e}   C3

Fig. 2. FP-tree for training data
Table 4. Mining the FP-tree

item  Conditional pattern base   Conditional FP-tree          Rules generated
c     (b:{C1:2}), (e:{C2:2})     <(e:{C2:2})>, <(b:{C1:2})>   <(b,c → C1), sup=1, conf=0.5>, <(e,c → C2), sup=1, conf=0.66>
b     ()                         <>                           <(b → C1), sup=1, conf=0.5>
e     ()                         <>                           <(e → C2), sup=1, conf=0.66>
3.2 Cohesion Measure

The cohesion measure is used for rule ranking. Rule ranking is needed to select the best CARs in the case of overlapping CARs.

Definition 2. For a rule $item_1, \ldots, item_n \rightarrow C$ of length n, rule cohesion is a ranking measure defined as

$$CO(item_1, \ldots, item_n, C) = \sqrt[n]{\frac{Count(item_1, \ldots, item_n, C)}{Count(item_1)\cdot\ldots\cdot Count(item_n)\cdot Count(C)}}, \tag{2}$$
where $Count(item_1, \ldots, item_n, C)$ is the number of transactions in which the itemset and the class occur together. The cohesion measure is higher if $item_1, \ldots, item_n$ and C co-occur together frequently and are less frequently encountered separately. Rule ranking guarantees that only the highest-rank CAR will be selected into the classifier. All patterns are ranked according to the following criteria.

Definition 3. Rule ranking: given two rules $R_i$ and $R_j$, $R_i > R_j$ if
1. $CO(R_i) > CO(R_j)$, or
2. $CO(R_i) = CO(R_j)$ but $sup(R_i) > sup(R_j)$, or
3. $CO(R_i) = CO(R_j)$ and $sup(R_i) = sup(R_j)$ but $length(R_i) > length(R_j)$, or
4. $CO(R_i) = CO(R_j)$, $sup(R_i) = sup(R_j)$ and $length(R_i) = length(R_j)$, but $R_i$ is generated earlier than $R_j$.

A rule $R_1$: {t → C} is said to be a general rule w.r.t. rule $R_2$: {t' → C'} if and only if t is a subset of t'. Given two rules $R_1$ and $R_2$, where $R_1$ is a general rule w.r.t. $R_2$, we prune $R_2$ if $R_1$ also has higher rank than $R_2$.
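A compact sketch of Definitions 2 and 3 follows; the n-th-root form mirrors equation (2) as reconstructed above, and the `birth` field standing for generation order is an assumed bookkeeping detail.

```python
# Sketch of the cohesion measure and the tie-breaking rule ranking.
def cohesion(item_counts, class_count, joint_count):
    n = len(item_counts)
    denom = float(class_count)
    for c in item_counts:
        denom *= c
    return (joint_count / denom) ** (1.0 / n) if denom else 0.0

def rank_key(rule):
    # Definition 3: cohesion, then support, then length, then earlier birth.
    return (rule["co"], rule["sup"], rule["len"], -rule["birth"])

r1 = {"co": cohesion([4, 5], 2, 2), "sup": 1, "len": 2, "birth": 0}
r2 = {"co": cohesion([3, 5], 2, 2), "sup": 1, "len": 2, "birth": 1}
print(max((r1, r2), key=rank_key) is r2)  # True: r2 has the higher cohesion
```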
4 Experiments and Results

Coronary arteriography is performed in patients with angina pectoris, unstable angina, previous myocardial infarction, or other evidence of myocardial ischemia. Patients with stenosis of luminal narrowing greater than 50% were recruited as the CAD group; the others were classified as the control group (normal). By using angiography, 390 patients with abnormal (CAD) and 280 patients with normal coronary arteries (control) were studied. The accuracy was obtained by using the methodology of stratified 10-fold

Table 5. Description of summary results

Classifier    Class     Precision  Recall  F-Measure  Root Mean Squared Error
Naïve Bayes   CAD       0.814      0.576   0.675      0.4825
              Control   0.659      0.862   0.747
C4.5          CAD       0.88       0.889   0.884      0.334
              Control   0.882      0.872   0.877
CBA           CAD       0.921      0.939   0.93       0.2532
              Control   0.935      0.915   0.925
CMAR          CAD       0.945      0.896   0.92       0.2788
              Control   0.889      0.941   0.914
Our Model     CAD       0.959      0.939   0.949      0.2276
              Control   0.938      0.957   0.947
cross-validation. We compare our classifier with NB [9] and state-of-the-art classifiers: the widely known decision tree induction method C4.5 [10]; an association-based classifier, CBA [11, 14]; and CMAR [13], a recently proposed classifier based on multiple association rules. We used precision, recall, F-measure and root mean squared error to evaluate the performance. The results are shown in Table 5. As can be seen from the table, our classifier outperforms NB, C4.5, CBA and CMAR. We are also satisfied with these experiments because our model proved more accurate than the Bayesian classifier and the decision tree, which make the assumption of conditional independence.
5 Conclusions

Most of the parameters employed in diagnosing diseases have both strong and weak points simultaneously. Therefore, it is important to provide multi-parametric indices for diagnosing these diseases in order to enhance the reliability of the diagnosis. The purpose of this paper is to develop an accurate and efficient classification algorithm to automatically diagnose cardiovascular disease. To achieve this purpose, we have introduced an associative classifier that is extended from CMAR by using a cohesion measure to prune redundant rules. With this technique, we can extract new multi-parametric features that are then used together with clinical information to diagnose cardiovascular disease. The accuracy and efficiency obtained by our classifier in the experiments are rather high. In conclusion, our proposed classifier outperforms other classifiers, such as NB, C4.5, CBA and CMAR, in regard to accuracy.
References
1. Cohen, A.: Biomedical Signal Processing. CRC Press, Boca Raton, FL (1988)
2. Coumel, P.: ECG: Past and Future. Annals NY Academy of Sciences, Vol. 601 (1990)
3. Pan, J.: A Real-time QRS Detection Algorithm. IEEE Trans. Biomed. Eng. 32 (1985) 230-236
4. Taddei, A., Costantino, G., Silipo, R.: A System for the Detection of Ischemic Episodes in Ambulatory ECG. Computers in Cardiology, IEEE Comput. Soc. Press (1995) 705-708
5. Meste, O., Rix, H., Caminal, P.: Ventricular Late Potentials Characterization in Time-frequency Domain by Means of a Wavelet Transform. IEEE Trans. Biomed. Eng. 41 (1994) 625-634
6. Thakor, N. V., Yi-Sheng, Z.: Applications of Adaptive Filtering to ECG Analysis: Noise Cancellation and Arrhythmia Detection. IEEE Trans. Biomed. Eng. 38 (1991) 785-794
7. Kuo, D., Chen, G. Y.: Comparison of Three Recumbent Positions on Vagal and Sympathetic Modulation Using Spectral Heart Rate Variability in Patients with Coronary Artery Disease. American Journal of Cardiology 81 (1998) 392-396
8. Guzzetti, S., Magatelli, R., Borroni, E.: Heart Rate Variability in Chronic Heart Failure. Autonomic Neuroscience: Basic and Clinical 90 (2001) 102-105
9. Duda, R., Hart, P.: Pattern Classification and Scene Analysis. John Wiley, New York (1973)
10. Quinlan, J.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo (1993)
11. Liu, B., Hsu, W., Ma, Y.: Integrating Classification and Association Rule Mining. In Proc. of the 4th International Conference on Knowledge Discovery and Data Mining (1998)
12. Han, J., Pei, J., Yin, Y.: Mining Frequent Patterns without Candidate Generation. In SIGMOD'00, Dallas, TX (2000)
13. Li, W., Han, J., Pei, J.: CMAR: Accurate and Efficient Classification Based on Multiple Association Rules. In Proc. of the 2001 International Conference on Data Mining (2001)
14. Kim, J. S., Lee, H. G., Seo, S., Ryu, K. H.: CTAR: Classification Based on Temporal Class-Association Rules for Intrusion Detection. In Proc. of the 4th International Workshop on Information Security Applications (2003) 101-113
Attentive Person Selection for Human-Robot Interaction

Diane Rurangirwa Uwamahoro1, Mun-Ho Jeong1, Bum-Jae You1, Jong-Eun Ha2, and Dong-Joong Kang3

1 Korea Institute of Science and Technology, {diana, mhjeong, ybj}@kist.re.kr
2 Seoul National University of Technology, [email protected]
3 Pusan National University, [email protected]
Abstract. We present a method that enables a robot to select, from multiple persons, the most attentive one for communication, and to give its attention to the selected person. Our approach is a common components-based HMM in which all HMM states share the same components. The common components are probability density functions of interaction distance and of people's head direction toward the robot. To cope with the fact that the number of people in the robot's field of view changes, the number of states with common components can increase and decrease in our proposed model. In the experiments we used a humanoid robot with a binocular stereo camera. The robot considers the people in its field of view at a given time and automatically shifts its attention to the person with the highest probability. We confirmed that the proposed system works well in selecting the attentive person to communicate with the robot. Keywords: Common components, Hidden Markov Model, attention.
1 Introduction

Human attention is essential for HRI (Human-Robot Interaction) as well as HCI (Human-Computer Interaction), since attention, sensed in many ways such as by audio signal processing and computer vision technologies, expresses human intention to communicate with robots or computers. The selection of the attentive person might be of little importance in single-person-to-robot interaction; however, it becomes critical in multiple-person-to-robot interaction, since it is the first step for interaction or communication. Human attention has been studied in the area of HCI. Ted Selker applied visual attentiveness from eye motion and eye gestures to drive some interfaces [1,2]. C. J. Lee et al. designed the Attention Meter, a vision-based input toolkit which measures attention using camera-based input and was applied to an interactive karaoke space and an interactive night market [3]. Their primary concerns are
how to measure attentiveness or how to apply it to interactive user interfaces, but not how to select an attentive person for interaction. There have been some studies dealing with human attention for multiple-person-to-robot interaction. Tasaki et al. proposed a method that enables robots to communicate with multiple people using a selection priority for the interactive partner based on the concept of proxemics [4]. Lang et al. presented an attention system for a mobile robot that enables the robot to shift its attention to the person of interest and to maintain attention during interaction [5]. In both systems, however, the attentive person selection just used simple rules, like digital gates combining sound localization with people detection. With such treatment of the locations of sound and people, they fail to capture attentiveness along with the continuousness and uncertainty of clues such as how far people are from the robot and who is talking. The difficulties in the assessment of attention and the selection of the attentive person result from the following: first, we should choose measurable features adequate for the assessment of attentiveness in HRI. Second, it is difficult to measure the features due to environmental noise. Third, we should consider the history of features and the uncertainty of features. Lastly, the selection of the attentive person for interaction depends not only on the above features but also on the interaction strategies and the complex atmosphere of conversation, as in human-to-human interaction. In this paper we present a method to select the most attentive person for multiple-person-to-robot interaction. To express the continuousness and uncertainty of the attentive clues, we use probability density functions called common components, which represent human attentiveness related to head direction, distance to the robot, sound location, body motion, voice recognition, etc. The robot is in a dynamic environment where the number of people in its field of view is variable and people take actions to communicate with the robot over time. To model the variable presence of people and the selection of the most attentive person, we suggest a common components-based HMM (Hidden Markov Model) incorporating common components into an HMM framework. In this model a person corresponds to a state that can vanish and appear. Therefore, the optimal state with the maximum likelihood at a certain time represents the most attentive person, with the highest intention to communicate with the robot. The remainder of this paper is organized as follows. In the following section we explain the common components-based HMM proposed in this paper. In Section 3 we give the application of the common components-based HMM to most attentive person selection. Section 4 concludes this paper.
2 Common Components-Based HMM

A Hidden Markov Model (HMM) is a stochastic process in which a time series is generated and analyzed by a probability model [6]. HMMs and their modified types
Fig. 1. Common Component
Fig. 2. Complement of Common Component
have been widely used in areas of HRI such as gesture recognition [7,8,9], face recognition [10] and speech recognition [11]. A conventional HMM has a fixed number of states representing probabilistic distributions, and transition probabilities between the states. This property expresses complex distributions of time series well, but sets limits on applications in which the number of states changes over time. The common components-based HMM proposed in this section can overcome this limitation, since it utilizes an appointed rule forming states with probabilistic components shared by all states.

2.1 State with Common Components
Common components, $f(c_k)$, are probability density functions of observations and compose the basis functions of the states as follows:

$$p(x) = f(c_1) f(c_2) \ldots f(c_K), \quad x = (c_1, c_2, \ldots, c_K)^T, \tag{1}$$

where K is the size of the measurement vector x. $f(c_k)$ is illustrated in Fig. 1 and should satisfy the condition of a probability density function (pdf), mathematically expressed as

$$\int_a^b f(c_k)\,dc_k = 1, \quad 1 \le k \le K. \tag{2}$$

The section between a and b represents the sensing scope. We also define the complement of the basis function of the states,

$$\bar{p}(x) = \bar{f}(c_1) \bar{f}(c_2) \ldots \bar{f}(c_K). \tag{3}$$

$\bar{f}(c_k)$ is shown in Fig. 2, where M is a constant and $\bar{f}(c_k)$ is obtained by

$$\bar{f}(c_k) = M - f(c_k), \quad \int_a^b \bar{f}(c_k)\,dc_k = 1. \tag{4}$$
Based on the basis function of the state, we define the probabilistic distributions of the states by
$$\begin{aligned}
P(o_t \mid q_t = s_1) &= p(x_{s_1})\,\bar{p}(x_{s_2})\,\bar{p}(x_{s_3}) \ldots \bar{p}(x_{s_{N_t}}),\\
P(o_t \mid q_t = s_2) &= \bar{p}(x_{s_1})\,p(x_{s_2})\,\bar{p}(x_{s_3}) \ldots \bar{p}(x_{s_{N_t}}),\\
&\;\;\vdots\\
P(o_t \mid q_t = s_{N_t}) &= \bar{p}(x_{s_1})\,\bar{p}(x_{s_2}) \ldots \bar{p}(x_{s_{N_t-1}})\,p(x_{s_{N_t}}).
\end{aligned} \tag{5}$$
where $q_t \in \{s_1, s_2, \ldots, s_{N_t}\}$ is a state variable, $o_t = (x_{s_1}, x_{s_2}, \ldots, x_{s_{N_t}})^T$ is the observation vector and $N_t$ is a variable giving the number of states. Equations (1) and (5) show the good scalability of states based on the common components. When the number of states changes, the probabilistic distributions of the states are easily updated by the rule shown in equation (5), noting that the observation vector does not have a fixed size, due to $N_t$.
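The construction of equations (1)-(5) can be sketched as follows; for brevity the complement is applied per person rather than per measurement component, and the component values are toy numbers.

```python
# Sketch of equation (5): each person/state contributes its own component
# unbarred and every other person's component through the complement M - p.
import numpy as np

def state_likelihoods(component_vals, big_m=2.0):
    """component_vals[i] stands for p(x_{s_i}) of person i."""
    p = np.asarray(component_vals, dtype=float)
    p_bar = big_m - p                  # complement, equation (4) style
    lik = np.empty_like(p)
    for i in range(len(p)):            # one row of equation (5) per state
        others = np.delete(p_bar, i)
        lik[i] = p[i] * np.prod(others)
    return lik

print(state_likelihoods([1.6, 0.4, 0.9]))  # the first person dominates
```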
2.2 Optimal State Selection
We defined the states of the common components-based HMM in the previous section. The other elements needed to constitute the common components-based HMM are similar to the typical ones of an HMM, such as the initial state probabilities $\pi_s$ and the state transition probabilities $A_{s_i s_j}$. A small difference from the conventional HMM is that those elements must be updated according to changes in the number of states. The Viterbi algorithm is used to find the most probable sequence of hidden states maximizing $P(Q_t, O_t)$, where $Q_t = \{q_1 q_2 \ldots q_t\}$ and $O_t = \{o_1 o_2 \ldots o_t\}$. Fortunately, the Viterbi algorithm extends to the proposed model through the use of $N_t$ instead of the fixed number of states N in the conventional HMM.
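A sketch of Viterbi decoding with a variable state count is given below; the trellis-resizing policy for appearing and vanishing states is an assumption, and the 0.95/0.05 transition split follows the style of equation (10) in the next section.

```python
# Sketch of Viterbi decoding when N_t varies: the transition matrix is rebuilt
# for the current state set at each step, resizing the trellis as needed.
import numpy as np

def viterbi_variable(emissions_per_t):
    """emissions_per_t: list of 1-D arrays, one per frame, length N_t each."""
    delta = np.log(emissions_per_t[0]) - np.log(len(emissions_per_t[0]))
    for em in emissions_per_t[1:]:
        n_prev, n = len(delta), len(em)
        a = np.full((n_prev, n), 0.05 / max(n - 1, 1))
        np.fill_diagonal(a, 0.95)          # self-transition favoured
        delta = np.max(delta[:, None] + np.log(a), axis=0) + np.log(em)
    return int(np.argmax(delta))           # most probable current state

frames = [np.array([0.6, 0.4]),            # two participants
          np.array([0.5, 0.3, 0.2]),       # a third person enters
          np.array([0.2, 0.7, 0.1])]
print(viterbi_variable(frames))            # 1: attention shifts to person 2
```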
3 Application to Most Attentive Person Selection

The proposed method was implemented in a humanoid robot, MAHRU [12], with a binocular stereo vision camera system to capture images. The robot has to select one person among the many people in its field of view and give attention to the selected person. The number of people in the field of view of the robot (participants) varies over time. Experiments were conducted to ascertain the functioning of the proposed method. The states of the common components-based HMM at a particular time correspond to the participants at that time. We defined the common components of the distance between the robot and a person and of the head direction of a person using Gaussian distributions as follows:

$$f(c_k^{(s)}) = \frac{1}{\sqrt{2\pi}\,\sigma_k}\exp\left(\frac{-(c_k^{(s)} - \mu_k)^2}{2\sigma_k^2}\right) + \epsilon, \tag{6}$$

where k = 1 for the distance, k = 2 for the head direction, and $\epsilon$ is a constant value for the fulfillment of (2).
(s)
xs = (c1 , c2 )
(7)
Fig. 3. Most attentive person without changes in the number of people. Step 1: There are two participants, person A (left) and person B (right). They are at the same distance from the robot's location; A is looking at the robot and B is not. A is selected (the circle within the face shows the selected person). Step 2: B came near the robot, looking at it; A stayed at the same location as in Step 1 and is looking at the robot. B is selected. Step 3: A came very close to the robot, at a distance of less than 0.5 meters; B went back a little and continued to look at the robot. B is selected.
The observation vector at time t, denoted by $o_t$, groups the measurements of each state at time t,

$$o_t = (x_{s_1}, \ldots, x_{s_{N_t}}) = (c_1^{(s_1)}, c_2^{(s_1)}, \ldots, c_1^{(s_{N_t})}, c_2^{(s_{N_t})}). \tag{8}$$
The dimension of the observation vector at time t is the number of states at that time multiplied by the number of measurements; in this case it is given by $\dim o_t = K \cdot N_t = 2N_t$. The initial state probabilities are defined as

$$\pi_{s_i} = \frac{1}{N_0}, \quad s_1 \le s_i \le s_{N_0}, \; N_0 \ne 0. \tag{9}$$
The transition probabilities between the states are set to

$$A_{s_i s_i} = 0.95, \quad A_{s_i s_j} = \frac{0.05}{N_t - 1}, \; i \ne j, \quad s_1 \le s_i, s_j \le s_{N_t}. \tag{10}$$
Using the Open Source Computer Vision Library (OpenCV) face detector [13], we detect human faces in the captured images and calculate their head direction. The distance from the robot's location is calculated by estimating the average depth within the regions of the detected faces. Figure 3 shows the experimental result for the most attentive person in the case where the number of participants is fixed. The results show that the proposed method describes human attentiveness well, considering the continuousness and uncertainty of measurements such as distance and direction. As noted, the proposed common components-based HMM allows the state size to vary to cope with changes in the number of participants. This is confirmed in Fig. 4.
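A small sketch of this per-participant scoring, combining equations (1) and (6), is shown below; the means (a 1.5 m ideal distance, as in the Fig. 4 caption, and a 0 degree head direction), the standard deviations and epsilon are illustrative assumptions.

```python
# Sketch of equation (6): Gaussian attentiveness components over distance and
# head direction, multiplied per equation (1) to score each participant.
import numpy as np

def component(value, mu, sigma, eps=1e-3):
    return np.exp(-(value - mu) ** 2 / (2 * sigma ** 2)) / \
        (np.sqrt(2 * np.pi) * sigma) + eps

def attentiveness(distance_m, head_dir_deg):
    return component(distance_m, mu=1.5, sigma=0.5) * \
        component(head_dir_deg, mu=0.0, sigma=15.0)

# (distance in metres, head direction in degrees) per participant.
people = {"A": (1.5, 5.0), "B": (2.5, 0.0), "C": (1.4, 40.0)}
best = max(people, key=lambda k: attentiveness(*people[k]))
print(best)  # 'A': near the ideal distance and looking at the robot
```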
Fig. 4. Most attentive person with changes in the number of people. Step 1: A (left), B (centre) and C (right). A is not looking straight at the robot. B is looking straight at the robot, and the distance between him and the robot is closer to 1.5 meters - the best distance - than the distances of the other participants. C is not looking at the robot. B is selected (the circle within the face shows the selected person). Step 2: A came close to the robot, looking straight at it. B goes back and continues to look straight at the robot. C stayed at the same location and is looking at the robot. A is selected. Step 3: There are three people, A, B and C, but the robot can see only one face: face A cannot be recognized because it is only partially visible to the robot, and B's head direction is greater than 45°, in which case the robot cannot recognize it as a face. C is the only participant and is selected. Step 4: There are three participants: A, C and D (a new participant at the right). All are looking straight at the robot but at different distances from it. C is at a distance closer to 1.5 meters than A and D. C is selected. Step 5: There are four participants: A, B, C and D. C and D are looking straight at the robot, but at different distances from it. D is at a better distance than the others. D is selected. Step 6: Three participants stay: A, B and D. D moved close to the robot. Both A and B maintained their locations from the previous step and are looking straight at the robot. D is selected.
4 Conclusion and Further Work

In this paper, we have presented a common components-based HMM to select the most attentive person for multiple-person-to-robot interaction. The use of common components and an appointed rule forming the states made it possible to overcome the limitation of the HMM due to its fixed state size. We also found that the Viterbi algorithm remains feasible when the number of states is variable. We implemented the proposed method in a humanoid robot, MAHRU [12], to enable the robot to select one person for communication among many participants. While participants moved in and out and changed their head directions, the robot showed successful results, shifting its attention to the participant with the highest intention to communicate.
There are two main directions for future work. The first is to estimate the parameters of the common components-based HMM by learning, and to show the effectiveness of the rule forming the states with common components theoretically. The second, in order to make the most of the scalability of the states based on common components, is to incorporate sound localization and body motion into the common components.
References
1. Selker, T., Snell, J.: The Use of Human Attention to Drive Attentive Interfaces. Invited paper, CACM (2003)
2. Selker, T.: Visual Attentive Interfaces. BT Technology Journal, Vol. 22, No. 4 (2004) 146-150
3. Lee, C.H.J., Jang, C.Y.I., Chen, T.H.D., Wetzel, J., Shen, Y.T.B., Selker, T.: Attention Meter: A Vision-based Input Toolkit for Interaction Designers. CHI 2006 (2006)
4. Tasaki, T., Matsumoto, S., Ohba, H., Toda, M., Komatani, K., Ogata, T., Okuno, H.G.: Dynamic Communication of Humanoid Robot with Multiple People Based on Interaction Distance. Proc. of the 2nd International Workshop on Man-Machine Symbiotic Systems (2004) 329-339
5. Lang, S., Kleinehagenbrock, M., Hohenner, S., Fritsch, J., Fink, G.A., Sagerer, G.: Providing the Basis for Human-Robot-Interaction: A Multi-modal Attention System for a Mobile Robot. International Conference on Multimodal Interfaces (2003) 28-35
6. Rabiner, L.R.: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proc. IEEE, Vol. 77 (1989) 257-286
7. Jeong, M.H., Kuno, Y., Shimada, N., Shirai, Y.: Recognition of Shape-Changing Hand Gestures. IEICE Trans. on Information and Systems, Vol. E85-D (2002) 1678-1687
8. Jeong, M.H., Kuno, Y., Shimada, N., Shirai, Y.: Recognition of Two-Hand Gestures Using Coupled Switching Linear Model. IEICE Trans. on Information and Systems, Vol. E86-D (2002) 1416-1425
9. Starner, T., Pentland, A.: Real Time American Sign Language Recognition from Video Using Hidden Markov Models. Technical Report 375, MIT Media Lab (1996)
10. Othman, H., Aboulnasr, T.: A Tied-Mixture 2D HMM Face Recognition System. Proc. 16th International Conference on Pattern Recognition (ICPR'02), Vol. 2 (2002) 453-456
11. Peinado, A., Segura, J., Rubio, A., Sanchez, V., Garcia, P.: Use of Multiple Vector Quantisation for Semicontinuous-HMM Speech Recognition. Vision, Image and Signal Processing, IEE Proceedings, Vol. 141 (1994) 391-396
12. MAHRU: http://humanoid.kist.re.kr (2006)
13. Intel Corp.: Open Source Computer Vision (OpenCV) Library. http://www.intel.com/technology/computing/opencv (retrieved October 2005)
Basal Cell Carcinoma Detection by Classification of Confocal Raman Spectra

Seong-Joon Baek and Aaron Park

The School of Electronics and Computer Engineering, Chonnam National University, Gwangju, South Korea, 500-757
[email protected]
Abstract. In this study, we propose a simple preprocessing method for the classification of basal cell carcinoma (BCC), which is one of the most common skin cancers. The preprocessing step consists of data clipping with a half Hanning window and dimension reduction with principal components analysis (PCA). The application of the half Hanning window deemphasizes the peak near 1650 cm−1 and improves classification performance by lowering the false positive ratio. Classification results with various classifiers are presented to show the effectiveness of the proposed method. The classifiers include the maximum a posteriori (MAP) probability classifier, the k-nearest neighbor (KNN) classifier, and an artificial neural network (ANN) classifier. Classification with ANN involving 216 confocal Raman spectra preprocessed with the proposed method gave 97.3% sensitivity, which is a very promising result for automatic BCC detection.
1 Introduction

Skin cancer is one of the most common cancers in the world. Recently, the incidence of skin cancer has increased dramatically due to the excessive exposure of skin to UV radiation caused by ozone layer depletion, environmental contamination, and so on. If detected early, skin cancer has a cure rate of 100%. Unfortunately, early detection is difficult because diagnosis is still based on morphological inspection by a pathologist. There are two common skin cancers: basal cell carcinoma (BCC) and squamous cell carcinoma (SCC). Both BCC and SCC are nonmelanoma skin cancers, and BCC is the most common skin neoplasm [1]. Thus the accurate detection of BCC has attracted much attention from clinical dermatologists, since it is difficult to distinguish BCC tissue from surrounding noncancerous tissue. The routine diagnostic technique used for the detection of BCC is pathological examination of biopsy samples. However, this method relies upon a subjective judgment, which depends on the level of experience of the individual pathologist. Thus, a fast and accurate diagnostic technique for the initial screening and selection of lesions for further biopsy is needed [2]. Raman spectroscopy has the potential to resolve this problem. It can be applied to provide an accurate medical diagnosis to distinguish BCC tissue from
Corresponding author.
surrounding normal (NOR) tissue. Recently, a direct observation method based on the confocal Raman technique was presented for the dermatological diagnosis of BCC [2]. According to that study, confocal Raman spectra provided promising results for the detection of precancerous and noncancerous lesions without special treatment. Hence, with confocal Raman spectra, we could design an automatic classifier with robust detection results. In this paper, we propose a simple preprocessing method for the classification of BCC. Experiments with three kinds of classifiers, including MAP, KNN, and ANN, were carried out to verify the effectiveness of the proposed method.
2 Raman Measurements and Preprocessing of Data
The tissue samples were prepared with the conventional treatment, which is exactly the same as in [2]. BCC tissues were sampled from 10 patients using a routine biopsy. Cross sections of 20 μm were cut with a microtome at -20 °C and stored in liquid nitrogen. Two thin sections from every patient were used for the experiments. One section was used for classification; the other was stained with H&E and used as a reference after an expert pathologist located the boundaries between BCC and NOR with a routine cancer diagnosis. The confocal Raman spectra for the skin samples are shown in Figure 1, where no strong background noise is observed. In Fig. 1A, there is a clear distinction between BCC and NOR tissues. Most of the spectra belong to this case. Fig. 1B shows the case where a BCC spectrum is measured in the vicinity of the boundary of BCC and NOR. Since a peak near 1600 cm−1 is a distinctive feature of BCC spectra, as seen in Fig. 1A, the BCC spectrum in Fig. 1B can be classified as BCC even though the feature is not so evident. Fig. 1C shows an outlier, where the BCC spectrum was obtained in the middle of the BCC region but looks very similar to that of NOR. A similar spectrum can be found in Fig. 2B (g). This case will be discussed in a later section. A skin biopsy was performed in the direction perpendicular to the skin surface, and the spectral measurements follow the same direction, that is, from the epidermis to the dermis in Figs. 2A and 2B. Raman spectra of BCC
Fig. 1. Confocal Raman spectra from three patients at different spots
Fig. 2. Confocal Raman profiles of skin tissue with an interval of 30-40 μm
tissues were measured at different spots with an interval of 30-40 μm. In this way, 216 Raman spectra were collected from 10 patients. We normalized the spectra so that they fall in the interval [-1, 1], which is often called the minmax method. Needless to say, there are many normalization methods. For example, one can normalize a given data set so that the inputs have means of zero and standard deviations of one, or have the same area. According to our preliminary experiments, however, the minmax method gave the best results, so we adopted this simple normalization method. After normalization, we applied a clipping window so that unnecessary data would be discarded, since unnecessary data generally degrade the performance of a classifier. According to the previous work [2], the main spectral differences between BCC and NOR are in the regions of 1220-1300 cm−1 and 1640-1680 cm−1, which are also observed in Fig. 1. We therefore discarded the data in the region below 1200 cm−1. In addition, the data in the region 1800-1600 cm−1 were windowed by a half Hanning window. The presence of a high peak near 1650 cm−1 is a marker of NOR tissue, while a high peak near 1600 cm−1 is a marker of BCC tissue, as seen in Fig. 1A. BCC spectra measured in the vicinity of the boundary often possess both peaks; they are classified as NOR even though the characteristics of the other regions are similar to those of BCC. Thus the application of a half Hanning window should improve the classification rates by lowering the peak near 1650 cm−1, or at least lower the false positive ratio. A half Hanning window is defined as $w[n] = 0.5 - 0.5\cos(2\pi n/M)$, $0 \le n \le M/2$. The overall data window used in the experiments is plotted in Fig. 3. For dimension reduction, the well-known PCA was adopted. PCA identifies orthogonal bases on which projections are uncorrelated. Dimension is reduced by discarding transformed input data with low variance, as measured by the corresponding eigenvalue. The number of retained principal components was determined experimentally to be 5.
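The preprocessing chain just described can be sketched as follows; the synthetic wavenumber axis, the zeroing of the discarded region and the SVD-based PCA are illustrative assumptions rather than the authors' exact implementation.

```python
# Sketch of the preprocessing: minmax scaling to [-1, 1], a clipping window
# keeping the region above 1200 cm^-1 with a half Hanning taper over
# 1800-1600 cm^-1, then PCA down to 5 components.
import numpy as np

def preprocess(spectra, wavenumbers, n_pc=5):
    lo = spectra.min(axis=1, keepdims=True)
    hi = spectra.max(axis=1, keepdims=True)
    x = 2 * (spectra - lo) / (hi - lo) - 1        # minmax per spectrum
    w = np.where(wavenumbers >= 1200, 1.0, 0.0)   # clip below 1200 cm^-1
    taper = (wavenumbers >= 1600) & (wavenumbers <= 1800)
    n = taper.sum()
    # Half Hanning taper: 1 at 1600 cm^-1, falling to 0 at 1800 cm^-1.
    w[taper] = 0.5 + 0.5 * np.cos(np.pi * np.arange(n) / (n - 1))
    x = x * w
    x = x - x.mean(axis=0)                        # centre, then PCA via SVD
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:n_pc].T

wn = np.linspace(600, 1800, 601)                  # wavenumber axis, cm^-1
spectra = np.random.default_rng(2).normal(size=(216, 601))
print(preprocess(spectra, wn).shape)              # (216, 5)
```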
Fig. 3. The clipping data window combined with a half Hanning window
3 Experimental Results
Three types of classifiers, MAP, KNN, and ANN, were examined. In MAP classification, we select the class $w_i$ that maximizes the posterior probability $P(w_i|x)$. Given equal prior probabilities, this is equivalent to selecting the class that maximizes the class-conditional probability density. Let $w_1$, $w_2$ be the BCC class and the NOR class, respectively. The MAP classification rule is expressed as follows [3]: decide $w_1$ if $P(x|w_1) \ge P(x|w_2)$, where the conditional probability density is modeled by a multivariate Gaussian. We used the Mahalanobis distance for KNN classification. The discriminant function of the KNN classifier, $g_i(x)$, is the number of class-i training data among the k nearest neighbors of x. The number k was set to 5 experimentally. The KNN algorithm requires computation in proportion to the number of training data; fortunately, many fast algorithms are available, and in the experiments we used the algorithm in [4]. As the ANN, multilayer perceptron (MLP) networks were employed for classification. The extreme flexibility of an MLP often calls for careful control of overfitting and detection of outliers [5], but for well-separated data, overly careful parameter adjustment of the networks is not necessary. In the experiment, the number of hidden units was set to 9 and a sigmoid function was used as the activation function. Since there are only two classes, we used one output unit. The MLP networks were trained to output -1 for the NOR class and +1 for the BCC class using the back propagation algorithm. At the classification stage, the output value is hard-limited to give a classification result. The performance of an MLP changes according to the initial conditions, so the experiments were carried out 20 times and the results were averaged. The overall 216 data were divided into two groups: a training set and a test set. The data from 9 patients were used as the training set and the data from the remaining patient as the test set. Once classification completes, the data from one patient are eliminated from the training set and used as new test data, while the previous test data are inserted into the training set. In this way, the data from every patient were used as a test set. The average numbers of BCC and NOR spectra are 8 and 14 in the test set and 68 and 126 in the training set, respectively.
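A minimal sketch of the MAP rule under the multivariate Gaussian model is given below; the synthetic features and class sizes (68 BCC and 126 NOR, matching the average training-set sizes above) are stand-ins for the real spectra.

```python
# Sketch of MAP classification with equal priors: pick the class whose
# multivariate Gaussian density of the 5-dimensional PCA features is larger.
import numpy as np

class GaussianMAP:
    def fit(self, x, y):
        self.params = {}
        for c in np.unique(y):
            xc = x[y == c]
            self.params[c] = (xc.mean(axis=0), np.cov(xc, rowvar=False))
        return self

    def predict(self, x):
        def log_density(xi, mu, cov):
            d = xi - mu
            return -0.5 * (d @ np.linalg.solve(cov, d)
                           + np.log(np.linalg.det(cov)))
        classes = list(self.params)
        scores = [[log_density(xi, *self.params[c]) for c in classes]
                  for xi in x]
        return np.array(classes)[np.argmax(scores, axis=1)]

rng = np.random.default_rng(3)
x = np.vstack([rng.normal(0, 1, (68, 5)), rng.normal(2, 1, (126, 5))])
y = np.array(["BCC"] * 68 + ["NOR"] * 126)
model = GaussianMAP().fit(x, y)
print((model.predict(x) == y).mean() > 0.9)  # separable toy data: True
```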
The classification results without the data windows are summarized in Table 1. In the table, we can see that the sensitivity of every method is over 91.5%. Among them, MAP and ANN show sensitivity over 93% and outperform KNN. Since there were not enough BCC data, nonparametric methods such as KNN might be inferior to the others, but the specificity of KNN is nearly equal to the others for NOR detection.

Table 1. Classification results with original data. Stars indicate the decision of an expert pathologist.

          MAP             KNN             ANN
          BCC    NOR      BCC    NOR      BCC    NOR
BCC*      93.0   7.0      91.8   8.2      93.2   6.8
NOR*      4.2    95.8     6.9    97.1     5.6    96.4
To show the effectiveness of the proposed data window, another experiments were carried with the window. The results are shown in the Table 2. Even with simple clipping window, the classification performance is improved but further improved when the clipping window is combined with a half Hanning window. With the half Hanning window, the sensitivity of every methods is over 94%. The average sensitivity is increased by 0.73% while the averaged specificity by 0.53%. This indicates that the half Hanning window contributes to lowering the false positive ratio more than the false negative ratio. In case of ANN, the false positive ratio was reduced from 6.2% to 3.1% and overall true classification rate is about 97%. Considering that this performance enhancement is achieved without any cost, the usage of the proposed data window is easily justified. Table 2. Classification results with data windowing. Stars indicate the decision of an expert pathologist.
                          MAP            KNN            ANN
                       BCC    NOR     BCC    NOR     BCC    NOR
Simple Clipping  BCC*  94.5    5.5    94.6    5.4    96.5    3.5
                 NOR*   2.9   97.1     3.8   96.2     5.6   96.4
Half Hanning     BCC*  94.6    5.4    95.9    4.1    97.3    2.7
                 NOR*   2.3   97.7     2.9   97.1     3.5   96.5
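For concreteness, the clipping-plus-half-Hanning-plus-PCA preprocessing evaluated in these tables can be sketched as below. The taper length, the number of principal components, and the placement of the taper at the spectrum edge are assumptions of this illustration; the paper does not fix them here.

```python
import numpy as np

def half_hanning_taper(spectrum, taper_len=64):
    """Apply a half Hanning taper to one edge of a clipped spectrum.

    taper_len is a hypothetical parameter chosen only for illustration.
    """
    window = np.ones(len(spectrum))
    window[:taper_len] = np.hanning(2 * taper_len)[:taper_len]  # rising half
    return spectrum * window

def pca_reduce(X, n_components=10):
    """Reduce dimension by projecting onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # PCs of the covariance
    return Xc @ Vt[:n_components].T
```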
Even though the classification error rates are already small, there is a possibility of further improving the performance. Careful examination of the erroneous data reveals an interesting pattern: many of the false positive errors arise in the middle of the BCC region. Fig. 1C and Fig. 2B(g) are such examples. Considering that confocal Raman spectroscopy focuses on a very small region, normal tissue could be focused on by chance instead of BCC tissue. Since BCC tissue is marked as a region, there is a possibility of false marking. Hence, we
are currently investigating methods to address this kind of problem. Taking this into consideration, we could say that the classification is almost perfect, especially for the detection of BCC.
4 Conclusion
In this paper, we proposed a simple preprocessing method for the classification of basal cell carcinoma (BCC), one of the most common skin cancers. The preprocessing step consists of data clipping with a half Hanning window and dimension reduction with principal component analysis (PCA). The experimental results with and without the data window show that applying the data window lowers the false positive ratio. The ANN classification performance on 216 Raman spectra was about 97% when the data were processed with the proposed window. With these promising results, we are currently investigating automatic BCC detection tools.
Acknowledgement. This work was supported by grant No. RTI-04-03-03 from the Regional Technology Innovation Program of the Ministry of Commerce, Industry and Energy (MOCIE) of Korea.
References
1. Nijssen, A., Schut, T.C.B., Heule, F., Caspers, P.J., Hayes, D.P., Neumann, M.H., Puppels, G.J.: Discriminating Basal Cell Carcinoma from its Surrounding Tissue by Raman Spectroscopy. Journal of Investigative Dermatology 119 (2002) 64-69
2. Choi, J., Choo, J., Chung, H., Gweon, D.-G., Park, J., Kim, H.J., Park, S., Oh, C.H.: Direct Observation of Spectral Differences between Normal and Basal Cell Carcinoma (BCC) Tissues using Confocal Raman Microscopy. Biopolymers 77 (2005) 264-272
3. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. John Wiley & Sons, Inc. (2001)
4. Baek, S.J., Sung, K.-M.: Fast KNN Search Algorithm for Nonparametric Classification. IEE Electronics Letters 35 (2000) 2104-2105
5. Gniadecka, M., Wulf, H., Mortensen, N., Nielsen, O., Christensen, D.: Diagnosis of Basal Cell Carcinoma by Raman Spectra. Journal of Raman Spectroscopy 28 (1997) 125-129
Blind Signal-to-Noise Ratio Estimation Algorithm with Small Samples for Wireless Digital Communications∗ Dan Wu, Xuemai Gu, and Qing Guo Communication Research Center of Harbin Institute of Technology, Harbin, China {wudan, xmgu, qguo}@hit.edu.cn
Abstract. To extend the range of blind signal-to-noise ratio (SNR) estimation and reduce complexity at the same time, a new algorithm is presented based on a signal subspace approach using the sample covariance matrix of the received signal and the combined information criterion (CIC) from information theory. CIC overcomes the disadvantages of both the Akaike information criterion's (AIC) underpenalization and the minimum description length's (MDL) overpenalization, and its likelihood form is derived. The algorithm needs no prior knowledge of the modulation type, baud rate or carrier frequency of the signals. Computer simulation shows that the algorithm can blindly estimate the SNR of commonly used digital modulation signals in additive white Gaussian noise (AWGN) channels and Rayleigh fading channels with small samples, and the mean estimation error is less than 1 dB for SNR ranging from -15 dB to 25 dB. The accuracy and simplicity of this method make it well suited to engineering applications.
1 Introduction

Signal-to-noise ratio (SNR) is defined as a measure of signal strength relative to background noise and is one of the most important criteria for transmission quality. In modern wireless communication systems, precise knowledge of the SNR is often required by many algorithms for their optimal performance. For example, SNR estimates are typically employed in power control, mobile-assisted hand-off, adaptive modulation schemes, as well as soft decoding procedures [1,2]. Estimating the SNR and providing this estimate to the data detector are essential to the successful functioning of any communications receiver.

SNR estimators can be divided into two classes. One class is the data-aided estimator, for which known (or pilot) data is transmitted and the SNR estimator at the receiver uses the known data to estimate the SNR. The other class is the non-data-aided estimator. For this class, no known data is transmitted, and therefore the SNR estimator at the receiver has to "blindly" estimate the SNR. Although the data-aided estimator performs better than the non-data-aided estimator, it is not suitable for non-cooperative situations. In this paper, the non-data-aided or blind SNR estimator is considered. Some methods have been proposed recently. In [3], SNR estimation in the
This work is supported by the National 863 Program of China under grant 2004AA001210.
frequency domain was introduced, using circular correlation for M-ary phase shift keying (MPSK), but the method is not suitable for other modulation types. A fourth-order moments method was applied in [4] for constant envelope modulations, and in [5] an envelope-based estimation method was proposed for nonconstant envelope modulations. Both of these methods need prior knowledge of the envelope. In [6], an iterative SNR estimation algorithm for negative SNRs was developed. However, that method has a relatively high bias at low SNR (when the SNR is below -10 dB, the bias is over 3 dB).

Blind SNR estimation can be employed in many areas of information warfare, such as threat analysis and electronic surveillance systems. These applications place high demands on estimation speed and SNR range. However, the performance of the methods mentioned above decreases when the number of samples is not large enough. Even when the number of samples is adequate, the performance is not satisfactory when the SNR is below zero. In this paper, a new blind SNR estimation algorithm is presented based on eigenvalue decomposition of the correlation matrix of the received signals and the principle of the combined information criterion (CIC) [7]. Compared with algorithms using the Akaike information criterion (AIC) and minimum description length (MDL), the algorithm using CIC gives more accurate results in additive white Gaussian noise (AWGN) channels at low SNR with small samples. When applied to Rayleigh fading channels, the performance is also acceptable.

This paper is organized as follows. After the statement and formulation of the problem in Section 2, the blind SNR estimation algorithm is introduced in Section 3. In Section 4 the computer simulation results are presented. Section 5 draws the conclusions.
2 Problem Formulation

Assume y(t) is a received signal in an AWGN channel. The signal model after sampling is

y(k) = s(k) + n(k) ,  (1)

where s(k) is a digitally modulated signal with unknown modulation type and n(k) is an independent zero-mean Gaussian random process with second-order moments

E[n(k) n^H(l)] = sigma_N^2 I delta_kl ,  (2)

E[n^T(k) n(l)] = 0 ,  (3)

where x^H denotes the Hermitian transpose of x, x^T denotes the transpose of x, sigma_N^2 is the noise power, delta_kl is the Kronecker delta, and I is the identity matrix. Let Y(k) = [y(k) y(k+1) ... y(k+L-1)]; then

Y(k) = S(k) + N(k) ,  (4)
where S(k) = [s(k) s(k+1) ... s(k+L-1)] and N(k) = [n(k) n(k+1) ... n(k+L-1)]. The L-th order covariance matrix of the received signal is

R_yy = E(YY^H) = E((S+N)(S+N)^H) = E(SS^H) + sigma_N^2 I = R_ss + sigma_N^2 I ,  (5)

where R_ss is the covariance matrix of the original signal. By the properties of covariance matrices, R_ss is positive semi-definite, and its rank is assumed to be q (q < L). The eigenvalues lambda_1 >= lambda_2 >= ... >= lambda_L of R_yy therefore satisfy

lambda_i = psi_i + sigma_N^2 for i = 1, ..., q,  and  lambda_i = sigma_N^2 for i = q+1, ..., L ,  (6)

where psi_i are the nonzero eigenvalues of R_ss. From (5)-(6), the SNR can easily be estimated once q is known.
3 SNR Estimation Algorithm

The rank q can be determined from the smallest eigenvalues of R_yy. However, when estimated from a finite sample, the resulting eigenvalues are all different with probability one, making it difficult to determine q by merely "observing" the eigenvalues. The problem of finding q can be viewed as an AR model selection process. Assuming Y = [Y(1), Y(2), ..., Y(N)], the most popular criteria for model selection are AIC and MDL:

AIC = -2 log f(Y|Theta_hat) + 2w ,  (7a)

MDL = -log f(Y|Theta_hat) + 0.5 w log N ,  (7b)

where f(Y|Theta) is a parameterized family of probability densities, Theta_hat is the maximum likelihood estimate of the parameter vector Theta, and w is the number of freely adjusted parameters in Theta. AIC and MDL were derived assuming that the number of samples is much larger than the number of estimated model parameters. For small samples they tend to select overly complex models: AIC tends to underpenalize whereas MDL tends to overpenalize. As a result, AIC yields a relatively large probability of overestimating the rank q even when the SNR is high, while MDL underestimates the rank q when the SNR is low. To obtain a good tradeoff between overpenalization and underpenalization, the CIC proposed by Broersen is employed here. CIC is a criterion for small-sample model selection problems; it combines the asymptotic theoretical preference for penalty 3 with the good finite-sample characteristics of FSIC [7]. CIC for the Burg estimator, in residual variance form, is

CIC = log{res(k)} + max[ prod_{i=0}^{k} (1 + 1/(N+1-i)) / (1 - 1/(N+1-i)) - 1 , 3 sum_{i=0}^{k} 1/(N+1-i) ] ,  (8)
where k in {1, 2, ..., L} ranges over the set of possible ranks. Because a likelihood estimator is needed in this paper, (8) should be transformed into likelihood form. Since AIC satisfies

AIC = -2 log f(Y|Theta_hat) + 2w = N log(res(k)) + 2w ,  (9)

the CIC in likelihood form is

CIC = -(2/N) log f(Y|Theta_hat) + max[ prod_{i=0}^{k} (1 + 1/(N+1-i)) / (1 - 1/(N+1-i)) - 1 , 3 sum_{i=0}^{k} 1/(N+1-i) ] .  (10)

To apply (10) to determine the rank of R_yy, f(Y|Theta_hat) must first be specified. From [8],

log f(Y|Theta_hat) = log [ ( prod_{i=k+1}^{L} lambda_hat_i^{1/(L-k)} ) / ( (1/(L-k)) sum_{i=k+1}^{L} lambda_hat_i ) ]^{(L-k)N} ,  (11)

where lambda_hat_i is the estimate of lambda_i and satisfies lambda_hat_1 >= lambda_hat_2 >= ... >= lambda_hat_L. The number of free parameters can be estimated by

w = k(2L - k) .  (12)

From (11) and (12), CIC can be rewritten as

CIC(k) = -(2/N) log [ ( prod_{i=k+1}^{L} lambda_hat_i^{1/(L-k)} ) / ( (1/(L-k)) sum_{i=k+1}^{L} lambda_hat_i ) ]^{(L-k)N} + max[ prod_{i=0}^{k} (1 + 1/(N+1-i)) / (1 - 1/(N+1-i)) - 1 , 3 sum_{i=0}^{k} 1/(N+1-i) ] .  (13)

Then q is estimated by

q_hat = arg min_k CIC(k) ,  k = 1, 2, ..., L .  (14)
From (13) and (14), the steps of the proposed blind SNR estimation are as follows:

(1) Perform an eigenvalue decomposition of the sample covariance matrix R_yy:

R_yy = (1/N) sum_{i=1}^{N} Y(i) Y(i)^H = U Sigma U^H ,  (15)

where Sigma = diag(lambda_hat_i) and U consists of the orthonormal eigenvectors of R_yy.

(2) Estimate the rank q using (13) and (14).

(3) Estimate the noise power according to

sigma_hat_N^2 = (1/(L-q)) sum_{i=q+1}^{L} lambda_hat_i ,  (16)
and the signal power according to

sigma_hat_s^2 = ( sum_{i=1}^{q} lambda_hat_i ) - q sigma_hat_N^2 .  (17)

(4) The estimate of the received-signal SNR is obtained as

SNR = 10 log( sigma_hat_s^2 / (L sigma_hat_N^2) ) .  (18)
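A compact NumPy sketch of steps (1)-(4) is given below. It is an illustration under stated assumptions, not the authors' implementation: the eigenvalue-based likelihood follows (11)/(13) with the ambient dimension taken as L, and k is searched over 1, ..., L-1 so that the noise subspace is never empty.

```python
import numpy as np

def cic_penalty(k, N):
    """Finite-sample penalty of (10)/(13): max(product term - 1, 3 * sum term)."""
    v = 1.0 / (N + 1 - np.arange(k + 1))
    return max(np.prod((1 + v) / (1 - v)) - 1, 3 * np.sum(v))

def blind_snr_estimate(Y):
    """Estimate the SNR in dB from N received vectors Y of dimension L, shape (L, N)."""
    L, N = Y.shape
    Ryy = (Y @ Y.conj().T) / N                      # sample covariance, eq. (15)
    lam = np.sort(np.linalg.eigvalsh(Ryy))[::-1]    # eigenvalues, descending

    def cic(k):                                     # likelihood-form CIC, eq. (13)
        tail = lam[k:]
        log_like = (L - k) * N * (np.mean(np.log(tail)) - np.log(np.mean(tail)))
        return -(2.0 / N) * log_like + cic_penalty(k, N)

    q = min(range(1, L), key=cic)                   # rank selection, eq. (14)
    noise_var = np.mean(lam[q:])                    # eq. (16)
    signal_var = np.sum(lam[:q]) - q * noise_var    # eq. (17); may need flooring
    return 10 * np.log10(signal_var / (L * noise_var))  # eq. (18)
```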
4 Simulation Results
In this section, the performance of the proposed algorithm is investigated in the MATLAB environment. Computer simulations are performed in a Monte Carlo fashion with 200 trials. Seven digital modulation types are selected for testing the estimation performance of the algorithm: BPSK, 4PSK, BFSK, 4FSK, TFM, pi/4QPSK and 16QAM. The receiver for SNR estimation is a blind receiver; that is, it has no knowledge of the carrier frequency, modulation type or baud rate. It first catches the locations of the blind signals through frequency scanning, and then samples the signals at a sampling frequency four times the carrier frequency. The dimension of the covariance matrix is set to L = 40 and the range of SNR is -15 dB to 25 dB. The standard deviation exhibits the performance more directly; in this simulation, the standard deviation (SD) is defined as

SD = sqrt( E[ (SNR_hat - SNR)^2 ] ) .
Fig. 1. The SD of SNR estimators in AWGN channel for BPSK (N=400)
Fig. 2. The SD comparison of SNR estimators in AWGN channel for BPSK (N=600)
Fig. 3. The SD of the proposed SNR estimator in AWGN channel (N=600)
4.1 AWGN Channel

The robustness of the blind estimation algorithm in AWGN channels is tested in this subsection. Comparisons are made between the algorithms using CIC, AIC and MDL for BPSK. Figs. 1-2 show the SD of the estimators using CIC, AIC and MDL
for sample sizes N = 400 and N = 600, respectively. In Fig. 1, the SNR estimator using CIC performs better than those using AIC and MDL, especially in the low SNR range. The average SD for CIC is 0.55 dB, for AIC 0.73 dB, and for MDL 0.84 dB. In Fig. 2, the SD decreases as the number of samples increases: the average SD for CIC is 0.43 dB, for AIC 0.51 dB, and for MDL 0.64 dB.

Fig. 3 illustrates the SNR estimation results for BPSK, 4PSK, BFSK, 4FSK, TFM, pi/4QPSK and 16QAM. It can be seen that the proposed algorithm is not affected by the modulation type; it is suitable for both constant envelope modulations and nonconstant envelope ones. The average SD is 0.38 dB.
Fig. 4. The SD of the proposed SNR estimator in Rayleigh fading channel (N=600, fdT=0.01)
4.2 Rayleigh Fading Channel

Although the proposed algorithm is derived for AWGN channels, simulation results show that it can also be applied in Rayleigh fading channels. Fig. 4 illustrates the SNR estimation results for BPSK, 4PSK, BFSK, 4FSK, TFM and pi/4QPSK in Rayleigh fading channels when fdT is 0.01. It can be seen that the SD of the SNR estimator is below 1 dB when the SNR is not lower than -5 dB. Although the performance degrades in fading channels compared with AWGN channels, the results are still of practical reference value.
5 Conclusion

In this paper, a new blind SNR estimation algorithm is proposed based on a signal subspace approach and CIC. Computer simulation shows that the algorithm can
achieve accurate SNR estimation blindly with small samples over an SNR range from -15 dB to 25 dB in AWGN channels; when applied to Rayleigh fading channels, it still performs well over an SNR range from -5 dB to 25 dB. It exhibits lower bias and greater simplicity and is well adapted to non-cooperative situations.
Acknowledgments. The authors are grateful to Piet M. T. Broersen of the Department of Applied Physics of Delft University of Technology and Fjo De Ridder of the Department ELEC of the Vrije Universiteit Brussel (VUB) for their patient explanations by email.
References
1. Balachandran, K., Kadaba, S.R., Nanda, S.: Channel Quality Estimation and Rate Adaptation for Cellular Mobile Radio. IEEE Journal on Selected Areas in Communications (1999) 1244-1256
2. Summers, T.A., Wilson, S.G.: SNR Mismatch and Online Estimation in Turbo Decoding. IEEE Transactions on Communications (1998) 421-423
3. Hong, D.-K., Park, C.-H., Ju, M.-C.: SNR Estimation in Frequency Domain Using Circular Correlation. Electronics Letters (2002) 1693-1694
4. Matzner, R., Englberger, F.: An SNR Estimation Algorithm Using Fourth-order Moments. Proceedings of the 1994 IEEE Symposium on Information Theory (1994) 119
5. Ping, G., Cihan, T.: SNR Estimation for Nonconstant Modulus Constellations. IEEE Transactions on Signal Processing (2005) 865-871
6. Bin, L., Robert, D., Ariela, Z.: A Low Bias Algorithm to Estimate Negative SNRs in an AWGN Channel. IEEE Communications Letters (2002) 469-472
7. Broersen, P.M.T.: Finite Sample Criteria for Autoregressive Order Selection. IEEE Transactions on Signal Processing (2000) 3550-3559
8. Wax, M., Kailath, T.: Detection of Signals by Information Theoretic Criteria. IEEE Transactions on Acoustics, Speech, and Signal Processing (1985) 387-392
Bootstrapping Stochastic Annealing EM Algorithm for Multiscale Segmentation of SAR Imagery Xian-Bin Wen1, Zheng Tian2 , and Hua Zhang1 1
Department of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300191, P.R. China [email protected] 2 Northwestern Polytechnical University, Xi’an, 710072, P.R. China
Abstract. An effective Bootstrapping stochastic annealing Expectation Maximization (BSAEM) algorithm is proposed for unsupervised multiscale segmentation of synthetic aperture radar (SAR) imagery. Given an original SAR image, we construct a multiscale sequence of SAR imagery and randomly select a small representative set of pixels based on the Bootstrap technique, which reduces the effect of pixel dependence in SAR imagery. A mixture multiscale autoregressive model (MMAR) is then employed for modeling the SAR imagery, and the BSAEM algorithm is proposed for segmentation. The algorithm consists of four steps, namely an Expectation step, a Stochastic step, an Annealing step and a Maximization step, which not only improve the convergence properties of the classical Expectation Maximization (EM) algorithm but also reduce the segmentation time. Finally, experimental results based on our proposed approach are given and compared to those of the EM algorithm. The results show that our algorithm outperforms the EM algorithm both in the quality of the segmented image and in computational time.
1 Introduction

In recent years, SAR imaging has been rapidly gaining prominence in applications such as remote sensing, surface surveillance, and automatic target recognition. For these applications, the classification of various categories of clutter is quite important, and their segmentation can play a key role in the subsequent analysis for target detection, recognition and image compression. Image segmentation consists of partitioning an image into its meaningful regions or image classes. This partition is based on the statistical properties of the image pixels. During the last twenty years, two main approaches have been suggested for statistical image segmentation: the 'contextual' approach, which utilizes the pixel neighborhood correlation information for image segmentation, and the 'blind' approach, which ignores it. Notwithstanding the well-known effectiveness of statistical segmentation methods in pattern recognition, their high computational cost is a serious drawback for many applications.

In this paper, we apply a multiscale approach, which exploits the coherent nature of SAR imagery formation, to the SAR imagery segmentation problem. In particular, we build on the idea of characterizing and exploiting the scale-to-scale statistical variations, and the statistical variations within a scale, in SAR imagery due to radar speckle [1]-[3]. To fully exploit this phenomenon and its complexity, we employ a recently introduced
class of MMAR models [4], and improve the EM algorithm by adding an optimal Bootstrap sample selection [5] and decorrelation step to the blind approach. The proposed approach offers several advantages over single-resolution approaches. Firstly, computational complexity is reduced, since much of the work can be done at coarse scales, and parameter estimation can be carried out using a small number of representative samples instead of all the correlated pixels in the SAR imagery. Secondly, the BSAEM algorithm converges more rapidly to a good solution than the classical EM algorithm. Experimental results show that the segmentation time using the BSAEM algorithm is considerably reduced, and the quality of the segmented image is improved compared to that of the EM algorithm.
2 Bootstrap Principle

Suppose that we have a random sample of N data points, chi_N = {X_1, ..., X_N}, drawn from an unknown distribution F. The Bootstrap approximation estimates the distribution of a given unknown statistic R(chi_N, F) by the Bootstrap distribution F*_n corresponding to the sample chi*_n = {X*_1, ..., X*_n} (n <= N), where X*_1, ..., X*_n are randomly selected from chi_N. Since the empirical distribution converges almost surely to the underlying distribution, one can hope that the Bootstrap distribution converges to the true unknown distribution. It is also favorable over other methods, as it requires little in the way of modeling, assumptions or analysis.
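The sketch below shows the resampling step in NumPy; the subsample size of 900 pixels and the 200 replications mirror the experiment sizes used later in the paper, while the Gaussian stand-in data and the choice of the sample mean as the statistic are assumptions made only for illustration.

```python
import numpy as np

def bootstrap_subsample(pixels, n, rng):
    """Draw a Bootstrap sample chi*_n of n <= N pixels from chi_N (with replacement)."""
    idx = rng.choice(len(pixels), size=n, replace=True)
    return pixels[idx]

# Bootstrap distribution F*_n of a statistic R (here: the sample mean, as a stand-in)
rng = np.random.default_rng(0)
pixels = rng.normal(size=10_000)           # placeholder for the image pixel values
boot_stats = [bootstrap_subsample(pixels, 900, rng).mean() for _ in range(200)]
```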
3 Quadtree Interpretation of SAR Imagery

The starting point for our model development is a multiscale sequence X_L, X_{L-1}, ..., X_0 of SAR images, where X_L and X_0 correspond to the coarsest and finest resolution images, respectively. The resolution varies dyadically between images at successive scales. More precisely, we assume that the finest-scale image X_0 has a resolution of delta x delta and consists of an N x N array of pixels (with N = 2^M for some M). Hence, each coarser-resolution image X_m has 2^{-m}N x 2^{-m}N pixels and resolution 2^m delta x 2^m delta. Each pixel X_m(k,l) is obtained by taking the coherent sum of the complex fine-scale imagery over 2^m x 2^m blocks, performing log-detection (computing 20 times the log-magnitude), and correcting for zero-frequency gain variations by subtracting the mean value. Accordingly, each pixel in image X_m corresponds to four "child" pixels in image X_{m-1}. This indicates that a quadtree is a natural structure for the mapping. Each node s on the tree is associated with one of the pixels X_m(k,l) corresponding to pixel (k,l) of SAR image X_m. As an example, Fig. 1 illustrates a multiscale sequence of three SAR images, together with the quadtree mapping. Here the finest-scale SAR imagery is mapped to the finest level of the tree, and each coarser-scale representation is mapped to successively higher levels. We use the notation X(s) to indicate the pixel mapped to node s. The scale of node s is denoted by m(s).
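The coarsening rule just described can be written in a few lines of NumPy. The sketch below is illustrative only; the small epsilon guarding the logarithm is an added safeguard, not part of the paper.

```python
import numpy as np

def multiscale_sequence(X0_complex, L=3):
    """Build X_0 (finest) ... X_L (coarsest) from complex SAR imagery.

    Each coarser pixel is the coherent sum over a 2x2 block of its children,
    followed by log-detection (20*log10|.|) and mean removal. X0_complex must
    have side lengths divisible by 2**L.
    """
    complex_levels = [X0_complex]
    for _ in range(L):
        c = complex_levels[-1]
        # coherent 2x2 block sum halves each dimension (dyadic resolution change)
        c = c[0::2, 0::2] + c[0::2, 1::2] + c[1::2, 0::2] + c[1::2, 1::2]
        complex_levels.append(c)
    images = []
    for c in complex_levels:
        logdet = 20 * np.log10(np.abs(c) + 1e-12)  # log-detection; eps avoids log(0)
        images.append(logdet - logdet.mean())      # zero-frequency gain correction
    return images                                  # [X_0, X_1, ..., X_L]
```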
Fig. 1. Sequence of three multiresolution SAR images mapped onto a quadtree
4 Segmentation of SAR Imagery by BSAEM Algorithm

In this paper, we focus on a specific class of multiscale models, namely mixture multiscale autoregressive models [4] of the form

F(X(s) | I_s) = sum_{k=1}^{K} pi_k Phi( (X(s) - a_{k,0} - a_{k,1} X(s gamma) - ... - a_{k,p_k} X(s gamma^{p_k})) / sigma_k ) ,  (1)

where s gamma is defined to be the parent of node s and p_k is the order of the regression; moreover, the coefficients a_{k,i}(s) and variances sigma_k depend only on m(s). F is the distribution function, K is the number of classes, I_s is the set of X(s gamma), ..., X(s gamma^p) (p = max_k p_k), Phi(.) is the standard normal distribution function, and sum_{k=1}^{K} pi_k = 1 with pi_k > 0. Let phi(.) be the probability density function of a standard normal distribution.

In Bayesian unsupervised segmentation using parametric estimation, the segmentation problem is based on model identification. The most commonly used estimator is the ML estimator, which is obtained by the classical EM algorithm [4]. Here, the mixture parameters are estimated by the BSAEM algorithm, which consists of four steps. Given an available Bootstrap sample X*_0 from the original image X_0, and after initializing the parameters from the SAR image histogram, the details are as follows.

4.1 Expectation Step
The posterior probability for a pixel X*(s) in X*_0 to belong to class k at the current iteration is given by

tau_{s,k} = pi_k (1/sigma_k) phi(e*_{s,k}/sigma_k) / sum_{k=1}^{K} pi_k (1/sigma_k) phi(e*_{s,k}/sigma_k) ,  k = 1, ..., K ,  (2)

where e*_{s,k} = X*(s) - a_{k,0} - a_{k,1} X*(s gamma) - ... - a_{k,p} X*(s gamma^p) and X*(s gamma) is the parent of X*(s).
4.2 Stochastic Step

Then, construct a Bernoulli random variable z_{s,k} with parameter tau_{s,k}.
4.3 Annealing Step

From z_{s,k} and tau_{s,k}, one can construct another random variable

w_{s,k} = tau_{s,k} + h_n (z_{s,k} - tau_{s,k}) ,  (3)
where h_n is a given sequence which slowly decreases to zero during the iterations.

4.4 Maximization Step
In this step, w_{s,k} is artificially treated as the posterior probability of X*(s), so that at the next iteration we have

pi_hat_k = ( sum_{s|m(s)=0} w_{s,k} ) / N ,  k = 1, ..., K ,  (4)

sigma_hat_k^2 = sum_{s|m(s)=m} w_{s,k} [X*(s) - a_hat_{k,0} - a_hat_{k,1} X*(s gamma) - ... - a_hat_{k,p} X*(s gamma^p)]^2 / sum_{s|m(s)=m} w_{s,k} ,  k = 1, ..., K ,  (5)

where (a_hat_{k,0}, a_hat_{k,1}, ..., a_hat_{k,p}) satisfy the system of equations

sum_{s|m(s)=m} w_{s,k} X*(s) mu(X*(s), i) = sum_{j=0}^{p} a_hat_{k,j} sum_{s|m(s)=m} w_{s,k} mu(X*(s), j) mu(X*(s), i) ,  (6)
where mu(X*(s), i) = 1 for i = 0 and mu(X*(s), i) = X*(s gamma^i) for i > 0. The parameter estimates are then obtained by iterating the four steps until convergence. The parameters K and p_k can be selected by the Bayesian information criterion (BIC).

After the number of SAR imagery regions is detected and the model parameters are estimated, SAR image segmentation is performed by classifying pixels. The Bayesian classifier is used for this classification; that is, each X(s) is assigned a class k as follows:

k(X(s)) = arg max_{1<=j<=K} [pi_j (1/sigma_j) phi(e_{js}/sigma_j)] .  (7)
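For intuition, the sketch below runs one E/S/A/M sweep on a plain Gaussian mixture, i.e., with the autoregressive terms of the MMAR model dropped; that simplification, and the literal per-class Bernoulli draws, are assumptions made only to keep the illustration short.

```python
import numpy as np

def bsaem_iteration(x, pi, mu, sigma, h_n, rng):
    """One Expectation/Stochastic/Annealing/Maximization sweep (Gaussian mixture).

    x: (n,) Bootstrap-sampled pixel values; pi, mu, sigma: (K,) mixture parameters.
    """
    K, n = len(pi), len(x)
    # Expectation step: posteriors tau_{s,k} as in eq. (2); the common Gaussian
    # constant cancels in the normalization, so it is omitted here.
    dens = np.array([pi[k] / sigma[k] *
                     np.exp(-0.5 * ((x - mu[k]) / sigma[k]) ** 2) for k in range(K)])
    tau = dens / dens.sum(axis=0)
    # Stochastic step: Bernoulli draws z_{s,k} with parameter tau_{s,k}
    z = (rng.random(tau.shape) < tau).astype(float)
    # Annealing step: w = tau + h_n (z - tau), eq. (3); h_n decreases to zero
    w = tau + h_n * (z - tau)
    # Maximization step: weighted updates analogous to eqs. (4)-(5)
    pi = w.sum(axis=1) / n
    mu = (w * x).sum(axis=1) / w.sum(axis=1)
    sigma = np.sqrt((w * (x - mu[:, None]) ** 2).sum(axis=1) / w.sum(axis=1))
    return pi, mu, sigma
```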
5 Experimental Results for SAR Imagery

To demonstrate the segmentation performance of our proposed algorithm, we apply it to two complex SAR images of size 128 x 128 pixels, consisting of
woodland and cornfield (see Fig. 2(a)). From the complex images, we generate the above-mentioned quadtree representation consisting of L = 3 levels and use a second-order regression. We randomly select 900 representative pixels from the original images. An unsupervised segmentation method based on the BSAEM algorithm is then used for parameter estimation, and Bayesian classification is adopted for pixel classification. The number of resamples and the order of the regression are so chosen because it was found that, by increasing the regression order to p = 2 and the resampling number for both cornfield and forest, we can achieve a lower probability of misclassification and a good trade-off between modeling accuracy and computational efficiency. Fig. 2(c) shows the results of applying the BSAEM approach to the two SAR images, together with the results of the EM algorithm (see Fig. 2(b)) for comparison. Table 1 and Table 2 show the misclassification probabilities and the segmentation times on a P4 computer, respectively. The results show that the BSAEM algorithm produces better results than the EM method both in segmentation quality and in computing time.
Fig. 2. (a) Original SAR image composed of woodland and cornfield. (b) Segmented image from EM algorithm. (c) Segmented image from BSAEM algorithm.

Table 1. Misclassification probabilities for the SAR images in Fig. 2

                   P_mis(.|forest)          P_mis(.|grass)
                   EM (b)    BSAEM (c)      EM (b)    BSAEM (c)
Fig. 2 (top)       1.312     1.296          5.249     4.124
Fig. 2 (bottom)    2.776     3.162          1.527     1.619
Table 2. Time of segmentation of the images on a P4 computer (s)

                   EM      BSAEM
Fig. 2 (top)       2637    470
Fig. 2 (bottom)    4324    793
6 Conclusion

We apply Bootstrap sampling techniques to the segmentation of SAR images based on the MMAR model, and give the BSAEM algorithm for the MMAR model of SAR imagery. This algorithm leads to a great improvement in ML parameter estimation and considerably reduces the segmentation time. Experimental results show that the BSAEM algorithm gives better results than the classical one in the quality of the segmented image.
Acknowledgements. This work is supported in part by the National Natural Science Foundation of China (No. 60375003), the Aeronautics and Astronautics Basal Science Foundation of China (No. 03I53059), the Science and Technology Development Foundation of Tianjin Higher-learning, and the Science Foundation of Tianjin University of Technology.
References
1. Fosgate, C., Irving, W.W., Karl, W., Willsky, A.S.: Multiscale Segmentation and Anomaly Enhancement of SAR Imagery. IEEE Transactions on Image Processing (1997) 7-20
2. Irving, W.W., Novak, L.M., Willsky, A.S.: A Multiresolution Approach to Discrimination in SAR Imagery. IEEE Transactions on Aerospace and Electronic Systems (1997) 1157-1169
3. Kim, A., Kim, H.: Hierarchical Stochastic Modeling of SAR Imagery for Segmentation/Compression. IEEE Transactions on Signal Processing (1999) 458-468
4. Wen, X.B., Tian, Z.: Mixture Multiscale Autoregressive Modeling of SAR Imagery for Segmentation. Electronics Letters (2003) 1272-1274
5. Efron, B., Tibshirani, R.: An Introduction to the Bootstrap. Chapman & Hall, London, U.K. (1993)
BP Neural Network Based SubPixel Mapping Method Liguo Wang1, 2, Ye Zhang2, and Jiao Li2 1
School of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China [email protected] 2 Dept. of Information Engineering, Harbin Institute of Technology, Harbin 150001, China; {wlg74327, zhye, lijiao}@hit.edu.cn
Abstract. A new subpixel mapping method based on a BP neural network is proposed to improve the spatial resolution of both raw hyperspectral imagery (HSI) and its fractional images. The network is used to train a model that describes the relationship between a mixed pixel, together with its neighbors, and the spatial distribution within the pixel. A mixed pixel can then be super-resolved at the subpixel scale by the trained model. To improve the mapping performance, momentum is employed in the BP learning algorithm and local analysis is adopted in the processing of raw HSI. Comparison experiments are conducted both on synthetic images and on real HSI. The results show that the method has fairly good mapping performance and very low computational complexity for processing both raw HSI and fractional images.
1 Introduction

One major limitation of hyperspectral imagery (HSI) relates to spatial resolution, which determines the spatial scale of the smallest detail depicted in an image. In HSI, a significant proportion of pixels are often mixtures of more than one distinct material. The presence of mixed pixels severely affects the performance of military analysis, environment understanding, etc. In this case, spectral unmixing [1] was introduced to decompose each mixed pixel into disparate components with their respective proportions. There exist many spectral unmixing methods to estimate land cover components, such as linear spectral mixture modeling [2], multilayer perceptrons [3], nearest neighbor classifiers [4] and support vector machines [5]. Generally, spectral unmixing can provide a more accurate representation of land cover than hard one-class-per-pixel classification. However, the spatial distribution of each class component within a mixed pixel remains unknown. Subpixel mapping (SM) was introduced to solve this problem by dividing each pixel into several smaller units and allocating the target (a specified class) to the smaller cells.

A limited number of SM methods have been presented. Reference [6] made use of another image of higher spatial resolution to sharpen the output of spectral unmixing, but it is difficult to obtain two coincident images of different spatial resolutions. Reference [7] formulated the spatial distribution of the target component within each pixel as the energy function of a Hopfield neural network (HNN). The results provide an improved
representation of land cover. However, it takes considerable computational time to obtain the results. All the methods described above are suited only to the processing of fractional images. Unfortunately, owing to the complexity of raw imagery, there is no effective SM method that applies to raw imagery. In this paper, a novel predictor based on a BP neural network (BPNN) with momentum is proposed for processing both fractional images and raw HSI.
2 BP Learning Algorithms with Momentum

The standard BP algorithm has a slow rate of convergence and can become trapped in local optima. Momentum decreases the BP network's sensitivity to small details in the error surface, and with momentum a network can slide through some shallow local minima. Let x_i (i = 1, 2, ..., n) be the inputs to the network, y_j (j = 1, 2, ..., p) the outputs from the hidden layer, o_k (k = 1, 2, ..., q) the outputs of the output layer, and w_jk the connection weight from the j-th hidden node to the k-th output node. The momentum algorithm is formulated by appending some fraction of the previous weight increment to the standard BP update:

Delta w_jk(n) = eta delta_k(n) y_j(n) + alpha Delta w_jk(n-1) ,  0 < alpha < 1 ,  (1)

where eta is a small positive parameter called the learning rate, n is the iteration index, d_k is the target output value, alpha is the momentum factor, and

delta_k = (d_k - o_k) o_k (1 - o_k) .  (2)
When the iteration converges to an acceptable error, learning ends and the weights are determined.
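A minimal NumPy sketch of the update (1)-(2) for the output layer is given below; the defaults eta = 0.1 and alpha = 0.8 follow the values used in the experiments later, and the array shapes are assumptions of this illustration.

```python
import numpy as np

def momentum_update(w, delta_w_prev, y_hidden, d, o, eta=0.1, alpha=0.8):
    """Output-layer weight update of eqs. (1)-(2) with momentum.

    w:            (p, q) hidden-to-output weights w_{jk}
    delta_w_prev: previous increment, same shape as w
    y_hidden:     (p,) hidden-layer outputs y_j
    d, o:         (q,) target and actual sigmoid outputs
    """
    delta = (d - o) * o * (1 - o)                  # eq. (2)
    grad_step = eta * np.outer(y_hidden, delta)    # eta * delta_k * y_j
    delta_w = grad_step + alpha * delta_w_prev     # momentum term of eq. (1)
    return w + delta_w, delta_w
```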
3 BPNN Based SM Method

3.1 Applying to Fractional Images

In this section, a BPNN is used to construct this kind of SM method. In the training process, the BPNN is applied to construct a local SM model that describes the relationship between the fractions in a local window and the spatial distribution of the subpixels assigned to the target within the center coarse pixel. For convenience, only the simple case is described; more complicated cases are similar. Let the zoom factor z be 2. Consider a local window W containing 3 x 3 coarse pixels centered at x_ij in a fractional image. The relationship expressed by the model can be described as
[ x_{i-1,j-1}  x_{i-1,j}  x_{i-1,j+1} ]                           [ o_ij^1  o_ij^2 ]
[ x_{i,j-1}    x_{i,j}    x_{i,j+1}   ]  --BP network mapping-->  [ o_ij^3  o_ij^4 ] ,  (3)
[ x_{i+1,j-1}  x_{i+1,j}  x_{i+1,j+1} ]
where

o_ij^k = 1 if subpixel k corresponds to the target class, and o_ij^k = 0 otherwise,  k = 1, 2, 3, 4 .  (4)
The expression (3) provides a pair of learning patterns. The fractional window is the input pattern, fed to the network in the normalized form

X = [x_{i-1,j-1}, x_{i-1,j}, x_{i-1,j+1}, x_{i,j-1}, x_{i,j}, x_{i,j+1}, x_{i+1,j-1}, x_{i+1,j}, x_{i+1,j+1}]^T / || [x_{i-1,j-1}, x_{i-1,j}, x_{i-1,j+1}, x_{i,j-1}, x_{i,j}, x_{i,j+1}, x_{i+1,j-1}, x_{i+1,j}, x_{i+1,j+1}]^T || .  (5)
The input pattern is normalized to ensure that similar training samples are obtained for different images as far as possible. Together with the similarity of the spatial dependence of any image, the trained network can therefore be universal, predicting for different images after a single training. It can be seen that the number of input nodes of the BP network is determined only by the fractional window size. Each subpixel of the SM result is an output pattern, exported in the form O = o_hat_ij^k (k = 1, 2, 3, 4). The expression (3) can thus be modeled by 4 related BP networks. In the training process the outputs O are integer-valued, while in the test process they are fuzzy; in that case, they can be quantized into integer values guided by the fractional abundance of the mixed pixel x_ij, and in fact this quantization acts as error correction.

3.2 Applying to Raw HSI

Before subpixel mapping, abundance information should be obtained. There are large numbers of classes in raw HSI, so global spectral unmixing becomes inaccurate and unreasonable. Additionally, the mapping of raw HSI is very sensitive to abundance errors. In this case, local window analysis is preferred. Furthermore, preprocessing steps such as edge detection [8], endmember extraction [9] and spectral unmixing are all conducted locally, in one band or in several bands. When the number C of classes or endmembers is taken as 2, the whole process can be done in a single band. Let A be a band of an HSI, x_{i,j} a pixel of A to be processed, and W the 3 x 3 local window centered at x_{i,j}. Firstly, an edge detection algorithm is used to decide whether x_{i,j} is an edge pixel or not. Secondly, if x_{i,j} is not an edge pixel, it is directly replicated into z^2 subpixels by means of creating repetitive information (CRI) [10]. If it is, 2 endmembers v_1, v_2 are extracted in W_{3x3}; x_{i,j} is then unmixed by v_1, v_2 and mapped by the BPNN method. With the help of CRI, the computational cost of the new method can be lowered greatly. For a single-band image, let lambda_1, lambda_2 be the fractions of x_{i,j} with respect to v_1, v_2; then the constrained unmixing results can be given in closed form:
lambda_1 = (x_{i,j} - v_2) / (v_1 - v_2) ,   lambda_2 = (x_{i,j} - v_1) / (v_2 - v_1) .  (6)
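Equation (6) is a one-dimensional, fully constrained inversion: the two abundances sum to one by construction. A small sketch follows, in which the clipping to [0, 1] is an added practical safeguard rather than something stated in the paper.

```python
def unmix_two_endmembers(x, v1, v2):
    """Closed-form constrained abundances of eq. (6) for a single-band pixel x.

    Returns (lambda1, lambda2) with lambda1 + lambda2 = 1; values are clipped
    to [0, 1] here as a practical safeguard (an assumption, not from the paper).
    """
    lam1 = (x - v2) / (v1 - v2)
    lam1 = min(max(lam1, 0.0), 1.0)
    return lam1, 1.0 - lam1
```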
When the number C is specified as 3 (a number larger than 3 is inadvisable in a local window), at least two bands of the HSI should be considered simultaneously for endmember extraction and spectral unmixing. In this case, the solution of the fractional abundances can be obtained as in reference [1], and a BP network with multi-valued output is needed. Given the abundances of the mixed pixels, BPNN-based SM can be implemented on raw HSI.
4 Experiments and Results

In order to verify the validity of the new method, comparative experiments are performed on synthetic images as well as real HSI. The HSI was acquired over a trial field
Fig. 1. SM images predicted by the two methods: (a) original images; (b) fractional images; (c) SM results (HNN); (d) SM results (BPNN). Row 1: synthetic circle; row 2: synthetic cross-line; row 3: class 11 in HSI; row 4: class 12 in HSI
of farm and woods in North Indiana, USA, and its hard class labels are available. To obtain training samples and to evaluate the SM performance, supervised fractional data are constructed by degrading the hard classification image to a lower-resolution image of 1/4 the original size using a square mean filter. In this way, 600 training samples are selected out of all the supervised fractional data for the BPNN, and the original hard classification image can be used as a reference to evaluate the SM results on the supervised fractional image.

In the first group of experiments, the zoom factor, the learning rate, the momentum factor, the number of hidden nodes, and the fractional window size are specified as z = 2, eta = 0.1, alpha = 0.8, p = 25, and W = 5 x 5, respectively. The HNN method [7], a classical and effective SM method, is used for comparison with the BPNN method, and the root mean square error (RMSE) is used as the evaluation criterion. The first two rows in Fig. 1 show the SM images predicted by the two methods on two synthetic images of different shapes (circle and cross-line). The last two rows in Fig. 1 show the predicted SM images for classes 11 and 12 of the supervised fractional image. The two methods differ little in visual effect. The RMSE measures corresponding to the four cases are listed in Table 1 to provide a quantitative comparison. From the table, and visually, the two methods have comparable prediction performance. As for computational cost, the HNN method takes a long time to produce an SM result (about 40 minutes for the HSI data, 3000 iterations at zoom factor 2), while the BPNN method takes only several seconds for the same input image.

Table 1. Comparison of RMSE
RMSE    Circle    Cross-Line    Class 11    Class 12
HNN     0.0125    0.0270        0.0491      0.0271
BPNN    0.0244    0.0148        0.0197      0.0209
In the second group of experiments, the method is used for mapping raw HSI. The HSI comes from AVIRIS data of a naval military base acquired in San Diego. All parameters take the same values as before. Local variance [10] is used for edge detection, and 14% of the pixels are segmented as mixed pixels. It takes the method only about 15 seconds to process a single band. A local view of band 1 of the HSI is shown in Fig. 2. It can be seen that the mapped image is visually sharper than the original one.
Fig. 2. BPNN based SM of raw HSI: (a) original image; (b) original sub-image; (c) mapped sub-image
5 Conclusions

In this paper, a BPNN based SM method has been proposed. The method has four main merits. First, it has good mapping performance. Second, it can be extended to the processing of raw HSI, which broadens its significance. Third, it is universal: after a single training it can process different images. Finally, it has a great advantage in computational cost; the larger the zoom factor, the number of iterations, and the size of the input image, the greater this advantage. The proposed method awaits further practical applications.
References
1. Keshava, N., Mustard, J.F.: Spectral Unmixing. IEEE Signal Processing Magazine 19 (2002) 44-57
2. Garcia-Haro, F.J., Gilabert, M.A., Melia, J.: Linear Spectral Mixture Modelling to Estimate Vegetation Amount from Optical Spectral Data. Int. J. Remote Sensing 17 (1996) 3373-3400
3. Atkinson, P.M., Cutler, M.E.J., Lewis, H.: Mapping Sub-pixel Proportional Land Cover with AVHRR Imagery. Int. J. Remote Sensing 18 (1997) 917-935
4. Schowengerdt, R.A.: Remote Sensing: Models and Methods for Image Processing. Academic, San Diego, CA (1997)
5. Brown, M., Gunn, S.R., Lewis, H.G.: Support Vector Machines for Optimal Classification and Spectral Unmixing. Ecological Modelling 120 (1999) 167-179
6. Foody, G.M.: Sharpening Fuzzy Classification Output to Refine the Representation of Sub-pixel Land Cover Distribution. Int. Journal of Remote Sensing 19 (1998) 2593-2599
7. Tatem, A.J., Lewis, H.G., Atkinson, P.M., Nixon, M.S.: Super-resolution Target Identification from Remotely Sensed Images Using a Hopfield Neural Network. IEEE Transactions on Geoscience and Remote Sensing 39 (2001) 781-796
8. Wang, L.G., Zhang, Y., Gu, Y.F.: Image Interpolation Based on Adaptive Edge-preserving Algorithm. Journal of Harbin Institute of Technology 36 (2005) 18-20
9. Cipar, J.J., Eduardo, M., Edward, B.: A Comparison of End Member Extraction Techniques. Proceedings of SPIE - The International Society for Optical Engineering 4725 (2002) 1-9
10. Fisher, P.: The Pixel: A Snare and a Delusion. Int. J. Remote Sensing 18 (1997) 679-685
Cellular Recognition for Species of Phytoplankton Via Statistical Spatial Analysis Guangrong Ji1, Rui Nian1, Shiming Yang2, Lijian Zhou1, and Chen Feng1 1
College of Information Science and Engineering, Ocean University of China, 266003 [email protected], [email protected] 2 Division of Life Science and Technology, Ocean University of China, 266003 [email protected]
Abstract. A scheme of cellular recognition for phytoplankton species via statistical spatial analysis is presented in this paper. A Bayesian Ying-Yang harmony learning system on Gaussian mixture models accomplishes automatic parameter learning and model selection in parallel, and roughly decides on the most competitive aggregate for a possible genus or subgenus. With a dipole kernel density estimate, probabilistic spatial geometrical coverage such as hyper sausages then exactly recognizes and matches all species included in the specified aggregate. The mechanism guarantees that species-level knowledge is explored in a coarse-to-fine process. Simulation experiments yielded probability distribution information and proved the approach effective, superior and feasible.
1 Introduction

Pattern recognition with diverse images can be treated as a simulation of human vision in intelligent cognition. Images of objects in various styles, angles and shapes provide vivid and omnidirectional information. Traditional pattern recognition focuses on cases with sufficient statistics, while statistical learning theory, with the ambition to estimate the underlying distribution, dependence structure, and generalization ability as well as possible, is a machine learning principle for finite sample sizes [1-3]. Instead of Empirical Risk Minimization, Vapnik first put forward Structural Risk Minimization, leading to the Support Vector Machine as a popular practical method, with complexity based on a minimal capacity measure, the VC dimension confidence [1]. A general statistical learning framework, Bayesian Ying-Yang harmony learning theory, was proposed by Lei Xu for simultaneous parameter learning and model selection [2]. Probabilistic spatial coverage analysis, with homologous topology as pre-acquired knowledge, attaches great importance to optimal cognition in high-dimensional space [3-5].

Cellular recognition of regular phytoplankton species is vital in biology, the oceanic ecosystem, environmental inspection, and fishery production. Previous approaches to oceanic phytoplankton species classification mainly made use of observation by microscope with manual reading, judgment and labeling. In this paper, statistical spatial analysis is employed to obtain probability distribution information and explore phytoplankton cellular species in a coarse-to-fine process. A Bayesian
Ying-Yang harmony learning system on Gaussian mixture models implements automatic parameter learning and model selection in parallel, and roughly decides on possible genera or subgenera. Spatial geometrical coverage such as hyper sausages, shaped from a dipole kernel density estimate, then exactly recognizes the particular species included in the selected aggregate, with the decision made by the Bayesian decision rule [6].
2 Bayesian Ying-Yang Harmony Learning

Bayesian Ying-Yang (BYY) harmony learning theory acts as a general statistical learning framework for parameter learning and model selection [2]. A Bayesian Ying-Yang system consists of a joint distribution decomposed coordinately from two complementary Bayesian perspectives, p(x,y) = p(y|x) p(x) and q(x,y) = q(x|y) q(y). The fundamental BYY harmony learning principle is to make the Ying and Yang machines be in best harmony in a twofold sense, i.e., to conform to both the matching nature and the least-complexity nature. Mathematically, harmony learning is a joint task of continuous optimization for parameter learning and discrete optimization for model selection, both under the same cost function, the cross-entropy measure H(p||q):

H(p||q) = integral p(y|x) p(x) ln[q(x|y) q(y)] dx dy - ln z_q ,   z_q = sum_{t=1}^{N} q(u_t) .  (1)
In a Backward architecture, p(x) can be derived either from an empirical density or from a nonparametric Parzen window PDF estimate with a smoothing parameter. p(y|x) is structure-free and q(x|y) is parametric, resulting in p(y|x) = delta(y - y_hat), y_hat = arg max_y [q(x|y) q(y)]. A general learning form for the Backward architecture is:
p(y|x, l) = delta(y - y_l(x)) ,   p(l|x) = delta(l - l(x)) ,
y_l(x) = arg min_y d(l, x, y) ,   l(x) = arg min_l d(l, x, y_l(x)) ,
d(l, x, y) = -ln[q(x|y, l) q(y|l) q(l)] + ln z_q ,  (2)

where q(y) = q(y, l) = q(y|l) q(l), q(l) = sum_{j=1}^{k} alpha_j delta(l - j) and sum_{l=1}^{k} alpha_l = 1.
3 Probabilistic Spatial Coverage Analysis

In the sense of classification risk minimization, probabilistic pattern recognition is philosophically optimal [7]. Let there be N objects O_1, O_2, ..., O_n, ..., O_{N-1}, O_N, with I_mn the m-th image of object O_n, and let the similarity between two images I_mn and I_ij be denoted by the distance d(I_mn, I_ij). The goal of probabilistic pattern recognition is to estimate the posterior probability p(O_n|I) given an input image x = I. According to the homologous continuity law, the complex geometrical body for each object O_n can be
constituted from a multitude of divisional hyper surfaces [3-5]. Therefore, approximate spatial bodies that fit the objective distribution well may be designed from this inherent nature.

3.1 Homologous Topology
Suppose that the differences between images of an identical object change gradually rather than in quanta; then, although the shape variation corresponds to infinitely many deformation paths, all of them comply with the homologous continuity law. Let I^n be the data set containing all images of object O_n. Given I_x^n, I_y^n in I^n and epsilon > 0, there must be at least one gradually changing course, i.e., a set

J^n = {I_x^n = I_{x_1}^n, I_{x_2}^n, ..., I_{x_{j-1}}^n, I_{x_j}^n = I_y^n | d(I_{x_m}^n, I_{x_{m+1}}^n) < epsilon, for all m in [1, j-1], m in N} subset of I^n .  (3)
3.2 Spatial Coverage Architecture
The spatial geometrical coverage containing the embedding set I^n is constructed as the topological product between I^n and a hyper sphere. The selection among multiform distinct spatial bodies leads to tremendous diversification in topological coverage and geometrical realization. In practice, a topological covering set S^n such as hyper sausages, containing the set I^n for an object O_n, can be designed as:

S^n = union_i S_i^n ,   S_i^n = {x | d(x, y) <= r, y in V_i, x in R^d} ,   V_i = {y | y = alpha I_{x_i}^n + (1-alpha) I_{x_{i+1}}^n, alpha in [0,1]} .  (4)
One segment of the topological covering set is defined here. Let

d(x, x_1 x_2) = min_{alpha in [0,1]} d(x, alpha x_1 + (1-alpha) x_2)  (5)

be the distance between x and the line segment x_1 x_2, where x_1, x_2 in R^d are two centers. Thus one hyper sausage set is S(x_1, x_2; r) = {x | d^2(x, x_1 x_2) <= r^2}.

3.3 Probabilistic Spatial Coverage
To build a probability distribution for every object O_n, we create a density function, called a dipole kernel function, to estimate a segment of the class conditional probability function p(I|O_n), or q(x|l), much like one hyper sausage in the spatial coverage:

K(x, x_1, x_2) = hG(x | x_1, Sigma)                                         if q(x, x_1, x_2) <= 0 ;
                 hG(x | x_2, Sigma)                                         if q(x, x_1, x_2) >= d(x_1, x_2) ;
                 hG(x - q(x, x_1, x_2)(x_2 - x_1)/d(x_1, x_2) | x_1, Sigma) otherwise ,  (6)

where G(x|u, Sigma) is a Gaussian density, q(x, x_1, x_2) = (x - x_1) . (x_2 - x_1) / d(x_1, x_2), and h is the magnitude. On a set of M typical views in an appropriate order from object O_n,

q(x|l) = (max_{i=1}^{M} K_i) / M .
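The sketch below evaluates the dipole kernel of (6) for one segment; the isotropic covariance Sigma = sigma^2 I and the magnitude h are simplifying assumptions of this illustration.

```python
import numpy as np

def dipole_kernel(x, x1, x2, sigma=1.0, h=1.0):
    """Dipole kernel K(x, x1, x2) of eq. (6).

    Assumes an isotropic covariance Sigma = sigma^2 * I and magnitude h;
    both are simplifications chosen only for this illustration.
    """
    d12 = np.linalg.norm(x2 - x1)
    q = np.dot(x - x1, x2 - x1) / d12      # signed projection length onto x1->x2
    if q <= 0:
        diff = x - x1                       # before the segment: Gaussian pole at x1
    elif q >= d12:
        diff = x - x2                       # past the segment: Gaussian pole at x2
    else:
        # beside the segment: remove the along-axis component, center at x1
        diff = (x - q * (x2 - x1) / d12) - x1
    dim = x.shape[0]
    norm = (2 * np.pi * sigma ** 2) ** (-dim / 2)
    return h * norm * np.exp(-0.5 * np.dot(diff, diff) / sigma ** 2)
```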
The Bayesian decision rule adopts the Maximum A Posteriori estimate over the hypothesis set, n* = arg max_{n=1}^{N} p(O_n|I). Moreover, Bayes' theorem decomposes the posterior probability p(O_n|I) into the class conditional probability p(I|O_n) and the prior probability p(O_n): p(O_n|I) = p(I|O_n) p(O_n) / p(I), where p(I) is the unconditional probability. Therefore, assuming that all objects have the same prior probability, we only need to make a decision on the conditional probability function p(I|O_n) for each object O_n.
4 Simulation Experiment

The image database consists of cellular images of regular phytoplankton species in various styles, angles and shapes. Based on a global similarity measure to guide decisions, a typical cellular sorting procedure was applied to form connectivity frameworks for each phytoplankton species. Fig. 1 shows example images from the database, including Ceratium, Rhizosolenia, Pleurosigma, Stephanopyxis, Peridinium, Skeletonema, Nitzschia, Chaetoceros and Coscinodiscus.
Fig. 1. Example images
The cellular recognition strategy is composed of phytoplankton statistical learning and spatial coverage analysis. The recognition procedure flows from the multi-object world to possible genus or subgenus nodes, and then to species nodes, step by step. According to the phytoplankton's own nature, the BYY system is first assigned to decide roughly on the united class nodes to which multiple species are subordinate at a higher level, carrying out automatic parameter learning and model selection in parallel. BYY harmony learning results in a winner-take-all type of competition for model selection, so that the number of possible genera or subgenera can be predicted for primary recognition. Probabilistic spatial geometrical coverage is then set up to recognize the genuine species exactly within the already specified aggregate. The exquisite and subtle inner relationships inside each species can thus be explored further in a coarse-to-fine process. The whole hierarchical scheme employs the BYY system on Gaussian mixture models in a Backward architecture for the united aggregates, and probabilistic spatial construction with the dipole kernel density estimate function, such as hyper sausages, for the species.
probability computation q ( x ") for every species. The decision was made by Maximum A Posteriori estimation on the particular species where the maximal posterior probability occurred. Probability distribution, from harmony competition in genera or subgenera, and from spatial coverage in species, would increase or decrease
Cellular Recognition for Species of Phytoplankton Via Statistical Spatial Analysis
765
the corresponding object’s activity, and the most active object was used to predict or judge which species the phytoplankton are really from.
5 Result Analysis Test images from unknown species were classified into the most competitive united classes, and matched against all the inclusive species prototypes in those particular aggregate. Predictive genera or subgenera could be estimated in harmony learning stage. The hypothesis set was ranked in a down sequence by probability gained from the stored probabilistic spatial analysis. Training recognition rates were all 100%. With proper parameters, mistaken recognition rate for all images from unlearned species could be 0%, i.e., unlearned species were all rejected without incorrect recognition. Table 1 makes a comparison on average recognition rates with single BYY harmony learning, single probabilistic spatial coverage, as well as their combination. Fig. 2 lists some recognition results with the top three matches. Table 1. Recognition comparison Recognition approaches Single BYY Single probabilistic coverage BYY based spatial coverage
Recognition rates of top three matches (%) 1 2 3 76.92 88.46 92.31 84.62 96.15 98.80 85.38 96.92 98.92
Fig. 2. Recognition results with the top three matches
6 Conclusions In this paper, we present a scheme of cellular pattern recognition for species of phytoplankton via harmony statistical learning and spatial coverage analysis. BYY
766
G. Ji et al.
harmony learning system on Gaussian mixture models in a Backward architecture, is assigned to carry out automatic parameter learning and model selection in parallel, and roughly decide on the most competitive united class for possible genus or subgenus. Probabilistic spatial geometrical coverage with a dipole kernel density estimate function like hyper sausages, is adopted to exactly cognize and match all the inclusive phytoplankton species in a certain united class. The hierarchical hybrid strategy guarantees that species inner knowledge could be explored step by step in a coarse to fine process. The decision is made on Bayesian decision rule. Instead of a single evaluation, prediction is ranked in a sequence by means of probability, which combines visible information from multiple species together, and makes the correct hypothesis more probable. Simulation experiment has achieved probability distribution information, and proved the approach effective, superior and feasible.
Acknowledgements This research was fully supported by the Natural Science Foundation of China (60572064) and the National 863 Natural Science Foundation of P. R. China (2001AA636030).
References 1. Vapnik, V.N: The Nature of Statistical Learning Theory. Springer-Verlag, Berlin, (1995) 2. Xu, L.: Bayesian Ying Yang harmony learning. The handbook of brain theory and neural networks, Arbib, M.A., Cambridge, MA, the MIT Press, (2002) 3. Huang, D.S.: Systematic Theory of Neural Networks for Pattern Recognition. Publishing House of Electronic Industry of China, Beijing (1996) 70-78 4. Wang, S. J.: Biomimetics pattern recognition. INNS, ENNS, JNNS Newletters Elseviers, (2003) 5. Wang, S. J., Wang, B.N.: Analysis and theory of high-dimension spatial geometry for Artificial Neural Networks. Acta Electronica Sinica, Vol. 30 No.1, (2002) 1-4 6. S-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, (2004) 7. Huang, D. S.: Radial basis probabilistic neural networks: Model and application. International Journal of Pattern Recognition and Artificial Intelligence, 13(7), (1999) 10831101
Combination of Linear Support Vector Machines and Linear Spectral Mixed Model for Spectral Unmixing

Liguo Wang1,2, Ye Zhang2, and Chunhui Zhao1

1 School of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China {wangliguo, zhaochunhui}@hrbeu.edu.cn
2 Dept. of Information Engineering, Harbin Institute of Technology, Harbin 150001, China {wlg74327, zhye}@hit.edu.cn
Abstract. The linear spectral mixing model (LSMM) has shortcomings, while linear support vector machines (LSVM) have the potential capability to be used in spectral unmixing but require the construction of too many classifiers when applied to partial unmixing. In this paper, the equality of LSVM and LSMM is proved concisely, and then a new double-unmixing scheme is proposed by combining the two models. In the first pass, LSVM-based full unmixing is performed to select a related class subset. In the second pass, an appropriate model is selected according to the cardinality of the current subset for partial unmixing. In addition, least squares LSVM is improved for effective unmixing. Experiments prove the high efficiency of the proposed scheme.
1 Introduction

In recent years, hyperspectral remote sensing has been applied widely and the techniques of hyperspectral imagery (HSI) processing have been developed greatly. As a key technique of mixed pixel processing, spectral unmixing [1] has attracted more and more attention. A mixed pixel is a pixel that corresponds to more than one class or substance of land cover. Given all endmembers (EMs), or constituent spectra, in an HSI, the task of spectral unmixing is to work out the fractional abundance of each class. Spectral unmixing can be classified into full spectral unmixing (full unmixing, FU) and partial spectral unmixing (partial unmixing, PU) [2]. The former is performed on the full set of EMs while the latter on a partial set of them. PU has an advantage over FU for two reasons. First, the number of classes in the whole HSI is large while it is small for a specified mixed pixel. Second, the participation of classes unrelated to a mixed pixel can deteriorate its unmixing accuracy. Under this condition, selecting a class subset for each mixed pixel is proposed in spectral unmixing. But the selection process, as in hierarchical unmixing and in stepwise unmixing, is usually of expensive computational cost [3]. So far, the traditional linear spectral mixing model (LSMM) [1] has been dominant among spectral unmixing methods for its clear physical meaning, great convenience and low computational cost. In this method, one and only one EM is used as a representation of each
class. But in fact, the in-class spectral variability is often great and one EM cannot represent the whole class well. This leads to a discounted unmixing accuracy of the method. References [4][5] prove that linear support vector machines (LSVM) [6] have the potential capability to be used in spectral unmixing. Automatic selection of pure pixels and flexible processing of linearly nonseparable modeling lead to the high unmixing performance of the method. The proof, however, is hard to comprehend for common readers. Moreover, it is inconvenient to construct too many classifiers for LSVM when it is used in PU, in which subset selection is about the class set instead of the EM one. Under these conditions, the method is still not used in spectral unmixing practically. In this paper, the equality of LSVM and LSMM for spectral unmixing is proved concisely, and then a new double-unmixing scheme is proposed by combining the two models.
2 Proof of Equality of LSVM and LSMM for Spectral Unmixing

For the limitation of space, descriptions of LSVM and LSMM are omitted here. Let f(·) be the discrimination function of LSVM. In this section, the equality of LSVM (based on the 1-a-r classifier structure) and LSMM is proved in a concise manner when the same set of EMs is available. For visualization purposes, a 3-EM set is considered first. Denote by Δ_xyz the triangle formed by points x, y and z, with area S(Δ_xyz), and by L_xy the line segment formed by points x and y, with length le(L_xy). Suppose pixel P is mixed by EMs A, B and C (see Fig. 1), and S(Δ_ABC) equals 1. Then it is easy to conclude that F_LSMM^A(P), the fractional abundance of EM A in pixel P generated by LSMM-based spectral unmixing, equals S(Δ_PBC). Let D be the crossing point of L_BC and the extended L_AP; then we get:

F_LSMM^A(P) = S(Δ_PBC) = le(L_PD) / le(L_AD) .   (1)
Now we prove that the abundance F_LSVM^A(P) generated by LSVM is just the same as (1). Prescribe that EM A is the positive class while B and C (and so D) are negative ones, i.e.

f(A) = 1,  f(B) = f(C) = f(D) = 0 .   (2)

For any real numbers α and β and any pixels x1 and x2, the following expression holds:

f(αx1 + βx2) = αf(x1) + βf(x2) .   (3)
Fig. 1. Equality of LSVM and LSMM

Then, the fractional abundance F_LSVM^A(P) can be computed as

F_LSVM^A(P) = f(P) = (le(L_PD)/le(L_AD)) f(A) + (le(L_AP)/le(L_AD)) f(D) = le(L_PD)/le(L_AD) .   (4)
From (1) and (4) we can see that the two results are the same, and the same conclusion can be drawn for the abundances of EMs B and C. For more complicated cases, including bias-term calculation and nonlinear re-estimation, the proof is given in reference [4], in which the unique merits of LSVM are also demonstrated. By the way, the 1-a-r (one-against-rest) classifier structure, which constructs a classifier between each class and the rest classes, is appropriate for constructing the LSVM, while the 1-a-1 (one-against-one) type, which constructs a classifier for each class pair, is unfeasible for the purpose of spectral unmixing.
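To make the proved equality concrete, the sketch below builds a small synthetic two-dimensional example (our own illustration; the endmember coordinates, the abundances and the fitting of the affine discriminant are assumptions, not taken from the paper): the LSMM abundance of EM A obtained from the augmented linear system coincides with the value of an affine function f satisfying the constraints (2), exactly as Eqs. (1) and (4) predict.

```python
import numpy as np

# Hypothetical 2-D endmembers A, B, C and a pixel P mixed from them.
A, B, C = np.array([0.0, 1.0]), np.array([-1.0, -0.5]), np.array([1.0, -0.5])
true_abund = np.array([0.5, 0.3, 0.2])            # fractions sum to one
P = true_abund[0] * A + true_abund[1] * B + true_abund[2] * C

# LSMM unmixing: solve the EM matrix augmented with a row of ones.
E = np.vstack([np.column_stack([A, B, C]), np.ones(3)])
a_lsmm = np.linalg.solve(E, np.append(P, 1.0))

# LSVM-style abundance of A: an affine f(x) = w.x + b with
# f(A) = 1, f(B) = f(C) = 0, as prescribed in Eq. (2).
M = np.column_stack([np.vstack([A, B, C]), np.ones(3)])
w1, w2, b = np.linalg.solve(M, np.array([1.0, 0.0, 0.0]))

print(a_lsmm[0])                  # 0.5, from LSMM
print(w1 * P[0] + w2 * P[1] + b)  # 0.5, from the affine discriminant
```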
3 New Spectral Unmixing Scheme and Its Advantages

According to the foregoing analysis, PU is superior to FU in terms of unmixing accuracy. In this case, LSVM is confronted with the difficulty of constructing too many sub-classifiers. For example, when the number of classes is as large as 10 and the cardinality of each subset is no more than 3, the number of sub-classifiers to be constructed may be larger than 400. On the other hand, LSMM has low unmixing accuracy. Considering that the two models have complementary advantages, their combination is expected to yield better unmixing performance. The new unmixing scheme can be described by the following three steps, in which the selection of the class subset (and so the EM one) is also carried out by exploiting spectral information (unmixing results) and spatial information (neighboring pixels): 1. LSVM-based FU is used to decompose each mixed pixel. 2. For each mixed pixel, accumulate the abundance of each class obtained in step 1 over its 3×3 local window. After removing
classes whose accumulated values are less than 5 points (reference value), no more than 4 classes (reference value) are retained as classes related to the mixed pixel; a sketch of this neighborhood-based subset selection is given after step 3. 3. If the number of related classes of a mixed pixel is 1, it is considered pure and its abundance is adjusted to 1. If the number is 2, the pixel is decomposed again by a pair-class LSVM. Otherwise, the pixel is reprocessed by LSMM-based PU.
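The subset selection of step 2 can be sketched as follows. This is a minimal sketch under our reading of the text: the step-1 results are assumed to be stored as an H × W × K abundance array, the "5 points" rule is interpreted as summed abundance over the window, and the function name is hypothetical.

```python
import numpy as np

def related_classes(abund, r, c, acc_thresh=5.0, max_classes=4):
    """Select the classes related to mixed pixel (r, c) from the
    full-unmixing abundances `abund` (H x W x K) of step 1."""
    H, W, K = abund.shape
    win = abund[max(r - 1, 0):min(r + 2, H), max(c - 1, 0):min(c + 2, W), :]
    acc = win.reshape(-1, K).sum(axis=0)      # accumulate over the 3x3 window
    keep = np.where(acc >= acc_thresh)[0]     # drop weakly supported classes
    order = np.argsort(acc[keep])[::-1]       # strongest classes first
    return keep[order[:max_classes]]          # retain at most 4 classes
```

Step 3 then dispatches on the size of the returned subset: one related class marks a pure pixel, two trigger the pair-class LSVM, and larger subsets are passed to LSMM-based PU.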
4 Improving SVM for Effective Unmixing

In this section, we present a weighted, or robust, least squares SVM for better unmixing. Least squares SVM, as a version of SVM, is widely used for its convenience, and its optimization expression is written as

min_{w,b,e} J(w, e) = (1/2)||w||² + (γ/2) Σ_{i=1}^{n} e_i² .   (5)
In order to obtain a robust estimation, let n_{+1} and n_{−1} be the sample numbers in classes +1 and −1 respectively, φ(x_{+1}) and φ(x_{−1}) be their corresponding class centers in kernel space, and D(x_i, x_{y_i}) be the distance between sample x_i and its corresponding class center. The formulas for computing φ(x_{+1}), φ(x_{−1}) and D(x_i, x_{y_i}) are specified as:

φ(x_{+1}) = (1/n_{+1}) Σ_{i, y_i=+1} φ(x_i),  φ(x_{−1}) = (1/n_{−1}) Σ_{i, y_i=−1} φ(x_i) .   (6)

D(x_i, x_{y_i}) = ||φ(x_i) − φ(x_{y_i})|| = [K(x_i, x_i) + K(x_{y_i}, x_{y_i}) − 2K(x_i, x_{y_i})]^{1/2} .   (7)
For a more detailed description, one can consult reference [7].
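A sketch of Eqs. (6)-(7) is given below (our own, with an RBF kernel as an example). The distance of each sample to its own class center is evaluated purely through kernel values via the standard expansion of the squared feature-space distance. How the distances are finally mapped to weights on the errors e_i is specified in reference [7] rather than in this excerpt, so the Gaussian down-weighting shown here is only an assumption.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def center_distances(X, y, gamma=1.0):
    """D(x_i, x_{y_i}) of Eq. (7), with the class center of Eq. (6)
    expanded into kernel evaluations."""
    K = rbf_kernel(X, X, gamma)
    d = np.empty(len(y))
    for cls in (+1, -1):
        idx = np.where(y == cls)[0]
        n = len(idx)
        mm = K[np.ix_(idx, idx)].sum() / n ** 2        # <center, center>
        for i in idx:
            v = K[i, i] - 2.0 * K[i, idx].sum() / n + mm
            d[i] = np.sqrt(max(v, 0.0))
    return d

def robust_weights(d):
    """Assumed weighting: samples far from their class center (likely
    outliers or impure pixels) contribute less to the squared errors."""
    return np.exp(-(d / (d.mean() + 1e-12)) ** 2)
```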
5 Experiments and Results

In the first group of experiments, a simple comparison of FU performance between LSVM and LSMM is made on a subscene (40 × 50 pixels, 126 bands) of a naval military base HSI acquired in San Diego. The two EMs are the mean spectra of two classes of training samples (100 samples in each class) selected manually from the subscene. The resulting fractional images are shown in Fig. 2. It can be seen that LSVM has a better unmixing effect than LSMM. In the second group of experiments, the correctness coefficient (CC) [8] is used as an evaluation criterion of spectral unmixing. The HSI comes from an agriculture/forestry landscape AVIRIS dataset acquired at the Indian Pine Test Site (144 × 144 pixels, 200 bands). Four crop classes are selected from the HSI (displayed in 4 grey levels) as the experimental data (see Fig. 3). Fig. 3 b) gives the region in which the pixels need to be unmixed again. The comparison of unmixing accuracy between the new scheme and the traditional LSMM method is given in Table 1. The new scheme obtains about 10 percentage points of improvement in unmixing accuracy. Fig. 4 gives the comparison of fractional images. Visually, the new scheme has a clear superiority over the traditional LSMM method.
Table 1. Comparison of unmixing accuracy

Method   Class 1   Class 2   Class 3   Class 4   Average
LSMM     0.7826    0.6858    0.9196    0.8369    0.8062
New      0.9003    0.8369    0.9761    0.8868    0.9000

Fig. 2. Comparison of unmixing performance: (1) original image; (2) fractional image (LSVM); (3) fractional image (LSMM)
Fig. 3. Extraction of mixed area: a) original 4 crop classes; b) mixed area
Fig. 4. Unmixing fractional images of the 4 crop classes under different methods: a) fractional images generated by the traditional LSMM; b) fractional images generated by the new unmixing scheme
6 Conclusions

LSVM and LSMM have different advantages and shortcomings, and their appropriate use helps improve unmixing performance. When all classes or only two classes are included in spectral unmixing, LSVM is preferred, and LSMM otherwise. In addition, selecting the class subset by combining spatial and spectral information is effective and free of cumbersome computation. The proposed scheme gives a feasible way of applying LSVM to spectral unmixing and provides a new idea for improving unmixing performance. Furthermore, improved LSVMs (such as robust LSVM) can be used instead of the original ones for better unmixing performance. The proposed spectral unmixing scheme has several advantages. First, the scheme makes use of spatial information and a partial set of classes in spectral unmixing, which is more effective compared with other methods based on spectral information only and the full set of classes. Second, SVM-based spectral unmixing has better properties than the traditional LSMM, such as automatic selection of pure pixels, multi-pixel representation of each class, and easy extension to nonlinear classification. Third, unmixing accuracy can be further improved by improving the SVM itself. The proposed scheme opens a new line of thought for developing spectral unmixing. In the future, the application of nonlinear SVM in unmixing awaits further investigation.
References

1. Keshava, N., Mustard, J. F.: Spectral Unmixing. IEEE Signal Processing Magazine, 19 (2002) 44–57
2. Nielsen, A. A.: Spectral Mixture Analysis: Linear and Semi-Parametric Full and Iterated Partial Unmixing in Multi- and Hyperspectral Image Data. International Journal of Computer Vision, 42 (2001) 17-37
3. Daniel, N.: Evaluation of Stepwise Spectral Unmixing with HYDICE Data. SIMG-503 Senior Research Final Report, Center for Imaging Science. http://www.cis.rit.edu/research/thesis/bs/1999/newland/title.html
4. Martin, B., Hugh, G. L., Steve, R. G.: Linear Spectral Mixture Models and Support Vector Machines for Remote Sensing. IEEE Trans. on Geoscience and Remote Sensing, 38 (2000) 2346-2360
5. Martin, B., Gunn, S. R., Lewis, H. G.: Support Vector Machines for Optimal Classification and Spectral Unmixing. Ecol. Modelling, 120 (1999) 167-179
6. Vapnik, V. N.: The Nature of Statistical Learning Theory. Springer, New York (1995)
7. Wang, L.G., Zhang, Y., Zhang, J.P.: A New Weighted Least Squares Support Vector Machines and Its Sequential Minimal Optimization Algorithm. Chinese J. Electron, 15 (2006), to appear in No. 3
8. Hassan, E.: Introducing Correctness Coefficient as An Accuracy Measure for Sub Pixel Classification Results. www.ncc.org.ir/articles/poster83/H.Emami.pdf
Combining Speech Enhancement with Feature Post-processing for Robust Speech Recognition Jianjun Lei, Jun Guo, Gang Liu, Jian Wang, Xiangfei Nie, and Zhen Yang School of Information Engineering, Beijing University of Posts and Telecommunications, 100876 Beijing, China [email protected]
Abstract. In this paper, we present an effective scheme combining speech enhancement with feature post-processing to improve the robustness of speech recognition systems. At the front-end, minimum mean square error log-spectral amplitude (MMSE-LSA) speech enhancement is adopted to suppress noise in noisy speech. Nevertheless, this enhancement is not perfect, and the enhanced speech retains signal distortion and residual noise which affect the performance of recognition systems. Thus, at the back-end, MVA feature post-processing is used to deal with the remaining mismatch between enhanced speech and clean speech. We have evaluated recognition performance under noisy environments using the NOISEX-92 database and recorded speech signals in a continuous speech recognition task. Experimental results show that our approach exhibits considerable improvements in degraded environments.
1 Introduction

In the past decade, the performance of automatic speech recognition (ASR) has been significantly improved, and more and more ASR systems are being deployed in many applications. In many situations, these speech recognition systems must operate in adverse environments, where ambient noise becomes the major hurdle to achieving high-accuracy recognition performance. Various methods have been proposed to improve the environmental robustness of ASR, which can be broadly classified into three categories [1]. First, an increase in noise robustness can be achieved by extracting speech features that are inherently less distorted by noise. Cepstral mean normalization (CMN) [2], with the merits of inexpensive computation and good recognition performance, removes the mean cepstrum from all vectors, with the cepstral mean calculated separately for each sentence. Second, approaches such as parallel model combination (PMC) [3] adapt the acoustic models in the recognizer to the changing noise conditions. In the third category, speech features are enhanced before they are fed into the recognizer. This can be achieved either prior to feature extraction, as in speech enhancement [4] [5], or by incorporating extra compensating steps into the feature extraction module, as in feature compensation [6]. Such an enhancing step is largely independent of the vocabulary size of the recognizer and does not require adaptation of the recognition software.
In our research, we focus on a method combining speech enhancement with feature post-processing to improve the robustness of speech recognition systems. In [7] [8], spectral subtraction (SS) and Wiener filtering were tested in combination with feature post-processing for this purpose. The minimum mean square error log-spectral amplitude (MMSE-LSA) [9] estimator is superior to SS and the Wiener filter, and results in a much lower residual noise level without further affecting the speech itself. Thus, we adopt the MMSE-LSA enhancement approach at the front-end to suppress the corrupting additive noise. The residual noise after MMSE-LSA processing can be regarded as approximately additive and stationary, which ensures that simplified feature post-processing can be used at the back-end. MVA [10] feature post-processing has shown great success in smoothing the speech parameterization, so we use it to deal with the remaining mismatch between enhanced speech and clean speech. Experiments show that our approach achieves significant improvements in system performance. The remainder of this paper is organized as follows. The next section describes the MMSE-LSA algorithm. Section 3 shows the procedure of MVA feature post-processing. The experiments and results are given in Section 4 and some conclusions are drawn in Section 5.
2 MMSE-LSA Speech Enhancement

The speech signal and noise process are assumed statistically independent, and the spectral components of each of these two processes are assumed to be zero-mean, statistically independent Gaussian random variables. Let X_k = A_k e^{jα_k}, D_k, and Y_k = R_k e^{jν_k} denote the k-th Fourier expansion coefficients of the speech signal, the noise process, and the noisy observations, respectively, in the analysis interval [0, T]. Let λ_x(k) = E{|X_k|²} and λ_d(k) = E{|D_k|²} denote, respectively, the variances of the clean speech and noise spectral components. MMSE-LSA aims at producing an estimate of A_k whose logarithm is as close as possible to the logarithm of A_k in the MMSE sense. Under the above assumptions on speech and noise, this perceptually motivated criterion results in the estimator

Â_k = (ε_k / (1 + ε_k)) exp( (1/2) ∫_{v_k}^{∞} (e^{−t}/t) dt ) R_k   (1)

where

v_k = ε_k γ_k / (1 + ε_k);  ε_k = λ_x(k) / λ_d(k);  γ_k = R_k² / λ_d(k) .   (2)

ε_k and γ_k are the a priori and a posteriori signal-to-noise ratios (SNR), respectively. In order to reduce computational complexity, the exponential integral in (1) may be evaluated using the functional approximation below instead of iterative solutions or tables [11]. Thus, to approximate
ei(v) = ∫_v^∞ (e^{−x}/x) dx   (3)

with

ẽi(v) ≈ −2.31 log10(v) − 0.6,      for v < 0.1,
        −1.544 log10(v) + 0.166,   for 0.1 ≤ v ≤ 1,
        10^{−0.52v − 0.26},        for v > 1.   (4)
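Eqs. (1)-(4) amount to a multiplicative spectral gain applied to each noisy amplitude R_k. A minimal sketch is given below (our own; the estimation of the a priori SNR ε_k, e.g. by the decision-directed rule, lies outside this excerpt, so ε and γ are taken as given, and the function names are assumptions).

```python
import numpy as np

def expint_approx(v):
    """Piecewise approximation (4) of the exponential integral ei(v)."""
    v = np.maximum(np.asarray(v, dtype=float), 1e-12)
    out = np.empty_like(v)
    lo, mid = v < 0.1, (v >= 0.1) & (v <= 1.0)
    out[lo] = -2.31 * np.log10(v[lo]) - 0.6
    out[mid] = -1.544 * np.log10(v[mid]) + 0.166
    hi = ~lo & ~mid
    out[hi] = 10.0 ** (-0.52 * v[hi] - 0.26)
    return out

def lsa_gain(eps, gamma):
    """Gain of the MMSE-LSA estimator, Eqs. (1)-(2): A_hat = gain * R."""
    v = eps * gamma / (1.0 + eps)
    return eps / (1.0 + eps) * np.exp(0.5 * expint_approx(v))
```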
3 MVA Feature Post-processing

After MMSE-LSA speech enhancement, a great reduction of the mismatch between noisy speech and clean speech is obtained. Nevertheless, this enhancement is not perfect and the enhanced speech retains residual noise which will affect the performance of recognition systems. MVA post-processing has been demonstrated to be an effective technique for smoothing the speech parameterization in [10]. Thus, we use it to deal with the residual mismatch between enhanced speech and clean speech after speech enhancement. MVA post-processing is quite similar to certain schemes well known to the community (namely variance normalization and mean subtraction); the crucial difference lies in the domain in which the post-processing is applied. In this work, we apply MVA post-processing after delta and delta-delta feature processing. Once speech enhancement is applied to reduce the mismatch in the spectral domain, the standard mel-frequency cepstral coefficients (MFCC), c1…c12, and the log energy, along with their delta and delta-delta coefficients, are computed and processed by MVA. For a given utterance, we represent the data by a matrix C whose element C_td is the d-th component of the feature vector at time t, with t = 1…T, the number of frames in the utterance, and d = 1…D, the dimension of the feature space. The first step is mean subtraction, defined by:

C′_td = C_td − μ_d   (5)

where

μ_d = (1/T) Σ_{t=1}^{T} C_td .   (6)

This is followed by variance normalization, defined by:

C̄_td = C′_td / σ_d = (C_td − μ_d) / σ_d   (7)

where

σ_d = [ (1/T) Σ_{t=1}^{T} (C_td − μ_d)² ]^{1/2} .   (8)

The third step is a mixed auto-regression moving average filter, defined by:

C̃_td = [ Σ_{i=1}^{M} C̃_{(t−i)d} + Σ_{j=0}^{M} C̄_{(t+j)d} ] / (2M + 1)   if M < t ≤ T − M,
C̃_td = C̄_td   otherwise,   (9)

where M is the order of the auto-regression moving average filter.
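The three steps can be collected into one short routine (a minimal sketch of Eqs. (5)-(9); M = 2 is only an illustrative filter order, and the function name is ours). Note that Eq. (9) is recursive along time: already smoothed frames C̃ enter the left sum while normalized frames C̄ enter the right one.

```python
import numpy as np

def mva(C, M=2):
    """MVA post-processing of a T x D feature matrix, Eqs. (5)-(9)."""
    mu = C.mean(axis=0)                        # Eq. (6)
    sigma = C.std(axis=0) + 1e-12              # Eq. (8)
    Cn = (C - mu) / sigma                      # Eqs. (5) and (7)
    T = len(Cn)
    Cs = Cn.copy()                             # boundary frames stay unfiltered
    for t in range(T):                         # 0-indexed time
        if M <= t <= T - M - 1:                # interior condition of Eq. (9)
            past = Cs[t - M:t].sum(axis=0)     # already smoothed frames
            future = Cn[t:t + M + 1].sum(axis=0)
            Cs[t] = (past + future) / (2 * M + 1)
    return Cs
```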
4 Experiments

A continuous HMM-based speech recognition system is used in the recognition experiments to examine the presented approach. The database used in these experiments is selected from the mandarin Chinese corpus provided by the 863 plan (China High-Tech Development Plan) [12]. The training set for training the HMMs includes utterances of 96 speakers (48 males and 48 females). The test set includes utterances of 10 speakers (5 males and 5 females). The white, f-16 and factory1 noise from NOISEX-92 [13] are added to the test set by varying the signal-to-noise ratio (SNR) from 0dB to 20dB. The acoustic modeling is based on a set of 61 phones. Each phone is modeled with a three-emitting-state left-right topology with a mixture of 8 Gaussians per state and diagonal covariance matrices. Triphone-based acoustic models are then used in this continuous speech recognition task. In order to test the validity of the approach, four experiments are done: the baseline without noise reduction, enhancement with MMSE-LSA, feature post-processing by MVA, and MMSE-LSA combined with MVA feature post-processing. They are respectively titled “Baseline”, “LSA”, “MVA” and “LSA-MVA” in Tables 1-3. Tests with white noise, shown in Table 1, demonstrate that “LSA”, “MVA” and “LSA-MVA” are more effective than “Baseline”, especially in low SNR conditions. For example, in the 10dB white noise condition, “LSA”, “MVA” and “LSA-MVA” achieve absolute improvements of 44.09%, 21.32% and 45.84%, respectively, compared with “Baseline”. As a whole, “LSA”, “MVA” and “LSA-MVA” achieve average improvements of 22.77%, 14.09% and 25.35%, respectively. Further, the performance of all approaches in clean environments is well maintained between 81% and 84%; our method hardly affects system performance in clean environments. To investigate the effectiveness of the approach in nonstationary environments, we test the f-16 and factory1 noisy environments at different SNRs. It can be observed from Table 2 and Table 3 that the approach is also effective for improving system performance. In the f-16 noisy environments, “LSA”, “MVA” and “LSA-MVA” achieve average improvements of 13.67%, 5.01% and 15.30%, respectively, compared with “Baseline”. In the factory1 noisy environments, they achieve average improvements of 6.73%, 2.43% and 7.81%, respectively. From another point of view, we observe from Tables 1-3 that “LSA-MVA” is more effective than “LSA”, obtaining average improvements of 2.58% for the white noise, 1.63% for the f-16 noise, and 1.08% for the factory1 noise. The experimental results demonstrate that MVA feature post-processing can decrease the residual mismatch between enhanced speech and clean speech after MMSE-LSA speech enhancement.
Table 1. Recognition rates (%) for additive white noise

SNR        0dB     5dB     10dB    15dB    20dB    Clean   Avg.
Baseline   0.15    1.61    8.32    20.73   48.47   83.61   27.15
LSA        8.47    25.51   52.41   61.17   70.66   81.27   49.92
MVA        5.99    14.16   29.64   50.07   64.67   82.88   41.24
LSA-MVA    9.78    26.86   54.16   66.60   76.20   81.42   52.50
Table 2. Recognition rates (%) for additive f-16 noise

SNR        0dB     5dB     10dB    15dB    20dB    Clean   Avg.
Baseline   2.77    13.28   33.87   62.04   73.58   83.61   44.86
LSA        16.80   42.18   62.48   71.53   76.93   81.27   58.53
MVA        12.41   27.01   43.94   61.75   71.24   82.88   49.87
LSA-MVA    17.08   43.21   67.30   73.58   78.39   81.42   60.16
Table 3. Recognition rates (%) for additive factory1 noise

SNR        0dB     5dB     10dB    15dB    20dB    Clean   Avg.
Baseline   2.04    11.39   37.96   62.19   74.89   83.61   45.35
LSA        9.60    29.62   54.74   66.28   70.95   81.27   52.08
MVA        8.18    19.85   41.75   61.61   72.41   82.88   47.78
LSA-MVA    9.64    29.34   55.91   68.18   74.45   81.42   53.16
5 Conclusions

This paper describes a method that combines MMSE-LSA speech enhancement with MVA feature post-processing for robust speech recognition. MMSE-LSA speech enhancement is adopted to suppress noise from noisy speech. The residual noise after MMSE-LSA processing can be regarded as approximately additive and stationary, which ensures that simplified feature post-processing can be used at the back-end. Thus, MVA feature post-processing is used to deal with the remaining mismatch. Experimental results show that our method exhibits considerable improvements under noise conditions and that MVA feature post-processing can further increase the performance of the system after speech enhancement.
Acknowledgements This research is partially supported by NSFC (National Natural Science Foundation of China) under Grant No.60475007, Key Project of Chinese Ministry of Education under Grant No.02029, and the Foundation of Chinese Ministry of Education for Century Spanning Talent.
References

1. Gong, Y.: Speech Recognition in Noisy Environments: A Survey. Speech Communication, Vol. 16, No. 3 (1995) 261–291
2. Atal, B. S.: Effectiveness of Linear Prediction Characteristics of the Speech Wave for Automatic Speaker Identification and Verification. Journal of the Acoustical Society of America, Vol. 55, No. 6 (1974) 1304-1312
3. Gales, M. J. F., Young, S. J.: Robust Continuous Speech Recognition Using Parallel Model Combination. IEEE Transactions on Speech and Audio Processing, Vol. 4, No. 5 (1996) 352-359
4. Ephraim, Y., Lev-Ari, H., Roberts, W. J. J.: A Brief Survey of Speech Enhancement. The Electronic Handbook, CRC Press, http://ece.gmu.edu/~yephraim/ephraim.html (2005)
5. Ephraim, Y., Cohen, I.: Recent Advancements in Speech Enhancement. The Electrical Engineering Handbook, CRC Press, http://ece.gmu.edu/~yephraim/ephraim.html (2005)
6. Moreno, P. J., Raj, B., Stern, R. M.: A Vector Taylor Series Approach for Environment-Independent Speech Recognition. Proceedings of ICASSP’96 (1996) 733-736
7. Segura, J. C., Benitez, C., de la Torre, A., Rubio, A. J.: Feature Extraction Combining Spectral Noise Reduction and Cepstral Histogram Equalization for Robust ASR. Proceedings of ICSLP’02 (2002) 225-228
8. Segura, J. C., Ramirez, J., Benitez, C., de la Torre, A., Rubio, A. J.: Improved Feature Extraction Based on Spectral Noise Reduction and Nonlinear Feature Normalization. Proceedings of EUROSPEECH’03 (2003) 353-356
9. Ephraim, Y., Malah, D.: Speech Enhancement Using a Minimum Mean Square Error Log-Spectral Amplitude Estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 33, No. 2 (1985) 443-445
10. Chen, C. P., Bilmes, J., Ellis, D. P. W.: Speech Feature Smoothing for Robust ASR. Proceedings of ICASSP’05 (2005) 525-528
11. Martin, R., Malah, D., Cox, R. V., Accardi, A. J.: A Noise Reduction Preprocessor for Mobile Voice Communication. EURASIP Journal on Applied Signal Processing, Vol. 8 (2004) 1046-1058
12. Zu, Y. Q.: Issues in the Scientific Design of the Continuous Speech Database. http://www.cass.net.cn/chinese/s18_yys/yuyin/report/report_1998.htm
13. Varga, A., Steeneken, H. J. M.: Assessment for Automatic Speech Recognition: II. NOISEX-92: A Database and an Experiment to Study the Effect of Additive Noise on Speech Recognition Systems. Speech Communication, Vol. 12, No. 3 (1993) 247-251
Conic Section Function Neural Networks for Sonar Target Classification and Performance Evaluation Using ROC Analysis Burcu Erkmen and Tulay Yildirim Yildiz Technical University, Department of Electronics and Communications Engineering 34349 Besiktas, Istanbul-Turkey {bkapan,tulay}@yildiz.edu.tr
Abstract. The remote detection of undersea mines in shallow waters using active sonar is a crucial subject required to maintain the security of important harbors and coastline areas. Neural network classifiers have been widely used in the classification of complex sonar signals due to their adaptive and parallel processing abilities. In this paper, Conic Section Function Neural Networks (CSFNN) are used to solve the problem of classifying underwater targets. Simulation results demonstrate the ability of CSFNN, which retains the computational advantages of traditional neural network structures, to handle a highly complex sonar classification problem. Receiver Operating Characteristic (ROC) analysis has been applied to the neural classifier to evaluate the sensitivity and specificity of the diagnostic procedure. The ROC curve of the classifier, based on different threshold settings, demonstrated excellent classification performance of the CSFNN classifier.
1 Introduction

Automatic identification and classification of underwater signals on the basis of sonar returns is a challenging problem due to the complexity of the ocean environment. Identification by human experts is usually not objective and imposes a very heavy workload. Neural networks, with their adaptive and computational advantages over traditional signal processing and pattern recognition techniques, appear ideally suited to active sonar classification. The pioneering papers by Gorman and Sejnowski [1], [2] were perhaps the first to report the application of neural networks to this area. After these pioneering papers, there has been growing interest in the use of neural networks for the automatic recognition of sonar targets. Among the several neural classifiers, the Multi-layer Perceptron (MLP) has been used in several applications of sonar target classification [1]-[7]. Radial Basis Function Networks (RBFN) [3, 8], General Regression Neural Networks [9], and Probabilistic Neural Networks (PNN) [3] have also been efficient feed-forward neural networks widely used to classify sonar signals in the literature. Receiver Operating Characteristic (ROC) analysis [10] is an efficient method of measuring and comparing the diagnostic performance of medical and sonar studies [5, 6, 11, 12].
In this paper, Conic Section Function Neural Networks (CSFNN) are used to classify sonar returns from two different targets on the sandy ocean bottom: a mine and a cylindrically shaped rock. The idea brought forward by Dorffner [13] is to generalize the function of a unit to include all these decision regions in only one network, providing a relationship between an MLP unit and an RBFN unit. To evaluate the performance of the neural classifier, the ROC curve has been obtained based on different threshold settings. This paper is organized as follows. Section 2 briefly describes the CSFNN structure. Section 3 presents some concepts related to ROC analysis. In Section 4, the neural classifier design, simulation results and performance evaluation with ROC analysis are presented. Finally, Section 5 outlines some conclusions.
2 Conic Section Function Neural Networks

The CSFNN is capable of making automatic decisions depending on the data distribution of a given application. The decision boundaries of the MLP (hyperplane) and of the RBF (hypersphere) are special cases of the CSFNN. In between these two cases there are intermediate types of decision boundaries, such as ellipses, hyperbolas or parabolas, which are also all valid decision regions. The propagation rule of the CSFNN (which subsumes the RBF and MLP propagation rules) can be derived using the analytical equations for a cone. The following form (Eq. 1) is obtained for an n-dimensional input space:

F_j(x_p) = Σ_{i=1}^{N} (x_pi − c_ij) w_ij − cos(ω_j) [ Σ_{i=1}^{N} (x_pi − c_ij)² ]^{1/2}   (1)

where x_pi refers to the input vector for the p-th pattern, w_ij refers to the weights for each connection between the input and hidden layers, c_ij refers to the center coordinates and ω_j refers to the opening angles. i and j are the indices referring to the units in the input and hidden layers, respectively. This equation consists of two major parts, analogous to the MLP and the RBF. The equation simply turns into the propagation rule of an MLP network, which is the dot product (weighted sum), when ω is π/2. The second part of the equation gives the Euclidean distance between the inputs and the centers for an RBF network. The network can be started as an initialized MLP or as an RBF depending on the opening angles [14].
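A minimal sketch of rule (1) for a single hidden unit is shown below (our own illustration with arbitrary numbers). Setting the opening angle to π/2 zeroes the cosine term and leaves the MLP-style weighted sum of centered inputs, while the square-root term alone is the RBF distance; the RBF-style initialization used later corresponds to ω = π/4.

```python
import numpy as np

def csfnn_unit(x, c, w, omega):
    """Propagation rule (1) of one CSFNN hidden unit."""
    diff = x - c
    return diff @ w - np.cos(omega) * np.sqrt((diff ** 2).sum())

x = np.array([0.2, -0.4, 0.7])                 # example input pattern
c = np.zeros(3)                                # unit center
w = np.array([0.5, 0.1, -0.3])                 # unit weights

print(csfnn_unit(x, c, w, np.pi / 2))          # pure dot-product (MLP case)
print(csfnn_unit(x, c, w, np.pi / 4))          # mixed conic-section behavior
```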
3 Receiver Operating Characteristic Analysis

ROC analysis is an established method of measuring diagnostic performance for the analysis of radar images. The ROC curve is a good measure when the performance of different schemes needs to be compared. The evaluation criteria are based on the ROC curve, used in sonar target classification systems to indicate the trade-off between the conditional probability of correct classification and the conditional probability of false-alarm responses. Equivalently, the ROC curve is a graphical representation of the trade-off between sensitivity (Sn) and specificity (Sp). Sensitivity and specificity are the basic expressions (Eqs. 2 and 3) for the diagnostic test interpretation of the ROC analysis.
sensitivity = (number of true positives) / (number of true positives + number of false negatives)   (2)

specificity = (number of true negatives) / (number of true negatives + number of false positives)   (3)

Simple calculations of sensitivity and specificity for both classes are given above. Furthermore, a simple estimate of the accuracy of the CSFNN classifier is defined by means of d_real (Eq. 4), the distance of a real classifier from the ideal one (d_ideal = 0):

d_real = [(1 − sensitivity)² + (1 − specificity)²]^{1/2}   (4)
If sensitivity (x-axis) is plotted against specificity (y-axis) for each classifier, d_real can be treated as the Euclidean distance of the (sensitivity, specificity) point from the top-right (1, 1) corner, which represents the ideal classifier. The smaller the distance, the better the classifier. When artificial neural networks are used as classifiers in sonar applications, the operating points for the ROC curve can be generated by varying the threshold value of the output node. The performance of a diagnostic test is satisfactory when the ROC curve climbs rapidly towards the upper left-hand corner of the graph. On the other hand, unsatisfactory performance is obtained when the ROC curve follows a diagonal path from the lower left-hand corner to the upper right-hand corner.
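A short sketch of Eqs. (2)-(4) is given below (our own helper); the example counts correspond to the 0.1-threshold operating point reported later in Table 4.

```python
import numpy as np

def roc_point(tp, fp, fn, tn):
    """Sensitivity, specificity and d_real of Eqs. (2)-(4)."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    d_real = np.hypot(1.0 - sens, 1.0 - spec)  # distance from the ideal (1, 1)
    return sens, spec, d_real

print(roc_point(60, 19, 2, 23))                # about (0.968, 0.548, 0.454)
```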
4 Simulation Results

The aim of this study is to employ Conic Section Function Neural Networks to classify sonar returns. The dataset, which is the original sonar data used by Gorman and Sejnowski [1, 2], was taken from the University of California collection of machine-learning databases. Although this dataset is an old one, it is a well-known benchmark for sonar problems. The dataset consists of sonar returns collected from two sources: a metal cylinder and a similarly shaped rock. Both objects were lying on a sand surface, and the sonar chirp projected at them from different angles (aspect angles) produced the variation in the data. The dataset consists of 208 returns (111 cylinder-shaped mine returns and 97 rock returns), sorted in increasing order of aspect angle. The transmitted sonar signal is a frequency-modulated chirp, rising in frequency. The entire dataset was randomly split into training and test sets (104 samples for the training file, 104 samples for the test file). The data was filtered and a spectral envelope of 60 samples (the inputs, or the dimensionality of our data) was extracted. The network (CSFNN) used here was a fully connected feed-forward neural network composed of 60 input nodes and one output node. The network properties of the sonar classification system are shown in Table 1. As stated before, CSFNN is capable of combining the decision regions of MLP and RBFN in only one network. Therefore, MLP and RBFN are also used to classify the same dataset to compare classification performance with CSFNN. The classification process was performed using MATLAB 7.0.
Table 1. Network properties of the sonar classification neural system

Properties                        Values / Methods
Learning Algorithm                Back Propagation
Number of Input Nodes             60
Neuron Number of Hidden Layer     26
Neuron Number of Output Layer     1
Learning Rate                     0.1
Momentum Rate                     0.8
Activation Function               Logarithmic Sigmoid
Training Method                   Continuous Training
Bias Weights                      Used
Epoch Number                      1000
Initial Weights                   Random
Initialization of CSFNN           RBF case (ω = π/4)
Table 2. Classification rate (%) comparisons for MLP, RBF and CSFNN

           “mine” class         “rock” class         Overall
           Train     Test       Train     Test       Train     Test
MLP        100       85.5       100       88         100       86.5
RBF        100       85.5       100       95.24      100       89.42
CSFNN      100       88.71      100       97.62      100       92.3
Table 2 shows the classification rates of the three types of neural classifiers for comparison. The MLP has two hidden layers (30 and 10 neurons), trained with the back-propagation algorithm using a momentum parameter of m = 0.8 and a learning rate of lr = 0.1. The RBF used an orthogonal least squares algorithm for selecting and optimally locating centers; in this experiment, the RBFN has 70 hidden neurons corresponding to the RBFN centers. As can be seen from Table 2, the training accuracy of CSFNN reached a 100% success rate, the same as the other networks. However, in the testing phase, CSFNN showed much better classification performance with a 92.3% success rate. These results show that CSFNN is an efficient neural network for the sonar classification problem. Finally, ROC analysis has been applied to the test results of the CSFNN classifier to evaluate its classification performance. The evaluation criteria of ROC analysis in sonar detection indicate the trade-off between the probability of true detection and the probability of false detection. Table 3 is employed for these calculations, labeled with the classification results on the left side and the mine/rock present status on the top.
Table 3. Diagnostic test interpretation table

Classifier result    Mine present            Rock present
“Mine”               True Positives (TP)     False Positives (FP)
“Rock”               False Negatives (FN)    True Negatives (TN)
The ROC curve is a graphical representation of the trade-off between sensitivity (Sn) and specificity (Sp). The ROC curve is plotted using Table 4 for the diagnostic test interpretation. The operating points for the ROC curve are generated by varying the threshold value of the output node. d_real is also calculated to estimate the accuracy of the CSFNN classifier.

Table 4. Evaluation of the CSFNN classifier performance for each threshold setting

Threshold   TP   FP   FN   TN   Sensitivity   1−Specificity   d_real   Test rate (%)
0           62   42    0    0   1             1               1        59.61
0.1         60   19    2   23   0.9677        0.4524          0.45     79.81
0.2         59   11    3   31   0.9516        0.2619          0.266    86.53
0.3         55    4    7   38   0.8871        0.0952          0.022    89.42
0.4         55    1    7   41   0.8871        0.0238          0.013    92.3
0.5         55    1    7   41   0.8871        0.0238          0.013    92.3
0.6         55    1    7   41   0.8871        0.0238          0.013    92.3
0.7         47    1   15   41   0.7581        0.0238          0.243    84.61
0.8         42    0   20   42   0.6774        0               0.323    80.76
0.9         31    0   31   42   0.5           0               0.5      70.19
1            0    0   62   42   0             0               1        40.38
As can be observed from Fig. 1, the ROC curve climbs rapidly towards the upper left-hand corner of the graph. This shows that the CSFNN classifier provides excellent classification performance for sonar returns.
Fig. 1. Receiver Operating Characteristic (ROC) Curve for CSFNN
5 Conclusions

The general objective of this work was to classify underwater targets collected from two sources, a metal cylinder and a similarly shaped rock, using a CSFNN. When the performance of CSFNN is compared with traditional neural classifiers (MLP and
RBF) in terms of successful classification rates, the results prove the success of the CSFNN classifier. The ROC curve also demonstrated the excellent classification performance of CSFNN for the sonar classification problem. This neural classifier will be useful for sonar researchers.
References

1. Gorman, R. P., Sejnowski, T. J.: Learned Classification of Sonar Targets Using a Massively Parallel Network. IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 36, No. 7 (1988) 1135-1140
2. Gorman, R. P., Sejnowski, T. J.: Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets. Neural Networks, Vol. 1, No. 1 (1988) 75-89
3. Chen, C.H.: Neural Networks for Active Sonar Classification. Proceedings of the 11th IAPR International Conference on Pattern Recognition, Vol. II, Conference B: Pattern Recognition Methodology and Systems (1992) 438-440
4. Diep, D., Johannet, A., Bonnefoy, P., Harroy, F., Loiseau, P.: Classification of Sonar Data for a Mobile Robot Using Neural Networks. Proceedings of the IEEE International Joint Symposia on Intelligence and Systems (1998) 257-260
5. Haley, T.B.: Applying Neural Networks to Automatic Active Sonar Classification. Proceedings of the 10th International Conference on Pattern Recognition, Vol. 2, 41-44
6. Shazeer, D.J., Bello, M.G.: Minehunting with Multi-layer Perceptrons. IEEE Conference on Neural Networks for Ocean Engineering (1991) 57-68
7. Jing, Y.Y., El-Hawary, F.: A Multilayered ANN Architecture for Underwater Target Tracking. Conference Proceedings of the 1994 Canadian Conference on Electrical and Computer Engineering, Vol. 2 (1994) 785-788
8. Yegnanarayana, B., Chouhan, H.M., Chandra Sekhar, C.: Sonar Target Recognition Using Radial Basis Function Networks. Singapore ICCS/ISITA '92, 'Communications on the Move', Vol. 1 (1992) 395-399
9. Kapanoğlu, B., Yıldırım, T.: Generalized Regression Neural Networks for Underwater Target Classification. NEU-CEE 2nd International Symposium on Electrical and Computer Engineering, Nicosia, North Cyprus (2004) 223-225
10. Woods, K.S., Bowyer, K.W.: Generating ROC Curves for Artificial Neural Networks. Proceedings of the IEEE Seventh Symposium on Computer-Based Medical Systems (1994) 201-206
11. Azimi-Sadjadi, M.R., Yao, D., Huang, Q., Dobeck, G.J.: Underwater Target Classification Using Wavelet Packets and Neural Networks. IEEE Transactions on Neural Networks, Vol. 11, No. 3, May (2000)
12. Ward, M.K., Stevenson, M.: Sonar Signal Detection and Classification Using Artificial Neural Networks. 2000 Canadian Conference on Electrical and Computer Engineering, Vol. 2, March (2000) 717-721
13. Dorffner, G.: Unified Framework for MLPs and RBFNs: Introducing Conic Section Function Networks. Cybernetics and Systems, Vol. 25 (1994) 511-554
14. Yıldırım, T.: Development of Conic Section Function Neural Networks in Software and Analogue Hardware. PhD Thesis, University of Liverpool, May (1997)
3D Map Building for Mobile Robots Using a 3D Laser Range Finder Zhiyu Xiang and Wenhui Zhou Dept. of Information & Electronic Engineering, Zhejiang University 310027 Hangzhou, P. R. China {xiangzy, zhouwh}@zju.edu.cn
Abstract. 3D map building for mobile robots under cluttered indoor environments remains a challenge. The problem lies in two aspects: map consistency and computational complexity. A novel method to finish such a task is proposed. The system employs a special designed 3D laser range finder, which is built on the basis of a 2D laser scanner, as the environment-perceiving sensor. The registration of 3D range images is realized by localization on a so-called ceiling map, which is the set of intersecting lines between the ceiling and the walls in the room. By matching the calculated local ceiling map with the global one, accurate sampling positions without suffering from accumulative errors can be obtained. A consistent 3D map can then be built on the basis of it. The experimental results demonstrate our success.
1 Introduction

Precise 3D maps of environments are greatly valuable for automation in many important areas [1]. The increasing need for rapid characterization and quantification of complex environments has created challenges for data analysis. The challenge lies in several aspects: a) the lack of good, cheap and fast sensors that allow the robots to sense the environment in real time; b) algorithms to generate a consistent 3D map at low computational cost. A widely used sensor for 3D perception is the laser range finder (or LADAR). Many localization and 2D map building (SLAM) algorithms have been developed [2] based on 2D laser range finders. However, a 2D laser range finder can only scan in a fixed plane, acquiring limited range information. To extend it into 3D applications, Zhao et al. [3] acquired 3D information by using two 2D range finders mounted horizontally and vertically on the mobile robot. A 3D laser range finder is a good choice for map building. However, today’s commercial 3D laser range finders are large and heavy, built mainly for stationary use. For consistent map building, the registration of different views acquired by the robot is the key problem. Pairwise matching, such as standard ICP [4] and incremental matching [5], suffers from error accumulation. Meanwhile, most of the algorithms have heavy computational costs and are difficult to implement in real time. In this paper a novel consistent 3D map building method is proposed. The system employs a specially designed 3D laser range finder, which is built on the basis of a 2D
laser scanner, as the environment-perceiving sensor. The registration of 3D range images is realized by localization on a so-called ceiling map, which is the set of intersecting lines between the ceiling and the walls in the room. Using the ceiling map has three advantages: a) an a priori global ceiling map is easily available, either from design drawings or by manual measurement; b) it is only two-dimensional, which means computational efficiency in localization; c) due to its height, it is relatively immune to obstruction in cluttered environments. By matching the calculated local ceiling map with the global one, sampling positions that do not suffer from error accumulation can be obtained. A consistent 3D map can then be built on this basis. Furthermore, to reduce the computational cost, the map fusion process is based on planar surfaces instead of raw 3D points. The whole algorithm is computationally efficient and can be implemented in real time. The paper is organized as follows: Section 2 briefly introduces the design of the 3D laser range finder. Section 3 presents the algorithms for surface detection and local ceiling map generation. In Section 4, the 3D map building algorithms are described. Section 5 presents the experimental results and the conclusion is given in the last section.
2 3D LADAR System

The system is composed of three components: a) a 2D LADAR; b) mechanical scanning equipment; c) a control and data acquiring unit. The mechanical scanning equipment includes a supporting base and an axle driven by a step motor. The task of the control unit is: a) receiving commands from the host computer; b) sending the current pitching angle to the host computer. The host computer saves the high-speed LADAR ranging data through RS-422 and the pitching angle from the control unit simultaneously. The coordinates of the 3D sampling points in the LADAR coordinate system can be obtained from equation (1):

x = ρ cos α
y = ρ sin α cos θ   (1)
z = ρ sin α sin θ

where ρ is the ranging data, α and θ represent the horizontal and vertical scanning angles, and (x, y, z) represents the coordinates of the 3D point.
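A minimal sketch of Eq. (1) is shown below (our own; angles are in radians and the example scan values are hypothetical).

```python
import numpy as np

def ladar_to_xyz(rho, alpha, theta):
    """Convert range samples rho at horizontal angles alpha, taken on a
    scan line with vertical (pitching) angle theta, into LADAR-frame
    coordinates according to Eq. (1)."""
    x = rho * np.cos(alpha)
    y = rho * np.sin(alpha) * np.cos(theta)
    z = rho * np.sin(alpha) * np.sin(theta)
    return np.stack([x, y, z], axis=-1)

# One 2D scan: 181 beams over 180 degrees at a 10-degree pitch.
alpha = np.linspace(0.0, np.pi, 181)
points = ladar_to_xyz(np.full(181, 4000.0), alpha, np.deg2rad(10.0))
```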
3 Local Surface and Ceiling Map Generation

3.1 Local Surface Map Building

Local surface map building is carried out in an on-line scheme to accelerate the algorithm. It consists of two major steps: line detection and surface generation.

Line Detection. Upon arrival of each 2D scan from the 3D LADAR, the points whose range is beyond a certain threshold are first discarded to reduce noise. Then the
neighboring points whose range differences are small enough are clustered into clouds. Within each cloud a recursive splitting process is carried out to sub-divide the cloud into small segments within which all of the points can be fitted by one line. This step is carried out in local 2D scanning coordinates.

Surface Generation. After line detection is done, the data is converted into 3D. Based on the detected lines, the surface generation algorithm detects the surfaces in the three-dimensional scene. The algorithm proceeds in the following steps [1]: 0. The first set of lines, coming from the very first 2D scan, is stored; 1. Every subsequent line is checked against the set of stored lines. If a matching line is found, these two lines are transformed into a surface; 2. If no such matching line exists, the line may be an extension of an already found surface. In this case, the new line matches the top line of a surface. This top line is then replaced by the new line, resulting in an enlarged surface; 3. Otherwise the line is stored as a stand-alone line in the set mentioned above. Two main criteria have to be fulfilled in order to match lines: a) the angle between the two lines has to be small enough; b) the distance from any endpoint to the other line must be small enough. To achieve real-time capability, the algorithm makes use of the characteristics of the data as it comes from the range finder, i.e., the lines are sorted throughout the whole scene due to their inherited order. Thus an efficient local search can be realized. Fig. 1 shows an example of the detected planar surfaces from a set of 3D data.

3.2 Ceiling Map Generation

Under cluttered environments, the 2D map at the height level of the LADAR usually contains only trivial, fragmented segments, making it difficult to use for range image registration. Fig. 2 (a) shows such a cluttered 2D map at the height z = 0. However, since few things exist at the height of the ceiling, a clear and relatively complete 2D ceiling map can be obtained. The ceiling map can be easily obtained by intersecting a plane close to the ceiling and parallel to the ground with all of the detected surfaces. Fig. 2 (b) shows a ceiling map example using the same 3D data as in Fig. 2 (a). The quality of the ceiling map is represented by the total length of the line segments within the map. We vary the height of the intersection plane to generate several ceiling maps and select only the best one as the final result.
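The ceiling-map extraction and height selection can be sketched as follows (our own sketch under the assumption that each detected surface is stored as a 3D polygon; the candidate heights are illustrative).

```python
import numpy as np

def ceiling_map(surfaces, z_candidates):
    """Intersect horizontal planes with the detected surfaces and keep the
    map whose line segments have the greatest total length."""
    def slice_at(z):
        segments = []
        for poly in surfaces:                        # poly: (k, 3) vertices
            pts = []
            for a, b in zip(poly, np.roll(poly, -1, axis=0)):
                if (a[2] - z) * (b[2] - z) < 0:      # edge crosses the plane
                    t = (z - a[2]) / (b[2] - a[2])
                    pts.append((a + t * (b - a))[:2])
            if len(pts) == 2:                        # intersection segment
                segments.append((pts[0], pts[1]))
        return segments

    def total_length(segs):
        return sum(np.linalg.norm(p - q) for p, q in segs)

    return max((slice_at(z) for z in z_candidates), key=total_length)

# e.g. candidate heights in mm around the ceiling, as in the experiments:
# best_map = ceiling_map(detected_surfaces, [1800, 1900, 2000])
```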
4 Consistent 3D Map Building

With a global and a local ceiling map in hand, the determination of the robot’s pose becomes a common 2D localization problem. We adopt a scan matching algorithm based on the Extended Kalman Filter (EKF) to finish this task. The detailed algorithm can be found in many papers, such as [2].
Fig. 1. An example of the detected surfaces from 3D data
Fig. 2. (a) An example of the 2D map when z = 0, where lots of small segments from the cluttered environment are displayed. (b) An example of the local ceiling map obtained using z = 1900 mm, where the profiles of the room are clearly shown.
Since all of the local ceiling maps have been matched to the same global ceiling map, the resulting positions will have the same reference. Therefore, a consistent 3D map can be built on the basis of the correct pose of the robot. The 3D map integration process includes four steps: a) filtering out small surfaces; b) coordinate transformation; c) matching between the local and global map; d) map fusion. Since the environment is cluttered, many small surfaces will exist in the local map. They have little meaning in the visualized 3D model and would increase the computational cost of the following steps. Two criteria are used to filter out the small surfaces: a) the number of scan lines within the surface must be larger than a threshold; b) the average length of the scan lines within the surface must exceed a threshold. After the first step, the number of surfaces can be reduced by about a half, leaving only relatively large surfaces in the local 3D map. In the second step, the coordinates of all surfaces in the local map are transformed into the global coordinate system according to the calculated robot position. Deciding whether a surface in the local 3D map
matches another surface in the 3D map under construction depends on two criteria: (a) the angle and the distance between the two surfaces should be small enough; (b) the two surfaces should be partially overlapped. In the final map fusion step, new plane parameters are calculated given the vertices of both 3D polygons. Then each polygon is projected from the 3D coordinate space into the 2D coordinate space of the obtained plane. Finally, the polygons are merged. The algorithm from Vatti [6] is used for the polygon merging process.
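The matching test of criteria (a)-(b) can be sketched as follows (our own sketch: planes are assumed to be stored as a unit normal n and offset d with n·x = d, polygons as 2D vertex arrays in a common plane frame, and the thresholds and the bounding-box overlap test are illustrative stand-ins, not the paper's exact values).

```python
import numpy as np

def bbox_overlap(p1, p2):
    """Cheap partial-overlap test on axis-aligned 2D bounding boxes."""
    lo1, hi1 = p1.min(axis=0), p1.max(axis=0)
    lo2, hi2 = p2.min(axis=0), p2.max(axis=0)
    return bool(np.all(lo1 <= hi2) and np.all(lo2 <= hi1))

def surfaces_match(n1, d1, poly1, n2, d2, poly2,
                   max_angle=0.1, max_dist=50.0):
    """Criteria (a) small angle and plane distance, (b) partial overlap."""
    angle = np.arccos(np.clip(abs(n1 @ n2), -1.0, 1.0))
    if angle > max_angle or abs(d1 - d2) > max_dist:
        return False                                 # criterion (a) fails
    return bbox_overlap(poly1, poly2)                # criterion (b)
```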
5 Experiment

Fig. 3 shows the trajectory of the robot obtained during the experiment. The bounding rectangle represents the global ceiling map of the environment. The robot started from position A and ended at position B. The small triangles represent the sampling positions and the direction of the robot on the trajectory. The localization results were satisfactory: the covariance of the positioning error during the whole process was less than 1 cm over all x-y-z components.
Fig. 3. Trajectory of the robot calculated by the ceiling map matching algorithm
Fig. 4. 3D planar surface map obtained from the experiment
Fig. 4 shows the integrated 3D planar surface map of the environment. The map has a size of about 3200 mm in the x direction, 5000 mm in the y direction and 3300 mm in the z direction. For clarity, the ceiling was removed from the map to give a good view of the internal walls and floor. Some parts of the walls and the floor are blank because of obstruction by objects. The map is globally consistent, as can be seen from the accurate size and the correct positions of the walls within it.
6 Conclusions

A consistent 3D map building method for mobile robots under indoor environments is proposed. Matching the local ceiling map to the global one satisfies the consistency requirement. The matching is only in 2D and is therefore computationally efficient. Furthermore, using the ceiling map enables the robot to deal with cluttered environments very well, since it is almost unaffected by obstruction. The experimental results demonstrated our success.
Acknowledgments The research was sponsored by China National Science Foundation under grant No. 60505017.
References

1. Surmann, H., Nüchter, A., Hertzberg, J.: An Autonomous Mobile Robot with a 3D Laser Range Finder for 3D Exploration and Digitalization of Indoor Environments. Robotics and Autonomous Systems, Elsevier, 45(2) (2003) 181-198
2. Chou, H., Traonmilin, M., Ollivier, E., Parent, M.: A Simultaneous Localization and Mapping Algorithm Based on Kalman Filtering. IEEE Intelligent Vehicles Symposium (2004) 631-635
3. Zhao, H., Shibasaki, R.: Reconstructing Textured CAD Model of Urban Environment Using Vehicle-borne Laser Range Scanners and Line Cameras. IEEE Proceedings of the International Workshop on Computer Vision Systems (2001) 453-458
4. Besl, P., McKay, N.: A Method for Registration of 3-D Shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14 (1992) 239-256
5. Chen, Y., Medioni, G.: Object Modeling by Registration of Multiple Range Images. Proceedings of the IEEE Conference on Robotics and Automation (1991) 2724-2729
6. Vatti, B.: A Generic Solution to Polygon Clipping. Communications of the ACM (1992) 56-63
Construction of Fast and Robust N-FINDR Algorithm

Liguo Wang1,2, Xiuping Jia3, and Ye Zhang2

1 School of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China [email protected]
2 Dept. of Information Engineering, Harbin Institute of Technology, Harbin 150001, China {wlg74327, zhye}@hit.edu.cn
3 School of Electrical Engineering, University College, The University of New South Wales, Australian Defence Force Academy, Campbell, ACT 2600, Australia [email protected]
Abstract. N-FINDR has been a popular endmember (EM) extraction algorithm for its full automation and relative efficiency. Unfortunately, innumerable volume calculations lead to a low speed of the algorithm and so limit its applications. Additionally, the algorithm is vulnerable to outliers, which widely exist in hyperspectral data. In this paper, a distance measure is adopted in place of the volume measure to speed up the algorithm, and outliers are effectively controlled to endow the algorithm with robustness. Experiments show that the improved algorithm is very fast and robust.
1 Introduction In recent years, hyperspectral remote sensing has been applied in many fields and the processing techniques of hyperspectral images (HSI) have been developed greatly. In HSI, mixed pixels are a mixture of more than one distinct substance; in these cases, the resulting spectrum is a composite of several spectra. One of the most important HSI processing techniques is spectral unmixing [1], which aims at the analysis of mixed pixels. Spectral unmixing decomposes a mixed pixel into a collection of distinct endmembers (EMs) together with a set of fractional abundances that indicate the proportion of each EM. As the basic step of spectral unmixing, EM extraction is crucial for computing fractional abundances accurately. Accordingly, a number of EM extraction methods [2] have been proposed over the past decade. The N-FINDR algorithm [3] has been famous and popular among EM extraction methods for its full automation and relative efficiency. The algorithm is essentially an automated technique for finding the purest pixels in an image. The convex nature of hyperspectral data allows this operation to be performed in a relatively straightforward manner. It is based on the theory of convex geometry and is conducted in a reduced-dimensional data space provided by the MNF transformation [4]. N-FINDR is now widely used in anomaly detection, efficient materials mapping, hyperspectral image compression, and hyperspectral image sharpening [5]. It is in commercial use and is available in ENVI, a powerful hyperspectral image analysis package [6].
Although the algorithm has been successfully used in various applications and was improved in reference [7], the huge number of volume calculations makes it very slow, and outliers usually interfere with the final extraction. To obtain a fast and robust N-FINDR algorithm, a distance measure is used instead of the volume measure to reduce the computational burden, and effective control of outliers is implemented to endow the algorithm with robustness.
2 Speeding Up the N-FINDR Algorithm For lack of space, the original N-FINDR algorithm is omitted here. Let $E$ be the matrix of $(N+1)$ pixels $s_1, s_2, \ldots, s_{N+1}$ augmented with a row of ones. In the original N-FINDR algorithm, the volume $V(E)$ of the simplex formed by the pixels is proportional to the determinant of $E$. This section aims to reduce the complexity of the volume calculation. For the purpose of visualization, two-dimensional space is considered first. In Fig. 1, A, B, and C are the vertices of a triangle and $V_{old}$ is the area of the triangle. Let $A_0$ be a point different from A, B, and C, and let $V_{new}$ be the area of the triangle formed by $A_0$, B and C. Let the line segments AD and $A_0D_0$ be the distances from A and $A_0$ to the line segment BC. Whether $A_0$ can take the place of A is determined by the areas of the two triangles. To compare $V_{old}$ and $V_{new}$, the original algorithm computes them directly. It should be noted, however, that the goal is only to determine which volume is larger; it is not necessary to compute their actual values. From elementary geometry, the following expression holds:

$$V_{old}/V_{new} = AD/A_0D_0. \quad (1)$$

Fig. 1. Equivalence of the distance measure and the volume measure in the N-FINDR algorithm
In other words, the distance comparison is equivalent to the volume comparison for renewing EMs. An intuitive illustration is given in Fig. 1, where $l$ is the straight line through B and C, $l_1$ is its parallel line passing through vertex A, and $l_2$ is the mirror image of $l_1$ about line $l$. It can be seen that $A_0$ can substitute for A if and only if $A_0$ lies outside the region bounded by the pair of parallel straight lines $l_1$ and $l_2$.
In $N$-dimensional space, a simplex with $(N+1)$ vertices is considered. Let $s_1, s_2, \ldots, s_{N+1}$ be these vertices and let $s_0$ be another pixel different from them. The distance comparison can then be used to determine whether replacing $s_i$ ($1 \le i \le N+1$) by $s_0$ results in an increase of volume. In this case, the straight line and the area of two-dimensional space correspond to a hyperplane and a volume, respectively. The equation of the hyperplane formed by $s_1, s_2, \ldots, s_{i-1}, s_{i+1}, \ldots, s_{N+1}$ can be written as

$$\sum_{i=1}^{N} \beta_i x_i + c = 0, \quad (2)$$

where $\beta = (\beta_1, \beta_2, \ldots, \beta_N)^T$ is the solution of the following equations:

$$[s_1, s_2, \ldots, s_{i-1}, s_{i+1}, \ldots, s_{N+1}]^T \cdot \beta + c \cdot I_N = 0. \quad (3)$$

$I_N$ is an $N \times 1$ vector of ones. $c$ takes the value 0 if the origin is linearly dependent on $s_1, s_2, \ldots, s_{i-1}, s_{i+1}, \ldots, s_{N+1}$, and 1 otherwise; in practice, $c$ has little chance of being 0. Moreover, omitting the normalization of the distance from the origin to the hyperplane does not affect the analysis in this paper. The distance $d(s_0)$ from $s_0$ to hyperplane (2) is then proportional to the following expression:

$$d(s_0) = \beta^T \cdot s_0 + c. \quad (4)$$
In the N-FINDR algorithm, the number of hyperplane constructions, which equals the number of EM renewals multiplied by the number of EMs, is much smaller than the number of distance calculations, so the computational cost of hyperplane construction is relatively negligible. From (4), a distance calculation involves just a dot product of two $N \times 1$ vectors, whose computational cost is linear in the dimension $N$, whereas a volume calculation requires the determinant of an $(N+1) \times (N+1)$ matrix, whose computational cost is cubic in $N$. For brevity, the original N-FINDR algorithm and its fast version are named O-N-FINDR and F-N-FINDR respectively.
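As an illustration, the following Python sketch implements the distance test of Eqs. (2)-(4) under stated assumptions: the function names and the fixed choice c = 1 are ours, not the paper's, and the endmembers are assumed to live in the MNF-reduced N-dimensional space.

```python
import numpy as np

def hyperplane(endmembers, i, c=1.0):
    """Solve Eq. (3) for beta over the N endmembers other than the i-th.

    endmembers: (N+1, N) array of simplex vertices; c is fixed to 1, since
    the paper notes c is almost never 0 in practice and scaling does not
    change the comparison below.
    """
    S = np.delete(endmembers, i, axis=0)               # N points in N-dim space
    beta = np.linalg.solve(S, -c * np.ones(S.shape[0]))
    return beta, c

def distance_score(beta, c, pixel):
    """Eq. (4): a value proportional to the distance from pixel to the plane."""
    return abs(beta @ pixel + c)

def should_replace(endmembers, i, s0):
    """True if s0 enlarges the simplex when swapped for endmember i,
    which by Eq. (1) is equivalent to the volume comparison of O-N-FINDR."""
    beta, c = hyperplane(endmembers, i)
    return distance_score(beta, c, s0) > distance_score(beta, c, endmembers[i])
```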
3 Detecting and Removing Outliers Outliers have a higher probability of being selected as EMs because of their special spatial positions, and so they can severely disturb the iterative search for EMs. Outliers exist widely in HSI, and sometimes a single erroneous extraction can cause the N-FINDR algorithm to break down. The construction of a robust N-FINDR algorithm is therefore necessary. If outliers can be excluded from the hyperspectral data, their bad influence can be avoided. Since outliers usually occur in an isolated manner, local analysis helps achieve this purpose. Concretely, the number of pixels in a local window around the center pixel can be counted and used as the outlier index of the center pixel: the larger the index, the greater the degree of isolation, and hence the higher the probability that the pixel is an outlier. To reduce the computational cost, a square window can approximately take the place of a round one. The robust N-FINDR algorithm is briefly named R-N-FINDR.
4 Experiments and Results To obtain a clear evaluation, F-N-FINDR and R-N-FINDR are respectively compared with O-N-FINDR. In the first group of experiments, 1000 points are generated by mixing three points A(-15,0), B(15,0) and C(0,20), which are prescribed as the EMs. The mixed points are uniformly distributed within the triangle spanned by A, B and C (solid triangle in Fig. 2). Independent random Gaussian noise with variance 1 is added to the 1000 points in both the x and y directions. The two algorithms obtain the same final simplex (dotted triangle in Fig. 2) with different execution times. Table 1 compares their execution time (E-time for short), the renewing times of the EMs (EM-times) and the number of volume/distance computations (V/D-times). Even for an N as small as 3, the algorithm is sped up by a factor of about 7.
Fig. 2. Scatter plot of synthetic data used in exp. 1
In the second group of experiments, a larger number of EMs, 10, is used. In a 9-dimensional space, the 9 standard unit vectors and the origin are prescribed as EMs, and 10000 points are generated by linear combinations of the EMs. A detailed comparison of execution time is shown in Table 2. In this case, the improved algorithm achieves a speed-up of more than 40 times over O-N-FINDR. In the first two groups of experiments, the equal values of EM-times and V/D-times confirm the equivalence of the distance measure and the volume measure. In the third group of experiments, 3 consulting spectra from a laboratory database are used to compose 1000 spectra. Another 30 outliers are generated by imposing noise on the 3 consulting spectra. The first row in Fig. 3 compares the consulting spectra
with the results selected by the different methods. It can be seen that O-N-FINDR is severely disturbed by the outliers, while the results of R-N-FINDR are very close to the consulting spectra. In the last group of experiments, 3 consulting spectra (soybean, grass and woods) are generated by averaging three classes of spectra (500 samples per class) from an agriculture/forestry landscape AVIRIS dataset acquired at the Indian Pine Test Site. The second row in Fig. 3 compares the performance of the two methods on this set of 1500 spectra. Again, the robust algorithm gives results closer to the consulting spectra. It is known that an ideal EM should correspond to a class center rather than a simplex vertex, and so the robust algorithm obtains a more reasonable extraction regardless of whether outliers exist.

Table 1. Comparison of execution time/iteration times (exp. 1)

            E-time   EM-times  V/D-times
O-N-FINDR   2.9380   21        9504
F-N-FINDR   0.4210   21        9504

Table 2. Comparison of execution time/iteration times (exp. 2)

            E-time   EM-times  V/D-times
O-N-FINDR   961.6    95        689843
F-N-FINDR   22.75    95        689843

Fig. 3. Comparison of the EMs selected by the different methods: a) consulting EMs, b) EMs selected by O-N-FINDR, c) EMs selected by R-N-FINDR
5 Conclusions In this paper, a fast and robust N-FINDR algorithm is constructed. For speed, distance comparison takes the place of volume comparison; the larger the number of EMs, the higher the efficiency gained by the replacement. In addition, constructing a good ordering for the initialization and renewal of EMs would help speed up the algorithm further. For robustness, outliers are effectively controlled, which leads to a more reasonable extraction of EMs whether or not outliers exist. With these improvements, the algorithm should find more effective applications.
References 1. Keshava, N., Mustard, J. F.: Spectral Unmixing. IEEE Signal Processing Magazine, 19 (2002) 44–57 2. Cipar, J. J., Eduardo, M., Edward, B.: A Comparison of End Member Extraction Techniques. Proceedings of SPIE - The International Society for Optical Engineering, 4725 (2002) 1-9 3. Winter, M. E.: N-FINDR: An Algorithm for Fast Autonomous Spectral End-member Determination in Hyperspectral Data. SPIE Imaging Spectrometry, 5 (1999) 266-275 4. Green, A., Berman, M., Switzer, P., Craig, M.: A Transformation for Ordering Multispectral Data in Terms of Image Quality with Implications for Noise Removal. IEEE Transactions on Geoscience and Remote Sensing, 26 (1988) 65-74 5. Winter, M. E., Winter, E.: New Developments in the N-FINDR Algorithm. Presented at: IGARSS 2001 International Geoscience and Remote Sensing Symposium, Sydney, Australia. http://www.higp.hawaii.edu/~winter/ 6. Xing, Y., Gomez, R.B.: Hyperspectral Image Analysis Using ENVI (Environment for Visualizing Images). Proc. of SPIE - The International Society for Optical Engineering, 4383 (2001) 79-86 7. Plaza, A., Chang, C.-I: An Improved N-FINDR Algorithm in Implementation. Proceedings of SPIE - Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XI, 5806 (2005) 298-306
Dental Plaque Quantification Using Cellular Neural Network-Based Image Segmentation Jiayin Kang1 , Xiao Li2 , Qingxian Luan2 , Jinzhu Liu3 , and Lequan Min1,3 1
School of Information Engineering, University of Science and Technology Beijing, 100083 Beijing, P.R. China [email protected] 2 School of Stomatology, Peking University, 100081 Beijing, P.R. China [email protected], [email protected] 3 School of Applied Science, University of Science and Technology Beijing, 100083 Beijing, P.R. China {jinzhucn, lqmin}@sohu.com
Abstract. This paper presents an approach for quantifying dental plaque automatically, based on a cellular neural network (CNN) combined with histogram analysis. The approach was applied to a clinical database consisting of 15 subjects. The experimental results showed that this method provides an accurate quantitative measurement of dental plaque compared with traditional manual measurement indices.
1 Introduction
The detection of dental plaque is crucial for patients, their clinicians and also researchers. A number of dental plaque indices have been developed to overcome the difficulties of quantifying the presence of dental plaque. These indices have been devised to allow an easy and semi-quantitative assessment of the distribution of dental plaque, and they vary in their approach to recording the scores (details can be found in references [1],[2]). However, most clinical scoring systems for plaque are subjective, because the measurements rely primarily on the clinician's ability to demarcate or score areas of disclosed plaque by visual examination. This may lead to inter-operator errors. Image segmentation is the first step in image analysis and pattern recognition. It is a crucial and essential component of image analysis and pattern recognition systems, is one of the most difficult tasks in image processing, and determines the quality of the final result of the analysis [3]. To understand an image, one needs to isolate the objects in it and find the relations among them; this process of object separation is referred to as image segmentation. In other words, segmentation is used to extract the meaningful objects from the image [4]. Quantification of dental plaque on digital photographs of the anterior teeth of patients is an important research issue. The purpose of this study was to propose a novel quantitative and automated method to measure dental plaque accumulation via digital image segmentation based on a cellular neural network (CNN) combined with histogram analysis.
2 Dental Plaque Quantification Via Cellular Neural Network

2.1 Cellular Neural Network
The CNN [5], first introduced as an implementable alternative to the fully-connected Hopfield neural network, has been widely studied for image processing, robotic and biological vision, and higher brain functions [5],[6]. In particular, CNN templates have been presented for deleting small objects, edge detection, convex corner detection, diagonal line detection, global connectivity detection, and so on (see [7]-[10]). The standard $M \times N$ CNN architecture is composed of cells whose dynamics are given by the following equation:

$$\dot{x}_{i,j} = -x_{i,j} + \sum_{C_{i+k,j+l} \in S_r(i,j)} a_{k,l}\, y_{i+k,j+l} + \sum_{C_{i+k,j+l} \in S_r(i,j)} b_{k,l}\, u_{i+k,j+l} + z_{i,j} = -x_{i,j} + \sum_{k=-r}^{r}\sum_{l=-r}^{r} a_{k,l}\, y_{i+k,j+l} + \sum_{k=-r}^{r}\sum_{l=-r}^{r} b_{k,l}\, u_{i+k,j+l} + z_{i,j}, \quad (1)$$
where $x_{i,j}$, $y_{i,j}$, $u_{i,j}$, $z_{i,j}$ represent state, output, input and threshold respectively; $S_r(i,j)$ is the sphere of influence with radius $r$; and $a_{k,l}$, $b_{k,l}$ are the elements of the A-template and the B-template respectively. The output $y_{i,j}$ is the piecewise-linear function given by

$$y_{i,j} = \frac{1}{2}(|x_{i,j} + 1| - |x_{i,j} - 1|), \quad i = 1, 2, \cdots, M; \; j = 1, 2, \cdots, N.$$

2.2 Standard Threshold CNN
The template of the standard threshold CNN has the form

$$A = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad Z = -z^*, \quad (2)$$

where $-1 < z^* < 1$.

I. Global Task
1. Given: static gray-scale image $P$ and threshold $z^*$.
2. Input: $U(t)$ = arbitrary, or default to $U(t) = 0$.
3. Initial State: $X(0) = P$.
4. Output: $Y(t) \Rightarrow Y(\infty)$ = binary image in which every pixel of $P$ with gray-scale intensity $P_{i,j} > z^*$ becomes black.

II. Local Rules
1. $x_{i,j}(0) < z^*$ → $y_{i,j}(\infty)$ = White, independent of neighbors.
2. $x_{i,j}(0) > z^*$ → $y_{i,j}(\infty)$ = Black, independent of neighbors.
3. $x_{i,j}(0) = z^*$ → $z^*$, assuming zero noise.
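A minimal simulation sketch of this threshold CNN, assuming pixel values scaled to [-1, 1] and simple Euler integration (the function name and step sizes are our choices, not from the paper):

```python
import numpy as np

def threshold_cnn(P, z_star, dt=0.1, steps=200):
    """Evolve the threshold CNN of Eqs. (1)-(2); with A having only a
    center entry of 2 and B = 0, each cell is decoupled:
    x' = -x + 2*y - z_star.  Pixels of P are assumed scaled to [-1, 1]."""
    x = P.astype(float).copy()                      # initial state X(0) = P
    for _ in range(steps):
        y = 0.5 * (np.abs(x + 1) - np.abs(x - 1))   # piecewise-linear output
        x += dt * (-x + 2.0 * y - z_star)           # bias term Z = -z_star
    return 0.5 * (np.abs(x + 1) - np.abs(x - 1))    # ~ +1 (black) / -1 (white)
```

In line with the local rules above, cells starting above z* saturate to black and those below to white.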
2.3 Choosing Threshold
Roughly speaking, the "clusters" in a digital tooth image can be classified into two kinds: one kind represents the tooth surfaces, and the other stands for the dental plaque (see Fig. 1 (a) and (b)).

Fig. 1. Two digital tooth images with different amounts of dental plaque (red color): (a) a large amount of dental plaque, (b) a small amount of dental plaque.
Thresholding is a particularly useful approach for distinguishing different kinds of clusters in images. For well-defined images, the threshold is located at the valley between two peaks in the gray-level histogram [11] (see Fig. 2(a)). However, the digital images of teeth with dental plaque do not always have two distinct peaks from which to determine the valley (see Fig. 2(b)). One method of choosing the threshold $T$ automatically is the following iterative procedure [12]:

$$T_{i+1} = \frac{1}{2}\left[\frac{\sum_{k=0}^{T_i} h_k k}{\sum_{k=0}^{T_i} h_k} + \frac{\sum_{k=T_i+1}^{L-1} h_k k}{\sum_{k=T_i+1}^{L-1} h_k}\right], \quad (3)$$

where $h_k$ is the number of pixels whose gray value is $k$, and $L$ is the total number of gray levels. Formula (3) can easily be implemented in a computer program. Using formula (3) and the corresponding program, we computed the threshold $T$ of the histograms in Fig. 2(a) and Fig. 2(b); the results are $T \approx 186$ and $T \approx 206$. Based on these two thresholds, the area percentages of the dental
Fig. 2. Gray-level histograms of the tooth images shown in (a) Fig. 1(a), and (b) Fig. 1(b)
plaques relative to the total tooth surface are 46.34% and 34.25% respectively. Clearly, these results are not accurate enough. After investigating the digital tooth images of the 15 subjects obtained from the clinical trials, we found that the threshold can be determined as the minimum of the gray-scale distribution between gray levels 50 and 150 in the histograms of the color plane G. A computer-aided automated searching program was designed to find the thresholds of the histograms of the digital tooth images. The steps of the program are as follows:
1. Image preprocessing (denoising and "hole-filling").
2. Calculate the histogram data of the digital tooth image.
3. Search for the minimum valley (threshold T) between gray levels 50 and 150 in the histogram of the color plane G.
4. Segment the plaque according to the threshold T.
5. Calculate the numbers of pixels of the plaque and of the tooth surface, respectively.
Using this approach, the thresholds of the histograms shown in Figs. 2(a) and 2(b) are 99 and 95 respectively. They are more reasonable than those obtained by formula (3).
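A minimal sketch of both the iterative threshold of formula (3) and the valley search of steps 2-3, assuming 8-bit images and numpy (the function names are ours):

```python
import numpy as np

def iterative_threshold(gray, L=256, T0=128, iters=50):
    """Formula (3): average of the mean gray levels below and above T."""
    hist, _ = np.histogram(gray, bins=L, range=(0, L))
    k = np.arange(L)
    T = T0
    for _ in range(iters):
        lo, hi = hist[:T + 1], hist[T + 1:]
        T_new = int(0.5 * ((lo * k[:T + 1]).sum() / max(lo.sum(), 1)
                           + (hi * k[T + 1:]).sum() / max(hi.sum(), 1)))
        if T_new == T:
            break
        T = T_new
    return T

def valley_threshold(green_plane, lo=50, hi=150):
    """Steps 2-3: minimum of the G-plane histogram between levels 50 and 150."""
    hist, _ = np.histogram(green_plane, bins=256, range=(0, 256))
    return lo + int(np.argmin(hist[lo:hi + 1]))
```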
3 Experimental Results and Analysis

3.1 Dental Plaque Measurement Using a Traditional Dental Plaque Index
The plaque index developed by Quigley and Hein and modified by Turesky is one of the most frequently used indices in product testing [2]. The index is scored as follows:
0: no plaque;
1: separate flecks of plaque at the cervical margin of the tooth;
2: a thin continuous band of plaque (up to 1 mm) at the cervical margin;
3: a band of plaque wider than 1 mm but covering less than one third of the crown;
4: plaque covering at least one third but less than two thirds of the crown;
5: plaque covering two thirds or more of the crown.
The dental plaque levels (the mean over all 12 teeth per subject) according to the Turesky index are listed in the third column of Table 1 (clinical database consisting of 15 subjects).
3.2 Dental Plaque Quantification Based on the Proposed Approach
By means of the approaches designed in Sections 2.2 and 2.3, we processed the digital tooth images of the 15 subjects obtained in the clinical trials. In the meantime, the tooth images were indexed by visual observation. The results are listed in Table 1, and two processed tooth images (No. 01 and No. 02) are shown in Fig. 3. As Table 1 shows, the data obtained by our computer-aided approach are more accurate than those obtained by visual observation.

Table 1. The results of dental plaque based on the proposed method and the Turesky index, respectively

Patient No.   Proposed method (Percentage)   Turesky (Level)
01            51.29                          2.75
02            14.30                          1.75
03            8.43                           1.18
04            27.82                          1.63
05            28.10                          1.75
06            12.94                          1.50
07            58.44                          3.42
08            6.74                           1.20
09            22.48                          2.08
10            22.32                          1.63
11            79.64                          4.58
12            11.42                          0.92
13            53.59                          2.75
14            7.98                           1.36
15            16.36                          1.42
Fig. 3. Detected dental plaque (black color) of images shown in (a) Fig. 1(a), and (b) Fig. 1(b)
4 Conclusion
A method to quantify dental plaque based on a cellular neural network combined with histogram analysis was proposed. Experimental results showed that the method presented in this paper is an automated, objective and quantitative alternative to current indices for quantifying dental plaque, without the need for a clinician. As a result, it can help provide a standard for clinical evaluations. Furthermore, it may be an attractive method for monitoring the trend of dental plaque growth in longitudinal investigations.
Acknowledgement The authors would like to thank the National Natural Science Foundations of China (Grant Nos. 60074034, 70271068) and the Research Fund for the Doctoral Program of Higher Education (Grant No. 20020008004) of the Ministry of Education of China for their financial support.
References 1. Carter, K., Landini, G., Walmsley, A.D.: Automated Quantification of Dental Plaque Accumulation Using Digital Imaging. J. of Dentistry. 32 (2004) 623–628 2. Pretty, I.A., Edgar, W.M., Smith, P.W., Higham, S.M.: Quantification of Dental Plaque in the Research Environment. J. of Dentistry. 33 (2005) 193–207 3. Cheng, H.D., Jiang, X.H., Sun, Y., Wang, J.L.: Color Image Segmentation: Advances and Prospects. Pattern Recognition. 34 (2001) 2259–2281 4. Deshmukh, K.S., Shinde, G.N.: An Adaptive Color Image Segmentation. Electronic Letters on Computer Vision and Image Analysis. 5 (4) (2005) 12–23 5. Chua, L.O., Yang, L.: Cellular Neural Networks: Theory and Application. IEEE Trans. Circuits Syst. 35 (1988) 1257–1272, 1273–1290 6. Chua, L.O., Roska, T.: Cellular Neural Networks and Visual Computing. Cambridge Press (2002) 7. Cai, H., Min, L.Q.: A Kind of Two-input CNN with Application. Int. J. of Bifurcation and Chaos. 15 (12) (2005) 4007–4011 8. Li, G.D., Min, L.Q., Zang, H.Y.: Design for Robustness Edge-gray Detection CNN. 2004 Int. Conf. on Communications, Circuits and Systems. II (2004) 1061–1065 9. Liu, J.Z., Min, L.Q.: Design for CNN Templates with Performance of Global Connectivity Detection. Communications in Theoretical Physics. 41 (1) (2004) 151–156 10. Su, Y.M., Min, L.Q.: Robustness Designs of Templates of Directed Overstrike CNNs with Applications. J. of Signal Processing. 11 (2004) 449–454 11. Kim, D.Y., Park, J.W.: Connectivity-based Local Adaptive Thresholding for Carotid Artery Segmentation Using MRA Images. Image and Vision Computing. 23 (2005) 1277–1287 12. Rafael, C.G., Richard, E.W., Steven, L.E.: Digital Image Processing Using Matlab. Publishing House of Electronics Industry, Beijing (2004)
Detection of Microcalcifications Using Wavelet-Based Thresholding and Filling Dilation∗ Weidong Xu1,2, Zanchao Zhang1, Shunren Xia1, and Huilong Duan1 1
The Key Laboratory of Biomedical Engineering of Ministry of Education, Zhejiang University, Hangzhou 310027, China 2 Automation College, HangZhou Dianzi University, Hangzhou 310018, China [email protected]
Abstract. Microcalcifications (MCs) are the main symptom of breast cancer in mammograms. This paper proposes a new computer-aided diagnosis (CAD) algorithm to detect MCs. First, the discrete wavelet transform (DWT) is applied to extract the high-frequency signal, and thresholding with hysteresis is used to locate suspicious MCs. Then, filling dilation is utilized to segment the desired regions. During the detection process, ANFIS is applied for automatic parameter adjustment, making the CAD more adaptive. Finally, the segmented MCs are classified with an MLP, and a satisfying result validates the method.
1 Introduction Breast cancer continues to be one of the most dangerous tumors for middle-aged and older women in China. Among all detection methods, mammography performs most effectively. In mammograms, the early symptoms of breast cancer are microcalcifications (MCs). Since MCs always appear tiny and indistinct, their detection usually costs radiologists much time and energy, and many computer-aided diagnosis (CAD) methods have been developed to assist them. Nishikawa proposed a difference-image technique based on image subtraction, which could extract the high-frequency (HF) image signal [1]. Choe applied the Haar wavelet to decompose the images, applied a cubic mapping to enhance the wavelet coefficients, and detected the MCs from the reconstructed images by thresholding [2]. Gulsrud reduced background variation with image subtraction, eliminated random noise with median filtering, and used an optimal filter to locate the MCs [3]. These conventional methods perform well in detecting MCs with similar features and backgrounds, but for MCs in unusual backgrounds the detection results are usually less satisfying. This paper proposes a novel detection algorithm, which uses the discrete wavelet transform (DWT) to extract the HF signal, applies filling dilation to segment the suspicious regions, and utilizes ANFIS to adjust the detection process. It overcomes the defects of the conventional methods, and achieved high detection precision with a low false positive (FP) rate in the experiments.
Supported by the Natural Science Foundation of China (No. 60272029) and the Natural Science Foundation of Zhejiang Province of China (No. M603227).
2 Discrete Wavelet Transform and Thresholding with Hysteresis MCs appear as tiny pieces with high intensities, which can be represented by HF signals. In the conventional methods, the image-subtraction technique was often applied to extract the HF information. Compared with it, the wavelet transform is more appropriate for extracting the desired HF information. Wavelets have great smoothness and locality, which makes them very effective for information extraction. With wavelets, the low-frequency (LF) and HF information of a signal can be decomposed level by level, which is called multi-resolution analysis (MRA). In this way, the signal of interest can be extracted and processed according to its resolution. The usual wavelet-based technique is the discrete wavelet transform (DWT): with the two-scale sequences of the scale function and the wavelet function, a signal can easily be decomposed and reconstructed level by level. With the DWT, the image information at each resolution is decomposed into four subbands: $LL$ stores the low-frequency information, while $LH$, $HL$ and $HH$ store the HF information. These three subbands can be combined into a uniform HF domain, i.e. $|LH| + |HL| + |HH|$. In practice, the HF information that denotes the MCs always lies in the 2nd and 3rd levels of the wavelet domain [4].
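A minimal sketch of this decomposition using the PyWavelets package (the choice of library and of the 'db4' wavelet are our assumptions; the paper specifies neither):

```python
import numpy as np
import pywt

def hf_maps(image, wavelet='db4', levels=3):
    """Combine the detail subbands of decomposition levels 2 and 3 into
    high-frequency magnitude maps |LH| + |HL| + |HH|."""
    coeffs = pywt.wavedec2(image.astype(float), wavelet, level=levels)
    # coeffs = [LL_n, (LH_n, HL_n, HH_n), ..., (LH_1, HL_1, HH_1)]
    maps = {}
    for level in (2, 3):
        lh, hl, hh = coeffs[levels - level + 1]
        maps[level] = np.abs(lh) + np.abs(hl) + np.abs(hh)
    return coeffs, maps
```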
Fig. 1. Original images (a) and location result of the MCs (b)
Thresholding with hysteresis was used to extract the high-intensity coefficients in the HF domains of the 2nd and 3rd levels. First, the coefficients were processed with a global threshold: if the modulus was below $T_0$, the coefficient was deleted. Second, the reserved coefficients were processed with another global threshold: if the modulus was above $T_1$, the coefficient was considered an assured MC. Finally, a local threshold was applied in the neighborhood of each assured MC: the remaining coefficients near the assured MCs were also considered MCs if their moduli were above $T_2$. In this way, useful signals with comparatively low HF moduli can be extracted while noise with similar HF moduli is suppressed. By reconstructing the assured signals in the HF domain, all the MCs can be located accurately (Fig. 1).
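A sketch of the three-stage thresholding, assuming the thresholds T0 < T1 and T2 are supplied externally (in the paper they are tuned by ANFIS) and a square neighborhood of an assumed radius:

```python
import numpy as np

def hysteresis_threshold(hf, T0, T1, T2, radius=2):
    """Three-stage thresholding with hysteresis on an HF magnitude map."""
    keep = np.abs(hf) >= T0                       # stage 1: global floor
    sure = keep & (np.abs(hf) > T1)               # stage 2: assured MC seeds
    out = sure.copy()
    for y, x in zip(*np.nonzero(sure)):           # stage 3: grow around seeds
        win = (slice(max(y - radius, 0), y + radius + 1),
               slice(max(x - radius, 0), x + radius + 1))
        out[win] |= keep[win] & (np.abs(hf[win]) > T2)
    return out
```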
Fig. 2. Filling Dilation Result of the MCs in Fig.1
Fig. 3. ANFIS
3 Filling Dilation Next, a filling dilation technique was used to segment the suspicious MCs. The process resembles a liana expanding: the mammogram is like a relief map, and the gray-level intensity of the mammogram is like the altitude of the landform. The MC region $R_0$ located in Section 2 is taken as the original region of the liana, and the gray-level contrast in its neighborhood is enhanced with an intensity-remapping method. The liana then expands outwards through an iterative dilation process based on a cross-shaped structuring element $B$: $R_1 = R_0 \oplus B, \ldots, R_{n+1} = R_n \oplus B$. A newly reached point is not accepted into the current region if its gray intensity $f(x, y)$ fails to satisfy $|f(x,y) - f_{ker}| \le T_3$ and $|f(x,y) - \bar{f}| \le T_4$, where $f_{ker}$ is the mean intensity of $R_0$, $\bar{f}$ is the mean intensity of the accepted points in the neighborhood, and $T_3$, $T_4$ are two thresholds.
If the altitude of the current position is quite different from that of its environment, the liana cannot climb to it. In this way, the regions of the suspicious MCs are extracted accurately and adaptively, as shown in Fig. 2.
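A sketch of this region growing with SciPy, under one stated simplification: the neighborhood mean f̄ is approximated by the mean of the whole accepted region rather than the local accepted points:

```python
import numpy as np
from scipy import ndimage

def filling_dilation(image, seed, T3, T4, max_iter=50):
    """Iteratively dilate the boolean seed mask R0 with a cross-shaped
    element, accepting new pixels per the two intensity conditions."""
    cross = ndimage.generate_binary_structure(2, 1)   # 4-connected cross
    region = seed.copy()
    f_ker = image[seed].mean()                        # mean intensity of R0
    for _ in range(max_iter):
        candidates = ndimage.binary_dilation(region, cross) & ~region
        f_bar = image[region].mean()                  # stand-in for local mean
        accept = candidates & (np.abs(image - f_ker) <= T3) \
                            & (np.abs(image - f_bar) <= T4)
        if not accept.any():
            break
        region |= accept
    return region
```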
4 ANFIS-Based Parameter Adjustment Sections 2 and 3 cover the location and segmentation of the MCs. The values of the five detection parameters ($T_0$, $T_1$, $T_2$, $T_3$, $T_4$) turn out to be quite important in the CAD, and these thresholds should be auto-adjusted during detection according to the gray-level characteristics at the corresponding position. In this work, ANFIS (adaptive-network-based fuzzy inference system, Fig. 3) was used as the auto-controller. ANFIS has high nonlinear approximation precision and good generalization ability, and has been widely applied in many research fields [5].
X i ( i ∈ [0, N d − 1] ) is inputted to a set of fuzzy membership functions μ Fi ;k ( X i ) ( k ∈ [0, M i − 1] , M i is the number of the fuzzy sets Fi ;k of
each dimension signal
Xi
the current dimension), and the function value is the fuzzy membership degree. could be mapped to a
1× M i vector, and Sn has N d such vectors. The fuzzy
membership function prototype is defined as a Gaussian, which has two premise parameters: c = ci ;k is fuzzy center, and σ = σ i ;k is fuzzy width.
μ cG,σau s sia n ( x ) = e
−
1 x−c 2 ( ) 2 σ
(1)
The fuzzy membership degrees of different dimension signals are cross-multiplied, Nd −1
to generate
Ay ( = ∏ M i ) excitation intensity Y j ( = μF0;k ×"μFi ;k ×"μFN −1;k 0
i =0
of the corresponding fuzzy association rules
i
d
Nd −1
)
R j ( = F0;k0 ×" Fi;ki ×" FNd −1;kN −1 ). d
In the fifth layer, Sugeno fuzzy model is applied to draw the conclusion of each fuzzy association rule, i.e. if X 0 is F0;k0 , " , and X Nd −1 is FNd −1;kN −1 , d
Nd −1
then f j ( Sn ) = WNd ; j + ¦ Wi ; j X i . Wi; j is called conclusion parameter, and the i=0
fuzzy conclusion
O j is computed by multiplying Y j and f j .
O j ( S n ) = Y j f j ( S n ) = (W N d ; j +
Finally, add up all
A y −1
N d −1
¦ i=0
W i ; j X i )Y j
¦Y
(2)
j
j=0
O j , and the output of ANFIS is gained, i.e. F ( S n ) =
Ay −1
¦O
j
.
j =0
Least mean square (LMS) method is adopted to train the ANFIS. If the ideal output
Sn is d , the mean square error is E ( S n ) = (d ( S n ) − F ( S n )) 2 2 . In order to ∂E minimize E , Wi ; j should be adjusted with the inverse direction of : ∂W
of
∂E ∂E ∂e ∂F ∂O = W i; j ( n ) − η ∂W ∂e ∂F ∂O ∂W = W i ; j ( n ) + η eY j X i = W i ; j ( n ) + η Y j ( S n ) X i ( d ( S n ) − F ( S n ))
(3)
W N d ; j ( n + 1) = W N d ; j ( n ) + η Y j ( S n )( d ( S n ) − F ( S n ))
(4)
W i ; j ( n + 1) = W i ; j ( n ) − η
Detection of MCs Using Wavelet-Based Thresholding and Filling Dilation
And
ci ;k and σ i ;k could also be adjusted like that:
c i ; k ( n + 1) = c i ; k ( n ) − η
∂E ∂E ∂e ∂F ∂O ∂Y ∂μ = ci ;k ( n ) − η ∂c ∂e ∂F ∂O ∂Y ∂μ ∂c
= c i ; k ( n ) + η ( d ( S n ) − F ( S n )) ¦ m
σ i ; k ( n + 1) = σ i ; k ( n ) − η
1 x − ci ;k ( n ) 2 ) σ i ;k ( n )
∂ Y m x − ci ;k ( n ) − 2 ( fm (Sn ) e ∂ μ Fi ;k σ i2; k ( n )
∂E ∂E ∂e ∂F ∂O ∂Y ∂μ = σ i ;k ( n ) − η ∂σ ∂e ∂F ∂O ∂Y ∂μ ∂σ
= σ i ; k ( n ) + η ( d ( S n ) − F ( S n )) ¦ f m ( S n ) m
∂Y m = ∂ μ Fi ; k
η
( ¦ Y j (Sn ) −
A y −1
μ F ( ¦ Y j (S n ))
1 x − ci ;k ( n ) 2 ) σ i ;k ( n )
2 ∂ Y m ( x − c i ; k ( n )) − 2 ( e 3 ∂ μ Fi ; k σ i ;k ( n )
(5)
(6)
A y −1
Ym ( S n ) i ;k
Where
807
2
j=0
¦ Y (S l
l
n
))
(7)
j=0
is the step length of the training,
m (or l ) ( ∈ [0, Ay − 1] ) is the serial
Rm (or Rl ) that contain the current fuzzy membership function μ Fi ;k ( X i ) , and there are totally Ay M i such rules. number of the fuzzy association rules
Through experiments, the optimal values of the detection parameters in different backgrounds were measured, and three background features (mean intensity, contrast and fractal dimension) were extracted simultaneously. With ANFIS, the relation between these optimal values and the background features can be learned. When a new mammogram is processed, its background features are extracted first, and the appropriate parameter values are then determined by ANFIS accordingly, making the location and segmentation adaptive and accurate.
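A sketch of the forward pass of Eqs. (1)-(2), with our own function names and data layout (training by Eqs. (3)-(7) is omitted):

```python
import numpy as np
from itertools import product

def anfis_forward(x, centers, widths, W):
    """x: (Nd,) input; centers/widths: one (Mi,) array per dimension;
    W: (num_rules, Nd+1) conclusion parameters, last column the bias."""
    mu = [np.exp(-0.5 * ((xi - c) / s) ** 2)              # Eq. (1)
          for xi, c, s in zip(x, centers, widths)]
    rules = list(product(*[range(len(m)) for m in mu]))   # all Ay rules
    Y = np.array([np.prod([mu[i][k] for i, k in enumerate(r)])
                  for r in rules])                        # firing strengths
    f = W[:, :-1] @ x + W[:, -1]                          # Sugeno conclusions
    return float(np.sum(Y * f) / np.sum(Y))               # Eq. (2), summed
```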
5 Experiments and Discussion 60 mammograms taken from the First Affiliated Hospital of Zhejiang University, with a size of 1500*2000 pixels and 12-bit gray level, were used to test the proposed algorithm. These mammograms contained 163 MCs, of which 162 were detected by the proposed algorithm, with 511 FPs. Finally, an MLP (multi-layer perceptron) was applied to remove the FPs and retain the true MCs. Beforehand, ten features of the MCs were extracted: area, mean intensity, contrast, coherence, compactness, ratio of pits, number of hollows, elongatedness, fractal dimension, and clustering number. Coherence is the mean square deviation of the region, compactness is the roundness, ratio of pits is the ratio of the number of pits (concave points of the boundary) to the circumference, elongatedness is the ratio of length to width, and clustering number is the number of surrounding MCs.
The MLP classifier was defined with 3 layers: 10 input nodes, 20 hidden nodes, and 1 output node. The classification result was: 158 true MCs were detected, and 499 false MCs were identified. Thus the true positive rate was 96.9%, while the number of FPs per image was 0.2. The true MC regions were then segmented manually by the radiologists, and this result was taken as the segmentation criterion, so that the extraction quality could be evaluated by computing the ratio of the common area (the overlap of the auto-extracted area and the criterion area) to the criterion area. For MCs with strong HF information, the mean extraction quality was 97.2%; for MCs with comparatively weak HF signal, it was 90.5%; over all MCs, it was 94.7%. The performance of the proposed algorithm was much better than that of the conventional methods, because adaptability and robustness were emphasized here: ANFIS was used for auto-adjustment of the detection process, and filling dilation extracted the desired regions accurately, so that feature extraction of the MCs could be applied to accurate regions and the precision of the final classification could be ensured. Even for MCs with very special characteristics and backgrounds, the algorithm obtained satisfying results that could not be achieved with conventional techniques. The algorithm can therefore deal with all kinds of MCs with different backgrounds and features at a high detection precision, while achieving a very low FP rate.
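For illustration, a minimal sketch of the 10-20-1 MLP named above (the weight initialization and activations are our assumptions; the paper does not give the training details):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = 0.1 * rng.standard_normal((20, 10)), np.zeros(20)   # 10 -> 20
W2, b2 = 0.1 * rng.standard_normal((1, 20)), np.zeros(1)     # 20 -> 1

def mlp_forward(features):
    """features: the 10 region descriptors; returns P(true MC)."""
    h = np.tanh(W1 @ features + b1)                 # hidden layer, 20 nodes
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))     # sigmoid output node
```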
References 1. Nishikawa, R.M., Jiang, Y., Giger, M.L., Doi, K., Vyborny, C.J., Schmidt, R.A.: Computer-aided Detection of Clustered Microcalcifications. IEEE International Conference on Systems, Man and Cybernetics, Chicago, USA (1992) 1375–1378 2. Choe, H.C., Chan, A.K.: Microcalcification Cluster Detection in Digitized Mammograms Using Multiscale Techniques. IEEE Southwest Symposium on Image Analysis and Interpretation, Tucson, USA (1998) 23–28 3. Gulsrud, T.O., Husoy, J.H.: Optimal Filter-based Detection of Microcalcifications. IEEE Trans. Biomed. Eng., Vol. 48 (2001) 1272–1280 4. Yoshida, H., Zhang, W., Cai, W.D., Doi, K., Nishikawa, R.M., Giger, M.L.: Optimizing Wavelet Transform Based on Supervised Learning for Detection of Microcalcifications in Digital Mammograms. IEEE International Conference on Image Processing, Lausanne, Switzerland (1995) 152–155 5. Xu, W.D., Xia, S.R., Xie, H.: Application of CMAC-based Networks on Medical Image Classification. 1st IEEE International Symposium on Neural Networks, Dalian, China (2004) 953–958
ECG Compression by Optimized Quantization of Wavelet Coefficients Jianhua Chen, Miao Yang, Yufeng Zhang, and Xinling Shi Department of Electronic Engineering Yunnan University Kunming, P.R. China, 650091 [email protected]
Abstract. The optimization of the parameters of a uniform scalar dead zone quantizer used in a wavelet-based ECG data compression scheme is presented. Two quantization parameters, a threshold T and a step size Δ, are optimized for a target bit rate through the particle swarm optimization algorithm. Experimental results on several records from the MIT-BIH arrhythmia database show that the optimized quantizer produces improved compression performance.
1 Introduction
The purpose of electrocardiogram (ECG) compression is to reduce the number of bits needed to transmit or store digitized ECG data as much as possible with a reasonable implementation complexity while maintaining clinically acceptable signal quality. Among the various transform coding techniques, those based on the discrete wavelet transform (DWT) play an interesting role due to their easy implementation and efficiency. In [1],[2], both the embedded zerotree wavelet (EZW) and the set partitioning in hierarchical trees (SPIHT) algorithms, which have shown very good results in image coding, were applied to ECG signals. In [3],[4], the wavelet coefficients were thresholded to set small coefficients to zero, and the nonzero coefficients were then uniformly quantized and coded. In the above-mentioned algorithms, the wavelet transform coefficients are actually quantized by a special kind of uniform quantizer which has a larger central quantization bin around zero, called the dead zone. However, the width of the dead zone was either determined by a threshold (in [3],[4]) or fixed to 2 times the width of the other quantization bins (in [1],[2]). For good rate-distortion performance, the relationship between the width of the dead zone and the width of the other quantization bins should be optimized for the desired bit rate, since an improper selection of these parameters may lead to higher distortion at that bit rate. Particle Swarm Optimization (PSO) is a population-based evolutionary optimization technique developed by J. Kennedy and R. Eberhart in 1995, inspired by the social behavior of bird flocking or fish schooling [5]. PSO shares many similarities with other evolutionary computation techniques such as the Genetic Algorithm (GA), but PSO does not utilize crossover or mutation operations; rather, it has memory and tracks the best solutions achieved in the past. PSO is attractive because it has few parameters to adjust and it reaches good results faster and with less computation than many other methods. In this paper, the optimization of the parameters of a uniform scalar dead zone quantizer (USDZQ) is presented. Two quantization parameters, a threshold T and a step size Δ, are optimized for a desired bit rate through the PSO algorithm. The quantizer is applied in a wavelet-based ECG data compression scheme in which the quantized wavelet coefficients are entropy coded using Exp-Golomb coding and Golomb-Rice coding. The rest of this paper is organized as follows: in Section 2, the USDZQ is introduced; the objective function for the quantizer optimization and the PSO algorithm are discussed in Section 3; implementation details and experimental results are presented in Section 4.
2 Uniform Scalar Dead Zone Quantizer
The USDZQ is described as:

$$I_k = \begin{cases} [-3\delta, -T) & \text{if } k = -1 \\ (-T, T) & \text{if } k = 0 \\ [T, 3\delta) & \text{if } k = 1 \\ [(2k-1)\delta, (2k+1)\delta) & \text{otherwise} \end{cases} \quad (1)$$

$$R_k = \begin{cases} 0 & \text{if } k = 0 \\ \pm 2k\delta & \text{if } k = \pm 1, \pm 2, \ldots \end{cases} \quad (2)$$
Here, k is the quantizer output index, Ik describes the kth decision interval and Rk is the corresponding reconstruction level. δ is half of the quantization step size Δ (or, Δ = 2δ), T is the threshold around zero and δ < T < 2δ. In fact, the quantization of this quantizer can be viewed as a threshold operation followed by common midtread uniform quantization. For most natural signals, many high frequency subband wavelet coefficients are so small that no significant information will be lost in the reconstructed signals when these coefficients are quantized to zero. The numerous zero-valued quantization indices produced by samples falling within the central quantization bin, the dead zone, can be efficiently coded by using some kind of entropy coder. Therefore, it’s desired to have a larger dead zone to set more high frequency coefficients to zero so as to increase the compression performance without losing much quality of the reconstructed signals. In the USDZQ, the width of the dead zone is 2T , while all other decision intervals are of width Δ except for the two decision intervals next to the dead zone. However, all reconstruction levels are equally spaced. In this way, the dequantization, performed during the decompression process to find an approximation to the original coefficients, simply consists of the multiplication of each quantization index by the step size.
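A minimal sketch of this quantizer pair (function names ours), exploiting the fact that, away from the dead zone, nearest-level rounding to the grid $R_k = 2k\delta$ reproduces the decision intervals of Eq. (1):

```python
import numpy as np

def usdzq_quantize(x, T, delta):
    """USDZQ of Eq. (1): delta = step/2, dead zone (-T, T), delta < T < 2*delta."""
    k = np.rint(x / (2.0 * delta)).astype(int)   # nearest reconstruction level
    k[np.abs(x) < T] = 0                         # widen the central bin to (-T, T)
    return k

def usdzq_dequantize(k, delta):
    """Eq. (2): equally spaced reconstruction levels R_k = 2*k*delta."""
    return 2.0 * delta * k
```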
3 Optimization of the Quantizer by PSO
For a given signal, let $H(T, \Delta)$ be the output bit rate of the coding system based on the USDZQ with parameters $T$ and $\Delta$, and let $D(T, \Delta)$ be a measure of the distortion introduced into the ECG signal by the quantization. The quantizer optimization problem can then be stated as: for a given target bit rate $H_{target}$, determine $T$ and $\Delta$ in a way that maintains the equality $H(T, \Delta) = H_{target}$ while minimizing $D(T, \Delta)$. Optimization can be achieved by minimizing the cost function

$$J = D(T, \Delta) + \lambda |H(T, \Delta) - H_{target}| \quad (3)$$

for the given target bit rate $H_{target}$. From rate-distortion theory [6], the rate-distortion function $R(D)$ is non-increasing in $D$ and is convex. Since $D(T, \Delta)$ is non-decreasing in $T, \Delta$, $H(T, \Delta)$ will be non-increasing in $T, \Delta$. Apparently, $|H(T, \Delta) - H_{target}|$ reaches its minimum when $H(T, \Delta) = H_{target}$. For a large enough parameter $\lambda$, minimizing the cost $J$ is equivalent to minimizing the distortion $D(T, \Delta)$ under the constraint $H(T, \Delta) = H_{target}$. In ECG compression applications, $H(T, \Delta)$ is measured in bits/sample and the distortion $D$ is measured by the Percent Root-Mean-Square Difference:

$$PRD = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \hat{x}_i)^2}{\sum_{i=1}^{N} x_i^2}} \times 100 \quad \text{or} \quad PRD_m = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \hat{x}_i)^2}{\sum_{i=1}^{N}(x_i - \bar{x})^2}} \times 100, \quad (4)$$
where $x_i$ is the original signal, $\hat{x}_i$ is the reconstructed signal, $\bar{x}$ is the mean value of the original signal and $N$ is the number of samples in the signal. The modified version ($PRD_m$) is not sensitive to the signal mean and is therefore used as the distortion $D$ in (3). The $PRD$ measure is also calculated for comparison with other methods. This optimization problem can be solved using the PSO algorithm. Like most evolutionary computation techniques, PSO starts with a population of solutions, called particles, randomly selected from the solution space, and searches for the optima determined by a fitness function. Each particle, representing one potential solution, flies through the search space with a velocity adjusted according to the best position in its own flying experience ($Pb$) and the best position in the flying experience of all its companions ($Gb$). In this work, the fitness function is defined as the cost function in (3). The standard procedure of particle swarm optimization is described as follows:

1) Set the iteration number $k$ to zero. For a population of $M$ particles, randomly assign the initial position $X_i^{(0)}$ (potential solution) and velocity $V_i^{(0)}$ for each particle $i$ in two dimensions in the $(T, \Delta)$ problem space. The initial $Pb_i^{(0)}$ for each particle is set to its original position. Calculate the optimization fitness function $J(X_i^{(0)})$ for each particle and store the value. Find the best fitness among all particles and store the value and its corresponding position as the initial $Gb^{(0)}$. In this work, a population size of 10 is used.

2) Update the velocity $V_i^{(k)}$ and the position $X_i^{(k)}$ of each particle according to (5) and (6) respectively:

$$V_i^{(k+1)} = w^{(k)} V_i^{(k)} + c_1 \cdot rand() \cdot (Pb_i^{(k)} - X_i^{(k)}) + c_2 \cdot rand() \cdot (Gb^{(k)} - X_i^{(k)}), \quad (5)$$

$$X_i^{(k+1)} = X_i^{(k)} + V_i^{(k+1)}, \quad (6)$$

where $w^{(k)}$ is the inertia weight, and $c_1$ and $c_2$ are the cognitive and social acceleration constants. In order to restrict the particles from traveling out of the solution space, a limit $V_{max}$ is usually placed on the velocity: when the velocity exceeds this limit in any dimension, it is set to the limit. A $V_{max}$ of 2 works well in this study.

3) For each particle $i$ ($i = 1, \cdots, M$), update $Pb_i^{(k+1)}$ based on the new fitness evaluation $J(X_i^{(k+1)})$ and the fitness of its own best position from the last iteration, $J(Pb_i^{(k)})$:

$$Pb_i^{(k+1)} = \begin{cases} Pb_i^{(k)} & \text{if } J(X_i^{(k+1)}) \ge J(Pb_i^{(k)}) \\ X_i^{(k+1)} & \text{if } J(X_i^{(k+1)}) < J(Pb_i^{(k)}) \end{cases} \quad (7)$$

Update the best global position $Gb^{(k+1)}$ based on the fitness of the best global position from the last iteration, $J(Gb^{(k)})$, and the new best fitness of every particle, $J(Pb_i^{(k+1)})$, $i = 1, \cdots, M$:

$$Gb^{(k+1)} = \begin{cases} Gb^{(k)} & \text{if } \min_i(J(Pb_i^{(k+1)})) \ge J(Gb^{(k)}) \\ Pb_m^{(k+1)} & \text{if } J(Pb_m^{(k+1)}) = \min_i(J(Pb_i^{(k+1)})) < J(Gb^{(k)}) \end{cases} \quad (8)$$
4) Repeat steps 2) and 3) until the required number of iterations $K$ is met. The inertia weight $w^{(k)}$ in (5) improves the performance of the PSO algorithm and is decreased linearly during a PSO search:

$$w^{(k)} = w_{max} - \frac{(w_{max} - w_{min}) \times k}{K}. \quad (9)$$
In this work, we set wmax = 0.9 and wmin = 0.4. c1 and c2 are both set to 2.
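A compact sketch of this PSO loop over (T, Δ), assuming a user-supplied fitness implementing Eq. (3); the names and bounds handling are ours:

```python
import numpy as np

def pso_optimize(fitness, bounds, M=10, K=50, c1=2.0, c2=2.0,
                 w_max=0.9, w_min=0.4, v_max=2.0, seed=0):
    """Minimize fitness over the 2-D (T, delta) space per Eqs. (5)-(9)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, float).T             # bounds per dimension
    X = rng.uniform(lo, hi, size=(M, 2))           # positions
    V = rng.uniform(-v_max, v_max, size=(M, 2))    # velocities
    Pb = X.copy()
    Pb_J = np.array([fitness(x) for x in X])
    g = Pb_J.argmin()
    Gb, Gb_J = Pb[g].copy(), Pb_J[g]
    for k in range(K):
        w = w_max - (w_max - w_min) * k / K        # Eq. (9)
        V = (w * V + c1 * rng.random((M, 2)) * (Pb - X)
                   + c2 * rng.random((M, 2)) * (Gb - X))   # Eq. (5)
        V = np.clip(V, -v_max, v_max)              # velocity limit V_max
        X = X + V                                  # Eq. (6)
        J = np.array([fitness(x) for x in X])
        better = J < Pb_J                          # Eq. (7)
        Pb[better], Pb_J[better] = X[better], J[better]
        g = Pb_J.argmin()                          # Eq. (8)
        if Pb_J[g] < Gb_J:
            Gb, Gb_J = Pb[g].copy(), Pb_J[g]
    return Gb, Gb_J

# Eq. (3) fitness, given coder hooks H(T, delta) and D(T, delta):
# fitness = lambda p: D(*p) + 100.0 * abs(H(*p) - H_target)
```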
4 Implementation and Results
The 9/7-tap biorthogonal filters are used to implement the discrete wavelet transform (the 'bior4.4' filters in MATLAB). For entropy coding, Exp-Golomb codes are used to code the lengths of the runs of zero quantization indices, and Golomb-Rice codes are used to code the nonzero indices [7]. Although the DWT implemented here is not orthonormal, which means the mean square error between the transform coefficients and their quantized values is not the same as the mean square error between the original samples and the reconstructed samples, we observe that these two errors are still directly proportional to each other. Therefore, we only calculate the $PRD_m$ measure between the transform coefficients and their quantized values, so that the inverse transform is not calculated during each optimization iteration. By minimizing this distortion measure, the distortion between the original samples and the reconstructed ones is also minimized, and the computational cost is reduced accordingly. For the same reason, the actual Golomb coding is not carried out during the optimization procedure; we only accumulate the number of bits of the codewords for the lengths of zero runs and the nonzero coefficients. When the optimal $T, \Delta$ pair is found, Golomb codewords are produced to achieve the actual compression. The proposed algorithm is tested on several records from the MIT-BIH ECG arrhythmia database. All ECG data used here are sampled at 360 Hz, and the resolution of each sample is 11 bits/sample. Since the bit rate is $H(T, \Delta)$, the compression ratio (CR) can be defined as $CR = 11/H(T, \Delta)$. Table 1 describes the convergence of the algorithm on record 117 of the MIT-BIH database; here, the target CR is 5 and $\lambda$ is set to 100. Due to the convex nature of the absolute-value function in the second term of equation (3), the convergence of the algorithm to the target CR is fast. From Table 1 it can be seen that during the quantizer optimization the PSO algorithm converges rapidly to the target CR (after 4 iterations); however, the reduction of the distortion term in equation (3), and hence of the cost function itself, is slow after the 5th iteration.

Table 1. The process of convergence

iteration number   fitness evaluation   compression ratio
1                  4.2326               5.0724
2                  2.8048               5.0388
3                  1.6386               5.0122
4                  1.2756               4.9959
5                  1.2756               4.9959
10                 1.1726               4.9983
20                 1.1035               4.9998
50                 1.0939               4.9999
For compression performance evaluation, a set of signals is used consisting of 1-min segments of the following records in the MIT-BIH database: 104, 107, 111, 112, 115, 116, 117, 118, 119, 201, 207, 208, 209, 212, 213, 214, 228, 231, 232. The averaged PRD results of the proposed algorithm at different CRs are listed in Table 2, together with the results reported in [7] on the same data set for comparison. The proposed algorithm yields improved PRD performance at all tested compression ratios.
Table 2. Averaged PRD (%) performance comparison

CR             8:1    10:1   12:1   16:1   20:1
Proposed PRD   2.34   2.85   3.38   4.58   6.01
Chen [7] PRD   2.39   2.93   3.46   4.67   6.13
5 Concluding Remarks
The threshold $T$ and the quantization step size $\Delta$ of the uniform scalar dead zone quantizer (USDZQ) used in an ECG signal compression scheme are optimized through the PSO algorithm, so that the input signals can be compressed at a bit rate very close to the desired one with minimum distortion. The quantized coefficients are further entropy coded using Exp-Golomb coding and Golomb-Rice coding. Experimental results show that the optimized quantizer produces improved coding performance. The algorithm does not require any a priori signal information for training, and can thus be applied to any ECG signal.
Acknowledgement This work is supported by Yunnan University, P.R. China.
References 1. Hilton, M.L.: Wavelet and Wavelet Packet Compression of Electrocardiograms. IEEE Trans. Biomed. Eng., Vol. 44 (1997) 394-402 2. Lu, Z.T., Kim, D.Y., Pearlman, W.A.: Wavelet Compression of ECG Signals by the Set Partitioning in Hierarchical Trees Algorithm. IEEE Trans. Biomed. Eng., Vol. 47 (2000) 849-856 3. Benzid, R., Marir, F., Boussaad, A., Benyoucef, M., Arar, D.: Fixed Percentage of Wavelet Coefficients to be Zeroed for ECG Compression. Electronics Letters, Vol. 39, No. 11 (2003) 830-831 4. Blanco-Velasco, M., Cruz-Roldán, F., Godino-Llorente, J.I., Barner, K.E.: ECG Compression with Retrieved Quality Guaranteed. Electronics Letters, Vol. 40, No. 23 (2004) 1466-1467 5. Kennedy, J., Eberhart, R.: Particle Swarm Optimization. Proc. IEEE Inter. Conf. on Neural Networks, Vol. 4 (1995) 1942-1948 6. Yeung, R.W.: A First Course in Information Theory. Springer (Kluwer Academic/Plenum Publishers) (2002) 7. Chen, J., Ma, J., Zhang, Y., Shi, X.: ECG Compression Based on Wavelet Transform and Golomb Coding. Electronics Letters, Vol. 42, No. 6 (2006) 322-324
Effects on Density Resolution of CT Image Caused by Nonstationary Axis of Rotation Yunxiao Wang, Xin Wang, Xiaoxin Guo, and Yunjie Pang Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street 2699#, Changchun, 130012, P.R. China [email protected]
Abstract. This paper discusses the negative effect on the density resolution of computerized tomography (CT) images caused by a nonstationary axis of rotation. The noise introduced and its propagation are analyzed according to the integral formula for reconstructing the CT image. Through analysis and derivation we obtain the signal-to-noise ratio (SNR), as influenced by position noise, at special positions in some simple situations. Furthermore, we provide SNR results at other positions using numerical computation, and conclude that the SNR drops at the edges of the image while the main part of the image is almost unaffected. The simulated image agrees well with this conclusion. The result provides a theoretical reference for bounding the shaking of the rotating axis in large CT systems.
1 Introduction Computerized tomography (CT) is a method of reconstructing a section image from projection data of a certain section of the measured object in each direction. Rotating the measured object, or the radiation source and detector, around a certain axis is the customary way to obtain the projection data of all azimuths in practical CT applications. The position of the axis is very significant for image reconstruction: it is well known that all image reconstruction algorithms regard the position of the axis as the origin of coordinates. In the process of measurement and data acquisition, the limited positioning precision of the mechanical equipment, deformation of the measured object, and other factors cause this axis of rotation to be unstable. A nonstationary axis induces an additional translation while the measured object rotates; this translation causes the projection data of different positions to overlap, and thus degrades the quality of the CT image [1],[2]. In CT theory, two key technical parameters are spatial resolution and density resolution. The effects on spatial resolution were discussed in [3]. Density resolution describes the capability of distinguishing the minimum density interval in the imaged object; it is generally represented by the ratio, in percent, of the smallest distinguishable density difference to the average density of the object. The density resolution of a CT system is mainly decided by the SNR of the reconstructed image, and the reciprocal of the SNR may be considered as the density resolution [4]. Density resolution depends largely on the noise of the system, which is generally determined by the statistical fluctuations of the intensity of the ray sources, the statistical fluctuations of the interaction of the rays with the material, the fluctuations of the rays captured by the detector, the noise of the head amplifier, the noise of the A/D conversion, and so on. A nonstationary axis of rotation introduces statistical fluctuation noise that propagates to the final reconstructed image; consequently, the SNR of the CT system is degraded and the density resolution drops. In order to keep the conclusions general, all of the following calculations are based on "parallel projection" and "filtered back projection", because parallel projection is the most basic way of gathering data, other projection beams can be converted into parallel projections, and the filtered back projection algorithm is the most frequently used image reconstruction method [5].
2 Analyzing the Effect on Density Resolution Suppose the projection data in direction $\Phi$ is $\lambda_\Phi(X_r)$, as in Fig. 1 of [3]. The filtered back projection algorithm first requires that each set of projection data be convolved with a filter function $q_1(X_r)$, which yields the filtered projection $\lambda'_\Phi(X_r)$:

$$\lambda'_\Phi(X_r) = \int_{-\infty}^{\infty} \lambda_\Phi(X_r')\, q_1(X_r - X_r')\, dX_r' \equiv \lambda_\Phi(X_r) * q_1(X_r). \quad (1)$$
Equation (1) does not take the nonstationary axis into account. Now suppose that the center of rotation moves to the point $(\delta_r, \delta_\theta)$ in polar coordinates when the projection data are sampled at angle $\Phi$. The projection data then have an offset of $\delta_r \cos(\delta_\theta - \Phi)$, so the actual projection value becomes $\lambda_\Phi(X_r + \delta_r \cos(\delta_\theta - \Phi))$. Following the filtered back projection algorithm, we back-project the filtered projection data and accumulate them to obtain the reconstructed image. With $q_1(x)$ a filter function, we get
\mu(r,\theta) = \frac{1}{\pi} \int_0^{\pi} d\phi \int_{-\infty}^{\infty} \lambda_\phi\bigl(X_r' + \delta_r \cos(\delta_\theta - \phi)\bigr)\, q_1(X_r - X_r')\, dX_r' \,\Big|_{X_r = r\cos(\theta-\phi)}    (2)
The density resolution of CT is mainly limited by various statistical noises. In Equation (2), λ is a random quantity even if the positional uncertainty is not considered, because the radiation source, the detector, the interaction of the rays, etc., are all stochastic processes. The ratio between the noise caused by these factors and the ideal reconstructed signal can be regarded as the density resolution of CT [5]. When only the shaking of the rotation center is considered, the randomness of λ is influenced merely by δ_r and δ_φ. We rewrite Equation (2), treat δ_r as a small quantity, and expand λ in a Taylor series, obtaining Equation (3):

\mu_i(r,\theta) = \frac{1}{\pi}\int_0^{\pi} d\phi \int_{-\infty}^{\infty} \lambda_\Phi\bigl(X_r' + \delta_r\cos\delta_\phi\bigr)\, q_1\bigl(r\cos(\theta-\phi) - X_r'\bigr)\, dX_r'
\approx \frac{1}{\pi}\int_0^{\pi} d\phi \int_{-\infty}^{\infty} \lambda_\Phi(X_r')\, q_1\bigl(r\cos(\theta-\phi) - X_r'\bigr)\, dX_r'
\;+\; \frac{1}{\pi}\int_0^{\pi} d\phi \int_{-\infty}^{\infty} \lambda_\Phi^{(1)}(X_r')\, \delta_r\cos\delta_\phi\, q_1\bigl(r\cos(\theta-\phi) - X_r'\bigr)\, dX_r'    (3)
In Equation (3) all length quantities are nondimensional; multiplied by 2d they give the actual lengths. We showed in [3] that δ_r and δ_φ are one-dimensional stationary stochastic processes with respect to the projection angle, and that δ_r and δ_φ do not change with X_r' within the same angle. Considering E(δ_r cos δ_φ) = 0, the mean value of μ_i(r,θ) may be taken as the first term of Equation (3) and the variance as that of the second term, if the higher-order terms in Equation (3) are ignored. The variance of the stochastic fluctuation thus depends not only on the shaking rotation axis but also on λ_Φ(X_r), that is, on the reconstructed image itself. For example, when reconstructing a theoretically infinite and absolutely uniform image, the random shaking of the rotation center has no influence whatsoever on the reconstructed image. But if the image is not infinitely large or absolutely uniform, the random shaking of the rotation center introduces noise, and the character of this noise depends on the concrete image. This characteristic makes it difficult to analyze the influence of a shaking rotation center on the density resolution of a CT image.

2.1 Noise Analysis and Numerical Computation at the Points of a Uniform Circle

For a uniform circle of radius R, the one-dimensional function of its projection is

\lambda_\phi(X_r) = 2\sqrt{R^2 - X_r^2}    (4)
Its first-order derivative is shown in Equation (5):

\lambda_\phi^{(1)}(X_r) = \frac{-2X_r}{\sqrt{R^2 - X_r^2}}    (5)
Substituting Equation (5) into the second term of Equation (3) and letting r = 0, we get

D\left\{ \frac{1}{\pi}\int_0^{\pi} d\phi \int_{-\infty}^{\infty} \lambda_\Phi^{(1)}(X_r')\, \delta_r\cos\delta_\phi\, q_1\bigl(r\cos(\theta-\phi) - X_r'\bigr)\, dX_r' \,\Big|_{r=0} \right\}
= \left( \frac{1}{\pi}\int_0^{\pi} d\phi \int_{-\infty}^{\infty} \lambda_\Phi^{(1)}(X_r')\, q_1(-X_r')\, dX_r' \right)^{2} D\{\delta_r\cos\delta_\phi\}
= \left( \frac{1}{\pi}\int_0^{\pi} d\phi \int_{-R}^{R} \frac{-2X_r'}{\sqrt{R^2 - X_r'^2}}\, q_1(X_r')\, dX_r' \right)^{2} D\{\delta_r\cos\delta_\phi\}    (6)

Because λ^{(1)}(X_r) is an odd function and q_1(X_r) is an even function, their product is an odd function. The integral of an odd function over a symmetric interval is zero, so the value of Equation (6) equals 0, which means the variance is 0. In other words, the value of the reconstructed image at the center of a uniform circle remains a definite quantity rather than a random one, despite the shaking of the rotation axis. Thus the SNR at this point is infinite and the density resolution there is not affected by any of these factors. For r ≠ 0 the integral is not as simple as at r = 0, and at present the data can only be obtained by numerical calculation. Before the numerical calculation, we must first determine the value of R. The detector interval is of the order of 0.2 mm and the section diameter of the measured object is
commonly around 1 m, so R should be around 0.5 m / (2 × 0.2 mm) = 1250. Let R = 1500. For r equal to 0, 150, 300, ..., 1500 we have calculated the mean of the first term of Equation (3) and the variance of the second term of Equation (3); the results are listed in Table 1. The SNR at a point of the reconstructed image is defined in [4] as
SNR(r,\theta) = \frac{E(\mu_i(r,\theta))}{\sigma(\mu_i(r,\theta))}    (7)
The reciprocal of the SNR is taken as the density resolution [5]. In Table 1, σ(δ_r cos δ_φ) is the mean square error of the random variable δ_r cos δ_φ. Because the shaking of the center of rotation is very small (in general δ_r is smaller than 0.7), σ(δ_r cos δ_φ) is smaller than 1. Taking into account the sharp change of the SNR when r is close to R, the sampling density is increased tenfold for r larger than 1350. The computation results in Table 1 show that a nonstationary axis has hardly any effect on density resolution when r is smaller than 1485, but the SNR drops sharply when r approaches R, and the density resolution declines along with it.

Table 1. Numerical calculation of the density resolution of the reconstructed image of a uniform circle at different r, when the shaking of the nonstationary axis is tiny
r      E(μ_i(r,θ))      σ(μ_i(r,θ))                       σ(μ_i(r,θ)) / E(μ_i(r,θ))
0      6399.948453      0.000000 × σ(δ_r cos δ_φ)         0.000000 × σ(δ_r cos δ_φ)
150    6399.988609      0.000476 × σ(δ_r cos δ_φ)         0.000000 × σ(δ_r cos δ_φ)
450    6399.991562      0.000946 × σ(δ_r cos δ_φ)         0.000000 × σ(δ_r cos δ_φ)
750    6400.001474      0.001821 × σ(δ_r cos δ_φ)         0.000000 × σ(δ_r cos δ_φ)
1050   6400.031210      0.004600 × σ(δ_r cos δ_φ)         0.000001 × σ(δ_r cos δ_φ)
1350   6400.258385      0.039047 × σ(δ_r cos δ_φ)         0.000006 × σ(δ_r cos δ_φ)
1380   6400.363610      0.054264 × σ(δ_r cos δ_φ)         0.000008 × σ(δ_r cos δ_φ)
1410   6400.585839      0.103632 × σ(δ_r cos δ_φ)         0.000016 × σ(δ_r cos δ_φ)
1440   6401.106782      0.221732 × σ(δ_r cos δ_φ)         0.000035 × σ(δ_r cos δ_φ)
1470   6403.395663      0.926066 × σ(δ_r cos δ_φ)         0.000145 × σ(δ_r cos δ_φ)
1500   5300.472977      3298.659848 × σ(δ_r cos δ_φ)      0.622333 × σ(δ_r cos δ_φ)
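The vanishing variance at r = 0 predicted by Equation (6) can be checked numerically. The sketch below (an illustration under stated assumptions, not the authors' code) evaluates the inner integral of Equation (6) with the uniform-disk derivative of Equation (5) and an even, band-limited ramp kernel; the odd-times-even integrand integrates to approximately zero over the symmetric interval.

```python
import numpy as np

R = 1500.0
X = np.linspace(-R + 1.0, R - 1.0, 20001)   # symmetric grid, endpoints avoided
dlam = -2.0 * X / np.sqrt(R**2 - X**2)      # Eq. (5): odd in X

def q1(x):
    # Band-limited ramp kernel with cutoff B = 1/2 (an assumption; any even
    # filter function yields the same odd-times-even cancellation).
    B = 0.5
    return B**2 * (2.0 * np.sinc(2.0 * B * x) - np.sinc(B * x) ** 2)

inner = np.trapz(dlam * q1(-X), X)          # inner integral of Eq. (6) at r = 0
print(f"integral of odd*even over symmetric interval: {inner:.3e}")  # ~ 0
```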
2.2 Effects on Density Resolution Under a Relatively Uniform Background

In general a CT system is mainly used to check defects in an object, so its section consists of a uniform background containing some defects. This meets the condition of a relatively uniform background, so the conclusion of Section 2.1 may be generalized to reconstructed CT images under other uniform-background conditions. That is to say, under a relatively uniform background, the influence of a nonstationary axis of rotation on the density resolution of the CT image is mainly represented by a drop of the SNR at the
image edge, while the main body of the image is hardly affected. We illustrate this conclusion in the following. First, we have proved that the filter function q_1(x) satisfies Equation (8) (because of the length limit of this paper, the proof is not given):
\int_m^n q_1(x)\, dx \equiv 0 \qquad (m \text{ and } n \text{ are integers})    (8)
So we get Equation (9) from Equation (3).
\sigma(\mu_i(r,\theta)) = \left[ \frac{1}{\pi}\int_0^{\pi} d\phi \int_{-\infty}^{\infty} \lambda_\phi^{(1)}(X_r')\, q_1\bigl(r\cos(\theta-\phi) - X_r'\bigr)\, dX_r' \right] \times \sigma(\delta_r\cos\delta_\phi)
= \left[ \frac{1}{\pi}\int_0^{\pi} d\phi \left( \sum_{i=-\infty}^{+\infty} \int_i^{i+1} \lambda_\phi^{(1)}\bigl(r\cos(\theta-\phi) - X_r'\bigr)\, q_1(X_r')\, dX_r' \right) \right] \times \sigma(\delta_r\cos\delta_\phi)    (9)

Under the relatively uniform background condition, the variation of λ_φ^{(1)}(r cos(θ−φ) − X_r') in Equation (9) is in general small while X_r' runs from i to i+1, so it can be replaced by a constant; by Equation (8) the corresponding term of the sum is then 0. λ_φ^{(1)}(r cos(θ−φ) − X_r') changes relatively sharply only at the edge, and there the corresponding term of the sum is not 0. In other words, the fluctuation that a nonstationary axis of rotation brings to the reconstructed image under a uniform background is very small except at the edge; compared with other factors these effects on density resolution may be neglected, and the axis merely makes the SNR drop at the edge.
3 Experimental Results

According to the simulation conditions for reconstructed images provided in [3], we have reconstructed images of four small holes in a uniform background. The value of μ in the holes is 1.2 times the value μ_0 of the background. Figure 1 shows the simulation results. It can be seen that the blurring of the hole edges intensifies as the shaking quantity of the axis δ_rm increases.
Fig. 1. Reconstructed images under a uniform background with different shaking quantities of the axis: (a) δ_rm = 0, (b) δ_rm = 0.25, (c) δ_rm = 0.4, (d) δ_rm = 0.5
4 Conclusions

A nonstationary axis of rotation is one of the important factors influencing CT image quality in the process of image reconstruction, and density resolution is a key parameter of CT image quality. In this paper, we have presented a new method of analyzing the effects on density resolution when the axis is unstable. Synthesizing the data given in Table 1, the analysis of Equation (9) and the simulation results in Figure 1, we may conclude that the effect of an unstable axis of rotation on density resolution appears only at the edge of the image, where the density resolution drops, while the main part of the image is almost unaffected. Density resolution measures the capability of a CT system to differentiate density intervals on a large scale. From this standpoint, the blurred edge does not impact the density resolution, so we conclude that the effects may be neglected for objects measured by a CT system. This paper provides a new theoretical basis for CT systems and can be applied to future CT systems.
References

1. Concepcion, J.A., Carpinelli, J.D., Kuo-Petravic, G., et al.: CT Fan Beam Reconstruction with a Nonstationary Axis of Rotation. IEEE Trans. on Medical Imaging (1992) 111-116
2. Kijewski, M.F., Judy, P.F.: The Effects of Misregistration of the Projections on Spatial Resolution of CT Scanners. Med. Phys. (1983) 169-175
3. Wang, Y., et al.: Effects on Spatial Resolution of CT Image Caused by Nonstationary Axis. Proceedings of 2005 International Conference on Machine Learning and Cybernetics (2005) 5473-5478
4. Barrett, H.H., Swindell, W.: The Theory of Radiological Imaging, Image Forming, Checking and Processing. Science Press (1998) 395-486
5. Lapin, G.D., Groothuis, D.R.: The Necessity of Image Registration for Dynamic CT Studies. Proceedings of the Annual Conference on Eng. in Medicine and Biology 13 (1991) 283-284
Embedded Linux Remote Control System to Achieve the Stereo Image

Cheol-Hong Moon and Kap-Sung Kim

Gwangju University, Gwangju, Korea
[email protected], [email protected]
http://web2.gwangju.ac.kr/ chmoon/
Abstract. A new embedded SoC (System on a Chip) system, which enables the remote control and capture of a stereo image, was developed and used to measure distance and provide 3D information. This is a simple and economical stereo system that overcomes the limitations of the systems currently in operation, which require high-end equipment and software. The stereo image system developed in this study consists of two CCDs (Charge Coupled Devices), two image decoders to convert the cameras' analog output signals into digital form, and a TFT-LCD (Thin Film Transistor Liquid Crystal Display) to display the captured image. An I2C, stereo block and LCD IP control were used to control the system, employing Embedded Linux for real-time operation. A Web Server was also installed in the embedded stereo system to allow remote networking with clients. The HTML (Hyper Text Markup Language) file and CGI (Common Gateway Interface) program were designed to control the embedded system on the server. In this system, remote control and image capture from a PC were enabled on the Web.
1 Introduction

Home networking has become an essential part of everyday life. In particular, the Internet provides people with almost infinite information and convenience. This study investigated a system to acquire stereo images through a network. This new system uses a SoC (System on a Chip) to reduce the volume, power consumption and size of the remote control system, which used to be quite complicated and bulky [1]. This study configured an embedded Linux remote control system using a SoC system and designed the IP (Intellectual Property) for the stereo system [2-5]. An IP is an individual component in a chip. Stereo IP, Stereo Control IP and LCD IP were designed in this study [6]. The embedded system, which optimizes the system using the designated functions, can be used more effectively with an Open Source OS such as Linux than with a conventional commercial OS [7-9]. Because Linux performs better in the embedded system, embedded Linux was ported by configuring the SoC system hardware to acquire stereo images. The image was acquired and stored by accessing the embedded system
via a web browser and controlling it remotely. The acquired images were confirmed on the TFT-LCD and transferred to the client through a web browser over the network.
2 Software and Hardware Design

2.1 Software Design

The software design in this article included both the development environment and the embedded environment.
Fig. 1. Structure of the Kernel
The Linux kernel was used in the embedded system because it satisfies the requirements of the embedded system. Red Hat 9.0 by Red Hat, Inc. was installed for this study. The kernel is the core of the operating system. It resides in the DRAM of the target board, configures the environment necessary for the system to operate and schedules program operation. Kernels can be divided into microkernels and monolithic kernels. A microkernel is a minimal kernel that performs only the core functions, with the remainder performed as service processes. A monolithic kernel includes many of the service routines essential for system operation. A monolithic kernel has the advantages of easy implementation and effective management of system resources. However, it is difficult to port to systems with various environments, and the kernel size becomes large. Figure 1 shows the kernel structure, indicating process management, memory management, file system management, device management and network management.

Development Environment Configuration. In this article, the environment to develop the embedded software was configured as follows. Minicom was installed to monitor and transfer data through a serial port, and a cross compiler was installed to construct the cross environment. TFTP was installed to construct the network environment, and NFS (Network File System) was implemented to construct the file system through a network.
Embedded Environment Configuration. To configure the embedded system, the devices were configured first when the system was turned on. The boot loader that calls the kernel and the file system was compiled and installed in the flash memory. The kernel, the core of Linux, was compiled and installed in the flash memory. Finally, the file system for the users was constructed and installed in the flash.

Web Server Construction. For the purpose of this article, the system was configured as a web server. In the file system, the web server, the core of the remote system, and the CGI (Common Gateway Interface) program that performs the stereo image acquisition in the web server were designed and installed. The web server installed in this system was the Boa web server, version 0.94.9. A SoC (System on a Chip) is a technology that condenses not only hardware logic but also a processor, ROM, RAM, controller and peripheral circuits into a single chip. This means that the SoC is the ultimate goal for various system-related companies as well as for semiconductor companies. The chip used in this article was the Excalibur, produced by Altera. It has an ARM922T 32-bit processor, and the IP is connected to an AMBA (Advanced Microcontroller Bus Architecture). The Excalibur chip used in this article included an APEX20K 400,000-gate FPGA region.
Fig. 2. Stereo IP Block Diagram
Figure 2 shows the whole block diagram of the Stereo IP. The block diagram is divided into the Processor Area, the PLD Area and the External Hardware Area. The Processor Area is composed of the ARM922T 32-bit RISC (Reduced Instruction Set Computer) core, the Stripe-to-PLD Bridge to access the PLD Area, and the PLD-to-Stripe Bridge to access the Processor Area from the PLD Area. The PLD Area is the logically designed part, written in the VHDL language, which is implemented as hardware. The Slave Decoder outputs the Select signal for each module by decoding the addresses of the modules, so that the Processor can access the modules in the PLD Area. The I2C Control Logic is the control logic that sets the inner registers to make the Image Decoder output YUV422 16-bit data. The Stereo Control Logic transfers the image data of the valid data zone of the Image Decoder to the SRAM Control Logic using the Capture Start signal when the I2C configuration is complete. The SRAM Control Logic enables the data transmitted from the Stereo Control Logic to be written to SRAM with the appropriate control signals. The Stereo DMA Logic reads the stereo image stored in SDRAM through the PLD-to-Stripe Bridge. The LCD Driver issues the various signals needed to operate the
640×480 TFT-LCD and displays the data read from the Stereo DMA Controller Logic on the LCD. The Default Slave Logic is selected by default if none of the three modules (SRAM Control, Stereo Control, I2C Control) is selected. The External Hardware Area is composed of the two image decoders for the stereo image and the SRAM where the image data are stored.
Fig. 3. I2C IP Status Diagram
Fig. 4. LCD IP Block Diagram
Figure 3 shows a status diagram of the I2C Control IP. The I2C Control IP divides the input clock down to 50 kHz and uses it as the I2C main clock. The I2C Control IP has the address A0000000, and the Slave Decoder selects the I2C Control Logic when the Stripe accesses address A0000000. Beginning with the start condition, the I2C Control IP converts parallel data such as a device address, a decoder inner address and data into serial data. It also sets the decoder registers and completes data transmission by transmitting the stop condition. Figure 4 shows a block diagram of the Stereo DMA and LCD IP, which is divided into an Excalibur Stripe Area and an FPGA Area. It stores the SDRAM address from which the image data will be read through the Stripe-to-PLD Bridge, and the size of the image that will be displayed on the LCD, in the Register Bank through an AHB Slave Interface. The LCD Driver issues the Pixel Clock, Hsync, Vsync and DE signals to drive a 640×480 TFT-LCD with the enable signal of the Excalibur Stripe. The DMA Controller detects the Hblank and Vblank signals from the LCD Driver, which mark the slots in which no image data are displayed on the LCD, accesses the Excalibur Stripe zone through the PLD-to-Stripe Bridge via the AHB Master Interface, and stores the data in DPRAM (Dual Port RAM). The LCD Driver reads the data from DPRAM in the slot where it will display the image and then displays it on the LCD.
3 Experiments and Results
The experiment was performed along the pathways shown in Figure 5, the system operation flow chart. The system is divided into three parts: the client system on the far right, the Web SoC embedded system in the middle, and the Linux program on the far left. (1) Initially, the system is accessed through the client part. When the address (http://220.69.22.111) is entered in the internet address window of a PC browser, the web server
Fig. 5. System Operation Flow Diagram
installed in Linux can be accessed via the Ethernet chip of the embedded system through the network. The web server delivers the main screen to the client system through Ethernet. (2) When the Image Creation button in the client is pressed to convert the image, the CGI program is executed to acquire the image. The program checks the value through getword and initializes the image decoder through the decoder initialization. After initialization, the system acquires data through a CCD camera and stores it in SRAM. After saving, the data is converted to a RAW file through makeraw. The created RAW file is converted to a BMP file through RAW to BMP and is then saved; a sketch of this conversion step is given below. (3) The saved file is then transmitted from the web server to the client system when the Image Acquisition button in the client is clicked. (4) A stereo image can then be seen.

Simulation. Figure 6 shows the results of a simulation of the LCD DMA Controller IP. Figures 7 and 8 show the embedded system image when it accesses the web server, and the screen when the image was acquired.
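As a concrete illustration of the RAW-to-BMP step mentioned above, the following sketch converts one YUYV (YUV422) frame into an RGB array with BT.601-style coefficients. The 640×480 geometry follows the TFT-LCD described earlier; the Y0-U-Y1-V byte order per pixel pair is an assumption about the decoder configuration, not a detail given in the paper.

```python
import numpy as np

def yuyv_to_rgb(raw, width=640, height=480):
    # raw: bytes of one YUYV frame (2 bytes per pixel, byte order is assumed).
    frame = np.frombuffer(raw, dtype=np.uint8).reshape(height, width // 2, 4)
    y = frame[:, :, [0, 2]].reshape(height, width).astype(float)
    u = np.repeat(frame[:, :, 1], 2, axis=1).astype(float) - 128.0
    v = np.repeat(frame[:, :, 3], 2, axis=1).astype(float) - 128.0
    r = y + 1.402 * v                       # BT.601-style conversion
    g = y - 0.344 * u - 0.714 * v
    b = y + 1.772 * u
    return np.clip(np.dstack([r, g, b]), 0, 255).astype(np.uint8)
```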
Fig. 6. LCD DMA Controller IP Simulation
Fig. 7. Embedded System Image
Fig. 8. Web Server Image Acquisition
4 Conclusion
An embedded stereo system was implemented using a SoC chip, and embedded Linux was installed to acquire moving images from the outside through a network. Previously, networked moving-image systems were quite bulky, consumed a large amount of electricity, and a network controlling system was difficult to implement. In order to overcome the shortcomings of the conventional systems, a SoC was applied and a remote program was designed to allow an existing image device to be controlled remotely. The system contained a controller, which controlled and initialized the stereo image IP. The IP was designed in the PLD part using Quartus II, and an I2C controller, Stereo Block controller and LCD IP control were implemented. The embedded Linux and file system, which process multiple tasks simultaneously and in real time, were ported. A web server for Linux was installed in order to allow the system to be accessed through the web. A CGI program was created to control the web server and the embedded SoC system, and a stereo image was acquired by controlling the web server remotely.
Acknowledgements

This research was supported by the Program for the Training of Graduate Students in Regional Innovation and the Technical Development of Regional Industry, which was conducted by the Ministry of Commerce, Industry and Energy of the Korean Government.
References

1. Park, J.H.: IT EXPERT EMBEDDED Linux. Hanbit Media (2003)
2. ALTERA: Excalibur Devices Hardware Reference Manual (2002)
3. ARM: AMBA Specification (Rev 2.0) (1999)
4. ARM: ARM922T (Rev 0) Technical Reference Manual (2000)
5. Sloss, A.N., Symes, D., Wright, C.: ARM System Developer's Guide. Elsevier (2005)
6. PHILIPS: SAA7111AHZ Enhanced Video Input Processor. Data Sheet (1998)
7. Matthew, N., Stones, R. (translated by Han, D.H., Lee, M.Y.): Linux Programming for Beginners. Daerim (1998)
8. Watson, K.M., Whitis, M.: Linux Programming Unleashed. SAMS (1999)
9. Kwon, S.H.: Linux Programming Bible. Global (2002)
Estimation of Omnidirectional Camera Model with One Parametric Projection

Yongho Hwang and Hyunki Hong

Dept. of Image Eng., Graduate School of Advanced Imaging Science, Multimedia and Film, Chung-Ang Univ.
[email protected], [email protected]
Abstract. This paper presents a new self-calibration algorithm for an omnidirectional camera from uncalibrated images. First, a one-parametric non-linear projection model of the omnidirectional camera is estimated with known rotation and translation parameters. After deriving the projection model, we can compute an essential matrix for unknown camera motions and then determine the camera positions. In addition, we show that LMS (Least-Median-Squares) is more suitable for inlier sampling in our model than the other methods, the 8-point algorithm and RANSAC (RANdom SAmple Consensus). In the simulation results, we demonstrate that the proposed algorithm achieves a precise estimation of the omnidirectional model and the extrinsic parameters.
1 Introduction

The seamless integration of synthetic objects with real photographs or video images has long been one of the central topics in computer vision and computer graphics [1,2]. Generating a high-quality synthesized image requires first matching the geometric characteristics of both the synthetic and real cameras. Since a fisheye lens has a wide field of view, it is widely used to capture the scene and the illumination from all directions with far fewer omnidirectional images. This paper presents a new self-calibration algorithm for estimating the omnidirectional camera model from uncalibrated images. First, we derive a one-parametric non-linear projection model of the omnidirectional camera, and estimate the model by minimizing the distance between the projected points and the epipolar curves. In this estimation of the camera model, our method uses known rotation and translation parameters. After deriving the projection model, however, we can compute an essential matrix for unknown camera motions and then determine the relative rotation and translation. The experimental results show that the proposed method can reconstruct the 3D scene structure and generate photo-realistic images. It is expected that animators and visual effect and lighting experts in the film industry would benefit highly from it.
2 Previous Studies

Many approaches to self-calibration and 3D reconstruction from omnidirectional images have been proposed up to now. In addition, these approaches are widely combined with IBL (Image-Based Lighting) [2, 5] due to their merits.
Xiong et al. register four fisheye lens images to create a spherical panorama while self-calibrating the lens distortion and field of view [3]. However, a particular camera setting is required, and the calibration results may be incorrect for some lenses because the method is based on the equi-distance camera model. Sato et al. simplify the user's direct specification of a geometric model of the scene by using an omnidirectional stereo algorithm, and measure the radiance distribution. However, because omnidirectional stereo is used, a strong camera calibration of the capturing positions and internal parameters is required in advance, which is a complex and difficult process [5]. Although previous studies on the calibration of omnidirectional images have been widely presented, there are few methods for estimating a one-parametric model together with the extrinsic parameters of the camera [6~8]. Pajdla et al. mentioned that a one-parametric non-linear projection model has a smaller possibility of fitting outliers, and explained that the simultaneous estimation of a camera model and the epipolar geometry may be affected by the sampling of corresponding points between a pair of omnidirectional images [9]. However, this requires further consideration of various inlier sampling methods: the 8-point algorithm, RANSAC (RANdom SAmple Consensus) and LMS (Least-Median-Squares) [10]. This paper presents a robust calibration algorithm for the one-parametric model with relative efficiency and determines which method is most suitable for inlier sampling in our model.
3 Projection Model Estimation

The camera projection model describes how the 3D scene is transformed into the 2D image. The light rays emanate from the camera center, which is the camera position, and are determined by a rotationally symmetric mapping function f:

f(u, v) = f(u) = r / \tan\theta    (1)

where r = \sqrt{u^2 + v^2} is the radius of a point (u, v) with respect to the camera center and θ is the angle between a ray and the optical axis. The mapping function f has various forms determined by the lens construction [7,11]. The precise two-parametric non-linear model for the Nikon FC-E8 fisheye converter is as follows:

\theta = \frac{a r}{1 + b r^2} , \qquad r = \frac{a - \sqrt{a^2 - 4 b \theta^2}}{2 b \theta}    (2)
where a, b are the parameters of the model. On the assumption that the maximal view angle θ_max is known, the maximal radius r_max corresponding to θ_max can be easily obtained from the normalized view-field image. This yields the one-parametric non-linear model:

\theta = \frac{a r}{1 + \left( \frac{a}{\theta_{max}} - \frac{1}{r_{max}} \right) \frac{r^2}{r_{max}}}    (3)
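A direct transcription of Equation (3) is given below as a hedged sketch; θ_max and r_max are taken from the normalized view-field image as described above, and the numeric values in the example (half of a 183° field of view, half of a 1530-pixel image width) are illustrative assumptions, not values stated in the paper. Note that the denominator is equivalently 1 + b r² with b = (a r_max/θ_max − 1)/r_max², the substitution that removes the second parameter of Equation (2).

```python
import numpy as np

def theta_from_r(r, a, theta_max, r_max):
    # Eq. (3): the model of Eq. (2) with b fixed so that theta(r_max) = theta_max.
    b = (a * r_max / theta_max - 1.0) / r_max**2
    return a * r / (1.0 + b * r**2)

theta_max = np.deg2rad(183.0) / 2.0   # assumed half view angle
r_max = 765.0                         # assumed: half of a 1530-pixel image
# By construction the maximal radius maps exactly to theta_max:
assert np.isclose(theta_from_r(r_max, 1.3, theta_max, r_max), theta_max)
```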
In order to estimate the one-parametric non-linear projection model, we use two omnidirectional images with known relative camera rotation and translation. 20 corresponding points between the images were established with the commercial program MatchMover Pro 3.0 [12].
Fig. 1. Input images taken by a Nikon FC-E8 fisheye converter mounted on a Nikon Coolpix 995, 1530 × 1530 pixels, with 20 correspondences marked by red circles: (a) omnidirectional image captured at the reference position; (b) at a relatively rotated and translated position (rotation R: −30° around the y-axis, unit translation vector T: (tx, ty, tz) = (0.9701, 0, 0.2425))
Since the relative rotation and translation parameters are known during the estimation of the camera model, we can draw the epipolar curves. We then obtain the parameter a minimizing the distance between the epipolar curves and the projected points:
\arg\min_a \frac{1}{N} \sum_{i=1}^{N} d(curve_i, pt_i)    (4)

where N is the number of correspondences, d(·,·) is the Euclidean distance between a curve and a point, curve_i is the i-th epipolar curve, and pt_i is the i-th corresponding point.
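A sketch of how the minimization in Equation (4) might be carried out by a simple grid search follows. The helper epipolar_curve_points(a, i) is a hypothetical stand-in, not a function from the paper: it must return densely sampled 2D points of the i-th epipolar curve under model parameter a.

```python
import numpy as np

def point_to_curve(pt, curve_pts):
    # d(curve_i, pt_i): distance to the nearest sample of the curve.
    return np.min(np.linalg.norm(curve_pts - pt, axis=1))

def estimate_a(a_grid, feature_pts, epipolar_curve_points):
    best_a, best_err = None, np.inf
    for a in a_grid:
        err = np.mean([point_to_curve(pt, epipolar_curve_points(a, i))
                       for i, pt in enumerate(feature_pts)])
        if err < best_err:
            best_a, best_err = a, err
    return best_a, best_err

# Usage (hypothetical): estimate_a(np.linspace(0.8, 1.8, 101), pts, curve_fn)
```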
Fig. 2. (a) Sum of distances between epipolar curves and corresponding points as the parameter a changes; (b) estimated projection model, parameter a = 1.3
The distance error graph for the parameter a is shown in Fig. 2 (a). We obtained the minimum error when a is 1.3; the estimated projection model is shown in Fig. 2 (b).
4 Experimental Results

Correspondences between the two images (Fig. 1) were established with MatchMover Pro 3.0. After estimating the omnidirectional camera model, we can obtain the relative parameters, rotation and translation, which were computed from the essential matrix. Fig. 3 shows that the computed epipolar curves are located precisely on the 20 corresponding points. In these results, the average pixel error of the 20 feature points is 0.010531 and the angular error is 9.174 × 10⁻⁶. Fig. 4 shows the epipolar curves on the omnidirectional image computed from the essential matrix of the image. In these results, the average pixel error of the 55 feature points is 0.0155 and the angular error is 2.0 × 10⁻⁵. In order to assess this performance, we compare the estimated results with the known camera parameters in Table 1.
Fig. 3. Computed epipolar curves on feature points
Fig. 4. Feature points and corresponding epipolar curves
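For readers who want to see the step from the calibrated model to the essential matrix, the sketch below (an assumption-laden illustration, not the authors' implementation) lifts image points, given relative to the image center, to unit ray directions with the Equation (3) model and estimates E linearly from ray correspondences q^T E p = 0, in the spirit of the 8-point algorithm; theta_from_r refers to the Equation (3) sketch given earlier.

```python
import numpy as np

def ray_from_pixel(u, v, a, theta_max, r_max):
    # Lift a pixel (u, v), measured from the image center, to a unit ray.
    r = np.hypot(u, v)
    th = theta_from_r(r, a, theta_max, r_max)
    s = np.sin(th) / max(r, 1e-12)
    return np.array([u * s, v * s, np.cos(th)])

def essential_from_rays(P1, P2):
    # P1, P2: (N, 3) unit rays in the two views, N >= 8, with q^T E p = 0.
    A = np.stack([np.outer(q, p).ravel() for p, q in zip(P1, P2)])
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)
    U, _, Vt = np.linalg.svd(E)               # enforce the rank-2 constraint
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt
```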
The input images (800 × 800) and the 70 corresponding points between the two views are shown in Fig. 5. These were manually detected so that evenly distributed points are selected. We compared three methods, the 8-point algorithm, RANSAC and LMS, to determine which is most suitable for inlier sampling. In the three algorithms the
numbers of points in the subsets are 70, 47 and 20, respectively. Our experimental results (Table 2) show that LMS has the smallest errors among the three methods. Inlier selection, i.e., where the corresponding points are located in the image, is important for the robust estimation of the camera model and the essential matrix.

Table 1. Comparison of real and estimated camera parameters

Case 1 (R: −15° about the y-axis, T: (1, 0.0, 0.0))

Real parameters:
[ 0.9659   0        −0.2588   1      ]
[ 0        1         0        0      ]
[ 0.2588   0         0.9659   0      ]
[ 0        0         0        1      ]

Estimated parameters:
[ 0.9574   0.0019   −0.2887   0.9944 ]
[ 0.0010   0.9999    0.0100   0.0117 ]
[ 0.2887   0.0099    0.9574   0.1053 ]
[ 0        0         0        1      ]

Case 2 (R: −30° about the y-axis, T: (0.9701, 0, 0.2425))

Real parameters:
[ 0.866    0        −0.5      0.9701 ]
[ 0        1         0        0      ]
[ 0.5      0         0.866    0.2425 ]
[ 0        0         0        1      ]

Estimated parameters:
[ 0.8489   0.0081   −0.5285   0.9163 ]
[ 0.0011   0.9999    0.0172   0.0016 ]
[ 0.5286   0.0152    0.8487   0.4004 ]
[ 0        0         0        1      ]
Table 2. Comparison of three algorithms for inlier selection

Method               Pixel error (average)   Angular error (average)
8-point algorithm    0.0237                  5.068 × 10⁻⁹
RANSAC               0.0142                  3.171 × 10⁻⁹
LMS                  0.0106                  1.167 × 10⁻⁹
Fig. 5. Feature points and corresponding epipolar curves
Fig. 6. Generated images by rendering synthetic objects into a real-world scene
In addition, our method is able to identify the 3D positions of the light sources with respect to the camera positions estimated by the omnidirectional calibration. Sampling the bright regions in the image enables the user to detect a light source of the scene [14]. We reconstructed only the illumination environment, including the light positions, and experimented with the integration of synthetic objects (sphere, torus, table, cone) into the real scene. Fig. 6 shows the generated images and the animation results.
5 Conclusions

This paper presented a new self-calibration algorithm for a one-parametric non-linear projection model of an omnidirectional camera. From the estimated projection model, we can compute an essential matrix for unknown camera motions and determine the camera positions. In the simulation results, the LMS method gives the most precise inlier sampling for our model. By using the hemispherical coordinates of both cameras, we identify the 3D positions of the light sources with respect to the camera positions. In addition, photo-realistic scenes can be generated in the reconstructed illumination environment. Further study will include the integration of scene and illumination reconstruction for photo-realistic image synthesis.
Acknowledgements

This work was supported by the Ministry of Education, Korea, under the BK21 project.
References

1. Fournier, A., Gunawan, A.S., Romanzin, C.: Common Illumination between Real and Computer Generated Scenes. Proc. of Graphics Interface (1993) 254-262
2. Debevec, P.: Rendering Synthetic Objects into Real Scenes: Bridging Traditional and Image-based Graphics with Global Illumination and High Dynamic Range Photography. Proc. of Siggraph (1998) 189-198
3. Xiong, Y., Turkowski, K.: Creating Image Based VR Using a Self-calibrating Fisheye Lens. Proc. of Computer Vision and Pattern Recognition (1997) 237-243
4. Nene, S.A., Nayar, S.K.: Stereo with Mirrors. Proc. of Int. Conf. on Computer Vision (1998) 26-35
5. Sato, I., Sato, Y., Ikeuchi, K.: Acquiring a Radiance Distribution to Superimpose Virtual Objects onto a Real Scene. IEEE Trans. on Visualization and Computer Graphics, Vol. 5, No. 1 (1999) 99-136
6. Bunschoten, R., Krose, B.: Robust Scene Reconstruction from an Omnidirectional Vision System. IEEE Trans. on Robotics and Automation, Vol. 19, No. 2 (2003) 23-69
7. Micusik, B., Pajdla, T.: Estimation of Omnidirectional Camera Model from Epipolar Geometry. Proc. of Computer Vision and Pattern Recognition (2003) 485-490
8. Micusik, B., Martinec, D., Pajdla, T.: 3D Metric Reconstruction from Uncalibrated Omnidirectional Images. Proc. of Asian Conf. on Computer Vision (2004) 545-550
9. Micusik, B., Pajdla, T.: Omnidirectional Camera Model and Epipolar Geometry Estimation by RANSAC with Bucketing. IEEE Scandinavian Conf. on Image Analysis (2003)
10. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge Univ. Press (2000)
11. Kumler, J., Bauer, M.: Fisheye Lens Designs and Their Relative Performance. http://www.realviz.com
12. Oliensis, J.: Exact Two-image Structure from Motion. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 12 (2002) 1618-1633
13. Agarwal, S., Ramamoorthi, R., Belongie, S., Jensen, H.: Structured Importance Sampling of Environment Maps. Proc. of Siggraph (2003) 605-612
Expert Knowledge Guided Genetic Algorithm for Beam Angle Optimization Problem in Intensity-Modulated Radiotherapy Planning∗

Yongjie Li and Dezhong Yao

School of Life Science and Technology, University of Electronic Science and Technology of China, 610054 Chengdu, China
{Liyj, Dyao}@uestc.edu.cn
Abstract. In this paper, a useful tool is proposed to find the optimal beam configuration within a clinically acceptable time using a genetic algorithm (GA) guided by expert knowledge. Two types of expert knowledge are employed: (1) beam orientation constraints, and (2) beam configuration templates. The knowledge is used to reduce the search space and guide the optimization process. The beam angles are selected using the GA, and the intensity maps are optimized using the conjugate gradient (CG) method. Comparisons of optimization runs on a clinical prostate case with and without expert knowledge show that the proposed algorithm can improve the computation efficiency.
1 Introduction

Intensity-modulated radiotherapy (IMRT) is a powerful technology that improves the therapeutic ratio by using modulated beams from multiple directions to irradiate the tumor. Conventional IMRT planning starts with the selection of suitable beam angles, followed by an optimization of the beam intensity maps under the guidance of an objective function [1][2]. The set of beams should be chosen such that the plan produces highly three-dimensionally conformal dose distributions to the target, while sparing the organs-at-risk (OARs) and normal tissues as much as possible. Many studies have shown that the selection of suitable beam angles is most valuable for plans with a small number of beams (<5) [1], and is also clinically meaningful for plans with a large number of beams (>9) in some complicated cases, where the tumor volume surrounds a critical organ or is surrounded by multiple critical organs [3][4]. Beam angle selection is important but also challenging for IMRT planning because of the inherent complexity, mainly the large search space and the coupling between the beam configuration and the beam intensity maps [4][5]. In current clinical practice, beam angle selection is generally based upon the experience of the human planners. Several trial-and-error attempts are normally needed in order to find a group of acceptable beam angles, mainly because of the facts
Supported by grants from NSFC of China (30500140 & 60571019).
that beam directions are case-dependent and coupled with the intensity maps of the incident beams, which makes manual selection less straightforward than in conventional conformal radiotherapy (CRT) [3]. To date, extensive efforts have been made by many researchers to facilitate computer-assisted beam angle selection for IMRT planning [3~9]. Though fruitful improvements have been achieved, it still cannot serve as a routine tool in clinical practice because of the intrinsically extensive computation time. Two directions for future studies to further improve the optimization performance are: (1) the optimization algorithms themselves, and (2) external intervention or guidance of the optimization process, such as the use of the expert knowledge accumulated by oncologists and physicists over time. This study concentrates on a technique that improves the optimization efficiency by utilizing expert knowledge to guide the evolution process of a genetic algorithm (GA). The rest of this paper is organized as follows. In Section 2, the framework of the knowledge-guided GA is briefly introduced, followed by detailed descriptions of the beam angle optimization with the GA. The objective function is also described in this section. In Section 3, a clinical prostate case is employed to show the performance of the proposed algorithm. Finally, some conclusions and discussion are given in Section 4.
2 Materials and Methods

In order to decrease the computation burden, the optimization is separated into two iterative steps: beam angle selection using a GA [7], and optimization of the intensity maps of the selected beams using a conjugate gradient (CG) method [2]. Note that the terms plan, chromosome and individual are used interchangeably throughout the paper, unless they have different meanings that are clearly stated.

2.1 Knowledge Guided Genetic Algorithm

Two types of expert knowledge are used: (1) beam orientation constraints, which define the orientation scopes through which no beam may pass, and (2) beam configuration templates, which are the most plausible beam angles for the current tumor. The first type is used to reduce the search space by discarding the defined scopes from the whole 360° space. The remainder of the 360° is divided into discrete angles with a certain increment, such as 10°. The second type of knowledge is used (1) to initialize some of the individuals in the first generation of the GA (the remaining individuals are initialized randomly), and (2) to replace the worst individual in each new generation. The scheme of the expert knowledge guided GA is shown in Fig. 1. No more than a quarter of the individuals in the first generation of the GA are allowed to be initialized with templates, in order to prevent the knowledge from dominating the GA operations at the beginning of the optimization. If plan templates remain after the initialization, they are used to replace the worst individual in each new generation, until no template remains. When the optimization finishes, the optimized results are saved in the database to serve as templates.
Fig. 1. The scheme of expert knowledge guided GA for beam angle optimization
Fig. 2. The coding scheme and the genetic operations for beam angle optimization
2.2 Genetic Algorithm for Beam Angle Optimization

This study adopts a one-dimensional integer-coding scheme, in which a combination of beam angles is represented by a chromosome whose length is the user-specified beam number, and each gene in the chromosome represents a trial beam angle. The genes in one chromosome differ from each other, which means that no two beams in one plan may have the same angle. This study adopts four genetic operations: selection, crossover, mutation, and pseudo immunity (Fig. 2). Parent individuals with higher fitness are selected into the next generation with a higher probability. To any two selected parent individuals, a crossover operation is applied according to a given crossover probability. Then a mutation operation is randomly applied to the two children according to a mutation probability. Finally, a pseudo immunity operation is applied to the two children. For example, in the first individual generated after mutation in Fig. 2, the angles in the first and last genes are 310° and 300°, respectively, two angles with too little separation for an acceptable plan. So it is reasonable to replace one of them with a suitable angle (310° is randomly selected and randomly replaced by 220° in Fig. 2).

2.3 Objective Function and Fitness Value

For each new individual (i.e., a new plan), a CG method is employed to optimize the corresponding beam intensity maps [2][7]. The optimization aims to minimize the dose difference between the prescribed and the calculated dose distributions, which can be described mathematically by the following objective function:
F_{obj}(\vec{x}) = \sum_{j=1}^{N_T} \delta \cdot w_j \cdot \bigl( d_j(\vec{x}) - p_j \bigr)^2    (1)

d_j(\vec{x}) = \sum_{m=1}^{N_{ray}} a_{jm} \cdot x_m    (2)

where \vec{x} = (x_1, x_2, \ldots, x_{N_B}) is the beam set and N_B is the number of beams in a treatment plan. All of the selected beams in \vec{x} are divided into rays; N_{ray} is the total number of rays. F_{obj}(\vec{x}) is the value of the objective function of \vec{x}. N_T is the number of points in the volume; δ = 1 when the calculated point dose in the volume breaks the user-defined constraints, else δ = 0; w_j is the weight of the jth point; d_j and p_j are the calculated and prescribed doses of the jth point in the volume; a_{jm} is the dose deposited at the jth point by the mth ray with unit weight; and x_m is the intensity of the mth ray. The quality of each individual is evaluated by a fitness value, and the purpose is to find the individual (plan) with maximum fitness. The fitness value is calculated by
Fitness(\vec{s}) = F_{max} - F_{obj}(\vec{s}) , \qquad \vec{s} = (s_1, s_2, \ldots, s_{N_{angle}})    (3)
where F_max is a rough estimate of the maximum value of the objective function, which ensures that all fitness values are positive, a requirement of the selection operation. \vec{s} is a group of angles to be selected, and N_angle is the number of beam angles of the plan. Both F_max and F_obj(\vec{s}) are calculated using Eqs. (1)-(2). The whole optimization terminates when no better plan can be found in a specified number of successive GA generations, and the individual with the highest fitness in the last generation is regarded as the optimal set of beam angles [7].
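A condensed sketch of the GA of Section 2.2 follows; it is an illustration under stated assumptions, not the authors' implementation. Chromosomes are lists of distinct beam angles on a 10° grid; fitness(plan) stands for Equation (3), i.e., F_max minus the CG-optimized objective of Equation (1); the minimum separation enforced by the pseudo immunity repair and the elitism of two individuals are hypothetical simplifications.

```python
import random

ANGLES = list(range(0, 360, 10))   # discrete search space, 10-degree increment
MIN_SEP = 20                       # hypothetical minimum angular separation

def random_plan(n_beams):
    return random.sample(ANGLES, n_beams)          # distinct angles

def crossover(p1, p2):
    cut = random.randint(1, len(p1) - 1)           # one-point crossover
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(plan, prob=0.01):
    return [random.choice(ANGLES) if random.random() < prob else g for g in plan]

def angular_gap(a, b):
    d = abs(a - b) % 360
    return min(d, 360 - d)

def pseudo_immunity(plan):
    # Repair duplicates and beams closer than MIN_SEP by random replacement.
    for i in range(len(plan)):
        for j in range(i + 1, len(plan)):
            while angular_gap(plan[i], plan[j]) < MIN_SEP:
                plan[j] = random.choice(ANGLES)
    return plan

def evolve(fitness, n_beams=7, pop_size=20, generations=50, pc=0.9):
    pop = [random_plan(n_beams) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        nxt = pop[:2]                               # keep the two best (elitism)
        while len(nxt) < pop_size:
            p1, p2 = random.sample(pop[:pop_size // 2], 2)
            c1, c2 = crossover(p1, p2) if random.random() < pc else (p1[:], p2[:])
            nxt += [pseudo_immunity(mutate(c1)), pseudo_immunity(mutate(c2))]
        pop = nxt[:pop_size]
    return max(pop, key=fitness)
```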
3 Results

A clinical prostate tumor case, shown in Fig. 3, was optimized using the proposed method. There are four OARs to be considered: the rectum, the bladder, and the left and right femur heads. The sizes and relative positions of the volumes change substantially from slice to slice. Seven 6 MV coplanar photon beams are used to irradiate the tumor. The selection of the GA parameters is important for the optimization performance. Though some theoretical studies have addressed parameter selection [10], in engineering applications the parameters are mostly selected empirically [11]. For the seven-beam plan of this case, the population size was experimentally set to 20. The crossover and mutation probabilities were set to 0.9 and 0.01, respectively. The results of the optimization running with and without expert knowledge were compared. For our new method, two beam orientation constraints were defined ((a) and (b) in Fig. 3), and a plan configuration candidate with beam angles of 0°, 50°, 100°, 150°, 210°, 260° and 310° was defined as expert knowledge, shown as the solid white straight lines in Fig. 3. This plan candidate has become an informal standard for prostate cases in clinical IMRT practice. The algorithm was run independently 20 times with and without the knowledge. All of the runs found the same optimal angles: 10°, 60°, 110°, 155°, 200°, 250° and 300°,
shown as the dotted white straight lines in Fig. 3. Table 1 lists the statistical results. In summary, about 46 minutes were taken when no knowledge was used, but the computation time was reduced to 32 minutes when all the knowledge was incorporated into the optimization; about 30% of the optimization time was saved. The influence of the quality and quantity of the knowledge on the performance was also studied. First, the influence was evaluated when some bad knowledge was provided. We supplied some unreasonable plans as templates, such as (10°, 20°, 30°, 40°, 50°, 60° and 70°). We found that bad knowledge has little influence on the optimization performance; even when all the individuals in the first generation were initialized with quite bad templates, the computation time increased only very slightly. Second, the influence of good knowledge was also studied. Good templates were produced by randomly sampling around the known optimal plan given by the above runs. Table 2 shows the computation time for different numbers of good templates. From the table we find that the computation time decreases meaningfully as the amount of good knowledge increases.
Fig. 3. The prostate case and the dose distribution of the optimized plan. The arcs (a) and (b) are the two orientation constraints. The solid white straight lines are the angles of a plan template (i.e., expert knowledge). The dotted white straight lines are the optimized beam angles.

Table 1. Comparison of the optimization runs with and without expert knowledge
With expert knowledge   Run times   Maximum computation time   Minimum computation time   Mean computation time
No                      20          52 min 42 s                40 min 37 s                45 min 43 s
Yes                     20          38 min 19 s                29 min 05 s                32 min 26 s
Table 2. The computation time for different numbers of good templates

Template number   5             10            15            20
Comput. time      28 min 12 s   26 min 45 s   25 min 23 s   23 min 37 s
4 Discussion and Conclusions

In this paper, a new technique was developed for beam angle optimization in IMRT planning. The beam angles are selected by a GA guided by expert knowledge. The
results on a clinical prostate tumor case show that the optimization efficiency is improved by incorporating expert knowledge into the optimization process. The optimization of beam angles is a difficult process because of the extensive computation. By making full use of the plentiful expert knowledge accumulated by oncologists over time, the presented algorithm should be more feasible for routine IMRT planning. The optimization efficiency is improved slightly or substantially, and the optimized angles are better than, or at least not worse than, those obtained without knowledge. The degree of improvement depends on the quantity and quality of the prior knowledge provided by the planner. The principle of the GA is quite simple; however, it is not easy for a GA to efficiently solve specific engineering optimization problems. It is now a trend to explore novel schemes for incorporating expert knowledge into the optimization process. This study has provided a pilot framework for the combination of knowledge with a GA. We are currently working on building an easily accessed knowledge database and on more effective schemes for guiding the GA with knowledge.
References

1. Webb, S.: Intensity-modulated Radiation Therapy. Institute of Physics Publishing, Bristol and Philadelphia (2000)
2. Spirou, S.V., Chui, C.S.: A Gradient Inverse Planning Algorithm with Dose-volume Constraints. Med. Phys. 25 (1998) 321-333
3. Pugachev, A., Boyer, A.L., Xing, L.: Beam Orientation Optimization in Intensity-modulated Radiation Treatment Planning. Med. Phys. 27 (2000) 1238-1245
4. Hou, Q., Wang, J., Chen, Y., Galvin, J.M.: Beam Orientation Optimization for IMRT by a Hybrid Method of the Genetic Algorithm and the Simulated Dynamics. Med. Phys. 30 (2003) 2360-2376
5. Gaede, S., Wong, E., Rasmussen, H.: An Algorithm for Systematic Selection of Beam Directions for IMRT. Med. Phys. 31 (2004) 376-388
6. Djajaputra, D., Wu, Q., Wu, Y., Mohan, R.: Algorithm and Performance of a Clinical IMRT Beam-angle Optimization System. Phys. Med. Biol. 48 (2003) 3191-3212
7. Li, Y., Yao, J., Yao, D.: Automatic Beam Angle Selection in IMRT Planning Using Genetic Algorithm. Phys. Med. Biol. 49 (2004) 1915-1932
8. Souza, W.D., Meyer, R.R., Shi, L.: Selection of Beam Orientations in Intensity-modulated Radiation Therapy Using Single-beam Indices and Integer Programming. Phys. Med. Biol. 49 (2004) 3465-3481
9. Wang, X., Zhang, X., Dong, L., Liu, H., Wu, Q., Mohan, R.: Development of Methods for Beam Angle Optimization for IMRT Using an Accelerated Exhaustive Search Strategy. Int. J. Radiat. Oncol. Biol. Phys. 60 (2004) 1325-1337
10. Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, Massachusetts (1989)
11. Yu, Y., Schell, M.C.: A Genetic Algorithm for the Optimization of Prostate Implants. Med. Phys. 23 (1996) 2085-2091
Extracting Structural Damage Features: Comparison Between PCA and ICA∗

Luo Zhong¹, Huazhu Song¹, and Bo Han²

¹ School of Computer Science and Technology, Wuhan University of Technology, Wuhan, Hubei 430070, China
[email protected]
² Center for Information Science and Technology, Temple University, Philadelphia, PA 19122, U.S.A
Abstract. How to effectively extract structural features from structural damage signals has always been a hot problem in the structural engineering domain. In this paper, principal component analysis (PCA) and independent component analysis (ICA) are discussed in detail for selecting features from measured time-series data. Considering that structural engineering data have unknown covariance and different scales, a PCA based on standardized samples is used. In order to speed up the calculation of the components, a second-order-statistics spatio-temporal decorrelation algorithm is applied. The components from PCA and from different ICA algorithms are then tested on the benchmark dataset of the IASC-ASCE SHM group at the University of British Columbia. The results show that both PCA and ICA can effectively reduce the influence of noise; different cumulative contribution rates in PCA play different roles, and 99% is preferred. For the two-damage level, both PCA and ICA are good; but for the multi-damage level, ICA is better than PCA with a 99% cumulative contribution rate. Therefore, ICA extracts structural features more accurately than PCA.
1 Introduction

The identification of damage from vibration data is still a topic of ongoing intensive scientific research. A comprehensive literature review was made by Doebling, and some successful methodologies were shown in his report [1,2]. Feature extraction is the process of identifying damage-sensitive properties, derived from the measured dynamic response, which allow one to distinguish between the undamaged and the damaged structure. Therefore, how to extract features from the dynamic-response time-series data is the most important step in detecting civil structural damage. In addition, the data measured by sensors are sensitive to structural environmental factors such as noise, temperature, etc., and are therefore difficult to recognize and classify. Some researchers have paid considerable attention to the processing of signals measured from sensors. Yuen proposed a two-stage structural health monitoring
This Project is supported by the Chinese Ministry of Education for Advanced University Action Plan (2004XD-03).
approach for the phase I benchmark studies [3]. Structural experts wish to find new robust approaches to structural feature selection. Recently, principal component analysis (PCA) and independent component analysis (ICA) have become very popular for feature selection in data mining. PCA involves a mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called principal components. It can discover or reduce the dimensionality of the data set and identify new meaningful underlying variables, which removes the influence of noise. ICA, mostly used for feature reduction from time-series data, is a statistical and computational technique for revealing hidden factors that underlie sets of random variables, measurements or signals. Zang [4] and Song [5] applied ICA to model damaged structures. Their results showed that ICA is a more robust method for feature selection and leads to more accurate classification. However, they did not compare the two methods in detail. In this paper, the time-series signals measured by sensors are processed by ICA and PCA. For convenience in comparing PCA and ICA, the framework combining PCA/ICA with a classifier, either a support vector machine (SVM) or an artificial neural network (ANN), is first shown in brief; next, PCA and SOS-ICA are discussed in Section 2. In Section 3 the experiments, based on the benchmark data from the University of British Columbia, are presented. Finally, the conclusion is given in Section 4.
2 Methodology

2.1 Frame for Structural Damage Identification

The frame integrating PCA/ICA and a classifier is shown in Fig. 1. The original time-domain data X1, X2, ..., Xn and noise measured by the sensors are first used as the input to the PCA/ICA model, which yields the component matrix z; z serves as the input attributes of the classifier (SVM/ANN model), and the output of the classifier is the status of the structure. This is a non-physics-based model; therefore, it can be treated as a general model for structural damage identification, in which ICA/PCA performs the feature selection and the classifier performs the classification. An ANN is used in this paper.
Fig. 1. Frame for structural damage identification
2.2 Feature Selection by PCA and ICA

PCA finds orthogonal directions of greatest variance in the data. It extracts the principal components according to a variance-maximizing rotation of the original structural features; the algorithm is shown in [6].
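A minimal sketch of the standardized PCA step is given below, assuming the measured signals are arranged as an (n_samples × n_sensors) matrix; components are retained until the cumulative contribution rate reaches the chosen threshold (95% or 99% in the experiments of Section 3).

```python
import numpy as np

def pca_features(X, cum_rate=0.99):
    # Standardize each sensor channel: appropriate for channels with
    # unknown covariance and different scales.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(Xs, rowvar=False))
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    # Keep components until the cumulative contribution rate is reached.
    k = int(np.searchsorted(np.cumsum(eigval) / eigval.sum(), cum_rate) + 1)
    return Xs @ eigvec[:, :k]            # principal-component scores
```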
ICA techniques provide statistical signal processing tools for optimal linear transformations of multivariate data. Considering the volume of the structural damage data, a fast method to calculate the components is needed. AMUSE, proposed by François Vialatte and Andrzej Cichocki [7], arranges components not only in order of decreasing variance, but also in order of their decreasing linear predictability. It belongs to the group of second-order-statistics spatio-temporal decorrelation algorithms (SOS-ICA), and uses the simple principles that the estimated components should be spatio-temporally decorrelated and less complex than any mixture of the sources. The components are ordered according to decreasing singular values of a time-delayed covariance matrix. Therefore, it applies two consecutive PCAs: the first PCA is applied to the input data; the second PCA is applied to the time-delayed covariance matrix of the output of the previous stage. Unlike in many ICA algorithms, all components estimated by SOS-ICA are uniquely defined and consistently ranked. Therefore, SOS-ICA is faster than some other ICA algorithms.
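The two-stage structure of AMUSE can be summarized in a short sketch, again assuming an (n_samples × n_sensors) input matrix: a first PCA whitens the data, and an eigendecomposition of the symmetrized time-delayed covariance matrix of the whitened data plays the role of the second PCA, producing uniquely ranked components.

```python
import numpy as np

def amuse(X, tau=1):
    # X: (n_samples, n_sensors); returns estimated sources, one per column.
    Xc = X - X.mean(axis=0)
    # Stage 1: whitening via PCA of the covariance matrix.
    d, E = np.linalg.eigh(np.cov(Xc, rowvar=False))
    W = E @ np.diag(1.0 / np.sqrt(d)) @ E.T
    Z = Xc @ W
    # Stage 2: symmetrized time-delayed covariance, then eigendecomposition.
    R = (Z[:-tau].T @ Z[tau:]) / (len(Z) - tau)
    R = 0.5 * (R + R.T)
    s, V = np.linalg.eigh(R)
    order = np.argsort(np.abs(s))[::-1]   # rank by decreasing |eigenvalue|
    return Z @ V[:, order]
```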
3 Experiments

In this section, we use PCA and ICA to extract features from structural data, apply both undamaged and damaged data as training data to construct an ANN classifier (three layers: one input layer, a hidden layer with 10 nodes, and an output layer with one sigmoid node), and then apply the ANN model to unseen test data to check whether it is correctly recognized. In addition, different cumulative contribution rates are discussed for the practical structural damage domain. To compare different ICA algorithms, both the SOS-ICA and FastICA algorithms are applied in the experiments. The non-quadratic function g(y) = tanh(a1 × y) is used to compute non-Gaussianity in FastICA [9]. Two-damage-level and multi-damage-level cases are used to test PCA and ICA.

3.1 Data Sets

A popular benchmark, developed by the IASC-ASCE SHM Task Group at the University of British Columbia, is used to verify the classification accuracies. The structure is a 4-story, 2-bay by 2-bay steel-frame scale-model structure in the Earthquake Engineering Research Laboratory [8]. In our experiments, we mainly use seven data sets in the ambient data from this benchmark, where C01 is an undamaged dataset and C02∼C07 are datasets of different damage types. Detailed information is given in [9]. There are 15 attributes in each dataset, corresponding to the signals from the 15 sensors located on the steel frame; the benchmark also provides an additional noise attribute. We randomly extract 6000 samples from the original config01∼config07 data to form C01∼C07 respectively.

3.2 Identifying Two-Damage Level Experiment by PCA and SOS-ICA with ANN

The two-damage-level case is often met in civil structural damage. C01 is treated as undamaged data whose output is ‘1’; and so on, up to C07, damaged data whose output is ‘7’. Some of
C01 and one of C02 to C07 are used as training data, and the corresponding rest of the data are test data; the damage prediction values are shown in Table 1. PCA with a 95% or 99% cumulative contribution rate and SOS-ICA can all identify the two damage levels well, so any of these methods can be used.

Table 1. Prediction value for two-damage level by PCA and SOS-ICA

Data pair   1st class value   PCA (95%)   PCA (99%)   SOS-ICA   2nd class value   PCA (95%)   PCA (99%)   SOS-ICA
C01/C02     1                 1.0000      1.0001      0.9999    2                 2.0002      1.9869      1.9912
C01/C03     1                 1.0000      1.0010      0.9999    3                 2.9993      2.9758      2.9737
C01/C04     1                 1.0002      1.0385      0.9999    4                 3.9445      3.9998      3.9308
C01/C05     1                 1.5242      1.0019      0.9999    5                 4.3959      4.9867      4.9665
C01/C06     1                 1.1758      1.0008      1.0001    6                 5.6817      5.9733      5.9542
C01/C07     1                 1.0135      0.9995      1.0003    7                 7.0003      6.8456      6.9339
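The classifier producing these values is the three-layer network described at the start of Section 3; a minimal stand-in with scikit-learn could look like the sketch below. Note that MLPRegressor uses a linear output unit rather than the sigmoid node described in the paper, and the hyperparameters and placeholder data shapes are assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_damage_ann(z_train, y_train):
    """z_train: PCA/ICA features; y_train: damage labels (1.0 ... 7.0)."""
    ann = MLPRegressor(hidden_layer_sizes=(10,), activation='logistic',
                       max_iter=2000, random_state=0)
    return ann.fit(z_train, y_train)

# usage with placeholder data of plausible shape:
z = np.random.randn(200, 6)
y = np.repeat([1.0, 2.0], 100)          # e.g. C01 vs. C02
model = train_damage_ann(z, y)
print(model.predict(z[:3]))             # values near the class labels
```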
3.3 Identifying Multi-damage Level Experiment by PCA and ICA with ANN

To further test the identification methods, PCA and FastICA are used to select features, and the results serve as input to the ANN.

3.3.1 Components from PCA and ICA
Applying PCA with different cumulative contribution rates and the FastICA algorithm, the original signal of C01 is shown in Fig. 2, and the corresponding components of C01 are shown in Fig. 3∼Fig. 5. We then calculate the correlation coefficient between the noise and the data, and count the number of attributes correlated with noise, first for the source data and then for the components extracted by PCA and ICA. The result is shown in Table 2, from which we observe that PCA and ICA can effectively reduce the noise in structural signals.

3.3.2 Identifying Multi-damage
For the multi-damage-level experiment, C01 is treated as undamaged data whose output is ‘1’; and so on, up to C07, damaged data whose output is ‘7’. Some of C01∼C07 are used as
Fig. 2. Original C01 signal (time series of 6000 samples)
Fig. 3. PCA (95%) components of C01
Fig. 4. PCA (99%) components of C01
Fig. 5. ICA components of C01
Table 2. Number of attributes correlated with noise

Source    Noise-corr.   ICA              PCA (95%)        PCA (99%)
data      attributes    ICs#   noise #   PCs#   noise #   PCs#   noise #
C01       10            9      3         3      0         6      2
C02       7             2      0         7      0         11     1
C03       7             2      0         9      0         11     1
C04       7             3      0         7      0         10     2
C05       8             10     4         2      1         5      2
C06       9             6      2         3      0         6      0
C07       10            4      1         10     1         13     2
Fig. 6. Prediction of the 7 damage levels by PCA (95%) (damage level vs. number of samples; C02(x)∼C06(x) marked as mispredicted)
Fig. 7. Prediction of the 7 damage levels by PCA (99%)
Fig. 8. Prediction of the 7 damage levels with ICA components
training data, and the rest of C01∼C07 are test data; the damage prediction values are shown in Fig. 6∼Fig. 8. When using 95% as the PCA cumulative contribution rate, the predictions for C03∼C06 are wrong and half of C02 is mispredicted, so PCA with a 95% cumulative contribution rate cannot meet the requirements. When changing to 99%, the seven damage situations of C01∼C07 can all be forecast accurately; compared with C05∼C06, the forecast accuracy for C01∼C04 and C07 is much higher. The forecast results of ICA are excellent: it distinguishes the seven different damage cases accurately and has
a high accuracy. Therefore, the experiments indicate that both PCA (99%) and ICA can distinguish different damage cases, and ICA surpasses PCA in extracting structural features.
4 Conclusions

In this article, the statistical methods PCA and ICA are applied to extract structural features from the original measured data, which have complex relations among them. Different ICA algorithms and different cumulative contribution rates are discussed. Our experiments with the benchmark data from the IASC-ASCE SHM Task Group verify our solution: PCA and different ICA algorithms (SOS-ICA and FastICA) can reduce the effect of noise. For the two-damage-level case, PCA with a 95% or 99% cumulative contribution rate and ICA all give accurate prediction results; for the multi-damage-level case, PCA with a 99% cumulative contribution rate can predict the damage level accurately, but ICA is a better way than PCA. Therefore, PCA is inferior to ICA in damage feature extraction.
References
1. Doebling, S.W., Farrar, C.R., Prime, M.B., Shevitz, D.W.: Damage Identification and Health Monitoring of Structural and Mechanical Systems from Changes in Their Vibration Characteristics: A Literature Review. Los Alamos National Laboratory. The Shock and Vibration Digest 30 (1998) 91-105
2. Fritzen, C.P., Jennewein, D., Kiefer, T.: Damage Detection Based on Model Updating Methods. Mechanical Systems and Signal Processing 12 (1998) 163-186
3. Yuen, K.V.: Two-stage Structural Health Monitoring Approach for Phase I Benchmark Studies. Journal of Engineering Mechanics, ASCE, 130 (2004) 16-33
4. Zang, C.: Structural Damage Detection Using Independent Component Analysis. Structural Health Monitoring, Int. J. 3 (2004) 69-84
5. Song, H.: Structural Damage Detection by Integrating Independent Component Analysis and Artificial Neural Networks. MLMTA'05, CSREA Press (2005) 190-196
6. Zhong, L.: Structural Damage Identification Based on PCA and ICA. Journal of Wuhan University of Technology, accepted, to appear in No. 7 (2006)
7. Vialatte, F., Cichocki, A.: Early Detection of Alzheimer's Disease by Blind Source Separation, Time Frequency Representation, and Bump Modeling of EEG Signals. ICANN 2005, Springer-Verlag LNCS 3696 (2005) 683-692
8. http://wusceel.cive.wustl.edu/asce.shm/benchmarks.htm
9. Song, H.: Structural Damage Detection by Integrating Independent Component Analysis and Support Vector Machine. ADMA 2005, Springer LNAI 3584 (2005) 670-677
Face Alignment Using an Improved Active Shape Model Zhenhai Ji1, 2, Wenming Zheng1, Ning Sun1, 2, Cairong Zou2, and Li Zhao2 1
Research Center of Learning Science, Southeast University, Nanjing, Jiangsu, 210096, P.R.China 2 Department of Radio Engineering, Southeast University, Nanjing, Jiangsu, 210096, P.R.China {jizhenhai, sunning}@seu.edu.cn
Abstract. The Active Shape Model (ASM) is composed of a global shape model and a local texture model, and it is a powerful tool for face alignment. However, its performance is often influenced by factors such as the initial location and illumination, which frequently lead to local minima in optimization. By fully using the local information of each landmark, we propose an improved ASM, in which we extend the traditional local texture model to three sub-models by adding two further local models; combining the three sub-models, we construct a more robust ASM. Experiments show that the improved method solves the local minima problem efficiently and demonstrates better accuracy and a wider capture region when searching the target face image.
1 Introduction

Face alignment is a vital step in the fields of face recognition and facial expression recognition. Beinglass et al. [1] described a scheme for locating such objects using a Generalized Hough Transform with the point of articulation as the reference point for each subpart. Kass et al. [2] introduced Active Contour Models, an energy minimization approach to shape alignment. Wiskott et al. [3] developed the Elastic Bunch Graph to locate facial features using Gabor wavelets. Nastar et al. [4] used a finite element approach to model the vibration modes of a shape. In particular, Cootes et al. [5], [6] proposed the Active Shape Model and the Active Appearance Model (AAM) respectively, which have become the representative statistical modeling methods thanks to their good performance and generalized application in many fields. ASM includes a global shape model and a local texture model, which both derive from the point distribution model. However, its performance often suffers from factors such as the initial condition, illumination, facial expression, etc., and all these factors frequently lead to local minima in optimization. To solve the local minima problem, many improved methods have been proposed in recent years. Zhao et al. [7] proposed a new shape evaluation for ASM and used it to drag the search out of local minima; Jiao et al. [8] proposed an improved ASM, called W-ASM, to solve the local minima problem. Different from the above two
methods, and from another consideration of the essence of ASM, we fully explore the local information of each landmark and extend the original local texture model into a more robust model that includes three sub-models: the original local texture model and two additional models. The two added sub-models comprise more information than the original single model, so the combination of three sub-models can capture more sufficient local information. The rest of the paper is arranged as follows. The traditional ASM algorithm is described in Section 2. In Section 3, we present the improved method. Experimental results are presented in Section 4, and the last section gives the conclusions.
2 Algorithm of Traditional ASM

Fig. 1. Annotated image with landmarks
Fig. 2. Sampled process sketch of the three sub-models (p0 on the face contour, p1 inside and p2 outside the face; normal and tangent directions shown at p0)

The traditional ASM method is constructed on the theory of the Point Distribution Model (PDM), which is derived from the landmarks around the shape; Fig. 1 shows a face example, hand-annotated with 58 landmarks. Suppose we have a training set of face images, and all the landmark coordinates of every face image are
connected sequentially into a vector. We can thus obtain a vector set corresponding to the training set. After aligning these vectors using Generalized Procrustes Analysis and applying Principal Component Analysis (PCA), we can generate the ASM global shape model, and each face image can be approximately represented by a few main mode parameters. The formula of the global shape model can be written as:
s ≈ s̄ + P b .  (1)
where s̄ is the mean shape, b is the shape parameter vector, and P is the transformation matrix composed of the principal component vectors. The other part of ASM is the local texture model, which describes the local feature character of each landmark; it is constructed using the first derivative of the intensity vectors sampled perpendicular to the contour at the landmark. When searching facial feature points in a target face image, we calculate the Mahalanobis distance using formula (2), and the point with the minimum value is the optimal candidate point. Here, l̄_p and Σ_p represent the mean texture vector and
covariance matrix at landmark p respectively; q is a nearby point on the line perpendicular to the contour at landmark p, l_q is the normalized sampled local texture vector at point q, and q_opt is the optimal candidate point of landmark p:

q_opt = argmin_q [ (l_q − l̄_p)^T Σ_p^{-1} (l_q − l̄_p) ] .  (2)
3 Improved Active Shape Model

From the whole construction process of ASM, we recognize that the local texture model plays a key role in searching for the candidate point of each landmark. Based on this consideration, we improve the local texture model and extend it into three sub-models. The following describes the construction process of the three sub-models in detail. The first sub-model is the original local texture model, renamed the middle local texture model for convenience, to distinguish it from the other two. The other two sub-models, named the internal local texture model and the external local texture model respectively, are constructed in a similar way to the middle local texture model; the difference between them is the location of sampling: the internal local texture model samples mainly "in" the face image, the external local texture model samples mainly "out" of the face image, and the middle local texture model samples mainly on the contour of the face. Fig. 2 sketches this sampling process: suppose p0 is a point on the edge of the face contour, p1 is the point "in" the face image corresponding to p0, and p2 is the point "out" of the face image corresponding to p0; λ1 and λ2 are the distances from p0 to p1 and p2 respectively. Taking p0, p1 and p2 as center points respectively, we sample an intensity profile of some length along the two sides of each center point, and thus obtain the corresponding sampled vectors for each landmark. After normalizing the sampled vectors as described in Section 2, we obtain three mean vectors and three covariance matrices for a given landmark p, denoted l̄_p^m, l̄_p^i, l̄_p^e, Σ_p^m, Σ_p^i, Σ_p^e respectively, where l̄ and Σ represent a mean vector and a covariance matrix, and the superscripts m, i, e denote the three kinds of local texture model. After this extension of the local texture model, the evaluation function corresponding to formula (2) generalizes to:
q_opt = argmin_q [ α (l_q^i − l̄_p^i)^T (Σ_p^i)^{-1} (l_q^i − l̄_p^i)
                 + β (l_q^m − l̄_p^m)^T (Σ_p^m)^{-1} (l_q^m − l̄_p^m)
                 + γ (l_q^e − l̄_p^e)^T (Σ_p^e)^{-1} (l_q^e − l̄_p^e) ] .  (3)
where l_q^i, l_q^m, l_q^e denote the corresponding three local texture vectors at point q, and α, β, γ are nonnegative weights whose sum equals 1. Comparing formulas (2) and (3), we can see that formula (2) is a special case of this extension of the original ASM, obtained when the weighting coefficients satisfy α = γ = 0 and β = 1. On the other hand, because it covers a wider region when searching for the target point, the evaluation function of formula (3) may yield a better optimal candidate point q_opt. The construction of the improved local texture model involves the selection of five coefficients, λ1, λ2, α, β, γ. In this paper, compared with the internal and external local texture models, the middle local texture model should play a more important role because of its central location, while the internal and external models play similar roles, so α, β, γ are set to 0.25, 0.5 and 0.25 respectively. λ1 and λ2 are distances between two points whose values should be neither too small nor too large; as a tradeoff, both λ1 and λ2 are set to 4 pixels in this paper, chosen experimentally.
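As an illustration only (the data structures are assumed, not taken from the paper), the weighted evaluation of formula (3) over a set of candidate points q can be sketched as:

```python
import numpy as np

def best_candidate(profiles, means, inv_covs, weights=(0.25, 0.5, 0.25)):
    """profiles[key]: (n_candidates, L) normalized texture profiles
    sampled at every candidate q, for key in ('i', 'm', 'e');
    means/inv_covs: per-sub-model mean vectors and inverse covariances
    learned from the training set. Returns the index of q_opt."""
    cost = np.zeros(profiles['m'].shape[0])
    for key, w in zip(('i', 'm', 'e'), weights):   # alpha, beta, gamma
        diff = profiles[key] - means[key]
        # weighted Mahalanobis distance to sub-model `key`, formula (3)
        cost += w * np.einsum('nl,lk,nk->n', diff, inv_covs[key], diff)
    return int(np.argmin(cost))
```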
4 Experimental Results

Our database contains 160 different frontal face images [9] of 40 different subjects, each with 4 face images of size 120×160. Each image is manually annotated with 58 landmarks as in Fig. 1. We select 80 images as the training set, 2 images chosen randomly from each of the 40 subjects, and the remaining 80 images form the test set. In addition, we randomly select 40 images from the Yale face database B [10] as a supplement to the test set.
Fig. 3. Solving the problem of local minima: (a) traditional ASM; (b) improved ASM
4.1 Solving Problem of Local Minima in Optimization

ASM may plunge into local minima during optimization when searching for feature points in an unknown target face image. By imposing the internal and external local texture models on the point, we can drag such a point out of the local minimum; Fig. 3 shows the result.
4.2 Comparisons on Capture Range
For each test image, we begin searching from the identical mean model at different displacements of up to ±10 pixels in the x coordinate, and then perform searches to attempt to make the displaced mean model converge to the original optimal position. After convergence, we calculate the point-to-point error from the converged points to the labeled points. Figure 4 shows the point-to-point errors vs. different displacements. From the two curves we can see that our improved method has a wider capture range than the traditional ASM.
Fig. 4. Point-to-Point errors with different initial displacements
4.3 Comparison on Point Location Accuracy

We displace the landmarks of each test image from the accurate position by different displacements in x, and then run the searches to make the displaced points return to the original accurate positions. After convergence, we compare the estimated points with the original labeled points. Figure 5 shows the experimental result. The x coordinate is the distance from the found points to the labeled points; the y coordinate is the percentage of the number of points. We can see that the improved method is more accurate than the traditional ASM.
Fig. 5. Percentage of Point Number vs. Distance from Found Points to Labeled Points in x
5 Conclusions

The conventional ASM is composed of a global shape model and a local texture model. However, its performance is often influenced by factors that frequently
lead to local minima in optimization. By fully using the local information, we construct a more robust model. Different from the original ASM, this more robust model includes three sub-models, which are combined to search for the new candidate point. Experiments show that the improved method solves the local minima problem efficiently and demonstrates better accuracy and a wider capture region when searching the target face image.
Acknowledgment

This work was partly supported by the National Natural Science Foundation of China under grant 60503023, and partly by the Natural Science Foundation of Jiangsu Province under grant BK2005407.
References
1. Beinglass, A., Wolfson, H.J.: Articulated Object Recognition, or: How to Generalize the Generalized Hough Transform. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (1991) 461-466
2. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active Contour Models. 1st International Conference on Computer Vision, London (1987) 259-268
3. Wiskott, L., Fellous, J.M., Krüger, N., von der Malsburg, C.: Face Recognition by Elastic Graph Matching. In: Intelligent Biometric Techniques in Fingerprint and Face Recognition, Eds. Jain, L.C. et al. (1999) 355-396
4. Nastar, C., Ayache, N.: Fast Segmentation, Tracking and Analysis of Deformable Objects. International Conference on Computer Vision, IEEE Comput. Soc. Press (1993) 275-279
5. Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active Shape Models - Their Training and Application. Computer Vision and Image Understanding 61 (1995) 38-59
6. Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active Appearance Models. European Conf. on Computer Vision, Springer, 2 (1998) 484-498
7. Zhao, M., Li, S.Z., Chen, C., Bu, J.J.: Shape Evaluation for Weighted Active Shape Models. The Asian Conference on Computer Vision, Korea, 2 (2004) 1074-1079
8. Jiao, F., Li, S.Z., Shum, H.Y., Schuurmans, D.: Face Alignment Using Statistical Models and Wavelet Features. Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 1 (2003) 321-327
9. Stegmann, M.B., Ersbøll, B.K., Larsen, R.: FAME - A Flexible Appearance Modeling Environment. IEEE Trans. on Medical Imaging 22 (2003) 1319-1331
10. Georghiades, A.S., Belhumeur, P.N.: From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose. IEEE Trans. PAMI 23 (2001) 643-660
Face Detection with an Adaptive Skin Color Segmentation and Eye Features Hang-Bong Kang Dept. of Computer Eng. Catholic University of Korea, #43-1 Yokkok 2-dong Wonmi-Gu, Puchon City Kyonggi-Do, Korea [email protected]
Abstract. In this paper, we propose a new method of face detection using an adaptive skin color model and eye features. First, we detect skin color segments adaptively using a two-dimensional Gaussian model in the CrCb skin color space. On these skin color segments, shape analysis is performed to reduce false alarms. Then, eye feature points for the face are extracted. The possible eye feature points are compared with normalized eye features obtained from the training data for verification; at this step, we use a modified Hausdorff distance. Experimental results demonstrate our face detection approach on slanted face images and under different lighting conditions.
1 Introduction

Detecting human faces is the first step in many multimedia applications such as automatic face recognition, face tracking and surveillance systems. Several different cues such as skin color, motion, facial shape, and facial appearance can be used in face detection [1]. In particular, color has been suggested to be a powerful fundamental cue for face detection [2,3]. In recent years, various statistical color models have been used to discriminate skin pixels from non-skin pixels for face detection: a single Gaussian model [2,3], color histograms [4], and a Gaussian mixture density model [5] have been suggested. Wang and Chang [2] used color thresholding for face detection by means of a suitable CrCb skin color distribution. Tsapatsoulis et al. [3] proposed a model that combines a two-dimensional Gaussian color model and shape features with template matching. Jones and Rehg [4] presented a comprehensive analysis of skin and non-skin color models and showed that histogram models were slightly superior to Gaussian mixture models in terms of skin color classification. In this approach, Bayesian detectors based on a skin color histogram produced higher face detection results, but their adaptation involves increased computational cost. Even though a color-based face detection system may work fast, there are still some limitations because the color of a face varies with changes in illuminant color, viewing geometry and miscellaneous sensor parameters. So, it is desirable to develop an adaptive algorithm to handle various situations. In this paper, we propose a new face detection approach that combines an adaptive two-dimensional Gaussian color model
and eye features with template matching. Section 2 describes the adaptive skin color segmentation method. Section 3 explains face candidate extraction and Section 4 discusses the template matching. Section 5 presents our experimental results.
2 Adaptive Skin Color Segmentation

Generally, the skin color subspace in the YCrCb color space covers a small area of the CrCb chrominance plane. However, it is very difficult to construct a skin color model that detects faces efficiently in all images. One possible solution is to use an adaptive skin color model to accommodate apparent changes in color due to varying lighting conditions. First, we model the CrCb skin-tone color distribution as a two-dimensional Gaussian distribution:
P(x | μ_0, Σ) = ( 1 / ( (2π)^{d/2} |Σ|^{1/2} ) ) exp( −(1/2) (x − μ_0)^T Σ^{-1} (x − μ_0) ) .  (1)
where d = 2, μ_0 is the mean vector and Σ is the covariance matrix. These parameters are initially estimated from training data consisting of face data of different races obtained from the Web. Secondly, we detect skin segments by computing the likelihood of each pixel using the threshold value T = μ_0 + σ. From the pixels classified as skin, we extract connected regions. After that, we compute the average likelihood of the largest connected region R as μ_c = (1/N) Σ_{i∈R} p(x(i)), where N is the number of pixels in region R. Finally, we update the mean of the Gaussian model as
μ_0 = (1 − γ) μ_0 + γ μ_c  (2)
where γ is the learning rate. The learning rate should be chosen depending on the lighting variations; in our experiments, γ = 0.7 has been shown to provide good results. If no skin regions are detected, the update process of Eq. (2) is not executed. Using the adaptive color model, we can handle faces with new color characteristics or illumination changes.
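A minimal sketch of this adaptive model follows (illustrative names only; the paper defines μ_c via the region's likelihoods, which this sketch reads as the region's mean chrominance so that the update of Eq. (2) stays a 2-D vector operation):

```python
import numpy as np

class AdaptiveSkinModel:
    """2-D Gaussian over (Cr, Cb) with a running mean update, Eqs. (1)-(2)."""
    def __init__(self, mu0, cov, gamma=0.7):
        self.mu = np.asarray(mu0, dtype=float)
        self.inv = np.linalg.inv(cov)
        self.norm = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(cov)))
        self.gamma = gamma                        # learning rate

    def likelihood(self, crcb):                   # crcb: (n_pixels, 2)
        d = crcb - self.mu
        q = np.einsum('ni,ij,nj->n', d, self.inv, d)
        return self.norm * np.exp(-0.5 * q)       # Eq. (1) with d = 2

    def update(self, region_crcb):
        """region_crcb: pixels of the largest detected skin region
        (assumed reading of mu_c as mean chrominance)."""
        mu_c = region_crcb.mean(axis=0)
        self.mu = (1 - self.gamma) * self.mu + self.gamma * mu_c  # Eq. (2)
```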
3 Face Candidate Extraction

From the list of skin segments, we execute morphological operations such as opening and closing with the structuring element "+". After that, it is necessary to reduce false alarms; we therefore compute shape features, because extracted segments are sometimes unrelated to human faces.
Since the shape of a face is elliptical, it is desirable to compute the similarity of the shape of the extracted segment to an elliptical shape. The shape feature is computed from the bounding rectangle of each skin segment: we compute the ratio of the width (short side) to the height (long side) of the bounding rectangle, and if the ratio is in the range between 0.4 and 0.9, we classify the segment as a face segment candidate. To select probable face segments, we compute facial features from the face candidates. There are various facial features such as the eyes, nose, and mouth. Among these features, we select eye candidates as salient features because they are important in characterizing faces under different viewing geometry. To extract eye candidates, we transform the face segment candidates into a gray-scale image. After that, we make a binary image using a threshold value determined from the cumulative histogram: when the cumulative histogram reaches 15% of the whole histogram, its value is chosen as the initial threshold value. From the binary image, we extract connected components of sufficient size. If the number of salient features is not large enough at this threshold value, we increase the threshold value by a 5% amount of the cumulative histogram; if the number of extracted features is large enough, we stop increasing the threshold value. The detected feature points are labeled as in Figure 1(a). To compute the possibility of detected points being eye features, we divide the face segment into four regions, because the eyes are located in the upper side of the face. First, we divide the bounding rectangle of the face segment in a 3:2 ratio in height. Then, we compute the average of the x coordinates of the feature points in the window and divide the window vertically at this average value; this is shown in Figure 1(b). For each feature point in the upper-left window, we execute a dilation operation with a disk structuring element. Then, we compute the compactness value of the dilated feature points; if the compactness value is near 1, we choose the point as an eye candidate. The feature points are sorted according to their compactness values and saved as a list.
Fig. 1. Eye feature points extracted from a skin segment
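The adaptive binarization step described above can be sketched as follows (a hedged reading of the text; the assumption that feature pixels are the dark ones, the target component count and the upper limit are all illustrative choices):

```python
import numpy as np
from scipy import ndimage

def adaptive_threshold(gray, start=0.15, step=0.05, enough=4, limit=0.5):
    """gray: uint8 gray-scale face segment. Raise the cumulative-histogram
    threshold in 5% steps until enough connected components appear."""
    cum = np.cumsum(np.bincount(gray.ravel(), minlength=256)) / gray.size
    frac = start
    while frac <= limit:
        t = int(np.searchsorted(cum, frac))     # gray level reached at frac
        _, n_components = ndimage.label(gray <= t)
        if n_components >= enough:
            return t
        frac += step
    return int(np.searchsorted(cum, limit))
```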
4 Template Matching

In this section, we discuss the face verification stage. Using the eye feature list, we assess whether the segment is a face or not by template matching. For template matching, we construct a normalized eye feature template from training data; Figure 2 shows the normalized eye features. Before matching the extracted features, we first compute the gradient of the line connecting the two eye features. If the eye line is not horizontal, we rotate the image so that the extracted eye line becomes horizontal.
Then, we choose the eye region from the rotated image. By doing this, our method can detect slanted faces efficiently. To compute similarities between the extracted eye features and a template, we measure the modified Hausdorff distance [6], defined as follows:
H(A, B) = max( h(A, B), h(B, A) ) .  (3)

where h_f(A, B) = f-th_{a∈A} min_{b∈B} ‖a − b‖, and f-th_{x∈X} g(x) denotes the f-th quantile value of g(x) over the set X. For each possible pair of eye features, we compute the modified Hausdorff distance from the template, and we choose the eye feature candidates that have the minimum Hausdorff distance from the normalized eye features. After finding the eye candidates, we compute the face area using the angle between the eye line and the chin; the angle is computed from the training images. After detecting the eyes and chin, we localize the face in the image data.
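A small sketch of the quantile (modified) Hausdorff distance of Eq. (3) is given below; the quantile f is not stated in the paper, so 0.9 is only a placeholder:

```python
import numpy as np

def directed_h(A, B, f=0.9):
    """f-th quantile over a in A of min_b ||a - b||; A, B: (n, 2) point sets."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return np.quantile(d.min(axis=1), f)

def modified_hausdorff(A, B, f=0.9):
    return max(directed_h(A, B, f), directed_h(B, A, f))   # Eq. (3)
```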
Fig. 2. Normalized eye feature
Fig. 3. Skin color segment detection: (a) input image; (b) segments detected with the initial mean vector; (c) segments detected after updating
5 Experimental Results

Our proposed method consists of three modules: a skin segment extraction module, a face candidate extraction module, and a template matching module. We have tested our method on 14 video sequences captured from a web camera; four sequences have severe illumination changes. First, we detect skin segments using the adaptive two-dimensional Gaussian distribution model, with the mean chrominance vector updated from the current image. Figure 3(b) shows the skin segments detected using the initial mean vector of the image; Figure 3(c) shows the segments detected after updating. From the skin segments, we analyze the shape features of the bounding rectangle of each skin segment and then extract eye feature candidates. We compute the gradient of the eye lines and rotate the input image for template matching: Figure 4(a) shows an input image, which is rotated by 13.5 degrees in Figure 4(b). Finally, we cut the eye regions from the image for template matching. The size of the eye regions is adjusted by expanding or shrinking according to the normalized image. Sobel edge detection is performed and the modified Hausdorff distance is computed for each pair of eye features. We select suitable eye points and compute the face area using the angle between the eyes and the chin.
Figure 5 shows a detected slanted face, and Table 1 shows the face detection results. The accuracy on our test sequences is 96.8% for sequences with slanted faces and 95.1% for sequences with illumination changes. For sequences with illumination changes or slanted faces, the results of our method are slightly better than those of boosting-based methods [7].
Fig. 4. (a) input image, (b) rotated image
Fig. 5. Face detection result
Table 1. Face detection results

Sequences                              Total   Detected   False Alarm   Accuracy
Normal sequences with slanted faces    3206    3104       102           96.8%
Sequences with illumination changes    1467    1396       71            95.1%
6 Conclusions

In this paper, we proposed a new face detection method with an adaptive CrCb Gaussian color model and eye features. Skin color segments are extracted from the adapted Gaussian model, and shape analysis is then performed on the bounding rectangle of the skin color segments to reduce false alarms. From the extracted skin segments, we select eye candidates as salient features because they are important in characterizing faces under different viewing geometry. Template matching with the eye features is performed using the modified Hausdorff distance measure. The proposed face detection method can be used in various multimedia applications such as video indexing and surveillance systems.
References
1. Yang, M., Kriegman, D., Ahuja, N.: Detecting Faces in Images: A Survey. IEEE Trans. PAMI (2002) 34-58
2. Wang, H., Chang, S.: A Highly Efficient System for Automatic Face Region Detection in MPEG Video. IEEE Trans. Circ. Syst. Video Tech. (1997) 615-628
Face Detection with an Adaptive Skin Color Segmentation and Eye Features
857
3. Tsapatsoulis, N., Avrithis, Y., Kollias, S.: Efficient Face Detection for Multimedia Applications. Int. Conf. ICIP (2000) 26-44
4. Jones, M., Rehg, J.: Statistical Color Models with Application to Skin Detection. Proc. IEEE CVPR'99 (1999) 63-87
5. Raja, Y., McKenna, S., Gong, S.: Colour Model Selection and Adaptation in Dynamic Scenes. Proc. ECCV (1998) 65-91
6. Rucklidge, W.: Locating Objects Using the Hausdorff Distance. Proc. ICCV'95 (1995)
7. Viola, P., Jones, M.: Rapid Object Detection Using a Boosted Cascade of Simple Features. IEEE CVPR (2001) 78-85
Fall Detection by Wearable Sensor and One-Class SVM Algorithm Tong Zhang1, Jue Wang1, Liang Xu2, and Ping Liu1 1
Key Laboratory of Biomedical Information Engineering of Education Ministry, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, P. R. China {zhangtong3000, liuhuangs}@163.com, [email protected] 2 School of Computer Science and Technology of Xidian University, Xi’an, Shaanxi 710071, P. R. China [email protected]
Abstract. Falls are a crucial problem in elderly people's daily life, and the early detection of a fall is very important to rescue the subject and avoid a bad prognosis. In this paper, we use a wearable tri-axial accelerometer to capture the movement data of the human body, and propose a novel fall detection method based on a one-class support vector machine (SVM). The one-class SVM model is trained with positive samples from the falls of younger volunteers and a dummy, and outliers from the non-fall daily activities of younger and elderly volunteers. The preliminary results show that this method can detect falls effectively, and it reduces the probability of the elderly people being injured in the experiments.
1 Introduction

The fall is a common unexpected event in daily life. It usually damages young people only slightly, but it is a crucial problem for the elderly: about 10% to 15% of falls cause serious injury in elderly people, and more than 1/3 of persons aged over 65 fall at least once per year [1,2]. The early detection of a fall is very important to rescue the subject and avoid a bad prognosis [2,3]. Wearable-sensor-based fall detection means embedding micro sensors into clothes, a girdle, etc., to monitor the movement parameters of the human body in real time, and determining whether a fall has occurred based on the analysis of these parameters. Currently, wearable-sensor-based fall detection systems usually collect acceleration, vibration and tilt signals, set a few thresholds for these signals respectively, and then make decisions by detecting whether one or several values exceed the thresholds [1,2,3]. But there are many problems with this kind of algorithm, including lack of adaptability and deficient classification precision. For example, Hwang et al. [3] use a tilt switch to trigger the detection program: when the tilt of the person's upper body exceeds 70°, the program starts to process the acceleration signals to determine whether a fall has occurred. However, if the person slips and falls while going downstairs, in general he will sit down on the stairs with only a small tilt of the upper body, and hence the detection program will not be triggered.
In this paper, we use a tri-axial accelerometer to capture the movement signals of the human body, and propose a novel method based on a one-class SVM to detect falls. The one-class SVM model is trained with positive samples from the falls of younger volunteers and a dummy, and outliers from the non-fall daily activities of younger and elderly volunteers. The preliminary results show that this method can detect falls effectively while reducing the probability of the elderly people being injured in the experiments.
2 Methods

2.1 Materials

A tri-axial accelerometer, the MMA7260Q, is selected; it is a low-g micro-machined accelerometer with a volume of 4×4×1 mm. The chosen measuring range is 4g and the sampling rate is 512 points per second. When static, the vector sum of the signals from axes x, y and z equals the acceleration of gravity. The sensor is affixed to a belt and bound to the human body, as shown in Fig. 1.
Fig. 1. The accelerometer (MMA7260Q) and its fixed position at the waist of the elderly person (axes x, y, z; a_xz = a_x + a_z and a_total = a_x + a_y + a_z as vector sums)
2.2 One-Class SVM Algorithm

The one-class SVM is an extended SVM algorithm [4,5]. It divides all samples into an objective field and a non-objective field, mapping all samples into a high-dimensional feature space using a kernel function. Then, in the feature space, the one-class SVM computes the surface of a minimal hypersphere that contains all the objective data, and this minimal hypersphere is the classifier. A group of slack variables is introduced to realize the trade-off between the radius of the hypersphere and the number of samples outside it. All samples inside the hypersphere are known as positive samples, the outside samples as outliers. Let X be a positive sample set, X = {x_i, i = 1, …, l}, x_i ∈ R^d. We use a nonlinear mapping to find a minimal hypersphere in the high-dimensional feature space: let the vector a be the centre and R the radius of the hypersphere, and let it contain as many samples as possible. That is the optimization problem:
min_{R∈R, ξ∈R^l, a∈F}  R^2 + (1/(νl)) Σ_i ξ_i  (1)
s.t.  ‖Φ(x_i) − a‖^2 ≤ R^2 + ξ_i ,  ξ_i ≥ 0 ,  i ∈ [1, l]
where F is the feature space, ξ_i are the slack variables, 1/(νl) determines the volume of the hypersphere and the number of samples that will fall outside it, ν ∈ (0,1), and l is the total number of samples. Based on the KKT conditions, and introducing the kernel function
K(x, y) = ⟨Φ(x), Φ(y)⟩  (2)
the dual expression of the optimal problem (1) is:
min_λ  Σ_{i,j} λ_i λ_j K(x_i, x_j) − Σ_i λ_i K(x_i, x_i)  (3)
s.t.  0 ≤ λ_i ≤ 1/(νl) ,  Σ_i λ_i = 1
The centre of the hypersphere is:

a = Σ_i λ_i Φ(x_i)  (4)
After the training, a group of support vectors will be obtained, and we can calculate the radius R by the following equation:
R^2 = Σ_{i,j} λ_i λ_j K(x_i, x_j) + K(x_s, x_s) − 2 Σ_i λ_i K(x_i, x_s)  (5)
where x_s is any support vector. The decision function is then:
f(x) = sgn( R^2 − Σ_{i,j} λ_i λ_j K(x_i, x_j) + 2 Σ_i λ_i K(x_i, x) − K(x, x) )  (6)
For any sample, if f(x) > 0 the sample is classified as positive; if f(x) < 0 the sample is an outlier. Here, we use the RBF kernel:

K(x, z) = exp( −‖x − z‖^2 / σ^2 )  (7)
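For illustration, a compact stand-in using scikit-learn's OneClassSVM is shown below. scikit-learn implements the equivalent ν-formulation (a separating hyperplane rather than a hypersphere; the two coincide for an RBF kernel); ν = 0.22 and γ = 1/σ² = 90 follow the values reported in Section 3, while the data are placeholders:

```python
import numpy as np
from sklearn.svm import OneClassSVM

clf = OneClassSVM(kernel='rbf', nu=0.22, gamma=90.0)
X_fall = np.random.randn(100, 6)     # placeholder for 6-D fall features
clf.fit(X_fall)                      # train on positive (fall) samples
labels = clf.predict(X_fall)         # +1: fall pattern, -1: outlier
```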
2.3 Fall Detection

With the sensor worn as in Fig. 1, a_xz and a_y indicate the accelerations in the transverse section and along the vertical axis of the human body respectively. In the course
of falling, either or both of a_xz and a_y undergo acute changes, and there is a strong reverse impact when the subject strikes the ground. Various change patterns of a_xz and/or a_y correspond to various fall situations. To decrease the dimension of the input space, we select whichever of a_xz and 1.414·a_y has the stronger reverse impact. We then use six variables to form the input vectors:
{Δt_1, Δa_1^(a), Δa_1^(σ), Δt_2, Δa_2^(a), Δa_2^(σ)}

whose meanings, in sequence, are: the interval between the beginning of falling and the beginning of the reverse impact, the average acceleration during Δt_1, the variance of acceleration during Δt_1, the interval of the reverse impact, the average acceleration during Δt_2, and the variance of acceleration during Δt_2. Three pre-processing steps are implemented to acquire the input vectors: 1) low-pass filtering, to remove noise; 2) finding an acutely changing section in the sequences of a_xz and a_y — in general, a fall is completed within 0.4-0.8 second and is followed by a period of relative motionlessness, so we intercept overlapping 1.5-second data sections every 0.5 second and determine whether a section changes acutely based on the non-periodic variation of the variance of several sequential sections; if both a_xz and a_y change acutely, we select the stronger; 3) obtaining the input variables — for an acutely changing section, we choose the point with the maximal difference value as the beginning of the reverse impact, then determine Δt_1 and Δt_2 from this point and calculate the other input variables. The holdout method is applied to train and test the one-class SVM model, i.e., the data set is divided into two subsets, one for training and one for testing. To reduce the effect of noise, the training set uses all the positive samples and a randomly chosen 1/3 of the negative samples, while the testing set uses all samples. Such a training-testing course is a basic operation; it is repeated 16 times, and we pick the optimal result as the classifier. The locally optimal pairs (ν, 1/σ^2) are found via discretization and global search in each basic operation, and the globally optimal pair is selected among the locally optimal pairs.
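Assuming the change points have already been located by the variance test on overlapping sections, the six input variables can be assembled as in the sketch below (illustrative code, not the authors' implementation):

```python
import numpy as np

FS = 512.0   # sampling rate, Section 2.1

def fall_features(a, fall_idx, impact_idx, end_idx):
    """a: one acceleration channel (a_xz or 1.414*a_y, whichever shows
    the stronger reverse impact); the three indices mark the start of
    falling, the start of the reverse impact (maximal-difference point)
    and the end of the impact."""
    seg1 = a[fall_idx:impact_idx]        # falling phase
    seg2 = a[impact_idx:end_idx]         # reverse-impact phase
    return np.array([(impact_idx - fall_idx) / FS, seg1.mean(), seg1.var(),
                     (end_idx - impact_idx) / FS, seg2.mean(), seg2.var()])
```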
3 Experiments and Results

Twelve volunteers, 8 males and 4 females aged from 10 to 70 years with heights from 1.36 m to 1.80 m, were selected to take part in the experiments. The experiments covered the following categories: 1) low-risk fall-down: the subjects fell down on a plane with a soft cushion; 2) high-risk fall-down: the subjects fell down on a hard plane, stairs or a slope; 3) critical movement: the subjects performed quick movements somewhat like falling down, e.g. lying prone on the ground quickly or sitting down heavily; 4) sub-critical movement: similar to category 3, but with slower movements, including groveling, lying down, etc.; 5) low-intensity daily activities, e.g. walking and jogging; 6) high-intensity daily activities: acute daily activities such as running, jumping and gymnastics. The fall situation is dangerous to the human body, especially the elderly, so the elderly volunteers only took part in categories 4 and 5, the younger volunteers in categories 1 and 3 to 6, while categories 1 and 2 were also performed with a dummy.
We collected about 600 samples; approximately 65 percent of the samples are used to train the one-class SVM model and the others to test it, as shown in Table 1. With ν = 0.22 and 1/σ^2 = 90, we obtained the test results shown in Table 2. The results show that this model can detect falls effectively.

Table 1. The samples for each category

Category   Total samples   Samples for training   Samples for test   Demonstrated by
1          160             100                    60                 60 younger, 100 dummy
2          224             150                    74                 dummy
3          46              30                     16                 younger
4          70              44                     26                 30 younger, 40 elderly
5          72              44                     28                 30 younger, 42 elderly
6          30              20                     10                 younger
Table 2. The test results

Category   Correct   Incorrect   Correct ratio (%)
1          59        1           98.3
2          72        2           97.3
3          14        2           87.5
4          26        0           100.0
5          28        0           100.0
6          8         2           80.0
Total      207       7           96.7
4 Discussion

The current fall detection algorithms [1,2,3] rely on an implied precondition: that a fall is a course in which the posture of the human body changes rapidly from nearly upright to nearly horizontal. But in many situations this precondition does not hold, e.g., when the subject slips and sits down on the ground, stumbles while going upstairs, or falls while bending to pick something up from the ground. Hence, the current fall detection algorithms have many "blind areas". Our algorithm does not need this precondition, and therefore it adapts better to various situations. At the same time, the experimental results show that the algorithm has good classification performance. The shortcoming of our algorithm is its computational complexity, especially in the pre-processing. In the future, we intend to improve the algorithm and test other intelligent methods. Fall risk prediction is another subject that we want to study next: both detection and prediction are important for protecting the elderly, because a high risk means the elder is not suited to living independently, at least for a short period, while detection means we can rescue the elder in time when a fall occurs.
5 Conclusion

The wearable sensor system can capture the real-time movement data of the human body at low cost and with little disturbance to daily activities, while the one-class SVM can classify the data of falls and daily movements well. Combining wearable sensors with the one-class SVM algorithm, we can implement reliable and efficient fall detection for elderly people.
Acknowledgments

The authors acknowledge the support of the National Natural Science Foundation of China, grant 60271025.
References
1. Noury, N.: A Smart Sensor for the Remote Follow-up of Activity and Fall Detection of the Elderly. Proceedings of the 2nd Annual International IEEE-EMBS Special Topic Conference on Microtechnologies in Medicine & Biology, Madison, Wisconsin (2002) 314-317
2. Luo, S., Hu, Q.: A Dynamic Motion Pattern Analysis Approach to Fall Detection. IEEE International Workshop on Biomedical Circuits & Systems, Singapore (2004) S2.1_5-S2.1_8
3. Hwang, J.Y., Kang, J.M., Jang, Y.W., Kim, H.C.: Development of a Novel Algorithm and Real-time Monitoring Ambulatory System Using a Bluetooth Module for Fall Detection in the Elderly. Proceedings of the 26th Annual International Conference of the IEEE EMBS, San Francisco (2004) 2204-2207
4. Schölkopf, B., Platt, J.C.: Estimating the Support of a High-Dimensional Distribution. Neural Computation 13 (2001) 1443-1472
5. Manevitz, L.M., Yousef, M.: One-class SVMs for Document Classification. Journal of Machine Learning Research 2 (2002) 139-154
Feature Extraction and Pattern Classification on Mining Electroencephalography Data for Brain-Computer Interface Qingbao Liu1 , Zongtan Zhou2 , Yang Liu2 , and Dewen Hu2 1
School of Information System and Management, 2 College of Mechatronics and Automation, National University of Defense Technology Changsha, Hunan, 410073, P.R.C. [email protected]
Abstract. Electroencephalography (EEG) is a useful tool for observing brain activities. With the development of modern signal processing and statistics techniques, we have the ability to extract more information from EEG, much beyond the classic trial-averaging approach. This paper discusses some characteristic methods for feature extraction and pattern classification of EEG data, and a combined spatial-temporal-frequency analysis strategy is proposed. After a comparison of the linear discriminant function and the support vector machine, reinforcement training is introduced to optimize a pre-designed linear classifier while restraining over-fitting. Experiments performed on real EEG data show that the strategy is effective in mining potential EEG patterns, and with it we achieved consistent results on all four submitted datasets of BCI Competition III.
1 Introduction
In 1924, the German psychiatrist Hans Berger first recorded the electroencephalogram (EEG) from the human brain [1], and since then EEG has been widely used in the clinic to evaluate neurological disorders. In the recent 30 years, it has been brought into the laboratory to help scientists study brain function. In EEG studies, the Event Related Potential (ERP) is a powerful approach that focuses on the EEG response to a specially given stimulus. Classic ERP research usually makes use of the trial-averaging method, which averages several EEG signals recorded under the same condition so that weak features are magnified. However, when ERPs are employed in a brain-computer interface, which aims at establishing a new communication channel between the human brain and the outside world, the trial-averaging method is far from practical due to the poor signal-to-noise ratio
Supported by the Natural Science Foundation of China (60575044), the Distinguished Young Scholars Fund of China (60225015), the Ministry of Education of China (TRAPOYT Project), and the Specialized Research Fund for the Doctoral Program of Higher Education of China (20049998012).
Fig. 1. P300 spelling experiment: (a) character array on the screen; (b) EEG electrodes on the scalp
(SNR) and the real-time requirement [2]. Modern signal processing techniques are needed: information from not only the time domain but also the frequency domain should be used. On the other hand, the increased number of sensors provides an improved spatial resolution, which enables the application of various multivariate methods [3].

Common BCI Experiment Setup. Presently, experiments for most BCI applications are designed in the following mode: subjects are asked to perform different tasks while multidimensional EEG signals are recorded from the scalp. Fig. 1 illustrates the P300 spelling paradigm, in which the subject's task is to focus attention on the characters of a word prescribed by the investigator (one character at a time). Motor imagery is another kind of task, in which the subjects go through different kinds of imagery such as left hand or right foot movement.

BCI Data Processing Scheme. In the training phase, the data are labeled by the known task, and a feature extraction and classification system is learned from them. Then, in the testing phase, the practical data are put into the system and a decision about the user's intention is made. In the following sections, we focus on the methods of feature extraction and pattern classification. Through analysis and comparison, a framework of spatial-temporal-frequency feature extraction and classifier design based on reinforcement training is proposed.
2 Feature Extraction Methods
Spectral analysis [4], time-frequency analysis and the wavelet transform (WT) [5] consider the time course of individual channels. Sophisticated spatial filters such as independent component analysis (ICA) [6], common subspace pattern (CSP) [7] and the T-weighted approach [8] extract the desired components, reflecting either the independent sources or the most discriminative ones. However, each
of these methods by itself focuses on a special trait of the signal. If combined properly, they may compensate each other and mine the information sufficiently. Here we present an effective approach: first, the spatial components are separated by CSP, and then the time-frequency patterns of each component are further discovered through the WT. CSP plus WT provides a promising way to make the most of the information in the time, frequency and space domains. The details are described below.

a) CSP Transformation to Characteristic Spatial Features of EEG. Given two distributions in a high-dimensional space, the Common Subspace Pattern (CSP) algorithm finds directions that maximize the variance for one class while minimizing the variance for the other class. Let X_1 and X_2 denote the multichannel EEGs belonging to the two classes, with dimension N (channels) by T (samples), and covariance matrices Σ_1 and Σ_2 respectively. First, a matrix W is calculated such that

W (Σ_1 + Σ_2) W^T = I  (1)
Let S_1 = W Σ_1 W^T and S_2 = W Σ_2 W^T, and diagonalize S_1, i.e.

S_1 = R D R^T  (2)

where D is a diagonal matrix. As S_1 + S_2 = I, we have S_2 = R (I − D) R^T. So, using the projection matrix P = W^T R, we obtain:

D = P^T Σ_1 P ,  I − D = P^T Σ_2 P  (3)
Usually, the spatial filter is constructed by choosing the columns of P corresponding to the several largest eigenvalues. Fig. 2 shows the processing results for the first and last CSP projection directions; the stars denote one task and the circles the other. After projection, on the first CSP channel the energy of task one is mostly higher than that of task two, and the opposite holds on the last channel.

b) Wavelet Transform for Synthesized Time-Frequency Analysis. To date, the wavelet transform has been used in a variety of EEG pattern recognition work. Wavelets are essentially a compromise between the time domain and the frequency domain, since they allow the user to view changes in frequency bands over time. Here we apply the continuous wavelet transform (CWT) to the CSP components obtained above. After the CSP filter, the first CSP component takes the most energy of task one and the least of task two simultaneously; the last CSP channel is the opposite, i.e. the most energy of task two and the least of task one. On each channel, the CWT presents distinct time-frequency patterns, as shown in Fig. 3, especially for task one on the first channel and task two on the last channel. On the time axis, the pattern emerges at about 1 s, becomes strongest at about 1.5-2 s, and fades near 2.5 s, but still lasts a long time. On the frequency axis, the discriminative pattern concentrates on 8-14 Hz, which indicates that the μ rhythm contributes the main useful information. Note that after computing the amplitudes of the CWT coefficients, we perform a maximum filter on the time-frequency domain to avoid the stripe effect, and this step improves the generalization ability of the classifier.
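A minimal numerical sketch of Eqs. (1)-(3) follows (illustrative only; assumes class-wise EEG matrices):

```python
import numpy as np

def csp_projection(X1, X2):
    """X1, X2: (N_channels, T_samples) EEG of the two classes.
    Returns P with D = P^T S1 P diagonal, columns sorted so that the
    first direction favors class-1 variance and the last class-2."""
    S1, S2 = np.cov(X1), np.cov(X2)
    d, E = np.linalg.eigh(S1 + S2)
    W = np.diag(1.0 / np.sqrt(d)) @ E.T      # W (S1 + S2) W^T = I, Eq. (1)
    D, R = np.linalg.eigh(W @ S1 @ W.T)      # S1' = R D R^T, Eq. (2)
    P = W.T @ R                              # D = P^T S1 P, Eq. (3)
    return P[:, np.argsort(D)[::-1]]

# Per-trial features are typically the (log-)variances of the projected
# signals on the extreme directions, e.g.:
#   feats = np.log(np.var(P[:, [0, -1]].T @ X_trial, axis=1))
```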
Fig. 2. Variances of EEG signals after CSP projection (variance vs. trial index for tasks 1 and 2). Left: projection on the first direction; Right: projection on the last direction.
Fig. 3. Average CWT amplitudes at CSP channels of different tasks
Fig. 4. Classifier performance with increasing feature dimensions

3 Pattern Classification Methods

3.1 Performance Comparison of Various Classifiers
First, some comparisons are performed among various classifiers, including linear/nonlinear discriminant analysis and support vector machines (SVM). The SVM has been widely used in classification and regression due to its computational efficiency and good generalization performance [10]. The performance, especially the generalization ability, of the classifiers is evaluated as the dimension of the feature space increases. The results (Fig. 4) show, first, that linear classifiers have a lower classification error than nonlinear classifiers on test data; second, that with the increase of feature dimensions, over-fitting occurs for nonlinear classifiers but is not evident for linear classifiers; and third, that among linear classifiers, the SVM is superior to the Fisher linear function. But taking computational
efficiency into account, we choose the Fisher linear function and will show that the classifier can be further improved through reinforcement training.

3.2 Reinforcement Training for a Non-overfitting EEG Classifier
Reinforcement training for classifier design is a data-driven method for classifier optimization, which aims at mining the discriminative information as far as possible, improving both the fitting and the generalization ability of an existing classifier.

a) Feature Construction by Correlation. After CSP and WT, each EEG time course is transformed into a set of WT amplitudes, which can be denoted as vi = [vi1, vi2, ..., viK]^T ∈ R^K. We call it a sample. Then for sample j, the feature vector is constructed as xj = [xj1, xj2, ..., xjn]^T, where

xji = Σ_k vjk vik ,   i = 1, 2, ..., n   (4)
is the correlation between samples vj and vi, and n is the number of samples.

b) Reinforcement Training for the Fisher Linear Classifier. As above, we now have a set of n n-dimensional feature vectors x1, ..., xn, some labeled ω1 and some labeled ω2, and an initial linear discriminant function g(w0) = (w0)^T x is given under the Fisher criterion. Let

X w0 = b0   (5)
where X is the feature matrix that contains all feature vectors. A better distribution of b is obtained by changing w0 to w while constraining w to move within a neighborhood of w0 so as to restrain over-fitting. That is, we solve

min g(Δw) = ‖X Δw − Δb‖² + β ‖Δw‖²   (6)

where Δw = w − w0, Δb = b − b0, and β is a penalty factor for avoiding over-fitting. The optimal w is given as w = w0 + Δw, with Δw = (X^T X + β I)^{-1} X^T Δb. More details can be found in [9]. Applying reinforcement training to the Fisher linear function obtained above, a higher correct rate is reached on the test data, by which we achieved good results (three third places and one second place) for datasets I, II, IVa and IVc in BCI Competition III (http://ida.first.fraunhofer.de/projects/bci/competition iii/results/).
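As a rough illustration of the update Δw = (X^T X + βI)^{-1} X^T Δb, here is a one-step sketch in NumPy; the function name and its interface are our own abstractions, and the construction of the target Δb follows [9] rather than being specified here.

import numpy as np

def reinforce_step(w0, X, delta_b, beta):
    # X: (n samples, n features) feature matrix; w0: initial Fisher weights;
    # delta_b: desired change of the outputs b0 = X @ w0, Eq. (5);
    # beta: penalty keeping w near w0 to restrain over-fitting, Eq. (6).
    A = X.T @ X + beta * np.eye(X.shape[1])
    delta_w = np.linalg.solve(A, X.T @ delta_b)
    return w0 + delta_w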
4 Conclusion
Some proven methods for EEG feature extraction and pattern classification are studied and discussed, in order to explore ERP experiments further and to step farther into various brain-computer interface applications. CSP plus WT gives a powerful
tool for information mining in a combined space-time-frequency domain. By CSP filtering, spatial components are separated according to energy; then the time-frequency structures of each component are further analyzed. Afterwards, a Fisher linear classifier is designed and put through reinforcement training to obtain better performance. In some cases, however, especially when the training set is small, the classifier may become worse. Choosing the optimal parameters of the reinforcement training is the key problem, which we will investigate in future work.
References
1. Berger, H.: Über das Elektrenkephalogramm des Menschen. Arch. Psychiatr. Nervenkr. 87 (1929) 527–570
2. Wolpaw, J.R., McFarland, D.J., Pfurtscheller, G., Vaughan, T.M.: Brain-Computer Interfaces for Communication and Control. Clinical Neurophysiology 113 (2002) 767–791
3. Parra, L.C., Spence, C.D., Gerson, A.D., Sajda, P.: Recipes for the Linear Analysis of EEG. NeuroImage 28 (2005) 326–341
4. Pfurtscheller, G., Lopes da Silva, F.H.: Event-related EEG/MEG Synchronization and Desynchronization: Basic Principles. Clin. Neurophysiol. 110 (1999) 1842
5. Lemm, S., Schäfer, C., Curio, G.: BCI Competition 2003–Data Set III: Probabilistic Modeling of Sensorimotor μ Rhythms for Classification of Imaginary Hand Movements. IEEE Trans. Biomed. Eng. 51 (2004) 1077–1080
6. Makeig, S., Bell, A., Jung, T.P., Sejnowski, T.J.: Independent Component Analysis of Electroencephalographic Data. Advances in Neural Information Processing Systems 8 (1996) 145–151
7. Scherg, M., von Cramon, D.: Evoked Dipole Source Potentials of the Human Auditory Cortex. Electroenceph. clin. Neurophysiol. 65 (1986) 344–360
8. Liu, Y., Zhou, Z.T., Hu, D.W., Dong, G.H.: T-weighted Approach for Neural Information Processing in P300 Based Brain-Computer Interface. Proceedings of 2005 Int. Conf. on Neural Networks and Brain 3 (2005) 1535–1539
9. Zhou, Z.T., Liu, Y., Hu, D.W.: Classification of Movement-related Potentials for Brain-Computer Interface: A Reinforcement Training Approach. International Symposium on Neural Networks 2006, Lecture Notes in Computer Science 3973 (2006) 620–628
10. Vapnik, V.N.: The Nature of Statistical Learning Theory. 2nd edn. Springer (2000)
Feature Extraction of Hand-Vein Patterns Based on Ridgelet Transform and Local Interconnection Structure Neural Network

Yu Zhang, Xiao Han, and Si-liang Ma

Institute of Mathematics, Jilin University South Campus, 130012 Changchun, China
{zy26, papaxiao}@email.jlu.edu.cn, [email protected]
Abstract. In this paper, we propose a multiscale feature extraction method for hand-vein patterns based on the ridgelet transform and local interconnection structure neural networks. In order to restrain the noise and emphasize the hand-vein pattern in the image, we apply a multiscale self-adaptive enhancement transform, based on the ridgelet transform, to the hand-vein image. A neural network with local interconnection structure is designed to extract the features of the hand-vein patterns and to handle hand-vein patterns of different sizes by using different receptive fields. The hand-vein patterns are identified using a structural matching method. Our experimental results show that the proposed methods are superior to other methods and efficiently solve the problem of extracting features from unclear images, but more experiments using a large database are needed.
1 Introduction

Security has become increasingly important for privacy protection with the advance of Internet technology. Biometric techniques [1], as a method of personal identification, have been attracting more attention for security. There are many types of biometric systems available for personal identification, such as handprints [2][3][4], facial features [5], iris/retina [6], voice [7] and finger-vein recognition [8]. However, many people have an aversion to authentication with handprints and dislike bringing the device very close to the eyes for iris recognition, and other biometric systems do not perform satisfactorily for actual identification. We pay attention to authentication with hand-vein patterns because it has seven advantages:
1. The vein pattern in the back of a person's hand provides good distinction between individuals;
2. Except for the size of the hand, the vein patterns do not change with time;
3. Vein patterns are inside the human body; people do not leave them around (like handprints), nor can they be easily observed like iris patterns or faces;
4. Vein patterns are large, robust internal patterns;
5. Vein structures are not easily captured or reproduced like other biometric traits;
6. Identical twins have different and distinct IR absorption patterns;
7. Vein patterns are not easily observed, damaged, obscured or changed.
2 Materials and Principle

Light transmission through tissues is greatest within the red and near-infrared wavelength bands. At visible wavelengths, transmission diminishes because of electronic absorption in tissue pigments, such as hemoglobin, myoglobin, and melanins. At longer infrared wavelengths, broad and intense absorption bands correspond to the fundamental vibration modes of water bonds. At near-infrared wavelengths, hemoglobin has lower absorbance than at visible wavelengths, but it is relatively high compared to other proteins in the tissue. Therefore, we devise a biometric system using patterns of the hand vein: infrared light is transmitted through the dorsal hand, and the veins just beneath the skin appear as dark patterns, giving a picture of the veins in the hand. Fig. 1 shows a block diagram of the identification process and optical trans-illumination images of a hand.
Fig. 1. Flow of the identification process and trans-illumination images of a hand
3 Main Methods

3.1 Multiscale Self-adaptive Enhancement Processing Based on Ridgelet Transform

Hand-vein recognition is a new topic of research, and only two articles have been published on it to date. An ASP (application-specific processor) is used to extract the hand-vein patterns in [9]. The paper [10] adopts phase-only correlation and template matching as the recognition algorithm. But both methods have high false rates for obscure vein images. To solve this problem, we propose a multiscale feature extraction method for hand-vein patterns based on the ridgelet transform [11][12]. Experimental results show this method is superior to those of [9] and [10], and the equal error rate (EER) is only 0.130%. E.J. Candes and D.L. Donoho presented a multiscale method named the ridgelet transform [11][12], which is obtained by adding a direction function to the wavelet function. Thus it has not only the ability of local time-frequency analysis like wavelets, but also the ability of direction selection and discrimination, which can represent line singularities of images efficiently, such as linear profiles in the images. In this paper, in order to restrain the noise and emphasize the linear hand-vein patterns in the images, we apply the following multiscale self-adaptive enhancement transform, based on the ridgelet transform, to the captured hand-vein image. At the same time, we make self-adaptive adjustments according to the function coefficients. The transform function is:
f(x) = a [ σ(c(x − b)) − σ(−c(x + b)) ] .   (1)

Here,

a = 1 / [ σ(c(1 − b)) − σ(−c(1 + b)) ] ,   0 < b < 1 ,   (2)

σ(x) = 1 / (1 + e^{−x})   (3)
b and c are the parameters that control the enhancement range and intensity. As Fig. 2 shows, the transform function magnifies the coefficients with medium absolute values: the smaller coefficients correspond to the noise, while the larger ones correspond to the already clear parts.
Fig. 2. The plot of the transform function f(x) when b = 0.25, c = 40
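A minimal NumPy sketch of the enhancement function of Eqs. (1)–(3); the function names are ours, and the ridgelet coefficients are assumed to be normalized to [−1, 1] before enhancement.

import numpy as np

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))        # Eq. (3)

def enhance(coeffs, b=0.25, c=40.0):
    # coeffs: ridgelet coefficients scaled to [-1, 1];
    # b and c control the enhancement range and intensity, Eq. (2).
    a = 1.0 / (sigma(c * (1.0 - b)) - sigma(-c * (1.0 + b)))
    return a * (sigma(c * (coeffs - b)) - sigma(-c * (coeffs + b)))  # Eq. (1)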
Readjusting the ridgelet coefficients after the multiscale transform, we obtain the enhanced hand-vein image.

3.2 Local Interconnection Neural Network

Neural Network Structure. To extract the vein patterns effectively, we construct a neural network with local interconnection structure to extract the straight-line features in small regions, since the hand-vein patterns can be regarded as several straight-line segments. The local interconnection network consists of three layers. The input layer has seven columns and each column has seven input nodes, the hidden layer has seven nodes, and the output layer has only one node. The seven nodes in a column of the input layer and one node in the hidden layer form a local interconnection; that is, the receptive field of each node in the hidden layer is composed of one column (7 nodes). There are three reasons to construct such a local interconnection neural network: the receptive field has a form similar to a straight line, which makes it convenient to extract line features; the nodes in different receptive fields are independent of each other while the ones in the same receptive field are correlated, so the local and global information are both considered; and because it is a locally interconnected neural network, the computational complexity is reduced compared with a globally connected network.
Training the Neural Network. To extract the hand-vein pattern fast, accurately and efficiently, we let the output be the mean value of the corresponding square neighborhood of the input node. There are two kinds of training samples: positive samples, which require that the region corresponding to the nodes in the fourth column in Fig. 5 contains the hand-vein pattern, with training output +1; and all other samples, with training output −1. The training process is as follows:
1. Choose some positive samples to train the network first.
2. Apply this network to extract the hand-vein pattern in the image and record the wrongly extracted regions.
3. Add the wrongly extracted regions to the negative samples and add some negative samples to the training set.
4. Stop if the training requirement is met; otherwise go back to step 2 and continue training.
The output of each BP neural network node needs the mean of a square neighborhood, which determines the speed of the whole system. A method called the integral image [13] is applied here, and the computation rate is enhanced greatly. We choose the RPROP algorithm [14], which converges very fast, to train the neural networks.

Simulating the Neural Network. To extract a hand-vein pattern at angle α, we simply rotate the receptive field of the neural network by α. According to the scale and angle (the scale being determined by the magnitude of the mean value of the square neighborhood), a pyramid searching algorithm is applied to the neural network obtained from the training procedure above to find the hand-vein pattern. If the network output is greater than zero, the region center is set to +1, otherwise to 0. In this way we extract the binary hand-vein pattern. See Fig. 3.
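The integral-image speed-up mentioned above can be sketched as follows; the function names are ours, and only the constant-time neighborhood mean needed by the hidden-layer nodes is shown.

import numpy as np

def integral_image(img):
    # Summed-area table: ii[r, c] = sum of img[:r, :c] (cf. [13]).
    return np.pad(img, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)

def neighborhood_mean(ii, r, c, half):
    # Mean of the (2*half+1)^2 square centered at (r, c), in O(1) time.
    r0, c0, r1, c1 = r - half, c - half, r + half + 1, c + half + 1
    s = ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]
    return s / float((2 * half + 1) ** 2)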
Fig. 3. Hand-vein image captured by our device and the corresponding hand-vein pattern features extracted
3.3 Matching the Hand-Vein Patterns
In the matching process, the pattern is converted into matching data, and these data are compared with recorded data. Here, we use the structural matching method [15] of the hand-vein patterns to identify the line-shaped patterns.
4 Experimental Results

We use 128 hand-vein images from 32 volunteers, 4 images per person, to test the system. Fig. 4 shows the ROC curves of the test results. Curve 1 is the ROC curve for the method of
[9], Curve 2 is the ROC curve for the method of [10], and Curve 3 is the ROC curve for the proposed method. The proposed method is therefore better than the previous ones [9][10]. The equal error rate (EER) of the proposed method is 0.130%. The time for feature extraction and matching is less than 5 seconds for these volunteers.
Fig. 4. ROC curve of the hand-vein matching
More experiments using a large database are needed to confirm these results.
Acknowledgment This work was supported by the National High-Tech Research and Development Plan of China under Grant No.2004AA001110.
References
1. Shen, W., Surette, M., Khanna, R.: Evaluation of Automated Biometrics-Based Identification and Verification Systems. Special Issue on Automated Biometric Systems. Proc. IEEE 85(9) (1997) 1463–1478
2. Jain, A.K., Pankanti, S.: Automated Fingerprint Identification and Imaging Systems. In: Lee, H.C., Gaensslen, R.E. (eds.): Advances in Fingerprint Technology, 2nd edn. Elsevier, New York (2001)
3. Prabhakar, S., Jain, A.K., Pankanti, S.: Learning Fingerprint Minutiae and Type. Pattern Recog. 36(8) (2003) 1847–1857
4. Maio, D., Maltoni, D.: Direct Gray-Scale Minutiae Detection in Fingerprints. IEEE Trans. Pattern Anal. Mach. Intell. 19(1) (1997) 27–40
5. Turk, M.A., Pentland, A.P.: Face Recognition Using Eigenfaces. Proc. IEEE Conference on Computer Vision and Pattern Recognition (1991) 586–591
6. Zhu, Y., Tan, T., Wang, Y.: Biometric Personal Identification Based on Iris Patterns. Proceedings of the 15th International Conference on Pattern Recognition, Barcelona, Spain (2000) 805–808
7. Venayagamoorthy, G.K., Moonasar, V., Sandrasegaran, K.: Voice Recognition Using Neural Networks. Proceedings of the IEEE South African Symposium on Communication and Signal Processing (COMSIG 98) (1998) 30–32
8. Kono, M., Ueki, H., Umemura, S.: Near-Infrared Finger Vein Patterns for Personal Identification. Applied Optics 41(35) (2002) 7429–7436
9. Daugman, J.: Statistical Richness of Visual Phase Information: Update on Recognizing Persons by Iris Patterns. International Journal of Computer Vision 45(1) (2001) 25–38
10. Donoho, D.L.: Orthonormal Ridgelets and Linear Singularities. SIAM Journal on Mathematical Analysis 31(5) (2000) 1062–1099
11. Candès, E.: Ridgelets: Theory and Applications. Ph.D. thesis, Department of Statistics, Stanford University (1998)
12. Im, S.K.: A Biometric Identification System by Extracting Hand Vein Patterns. Journal of the Korean Physical Society 38(3) (2001) 268–272
13. Tanaka, T., Kubo, N.: Biometric Authentication by Hand Vein Patterns. SICE Annual Conference in Sapporo (2004) 249–253
14. Crow, F.: Summed-Area Tables for Texture Mapping. In: Proceedings of SIGGRAPH 18(3) (1984) 207–212
15. Riedmiller, M., Braun, H.: A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm. Proc. of the IEEE Intl. Conf. on Neural Networks (1993) 586–591
16. Zhang, W., Wang, Y.: Core-Based Structure Matching Algorithm of Fingerprint Verification. Proceedings of the IEEE International Conference on Pattern Recognition 1 (2002) 70–74
Fuzzy Support Vector Machines for Automatic Infant Cry Recognition

Sandra E. Barajas-Montiel and Carlos A. Reyes-García

Instituto Nacional de Astrofisica Optica y Electronica, Luis Enrique Erro #1, Tonantzintla, Puebla, Mexico
{sandybarajas, kargaxxi}@inaoep.mx
Abstract. Crying is the only means of communication newborn babies have to express their needs. Several studies have shown that infant cry can be a valuable tool to determine an infant's different emotional and physiological states. With the aim of usefully applying the information in crying, in this paper we present the use of Fuzzy Support Vector Machines (FSVM) for two different infant cry recognition tasks. In the first, to identify pathologies, we classify Normal, Deaf, and Asphyxia infant cries. The second problem is to identify Pain cries, Hunger cries and No-Pain-No-Hunger cries, which are those that do not belong to either of the first two classes. We show that FSVM perform better than conventional SVM, reaching a correct classification accuracy of up to 90%.
1 Introduction

Support Vector Machines (SVM) are a classification technique, developed by Vapnik and his group at AT&T Bell Laboratories, that has recently become popular [1]. Experimental results indicate that SVM can achieve a generalization performance that is greater than or equal to that of other classifiers, while requiring significantly less training data to achieve good results [2]. An SVM is a binary classifier which makes its decisions by constructing a linear decision boundary, or hyperplane, that optimally separates the two classes. In conventional classification with SVM, an n-class problem is converted into n two-class problems. In each resulting binary problem a decision function f(x) is constructed to separate a certain class from the others; when f(x) > 0 for class i, x is classified into that class. In this kind of n-class pattern recognition problem, sometimes the decision function is positive for more than one class, and sometimes it is less than zero for all classes; in both cases the datum is unclassifiable. To overcome this problem, Fuzzy Support Vector Machines (FSVM) were proposed [3]. Truncated polyhedral pyramidal membership functions are defined [4] on the basis of the functions obtained by training the SVMs, which resolves the unclassifiable regions. This paper presents two different 3-class infant cry recognition tasks. In the first, to identify pathologies, we classify infant cries from Normal, Deaf, and Asphyxiating babies. The second problem is to identify Pain cries, Hunger cries and cries which do not belong to either of these two classes and which constitute the No-Pain-No-Hunger class.
2 The Automatic Infant Cry Recognition Process

The Automatic Infant Cry Recognition (AICR) process is very similar to the speech recognition process. Basically, AICR can be divided into two stages: the first is signal processing and the second is pattern classification. In the signal processing phase, the cry signal is first normalized and cleaned, and then analyzed to extract the most important features as a function of time. In AICR, as in any pattern recognition problem, the goal is that, given an input pattern, at the end of the recognition process we obtain as output the class to which this pattern belongs.

2.1 Signal Processing Phase

The acoustical analysis of the raw cry waveform provides the information needed for its recognition. At the same time, it discards unwanted information such as background noise and channel distortion [5]. In this phase we transform measured data into pattern data. There are several techniques for analyzing cry wave signals; for the described experiments we used Mel Frequency Cepstral Coefficients (MFCC). MFCC are similar to the ear's perceptual characteristics. They can be obtained as filtered signals through different frequency scales. The Mel spectrum operates on the basis of selective weighting of the frequencies in the power spectrum: high-order frequencies are weighted on a logarithmic scale, whereas lower-order frequencies are weighted on a linear scale. In this way, MFCC aim to emulate the filtering properties of the ear, which is much more sensitive to some frequencies than to others [6], [7]. These coefficients are calculated on small frames of the signal over time. Only the first M cepstral coefficients are taken as features. The spectral shape is modelled by the first coefficients, and its precision depends on the number of coefficients that are taken. The set of values for n features may be represented by a vector in an n-dimensional space, each of which is further taken as a pattern.

2.2 Pattern Recognition Phase

In this phase we determine the class or category to which each cry pattern belongs. The set of feature vectors is divided into two subsets: the training set and the test set. First the training set is used to teach the classifier how to distinguish the different cry types. Then the test set is used to determine how well the classifier assigns the corresponding class to a pattern, by means of the classification rule generated during training.
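The MFCC front end of Sect. 2.1 could be sketched as follows with the librosa library; the paper used Praat instead, so the file name, the non-overlapping 50-ms framing and the library choice are our own assumptions.

import librosa

# One-second cry segment; 16 MFCCs per 50-ms frame, as described in Sect. 5.
y, sr = librosa.load("cry_segment.wav", sr=None, duration=1.0)  # hypothetical file
frame = int(0.050 * sr)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=16,
                            n_fft=frame, hop_length=frame)
pattern = mfcc.T.flatten()   # one feature vector per one-second sample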
3 Binary and N-Class Support Vector Machines The binary support vector classifier uses the discriminant function of the following form:
f ( x ) = α k s (x ) + b .
f : X ⊆ ℜn → ℜ (1)
878
The
S.E. Barajas-Montiel and C.A. Reyes-García
k s ( x ) = [k ( x, s1 ),..., k ( x, s d )] is the vector of evaluations of kernel funcT
tions, centered at the support vector
S = {s1 ,..., s d }, s i ∈ ℜ n which usually is a
α ∈ ℜ l is a weight vector and b ∈ ℜ q : X → Y = {1, 2} is defined as
subset of the training data. The binary classification rule
is a bias. The
1 for f ( x ) > 0, q(x ) = ® ¯2 for f ( x ) ≤ 0 . The n-class generalization involves a set of discriminant functions f y
(2)
:X ⊆
ℜ n → ℜ , y ∈ Y = {1, 2, ..., c}defined as
f y ( x) = α y k s (x ) + b y , Let the matrix
y ∈Y .
(3)
A = [α 1 ,..., α c ] be composed of all weight vectors and b =
[b1 ,..., bc ] be a vector of all biases. The multi-class classification rule q : X → Y = {1,2,..., c} is defined as q( x ) = arg max f y ( x ) . (4) T
y∈Y
In this formulation, however unclassifiable regions remain, where some f (x ) have the same values. Reference [3] propose Fuzzy Support Vector Machines for conventional one – to - (n - 1) formulation to solve unclassifiable regions.
4 Fuzzy Support Vector Machines

In this section we present the Fuzzy Support Vector Machines (FSVM) proposed in [3]. FSVM were introduced in order to decide on unclassifiable regions. In an n-class problem, one-dimensional membership functions m_ij(x) are defined for class i on the directions orthogonal to the optimal separating hyperplanes f_j(x) = 0, as follows. For i = j,

m_ii(x) = 1 for f_i(x) > 1,   m_ii(x) = f_i(x) otherwise .   (5)

For i ≠ j,

m_ij(x) = 1 for f_j(x) < −1,   m_ij(x) = −f_j(x) otherwise .   (6)

The class-i membership function of x is defined using the minimum operator over m_ij(x) (j = 1, ..., n):

m_i(x) = min_{j=1,...,n} m_ij(x) .   (7)

The datum x is classified into the class

arg max_{i=1,...,n} m_i(x) .   (8)
In realizing the fuzzy pattern classification, it is not necessary to implement the membership functions m_i(x) given by (7). The classification procedure is as follows:
1. For x, if f_i(x) > 0 is satisfied for only one class, the input is classified into that class. Otherwise, go to Step 2.
2. If f_i(x) > 0 is satisfied for more than one class i (i = i1, ..., il, l > 1), classify the datum into the class with the maximum f_i(x) (i ∈ {i1, ..., il}). Otherwise, go to Step 3.
3. If f_i(x) ≤ 0 is satisfied for all classes, classify the datum into the class with the minimum absolute value of f_i(x).
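A minimal sketch of this three-step decision rule; the function name is ours, and the input is assumed to be the vector of one-against-rest decision values f_i(x).

import numpy as np

def fsvm_decide(f):
    # f: array [f_1(x), ..., f_n(x)] of binary decision values for one datum.
    f = np.asarray(f)
    positive = np.flatnonzero(f > 0)
    if len(positive) == 1:                      # Step 1: a unique positive class
        return int(positive[0])
    if len(positive) > 1:                       # Step 2: several positive classes
        return int(positive[np.argmax(f[positive])])
    return int(np.argmin(np.abs(f)))            # Step 3: no positive class

This is equivalent to taking arg max_i m_i(x) with the membership functions (5)–(8).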
5 Implementation

For the present experiments we worked with a corpus of infant cry patterns labeled with information such as infant age and the reason for the cry. The infant cries were collected through recordings made directly by medical doctors. After filtering and normalizing, each signal wave was divided into segments of one-second duration. Then acoustic features were extracted by means of Mel frequency cepstral coefficients (MFCC) with the freeware program Praat v4.0.8 [9]. Every one-second sample is divided into frames of 50 milliseconds, and from each frame we extract 16 coefficients. This procedure generates vectors with 304 coefficients per sample. In order to reduce the dimension of the sample vectors, we apply Principal Component Analysis (PCA). Our corpus is composed of 209 samples of pain cries, 759 samples of hunger cries, and 659 samples representing the no-pain-no-hunger class; this last set includes the sleepy and uncomfortable types. For the classification of pathological cries we had a corpus of 1627 samples from normal babies, 879 samples from deaf babies, and 340 samples from asphyxiating babies. All the parameter values of the classifier were established heuristically after several experiments. During the experiments the 10-fold cross validation technique was used to evaluate the performance and reliability of the classifier.
6 Experiments and Preliminary Results

Two different classification tasks were performed. In the first, to identify pathologies, we worked with cry samples belonging to the Normal–Deaf–Asphyxia (N-D-A) classes. In the second classification task we had a corpus formed by samples of normal babies, used to identify the Pain-Hunger-No-Pain-No-Hunger (P-H-NPNH) classes. The
results of the experiments are shown in Table 1 and Table 2, respectively. Each table shows the percentage of correct classification using SVM and FSVM. In each classification task a different number of principal components (PC) was used; the numbers of PC tested in the experiments were 2, 3, 10, 16 and 50. The experiments show that the best results are achieved when 10 PC and FSVM are used.

Table 1. Results of Normal–Deaf–Asphyxia (N-D-A) infant cry classification with Support Vector Machines and Fuzzy Support Vector Machines
(N-D-A) % Classification Accuracy
Problem   PCA2      PCA3      PCA10     PCA16     PCA50
SVM       74.1304   91.4493   77.7536   94.7741   77.6670
FSVM      75.3913   91.9710   78.5507   94.9816   82.8986
Table 2. Results of Pain-Hunger-No-Pain-No-Hunger (P-H-NPNH) infant cry classification with Support Vector Machines and Fuzzy Support Vector Machines
(P-H-NPNH) % Classification Accuracy
Problem   PCA2      PCA3      PCA10     PCA16     PCA50
SVM       58.3436   89.1411   55.0307   97.7914   54.0307
FSVM      59.8160   89.6687   58.6135   97.8282   55.9816
When working with pathological and normal cry samples, the maximum correct classification obtained was 94.9816%. The most poorly classified class was asphyxia, perhaps because of the lower number of samples available for it. In the second classification task (P-H-NPNH) the best classification score was 97.8282%, and the class with the most identification problems was the hunger class. One reason might be that this class presents characteristics similar to the uncomfortable cries.
7 Conclusions

In this paper we present the automatic classification of infant cry by means of Fuzzy Support Vector Machines. Fuzzy Support Vector Machines were introduced to resolve the unclassifiable regions that remain with the use of conventional Support Vector Machines. We worked with two different 3-class infant cry problems: in the first, to identify pathologies, the cry samples were divided into the Normal, Deaf, and Asphyxia classes (N-D-A); in the second we used samples only of normal babies, labeled to identify the Pain, Hunger, and No-Pain-No-Hunger classes (P-H-NPNH). In the kind of problems we explore in this work, the Fuzzy Support Vector Machines showed an improvement in classification performance over conventional SVM. We obtained the best correct classification in both tasks when using 10 principal component vectors. In the N-D-A problem, SVM obtained 94.77% correct classification and FSVM 94.98%, an average improvement of 0.21% in classification accuracy. In the P-H-NPNH problem, the correct classification percentage obtained by SVM was 97.79% and by FSVM 97.82%, an average improvement of 0.03% between the models. The infant cry
correct classification results obtained so far with FSVM are very encouraging. We think that with a larger number of samples we will be able to generalize our results better, bringing us closer to a robust system applicable to real life and to other pathologies related to the central nervous system. The collection of more samples will also allow us to include a larger number of normal cry classes, and perhaps to deal with the identification of deafness levels as well.
Acknowledgments This work is part of a project that is being financed by CONACYT-Mexico (C01-46753).
References
1. Cortes, C., Vapnik, V.: Support Vector Networks. Machine Learning 20 (1995) 1–25
2. Wan, V., Campbell, W.M.: Support Vector Machines for Speaker Verification and Identification. IEEE International Workshop on Neural Networks for Signal Processing, Sydney, Australia (2000)
3. Inoue, T., Abe, S.: Fuzzy Support Vector Machines for Pattern Classification. Proceedings of International Joint Conference on Neural Networks (IJCNN '01) 2 (2001) 1449–1454
4. Abe, S.: Pattern Classification: Neuro-Fuzzy Methods and their Comparison. Springer-Verlag, London (2001)
5. Levinson, S.E., Roe, D.B.: A Perspective on Speech Recognition. IEEE Communications Magazine (1990) 28–34
6. Orosco, J., Reyes, C.A.: Mel-Frequency Cepstrum Coefficients Extraction from Infant Cry for Classification of Normal and Pathological Cry with Feed-Forward Neural Networks. Proc. International Joint Conference on Neural Networks, Portland, Oregon, USA (2003) 3140–3145
7. Reyes, O., Reyes, C.A.: Clasificación de Llanto de Bebés para Identificación de Hipoacusia y Asfixia por medio de Redes Neuronales [Classification of Infant Cry for Identification of Hypoacusis and Asphyxia by means of Neural Networks]. Proc. of the II Congreso Internacional de Informática y Computación de la ANIEI, Zacatecas, México (2003) 20–24
8. Franc, V., Hlaváč, V.: Statistical Pattern Recognition Toolbox. Czech Technical University, Prague (1999)
9. Boersma, P., Weenink, D.: Praat v 4.0.8: A System for Doing Phonetics by Computer. Institute of Phonetic Sciences of the University of Amsterdam (2002)
Geodesic Gabriel Graph Based Supervised Nonlinear Manifold Learning

Huajie Chen and Wei Wei

College of Electrical Engineering, Zhejiang University, HangZhou 310027, China
[email protected]
Abstract. For discriminant analysis on nonlinear manifolds, a geodesic Gabriel graph based supervised manifold learning algorithm is proposed. Using geodesic distance to discover the intrinsic geometry of the manifold, the geodesic Gabriel graph is constructed to locate the key local regions in which local linear classifiers are learned. The global nonlinear classifier is achieved by merging the multiple local classifiers under the soft margin criterion, which assigns the optimal weight to each local classifier in an iterative way, without any assumption on the distribution of the example data. The superiority of this algorithm is confirmed by experiments on synthesized data and face image databases.
1 Introduction

Many kinds of high-dimensional data in real-world applications such as computer vision and pattern recognition can be modeled as data points lying on a low-dimensional nonlinear manifold. Several dimensionality reduction methods have been proposed for learning complex embedding manifolds, e.g. Locally Linear Embedding (LLE) [1] and Isomap [2], by using local geometric metrics within a single global coordinate system. However, these methods are designed to best preserve data localities or similarities in the embedding space, and consequently cannot guarantee good discriminating capability. Only a few extended manifold learning algorithms explicitly address classification problems [3], [4]. Because the corresponding labels are required to map example data to the low-dimensional space, it remains a difficult issue to extend the mapping function to new test data. The underlying idea of the above manifold learning methods is that the local geometry on the manifold is obtained first, and then aligned and extended to the global geometry. Following this idea, we propose a supervised manifold learning algorithm that merges local linear classifiers. The geodesic distance [2], instead of the Euclidean distance, is applied to reflect the intrinsic geometry of the example data, and a geodesic Gabriel graph is then constructed to locate the key local regions. The local classifiers are learned in these local regions and merged into a final global classifier using the soft margin criterion. By so doing, the whole learning task is decomposed into two sub-tasks, local learning and classifier merging, and is therefore significantly simplified.
2 Geodesic Gabriel Graph Construction

We assume that a d-dimensional manifold M embedded in an m-dimensional space (d < m) can be represented by the following function:

f : C → R^m ,  C ⊂ R^d ,   (1)
where C is a compact subset of R^d. Two points are said to be Gabriel neighbors if their diametral sphere does not contain any other points. For every potential pair of neighbors A and B, we simply verify whether any other point P is contained in the diametral sphere:
d²(P, A) + d²(P, B) < d²(A, B) ,   (2)
where d is some metric, such as the well-known Euclidean distance. A graph in which all pairs of Gabriel neighbors are connected with an edge is called the Gabriel graph [5]. For a unit-speed curve on a surface, the length of the surface-tangential component of acceleration is the geodesic curvature kδ; a curve with kδ = 0 is called a geodesic. For data lying on a nonlinear manifold, the intrinsic distance between two data points is the geodesic distance on the manifold, i.e. the distance along the surface of the manifold, rather than the straight-line Euclidean distance, as Fig. 1 demonstrates. When using geodesic distance to capture the relations among the data points lying on a nonlinear manifold, formula (2) is replaced by:
g²(P, A) + g²(P, B) > g²(A, B) ,   (3)
where g is the geodesic distance. Two points A and B are called geodesic Gabriel neighbors (GGN) if, for all other points P in the set, formula (3) holds. A graph in which all pairs of GGNs are connected with an edge is called the geodesic Gabriel graph (GGG).
Fig. 1. Geodesic distance (solid line) and Euclidean distance (dashed line)
The basic idea of the geodesic distance estimation is that, for a neighborhood of points on a manifold, the Euclidean distances provide a good approximation to the geodesic distance, while for faraway points the geodesic distance is estimated by the length
of the shortest path through neighboring points. For a new test datum x_t, we first calculate the Euclidean distances between x_t and the training data, and the k closest training data form the neighborhood set Z(x_t). The geodesic distances between x_t and the other, faraway training data are estimated as:

g(x_t, x_i) = min_{j: x_j ∈ Z(x_t)} ( g(x_j, x_i) + d(x_t, x_j) ) .   (4)
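A minimal sketch of GGG construction under these definitions, approximating geodesic distances by shortest paths on a k-nearest-neighbor graph; the function name, the value of k and the brute-force pair test are our own choices, suitable only for small data sets.

import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import shortest_path

def geodesic_gabriel_neighbors(X, k=8):
    # X: (n points, dim) array; build a kNN graph weighted by Euclidean distance.
    n = len(X)
    d = cdist(X, X)
    graph = np.zeros_like(d)                 # zero entries = no edge
    for i in range(n):
        nn = np.argsort(d[i])[1:k + 1]
        graph[i, nn] = d[i, nn]
    g = shortest_path(graph, directed=False) # geodesic distance estimates
    pairs = []
    for a in range(n):
        for b in range(a + 1, n):            # test condition (3) against all others
            if all(g[p, a]**2 + g[p, b]**2 > g[a, b]**2
                   for p in range(n) if p not in (a, b)):
                pairs.append((a, b))
    return pairs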
3 Local Linear Classifier Learning

We first estimate the conditional probability of each example point for the local classifiers according to the geodesic distance between the example and the corresponding GGN, and then obtain the local classifier employing weighted regularized linear discriminant analysis (WRLDA). Let {(x_1^k, x_2^k) | k = 1, 2, ..., K} be a set of GGNs that lie on a low-dimensional manifold embedded in a high-dimensional observed space. For each GGN (x_1^k, x_2^k), a local classifier f_k is given. The conditional probability for f_k of an arbitrary datum x on this manifold can be obtained as [6]:

p(f_k | x) = p_k(x) / Σ_{j=1}^{K} p_j(x) ,   p_k(x) = exp(−α_k(x)) ,   (5)
where α_k(x) is the activity signal of the datum for the k-th local classifier, which can be estimated with the width constant t:

α_k(x) = ( g(x, x_1^k) + g(x, x_2^k) )² / t .   (6)
In order to construct the local classifier f_k, each example datum is represented by a feature vector of its geodesic distances to the GGN (x_1^k, x_2^k):

x_i → z_i = [ g(x_i, x_1^k), g(x_i, x_2^k) ]^T .   (7)
The global intra-class scatter matrix can be represented as

S_W = w^T ( Σ_{j=1,2} (Z_j − Z̄_j)(Z_j − Z̄_j)^T ) w = w^T M_W w ,   (8)

where w is the projection mapping matrix, the i-th column vector of Z_j ∈ R^{2×N_j} (N_j is the number of examples belonging to the j-th class) is Z_j(i) = z_i · p(f_k | x_i), and each column vector of Z̄_j is the weighted mean of the examples of the j-th class. The global inter-class scatter matrix can be represented as

S_B = w^T ( Σ_{j=1,2} N_j (Z̄_j − Z̄)(Z̄_j − Z̄)^T ) w = w^T M_B w ,   (9)

where Z̄ is the weighted mean of all the examples.
In traditional LDA, the Fisher criterion is used to find the optimal projection that maximizes the ratio of the determinant of the inter-class scatter matrix S_B to that of the intra-class scatter matrix S_W. However, it is well known that the applicability of the Fisher criterion to high-dimensional pattern classification tasks such as face recognition often suffers from the so-called "small sample size" (SSS) problem, arising from the small number of available training samples compared to the dimensionality of the sample space [7]. Considering the possible distributional sparseness of the example data in real-world applications, the regularized Fisher criterion is adopted:

arg max_w ( w^T M_B w ) / ( λ w^T M_B w + w^T M_W w ) ,   (10)
where 0 ≤ λ ≤ 1 is a regularization parameter. This problem has a closed-form solution and can be computed directly using the eigen-decomposition of the transformed matrices of M_B and M_W [7]. Thereafter, the corresponding local classifier is obtained as f_k(x_i) = w^T z_i.
4 Local Classifiers Merging Based on Soft Margin Criterion

A global nonlinear classifier is obtained by merging the local classifiers:

F(x) = Σ_{k=1,...,K} α_k p(f_k | x) f_k(x) ,   (11)
where f_k denotes the k-th local classifier and α_k is the corresponding weight coefficient. The Fisher criterion and some of its variations, like formula (10), are powerful in many linear classification cases for finding the optimal α = {α_k} under the assumption that the data classes are Gaussian with equal covariance structure. Unfortunately, this assumption is not always true for data lying on a complex nonlinear manifold, e.g. face images [8]. Margin-based optimization methods concern the margins of the examples, i.e. the difference between the number of correct votes and the maximum number of votes received by any incorrect label, without any limitation on the examples' distribution [9]. Following this basic idea of margin, our merging algorithm maximizes the minimal margin by updating the weight coefficients stepwise, with the soft margin criterion:

arg min_α Σ_i exp(−y_i F(x_i)) ,   (12)
where F is the global classifier. Given the training example set {x_i | i = 1, 2, ..., N}, the procedure of the optimization algorithm is as follows:
1) Start with weights w_i = 1/N, i = 1, 2, ..., N.
2) Repeat for m = 1, 2, ..., M:
   a) With {w_i}, calculate the weight coefficients {α_k^m} by using WRLDA;
   b) Construct the classifier F_m(x) according to formula (11);
   c) Update w_i ← w_i exp(−y_i F_m(x_i)), i = 1, 2, ..., N, and renormalize so that Σ_i w_i = 1.
3) Output the final result α_k = Σ_m α_k^m.
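A minimal sketch of this merging loop; the names merge_local_classifiers and fit_wrlda, and the matrix of precomputed local outputs, are our own abstractions of the procedure above.

import numpy as np

def merge_local_classifiers(fit_wrlda, local_outputs, y, M=20):
    # local_outputs: (N, K) matrix with entry [i, k] = p(f_k|x_i) f_k(x_i);
    # y: labels in {-1, +1}; fit_wrlda(weights) returns the K-vector alpha^m.
    N, K = local_outputs.shape
    w = np.full(N, 1.0 / N)                  # step 1
    alpha = np.zeros(K)
    for m in range(M):                       # step 2
        alpha_m = fit_wrlda(w)               # 2a: WRLDA under current weights
        F_m = local_outputs @ alpha_m        # 2b: classifier of Eq. (11)
        w *= np.exp(-y * F_m)                # 2c: reweight and renormalize
        w /= w.sum()
        alpha += alpha_m                     # accumulate for step 3
    return alpha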
5 Experiments

In order to evaluate the proposed algorithm, experiments on face recognition with real face images are presented in this section. We tested the proposed algorithm against the Eigenface, Fisherface, Isomap and LLE methods using the publicly available Yale (http://cvc.yale.edu/projects/yalefaces/yalefaces.html) and UMIST (http://images.ee.umist.ac.uk/danny/database.html) databases. There are in total 165 face images belonging to 11 objects in the Yale database, varying slightly in lighting conditions, facial expression and facial details. The face blocks were cropped, aligned, and then scaled to 28 (height) × 24 (width). For each object, 7 images were selected randomly as training examples and the other 8 images as test examples. The UMIST database contains 564 images belonging to 20 objects varying in pose. For each object, 10 images from frontal to profile were selected, comprising 5 training examples and 5 test examples with resolution 28 (height) × 23 (width). The nearest neighbor rule was applied to classify the examples after dimension reduction for the Fisherface, Isomap and LLE methods. Some examples from these two databases and the experimental results are shown in Fig. 2. It is obvious that our algorithm improves the detection rate significantly.
[Bar charts of error rate (%) for Eigenface, Fisherface, Isomap, LLE and our algorithm on (a) Yale and (b) UMIST]
Fig. 2. Examples and experimental results
6 Conclusions

In this paper, we present a GGG-based algorithm for supervised learning on nonlinear manifolds. This algorithm utilizes geodesic distance instead of Euclidean distance to capture the global geometry of the data points lying on
some sort of manifold. The GGG then segments the whole manifold into key local regions, so the global nonlinear classification problem is replaced by local linear classification problems. Multiple local classifiers are merged into a global classifier using the soft margin criterion, without any limitation on the example data's distribution. Experiments on synthesized data and real face images justify the effectiveness of this algorithm. We are going to apply this algorithm to other real-world computer vision tasks, including expression detection and face recognition under pose variation.
Acknowledgments This work is supported by a grant from the Natural Science Fund for distinguished scholars of Zhejiang Province (No.R105341).
References
1. Roweis, S.T., Saul, L.K.: Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science 290 (2000) 2323–2327
2. Tenenbaum, J.B., de Silva, V., Langford, J.C.: A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science 290 (2000) 2319–2323
3. Wu, Y.M., Chan, K.L.: An Extended Isomap Algorithm for Learning Multi-Class Manifold. IEEE International Conference on Machine Learning and Cybernetics, Shanghai, 26-29 August, 6 (2004) 3429–3433
4. Chen, H.-T., Chang, H.-W., Liu, T.-L.: Local Discriminant Embedding and Its Variants. Computer Vision and Pattern Recognition 2005, IEEE Computer Society, 2 (2005) 846–853
5. Zhang, W., King, I.: A Study of the Relationship between Support Vector Machine and Gabriel Graph. International Joint Conference on Neural Networks 1 (2002) 239–244
6. Roweis, S., Saul, L., Hinton, G.: Global Coordination of Local Linear Models. Advances in Neural Information Processing Systems 14 (2001) 889–896
7. Lu, J.W., Plataniotis, K.N., Venetsanopoulos, A.N.: Regularization Studies of Linear Discriminant Analysis in Small Sample Size Scenarios with Application to Face Recognition. Pattern Recognition Letters 26 (2005) 181–191
8. Kim, T.K., Kittler, J.: Locally Linear Discriminant Analysis for Multimodally Distributed Classes for Face Recognition with a Single Model Image. IEEE Trans. on Pattern Analysis and Machine Intelligence 27 (2005) 318–327
9. Schapire, R., Freund, Y., Bartlett, P., et al.: Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods. Annals of Statistics 26 (1998) 1651–1686
Grouping Sampling Reduction-Based Linear Discriminant Analysis

Yan Wu and Li Dai

Dept. of Computer Science and Technology, Tongji University, 1239 Si Ping Road, Shanghai 200092, P.R. of China
[email protected]
Abstract. This paper proposes a new feature extraction method called grouping sampling reduction-based linear discriminant analysis. It solves the small sample size problem by using grouping sampling reduction to generate virtual samples with a larger number and lower dimension than the original samples. The experimental results show its efficiency and its high recognition rate.
1 Introduction

Linear Discriminant Analysis (LDA) is a classical feature extraction method and has been widely used in pattern recognition. Traditional LDA is only capable of solving problems in which the within-class scatter matrix is nonsingular. However, in some problems, such as face recognition, the dimension of the sample space is so high that it is difficult or impossible to find enough samples to make the within-class scatter matrix nonsingular. This is called a small sample size problem. How to extract the optimal Fisher discriminant features in a small sample size problem is an acknowledged difficulty that has aroused wide interest in recent years. Different methods have been proposed to solve it. They can be divided into two sorts:
1. From the view of the algorithm, develop a new LDA-based algorithm. Yu and Yang [1] proposed Direct LDA (DLDA). It removes the null space of the between-class scatter matrix and seeks the optimal discriminant vectors in the remaining subspace. Another method, called the null space method [2], first removes the null space of the total scatter matrix, then projects samples onto the null space of the within-class scatter matrix, and finally removes the null space of the between-class scatter matrix to get optimal classifying performance. Two-dimensional FLD (2DFLD) [3], proposed recently, directly projects the image matrix under a specific projection criterion, rather than using the stretched image vector.
2. From the view of the pattern samples, preprocess the samples to reduce their dimension or increase their number before performing LDA. Belhumeur [4] proposed Fisherface, which performs principal component analysis (PCA) to reduce the dimension of the samples and then performs LDA in the lower-dimensional space. Another method, called the sample-set-enlarging method, adds new samples to the original sample set to increase the number of samples [5]. Based on this method, we propose grouping sampling reduction-based linear discriminant analysis.
2 Linear Discriminant Analysis

Suppose n d-dimensional samples x1, ..., xn belong to c different categories, and the subset Di of ni samples belongs to category ωi (i = 1, 2, ..., c). The within-class scatter matrix SW and the between-class scatter matrix SB are defined as follows:

SW = Σ_{i=1}^{c} Σ_{x∈Di} (x − mi)(x − mi)^t = QW QW^t   (1)

SB = Σ_{i=1}^{c} ni (mi − m)(mi − m)^t = QB QB^t   (2)

In (1) and (2), mi = (1/ni) Σ_{x∈Di} x is the mean vector of Di, and m = (1/n) Σ_x x = (1/n) Σ_{i=1}^{c} ni mi is the mean vector of the whole set. The goal of LDA is to find an optimal projection W:

W = arg max_W |W^t SB W| / |W^t SW W|   (3)
It has been proved that the column vectors of W are the eigenvectors corresponding to the maximum eigenvalues of SW^{-1} SB when SW is nonsingular. It is easy to prove that SW is a d × d matrix whose rank is not more than n − c. In practical face recognition problems, the number of samples is much smaller than the dimension of the sample space, so SW is singular and the optimal matrix W cannot be extracted directly by calculating the eigenvectors of SW^{-1} SB. This is the so-called small sample size problem.
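For the nonsingular case just described, classical LDA reduces to a generalized eigenproblem; the following sketch (the function name and the SciPy usage are our own choices) computes W only when SW is invertible.

import numpy as np
from scipy.linalg import eigh

def lda_projection(X, y, n_components):
    # X: (n samples, d) data; y: class labels; requires S_W nonsingular.
    classes = np.unique(y)
    m = X.mean(axis=0)
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)                 # Eq. (1)
        Sb += len(Xc) * np.outer(mc - m, mc - m)      # Eq. (2)
    # Generalized eigenproblem Sb w = lambda Sw w, cf. Eq. (3).
    vals, vecs = eigh(Sb, Sw)
    return vecs[:, np.argsort(vals)[::-1][:n_components]]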
3 Grouping Sampling Reduction-Based LDA

The sample-set-enlarging method increases the number of samples in order to eliminate the singularity of the within-class scatter matrix by adding enough virtual samples to the training sample set. Virtual samples are generated from the original samples by simple geometric transformations such as rotation, translation, vertical mirroring and scaling. The traditional sample-set-enlarging method [5][6][7] usually generates virtual samples by rotation or vertical mirroring transformations, which never change the sample's size. So the virtual samples have the same dimension as the original samples and can be added directly to the original sample set. Since the dimension is not reduced and the number of original training samples is much smaller than the dimension of the sample space, the traditional sample-set-enlarging method has to generate numerous virtual samples to make the within-class scatter
matrix full rank, and it performs LDA in the original high-dimensional sample space, resulting in a waste of storage space and high computational complexity. A good way to overcome the limitations of the traditional sample-set-enlarging method is to generate lower-dimensional virtual samples. Considering that LDA should be performed on a sample set in which all samples have the same dimension, the lower-dimensional virtual samples cannot be added directly to the original sample set; instead they replace the original samples to compose a new sample set, and LDA is then performed on the new sample set. Because the original samples are removed and LDA is performed entirely on the virtual samples, the virtual samples should retain the discriminant information of the original samples; only in this way is it possible to extract the optimal feature. To sum up, the generated virtual samples should satisfy these conditions: they should have a larger number and lower dimension than the original samples, so as to make the within-class scatter matrix nonsingular, and they should retain the original samples' information, so as to ensure extraction of the optimal feature. Sampling reduction is a method to reduce a digital image. To reduce an image to 1/N² of its original size, it segments the image into a group of subdomains of size N × N and assigns the values of the pixels at the top left corner of each subdomain to the pixels of the reduced image. In other words, it samples the original image, skipping N − 1 pixels between samples in the x and y directions. Since only one pixel in a subdomain is taken as representative and all the other N² − 1 pixels are lost, the reduced image does not hold enough information about the original image.
Fig. 1. (a) The original human face, (b) The result of grouping sampling reduction when N=2, (c) The result of grouping sampling reduction when N=4
If each pixel of a subdomain takes its turn to be the representative pixel of the reduced image, we get N² different reduced images. In this way, from one original sample we can generate a group of N² virtual samples whose dimension is reduced to 1/N² of the dimension of the original sample. We call this method grouping sampling reduction, and N the enlarging factor. Although a single virtual sample cannot hold much information about the original sample, the group of N² virtual samples generated from an original sample by grouping sampling reduction holds nearly all of its information. For example, in Fig. 1 we can get as much discriminant information from (b) and (c) as from (a). As mentioned in Section 2, SW is a d × d matrix and its rank is not more than n − c, so the condition d ≤ n − c must be satisfied for SW to be nonsingular.
Since the original sample set does not satisfy this condition, we use grouping sampling reduction to generate virtual samples, and then replace the original samples with the virtual samples to compose a new sample set. In the new sample set, the number and the dimension of the samples are nN² and d/N², respectively. So the within-class scatter matrix SW′ constructed from the new sample set is a (d/N²) × (d/N²) matrix with rank(SW′) ≤ nN² − c. The necessary condition to make SW′ nonsingular is

d / N² ≤ nN² − c   (4)
Obviously, SW′ will be nonsingular if N is large enough, but larger is not necessarily better: because grouping sampling reduction distributes the pixel information of an original sample among the virtual samples generated from it, if N is too large the pixel information is distributed too thinly to yield the discriminant information. So it is best to assign N the minimum integer that makes SW′ nonsingular, and we choose the minimum integer satisfying (4) as the value of N. Although (4) is only a necessary condition, the value of N chosen in this way usually makes SW′ nonsingular. However, exceptional cases must exist, since the rank of SW′ is not known before training and may be any integer not more than nN² − c. In an exceptional case, to make SW′ nonsingular, an extra 1 is added to the value of N. Grouping sampling reduction can generate virtual samples with a larger number and lower dimension than the original samples and eliminate the singularity of the within-class scatter matrix by choosing an appropriate value of the enlarging factor N. The generated virtual samples hold nearly all the discriminant information of the original samples and can represent human face patterns. So, after grouping sampling reduction, we can get the optimal discriminant feature by directly performing traditional LDA on the new sample set composed entirely of virtual samples. Ultimately, the grouping sampling reduction-based LDA we propose in this paper solves the small sample size problem effectively. In addition, in the testing period we again first use grouping sampling reduction to generate a group of virtual samples from an original testing sample, then run the recognition algorithm on the group of virtual samples to get each virtual sample's category, and finally count the virtual samples belonging to each category and take the category with the maximum count as the original testing sample's category.
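A minimal NumPy sketch of grouping sampling reduction itself; the function name is ours, and the image sides are assumed divisible by N.

import numpy as np

def grouping_sampling_reduction(img, N):
    # img: (H, W) array with H and W divisible by N.  Virtual sample (r, c)
    # keeps pixel (r, c) of every N x N subdomain, so each of the N^2 samples
    # has 1/N^2 of the original dimension, and together they keep every pixel.
    return [img[r::N, c::N] for r in range(N) for c in range(N)]

For example, with N = 2, a hypothetical 28 × 24 image yields four 14 × 12 virtual samples.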
4 Experimental Results

The ORL face image database is used to compare our method with PCA, Fisherface, DLDA, the null space method and 2DFLD. This database contains 40 persons, each having 10 different images varying slightly in lighting, facial expression and facial details. We test the recognition rates with different numbers of training samples: k (k = 2, ..., 8) images of each person are randomly selected from the database for
training and the remaining 10 − k images of each person for testing. For each value of k, 20 runs are performed with different partitions between the training set and the testing set. No preprocessing is done, and the final dimension is chosen to be 39 (c − 1). The nearest neighbor algorithm under the Euclidean distance is employed to classify the test images. Table 1 shows the average recognition rates.

Table 1. Recognition rates (%) on the ORL database
k   PCA       Fisherface  DLDA      Null space method  2DFLD     Our method
2   80.6094   80.6094     83.8125   85.9688            85.7031   87.7656
3   87.4286   90.9107     89.9107   90.6071            90.7143   92.6607
4   91.8333   94.4792     93.3750   94.5000            93.6458   95.5417
5   93.7250   95.3500     94.8250   95.7000            95.4250   97.0750
6   95.5938   96.6875     96.2188   96.8125            96.2813   97.8125
7   96.4167   97.0000     96.8333   97.0000            97.0833   98.2917
8   97.6250   97.4375     97.6875   97.6250            97.6250   98.8125
Fig. 2. Comparison of the six methods’ recognition rates over variation of k
Fig. 2 shows that for each value of k, the recognition rate of our method is the highest. From Table 1, we can see that the recognition rate of our method is about 1% higher than those of the other methods. Especially when k is small (k = 2, 3), the recognition rate of our method is about 2% higher than those of the other methods. The experimental results show that our method outperforms the other methods on the whole, especially when the number of training samples is small, which is often the case in face recognition and other pattern recognition tasks.
5 Conclusions

Grouping sampling reduction based LDA, proposed in this paper, is a new and effective method for solving the small sample size problem. It first uses grouping sampling reduction to generate virtual samples which are greater in number and lower in dimension than the original samples and can represent human face patterns, then uses these virtual samples to compose a new sample set, and finally performs LDA on the new sample set to extract the optimal feature. Since it increases the number of samples and reduces their dimension at the same time, it has the following two advantages over other methods of solving the small sample size problem:

1. Compared with the dimension-reducing methods, it increases the number of samples, so it only needs to reduce the dimension to a smaller extent, which decreases the loss of discriminant information.
2. Compared with the traditional sample-set-enlarging methods, it reduces the dimension of the samples, so it only needs to generate fewer virtual samples and performs LDA on a low-dimensional sample space. Therefore, it saves storage space and reduces computational complexity.
References
1. Yu, H., Yang, J.: A Direct LDA Algorithm for High-Dimensional Data with Application to Face Recognition. Pattern Recognition, 34 (10) (2001) 2067-2070
2. Huang, R., Liu, Q. S., Lu, H. Q., Ma, S. D.: Solving the Small Sample Size Problem of LDA. ICPR'02 (2002) 29-32
3. Xiong, H. L., Swamy, M. N. S., Ahmad, M. O.: Two-dimensional FLD for Face Recognition. Pattern Recognition, 38 (2005) 1121-1124
4. Belhumeur, P. N., Hespanha, J. P., Kriegman, D. J.: Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19 (7) (1997) 711-720
5. Marcel, S.: A Symmetric Transformation for LDA-based Face Verification. In: Proc. Int. Conf. Automatic Face and Gesture Recognition (AFGR), Seoul, Korea (2004)
6. Beymer, D., Poggio, T.: Face Recognition from One Example View. In: Proceedings of the Fifth International Conference on Computer Vision (ICCV), Cambridge (1995) 500-507
7. Torres, L., Vilà, J.: Automatic Face Recognition for Video Indexing Applications. Pattern Recognition, 35 (2002) 615-625
Hierarchical Adult Image Rating System*

Wonil Kim1, Han-Ku Lee2,**, and Kyoungro Yoon3

1 College of Electronics and Information Engineering at Sejong University, Seoul, Korea
[email protected]
2 School of Internet and Multimedia Engineering at Konkuk University, Seoul, Korea
[email protected]
3 School of Computer Science and Engineering at Konkuk University, Seoul, Korea
[email protected]
Abstract. Although the popularity and improvement of the Internet, with the explosive proliferation of multimedia content, have brought us the era of digital information, this unexpected popularity also has its own dark side. Every day, young children are exposed to images that should not be delivered to them. In this paper, we propose an adult image rating system that properly classifies an image into one of multiple classes such as swimming suit, topless, all nude, sex image, and normal. The simulation results show that the proposed system successfully rates images into multiple classes with a success rate of over 70%.
1 Introduction

In this paper, we propose an image rating system that rates images according to their harmfulness to minors. The development of the Internet has made people's lives much more convenient than ever before; however, it has also brought its own dark sides. Among the millions of Web sites, more than 500,000 are related to adult content that children should never see [1, 6].

The proposed system uses MPEG-7 descriptors as the main input features for the classification task. We first analyze several MPEG-7 descriptors and then create a prototype that extracts image features from adult image data. Using these descriptors, we rate adult images into multiple classes via an effective hierarchical image classification technique. The proposed adult image rating system employs multiple neural network modules, each performing a binary classification task. The simulation results show that the MPEG-7 descriptors can be effectively used as the main features of the image classification process. The proposed system indeed successfully rates images into multiple classes, namely swim suit, topless, nude, sex, and normal images, with a success rate of over 70%.

In the next section, we discuss common approaches to adult image detection. The proposed Hierarchical Adult Image Rating System is explained in Section 3. The simulation environment and the results are shown in Section 4. Section 5 concludes.

* This paper is supported by the Seoul R&BD program.
** Author for correspondence: +82-2-2049-6082.
2 Adult Image Detection

Research on automated adult image detection started with the popularity of the World Wide Web (WWW) in the mid 90's. In the early days, most algorithms detected the existence of naked people in images [2, 7]. The main idea of this research is as follows: skin regions are effectively masked using colors and textures (the skin filter), and if the skin regions that pass the mask tests match a person's figure, it is assumed that there are naked parts in the image (the geometric filter). Such an algorithm tries to detect whether or not the filtered skin color matches a specific body part in the whole image, rather than extracting primitive features that help effective classification [3].

The most common method in adult image classification is based on the color histogram, which is an effective feature for large amounts of data. A simple experiment based on the color histogram results in a detection rate of 80% and a false positive rate of 8.5% [4]. A feature extraction technique using MPEG-7 descriptors for adult image classification was proposed by Yoo in 2003 [5]. He used three descriptors that are effective for adult image classification among the various standardized MPEG-7 visual descriptors. His system compares the descriptor value of a given image to those of images from the database, and retrieves the 10 most similar images with their class information. The given image is assigned the class to which the majority of the retrieved images belong. Even though the results showed that MPEG-7 descriptors can be used as very efficient features in adult image classification, the classification method used in his paper is a simple k-nearest neighbor method.

Thanks to the blooming of the data mining field, image classification techniques based on statistical methods have also improved greatly, and a new field of study called image mining was born [8]. Thus, many research groups study the field of image classification via image mining techniques. Image classification technologies can be categorized as the Neural Network, the Decision-Tree Model, Bayesian classification and the Support Vector Machine.

An Artificial Neural Network (ANN) is an information processing paradigm inspired by biological nervous systems, such as the brain processing information. The key element of this paradigm is the novel structure of the information processing system. In the adult image rating system, the network concentrates on the decision-boundary surface to distinguish adult images from non-adult images via the computer-based classification rule called the perceptron [9, 10]. It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. ANNs, like people, learn by example. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process; learning in biological systems involves adjustments to the synaptic connections that exist between the neurons.

In this paper, we employ a neural network for the classification module, which is structured hierarchically. Inputs to the neural network are fed from the feature values extracted from MPEG-7 descriptors [11]. Since each descriptor represents specific features of a given image, a proper evaluation process to choose the best one for adult image classification is required.
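A minimal sketch of the retrieval-plus-majority-vote scheme described above for Yoo's system; the Euclidean descriptor distance and the data layout are illustrative assumptions, not details given in [5].

```python
import numpy as np

def knn_majority_class(query_desc, db_descs, db_labels, k=10):
    """Retrieve the k most similar database images by descriptor distance and
    assign the query image the class of the majority of the retrieved images."""
    dists = np.linalg.norm(db_descs - query_desc, axis=1)  # distance to every database image
    nearest = np.argsort(dists)[:k]                        # indices of the k most similar
    labels, counts = np.unique(db_labels[nearest], return_counts=True)
    return labels[np.argmax(counts)]                       # majority class of retrieved set
```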
3 The Proposed Hierarchical Adult Image Rating System

3.1 The Proposed Architecture

Figure 1 illustrates the classification flow of the proposed Hierarchical Adult Image Rating System. The proposed system consists of two stages: the Feature Extraction Module and the Neural Network Classifier Module. Features defined in the MPEG-7 descriptors are extracted for the given query images, and then used as inputs for the classifier module.
Fig. 1. Classification flow of the Hierarchical Adult Image Rating System
The hierarchical neural network classifier module categorizes a given image into one of five classes: swim suit images (S), topless images (T), nude images (N), sex images (X), and normal images (I). By using a multi-module hierarchical structure, each stage can employ the descriptors most suitable for its classification task, such as Color Layout for color, Homogeneous Texture for texture, or Region Shape for shape. This is based on the fact that classifying normal versus adult-related images is heavily dependent on color information, whereas classifying nude and sexual images is likely to depend on shape or texture descriptors. At the top decision stage, the system classifies images into adult-related images and normal images. In the next stage, images classified as adult-related are again categorized into two classes: topless/swim suit images for category 1, and nude/sexual images for category 2. These categorized images, which are all adult-related, are further classified in the next stages: category 1 images are classified as either topless or swim suit images, and category 2 images as either nude or sexual images.

3.2 Feature Extraction and Classification Module

The features of the training images are extracted in XML format through the execution of the MPEG-7 XM software. This feature information in XML form is parsed in the
next step and is normalized into values between 0 and 1 with respect to the values generated by each descriptor. These normalized values are used as inputs to the neural network classifier. The class information attached to the feature values differs depending on the stage in which it is used. The four modules used for classification each employ a neural network architecture. Each neural network classifier learns the relation between the feature values and the corresponding class by modifying the weight values between nodes. We use the back-propagation algorithm to train the networks. The number of input nodes depends on the dimension of each descriptor, whereas the number of output nodes is two. The class information for the two output nodes is represented as (1, 0) or (0, 1) depending on the stage and images, as mentioned above. In the testing process, as in the training process, the system extracts features from the query images using MPEG-7 descriptors, and classifies the images using the neural networks generated in the training process. The four modules are connected hierarchically, starting from Module 1 (normal vs. adult-related images), and are then traversed down if necessary.
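The hierarchical traversal of the four modules can be sketched as below, where each `modules[...]` entry stands for a trained binary network returning the index of its winning output node, and `feats[...]` holds the (possibly different) descriptor features each module uses; these names are illustrative assumptions, not the authors' code.

```python
def rate_image(feats, modules):
    """Route an image through the four hierarchically connected binary modules."""
    if modules["m1"](feats["m1"]) == 0:       # Module 1: normal vs. adult-related
        return "normal"
    if modules["m2"](feats["m2"]) == 0:       # Module 2: category 1 vs. category 2
        # category 1: topless vs. swim suit (Module 3)
        return "topless" if modules["m3"](feats["m3"]) == 0 else "swim suit"
    # category 2: nude vs. sex (Module 4)
    return "nude" if modules["m4"](feats["m4"]) == 0 else "sex"
```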
4 Simulation

A total of 4,000 images (500 swim suit images, 500 topless images, 500 nude images, 500 sex images, and 2,000 normal images) are used for training, and 800 images (100 swim suit images, 100 topless images, 100 nude images, 100 sex images, and 400 normal images) are used for testing. The system employs five visual descriptors for feature values: Color Layout (12), Color Structure (256), Edge Histogram (80), Homogeneous Texture (30), and Region Shape (35), where the values in parentheses indicate the input dimension. The inputs consist of normalized MPEG-7 descriptor values. Table 1 shows the overall performance of each classification module. Table 2 presents the test results of each descriptor on the 800 images that were not used in the training process. In the case of the Color Layout, the Edge Histogram, and the Region Shape, the results are better than for the rest of the descriptors. We reason that each descriptor has its own advantages and disadvantages when used as feature (input) values for the network. For example, the Color Layout descriptor provides an excellent result in the adult vs. normal image classification, but produces very poor results in the topless vs. swim suit classification, degrading the overall performance. This reflects the fact that classifying normal and adult-related images depends heavily on color information, whereas classifying nude and sexual images is likely to depend on the Region Shape descriptor. It would be a better strategy to use different descriptors, or even combinations of different descriptors, in each module. Finding the best combination would be very difficult if we were to consider the cases for many descriptors. Moreover, the best combination heavily depends on the domain of images: a strategy that works fine on one domain, such as sports image classification, may not work for other domains, such as flower image classification. The rest of the descriptors can be used effectively for other domains.
Table 1. Classification rates of each Module

Module 1 (Adult Images / Normal Images)
                Adult Images   Normal Images
Adult Images    71.4 %         28.6 %
Normal Images   14.5 %         85.5 %

Module 2 (Topless, Swim suit / Nude, Sex)
                     Topless/Swim suit   Nude/Sex
Topless/Swim suit    73 %                27 %
Nude/Sex             23 %                77 %

Module 3 (Topless / Swim suit)
             Topless   Swim suit
Topless      75 %      25 %
Swim suit    32 %      68 %

Module 4 (Nude / Sex)
        Nude    Sex
Nude    87 %    13 %
Sex     10 %    90 %
Table 2. Test Results of the Hierarchical Classifier using each descriptor (columns: S = Swimsuit, T = Topless, N = Nude, X = Sex, I = Normal)

Descriptor            Class  S      T      N      X      I       Total
Color Layout          S      42     21     21     9      7       Total = 800
                      T      17     42     15     12     14      Correct = 589
                      N      11     12     64     6      7       Error = 211
                      X      8      7      7      74     4       % = 73.625
                      I      2.25   3.25   1.5    1.25   91.75
Color Structure       S      34     19     17     14     16      Total = 800
                      T      24     22     20     22     12      Correct = 521
                      N      6      7      66     11     10      Error = 279
                      X      18     13     4      54     11      % = 65.125
                      I      5.75   5.25   0.75   2      86.25
Edge Histogram        S      56     19     10     11     4       Total = 800
                      T      20     38     11     15     16      Correct = 563
                      N      15     7      53     10     15      Error = 237
                      X      13     21     9      50     7       % = 70.375
                      I      1.5    2.25   2.5    2.25   91.5
Homogeneous Texture   S      61     14     7      4      14      Total = 800
                      T      21     39     11     8      21      Correct = 538
                      N      11     8      61     7      23      Error = 262
                      X      7      11     9      61     12      % = 67.25
                      I      2.75   8      5.25   5      79
Region Shape          S      55     11     5      2      27      Total = 800
                      T      11     55     8      3      23      Correct = 553
                      N      5      13     57     10     15      Error = 247
                      X      3      5      4      70     18      % = 69.125
                      I      7.25   5      3.75   5      79
5 Conclusion

This paper proposes a hierarchical adult image rating system using neural networks. The system consists of four modules organized in a hierarchical structure. Each module learns its corresponding classification task according to the feature values extracted from MPEG-7 descriptors; the selected MPEG-7 descriptors are used as inputs of the network. The system classifies images into multiple classes (5 classes), and the simulation shows that it achieved a success rate above 70% for this hard task. It is also shown that using different descriptors, or even combinations of different descriptors, in each module is a better strategy for the classification task, although finding the best combination would be very difficult if we were to consider the cases for many descriptors. Consequently, the proposed multi-module hierarchical classification system is useful for well-defined features, and the framework can be effectively used as the kernel of web content rating systems.
References
1. Arentz, W., Olstad, B.: Classifying Offensive Sites Based on Image Contents. Computer Vision and Image Understanding, Vol. 94 (2004) 293-310
2. Fleck, M., Forsyth, D., Bregler, C.: Finding Naked People. Proc. 1996 European Conference on Computer Vision, Vol. 2 (1996) 592-602
3. Jones, M., Rehg, J.: Statistical Color Models with Application to Skin Detection. Technical Report Series, Cambridge Research Laboratory (1998)
4. Jiao, F., Gao, W., Duan, L., Cui, G.: Detecting Adult Image Using Multiple Features. Proc. IEEE Conference, Vol. 3 (2001) 378-383
5. Yoo, S.: Intelligent Multimedia Information Retrieval for Identifying and Rating Adult Images. Lecture Notes in Computer Science, Vol. 3213. Springer-Verlag, Berlin Heidelberg, New York (2004) 165-170
6. Hammami, M., Chahir, Y., Chen, L.: WebGuard: Web Based Adult Content Detection and Filtering System. Proc. IEEE/WIC International Conference on Web Intelligence (2003) 574-578
7. Forsyth, D., Fleck, M.: Identifying Nude Pictures. Proc. IEEE Workshop on the Applications of Computer Vision (1996) 103-108
8. Hand, D., Mannila, H., Smyth, P.: Principles of Data Mining. MIT Press (2001) 343-347
9. Rosenblatt, F.: The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological Review 65 (1958) 386-408
10. Rumelhart, D., Hinton, G., Williams, R.: Learning Representations by Back-Propagating Errors. Nature (London), Vol. 323 (1986) 533-536
11. Kim, W., Lee, H., Yoo, S., Baik, S.: Neural Network Based Adult Image Classification. Lecture Notes in Computer Science, Vol. 3696. Springer-Verlag, Berlin Heidelberg, New York (2005) 481-486
Shape Representation Based on Polar-Graph Spectra

Haifeng Zhao1, Min Kong1,2, and Bin Luo1

1 Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei 230039, P.R. China
[email protected], [email protected]
2 Department of Machine and Electron Engineering, West Anhui University, Liuan 237012, P.R. China
Abstract. In this paper, a new shape representation method is proposed. We carry out our strategy in three steps. First, we calculate the centroid and the centroid distances of an image contour. Then, based on a polar coordinate system, contour points are selected to construct a graph, which we call a Polar-Graph. The spectra of these graphs are finally organized as feature vectors for later clustering or retrieval. Our experiments show that the proposed representation is invariant to scale, translation and rotation, and is insensitive to slight distortion and occlusion to some degree.
1 Introduction
The rapid development of digital technology and the Internet has led to huge and ever-growing archives of image, audio and video data, which requires people to handle a large amount of visual information. Shape is one of the most important features representing the visual content of an image. However, shape retrieval is still a very difficult task because of the ambiguous and incomplete information about shape available in an image. Shape representation is the key step for shape applications. Barrow et al. [1] and Fischler et al. [2] were among the first to demonstrate the potential of relational graphs as abstractions for pictorial information. Since then, graph-based representations have been exploited widely for the purposes of shape representation, segmentation, matching and retrieval [3],[4],[5],[6]. However, there are two main problems with graph-based shape representation. One problem is how to construct the graph from the shape contour: if we try to build a graph with all contour points, the resulting graph proves too complex to be useful. The other problem is to measure the similarity of large sets of graphs, which hinders the application of graph methods. The graph-spectral approach is an efficient way to solve this problem [7].

We aim to construct a graph directly from a shape. The centroid is an invariant of the shape. After calculating the centroid and the centroid distances of the object shape, we construct the graph by selecting contour points based on a polar coordinate
Corresponding author. Tel./fax: 086-551-5108445.
system. The graph representation also provides scale invariance by itself. Then the spectra of these graphs are obtained as feature vectors for clustering or retrieval. Our experiments show that the proposed representation is invariant to scale, translation and rotation, and is robust even to slight distortion and occlusion. Furthermore, according to the polar angle of the selected contour points, the shape descriptor can be used as a kind of hierarchical coarse-to-fine representation.
2 Polar Graph
Edge detection is the preprocessing step to obtain the shape contour, and its accuracy and reliability are critical to the shape representation. In our implementation, we use the Canny edge detection algorithm [8], known to many as the optimal edge detector, to extract the contour points. First, we calculate the centroid of the object as a reference point. Next, relative to the centroid, the distance and angle of the contour points are computed for the polar space. Because we place the pole at the centroid of the object, the representation as a distance sequence is translation invariant. To achieve rotation invariance, we associate the maximum distance with the angle of zero and normalize all the angles of the contour points.

Locating the centroid of a contour is key for shape representation. In order to obtain invariance to translation, rotation and scaling, the geometric center of the object is selected as the reference point in each image. We use the average value of the contour coordinates (x_i, y_i) to compute the centroid (x_c, y_c):

$$x_c = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad y_c = \frac{1}{N}\sum_{i=1}^{N} y_i$$

where N is the number of all contour points. The centroid distance (rho_i) is the distance of the contour point (x_i, y_i) to the centroid (x_c, y_c):

$$\rho_i = (\Delta x^2 + \Delta y^2)^{1/2}, \quad \text{where } \Delta x = (x_i - x_c) \text{ and } \Delta y = (y_i - y_c) \qquad (1)$$
The centroid angle (theta_i) is the anti-clockwise angle between the centroid distance line and the x-axis, calculated by equation (2):

$$\theta_i = \begin{cases} -\arctan(\Delta x/\Delta y) & \text{if } \Delta x > 0 \text{ and } \Delta y \le 0 \\ \pi/2 & \text{if } \Delta x = 0 \text{ and } \Delta y < 0 \\ \pi & \text{if } \Delta x < 0 \text{ and } \Delta y = 0 \\ \arctan(\Delta x/\Delta y) + \pi & \text{if } \Delta x < 0 \text{ and } \Delta y \ne 0 \\ 3\pi/2 & \text{if } \Delta x = 0 \text{ and } \Delta y \ge 0 \\ 2\pi - \arctan(\Delta x/\Delta y) & \text{if } \Delta x > 0 \text{ and } \Delta y > 0 \end{cases} \qquad (2)$$

If we take the pole of the polar space at the centroid (x_c, y_c), the contour point (x_i, y_i) can be represented in the polar space as (rho_i, theta_i), as shown in Figure 1. In order to compare and select points for the graph easily, we change the radian form theta_i in [0, 2pi) to the integer degree form theta_i in {0, 1, 2, ..., 359}. To obtain rotation invariance of the shape, we rotate the polar space so that the maximum distance rho_max is associated with 0 degrees; the other theta_i are recalculated in terms of the direction theta_max of rho_max, as shown in Figure 1.

It is impossible to select all contour points to build the graph because of the complexity. Assume that many rays are sent out from the centroid; the rays will
Fig. 1. Polar Space and Rotation
Fig. 2. Two problematic examples
intersect with the contour. But it is hard to compute the coordinates of these intersection points, because there is no analytic equation for a complicated contour. Instead, we can select the contour points by theta_i at a fixed interval delta-theta; for example, with an interval of 10 degrees we get exactly 36 feature points. If we use different intervals (for example, 40, 35, 30, 25, ..., 5), a hierarchical coarse-to-fine representation can be carried out.

Because of the intricate information of contours, some points may have the same angle theta_i, and the degrees in [0, 359] may be discontinuous, especially when the centroid is close to the boundary. These two situations are shown in Figure 2. For the multiple theta_i of the first situation, we use the average point of all contour points with the same degree. For the second situation, if one degree (alpha) is missing, we take these steps: we use the average of the degree (alpha +/- 1) points to substitute for the missing degree point; if the degree (alpha +/- 1) points cannot be obtained, we use the average of the degree (alpha +/- 2) points; in the worst situation, where there is no point in the (alpha +/- 2) degree range, we make use of the centroid in the alpha degree direction. Therefore, the minimal interval can be set to 5 degrees, that is to say, the maximal number of feature points is 72. Because graphs with the same number of vertices are easier for the subsequent processing, it is better to select the same number of contour points to construct the graph for each image.

According to these selected points we build Delaunay graphs, as shown in Figure 3. Obviously, a graph with a smaller delta-theta can represent the original object more accurately. We call this kind of graph a Polar-Graph. There are several distinct advantages of the Polar-Graph. It can be generated automatically in the polar coordinate system. A Polar-Graph is a whole description of the image contour, so that every point selected for the graph is very important. At the same time, we can construct graphs of the same size by using the same interval, because the inter-comparison of graphs of the same size is relatively easy.
Fig. 3. Coarse-to-fine Polar-Graphs: original object; delta-theta = 20 degrees; delta-theta = 10 degrees
Compared with the shock graph [4], which is a shape abstraction that decomposes a shape into hierarchically organized primitive parts, a Polar-Graph is built directly from the original contour points. Therefore, it keeps the shape information to a maximum extent.
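A sketch of this point-selection step, assuming the contour is an (N, 2) array of (x, y) coordinates from the Canny detector. For brevity, the angle is computed with arctan2 instead of the piecewise formula (2), and the (alpha +/- 1)/(alpha +/- 2) fallbacks for missing degrees are collapsed into the centroid fallback:

```python
import numpy as np

def polar_feature_points(contour, delta_theta=10):
    """Select one representative contour point per sampled angle of the polar space."""
    xc, yc = contour.mean(axis=0)                     # centroid (x_c, y_c)
    dx, dy = contour[:, 0] - xc, contour[:, 1] - yc
    rho = np.hypot(dx, dy)                            # centroid distances rho_i
    theta = np.degrees(np.arctan2(dy, dx)) % 360      # anti-clockwise angles in degrees
    theta = (theta - theta[np.argmax(rho)]) % 360     # rotate so rho_max lies at 0 degrees
    deg = np.round(theta).astype(int) % 360
    points = []
    for a in range(0, 360, delta_theta):
        hits = contour[deg == a]
        if len(hits):                                 # several points share a degree: average
            points.append(hits.mean(axis=0))
        else:                                         # missing degree: fall back to centroid
            points.append(np.array([xc, yc]))
    return np.vstack(points)                          # 360/delta_theta feature points
```

The Delaunay graph over the returned points can then be built, for instance, with scipy.spatial.Delaunay.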
3 Graph Spectral Decomposition
A serious problem that hinders the manipulation of large sets of graphs is measuring their similarity once we have successfully constructed Polar-Graphs to represent the object shapes. This problem arises in many situations where graphs must be matched or clustered together. Here, we work with the spectral decomposition of the weighted adjacency matrices.

3.1 Graph Spectra
We are concerned with a set of Polar-Graphs $(G_1, \ldots, G_k, \ldots, G_n)$. The $k$-th graph is denoted by $G_k = (V_k, E_k)$, where $V_k$ is the set of vertices and $E_k \subseteq V_k \times V_k$ is the edge set. For each $G_k$ we compute the weighted adjacency matrix $A_k$, a $|V_k| \times |V_k|$ matrix whose element in row $i$ and column $j$ is

$$A_k(i,j) = \begin{cases} e^{-d^2(i,j)/\sigma^2} & \text{if } (i,j) \in E_k \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

where $d(i,j)$ is the Euclidean distance between vertices $i$ and $j$. From the adjacency matrices $A_k$, $k = 1, \ldots, n$, we calculate the eigenvalues $\lambda_k^\omega$ by solving $|A_k - \lambda_k^\omega I| = 0$, where $\omega = 1, 2, \ldots, |V_k|$ is the eigenvalue index. We order the eigenvalues by descending magnitude, i.e., $|\lambda_k^1| > |\lambda_k^2| > \cdots > |\lambda_k^{|V_k|}|$. Furthermore, we acquire the associated eigenvectors $\phi_k^\omega$ of $\lambda_k^\omega$ by solving the system of equations $A_k \phi_k^\omega = \lambda_k^\omega \phi_k^\omega$. The eigenvectors are stacked in order to construct the modal matrix $\phi_k = (\phi_k^1 | \phi_k^2 | \cdots | \phi_k^{|V_k|})$. With the eigenvalues and eigenvectors of the adjacency matrix at hand, the spectral decomposition of the adjacency matrix of graph $k$ is

$$A_k = \sum_{\omega=1}^{|V_k|} \lambda_k^\omega \phi_k^\omega (\phi_k^\omega)^T \qquad (4)$$
For each graph, we use only the first $d$ eigenmodes of the adjacency matrix. The truncated modal matrix is $\phi_k = (\phi_k^1 | \phi_k^2 | \cdots | \phi_k^d)$.

3.2 Spectral Features
Our goal is to use the spectral features computed from the eigenmodes of the adjacency matrices for Polar-Graphs to construct feature vectors. To avoid the difficulty of correspondence, we employ the order of eigenvalues to establish
the order of the feature vectors. Features suggested by spectral graph theory include the leading eigenvalues, eigenmode volume, eigenmode perimeter, the Cheeger constant, and the inter-mode adjacency matrix and distance, which are analyzed in Luo's paper [7]. Here we simply use the leading eigenvalues in our experiments to test the shape representation. For graph $k$, the feature vector is $L_k = (\lambda_k^1, \lambda_k^2, \ldots, \lambda_k^d)^T$; the vector $L_k$ represents the spectrum of the graph.
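A sketch of computing the feature vector L_k from the selected points and their Delaunay edges; obtaining the edge set (e.g. via scipy.spatial.Delaunay) is assumed done elsewhere, and sigma is a free parameter:

```python
import numpy as np

def polar_graph_spectrum(points, edges, sigma=1.0, d=3):
    """Leading d eigenvalues of the weighted adjacency matrix of equations (3)-(4)."""
    n = len(points)
    A = np.zeros((n, n))
    for i, j in edges:
        w = np.exp(-np.sum((points[i] - points[j]) ** 2) / sigma ** 2)
        A[i, j] = A[j, i] = w                 # Gaussian-weighted symmetric adjacency
    eigvals = np.linalg.eigvalsh(A)           # eigenvalues of the symmetric matrix
    order = np.argsort(-np.abs(eigvals))      # sort by descending magnitude
    return eigvals[order][:d]                 # feature vector L_k
```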
3.3 Experiments
We use the "heart" sequence, with a total of 20 images, from the shape database of MPEG-7 CE-1-SetB, shown in Figure 4. The sequence involves translation, rotation, deformation, and occlusion. With our proposed method, we first extract the contour points to construct Delaunay graphs. Since the Delaunay graph is the neighborhood graph of the Voronoi tessellation, i.e., the locus of the median line between adjacent points, it may be expected to reflect changes of the contour. To test
Fig. 4. Heart sequence for experiments
Fig. 5. 3D projection of the spectra of the 20 graphs
Fig. 6. Distance map of spectral feature vectors
the validity of the new approach to shape representation, we keep just the first three eigenvalues as the graph spectral feature. Figure 5 shows the 3D projection of the spectra of the twenty graphs constructed from the shape contours (delta-theta = 10 degrees). We can find that the No. 10, No. 18, No. 19 and
No. 20 graphs deviate comparatively more than the other graphs. From Figure 4, we can easily see that the corresponding images are actually deformed to a great extent; these images have more different shapes than the other images. This demonstrates that our representation reflects the real situation. In Figure 6, we show the matrix of pairwise Euclidean distances between every two feature vectors (best viewed in color). The matrix has 20 rows and 20 columns (one for each of the sequence images), and the graph indexes are ordered according to the position of the images in the sequence. Through its different colors, Figure 6 also visually confirms that our representation is correct and effective.
4 Conclusions
In this paper, we present a novel approach to shape representation based on Polar-Graph spectra. Our results demonstrate its robustness in the presence of translation, rotation and scale changes, even for slight distortion and occlusion. Based on Polar-Graph spectra, it will be convenient to develop graph matching and clustering algorithms for the purpose of shape retrieval.
Acknowledgements

This research is supported by the National Natural Science Foundation of China (No. 60375010), the Excellent Young Teachers Program of the Ministry of Education of China, the Innovative Research Team of the 211 Project in Anhui University, and the Natural Science Project of the Anhui Provincial Education Department (No. 2006KJ053B).
References
1. Barrow, H.G., Burstall, R.M.: Subgraph isomorphism, matching relational structures and maximal cliques. Inform. Process. Lett. 4 (1976) 83-84
2. Fischler, M., Elschlager, R.: The representation and matching of pictorial structures. IEEE Trans. Comput. 22 (1973) 67-92
3. Luo, B., Robles-Kelly, A., Torsello, A., Wilson, R.C., Hancock, E.R.: Learning shape categories by clustering shock trees. Proceedings of the International Conference on Image Processing, 3 (2001) 672-675
4. Sebastian, T.B., Klein, P.N., Kimia, B.B.: Recognition of shapes by editing their shock graphs. IEEE Trans. Pattern Analysis and Machine Intelligence, 26 (2004) 550-571
5. Badawy, O.E., Kamel, M.: Shape representation using concavity graphs. Proceedings of the 16th International Conference on Pattern Recognition, 3 (2002) 461-464
6. Siddiqi, K., Shokoufandeh, A., Dickenson, S.J., Zucker, S.W.: Shock graphs and shape matching. Sixth International Conference on Computer Vision (1998) 222-229
7. Luo, B., Wilson, R.C., Hancock, E.R.: Spectral embedding of graphs. Pattern Recognition, 36 (2003) 2213-2223
8. Canny, J.: A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8 (1986)
Hybrid Model Method for Automatic Segmentation of Mandarin TTS Corpus

Xiaoliang Yuan1, Yuan Dong1,2, Dezhi Huang2, Jun Guo1, and Haila Wang2

1 School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, P. R. China
[email protected], {yuandong, guojun}@bupt.edu.cn
2 France Telecom R&D Beijing Co., Ltd., 2 Science Institute South Road, Haidian District, Beijing 100080, P. R. China
{dezhi.huang, haila.wang}@francetelecom.com
Abstract. For a corpus-based Mandarin text-to-speech system, the quality of synthesized speech is highly affected by the accuracy of unit boundaries. In this paper, we proposed a hybrid model method for automatic segmentation of Mandarin text-to-speech corpus. The boundaries of acoustic units are categorized into eleven phonetic groups. For a given phonetic group of boundaries, the proposed method selects an appropriate model from initial-final monophone-based HMM, semi-syllable monophone-based HMM and initial-final triphone-based HMM. The experimental results show that the hybrid model method can achieve better performance than the single model method, in terms of error rate and time shift of boundaries.
1 Introduction
The corpus-based method has been applied in most Mandarin text-to-speech (TTS) synthesis systems, because it enables these systems to produce synthesized speech with high articulation and intelligibility [1]. At the same time, this method also inflates the urgent need for a high-quality speech corpus. In particular, the boundary accuracy of the acoustic units highly impacts the quality of the synthesized speech. The classical solution for segmenting the acoustic units is to label the speech signals automatically or manually. Obviously, manual labelling requires tremendous and laborious human work, which introduces a great cost. Besides, it is difficult to keep consistency among different labelers, especially when the error threshold is set to 10 ms. Moreover, in practice, most Mandarin TTS systems build different speech corpora according to their specific applications. Consequently, methods for automatic segmentation of speech corpora have gained great attention nowadays.

Many methods for automatic segmentation have been proposed in the past several years [2][3][4][5][6]. Most of them adapt a phonetic recognizer to the task of acoustic alignment with the given phonetic transcription. They often comprise two stages: (1) the alignment stage, in which the boundaries are roughly
estimated by applying forced alignment using a hidden Markov model (HMM) or a Gaussian mixture model (GMM); (2) the refinement stage, in which the boundaries are refined by high time-resolution analysis and a refinement process, depending on acoustic characteristics or checking rules. The literature [2] introduced a local refinement stage, in which the acoustic features and the phonetic information provided by the forced alignment are combined to improve the segmentation results. In reference [3], a post-refining method with fine contextual-dependent GMMs is used for the automatic segmentation. In reference [4], seven acoustic features, as well as statistical pattern recognition, are adopted to identify the most valuable features for each phonetic group.

Among these methods, most studies on automatic segmentation are based upon a single model, either context-dependent or context-independent. However, strong evidence from [2] points out that a context-dependent model achieves different performance from a context-independent model. Moreover, an inherent problem of the single model method is that each boundary is estimated only once. Unlike the former methods, this paper seeks to improve the performance by using a hybrid model in the alignment stage. In this method, each boundary has several estimates, and a mapping rule is trained and applied to select the best one from these estimates. The experimental results show that the proposed method can increase the performance in the alignment stage from 79.7% within 20 ms and 86.7% within 30 ms to 86.1% within 20 ms and 91.8% within 30 ms.
2 The Hybrid Model Method
2.1 Boundary Grouping
As is known, a syllable in Mandarin includes an optional initial and a final, and the Mandarin tone is always manifested in the final part. Acoustically, a syllable always has a stable inner structure. Therefore, many Mandarin TTS systems adopt the syllable as the basic acoustic unit. For this paper, the segmentation task is to locate the boundaries of a syllable (i.e., its starting and ending times).

Table 1. Four phonetic categories are defined for Mandarin consonant phonemes: fricative and affricate, unaspirated stop, aspirated stop, and voiced consonant
SAMPA
Chinese Pinyin
Fricative and affricate Unaspirated stop Aspirated stop Voiced consonant
f x 6 S s t6 t6 h TS TS h ts ts h f h x sh s j q zh ch z c ptk bdg ptk phthkh mnlZ mnlr
It is also known that phonemes have dissimilar acoustic characteristics in Mandarin, which makes it difficult to develop a single general model that can segment all possible boundary groups. The classification of consonants and vowels should be done separately. Hence, we divide all Mandarin consonants into four
categories according to their acoustic characteristics: fricative and affricate, unaspirated stop, aspirated stop, and voiced consonant [4], as listed in Table 1. With regard to the transitions between phonetic categories, the boundaries are further divided into eleven groups in Table 2. The phonetic categories on the left of a boundary include silence and vowel; the phonetic categories on the right embody zero initial, fricative and affricate, unaspirated stop, aspirated stop, and voiced consonant. The boundary structure "vowel + zero initial" is, in effect, equivalent to "vowel + vowel".

Table 2. Boundaries are divided into eleven groups according to the phonetic categories on the left and the right of the boundary

Group ID  Left Phonetic Category  Right Phonetic Category
B0        vowel                   fricative and affricate
B1        silence                 fricative and affricate
B2        vowel                   unaspirated stop
B3        silence                 unaspirated stop
B4        vowel                   aspirated stop
B5        silence                 aspirated stop
B6        vowel                   voiced consonant
B7        silence                 voiced consonant
B8        vowel                   vowel
B9        silence                 vowel
B10       vowel                   silence

2.2 The Hybrid Models
The boundaries in Mandarin are always located between the previous syllable's final and the following syllable's initial, and the acoustic characteristics of both directly impact the positions of the boundaries. Therefore, the proposed method takes into account the phonetic groups of the phonemes close to a boundary when deciding which model to choose.

Firstly, in the hybrid model method, the initial-final monophone-based model (IFMM), the semi-syllable monophone-based model (SSMM) and the initial-final triphone-based model (IFTM) are trained on a large corpus uttered by many speakers. These are speaker-independent (SI) models. All the sentences in the training and test sets are used to adapt the SI models mentioned above, and the adapted models are then respectively employed to perform forced alignment. Secondly, a mapping rule between the different models and the boundary groups is trained with the training set: all boundaries are categorized into the eleven phonetic groups, and an adapted model is selected to match a given phonetic group by voting over all the boundaries of that group, as sketched below. Finally, the test set is used to evaluate the mapping rule, which can later be modified continuously according to feedback from the test set.
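One plausible realization of the per-group voting, assuming each training boundary record carries its manual label and the three models' forced-alignment estimates (the record layout is an illustrative assumption, not the authors' code):

```python
from collections import defaultdict

def train_mapping_rule(boundaries):
    """boundaries: records like {'group': 'B0', 'manual': 1.234,
    'estimates': {'IFMM': 1.240, 'SSMM': 1.251, 'IFTM': 1.229}} (times in seconds)."""
    wins = defaultdict(lambda: defaultdict(int))
    for b in boundaries:
        # the model whose estimate lies closest to the manual label wins one vote
        best = min(b['estimates'], key=lambda m: abs(b['estimates'][m] - b['manual']))
        wins[b['group']][best] += 1
    # each boundary group is mapped to the model that collected the most votes
    return {group: max(votes, key=votes.get) for group, votes in wins.items()}
```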
Since the most suitable model is selected for each phonetic category, the proposed approach can achieve better performance than a single model, as demonstrated by our experiments.
3 Experiments

3.1 Mandarin Speech Corpus
Count of boundaries
A Mandarin speech corpus is uttered by a professional female speaker, recorded in a sound-proof room and sampled in 16 bits, 16000 Hz. For the sake of performance evaluation, the syllables of 4,000 sentences, as our test set, are labeled in a completely manual way. Subsequently another 400 randomly selected manual labeled sentences, as our training set, are used to training our mapping rule. There are 142,493 boundaries in our test set. The distribution of the boundary groups is depicted in Fig. 1. 60000 40000 20000 0
B0
B1
B2
B3
B4
B5
B6
B7
B8
B9
B10
Boundary 46728 11084 9234 9567 4602 2749 13154 2219 10748 3392 29016 Boundary group
Fig. 1. The distribution of the boundary groups in our test set
3.2 Evaluation of Boundaries
Many measures to evaluate segmentation performance are mentioned in [2], [4], [7], such as measuring the word error rate of a recognizer that uses a segmentation stage, or measuring the subjective quality of a speech synthesizer obtained by automatic segmentation. In this paper, the performance is evaluated by comparing the automatic segmentation with the manual segmentation and computing the rates of errors smaller than a threshold of 20 ms or 30 ms, as sketched below. In order to analyze the effects on the boundary groups brought by the various models, the mean errors and root mean square (RMS) errors of the boundary shifts, computed as the distances between the automatic segmentation and the manually labeled data, are also employed.
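A sketch of these measures; the tolerance values correspond to the 20 ms and 30 ms thresholds used throughout the paper:

```python
import numpy as np

def boundary_statistics(auto_times, manual_times, tolerances=(0.020, 0.030)):
    """Percentage of boundaries within each tolerance, plus mean and RMS shift (seconds)."""
    shifts = np.abs(np.asarray(auto_times) - np.asarray(manual_times))
    within = {tol: 100.0 * np.mean(shifts <= tol) for tol in tolerances}
    return within, shifts.mean(), np.sqrt(np.mean(shifts ** 2))
```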
3.3 Baseline Performance of Several Models
An IFMM, an SSMM and an IFTM SI model are trained on a large speaker-independent corpus using the HTK toolkit. Then the 4,000 sentences without manual labels and the 400 manually labeled sentences mentioned before are used to adapt
the SI HMM models. On the test set, the adapted IFMM, SSMM and IFTM models are respectively utilized to perform forced alignment. The baseline performance is listed in Table 3.

Table 3. Baseline performance evaluated with IFMM, SSMM and IFTM as the rate of errors smaller than thresholds of 20 ms and 30 ms

Tolerance  IFMM  SSMM  IFTM
20ms       79.7  77.2  76.7
30ms       86.7  84.5  83.7

3.4 Experimental Results
The experiment is designed to demonstrate the different results in the eleven boundary groups among IFMM, SSMM and IFTM; in addition, the mapping rule can be seen from this experiment. After forced alignment and boundary group decision, the mean and RMS of the boundary shifts for IFMM, SSMM and IFTM are illustrated in Table 4. It is clear to see that IFMM achieves better performance in many groups than SSMM and IFTM.

Table 4. The mean and RMS of boundary shifts in IFMM, SSMM and IFTM

        IFMM            SSMM            IFTM
Group   Mean    RMS     Mean    RMS     Mean    RMS
B0      0.007   0.017   0.007   0.014   0.009   0.020
B1      0.013   0.030   0.019   0.031   0.026   0.037
B2      0.008   0.027   0.009   0.023   0.015   0.023
B3      0.008   0.019   0.016   0.019   0.032   0.029
B4      0.008   0.016   0.008   0.016   0.010   0.023
B5      0.014   0.022   0.015   0.022   0.019   0.024
B6      0.018   0.027   0.019   0.021   0.021   0.027
B7      0.018   0.032   0.018   0.032   0.020   0.038
B8      0.031   0.034   0.028   0.031   0.032   0.044
B9      0.018   0.028   0.024   0.042   0.022   0.036
B10     0.018   0.025   0.026   0.035   0.014   0.025
A theoretical explanation for this observation was presented in [8]: the author claimed that the reason is the loss of alignment between the context-dependent models and the phones [9] during the training process. The reason is also discussed in [2], where it was argued that context-dependent models are always trained with realizations of phones in the same context, which is why such models have no information to discriminate between
the phone and its context. Therefore, IFMM can achieve higher performance in that it is trained with realizations of phones in different contexts. Meanwhile, the mapping rule obtained in our experiment is shown in Table 5.

Table 5. Mapping rule of the proposed method

B0    B1    B2    B3    B4    B5    B6    B7    B8    B9    B10
SSMM  IFMM  IFMM  IFMM  IFMM  IFMM  SSMM  IFMM  SSMM  IFMM  IFTM
The results shown in Table 6 demonstrate the validity of the hybrid model method. Moreover, when comparing these results with the single model methods of the two representative papers [3][4], it is observed that the hybrid model method achieves superior performance in the alignment stage. Theoretically, the proper boundary is selected in the alignment stage according to the mapping rule, which leads to the overall improvement.

Table 6. Performance comparison

                 20ms  30ms
baseline         79.7  86.7
proposed method  86.1  91.8

4 Conclusion
In this paper, we proposed a hybrid HMM-based method for automatic segmentation of a Mandarin TTS corpus. Although a single HMM-based method can be used to roughly segment the boundaries, we have noticed that, for a specified HMM, the segmentation of different groups of boundaries shows remarkably different performance. As an alternative, the hybrid model automatic segmentation adopts boundary grouping and model selection to improve the accuracy of the forced alignment. The experimental results show that the proposed method achieves better performance than a single model method. From the experimental results, we also found that the performance of group B8 (vowel followed by vowel) is still not good enough. In the near future, we will develop new methods to improve the accuracy of group B8 in the refinement stage.
Acknowledgement

The authors would like to thank GuangRi Cui from FTR&D Beijing for the fruitful discussions and great help in training the mentioned HMMs.
References
1. Cai, L., Huang, D., Cai, R.: Fundamentals and Applications of Modern Speech Technology. Tsinghua University Press, Beijing (2003)
2. Toledano, D., Gómez, L., Grande, L.: Automatic Phonetic Segmentation. IEEE Trans. on Speech and Audio Processing, 11 (2003) 617-625
3. Wang, L., Zhao, Y., Chu, M., et al.: Refining Segmental Boundaries for TTS Database Using Fine Contextual-Dependent Boundary Models. In: Proc. of ICASSP, Montreal (2004) 641-644
4. Lin, C., Jang, J., Chen, K.: Automatic Segmentation and Labeling for Mandarin Chinese Speech Corpus for Concatenation-based TTS. Computational Linguistics and Chinese Language Processing, 10 (2005) 145-166
5. Zhu, D., Hu, Y., Wang, R.: Automatic Segmentation and Labeling of Speech Corpus Based on HMM with Adaptation. In: Proc. of ISCSLP (2000) 351-354
6. Tao, J., Hain, H.: Syllable Boundaries Based Speech Segmentation in Demi-Syllable Level for Mandarin with HTK. In: Proc. of Oriental COCOSDA (2002)
7. Cox, S., Brady, R., Jackson, P.: Techniques for Accurate Automatic Annotation of Speech Waveforms. In: Proc. of ICSLP (2002) 1947-1950
8. Malfrère, F., Deroo, O., Dutoit, T.: Phonetic Alignment: Speech Synthesis Based vs. Hybrid HMM/ANN. In: Proc. of ICSLP, 4 (1998) 1571-1574
9. Wightman, C., Talkin, D.: The Aligner: Text-to-speech Alignment Using Markov Models. In: Progress in Speech Synthesis, Springer-Verlag Inc., New York (1997) 313-323
ICIS: A Novel Coin Identification System

Adnan Khashman1, Boran Sekeroglu2, and Kamil Dimililer1

1 Electrical & Electronic Engineering Department
2 Computer Engineering Department
Near East University, Lefkosa, Mersin 10, Turkey
[email protected], [email protected], [email protected]
Abstract. When developing intelligent recognition systems, our perception of patterns can be simulated using neural networks. An intelligent coin identification system that uses coin patterns for classification helps prevent confusion between different coins of similar physical dimensions. Currently, coin identification by machines relies on the assessment of the coin’s physical parameters. In this paper, a rotation-invariant intelligent coin identification system (ICIS) is presented. ICIS uses a neural network and pattern averaging to recognize rotated coins at various degrees. Slot machines in Europe accept the new Turkish 1-Lira coin as a 2-Euro coin due to physical similarities. A 2-Euro coin is roughly worth 4 times the new Turkish 1-Lira. ICIS was implemented to identify the 2 EURO and 1 TL coins
and the results were found to be encouraging.
1 Introduction

Artificial neural networks can be used to simulate our perception of objects and pattern recognition in intelligent machines. We can easily recognize familiar patterns or objects regardless of their size or orientation differences. This is due to our intelligent system of perception, which has been trained to recognize the objects over time. Coin identification using pattern recognition has an advantage over the conventional identification methods used commonly in slot machines. Most of the coin testers in slot machines work by testing physical properties of coins, such as size, weight and material, using dimensioned slots, gates and electromagnets. However, if physical similarities exist between coins of different currencies, then the traditional coin testers fail to distinguish the different coins. One such case is the identification of the 2-Euro (EURO) and the new Turkish 1-Lira (TL) coins [1]. The 1 TL coin closely resembles the 2 EURO coin in both weight and size, and both coins seem to be recognized and accepted by slot machines as being a 2 EURO coin, which is roughly worth 4 times more than a 1 TL coin [2], [3].

Several coin recognition systems were previously developed and showed encouraging results. Fukumi et al. [4] described a system based on a rotation-invariant neural network that is capable of identifying Japanese coins. This has the advantage of identifying coins rotated to any degree; however, the use of slabs is time consuming [5]. Other methods for coin identification were also suggested, such as the use of coin surface colour [6] and the use of edge detection of coin patterns [7]. The
use of colour seems to increase the computational costs unnecessarily, whereas edge-based pattern recognition has the problem of noise sensitivity. The aim of the work presented within this paper is to develop and implement an intelligent coin identification system, abbreviated as ICIS, that uses coin patterns for classification. ICIS uses image processing and pattern averaging to pre-process coin images prior to training a back propagation neural network on these images. ICIS is a rotation-invariant system that identifies obverse and reverse sides of coins rotated in steps of 15 degrees. A real life application will be presented by implementing ICIS to correctly identify the 2 EURO and 1 TL coins.
2 Coin Image Database

There are 12 European countries in the euro area, where all 2 EURO coins have the same design on the obverse side but a different design for each country on the reverse side [8]. The implementation of ICIS involves distinguishing the 2 EURO coins from the 1 TL coin. Five coins are used for this purpose: one 1 TL coin and four 2 EURO coins of Germany, France, Spain and the Netherlands, as shown in Figure 1.
Fig. 1. Coin Samples: (a) 2 EURO common obverse side; (b) 2 EURO reverse sides of Germany, France, Spain and the Netherlands; (c) 1 TL obverse side; (d) 1 TL reverse side
Images of the obverse and reverse sides of the five coins were captured using a Creative WebCam (Vista Plus). The coins were rotated at intervals of 15 degrees, as shown in Figure 2, and images of the rotated coins were captured. For example, rotation by 15 degrees results in 48 images of the 1 TL coin (24 obverse sides and 24 reverse sides) and 120 images of the 2 EURO coins (24 obverse sides, plus 24 reverse sides for each of the coins of Germany, France, Spain and the Netherlands). Training the neural network within ICIS uses 28 of the captured images (at 0, 90, 180 and 270 degree rotations) covering all coins and both sides. The remaining 140 images of the various coins at different rotations are used for testing the trained neural network within ICIS. Table 1 shows the number of coin images obtained using a rotation interval of 15 degrees. Figure 3 shows examples of rotated coins.
Fig. 2. Rotation Degrees of the Germany 2 EURO Coin (labels at 15 degree intervals, 0 to 345 degrees)

Fig. 3. Rotated Coins: (a) 2 EURO Obverse Common Side; (b) 2 EURO France (105 degrees); (c) 2 EURO Spain (45 degrees); (d) 1 TL Obverse Side (270 degrees)
Table 1. Number of Coin Patterns Using 15 Degree Rotations

         2 EURO images  1 TL images  Total
Obverse  24             24           48
Reverse  96             24           120
Total    120            48           168
3 ICIS Implementation

The implementation of ICIS consists of two phases: an image processing phase, where coin images undergo compression, segmentation and pattern averaging in preparation for the second phase, and the training of a back propagation neural network. Once the neural network converges and learns, the second phase consists only of one forward pass that yields the identification result.

3.1 Image Processing Phase

This phase is a data preparation phase for neural network training. Coin images undergo mode conversion, cropping, thresholding, compression, trimming and pattern averaging. The original captured coin image is in RGB color with dimensions of 352x288 pixels. First, the image is converted to grayscale. Second, the grey coin image is cropped to an even-size image of 250x250 pixels. Third, the cropped grey coin image undergoes thresholding using a threshold value of 135 (as shown in equation (1)), converting it into a black and white image. Finally, the thresholded coin image is compressed to 125x125 pixels and then trimmed to a 100x100 pixel image that contains the patterns of the coin side.
$$P[x, y] = \begin{cases} 0, & \text{if } P[x, y] \le 135 \\ 255, & \text{otherwise} \end{cases} \qquad (1)$$
The 100x100 pixel image will provide the input data for the neural network training and testing. However, in order to provide a faster identification system and yet maintain meaningful learning, the 100x100 pixel image is further reduced to a 20x20 bitmap that represents the original coin image. This is achieved by segmenting the image using segments of size 5x5 pixels, and then taking the average pixel value within the segment, as shown in the following equations.
$$Seg_i = (Sum_i / D) / 256 \qquad (2)$$

$$D = (TP_x \cdot TP_y) / S \qquad (3)$$
where Seg_i is the averaged value of segment i, Sum_i is the sum of the pixel values within segment i, D is the number of pixels in each segment, TP_x and TP_y denote the x and y pixel sizes of the image, and S is the total number of segments. Pattern averaging provides meaningful learning and reduces the processing time. For the work presented within this paper, pattern averaging also overcomes the problem of varying pixel values within the segments as a result of rotation, thus providing a rotation-invariant system. Using a segment size of 5x5 pixels results in a 20x20 bitmap of averaged pixel values that is used as the input for the second phase, namely neural network training and generalization.
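A sketch of this averaging step of equations (2) and (3) for the 100x100 thresholded image; the function name is illustrative:

```python
import numpy as np

def pattern_average(image, seg=5):
    """Average each seg x seg segment and normalize by 256, so a 100x100
    thresholded image becomes the 20x20 bitmap fed to the neural network."""
    h, w = image.shape                               # e.g. 100 x 100 pixels
    blocks = image.reshape(h // seg, seg, w // seg, seg)
    return blocks.mean(axis=(1, 3)) / 256.0          # Seg_i = (Sum_i / D) / 256
```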
3.2 Neural Network Phase

ICIS uses a 3-layer back propagation neural network with 400 input neurons, 30 hidden neurons and 2 output neurons, classifying the 1 TL and the 2 EURO coins. This phase comprises training and generalization (testing).

Fig. 4. ICIS Neural Network Topology (400 input nodes, 30 hidden nodes, and 2 output nodes: Euro Coin and Turkish Lira Coin)
Table 2. Final Neural Network Parameters

Input Nodes  Hidden Nodes  Output Nodes  Learning Rate  Momentum Rate  Minimum Error  Training Iterations  Training Time
400          30            2             0.0099         0.80           0.001          2005                 46 seconds*

* Using a 2.4 GHz PC with 256 MB of RAM, Windows XP OS and a Borland C++ compiler.
The neural network is trained using only 28 of the available 168 coin images. The 28 training images are of coins rotated at 0, 90, 180 and 270 degrees, resulting in 8 (1 TL) coin images (4 obverse and 4 reverse sides) and 20 (2 EURO) coin images (4 of the obverse common side, plus 4 reverse sides each for Germany, France, Spain and the Netherlands). The remaining 140 coin images are the testing images, which are not exposed to the network during training and are used to test the robustness of the trained neural network in identifying the coins despite the rotations. During the learning phase, the learning rate and the momentum rate were adjusted over various experiments in order to achieve the required minimum error value and meaningful learning. An error value of 0.001 was considered sufficient for this application. Figure 4 shows the topology of the neural network within ICIS, and Table 2 shows the final parameters of the trained neural network.
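A minimal sketch of one training step of a 400-30-2 back propagation network with the final parameters of Table 2; the sigmoid units, weight ranges and update form are common textbook choices assumed here, not details of the authors' Borland C++ implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.uniform(-0.5, 0.5, (30, 400))    # input -> hidden weights
W2 = rng.uniform(-0.5, 0.5, (2, 30))      # hidden -> output weights
dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)
lr, momentum = 0.0099, 0.80               # final parameters from Table 2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, t):
    """x: flattened 20x20 averaged bitmap (400 values); t: target (1,0) or (0,1)."""
    global dW1, dW2
    h = sigmoid(W1 @ x)                               # forward pass: hidden activations
    y = sigmoid(W2 @ h)                               # output activations
    delta_out = (y - t) * y * (1 - y)                 # output error terms
    delta_hid = (W2.T @ delta_out) * h * (1 - h)      # back-propagated hidden error terms
    dW2 = -lr * np.outer(delta_out, h) + momentum * dW2
    dW1 = -lr * np.outer(delta_hid, x) + momentum * dW1
    W2 += dW2; W1 += dW1                              # momentum-smoothed weight updates
    return 0.5 * np.sum((y - t) ** 2)                 # error checked against 0.001
```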
4 Results and Analysis

The Intelligent Coin Identification System (ICIS) was implemented using the C programming language. The neural network learnt and converged after 2005 iterations and within 46 seconds. The processing time for the trained neural network, one forward pass plus the image preprocessing phase, was a fast 0.02 seconds. The robustness, flexibility and speed of this novel intelligent coin identification system have been demonstrated through this application. Coin identification results (Table 3) using the training image set yielded 100% recognition, as would be expected. ICIS identification results using the testing image sets were successful and encouraging. An overall correct identification of 158 coin images out of the available 168, over all rotation degrees, yields 94.04% correct identification. This result was obtained using only coin images at 90° rotation increments for training the neural network.

Table 3. ICIS Identification Results

Coin      Image Set   Recognition Rate
2 EURO    Training    20/20 (100 %)
2 EURO    Testing     96/100 (96 %)
2 EURO    Combined    116/120 (96.66 %)
1 TL      Training    8/8 (100 %)
1 TL      Testing     34/40 (85 %)
1 TL      Combined    42/48 (87.5 %)
Total                 158/168 (94.04 %)
5 Conclusions

In this paper, a novel coin identification system is presented. This system, named ICIS, uses image preprocessing and a neural network. Image preprocessing is the
first phase in ICIS and aims at providing meaningful representations of coin patterns while reducing the amount of data for training the neural network, which receives the optimized data representing the coin images and learns the coin patterns. ICIS has been successfully applied in this paper to identify the 2 EURO and 1 TL coins. The neural network training and generalization used the Turkish 1-Lira coin and the 2-EURO coins of Germany, France, Spain and the Netherlands. This addresses a real-life problem where physical similarities between these coins have led to slot machines being abused in Europe. An overall correct identification result of 94.04% has been achieved, with 158 out of 168 variably rotated coin images correctly identified. These results are very encouraging when considering the time costs: the neural network training time was 46 seconds, whereas the ICIS run time for both phases (image preprocessing and neural network generalization) was 0.02 seconds.
References

1. Euro Coins: http://www.arthistoryclub.com/art_history/Euro_coins (2005)
2. Deutsche Welle, Current Affairs: Currency Confusion Helps Smokers, http://www.dwworld.de/dw/article/0,1564,1477652,00.html (04.02.2005)
3. Verband Turkischer Industrieller und Unternehmer: New Turkish Lira Symbolizes Closeness to Europe, Voices of Germany, Press review compiled by the Berlin Office of TÜSIAD (19.01.2005)
4. Fukumi, M., Omatu, S., Takeda, F., Kosaka, T.: Rotation Invariant Neural Pattern Recognition System with Application to Coin Recognition. IEEE Transactions on Neural Networks, Vol. 3, No. 2 (1992) 272-279
5. Fukumi, M., Omatu, S., Nishikawa, Y.: Rotation Invariant Neural Pattern Recognition System Estimating a Rotation Angle. IEEE Transactions on Neural Networks, Vol. 8, No. 3 (1997) 568-581
6. Adameck, M., Hossfeld, M., Eich, M.: Three Color Selective Stereo Gradient Method for Fast Topography Recognition of Metallic Surfaces. Proceedings of Electronic Imaging, Science and Technology, Machine Vision Applications in Industrial Inspection XI, Vol. SPIE 5011 (2003) 128-139
7. Nolle, M., et al.: Dagobert – A New Coin Recognition and Sorting System. Proceedings of the VIIth Digital Image Computing: Techniques and Applications (2003) 329-338
8. European Central Bank: Euro Banknotes & Coins, http://www.euro.ecb.int/en/section/euro0/coins.html (2005)
Image Enhancement Method for Crystal Identification in Crystal Size Distribution Measurement Wei Liu and YuHong Zhao Institute of Industrial Control, Zhejiang University, 310027 Hangzhou, China {wliu, yhzhao}@iipc.zju.edu.cn
Abstract. The control of crystal size distribution is critically important in crystallization processes, and so the measurement of crystal size distribution attracts much attention. Image analysis is an advanced method developed recently for crystal size distribution estimation. A feasible image enhancement method is proposed for crystal identification in crystallization images, applying histogram equalization and a Laplacian mask algorithm sequentially. The experimental results indicate that the quality of the crystal image is improved noticeably, and that the crystals can be identified more easily and accurately.
1 Introduction

Control of crystallization processes is critical in a number of industries, including microelectronics, food, and pharmaceuticals, which constitute a significant and growing fraction of the world economy [1]. In the pharmaceutical industry, the primary bottleneck in the operation of production-scale drug manufacturing facilities is the difficulty of controlling the size and shape distribution of crystals produced by complex crystallization processes. Since Crystal Size Distribution (CSD) is the key controlled variable, an accurate measurement of the CSD is extremely important for crystallization control. The customary method of measuring CSD is to sieve samples first and then analyze the size distribution using a Coulter Counter. Another method currently in use is Focused Beam Reflectance Measurement (FBRM), which is based on laser scattering theory. However, neither of these methods is fully satisfactory in terms of accuracy and ease of use [2]. An advanced method proposed recently is based on image analysis: improvements in image processing have made online video microscopy a promising technology for these measurements [3]. After obtaining the crystal image from the crystal solution with a microscope at an appropriate scale, the major axis of the best-fit ellipse (approximate crystal length) can be lined out either manually or automatically. The length of the axis in the image can then be calculated by distance measurement methods, and the real crystal size obtained through scale conversion. However, because of the characteristics of the crystal image, it is not easy to identify the crystals correctly and completely, so image enhancement is needed. One of the primary characteristics of digital image processing is that processing algorithms are strongly tied to the characteristics of the image. In other words, a certain
algorithm only works well on a limited range of image types; for others it will not have the same effect, and may even make the result worse [4]. So it is a key problem to find a proper and feasible image enhancement method for crystal identification. Since automatic identification techniques are not yet mature, most measurements depend on manual work. In this paper an image enhancement method is provided for identifying the crystals manually. Histogram equalization and a Laplacian mask are applied to the crystal images in combination. As a result, the target parts of the image are emphasized and the crystals can be identified easily. A large number of crystal images have been processed by the proposed enhancement method, and all experiments have shown good results. Due to length restrictions, only two of them are given in this paper to demonstrate the performance of the proposed method. In the following, the characteristics of the crystal image are analyzed and the enhancement method is described in Section 2. Histogram equalization and the Laplacian mask algorithm are introduced in Sections 3 and 4 respectively, with the results of each processing step provided alongside. The conclusions are drawn and further research is indicated in the last section.
2 Characteristics and Enhancement Method of Crystal Image

Two typical crystal images obtained from the crystal solution are depicted in Figure 1 as (a) and (b). Their gray histograms are depicted correspondingly in Figure 2 as (a) and (b).
Fig. 1. Two typical crystal images
From the figures above, the characteristics of a crystal image can be derived: 1) The gray values concentrate in a narrow area and exhibit a unimodal distribution, which results in weak contrast at the edges of the crystals; it is therefore hard for the computer to find a division point to segment the image into foreground and background. 2) Most crystals in the image partly share the same gray values as the background. 3) Within a single crystal, the image has clearly different gray values, and generally has obvious internal edges because of the existence of different crystal planes.
Because of these characteristics of the crystal image, there has been no general and effective method to identify the crystals in the image automatically. Identifying the crystals manually is the most common approach, given the correctness of its results. However, even for human eyes it is still not easy to identify the crystals correctly, especially in an image such as (a) in Figure 1. So an enhancement method is necessary to make the crystals easier to identify manually.
Fig. 2. Gray histograms of Figure 1
According to the characteristics discussed above, a feasible image enhancement method is proposed. Histogram equalization is adopted first to increase the contrast of the images significantly, and then a Laplacian mask algorithm is used to make the images visibly sharper.
3 Histogram Equalization

Histogram equalization is an image enhancement method based on a cumulative distribution function (CDF) transformation. Let the variable r represent the gray levels of the image to be enhanced, and let p_r(r) denote the probability density function of the random variable r. The CDF transformation is

s = T(r) = ∫₀^r p_r(ω) dω    (1)

where ω is a dummy variable of integration. This transformation function satisfies the following conditions: (a) T(r) is single-valued and monotonically increasing in the interval 0 ≤ r ≤ 1; (b) 0 ≤ T(r) ≤ 1 for 0 ≤ r ≤ 1. It can be shown that the transformation given in Eq. (1) yields an s characterized by a uniform probability density function. The transformation spreads the gray values over a wider range. The transformation results for Figure 1 are depicted in Figure 3, while the corresponding gray histograms after transformation are depicted in Figure 4.
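For 8-bit images, the discrete counterpart of Eq. (1) reduces to a lookup table built from the empirical CDF; the following is a standard minimal sketch, not the authors' code.

```python
import numpy as np

def equalize(img):
    """Histogram equalization of an 8-bit grayscale image via the empirical CDF."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum() / img.size               # discrete version of T(r)
    lut = np.round(255 * cdf).astype(np.uint8)   # map each gray level through the CDF
    return lut[img]
```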
A conclusion can be drawn from these figures: the histograms have spread to the full gray-scale spectrum and the edges of the crystals are more obvious. As a result, the crystals in the image are easier to identify. However, it is still not easy to line out the crystals manually, especially in (a) of Figure 3, so further processing is needed. It was found experimentally that Laplacian mask processing is an appropriate method.
Fig. 3. Transformation results of Figure 1
Fig. 4. Gray histograms of Figure 3
4 Image Enhancement Using the Laplacian Mask Algorithm

The Laplacian is the simplest isotropic derivative operator, and it is frequently used in image sharpening. For a function (image) f(x, y) of two variables, the discrete form of the Laplacian is expressed as
∇²f = ∂²f(x, y)/∂x² + ∂²f(x, y)/∂y²
    = f(i+1, j) + f(i−1, j) + f(i, j+1) + f(i, j−1) − 4f(i, j)    (2)
In practice, this is usually implemented with one pass of a single mask. The coefficients of the single mask can be obtained from the following equation:

g(i, j) = f(i, j) − ∇²f(i, j)
        = 5f(i, j) − f(i+1, j) − f(i−1, j) − f(i, j+1) − f(i, j−1)    (3)
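Equation (3) amounts to convolving the image once with a single 3×3 mask; a sketch using SciPy follows (the boundary-handling mode is an assumption).

```python
import numpy as np
from scipy.ndimage import convolve

LAPLACIAN_MASK = np.array([[ 0, -1,  0],
                           [-1,  5, -1],
                           [ 0, -1,  0]])  # coefficients from Eq. (3)

def sharpen(img):
    """g = f - Laplacian(f), implemented as one pass of the 3x3 mask."""
    out = convolve(img.astype(np.float64), LAPLACIAN_MASK, mode="nearest")
    return np.clip(out, 0, 255).astype(np.uint8)
```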
The enhancement results for the original images (Figure 1) and the histogram-equalized images (Figure 3) are depicted in Figures 5 and 6 respectively.
Fig. 5. Laplacian mask enhancement results of original images
Fig. 6. Laplacian mask enhancement results of histogram-equalized images
Contrasted with the flat background, most of the crystals in each image are visually embossed by the Laplacian mask. However, the crystals in the images that were first processed by histogram equalization are clearer. So using these two
methods in combination to process crystal images can improve the visual effect considerably. After processing by these two steps, crystals in the images are much easier to identify and line out manually.
5 Conclusions

Crystal size distribution is important for the control of crystallization, and image analysis is becoming an attractive method for the measurement of the CSD. The characteristics of the crystal image are analyzed in this paper, and a feasible digital image processing method is provided accordingly. Histogram equalization and the Laplacian mask algorithm are used in combination, and as a result the target crystals can be easily identified. The comparison between the images processed by the proposed method and the original images, as well as the images processed by just one single method, demonstrates that the image quality has been obviously improved. After this processing, the crystals can be lined out more easily and correctly, either manually or automatically, and the real crystal size can be obtained through distance measurement algorithms and scale conversion. For the purpose of controlling the CSD, further study will focus on the automatic identification of the crystals; the processing method provided in this paper remains a necessary step in that research.
Acknowledgements The project is supported by National Natural Science Foundation of China (No.60503065).
References

1. Richard, D.B.: Advanced Control of Crystallization Processes. Annual Reviews in Control, Vol. 26 (2002) 87-99
2. Daniel, B.P.: Crystal Engineering Through Particle Size and Shape Monitoring, Modeling, and Control. Ph.D. Dissertation, University of Wisconsin-Madison (2002)
3. Paul, L., James, R.: Crystallization Studies Using In-situ, High-Speed, Online Video Imaging. TWMCC (2004)
4. Lang, R.: Digital Image Processing & Achieving by VC++. Beijing Hope Electronic Press, Beijing (2002)
5. Rafael, C.G., Richard, E.W.: Digital Image Processing. Publishing House of Electronics Industry, Beijing (2002)
Image Magnification Using Geometric Structure Reconstruction Wenze Shao1 and Zhihui Wei2 1
Department of Computer Science and Engineering, Nanjing University of Science and Technology, 210094 Nanjing, China [email protected] 2 Graduate School, Nanjing University of Science and Technology, 210094 Nanjing, China [email protected]
Abstract. Though many magnification methods have been proposed in the literature, magnification in this paper is approached as reconstructing the geometric structures of the original high-resolution image. The structure tensor is able to estimate the orientation of both edges and flow-like textures, and hence is well suited to magnification. Firstly, an edge-enhancing PDE and a corner-growing PDE are proposed based on the structure tensor. Then, the two PDEs are combined into a novel one, which not only enhances the edges and flow-like textures but also preserves the corner structures. Finally, the novel PDE is applied to image magnification. The method is simple, fast and robust to both noise and blocking artifacts. Experimental results demonstrate the effectiveness of our approach.
1 Introduction

Image magnification is mainly the production of the original high-resolution (HR) image from a single low-resolution (LR), and perhaps noisy, image. Taking into account the insufficient density of the imaging sensor, it is reasonable to consider the observation model g = Du + n, where g, u and n are respectively the column-ordered vectors of the M × N LR image g, the original qM × qN HR image u, and the additive random noise n. The variable q represents the undersampling factor, and the matrix D describes the nonideal sampling process, i.e., local averaging followed by down-sampling. Many magnification algorithms [1, 2, 3, 4, 5, 6] have been proposed in the literature, while currently the PDE-based level-set approaches [7, 8] are the most popular choices: they are fast, edge-enhancing and robust to noise. Generally, the level-set magnification approaches can be unified into the following PDE

∂u/∂t = c1 D²u(η, η) + c2 D²u(ξ, ξ)
with the initial image given by bilinear or bicubic interpolation of the LR image. In the above PDE, η = Du/|Du| and ξ = Du⊥/|Du| are orthonormal vectors in the gradient and tangent directions respectively, and c1, c2 are the diffusivity functions of the
gradient |∇u| in each direction. Nevertheless, in the case of nonideal sampling, image sharpening has to be incorporated into the interpolation process. Therefore, we propose the following PDE, incorporating shock filtering [9, 10]:

∂u/∂t = c1 D²u(η, η) + c2 D²u(ξ, ξ) − β sign(D²u(η, η)) |∇u|    (1)
where β is a positive controlling parameter. PDE (1) essentially magnifies images driven by the level curves; however, level curves do not capture all the geometric information needed to analyze the image content. Hence, more flexible PDEs should be exploited to tackle different geometric structures. Image magnification in this paper is approached as reconstructing the geometric structures of the original HR image. The structure tensor is able to estimate the orientation of both edges and flow-like textures, and hence is well suited to magnification. Firstly, an edge-enhancing PDE and a corner-growing PDE are proposed based on the structure tensor. Then, the two PDEs are combined into a novel one, which not only enhances the edges and flow-like textures but also preserves the corner structures. Finally, the novel PDE is applied to image magnification. The method is simple, fast and robust to both noise and blocking artifacts. Experimental results demonstrate the effectiveness of our approach. The paper is organized as follows. Section 1 gives a brief review of previous PDE-based magnification algorithms. In Section 2, an edge-enhancing PDE and a corner-growing PDE are proposed based on the structure tensor, and then the two PDEs are combined into a novel one. The novel PDE is applied to image magnification in Section 3. The conclusion is given in Section 4.
2 Edge-Enhancing PDE and Corner-Growing PDE To estimate the orientation of local geometric structures, Weickert [11] proposed the well-known structure tensor J ρ ( ∇uσ ) = Gρ ∗ ( ∇uσ ⊗ ∇uσ ) , ρ ≥ 0 .
(2)
denoted (J_mn), m, n = 1, 2, where uσ is the regularized version of u obtained with a Gaussian kernel N(0, σ²), making the edge detection insensitive to noise at scales smaller than σ; the tensor ∇uσ ⊗ ∇uσ is convolved with a Gaussian kernel N(0, ρ²), making the structure analysis more robust to flow-like structures and noise. The matrix Jρ is symmetric and positive semi-definite, and hence has orthonormal eigenvectors, denoted w and w⊥. The vector w, defined as

w = ( 2 J12,  J22 − J11 + √((J22 − J11)² + 4 J12²) )ᵀ,   w = w / |w|    (3)
points in the direction with the largest contrast, and the orthonormal vector w⊥ points in the structure direction. Their corresponding eigenvalues μ and μ⊥ can be used as descriptors of local structures: constant areas are characterized by μ = μ⊥ = 0, straight edges by μ ≫ μ⊥ = 0, and corners by μ ≥ μ⊥ > 0. Besides, (μ − μ⊥)² is the measure of local coherence, given by (μ − μ⊥)² = (J22 − J11)² + 4 J12².

2.1 Edge-Enhancing PDE

Based on the two eigenvectors w and w⊥ provided by the structure tensor Jρ, we generalize PDE (1) to the following PDE:

∂u/∂t = c1 D²u(w, w) + c2 D²u(w⊥, w⊥) − β sign(D²u(w, w)) |∇u|    (4)
where β is a positive parameter, and c1 and c2 are diffusivity functions of the local coherence (μ − μ⊥)². In particular, when c1 is a monotonically decreasing function ranging from 1 to 0 and c2 is equal to 1, the behavior of PDE (4) is easy to interpret. On homogeneous regions of u, the function c1 has values near 1; the first two terms of the PDE then combine into Δu, yielding isotropic diffusion. In regions with edges and flow-like textures, the function c1 has values near 0 and hence the first term of PDE (4) vanishes; the second term forces PDE (4) to smooth the image in the structure direction, and therefore preserves the edges and flow-like textures. The last term in PDE (4) corresponds to shock filtering for image sharpening, with the vector η in PDE (1) replaced by the vector w. In fact, we note that the same modification has also been proposed in [12]. Though PDE (4) may perform well at preserving edges and flow-like textures, the question remains how it handles corner structures; Fig. 2(d) shows that PDE (4) is not capable of preserving them.

2.2 Corner-Growing PDE

To overcome the blurring of corner structures in image diffusion, we propose the following PDE for corner growing:

∂u/∂t = c3 (∇u)ᵀ · (∇ · (w⊥ ⊗ w⊥))
(5)
where c3 is a positive controlling parameter, and the divergence operator ∇· for a 2×2 matrix M is defined as

∇ · M = ( ∇ · (m11, m12)ᵀ,  ∇ · (m21, m22)ᵀ )ᵀ,   M = [ m11 m12; m21 m22 ]

Here, we demonstrate the performance of PDE (5) with an illustrative experiment, shown in Fig. 1. Fig. 1(b) shows that PDE (5) indeed plays the role of corner growing; in fact, the rate of corner growing is determined by both ρ and c3. Therefore, we combine PDEs (4) and (5), obtaining a novel PDE (6) which not only enhances the edges and flow-like textures but also preserves the corner structures:

∂u/∂t = c1 D²u(w, w) + c2 D²u(w⊥, w⊥) + c3 (∇u)ᵀ · (∇ · (w⊥ ⊗ w⊥)) − β sign(D²u(w, w)) |∇u|
(6)
In contrast to PDE (6), Weickert [11] proposed the following PDE:

∂u/∂t = ∇ · ( D(Jρ(∇uσ)) ∇u )

where D(Jρ(∇uσ)) = c1 (w ⊗ w) + c2 (w⊥ ⊗ w⊥) is called the diffusion tensor. To achieve image sharpening, the modified shock filtering can also be incorporated into the above PDE, giving

∂u/∂t = ∇ · ( D(Jρ(∇uσ)) ∇u ) − β sign(D²u(w, w)) |∇u|    (7)
Nevertheless, PDE (6) is more powerful than PDE (7) in preserving corner structures, as the experimental results shown in Fig. 2 demonstrate.
Fig. 1. (a) Original image, (b) Diffuse Fig.1 (a) with PDE (5), (c) Gaussian noisy image ( μ = 0, σ = 10), (d) Diffuse Fig.1 (c) with PDE (4), (e) Diffuse Fig.1 (c) with PDE (6)
Fig. 2. (a) Original image, (b) Convolved image by the Gaussian kernel ( σ = 2), (c) Diffuse Fig.2 (b) with PDE (7), (d) Diffuse Fig.2 (b) with PDE (6)
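As a concrete illustration of Eqs. (2)-(3), the structure tensor and the contrast direction w can be computed with Gaussian smoothing of the gradient outer product; this sketch uses central-difference gradients instead of the optimized derivative filters of [13], which is a simplifying assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def structure_tensor(u, sigma=1.5, rho=2.0):
    """Eq. (2): J_rho(grad u_sigma); Eq. (3): unit vector w of largest contrast."""
    us = gaussian_filter(u, sigma)        # regularized image u_sigma
    uy, ux = np.gradient(us)              # central-difference gradients
    J11 = gaussian_filter(ux * ux, rho)
    J12 = gaussian_filter(ux * uy, rho)
    J22 = gaussian_filter(uy * uy, rho)
    w1 = 2.0 * J12
    w2 = J22 - J11 + np.sqrt((J22 - J11) ** 2 + 4.0 * J12 ** 2)
    norm = np.sqrt(w1 ** 2 + w2 ** 2) + 1e-12      # avoid division by zero
    coherence = (J22 - J11) ** 2 + 4.0 * J12 ** 2  # (mu - mu_perp)^2
    return w1 / norm, w2 / norm, coherence
```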
3 Geometry-Driven Image Magnification

In this section, we make use of PDE (6) for image magnification, with u(x, 0) = u0(x) as the initial image (bilinear or bicubic interpolation of the LR image g). Each term in PDE (6) now has a physical meaning in magnification: the first and second terms combine to perform isotropic diffusion in homogeneous regions; the second term smooths blocking artifacts in the structure direction; the third term preserves the corner structures; and the fourth term counteracts the blurring introduced in the interpolation process. Since PDE (6) considers almost all of the geometric structures in images, it is much better suited to magnification than the level-set approaches.
For a vector ϖ = (ϖ1, ϖ2)ᵀ, D²u(ϖ, ϖ) = ϖ1² u_xx + 2 ϖ1 ϖ2 u_xy + ϖ2² u_yy. The first-order partial derivatives ∂x and ∂y are calculated by recently proposed optimized derivative filters [13], which offer rotation invariance, accuracy and avoidance of blurring effects. The corresponding numerical scheme of PDE (6) is:

u_x^(t+1) = u_x^t + τ [ c1 D²(uσ^t)_x(w, w) + c2 D²(uσ^t)_x(w⊥, w⊥) + c3 (∇u_x^t)ᵀ · (∇ · (w⊥ ⊗ w⊥)) − β sign(D²(uσ^t)_x(w, w)) |∇u_x^t| ]
(8)
Then, PDE (6) can be implemented by the following steps (a sketch follows this list):
1. Calculate the initial image u0(x) using bilinear interpolation;
2. Calculate the structure tensor Jρ(∇uσ) = Gρ ∗ (∇uσ ⊗ ∇uσ) using (2);
3. Calculate the dominant vector w using (3);
4. Calculate the local coherence and the diffusivity functions c1 and c2;
5. Calculate (uσ)_x, (uσ)_y, (uσ)_xx, (uσ)_xy, (uσ)_yy, D²uσ(w, w), and D²uσ(w⊥, w⊥);
6. Calculate u_x, u_y, |∇u| and (∇u)ᵀ · (∇ · (w⊥ ⊗ w⊥));
7. Update the iteration using (8) (the number of iteration steps is T).
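The seven steps map directly onto an explicit update loop; the skeleton below is a sketch in which `bilinear_upsample` and `pde_rhs` are hypothetical helpers standing in for steps 1 and 5-6, and `structure_tensor` refers to the sketch given at the end of Section 2.

```python
def magnify(g, q=4, T=20, tau=0.24, sigma=1.5, rho=2.0):
    """Geometry-driven magnification: iterate PDE (6) from a bilinear initial guess."""
    u = bilinear_upsample(g, q)                    # step 1 (assumed helper)
    for _ in range(T):
        w1, w2, coherence = structure_tensor(u, sigma, rho)  # steps 2-4
        du = pde_rhs(u, w1, w2, coherence)         # steps 5-6 (assumed helper)
        u = u + tau * du                           # step 7: explicit update, Eq. (8)
    return u
```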
Fig.3. (a) Original image, (b) Level-set approach ( T = 20,τ = 0.24, c = 1, c2 = 1, β = 0.15 ), (c) Our proposed approach ( T = 20,τ = 0.24,σ = 1.5, ρ = 2, c = 1, c2 = 1, c3 = 1.5, β = 0.15 )
Fig. 4. Magnified portions corresponding to (a) Fig. 3(a), (b) Fig. 3(b), (c) Fig. 3(c)
The diffusivity function c1 in PDEs (1) and (6) is chosen as c1(t) = c/(1 + t²) for gray images ranging from 0 to 255, and c2 is taken as a constant for simplicity. Hence, there are overall 8 parameters in the numerical scheme (8): T, τ, σ, ρ, c, c2, c3, β. T is the number of iteration steps, determined by the undersampling factor q and the noise level; τ is the size of each iteration step, bounded in the interval (0, 0.25) for numerical stability; σ regularizes the noisy image and is mainly determined by the noise level; ρ controls the size of the neighborhood, with large neighborhoods making the estimation of the structure orientation more robust to interrupted lines, texture details and random noise; c controls the strength of isotropic diffusion, also determined by the noise level; c2 controls the strength of diffusion in the structure direction, mainly determined by the undersampling factor q; c3 controls the strength of corner enhancement, determined by c1, c2 and q; β controls the strength of image sharpening, determined by c, c1, c2 and q. Here, we only give some empirical intervals for these parameters: τ ∈ (0, 0.24], σ ∈ (0, 3], ρ ∈ (1, 3], c ∈ [1, 2], c2 ∈ [1, 5], c3 ∈ [1, 2], and β ∈ [0.1, 0.3]. The shock-filtering-incorporated level-set approach (1) and our proposed PDE (6) were used for image magnification. The initial guess for both PDE (1) and PDE (6) is the bilinear interpolation of the LR image. Fig. 3 shows the magnification results when the undersampling factor is 4, and Fig. 4 shows portions of Fig. 3. Our proposed approach clearly achieves much better visual quality than the level-set approach. As a matter of fact, the level-set approach not only removes the corners, but also shortens the level curves of the original HR image.
4 Conclusions

This paper proposed an alternative PDE approach for image magnification based on the proposed edge-enhancing PDE and corner-growing PDE, which not only enhances the edge structures but also preserves the corner structures. The method is simple, fast and robust to both noise and blocking artifacts. Experimental results demonstrate the effectiveness of our approach.
References

1. Blu, T., Thévenaz, P., Unser, M.: Linear Interpolation Revisited. IEEE Transactions on Image Processing, 13 (2004) 710-719
2. Li, X., Orchard, T.: New Edge-Directed Interpolation. IEEE Transactions on Image Processing, 10 (2001) 1521-1527
3. El-Khamy, S.E., Hadhoud, M.M., Dessouky, M.I., Salam, B.M., El-Samie, F.E.: Efficient Implementation of Image Interpolation as an Inverse Problem. Digital Signal Processing, 15 (2005) 137-152
4. Schultz, R.R., Stevenson, R.L.: A Bayesian Approach to Image Expansion for Improved Definition. IEEE Transactions on Image Processing, 3 (1994) 233-242
5. Guichard, F., Malgouyres, F.: Total Variation Based Interpolation. EUSIPCO'98, 3 (1998) 1741-1744
6. Chan, T.F., Shen, J.H.: Mathematical Models for Local Nontexture Inpaintings. SIAM J. Appl. Math., 62(3) (2002) 1019-1043
7. Belahmidi, A., Guichard, F.: A Partial Differential Equation Approach to Image Zoom. Proceedings of the International Conference on Image Processing (2004)
8. Morse, B.S., Schwartzwald, D.: Isophote-Based Interpolation. In: 5th IEEE International Conference on Image Processing (1998)
9. Osher, S.J., Rudin, L.I.: Feature-Oriented Image Enhancement Using Shock Filters. SIAM J. Numer. Anal., 27 (1990) 919-940
10. Alvarez, L., Mazorra, L.: Signal and Image Restoration Using Shock Filters and Anisotropic Diffusion. SIAM J. Numer. Anal., 31(2) (1994) 590-605
11. Weickert, J.: Coherence-Enhancing Diffusion Filtering. International Journal of Computer Vision, 31(2/3) (1999) 111-127
12. Weickert, J.: Coherence-Enhancing Shock Filters. Pattern Recognition. Lecture Notes in Computer Science, Vol. 2781, Springer-Verlag, Berlin Heidelberg (2003) 1-8
13. Weickert, J.: A Scheme for Coherence-Enhancing Diffusion Filtering with Optimized Rotation Invariance. Journal of Visual Communication and Image Representation, 13(1/2) (2002) 103-118
Image-Based Classification for Automating Protein Crystal Identification Xi Yang1, Weidong Chen1, Yuan F. Zheng1, 2, and Tao Jiang3 1
Department of Automation, Shanghai Jiao Tong University, Shanghai, 200240, China 2 Electrical & Computer Engineering, The Ohio State University, USA 3 National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Science, Beijing 100101, China [email protected]
Abstract. A technology for the automatic evaluation of images from protein crystallization trials is presented in this paper. In order to minimize the interference posed by environmental factors, the droplet is first segmented from the entire image. The algorithm selects different features derived from the pixels within the droplet and obtains a 16-dimensional feature vector, which is then fed to the classifier. Each image is classified into one of the following classes: "Clear", "Precipitate" and "Crystal". We have achieved an accuracy rate of 84.8% with our algorithm.
1 Introduction

The analysis of protein structure is an important component of protein crystallography, which has been one of the most popular research areas in recent years. Studying protein crystals helps us to understand the mechanism of a protein as well as the interplay between protein molecules and other molecules [1]. High-throughput protein crystallization systems can prepare thousands of trials per day. Conventionally, the outcomes of the protein crystallization trials are assessed by human experts. This procedure is slow and inefficient; therefore, an automatic technology is needed to replace the manual work. Several methods have been proposed by other researchers [2-5]. The best result was achieved by Bern et al. in 2004 [6]. However, when their algorithm is applied to our image set, the accuracy rate is not acceptable. In this paper, we propose an automatic protein crystallization classification algorithm based on digital image processing technology. All the image samples obtained from the protein crystallization equipment are classified into 3 different classes, called "Clear" – no substance is produced, "Precipitate" – the primary substances produced are precipitates, and "Crystal" – the primary substances produced are crystals.
2 Methodology

The procedure of the algorithm, as shown in Fig. 1, consists of 3 steps: image segmentation, feature extraction and classification. Otsu automatic thresholding [7],
Canny edge detection [8] and an Active Contour Model (ACM) [9] are utilized to locate the boundary of the droplet. Image features are derived by calculating the Gray Level Co-occurrence Matrix (GLCM) [10], the Hough Transform and the Discrete Fourier Transform (DFT) of all the pixels belonging to the droplet. The classification procedure is divided into two stages, each of which is a two-class problem.
Fig. 1. Procedure of the algorithm
3 Algorithm

3.1 Image Segmentation

In our algorithm, the image is first converted from gray scale to a binary image, using a threshold computed by the Otsu algorithm. Then an ACM is applied to the image. An ACM is a dynamic contour which can change its shape, based on its energy function, to adapt to a local feature, for example the boundary of the droplet. The energy function of the ACM can be expressed as equation (1):
E = E int + E ext
(1)
Eint is the internal energy based on the shape of the dynamic contour, and Eext is the external energy based on the local features of the image. Interested readers are referred to [9] for the detailed formulas. When the energy function is minimized, the internal energy smooths the dynamic contour and the external energy adapts it to the edges detected in the image. The ACM must be initialized with a position from which it begins to change its shape; the minimum circle surrounding the connected component with the maximum area is selected as the initial location of the contour. Canny edge detection is employed to detect the edge of the droplet, which is taken as the final position of the dynamic contour. After several iterations, the contour converges to the boundary of the droplet, and all the pixels within the contour can be segmented from the image. The procedure of image segmentation is shown in Fig. 2.
Fig. 2. Procedure of the segmentation. (a): original image; (b): binary image converted from (a); (c): initial circle; (d): image with edges detected; (e): the nodes represent the boundary detected by ACM.
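A sketch of this segmentation pipeline using scikit-image is given below; the snake parameters, the use of the raw Canny map as the snake's target, and the name `segment_droplet` are assumptions rather than the authors' implementation.

```python
import numpy as np
from skimage.filters import threshold_otsu
from skimage.measure import label, regionprops
from skimage.feature import canny
from skimage.segmentation import active_contour

def segment_droplet(img):
    """Otsu binarization -> largest component -> circle init -> snake on Canny edges."""
    binary = img > threshold_otsu(img)
    largest = max(regionprops(label(binary)), key=lambda r: r.area)
    cy, cx = largest.centroid
    r = 0.5 * max(largest.bbox[2] - largest.bbox[0],
                  largest.bbox[3] - largest.bbox[1])   # radius of enclosing circle
    theta = np.linspace(0, 2 * np.pi, 200)
    init = np.column_stack([cy + r * np.sin(theta), cx + r * np.cos(theta)])
    edges = canny(img.astype(float))                   # final position of the contour
    return active_contour(edges.astype(float), init)   # converged droplet boundary
```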
3.2 Feature Extraction

The classification is made based on the features extracted from the image. First, individual features are extracted from the image, and then a feature vector is formed from these individual features. The features used include texture, geometry and frequency features derived from the pixels inside the droplet. A detailed description of the features follows. A notable characteristic of the image is its texture features, based on statistical analysis of the image: images with and without crystals present different texture features in terms of contrast, correlation, etc. Texture features are obtained by computing the GLCM of the pixels belonging to the droplet. The GLCM is defined by the following equation:
p(i, j) = #{ (x, y) | f(x, y) = i, and ( f(x + Δx, y + Δy) = j  or  f(x − Δx, y − Δy) = j ) },  x, y = 0, 1, …, N − 1    (2)

p(i, j) is an element of the GLCM, x and y are the coordinates of each pixel, and f(x, y) represents the gray-scale value of that pixel. #{Ω} denotes the number of elements in the set. Four properties can be computed from the GLCM: entropy, energy, contrast and correlation, which are selected as the texture features together with the mean value and the standard deviation of the gray-scale values of all the pixels within the droplet. Another significant property of the image is the straight lines detected in the droplet: the edges of crystals are usually straight lines, while the edges of precipitates are usually curves. The Hough Transform is utilized to detect the straight lines, and two values are taken as the geometry features, as mentioned by Cumbaa et al. [5]: the total length and the maximum length of the lines detected in the image. Note that for protein crystals the edges are always clear and sharp, while for precipitates the edges are always fluffy and smooth. As a result, when the images are converted from the spatial to the frequency domain, images with precipitates have more energy in high-frequency components than images with crystals. We select the mean values and the standard deviations of the image energy in four different frequency bands as the frequency features.

3.3 Classification
The classification is performed in two steps. In the first step, each image is classified as "Clear" – no substance is generated – or "Not Clear" – something (either precipitates or crystals) is generated. In the second step, the images labeled "Not Clear" are classified into "Precipitate" and "Crystal". The parameters used in the first step are defined as follows:

A1: the number of grids whose gray-scale standard deviation exceeds TC1.
A2: the number of grids whose gray-scale standard deviation exceeds TC2.
A3: the number of grids whose entropy exceeds TC3.
K: the number of connected components detected within the binary image.

TC1, TC2, TC3 and TA1, TA2, TA3 are manually determined thresholds.
Firstly, the image is divided into several 30 × 30 pixel grids. If the A1 value of the image exceeds TA1, the image is marked as "Not Clear". Otherwise, the image is converted from gray scale to a binary image by a self-adaptive threshold algorithm, and the connected components within the binary image are detected to compute K. If K is not zero, rectangles are drawn around each connected component and each rectangle is divided into 5 × 5 grids; if the A2 value of any rectangle exceeds TA2, the image is "Not Clear". If K is zero or A2 is smaller than TA2, the segmented image is again divided into several 30 × 30 pixel grids; if the A3 value of the image exceeds TA3, the image is "Not Clear". Otherwise, the image is "Clear". The flowchart shown in Fig. 3 describes the algorithm performed in the first step.
Fig. 3. The flowchart of the algorithm used in the first step of the classification
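Written directly from the flowchart, the first-stage decision might look as follows; the thresholds TA1-TA3 and TC1-TC3 are the manually tuned constants from the text, while the grid partitioning, the `entropy` argument and the exact comparison conventions are assumptions.

```python
import numpy as np

def grid_blocks(img, size):
    """Split an image into size x size pixel blocks (edge remainders dropped)."""
    h, w = img.shape
    return [img[r:r + size, c:c + size]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]

def first_step(img, K, rect_grids, thresholds, entropy):
    """'Clear' / 'Not Clear' decision following the flowchart of Fig. 3."""
    TC1, TC2, TC3, TA1, TA2, TA3 = thresholds
    grids = grid_blocks(img, 30)
    A1 = sum(g.std() > TC1 for g in grids)
    if A1 > TA1:
        return "Not Clear"
    if K > 0:
        A2 = sum(g.std() > TC2 for g in rect_grids)  # 5x5 grids per component
        if A2 > TA2:
            return "Not Clear"
    A3 = sum(entropy(g) > TC3 for g in grids)
    return "Not Clear" if A3 > TA3 else "Clear"
```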
In the second step, a Fisher classifier, trained on a human-labeled learning set, is employed to make the classification. The learning set consists of images with crystals (positive samples) and images with precipitates (negative samples). A 16-dimensional feature vector f is obtained from each image. We compute a projection vector w which satisfies the following condition: when each f is projected onto w, the positive and negative samples are maximally separated. For an image of unknown class, we compute its feature vector f; if f · w exceeds a scalar quantity l, the image is labeled "Crystal", otherwise "Precipitate", as shown in equation (3). The scalar quantity l can be determined from prior knowledge.
f · w ≥ l → Crystal,   f · w < l → Precipitate    (3)
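A minimal Fisher-discriminant sketch for this two-class decision is shown below; the within-class scatter formulation and the small ridge term are standard choices, not necessarily the authors' exact training procedure.

```python
import numpy as np

def fisher_direction(pos, neg, ridge=1e-6):
    """pos, neg: (n, 16) feature matrices of crystal / precipitate training images."""
    Sw = np.cov(pos, rowvar=False) + np.cov(neg, rowvar=False)
    Sw += ridge * np.eye(Sw.shape[0])  # small ridge for numerical stability
    return np.linalg.solve(Sw, pos.mean(axis=0) - neg.mean(axis=0))

def classify(f, w, l):
    """Eq. (3): project the 16-dimensional feature vector and threshold at l."""
    return "Crystal" if f @ w >= l else "Precipitate"
```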
4 Experimental Results

The learning set is formed by 10 images with crystals and 10 images with precipitates. The testing set is a combination of 52 "Clear" images, 12 "Precipitate" images and 46 "Crystal" images. The experiment is performed on a PC with the Windows XP operating system and an AMD 2500+ CPU. With the projection vector w derived from the learning set, we achieve the results shown in Table 1:
Table 1. Result of the experiment

True \ Detected       "Clear"      "Precipitate"   "Crystal"
"Clear" (52)          82.7% (43)   1.9% (1)        15.3% (8)
"Precipitate" (12)    8.3% (1)     58.3% (7)       33.3% (4)
"Crystal" (46)        2.2% (1)     13.0% (6)       84.8% (39)
Typical images processed in the experiment are shown in Fig. 4.
Fig. 4. Typical images processed in the experiment. (a) (b) and (c) can be classified correctly, where (a) is “Clear”, (b) is “Crystal”, and (c) is “Precipitate”; (d): “Clear” image is classified as “Crystal” due to the light reflection as shown in the block; (e): image which contains grainy crystals is falsely classified as “Precipitate”.
5 Conclusion

The algorithm proposed in this paper proves to be effective and efficient: 84.8% of the "Crystal" images are recognized correctly in the experiment. In order to increase the accuracy rate, new features should be considered, for example those suggested by Bern et al. [6]: corners, transparency and closed outer contours. Besides the DFT used in our algorithm, the wavelet transform can also be utilized to obtain more information. Finally, although images with crystals can be differentiated from those with precipitates, the capability and quality of each protein crystallization trial are still unknown. The number of crystals generated and the size of each crystal need to be studied to evaluate the performance of each trial in future work.
Acknowledgement This work is partly supported by the National Hi-Tech Research and Development Program under grant 2005AA420010.
References

1. Abola, E., Kuhn, P., Earnest, T., Stevens, R.: Automation of X-ray Crystallography. Nature Structural Biology, 7 (2000) 973-977
2. Wilson, J.: Towards the Automatic Evaluation of Crystallization Trials. Acta Crystallographica D, Vol. 58 (2002) 1907-1914
3. Spraggon, G., Lesley, S.A., Kreusch, A., Prestle, J.P.: Computational Analysis of Crystallization Trials. Acta Crystallographica D, Vol. 58 (2002) 1915-1923
4. Jurisica, I., Rogers, P., Glasgow, J.I., Fortier, S., Luft, J.R., Woilfley, J.R.: Intelligent Support for Protein Crystal Growth. IBM Systems Journal, Vol. 40, No. 2 (2001) 394-409
5. Cumbaa, C.A., Lauricella, A., Fehrman, N., Veatch, C.: Automatic Classification of Sub-microlitre Protein-Crystallization Trials in 1536-Well Plates. Acta Crystallographica D, Vol. 59 (2003) 1619-1627
6. Bern, M., Goldberg, D., Stevence, R.C., Kuhn, P.: Automatic Classification of Protein Crystallization Images Using a Curve-Tracking Algorithm. Journal of Applied Crystallography, Vol. 37 (2004) 279-287
7. Otsu, N.: A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Systems, Man and Cybernetics, Vol. 9, No. 1 (1979) 62-66
8. Canny, J.: A Computational Approach to Edge Detection. IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 8, No. 6 (1986) 679-698
9. Kaas, M., Witkins, A., Terzopolus, D.: Snakes: Active Contour Models. International Journal of Computer Vision, Vol. 1, No. 4 (1987) 321-330
10. Haralick, R., Shanmugan, K., Dinstein, I.: Textural Features for Image Classification. IEEE Trans. Systems, Man and Cybernetics, Vol. SMC-3, No. 6 (1973) 610-621
Inherit-Based Adaptive Frame Selection for Fast Multi-frame Motion Estimation in H.264 Liangbao Jiao1, 3, De Zhang1, 2, and Houjie Bi2 1
Institute of Acoustics, State Key Lab of Modern Acoustics, Nanjing University, 210093 2 Institute of Communication Technique, Nanjing University, 210093 3 Nanjing Institute of Technology, 210000 [email protected]
Abstract. H.264 allows motion estimation to be performed on multiple reference frames and seven block modes. This new feature improves the prediction accuracy of inter-coded blocks significantly, but it is extremely computationally intensive because the complexity of multi-frame motion estimation increases quickly with the number of reference frames used. Moreover, the distortion gains contributed by each reference frame in the various modes are correlated, so it is not efficient to scan all the candidate frames in all seven modes. In this paper, a novel inherit-based adaptive frame selection method is proposed to reduce the complexity of the multi-frame motion estimation process. A new reference list for the ME (Motion Estimation) of a lower-level mode is constructed adaptively according to the ME results of the upper-level mode. Simulation results show that the proposed method can save about 15% to 50% of the computation while achieving almost the same rate-distortion performance as a full scan.
1 Introduction

H.264/MPEG-4 AVC [1] is the latest video coding standard developed by the Joint Video Team (JVT). One of its significant advantages is high compression efficiency: it can save half the bit rate compared with H.263 [2]. This significant improvement in compression efficiency is achieved at the cost of increased computation and complexity. Because H.264 makes heavy use of inter-frame coding, up to 80% of an encoder's computational power is consumed by ME [3]. To reduce the computation, block matching algorithms (BMAs) are generally adopted in ME. The new three-step search [4], four-step search [5], diamond search [6] and cross-diamond search [7] are some of the fast BMAs. In H.264, motion estimation is allowed to search over multiple reference frames to further reduce temporal redundancy. Naturally, the computational load grows with the number of reference frames, and the cost of motion estimation comes to dominate the complexity of the video codec. This is a real challenge for mobile computing devices with limited computing power. Therefore, a faster motion estimation strategy is urgently required for H.264. Early works [8, 9] on reference frame selection aimed at optimizing streaming quality. A novel frame selection method was proposed in [10] to speed up the
multi-frame motion estimation in H.264. Based on the center-biased MVP distribution characteristic of real-world sequences, a center-biased frame selection path is applied in that method to efficiently locate an ultimate frame. The simulations in [10] show that the computation cost is significantly reduced; at the same time, the rate-distortion performance may drop because a large part of the search area is skipped. In contrast, the main focus of this paper is the linear increase of the ME computation cost with the number of reference frames: decreasing the computation cost while maintaining the RD performance, even in the worst case. We present a simple and effective method to reduce the computational cost without significant quality degradation in H.264. Except for the ME of the 16×16 mode, only part of the reference frames are scanned in the other modes, and the new reference frame list is constructed adaptively according to the results of the upper-level mode ME. In Section 2, we analyze the distribution of the lowest ME cost among the reference frames, and the correlation of the ME cost between upper- and lower-level modes; the results motivate the proposed method. In Section 3, the inherit-based adaptive frame selection algorithm is described in detail. The simulation results are shown in Section 4. Finally, a conclusion is given.
2 Analysis and Observations

In the JM (JVT Model) software, the RD optimization in ME finds the lowest ME cost attainable when encoding a macroblock. Thus ME may be computed in seven inter-frame modes (i.e. 16×16, 16×8, 8×16, 8×8, 8×4, 4×8 and 4×4) and an intra-frame mode over all reference frames. For each macroblock, a total of 16×7 = 112 inter-frame ME passes must be performed if 16 reference frames are used, which is a large computational cost. Therefore it is worth analyzing the cost caused by each reference frame, and the relationship between the ME costs of a macroblock (MB) and of the blocks of the different modes at the corresponding location. Generally, the temporal redundancy between two frames increases as the frames get closer in time. That is to say, among the candidate reference frames, the three closest to the current frame have a higher probability of having the least ME cost than the later frames. The probability of having the least ME cost among the different reference frames is shown in Table 1, for six QCIF (176×144) sequences coded in all seven modes with the reference frame number set to 16.
Table 1. The probability distribution of the least ME cost

Reference frame pos.   1st (%)   2nd (%)   3rd (%)   Total (%)

Table 2. The accordant probability between correlative code modes

Up level mode   Low level mode   Seq. 1   Seq. 2   Seq. 3   Seq. 4   Seq. 5   Seq. 6
16×16           16×8             67.13    75.26    80.86    85.19    64.59    77.59
16×16           8×16             65.32    73.12    80.49    82.91    63.24    78.61
16×16           8×8              54.24    67.45    77.67    80.04    51.74    72.90
8×8             8×4              83.70    83.98    89.34    88.29    81.37    90.09
8×8             4×8              77.78    83.58    91.85    88.59    78.47    89.76
8×8             4×4              74.15    80.51    90.41    86.73    74.12    87.91
Average                          70.39    77.32    85.10    85.29    68.92    82.81
The sequences are Foreman, Mother&Daughter, Hall_Monitor, News, Carphone and Container, represented by the numbers 1-6 in the tables of this paper respectively. From Table 1, it can be seen that the average probability of the latest three reference frames having the least ME cost is more than 75%. Secondly, in the motion estimation of an MB in the JM software, the processing order is that the 16×16 mode is scanned first, then the 16×8 and 8×16 modes, and finally the 8×8 mode and its sub-modes. In general, the object processed in a lower-level mode is part of that in an upper-level mode, so the ME results of the upper-level modes can guide the ME of the lower-level modes. For example, the reference frame which has the least cost in the ME of the 16×16 mode may also have the least cost in the 8×8 mode. The accordant probability of the lowest ME cost between correlative coding modes is shown in Table 2, where the six QCIF sequences are coded with 16 reference frames. It shows, for example, that the accordant probability between the 16×16 and 16×8 modes is 67.13% for the Foreman sequence, and that between the 8×8 and 8×4 modes is 89.34% for the Hall_Monitor sequence.
3 Proposed Scheme

It is inefficient to implement the H.264 encoder by exhaustive search at any cost; it is essential to design a fast algorithm that keeps the RD performance while reducing complexity. On the basis of the analysis in Section 2, for each ME mode a sufficiently high accordant probability of the ME result is guaranteed when only the latest three reference frames are processed. In addition, an optimized reference frame list can be adaptively constructed from the frames with the lower costs in the upper-level mode ME. These two considerations are the basis of the inherit-based adaptive frame selection algorithm in this paper. In the algorithm, a new optimized reference list is constructed through inheritance for each mode before ME. For the 16×16 mode, all the candidate reference frames are used for ME, because this ME result is the basis of the new reference list construction for all the other modes. For the 16×8 and 8×16 modes, the new reference list (named Reflist_1 in the algorithm) contains the 1st, 2nd and 3rd reference frames and the three frames with the least ME cost in the 16×16 mode. For the 8×8 mode, the
Fig. 1. 8×4 and 4×8 mode frame selection reference
Fig. 2. The ME time using different reference frame numbers for the Mother&Daughter sequence
new reference list contains not only all the frames in Reflist_1, but also the 4th and 5th reference frames, because the ME result of the 8×8 mode is used in the reference list construction of the 8×4, 4×8 and 4×4 modes. The reference list construction of the 8×4 and 4×8 modes is somewhat more complicated: it contains the 1st, 2nd and 3rd reference frames, the three frames with the least cost in the ME of the 8×8 mode, and the two frames with the least cost in the ME of the 16×8 (or 8×16) mode. The relation between an 8×4 (4×8) block and a 16×8 (8×16) block in reference frame selection is shown in Fig. 1: the reference list of blocks 1-4 (8×4 or 4×8 mode) is constructed according to the ME result of block A (16×8 or 8×16 mode), and the reference list of blocks 5-8 according to that of block B. For the 4×4 mode, the 1st, 2nd and 3rd frames and the three frames with the least ME cost in the 8×8 mode form the new reference list.
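The list-construction rules above can be summarized in a short sketch (Python-style pseudocode; the helper names and cost containers are illustrative, not from the JM code).

```python
def best_frames(costs, k):
    """Indices of the k reference frames with the lowest ME cost."""
    return sorted(range(len(costs)), key=lambda i: costs[i])[:k]

def build_reflists(cost_16x16, cost_8x8, cost_up):
    """cost_up: ME costs of the covering 16x8 (or 8x16) block for an 8x4/4x8 block."""
    latest = {0, 1, 2}                                    # 1st, 2nd, 3rd frames
    reflist_1 = latest | set(best_frames(cost_16x16, 3))  # 16x8 / 8x16 modes
    reflist_8x8 = reflist_1 | {3, 4}                      # plus 4th and 5th frames
    reflist_sub = (latest | set(best_frames(cost_8x8, 3))
                   | set(best_frames(cost_up, 2)))        # 8x4 / 4x8 modes
    reflist_4x4 = latest | set(best_frames(cost_8x8, 3))
    return reflist_1, reflist_8x8, reflist_sub, reflist_4x4
```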
4 Simulation Results

In the simulation, the above six sequences are used from frame 0 to 299 (300 frames in total). All-P-frame coding and fast ME (UseFME=1) are adopted for convenience, and the search range is set to 16. To compare the computation and RD performance, both the traditional JM8.6 and the JM8.6 modified with the suggested algorithm are used to encode the sequences. In traditional real-time or wireless applications, no more than 10 (normally 5) reference frames are used, because the computation cost increases quickly with the number of reference frames. Figure 2 shows the ME time versus the number of reference frames for the traditional and the modified algorithm on the Mother&Daughter sequence. Clearly, with the improved JM8.6 the ME time increases only slowly as the reference frame number increases; therefore more reference frames can be utilized to ensure high RD quality. For a detailed comparison, the values of METime (the time used in ME) and the RD performance (bit rate and PSNR of the luminance component) for the six QCIF (176×144) sequences are shown in Table 3; each value is averaged over QP (quantization parameter) values from 25 to 34. It can be seen in Table 3 that, compared with the traditional JM8.6 with 5 reference frames, the modified JM8.6 with 10 reference frames saves more than 10% of the ME time while the RD performance is improved: the PSNR gains 0.03 dB and the bit rate is reduced by about
L. Jiao, D. Zhang, and H. Bi Table 3. RD performance and ME Time of different coding methods Test sequence 1 2 3 4 5 6
Ave
Test seque nce 1 2 3 4 5 6
Ave
New Method with 10 reference frames METime SNRY BitRate(K @30HZ) (s) (db) 34.91 120.14 63.16 36.35 37.19 39.08 36.17 45.30 28.56 35.50 65.15 35.85 35.74 122.99 52.12 34.81 33.34 31.43 35.58 70.68 41.70 New Method with 16 reference frames SNRY METime BitRate(K @30HZ) (db) (s) 34.95 118.87 75.18 36.37 36.91 45.69 36.19 45.38 33.44 35.51 65.03 42.58 35.77 122.42 60.55 34.81 33.39 37.59 35.60 70.68 49.17
Traditional JM8.6 with 5 reference frames SNRY METime BitRate(K@ 30HZ) (db) (s) 34.87 121.25 68.17 36.31 37.65 44.20 36.16 45.27 35.36 35.49 65.25 41.78 35.68 124.06 59.39 34.78 34.13 37.97 35.55 71.27 47.81 Traditional JM8.6 with 10 reference frames SNRY METime BitRate(K@ 30HZ) (db) (s) 34.95 119.75 146.07 36.36 37.10 89.94 36.17 45.27 67.22 35.51 65.03 83.71 35.78 122.64 139.10 34.82 33.23 70.85 35.60 70.50 99.48
1% on average over the six QCIF sequences. Figure 3 shows the RD curves of the traditional JM8.6 with 5 reference frames and the improved JM8.6 with 10 reference frames for the Carphone sequence. It can be seen from Fig. 3 that the RD performance of the proposed algorithm is better than that of the traditional algorithm.
Fig. 3. The RD curves of the traditional JM8.6 with 5 reference frames and the improved JM8.6 with 10 reference frames for the Carphone sequence

Fig. 4. The RD curves of the traditional JM8.6 with 10 reference frames and the improved JM8.6 with 16 reference frames for the Mother&Daughter sequence
As is known, the RD performance can be enhanced by using more reference frames, for example 10 reference frames in the traditional JM8.6; these simulation results are shown in Table 3 as well. However, the ME time is then nearly double that with 5 reference frames. To keep this RD performance, 16 reference frames are used in the modified JM8.6, whose ME time is only about 50% of that of
the traditional JM8.6 with 10 reference frames. Figure 4 shows the RD curves of the traditional JM8.6 with 10 reference frames and the improved JM8.6 with 16 reference frames for the Mother&Daughter sequence, demonstrating that the RD performance of the modified JM8.6 with 16 reference frames is almost the same as that of the traditional JM8.6 with 10 reference frames. It should be noted that the new algorithm consumes more memory. However, for mobile equipment the significant decrease in computation outweighs the increase in memory. Moreover, compared with the motion estimation itself, the computational cost of the sorting is so low as to be negligible.
5 Conclusion

In this paper, a novel reference frame selection method is proposed to speed up multi-frame motion estimation in H.264. Based on the distribution of the ME cost over the different reference frames and the inheritance of the least-cost reference frame among the seven ME modes, an adaptive reference frame selection method is adopted to construct a new reference list for each ME mode, which saves a large amount of ME time. Simulations verify that when many reference frames are used (e.g., 16), more than 50% of the ME time is saved while the RD performance is retained. The new algorithm also solves the problem that the ME computation cost increases quickly as the number of reference frames grows. The proposed algorithm is highly suitable for real-time video-conferencing applications and mobile equipment.
Acknowledgments. This work is supported by the National Natural Science Foundation of China (Grant No. 10234060).
References
1. Joint Video Team of ITU-T and ISO/IEC JTC 1: Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification, Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, JVT-G050 (2003)
2. Girod, B., Flierl, M.: Multi-Frame Motion-Compensated Video Compression for the Digital Set-Top Box, Proc. IEEE ICIP (2002)
3. Pirsch, P., Demassieux, N., Gehrke, W.: VLSI Architectures for Video Compression – A Survey, Proc. IEEE 83 (1995) 220-246
4. Li, R., Zeng, B., Liou, M.L.: A New Three-Step Search Algorithm for Block Motion Estimation, IEEE Trans. Circuits and Systems for Video Technology 4 (1994) 438-443
5. Po, L.M., Ma, W.C.: A Novel Four-Step Search Algorithm for Fast Block Motion Estimation, IEEE Trans. Circuits and Systems for Video Technology 6 (1996) 313-317
6. Tham, J.Y., Ranganath, S., Ranganath, M., Kassim, A.A.: A Novel Unrestricted Center-Biased Diamond Search Algorithm for Block Motion Estimation, IEEE Trans. Circuits and Systems for Video Technology 8 (1998) 369-377
7. Cheung, C.H., Po, L.M.: A Novel Small-Cross Diamond Search Algorithm for Fast Video Coding and Video Conferencing Applications, Proc. IEEE ICIP (2002)
8. Wiegand, T., Farber, N., Girod, B.: Error-Resilient Video Transmission Using Long-Term Memory Motion-Compensated Prediction, IEEE J. Select. Areas Comm. 18 (2002) 1050-1062
9. Liang, Y., Flierl, M., Girod, B.: Low-Latency Video Transmission over Lossy Packet Networks Using Rate-Distortion Optimized Reference Picture Selection, Proc. IEEE International Conference on Image Processing, Rochester, NY (2002)
10. Ting, C.W., Po, L.M., Cheung, C.H.: Center-Biased Frame Selection Algorithms for Fast Multi-Frame Motion Estimation in H.264, Proceedings of the 2003 IEEE International Conference on Neural Networks and Signal Processing, Nanjing, China (2003) 1258-1261
Intelligent Analysis of Anatomical Shape Using Multi-sensory Interface Jeong-Sik Kim, Hyun-Joong Kim, and Soo-Mi Choi School of Computer Engineering, Sejong University, Seoul, Korea [email protected]
Abstract. This paper presents a method for intelligent shape analysis of the hippocampus in the human brain using a multi-sensory interface. To analyze the shape difference between two groups of hippocampi, we first extract quantitative shape features from the input images and then perform statistical shape analysis using a parametric representation and the Support Vector Machines (SVMs) learning algorithm. Results suggest that the presented shape representation and a polynomial-kernel SVM algorithm can effectively discriminate between normal controls and epilepsy patients. To provide a more immersive and realistic analysis environment, we combine a stereoscopic display and a 6-DOF force-feedback haptic device. The presented multi-sensory environment improves space and depth perception, provides users with touch feedback, and makes it easier to manipulate 3D objects.
1 Introduction

Typically, image-based statistical studies of morphology have been based on simple measurements of size, area and volume. Shape-based intelligent analysis can provide much more detailed descriptions of morphological changes and can minimize an expert's interference. Thus, users with insufficient knowledge of anatomy can easily understand the morphological changes when comparing patients vs. normal controls. For instance, it is known that an abnormal shape of the hippocampus is associated with neurological diseases such as epilepsy, schizophrenia, and Alzheimer's disease. In order to estimate shape deformation of the hippocampus by computer, it is essential to select an efficient shape representation scheme. Then, a powerful classifier is used to discriminate the patient group from the normal one. It is difficult for a user to get a real sense of space and touch because anatomical structures in a virtual scene are usually represented visually in 2D or 2.5D. For a long time, the haptic modality was considered inferior to the visual modality in terms of perceptual accuracy [1]. "Co-location" is a term used to describe a haptic and a visual display defined in the same coordinate system. Although results would seem to suggest that a co-located display offers no significant advantage over a traditional 2D mouse interface held to one side of the body in a translational positioning task, co-location of the hand and virtual workspace improved performance in tasks involving object rotation [2]. Therefore, a multi-sensory interface is very useful for interactive medical applications.
In our work, we develop a method for intelligent shape analysis using a parametric representation and an SVM algorithm. For better understanding and improved manipulation of the anatomical structure, we construct and experiment with a multi-sensory virtual environment using a haptic device and a stereoscopic display.
2 Related Work

Intelligent shape analysis based on statistical models has been used to diagnose and treat diseases of 3D human organs extracted from medical imaging data sets. Zhu [3] introduced a parametric modeling method for the lateral ventricle in statistical shape analysis, and Styner [4] proposed an approach applying the SPHARM (spherical harmonics) representation to 3D shape analysis. PCA (Principal Components Analysis) is the most commonly used algorithm for separating two groups. PCA reduces the dimensionality of the shape representation space and can be used for the binary classification problem, but it is limited in constructing an efficient maximum-likelihood classifier because of the small sample size. In recent years, artificial neural networks and SVM-based classifiers have been used in statistical shape analysis. In particular, an SVM turns out to be a powerful classifier, since it is guaranteed to converge to an optimal solution even for a small set of training samples. Research estimating the performance of multi-sensory interfaces combining haptic and visual information in virtual environments has generally focused on time and accuracy. Basdogan [5] introduced an auto-stereoscopic and haptic visualization method for spatial exploration and task design; in particular, he built a multi-modality environment in which a virtual object can be touched and manipulated. This method has the advantage of using a non-invasive auto-stereoscopic display as a substitute for a shuttered-glasses-based stereo display, but it provides no significant comparison of the performance of the co-located interface. Wall [6] investigated the effect of haptic feedback and stereo graphics in a 3D target acquisition task. The equipment consisted of a Reachin Developer Display with a PHANToM haptic feedback device equipped with an instrumented stylus. As a result, haptic feedback improved subjects' accuracy, but did not improve the time taken to reach the target.
3 Intelligent Shape Analysis Using Multi-sensory Interface

Generally, an SVM-based shape analysis method consists of three main steps. First, we extract quantitative shape features from the medical data set. These features can be used either to create a generative model capturing the variation in the sample data set, or to build a discriminative model supporting classification between two groups; we focus on the latter. Fig. 1 shows the overall procedure for 3D shape analysis. Initially, we build parametric models from the set of mesh models using a PDM (Point Distribution Modeling) method. We then construct two average models statistically representing each shape group. Finally, we execute a classification task based on an SVM classifier. The procedure for creating a parametric model consists of five steps. First, we find the center of mass of the model and the principal axes of its surface points. Then we create an initial super-ellipsoid and triangulate it. Here, we use a single 3D blob
element, as suggested in [7]. After the triangulation, we map the mesh vertices to FEM nodes to achieve physics-based shape deformation. The mode shape vectors form an orthogonal, object-centered coordinate system for describing feature locations. We transform these into nodal displacement vectors and iteratively compute the new positions of the points on the deformable model using 3D Gaussian interpolation functions. We obtain the final deformed positions when the energy value falls below a threshold or the iteration count exceeds a limit.
Fig. 1. Overall procedure for SVM based intelligent shape analysis
Once the feature vectors are extracted from the parametric model, they can be used to analyze the shape differences between populations, for example, normal controls and epilepsy patients in our case. In this section, we briefly describe our approach based on a discriminative modeling method using SVM [8]. First, we train a classifier for labeling new examples into one of the two groups. Each training data set is composed of coordinates of the deformable meshes. We then extract an explicit description of the differences between the two groups captured by the classifier. This method detects statistical differences between two populations. In order to acquire the optimal solution, it is important to select a good classifier function. SVM is known to be robust and free from the over-fitting problem. Given a training data set $\{(\mathbf{x}_k, y_k),\ 1 \le k \le n\}$, where $\mathbf{x}_k$ are observations and $y_k$ are the corresponding group labels, and a kernel function $K : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, the SVM classification function is

$$y(\mathbf{x}) = \sum_{k=1}^{n} a_k y_k K(\mathbf{x}, \mathbf{x}_k) + b, \quad (1)$$

where the coefficients $a_k$ and $b$ are determined by solving a quadratic optimization problem constructed by maximizing the margin between the two classes. For the non-linear classification, we employ the commonly used polynomial function $K(\mathbf{x}, \mathbf{x}_k) = (\mathbf{x} \cdot \mathbf{x}_k + 1)^d$ (where the kernel $K$ of two objects $\mathbf{x}$ and $\mathbf{x}_k$ is the inner product of their vectors in the feature space, and the parameter $d$ is the degree of the polynomial). In order to estimate the accuracy of the resulting classifier and decide the optimal parameters in
the non-linear case, we use cross-validation. To obtain the error, recall, and precision, we evaluate the performance of three types of SVM kernels (polynomial, RBF, sigmoid) as well as the linear case; results are described in Section 4. In order to support efficient user interaction in the virtual environment, we design and implement a multi-modality interface integrating a stereoscopic display with haptic feedback. Fig. 2 shows the hardware and software setup of the multi-sensory interface.
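As a concrete illustration of this kernel-selection step, the following sketch compares the four kernel types under k-fold cross-validation using scikit-learn. The feature matrix X (one flattened mesh-coordinate vector per subject) and the labels y are hypothetical placeholders for the data described above; it is a minimal sketch, not the authors' implementation.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3 * 642))   # hypothetical: 80 models, 642 mesh vertices
y = rng.integers(0, 2, size=80)      # hypothetical group labels (0/1)

classifiers = [
    SVC(kernel="linear"),
    SVC(kernel="poly", degree=3),    # polynomial kernel, cf. Eq. (1)
    SVC(kernel="rbf"),
    SVC(kernel="sigmoid"),
]
for clf in classifiers:
    scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation
    print(clf.kernel, scores.mean())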
Fig. 2. Overview of Co-located interface: (left) software setup; (right) hardware setup
In our work, we use a haptic device to touch virtual objects, to feel the material of an object's surface, and to manipulate a grasped object. We choose the point-based haptic rendering technique, in which the tip point of the end-effector, known as the HIP (Haptic Interface Point), is digitized via encoders and used to detect collisions with virtual objects. Finally, a reaction force is calculated based on the depth of penetration and reflected to the user through the haptic device. We used a PHANToM haptic device and the OpenHaptics library for calculating the force model. Consequently, a user can control a point object to touch an object using the stylus handle, and then touch and manipulate static scene objects [9]. The separation distance of the human eyes is about 65 mm. Because of the binocular parallax caused by this separation, our brain perceives two different 2D images of a single object and builds a sense of perspective by fusing them. Using this principle, we can simulate the human optic system in a computer application. In our work, we set up the stereoscopic hardware environment using a CRT monitor and special glasses synchronized with the monitor by an infrared emitter. Additionally, we developed a software module for stereoscopic rendering using the OpenGL library. First, we create views for the left and right eye. Then, we control how each rendering is displayed to the user to create the desired stereoscopic effect. To generate two slightly different view frustums of the same scene, we use two perspective cameras, one for each eye. The main issue here is how to set these two frustums given the binocular parallax. A "toe-in" method creates viewing frustums that rotate each eye toward a single focus point; this is not a good solution for accurate perspective because the two frustums do not match. We therefore use an "asymmetric frustum perspective projection" method, in which the two frustums generated for the eyes are
asymmetric. Finally, we display stereo images by rendering the two views into separate hardware buffers using the final frustums [10, 11].
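A minimal sketch of the asymmetric-frustum computation just described, under stated assumptions: the eye separation of 65 mm comes from the text, while the field of view, near plane and zero-parallax (screen) distance are illustrative values of our own. The returned (left, right, bottom, top) bounds are what would be handed to an OpenGL glFrustum-style call for each eye.

import math

def stereo_frustums(fov_y_deg, aspect, near, focal, eye_sep=0.065):
    # focal is the zero-parallax (screen) distance; eye_sep ~ 65 mm.
    # Returns per-eye near-plane bounds (left, right, bottom, top).
    top = near * math.tan(math.radians(fov_y_deg) / 2.0)
    bottom = -top
    half_w = top * aspect
    # Horizontal shift of the near-plane window: half the eye separation
    # projected from the focal distance back onto the near plane.
    shift = (eye_sep / 2.0) * near / focal
    left_eye = (-half_w + shift, half_w + shift, bottom, top)
    right_eye = (-half_w - shift, half_w - shift, bottom, top)
    return left_eye, right_eye

# Example: 45 degree vertical FOV, 4:3 viewport, near = 0.1 m, screen at 0.6 m.
print(stereo_frustums(45.0, 4.0 / 3.0, 0.1, 0.6))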
4 Experimental Results

In our experiments, we estimate the performance of the SVM-based classifier and of the parametric modeling method, and also investigate the effect of the multi-sensory interface consisting of a haptic device and a stereo rendering display. Initially, we collected two template 3D models (normal controls and epileptic patients) from real MRI data. We also generated 80 deformed models using a modeling tool in order to assess the capability of our deformable modeling method.
Fig. 3. The result of the training test using SVM for four types of kernels
Fig. 4. Shape analysis using multi-sensory interface: (left) screenshot of the shape analysis for hippocampuses; (right) result of stereo rendering
We used 3D parametric deformable meshes as shape features. In order to implement the non-linear classifiers (SVMs), we tested three types of kernels: RBF, polynomial, and sigmoid functions. We adopted the cross-validation (CV) technique in order to overcome the problem of small training sets. In our experiment, we tested four conditions: 1) sequential without CV, 2) sequential with CV, 3) randomized
without CV, 4) randomized with CV. As a result, we found that the polynomial kernel shows the best performance (Fig. 3). In our experiment, we used the multi-sensory interface to control the camera view and the virtual objects (the 3D shape and an octree). In order to validate the quantitative result of the shape difference, a user wears the stereo glasses and grasps the haptic stylus handle; the user can then change the camera viewpoint, manipulate the object (translating and rotating), and pick and select octree sub-spaces using the haptic device with stereo visual cues. Fig. 4 shows the result of the shape analysis using our interface.
5 Conclusion

In this paper, we presented a framework for intelligent 3D shape analysis based on an SVM classifier and on shape differences between a normal and an epilepsy patient group. We also set up and investigated a multi-sensory interface for exploring the qualitative and quantitative results of the shape analysis using a haptic device and a stereo display. Our parametric modeling method is effective for constructing statistical models from 3D model data, and the SVM classifier with a polynomial kernel shows good performance in discriminating the two groups. The multi-sensory interface, combining haptic feedback and stereo visual cues, provides an immersive sense of depth when exploring the shape, so a user can explore and manipulate objects in the virtual environment intuitively.
Acknowledgments This work was supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (KRF-2005-205-D00105).
References
1. Loftin, R.B.: Multisensory Perception: Beyond the Visual in Visualization. Computing in Science & Eng., Vol. 5 (2003) 56-58
2. Wall, S.A., Harwin, W.S.: Quantification of the Effects of Haptic Feedback during a Motor Skills Task in a Simulated Environment. In Proc. of the 2nd PHANToM Users Research Symposium (2000) 61-69
3. Zhu, L., Jiang, T.: Parameterization of 3D Brain Structures for Statistical Shape Analysis. In Proc. of SPIE Medical Imaging, Vol. 5370 (2004) 1254-1262
4. Styner, M., Gerig, G.: Statistical Shape Analysis of Neuro-anatomical Structures Based on Medical Models. Medical Image Analysis, Vol. 7 (2003) 207-220
5. Basdogan, C., et al.: Autostereoscopic and Haptic Visualization for Space Exploration and Mission Design. IEEE Virtual Reality Conference (2002) 271-276
6. Wall, S.A., et al.: The Effect of Haptic Feedback and Stereo Graphics in a 3D Target Acquisition Task. Proc. of Eurohaptics (2002) 23-29
7. Choi, S.M., et al.: Shape Reconstruction from Partially Missing Data in Modal Space. Computers & Graphics, Vol. 26 (2002) 701-708
Modeling Expressive Music Performance in Bassoon Audio Recordings Rafael Ramirez, Emilia Gomez, Veronica Vicente, Montserrat Puiggros, Amaury Hazan, and Esteban Maestre Music Technology Group Pompeu Fabra University Ocata 1, 08003 Barcelona, Spain Tel:+34 935422165, Fax:+34 935422202 {rafael,vicente,puiggross,hazan,maestre,gomez}@iua.upf.es
Abstract. In this paper, we describe an approach to inducing an expressive music performance model from a set of audio recordings of XVIII century bassoon pieces. We use a melodic transcription system which extracts a set of acoustic features from the recordings, producing a melodic representation of the expressive performance played by the musician. We apply machine learning techniques to this representation in order to induce a model of expressive performance. We use the model for both understanding and generating expressive music performances.
1 Introduction
Expressive performance is an important issue in music which has been studied from different perspectives (e.g. [2]). The main approaches to empirically study expressive performance have been based on statistical analysis (e.g. [11]), mathematical modelling (e.g. [13]), and analysis-by-synthesis (e.g. [1]). In all these approaches, it is a person who is responsible for devising a theory or mathematical model which captures different aspects of musical expressive performance. The theory or model is later tested on real performance data in order to determine its accuracy. In this paper we describe an approach to investigate musical expressive performance based on machine learning [7]. Instead of manually modelling expressive performance and testing the model on real musical data, we let a computer use an inductive logic programming algorithm to automatically discover regularities and performance principles from real performance data (i.e. bassoon audio performances). The rest of the paper is organized as follows: Section 2 describes how the acoustic features are extracted from the monophonic recordings. In Section 3 our approach for learning rules of expressive music performance is described. Section 4 reports on related work, and finally Section 5 presents some conclusions and indicates some areas of future research.
2 Melodic Description
In order to obtain a symbolic description of the expressive audio recordings, we compute descriptors related to two different temporal scopes: some related to an analysis frame, and others related to a note segment. Firstly, we divide the audio signal into analysis frames, and a set of low-level descriptors is computed for each analysis frame. Then, we perform note segmentation using the low-level descriptor values. Once the note boundaries are known, the note descriptors are computed from the low-level and fundamental frequency values. The main low-level descriptors we use to characterize expressive performance are instantaneous energy and fundamental frequency. Energy is computed in the spectral domain, using the values of the amplitude spectrum. For the estimation of the instantaneous fundamental frequency we use a harmonic matching model, the Two-Way Mismatch procedure (TWM) [5]. First of all, we perform a spectral analysis of a portion of sound, called an analysis frame. Secondly, the prominent spectral peaks are detected from the spectrum magnitude; these peaks are defined as the local maxima of the spectrum whose magnitude is greater than a threshold. The spectral peaks are compared to a harmonic series, and a TWM error is computed for each fundamental frequency candidate. The candidate with the minimum error is chosen as the fundamental frequency estimate. Note segmentation is performed using a set of frame descriptors, namely the energy computed in different frequency bands and the fundamental frequency. Energy onsets are first detected following a band-wise algorithm that uses psycho-acoustical knowledge [3]. In a second step, fundamental frequency transitions are also detected. Finally, both results are merged to obtain the note boundaries. We compute note descriptors using the note boundaries and the low-level descriptor values. The low-level descriptors associated with a note segment are computed by averaging the frame values within the segment. Pitch histograms have been used to compute the pitch and the fundamental frequency that represent each note segment, as in [6]. This avoids taking into account erroneous frames in the fundamental frequency mean computation.
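The following is a much-simplified sketch of the two-way mismatch idea for a single analysis frame (peak picking plus a symmetric harmonic-mismatch error). The threshold, normalization and candidate grid are illustrative assumptions of ours, not the weighting scheme of [5].

import numpy as np

def twm_f0(frame, sr, candidates, n_harm=10):
    # Pick the f0 candidate minimizing a simplified two-way mismatch error.
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    # Spectral peaks: local maxima above a threshold (here: 10% of the max).
    thr = 0.1 * spec.max()
    idx = [i for i in range(1, len(spec) - 1)
           if spec[i] > thr and spec[i] >= spec[i-1] and spec[i] >= spec[i+1]]
    peaks = freqs[idx]
    if len(peaks) == 0:
        return None
    best_f0, best_err = None, np.inf
    for f0 in candidates:
        harm = f0 * np.arange(1, n_harm + 1)
        # Predicted-to-measured and measured-to-predicted mismatches.
        err_pm = sum(np.min(np.abs(peaks - h)) / h for h in harm)
        err_mp = sum(np.min(np.abs(harm - p)) / p for p in peaks)
        err = err_pm + err_mp
        if err < best_err:
            best_f0, best_err = f0, err
    return best_f0

sr = 44100
t = np.arange(2048) / sr
frame = sum(np.sin(2 * np.pi * 220 * k * t) / k for k in range(1, 6))
print(twm_f0(frame, sr, candidates=np.arange(100, 400, 5.0)))  # ~220 Hz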
3 Learning the Expressive Performance Model
In this section, we describe our inductive approach for learning an expressive performance model from audio performances of bassoon pieces. Our aim is to find note-level rules which predict, for a significant number of cases, how a particular note in a particular context should be played (e.g., longer than its nominal duration). We are aware that not all the expressive transformations regarding tempo (or any other aspect) performed by a musician can be predicted at a local note level. Musicians perform music considering a number of abstract structures (e.g., musical phrases), which makes expressive performance a multilevel phenomenon. In this context, our ultimate aim is to obtain an integrated
model of expressive performance which combines note-level rules with structure-level rules; thus, the work presented in this paper may be seen as a starting point towards this ultimate aim. The training data used in our experimental investigations are monophonic audio recordings of XVIII century bassoon pieces performed by a professional musician. Each piece has been recorded at 3 different tempos: for pieces marked adagio the recorded tempos are 50, 60 and 100 ppm, and for pieces marked allegro moderato and affectuoso the recorded tempos are 60, 92 and 120 ppm. In this paper, we are concerned with expressive transformations of note duration, onset, energy and trills. The note-level performance classes which interest us are: lengthen, samedur and shorten for note duration; advance, ontime and delay for note onset; louder, medium and softer for note energy; and few, average and many for a trilled note. A note is considered to belong to class lengthen if its performed duration is 20% or more longer than its nominal duration, i.e., its duration according to the score; class shorten is defined analogously. A note is considered to be in class advance if its performed onset is 5% of a bar earlier (or more) than its nominal onset; class delay is defined analogously. A note is considered to be in class louder if it is played louder than its predecessor and louder than the average level of the piece; class softer is defined analogously. Finally, a note is considered to be in class few, average or many if the number of trills is less than 4, between 5 and 9, or more than 10, respectively. For synthesizing trills, we apply a nearest neighbor algorithm which selects the most similar trill (in terms of musical context) in the training examples and adapts it to the new musical context (e.g., the key of the piece). Each note in the training data is annotated with its corresponding class and a number of attributes representing both properties of the note itself and some aspects of the local context in which the note appears. Information about intrinsic properties of the note includes the note's duration, pitch and metrical position, while information about its context includes the durations of the previous and following notes, the extension and direction of the intervals between the note and both the previous and subsequent notes, the note's Narmour groups [8], and the tempo of the performance. Using this data, we apply a greedy set covering algorithm in order to induce an expressive performance model. We obtain an ordered set of first-order rules, each of which characterises a subset of the training data. We define four predicates to be learned: duration/4, onset/4, energy/4, and trills/4. For each note of our training set, each predicate corresponds to a particular type of transformation: duration/4 refers to duration transformation, onset/4 to onset deviation, energy/4 to energy transformation, and alteration/4 refers to note alteration. For each predicate we use the complete training set and consider background knowledge containing the note's local information (context/6 predicate) and the Narmour structures (narmour/2 predicate), as well as predicates for specifying an arbitrary-size context (i.e., any number of successors and predecessors) of a note (succ/2 predicate), and auxiliary predicates (e.g., member/3). Once we obtain a set of rules for a particular concept, e.g., duration, we collect the
examples correctly covered by each rule and apply a linear regression to their numerical values. The numerical values of the covered examples are approximated by a linear regression in the same way that a model tree approximates examples at its leaves; the difference with a model tree is that the induced rules do not form a tree. The algorithm is as follows:

SEQ-COVERING(Target_attribute, Attributes, Examples, Threshold)
  Learned_classification_rules := {}
  Learned_regression_rules := {}
  Rule := LEARN-ONE-RULE(Target_attribute, Attributes, Examples)
  while PERFORMANCE(Rule, Examples) > Threshold do
    Learned_classification_rules := Learned_classification_rules + Rule
    Examples := Examples - {examples correctly classified by Rule}
    Rule := LEARN-ONE-RULE(Target_attribute, Attributes, Examples)
  For each Rule in Learned_classification_rules do
    collect the examples correctly covered by the Rule
    approximate the examples' numerical values by a linear regression LR
    Construct Rule_1 as:
      body(Rule_1) := body(Rule)
      head(Rule_1) := LR
    Learned_regression_rules := Learned_regression_rules + Rule_1
  Return Learned_regression_rules
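As a concrete illustration of the annotation step that produces the training classes consumed by SEQ-COVERING, here is a minimal sketch of the labeling rules described above. The function names and input units (durations in beats, onsets as fractions of a bar) are our own assumptions, and boundary cases left open by the text (exactly 4 or 10 trills) are resolved arbitrarily.

def duration_class(performed, nominal):
    # lengthen/shorten if the performed duration deviates >= 20% from the score.
    ratio = performed / nominal
    if ratio >= 1.20:
        return "lengthen"
    if ratio <= 0.80:
        return "shorten"
    return "samedur"

def onset_class(performed_onset, nominal_onset):
    # advance/delay if the onset deviates by >= 5% of a bar (onsets in bars).
    dev = performed_onset - nominal_onset
    if dev <= -0.05:
        return "advance"
    if dev >= 0.05:
        return "delay"
    return "ontime"

def trill_class(n_trills):
    # few (< 4), average (5-9), many (> 10); gaps resolved arbitrarily here.
    if n_trills < 5:
        return "few"
    if n_trills < 10:
        return "average"
    return "many"

print(duration_class(0.55, 0.5), onset_class(2.00, 2.06), trill_class(7))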
SEQ-COVERING learns rules until it can no longer learn a rule whose performance is above the given Threshold. The LEARN-ONE-RULE subroutine generates one rule by performing a general-to-specific search through the space of possible rules in search of a rule with high accuracy. It organises the hypothesis space search in the same general fashion as the CN2 algorithm, maintaining a list of the k best candidates at each step. In order to handle three classes (e.g., in the case of note duration: lengthen, shorten and same), we have forced the LEARN-ONE-RULE subroutine to learn rules that cover positive examples of one class only. Initially, it learns rules that cover positive examples of one of the classes (e.g., lengthen) and considers the examples of the other two classes (e.g., shorten and same) as negative examples. Once the rules for the first class have been learned, LEARN-ONE-RULE learns rules that cover only positive examples of a second class (e.g., shorten) in the same way it did for the first class, and similarly for the third class. The PERFORMANCE procedure computes the function $tp^{\alpha}/(tp + fp)$, where $tp$ is the number of true positives, $fp$ is the number of false positives and $\alpha$ is a parameter which provides a trade-off between the rule's accuracy and coverage. For each type of rule, depending on the exact number of positive examples, we tuned both the parameter $\alpha$ and the Threshold to constrain the minimum number of positive examples as well as the ratio of positive and negative examples covered by the rule. That is, using $\alpha$ and Threshold we restrict the area in the coverage space¹ in which the induced rules must lie.

¹ Coverage spaces are ROC spaces based on absolute numbers of covered examples.

Inductive logic programming has proved to be an extremely well suited technique for learning expressive performance rules. This is mainly due to three reasons. Firstly, inductive logic programming allows the induction of first-order logic rules. First-order logic rules are substantially more expressive than the traditional propositional rules used in most rule learning algorithms (e.g., the widely used C4.5 algorithm [9]), which allows specifying musical knowledge in a more natural manner. Secondly, inductive logic programming allows considering an arbitrary-size note context without explicitly defining extra attributes. Finally, the possibility of introducing background knowledge into the learning task provides great advantages in learning musical concepts, where there is often a great amount of available background information (i.e., music theory knowledge).

Synthesis Tool. We have implemented a tool which transforms an inexpressive melody input into an expressive one following the induced model tree. The tool can either generate an expressive MIDI performance from an inexpressive MIDI description of a melody, or generate an expressive audio file from an inexpressive audio file.
4 Related Work
Widmer [14, 15] reported on the task of discovering general rules of expressive classical piano performance from real performance data via inductive machine learning. The performance data used for the study are MIDI recordings of 13 piano sonatas by W.A. Mozart performed by a skilled pianist. In addition to these data, the music score was also coded. The resulting substantial data set contains information about the nominal note onsets, duration, metrical information and annotations. When trained on the data, an inductive rule learning algorithm discovered a small set of quite simple classification rules [14] that predict a large number of the note-level choices of the pianist. Tobudic et al. [12] describe a relational instance-based approach to the problem of learning to apply expressive tempo and dynamics variations to a piece of classical music at different levels of the phrase hierarchy. The different phrases of a piece and the relations among them are represented in first-order logic. The description of the musical scores through predicates (e.g., contains(ph1, ph2)) provides the background knowledge. The training examples are encoded by another predicate whose arguments encode information about the way each phrase was played by the musician. Their learning algorithm recognizes phrases similar to those in the training set and applies their expressive patterns to a new piece. Ramirez et al. [10] report on a system capable of generating expressive audio saxophone performances of jazz standards. The system is based on an approach similar to the one presented here, where different acoustic features of real saxophone jazz performances are extracted and used to induce an expressive performance model. Lopez de Mantaras et al. report on SaxEx [4], a performance system capable of generating expressive solo performances in jazz. Their system is based on case-based reasoning, a type of analogical reasoning where problems are solved by reusing the solutions of similar, previously solved problems. In order to generate expressive solo performances, the case-based reasoning system retrieves, from a memory containing expressive interpretations, those notes that are similar to the input inexpressive notes. The case memory contains information about metrical strength, note duration, and so on, and uses this information to retrieve the appropriate notes.
5 Conclusion
This paper describes an inductive logic programming approach for learning an expressive performance model from recordings of XVIII century bassoon pieces by a professional musician. With this aim, we have extracted a set of acoustic features from the recordings, resulting in a symbolic representation of the performed pieces, and then applied a rule-based algorithm to the symbolic data and information about the context in which the data appeared. In this context, the algorithm has proved to be an extremely well suited technique for learning an expressive performance model. It naturally allows background knowledge (i.e., music theory knowledge) to play an important role in the learning process, and permits considering an arbitrary-size note context without explicitly defining extra attributes for each context extension. Currently, we are in the process of increasing the amount of training data as well as experimenting with different information encoded in it. Increasing the training data, extending the information in it and combining it with background musical knowledge will certainly generate a more complete set of rules.

Acknowledgments. This work is supported by the Spanish TIC project ProMusic (TIC 2003-07776-C02-01).
References
1. Friberg, A.: A Quantitative Rule System for Musical Performance. PhD Thesis, KTH, Sweden (1995)
2. Gabrielsson, A.: The Performance of Music. In D. Deutsch (Ed.), The Psychology of Music (2nd ed.), Academic Press (1999)
3. Klapuri, A.: Sound Onset Detection by Applying Psychoacoustic Knowledge. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP (1999)
4. Lopez de Mantaras, R., Arcos, J.L.: AI and Music: From Composition to Expressive Performance. AI Magazine, 23-3 (2002)
5. Maher, R.C., Beauchamp, J.W.: Fundamental Frequency Estimation of Musical Signals Using a Two-Way Mismatch Procedure. Journal of the Acoustical Society of America, Vol. 95 (1994) 2254-2263
6. McNab, R.J., Smith, Ll.A., Witten, I.H.: Signal Processing for Melody Transcription. SIG Working Paper, Vol. 95-22 (1996)
7. Mitchell, T.M.: Machine Learning. McGraw-Hill (1997)
8. Narmour, E.: The Analysis and Cognition of Basic Melodic Structures: The Implication Realization Model. University of Chicago Press (1990)
9. Quinlan, J.R.: C4.5: Programs for Machine Learning. San Francisco, Morgan Kaufmann (1993)
10. Ramirez, R., Hazan, A., Gomez, E., Maestre, E.: A Machine Learning Approach to Expressive Performance in Jazz Standards. MDM/KDD'04, Seattle, WA, USA (2004)
11. Repp, B.H.: Diversity and Commonality in Music Performance: An Analysis of Timing Microstructure in Schumann's 'Traumerei'. Journal of the Acoustical Society of America 104 (1992)
12. Tobudic, A., Widmer, G.: Relational IBL in Music with a New Structural Similarity Measure. Proceedings of the International Conference on Inductive Logic Programming, Springer-Verlag (2003)
13. Todd, N.: The Dynamics of Dynamics: A Model of Musical Expression. Journal of the Acoustical Society of America 91 (1992)
14. Widmer, G.: Machine Discoveries: A Few Simple, Robust Local Expression Principles. Journal of New Music Research 31(1) (2002) 37-50
15. Widmer, G.: In Search of the Horowitz Factor: Interim Report on a Musical Discovery Project. Invited paper. In Proceedings of the 5th International Conference on Discovery Science (DS'02), Lübeck, Germany. Berlin: Springer-Verlag (2002)
Modeling MPEG-4 VBR Video Traffic by Using ANFIS Zhijun Fang, Shenghua Xu, Changxuan Wan, Zhengyou Wang, Shiqian Wu, and Weiming Zeng School of Information Technology, Jiangxi University of Finance & Economics Nanchang, Jiangxi 330013, China [email protected],[email protected],[email protected], [email protected],[email protected], [email protected]
Abstract. Video traffic prediction and modeling are very important for compressed video transmission. Traditional methods describe the process by a rigid model with several parameters, which are difficult to estimate. In this paper, MPEG-4 VBR (Variable Bit Rate) video traffic is modeled by an ANFIS (Adaptive Neuro-Fuzzy Inference System), which is then applied to modeling and predicting the MPEG-4 VBR video traffic. Simulations show that the GoP (Group of Pictures) loss probabilities in the actual video traffic are very close to those in the ANFIS-modeled traffic under the same experimental conditions, and that the prediction errors (1/SNR) are very small.
1 Introduction

Nowadays, video applications such as videophone, real-time videoconferencing, and streaming of stored video have become major components of broadband multimedia services. In the year 2000, MPEG-4 (Moving Picture Experts Group) became an international standard; it is a digital multimedia standard with associated protocols for representing, manipulating and transporting natural and synthetic multimedia content over a very broad range of communication infrastructures. It is also an object-based compression and streaming standard, where a scene can be composed of a set of semantically meaningful objects (i.e., audio and video objects). Compared to conventional frame-based coding techniques such as MPEG-1 or MPEG-2, the object-based coding and representation of audio and video information enable MPEG-4 to cover a very wide scope of emerging and future applications [1]. However, the Quality of Service (QoS) for transporting video is frequently inconsistent and unpredictable, since the Internet provides only a best-effort service. Therefore, traffic modeling and prediction is a key solution for offering good QoS. Conventional traffic modeling and prediction schemes use a model-and-parameter approach to provide QoS guarantees while maintaining a high utilization of network resources. However, the application of this type of approach to MPEG-4 VBR video services involves several problems [2]. First, it is well known that modeling and predicting
MPEG-4 video traffic with only a few parameters is very difficult due to its complex traffic characteristics. Second, the high burstiness of MPEG-4 VBR video traffic causes large queues, delays, and excessive cell losses. Third, characterizing the input video traffic prior to call setup is only possible for video applications that use prerecorded streams, such as video on demand [2]. In this paper, an adaptive neuro-fuzzy inference system (ANFIS) [3] is presented to model and predict MPEG-4 VBR video traffic without any predetermined parameters. It is implemented to test for packet loss under different rates of bandwidth utilization. Simulation results show that the group of pictures (GoP) loss probabilities in the ANFIS-modeled traffic closely approximate those in the actual traffic, and that the prediction errors (1/SNR) for the different video sequences (Silence of The Lambs, Alpin Ski, and Jurassic Park (I)) are very small. Consequently, this method is promising for MPEG-4 VBR video traffic modeling, prediction and resource reservation. After introducing the ANFIS arithmetic in Section 2, the experimental results are presented in Section 3, and the conclusions are drawn in Section 4.
2 ANFIS Arithmetic

For simplicity, it is assumed that the fuzzy inference system under consideration has two inputs, $x$ and $y$, and one output, $z$. Suppose that the rule base contains two fuzzy if-then rules of Takagi and Sugeno's type [3]:

Rule 1. If $x$ is $A_1$ and $y$ is $B_1$, then
$$f_1 = p_1 x + q_1 y + r_1. \quad (1)$$

Rule 2. If $x$ is $A_2$ and $y$ is $B_2$, then
$$f_2 = p_2 x + q_2 y + r_2. \quad (2)$$

If the membership functions of the fuzzy sets $A_i$, $B_i$, $i = 1, 2$, are represented by $\mu_{A_i}$, $\mu_{B_i}$, and we choose the product as the T-norm (logical AND) [4] in evaluating the rules, then:

1) Evaluating the rule premises results in
$$w_i = \mu_{A_i}(x)\,\mu_{B_i}(y), \quad i = 1, 2. \quad (3)$$

2) Evaluating the implication and the rule consequences yields
$$f(x, y) = \frac{w_1(x, y) f_1(x, y) + w_2(x, y) f_2(x, y)}{w_1(x, y) + w_2(x, y)}, \quad (4)$$

or, more simply,
$$f = \frac{w_1 f_1 + w_2 f_2}{w_1 + w_2}. \quad (5)$$

This can be separated into phases by first defining
$$\bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad (6)$$

hence $f$ can be written as
$$f = \bar{w}_1 f_1 + \bar{w}_2 f_2. \quad (7)$$

The structure of the ANFIS is shown in Figure 1 [4].

Fig. 1. Structure of the ANFIS
In this paper, the ANFIS uses a hybrid learning algorithm to identify the membership function parameters of the Takagi-Sugeno type fuzzy inference system (FIS): a combination of least-squares estimation and back-propagation gradient descent is used to train the FIS membership function parameters to model the MPEG-4 trace data. Using 10 bell membership functions for each input and 1000 training epochs, an initial Takagi-Sugeno type FIS is generated for ANFIS training using a grid partition.
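As a sketch of Eqs. (1)-(7), the following implements the forward pass of a two-rule first-order Sugeno model with generalized bell membership functions. The parameter values are arbitrary illustrations; in ANFIS they would be fitted by the hybrid least-squares/back-propagation procedure just described.

import numpy as np

def gbell(x, a, b, c):
    # Generalized bell membership function 1 / (1 + |(x-c)/a|^(2b)).
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

def sugeno2(x, y, prem, cons):
    # prem[i] = ((a,b,c) for A_i, (a,b,c) for B_i); cons[i] = (p_i, q_i, r_i).
    w = np.array([gbell(x, *prem[i][0]) * gbell(y, *prem[i][1])  # Eq. (3)
                  for i in range(2)])
    f = np.array([cons[i][0] * x + cons[i][1] * y + cons[i][2]   # Eqs. (1)-(2)
                  for i in range(2)])
    wbar = w / w.sum()                                           # Eq. (6)
    return float(np.dot(wbar, f))                                # Eq. (7)

# Arbitrary illustrative parameters.
prem = [((1.0, 2.0, 0.0), (1.0, 2.0, 0.0)),
        ((1.0, 2.0, 2.0), (1.0, 2.0, 2.0))]
cons = [(1.0, 1.0, 0.0), (0.5, -0.2, 1.0)]
print(sugeno2(0.5, 1.5, prem, cons))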
3 Simulation Results

The MPEG-4 VBR video traces studied in [5] are adopted in this paper. The YUV information of each video is encoded into an MPEG-4 bit stream with the MOMUSYS MPEG-4 video software [6]. We set the number of video objects to 1 (i.e., the entire scene is one video object). The video format is QCIF, whose frame size is 176 × 144 pixels with a depth of 8 bits; no rate control or scalable-layer coding was used in the encoding. The frame rate was set to 25 frames per second. The GoP pattern was IBBPBBPBBPBB, and the quantization parameters were fixed at 10 for I-frames (VOPs), 14 for P-frames, and 18 for B-frames. We observed 8197 frames of traffic, aggregated at the GoP level. Let

$$1/SNR = \sum_m v^2(m) \Big/ \sum_m x^2(m)$$

denote the overall performance metric for prediction, where $x(m)$ is the actual traffic and $v(m)$ the prediction error; obviously, the smaller the 1/SNR, the better the forecast.
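A minimal sketch of this metric, assuming x holds the actual per-GoP sizes and x_hat the ANFIS predictions, so that v = x_hat - x is the prediction error (the source does not spell out the units):

import numpy as np

def inv_snr(x, x_hat):
    # 1/SNR = sum(v^2) / sum(x^2), with v the prediction error; smaller is better.
    x = np.asarray(x, dtype=float)
    v = np.asarray(x_hat, dtype=float) - x
    return float(np.sum(v ** 2) / np.sum(x ** 2))

print(inv_snr([100, 120, 90], [101, 118, 92]))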
Fig. 2. Comparison of Silence of The Lambs sequence GoP loss probability (1/SNR = 0.0111)
Fig. 3. Comparison of Alpin Ski sequence GoP loss probability (1/SNR = 0.0167)
Fig. 4. Comparison of Jurassic Park (I) sequence GoP loss probability (1/SNR = 0.1095)
In this experiment, the network access speed was $5 \times 10^5$ bit/s. Under different bandwidth utilization rates (U = 90%, 80%, 70%), the GoP loss probability of the actual bit stream is compared with that of the ANFIS-modeled traffic; the simulation results are shown in Figures 2-4. Figure 2 shows the result for Silence of The Lambs (1/SNR = 0.0111), Figure 3 the result for Alpin Ski (1/SNR = 0.0167), and Figure 4 the result for Jurassic Park (I) (1/SNR = 0.1095).
4 Conclusions

In this paper, an MPEG-4 VBR video traffic model based on ANFIS is analyzed and discussed. The ANFIS model does not require solving for any extra parameters, and it is applied to modeling and predicting MPEG-4 VBR video traffic. Simulations show that this model is concise and effective. Comparison of the GoP loss probabilities in the actual traffic under different utilizations (U = 70%, 80%, 90%) illustrates that those of the ANFIS-modeled traffic are very close approximations under the same experimental conditions, and the prediction errors (1/SNR) of the different video sequences (Silence of The Lambs, Alpin Ski, Jurassic Park (I)) are very small.

Acknowledgments. This project was supported by the NSFC (No. 60462003), the Science and Technology Research Project of the Education Department of Jiangxi Province (No. 2005-115 and 2006-231) and the Jiangxi University of Finance & Economics Innovation Fund.
References
1. Ahmed, T., Buridant, G., Mehaoua, A.: Delivering of MPEG-4 Multimedia Content over Next Generation Internet. Lecture Notes in Computer Science, 2216 (2001) 110-127
2. Yoo, S.J.: Efficient Traffic Prediction Scheme for Real-time VBR MPEG Video Transmission over High-speed Networks. IEEE Trans. Broadcasting, 48 (2002) 10-18
3. Jang, J.-S.R.: ANFIS: Adaptive-Network-Based Fuzzy Inference System. IEEE Trans. Systems, Man, and Cybernetics, 23 (1993) 665-685
4. Koivo, H.: ANFIS (Adaptive Neuro-Fuzzy Inference System). Online: http://www.control.hut.fi
5. Fitzek, F.H.P., Reisslein, M.: MPEG-4 and H.263 Video Traces for Network Performance Evaluation. IEEE Network, 15 (2001) 40-54
6. Heising, G., Wollborn, M.: MPEG-4 Version 2 Video Reference Software Package, ACTS MOMUSYS (1999)
Multiple Textural Features Based Palmprint Authentication Xiangqian Wu1 , Kuanquan Wang1 , and David Zhang2 1 School of Computer Science and Technology, Harbin Institute of Technology (HIT), Harbin 150001, China {xqwu, wangkq}@hit.edu.cn http://biometrics.hit.edu.cn 2 Biometric Research Centre, Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong [email protected]
Abstract. This paper proposes two novel palmprint textural features, orientationCode and diffCode, and investigates the fusion of these features at the score level for personal recognition. The orientationCode and diffCode are first defined using four directional templates and a differential operation, respectively. Then matching scores are computed to measure the similarity of the features. Finally, several fusion strategies are investigated for combining the matching scores of the orientationCode and diffCode. Experimental results show that the orientationCode and diffCode can describe a palmprint effectively, and that the Sum, Product and Fisher's Linear Discriminant (FLD) fusion strategies can greatly improve the accuracy of palmprint authentication.
1 Introduction
Computer-aided personal recognition is becoming increasingly important in our information society, and biometrics is one of the most important and reliable methods in this field [1]. The palmprint, as a relatively new biometric feature, has several advantages compared with other currently available features [1]: palmprints contain more information than fingerprints, so they are more distinctive; palmprint capture devices are much cheaper than iris devices; palmprints contain additional distinctive features such as principal lines and wrinkles, which can be extracted from low-resolution images; and a highly accurate biometric system can be built by combining all the features of palms, such as palm geometry, ridge and valley features, and principal lines and wrinkles. It is for these reasons that palmprint recognition has recently attracted an increasing amount of attention from researchers [5, 2, 3, 6, 4]. A palmprint contains the following basic elements: principal lines, wrinkles, delta points, minutiae, etc. These basic elements constitute various palmprint features, such as palm lines [5] and textural features [4], and different palmprint features reflect different characteristics of a palmprint. Fusing multiple palmprint features may therefore enhance the performance of a palmprint authentication
system. Up to now, textural-feature-based algorithms have been the most effective for palmprint recognition. This paper investigates two textural features and their fusion. The two novel palmprint textural features, orientationCode and diffCode, are computed using directional templates and a differential operation, respectively, and a matching score is then computed for each feature. Finally, several strategies are investigated to fuse these two matching scores for personal authentication. When palmprints are captured, the position, direction and amount of stretching of a palm may vary, so that even palmprints from the same palm may show some rotation and translation. Furthermore, palms differ in size. Hence palmprint images should be oriented and normalized before feature extraction and matching. In this paper, we use the preprocessing technique described in [4] to align and normalize the palmprints. After preprocessing, the central part of the image, which is 128 × 128, is cropped to represent the whole palmprint.
2 Feature Extraction

2.1 OrientationCode Extraction
We devise several directional templates to define the orientation of each pixel. The $0°$-directional template is devised as below:

$$T_{0°} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\ 3 & 3 & 3 & 3 & 3 & 3 & 3 & 3 & 3 \\ 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix} \quad (1)$$

The $\alpha$-directional template $T_\alpha$ is obtained by rotating $T_{0°}$ by angle $\alpha$. Denote $I$ as an image. The magnitude of $I$ in direction $\alpha$ is defined as

$$M_\alpha = I * T_\alpha, \quad (2)$$

where "$*$" is the convolution operation. $M_\alpha$ is called the $\alpha$-directional magnitude ($\alpha$-DM). Since the gray-scale of a pixel on the palm lines is smaller than that of the surrounding pixels which are not on the palm lines, we take the direction in which the magnitude is minimum as the orientation of the pixel. That is, the orientation of pixel $(i, j)$ in image $I$ is computed as

$$O(i, j) = \arg\min_{\alpha} M_\alpha(i, j). \quad (3)$$
O is called the OrientationCode of the palmprint. Four directional templates (0°, 45°, 90° and 135°) are used to extract the OrientationCode in this paper. Additional experiments show that an image of size 32 × 32 is sufficient for OrientationCode extraction; therefore, before computing the OrientationCode, we resize the image from 128 × 128 to 32 × 32. Hence the size of the OrientationCode is 32 × 32. Figure 1 shows some examples of OrientationCodes.
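A sketch of Eqs. (1)-(3) on a 32 × 32 image follows. The paper does not specify how the rotated templates are produced; here we assume scipy.ndimage.rotate with bilinear interpolation, so this is an approximation rather than the authors' exact procedure.

import numpy as np
from scipy import ndimage
from scipy.signal import convolve2d

# 0-degree directional template of Eq. (1).
T0 = np.array([[1]*9, [2]*9, [3]*9, [2]*9, [1]*9], dtype=float)

def orientation_code(img):
    # Eq. (3): per-pixel argmin over the directional magnitudes of Eq. (2).
    mags = []
    for angle in (0, 45, 90, 135):
        T = ndimage.rotate(T0, angle, reshape=True, order=1)  # assumed rotation
        mags.append(convolve2d(img, T, mode="same", boundary="symm"))
    return np.argmin(np.stack(mags), axis=0)  # values 0..3 -> 0/45/90/135 deg

img = np.random.default_rng(1).random((32, 32))  # stand-in for a resized palmprint
print(orientation_code(img).shape)  # (32, 32)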
Fig. 1. Some examples of OrientationCodes. (a) and (b) are two palmprints; (c) and (d) are the corresponding OrientationCodes.
2.2 DiffCode Extraction
Let $I$ denote a palmprint image and $G_\sigma$ denote a 2D Gaussian filter with variance $\sigma$. The palmprint is first filtered by $G_\sigma$:

$$I_f = I * G_\sigma, \quad (4)$$

where "$*$" is the convolution operator. Then the difference of $I_f$ in the horizontal direction is computed as

$$D = I_f * b, \quad (5)$$
$$b = [-1, 1], \quad (6)$$

where "$*$" is again the convolution operator. Finally, the palmprint is encoded according to the sign of each pixel of $D$:

$$C(i, j) = \begin{cases} 1, & \text{if } D(i, j) > 0; \\ 0, & \text{otherwise.} \end{cases} \quad (7)$$
(a)
(b)
(c)
(d)
Fig. 2. Some examples of DiffCodes. (a) and (b) are two palmprint; (c) and (d) are the corresponding DiffCodes.
Multiple Textural Features Based Palmprint Authentication
3
967
Feature Matching
According to the definitions of the orientationCode and diffCode, the size of both features are same, i.e. 32 × 32. Let C1 and C2 denote two same type features (orientationCode or diffCode). Since C1 and C2 has the same length, we can use Hamming distance to define their similarity. The Hamming distance between C1 and C2 (H(C1 , C2 )) is defined as the number of the places where the corresponding values of C1 and C2 are different. That is, H(C1 , C2 ) =
32 32
C1 (i, j) ⊗ C2 (i, j)
(8)
i=1 j=1
where ⊗ is the logical XOR operation. The matching score of C1 and C2 is then defined as below: H(C1 , C2 ) (9) 32 × 32 Actually, S(C1 , C2 ) is the percentage of the places where C1 and C2 have the same values. Obviously, S(C1 , C2 ) is between 0 and 1 and the larger the matching score, the greater the similarity between C1 and C2 . The matching score of a perfect match is 1. S(C1 , C2 ) = 1 −
4
Score Fusion
Denote x1 and x2 as the matching scores of the orientationCode and diffCode, respectively. We fuse these two scores by following strategies to obtain the final matching score x. S1 : Maximum Strategy: x = max(x1 , x2 ) (10) S2 : Minimum Strategy: x = min(x1 , x2 ) S3 : Product Strategy: x=
√ x1 x2
(11) (12)
S4 : Sum Strategy:
x1 + x2 2 S5 : Fisher’s Linear Discriminant (FLD) Strategy: x=
(13)
W T SB W (14) W W T SW W ! x1 T x = Wopt ∗ (15) x2 where SB and SW are the between-class scatter matrix and the within-class scatter matrix of the genuine and impostor matching scores [7]. Wopt = arg max
968
5
X. Wu, K. Wang, and D. Zhang
Experimental Results
We employed the PolyU Palmprint Database [8] to test our approach. This database contains 600 grayscale images captured from 100 different palms by a CCD-based device. The orientationCode matching scores and the diffCode matching scores of each couple of the samples in this database are computed and then fused to get the final scores. All of the described fusion strategies were tested. And their ROC curves are plotted in Figure 3. And their equal error rate (EER) are listed in Table 1.
Maximum Minimum Product Sum FLD orientationCode diffCode
1
False Reject Rate (%)
0.9 0.8
EER
0.7 0.6 0.5 0.4 0.3 0.2 0.2
0.3
0.4
0.5 0.6 0.7 0.8 0.9 False Acceptance Rate (%)
1
1.1
Fig. 3. The ROC Curves of the orientationCode, the diffCode, and the different fusion strategies (S1 , S2 , S3 , S4 and S4 ) Table 1. EERs of the orientationCode, the diffCode, and the different fusion strategies (S1 , S2 , S3 , S4 and S5 ) Strategy OrientationCode diffCode S1 S2 S3 S4 S5 EER (%) 0.73 0.64 0.65 0.66 0.45 0.45 0.49
According to Figure 3 and Table 1, the performances of the maximum strategy and minimum strategy are worse than the diffCode, while the Sum, Product and FLD strategies can greatly improve the accuracy. The performances of the Sum, Product and FLD strategies are similar. However, the speed of the Sum strategy is much faster than the other two because of the less computation complexity. Therefore, the Sum fusion strategy is more suitable for the on-line palmprint authentication.
6
Conclusions
This paper proposed two novel palmprint textural features and investigated the fusion of these features. Several fusion strategies has been investigated. The maximum strategy and minimum strategy cannot improve the accuracy for palmprint
Multiple Textural Features Based Palmprint Authentication
recognition, while the Sum, Product and FLD strategies greatly outperform both the orientationCode and the diffCode. Considering the computational complexity, the Sum strategy is the most suitable for an on-line palmprint recognition system.
Acknowledgements This work is supported by the National Natural Science Foundation of China (No. 60441005), the Key-Project of the 11th-Five-Year Plan of Educational Science of Hei Longjiang Province, China (No. HZG160) and the Development Program for Outstanding Young Teachers in Harbin Institute of Technology.
References
1. Jain, A., Ross, A., Prabhakar, S.: An Introduction to Biometric Recognition. IEEE Transactions on Circuits and Systems for Video Technology 14 (2004) 4-20
2. Wu, X., Wang, K., Zhang, D.: Fisherpalms Based Palmprint Recognition. Pattern Recognition Letters 24 (2003) 2829-2838
3. Duta, N., Jain, A., Mardia, K.: Matching of Palmprints. Pattern Recognition Letters 23 (2001) 477-485
4. Zhang, D., Kong, W., You, J., Wong, M.: Online Palmprint Identification. IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (2003) 1041-1050
5. Wu, X., Wang, K., Zhang, D., Huang, B.: Palmprint Classification Using Principal Lines. Pattern Recognition 37 (2004) 1987-1998
6. Han, C., Chen, H., Lin, C., Fan, K.: Personal Authentication Using Palm-print Features. Pattern Recognition 36 (2003) 371-381
7. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. John Wiley & Sons, Inc. (2001)
8. PolyU Palmprint Database. http://www.comp.polyu.edu.hk/~biometrics/
Neural Network Deinterlacing Using Multiple Fields∗ Hyunsoo Choi, Eunjae Lee, and Chulhee Lee Dept. Electrical and Electronic Engineering, Yonsei University 134 Shinchon-Dong, Sedaemoon-Gu, Seoul, 120-749, South Korea {piyagihs, ejlee, chulhee}@yonsei.ac.kr
Abstract. In this paper, we proposed a deinterlacing algorithm using neural networks for conversion of interlaced videos to progressive videos. The proposed method uses multiple fields: a previous field, a current field, and a next field. Since the proposed algorithm uses multiple fields, the neural network is able to take into account the motion pattern which might exists in adjacent fields. Experimental results demonstrate that the proposed algorithm provides better performances than existing neural network deinterlacing algorithms that uses a single field.
1 Introduction Since the invention of analog TV over 80 years ago, the industrial standard interlaced scan has been widely adopted in various TV broadcasting standards, which include NTSC, PAL and SECAM. The interlaced scan doubles the frame rate as compared to that of the progressive scan using the same bandwidth occupation [1]. However, this interlaced scan also introduces undesirable artifacts such as line crawling, interline flickering, and line twitter because of the nature of the interlaced scan. These artifacts can impair the visual quality of videos. In addition, interlaced scanning is unsuitable for display devices such as LCD-TVs, PDPs, PC monitors that require progressive formats. For example, it is necessary to convert interlaced DVD videos into the progressive format to display on PC monitors and LCD-TV monitors. Furthermore, recent HDTV monitors and multimedia PCs require the conversion between interlaced and progressive video sequences. A large number of techniques have been proposed for the interlaced to progressive scan conversion [1-8]. Some methods are based on intra-field de-interlacing. The main advantage of such algorithms is an easy implementation. These algorithms include line doubling, vertical averaging and edge-based line averaging (ELA) [2]. The ELA technique performs interpolation in the direction which has the highest correlation. However, these intra-field deinterlacing methods fail to provide good performance in the motion area of video sequence. To improve the performance within motion area, motion compensation methods [6-8] have been introduced. These methods are the most ∗
This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment) (IITA-2005(C1090-0502-0027)).
D.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNCIS 345, pp. 970 – 975, 2006. © Springer-Verlag Berlin Heidelberg 2006
Neural Network Deinterlacing Using Multiple Fields
971
advanced approaches in deinterlacing and involve estimating specific motion trajectories. However, these methods suffer from computational complexity. In addition, incorrect motion estimation produces performance degradation. Recently, deinterlacing methods based on neural networks have been proposed [9], which uses the present field to obtain inputs of a neural network. In this paper, we propose to use a neural network for de-interlacing, which uses inputs from the previous, current, and next fields.
2 Neural Network Deinterlacing 2.1 Neural Network Deinterlacing Using a Single Field In this section, we briefly describe deinterlacing methods using neural networks which use a single field. The multilayer feed-forward network is one of the most popular neural network architectures [10]. The neural network has shown good performance in many applications such as pattern recognition and data optimization. Typically, the back-propagation algorithm is used to adjust the weight vector, which is updated so that the following error is reduced:
E=
1 2
¦ (t
k
− ok ) 2 .
(1)
k
where t k is a target value and ok is an output value of the neural network. During training phase, the weight vector is updated as follows:
Δwkj = −η
∂E , ∂wkj
Δw ji = − η
∂E . ∂w ji
(2)
where η is the learning rate. Plaziac proposed a deinterlacing method based on neural networks which use a single field [9]. In the algorithm of [9], the neural network has 30 inputs, 16 hidden neuron, and 3 outputs as shown in Fig. 1.
Pixels to be interpolated Existing pixels from the decimated image Pixels chosen as the input of neural network Pixels used to test the neural network accuracy
Fig. 1. Pixels used for inputs and outputs of Plaziac’s line-doubling method
2.2 Neural Network Deinterlacing Using Multiple Fields
In interlaced videos, adjacent fields provide valuable information for filling in the missing lines in the current field. In order to utilize this information the adjacent fields
972
H. Choi, E. Lee, and C. Lee
may provide, the proposed deinterlacing method uses three fields: the previous, current, and next fields. Fig. 2 shows the input pixels the proposed algorithm uses. It is noted that the proposed method uses 20 inputs. The proposed neural network has 16 hidden neurons and 1 output neuron. From the present field, 10 pixels are selected as inputs. The previous and next fields provide additional 10 pixels. In other words, the input vector and the output vector can be represented as follow: A = {a1 , a2 , a3 ,, a20 } ,
Previous frame
a1 a2 a3 a4 a5
B = {b1} .
(3)
Current frame
Next frame
a6 a7 a8 a9 a10 b1 a11 a12 a13 a14 a15
a16 a17 a18 a19 a20
Input : A Target: B
Fig. 2. Pixels used for the inputs and output in the proposed method
3 Experiments and Results Experiments were conducted in order to evaluate the performance of the proposed deinterlacing method. First, interlaced sequences were made by eliminating an even or odd line of progressive videos. Fig. 3 shows a process of progressive to interlaced Fn
Original Progressive Sequence
Fn+1
0
0
1
1
2
2
3
3
4
4
5
5
6
6
7
7
0 Interlaced Sequence
1 2 3 4 5 6 7
Fig. 3. Progressive-to-interlaced conversion (scan line decimation) [11]
Neural Network Deinterlacing Using Multiple Fields
973
Table 1. Average PSNRs(dB) Format
QCIF
CIF
Video
ELA
Coastguard Container Foreman Hall & Monitor Mobile Mother & Daughter Silent Stefan Table Coastguard Container Hall & Monitor Mobile Akiyo Miss Mother & Daughter Silent Singer Stefan Table Average
26.38 26.62 33.38 28.65 23.04 35.98 32.92 23.63 26.76 28.09 27.77 30.82 23.40 37.86 40.94 38.60 33.91 33.42 26.07 29.39 30.38
NNDSF (by Plaziac) 26.13 26.37 30.55 28.34 23.49 33.98 32.07 24.09 26.22 28.16 28.27 30.45 24.92 35.29 37.09 35.97 32.65 32.22 26.85 29.98 29.66
NNDMF (Proposed) 31.92 35.24 35.00 34.91 30.25 39.35 37.28 26.49 28.44 31.03 36.16 35.22 28.71 41.46 41.73 40.17 36.96 37.05 25.84 27.61 34.04
Fig. 4. PSNR results of the three algorithms for the mobile sequence (QCIF format)
conversion. Then, the neural network is trained by using 5 QCIF video sequences: foreman, coastguard, container, hall&monitor, and mobile. From the 5 video sequences, the first 100 fields are used as training data. At the boundaries of images, pixels are mirrored to produce input vectors. After training, test videos of two video formats (QCIF, CIF) are used for performance evaluation. First, 9 QCIF videos (foreman, coastguard, container, hall&monitor, mobile, mother&daughter, silent, stefan, and table) are used as test data. Next, 11 CIF format videos (coastguard, container, hall&monitor, mobile, akiyo, miss, xtmother&daughter, silent, singer, stefan, and table) are used as test data.
974
H. Choi, E. Lee, and C. Lee
The proposed is compared with ELA and the neural network deinterlacing which uses a single field [9]. The peak signal to noise ratio (PSNR) is used as a criterion. Table 1 shows performance comparison of three methods: ELA, NNDSF (neural network deinterlacing which uses a single field), NNDMF (the proposed method). As can be seen, the proposed algorithm provides the best performance. Fig. 4 shows the frame PSNRs of the proposed method and other two methods for the mobile video sequence in QCIF format. Fig. 5 shows the reconstructed images and the original image of the 27th frame of the mobile video sequence.
(a)
(c)
(b)
(d)
Fig. 5. Reconstructed images of the 3 methods and the original image of the mobile sequence (27th frame). (a) ELA (b) NNDSF (c) NNDMF (Proposed method) (d) Original image.
4 Conclusion and Discussion In this paper, we proposed to use a neural network for deinterlacing with inputs taken from several fields. The proposed method use three fields: the previous, current, and next fields. Experimental results show that the proposed method significantly outperforms the existing methods.
References 1. Li, R.X., Zeng, B., Liou, Ming L.: Reliable Motion Detection/Compensation for Interlaced Sequences and Its Applications to Deinterlacing. IEEE Trans. Circuits and Systems for Video Technology, 10(1) (2000) 23-29 2. Doyle, T., Looymans, M.: Progressive Scan Conversion Using Edge Information. Signal Processing of HDTV II, L. Chairglione, Ed. Amsterdam, The Netherlands: Elsevier (1990) 711-721
Neural Network Deinterlacing Using Multiple Fields
975
3. Unser, M.: Splines: A Perfect Fit for Signal and Image Processing. IEEE Signal Processing Magazine, 16(6) (1999) 22-38 4. Bock, M.: Motion Adaptive Standards Conversion between Formats of Similar Field Rates. Signal Processing: Image Commun., 6(3) (1994) 275-280 5. Kovacervic, J., Safrank, R. J., Yeh, E. M.: Deinterlacing by Successive Approximation. IEEE Trans. Image Processing, 6(2) (1997) 339-344 6. Woods, J.W., Han, S.C.: Hierarchical Motion Compensated Deinterlacing. in Proc. SPIE, vol. 1605 (1991) 819-825 7. Bellers, E. B., de Haan, G.: Advanced Motion Estimation and Motion Compensated Deinterlacing. In Proc. Int. Workshop HDTV, Los Angeles, CA, Oct. , session A2 (1996) 8. Kwon, O., Sohn, K., Lee, C.: Deinterlacing Using Directional Interpolation and Motion Compensation. IEEE Trans. Consumer Electronics, 49(1) (2003) 198-203 9. Plaziac, N.: Image Interpolation Using Neural Networks. IEEE Trans. Image processing, 8(11) (1999) 1647-1651 10. Patterson, Dan. W.: Artificial Neural Networks. Prentice Hall (1995) 11. Jack, K.: A Handbook for the Digital Engineer. Fourth Edition, Elsevier (2004)
Non-stationary Movement Analysis Using Wavelet Transform Cheol-Ki Kim1, Hwa-Sei Lee1, and DoHoon Lee2 1
2
Department of Design, Pusan National University, South Korea School of Computer Science & Engineering, Pusan National University, South Korea [email protected], [email protected]
Abstract. This paper presents a method that automatically detects the insect’s abnormal movements. In general, the ecological data are difficult for analysis due to complexity residing in the systems with the variables varying in nonstationary fashion. Therefore, we needs to efficient methods that are able to measure from various environmental conditions. In this paper the wavelet transform are introduced as an alternative tool for extracting local and global information out of complex ecological data. And we discuss the method that is applicable to various relative fields.
1 Introduction Non-stationary movement data are difficult for analysis in ecological systems. Various mathematical methods have been used in ecology to analyze computational behaviors [1][2]. Recently IT techniques in ecological informatics have been applied to extraction of information from various fields such as forecasting and patterning, etc [3][4]. In real situations, however, local information in ecological data may also be important in revealing the states of individual specimens or ecological systems. The compressed information in parameters is usually to brief to address the local and global information sufficiently at the same time. In this regard, wavelets could be considered as an alternative tool to extract local and global information at the same time. Wavelet theory has been one of the most successful tools to analyze, visualize, and manipulate complex time based data, for which the traditional Fourier methods cannot be applied directly [5]-[7]. Analysis of data by wavelets allows one to study the data as if one studies material objects with a microscope capable of many levels of magnification. Wavelet approach is also flexible in handling irregular data sets. It can represent complex structures without the knowledge of the underlying function that generated the structure. It can precisely locate the jump discontinuities, singularities in dynamical systems [6]-[9]. Wavelets would be especially useful for finding scale dependent regularities from ecological data measured in experimental and field conditions. In implementation, wavelets have been efficiently used in extracting local information in time development and have been useful for characterizing or identifying changes in shapes of the curves [8][9]. One of the first applications of wavelets in ecology can be found in [10] in analyzing coherent structure existed between atmosD.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNCIS 345, pp. 976 – 981, 2006. © Springer-Verlag Berlin Heidelberg 2006
Non-stationary Movement Analysis Using Wavelet Transform
977
phere and forest. This paper outlines application of wavelets to ecological data in various types, and demonstrates the usage of wavelets for behavioral monitoring of indicator species treated with a toxic chemical.
2 The Proposed Method Wavelets were further used for monitoring continuous movement of an indicator species. DWT was applied to detection of changes in the shape of the movement tracks of the larvae of Chironomid after being individually treated with an insecticide, carbofuran, at the concentration of 0.1 mg/l. Figure 1 shows typical examples of the movement tracks of C. samoensis in the long term after treated with the chemical. This type of data could be continuously obtained from the image processing system [12]-[13]. In this case the typical responding behaviors could be observed from the movement tracks located at the center and right upper side of the cage in Figure 1, the shaking and highly-curved circular movements. In preliminary tests, various variables such as angle speed, angle acceleration, speed, acceleration, location, and maximum movement length were checked in revealing the states of the testing specimens. Among the variables phase angle was selected as input variable to detect the response behaviors of the specimens in this study, regarding that the typical symptomatic movements of the specimens after the treatments of the chemical were the highly shaking movements (Fig. 1). The response behaviors were characterized in changes in phase angles. The relationships of abnormal states of the test specimens with other characterizing variables will be discussed elsewhere. In the example segment shown in Fig. 1, the movement tracks were characterized in different phases. Usually the highly shaking movements with sharp changes in phase angles were observed when the species showed a number of small circular movements in a limited area.
Fig. 1. The movement tracks of the specimens of Chironomus samoensis larvae in 2D for the period of approximately 82000s first days after the treatment of carbofuran (0.1 mg/l)
Since the movement tracks were recorded in 2 dimensions from top view, we obtained the phase angle of the movement tracks in the usual manner for the complex variables for a point, x + yi , as shown in equation (1). Z = x + yi , θ = angle(Z ) , Z =| Z | . exp(iθ ) = R. exp(iθ )
(1)
978
C.-K. Kim, H.-S. Lee, and D. Lee
In the equation, ș= angle(Z) returns the phase angles in radians for each element of complex array Z, while the magnitude is given by R =| Z | . The phase angles were obtained continuously in every 0.25s during the observation period. Fig. 2a shows changes in phase angles of the movement tracks of the specimens after the treatments of the chemical. Different patterns in phase angles were correspondingly observed compared with Figure 1. The phase of movement in the limited area from the start to the middle period was presented with flat curves with fluctuation in small scale, while the longer circular turning movements were matched to clear periodic changes in phase angles in the latter part of observation (Fig. 2a).
Fig. 2. Changes in the values of phase angle in the movement tracks (approximately 760s) of Chironomus samoensis shown in Fig. 1 (a), and corresponding amplitude terms in different frequency components, The first level (D1) (b) and 8th level in scaling minimum (D8) (c) in 8 step decomposition according to Daubechie 4 function
After preliminary tests, Daubechie 4 was selected as the base function for DWT to detect changes in phase angles of the movement tracks among various base functions. To extract information of wavelets, hierarchical processes were applied to the data of phase angles. Figure 3 shows the filtering procedure in obtaining coefficients in wavelet analysis in this study. High (H_1) and low (H_0) filters were applied to the data to every step of filtering. When the data for signal were initially provided to the filters the signal is decomposed to two components in high frequency (D1) and low frequency (A1). Subsequently, A1 is decomposed to D2 and A2, and the component for low frequency (A2) was further decomposed for the third step of filtering. For simplicity of presentation the process up to D3 and A3 was listed in Figure 3. This process was repeated until A8 and D8 were obtained. Consequently, the formula to present the relationships between θ and amplitudes for different frequencies are as follows:
θ = A8 + D1 + D 2 + + D8
(2)
While the high frequency component in the first level (D1) provided the short term information with good time resolution, the components of high frequency in the eighth level (D8) carried the long term information. Among 8 components, we chose two levels components, minimum (D1) and maximum (D8), for detection of the
Non-stationary Movement Analysis Using Wavelet Transform
979
movement patterns (Fig. 2b and 2c). The minimum level component represents the highest frequency (D1) with the finest time resolution (Fig. 2b). With D1, the changes in phase angle in the lowest scale could be efficiently detected with the highest time resolution. Slight changes in phase angle in the shortest time period could be detected. On the other hand, the maximum level component (D8) could detect the ranges of somewhat longer ranges in the changes in phase angles (Fig. 2c). With preliminary tests, these two levels suffice for revealing the changes of behavioral states.
Fig. 3. Wavelet decomposition procedure, where S is the original signal to be decomposed, and H_0 and H_1 are lowpass and highpass filters, respectively
The changes in the amplitude terms in decomposition were differently observable in D1 and D8. The curve for the changes in the amplitude for D1 was sharp (Fig. 2b), while the corresponding curve for D8 was smoother, consequently indicating overall changes in amplitude terms (Fig 2c). By combining these two components, the time points for changes in the variables for both high and low frequency resolution could be detected through DWT. We selected the points that show the amplitude terms above the threshold both for D1 and D8 at the same time through AND logic. The characterizing coefficients at D1 were initially selected if the level was higher than a threshold ( θ = 0.01 ). Subsequently the coefficients at D8 were selected if the level was higher than a threshold ( θ = 0.05 ). Finally the points satisfying both conditions were chosen for indicating the changes in the patterns of the movement tracks.
3 Experimental Results We implemented the proposed method in Pentium IV 2.3GHz, 512MB and Matlab 6.5 and Visual C++. Detection was subsequently possible when a stream of input data for the data of phase angles was fed to the model to meet the criteria stated above (Fig. 4). The time points of the movement tracks with higher levels of changes in phase angles were sequentially detected (bold case in Fig. 4a). The detected patterns (bold case in Fig. 4a) were concentrated in the early phase when the specimens moved in a limited area in this case. An example of the enlarged segment detected by the model is shown in Figure 4b. The shaking and limited movement (bold case) and the normal movements not detected the model were both observed. The model was evaluated with the movement data for different individuals before and after the treatments. The data for the movement tracks were divided in every 757.5 s, and the total time of detection was calculated in each segment. The changes in detection time were significantly increased the data for different species after the treatments (Tab. 1).
980
C.-K. Kim, H.-S. Lee, and D. Lee
Fig. 4. The movement tracks detected by DWT. (a) track for the period of approximately 757.5s three days after the treatment. (b) track for the period of approximately 50s(from 200s to 250s) (c) track for the period of approximately 50s(from 450s to 500s). Table 1. Changes in detection rate of the movement patterns of Chironomus samoensis larvae before and after the treatments by using DWT (Student’s T-test)
Specimens
Treat
n
Mean
S.D
T
1
Before After Before After Before After
22 38 35 35 15 15
99.50 271.84 196.40 383.91 49.07 203.67
57.14 189.64 63.88 155.58 53.67 98.67
-4.142
Probability (p) <0.000
-6.600
<0.000
-5.330
<0.000
2 3
As demonstrated in this study, wavelets are efficient in extracting both local and global information in continuous data and could be useful for detecting changes in the variables as reported by other researchers [8][9]. Especially wavelets are useful for point out the time for monitoring. As is well known, this is not possible with the Fourier transform where only frequency detection is possible. An ANN has been used to utilize local information in behavioral data. In [13], they used the multi-layer perceptron(MLP) to detect response behaviors of an indicator species, medaka fish. The MLP was used to learn the variables characterizing movement tracks in this case, and detection of movement patterns with new data sets was possible with the trained network. Another useful method in artificial neural networks based on unsupervised learning, Self-Organizing Map (SOM), was used for clustering the segments of the movement tracks of indicator species [12]. In these studies, however, selection of the variables was pre-determined mainly by experience of the observers. Many variables were somewhat arbitrarily chosen and no quantitative evaluation could be addressed to select the variables. In the SOM, the segments are grouped in an unsupervised manner. However, the variables were arbitrarily chosen for grouping of the segments [12]. Consequently, choice of input variables was more dependent upon the observers’ subjective judgment for the case of artificial neural networks. Since detection of the treated behaviors were mainly based on the network algorithm, choice of detection is more case dependent for training with the artificial neural networks, and information on quantitative justification for selection could not be sufficiently provided, as is well known in the case of artificial neural networks. In this case for wavelet analysis,
Non-stationary Movement Analysis Using Wavelet Transform
981
the feature was selected more simply based on the amplitude terms in specific frequency components in DWT.
4 Conclusion Wavelets have good time-frequency (time-scale) localization, can represent data parsimoniously, and can be implemented with very fast algorithms. Wavelets are well suited for building mathematical models of ecological data and the statistical analysis of combined effects of complex factors in ecological network. Wavelet analysis was demonstrated in detecting response behaviors in time-series monitoring data in this study. Wavelet based analysis and synthesis may lead researchers in ecological studies to new insights and novel theories for understanding complex ecological and environmental phenomena.
References 1. Allen, T.F.H., Starr, T.B.: Hierarchy: Perspectives for Ecological Complexity. The University of Chicago Press, Chicago. USA. (1982) 310 2. Murray, J. D.: Mathematical Biology, Springer-Verlag, New-York (1989) 3. Lek, S., Delacoste, M., Baran, P., Dimopoulos, I., Lauga, J., Aulagnier, S.: Application of Neural Networks to Modelling Nonlinear Relationships in Ecology. Ecological Modelling, 90 (1996) 39-52 4. Recknagel, F.: Ecological Informatics: Understanding Ecology by Biologically-inspired Computation. Springer-verlag, Berlin (2003) 398 5. Mallat, Stephane G.: A Theory for Multiresolution Signal Decomposition: The Wavelet Representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11 (1989) 674-693 6. Meyer, Y.: Wavelets: Algorithms and Applications. SIAM Philadelphia (1993) 7. Morlet J.: Sampling Theory and Wave Propagation, NATO ASI Series. In: C. H. Chen (Editors), Acoustic Signal/Image Processing and Recognition, Springer-Verlag, I (1983) 233-261 8. Daubechie, I.: The Wavelet Transform, Time-frequency Localization and Signal Analysis. IEEE Transactions on Information Theory, 36 (1990) 961-1005 9. Rioul, O., Vetterli, M.: Wavelets and Signal Processing. IEEE Signal Processing, vol.8 (1991) 14-38 10. Gao, W., Li, B.L.: Wavelet Analysis of Coherent Structures at the Atmosphere-forest Interface. Journal of Applied Meteorology, 32 (1993) 1717-1725 11. Strang, G.: Introduction to Applied Mathematics. Wellesley-Cambridge Press, Wellesley, MA (1986) 12. Chon, T.S., Park, Y.S., Park, K.Y., Choi, S.Y., Kim, K.T., Cho, E.C.: Implementation of Computational Methods to Pattern Recognition of Movement Behavior of Blattella Germanica (Blattaria: Blattellidae) Treated with Ca2+ Signal Inducing Chemicals. Appl. Entomol. Zool,39(1) (2004) 79-76 13. Kwak, I.S., Chon, T.S., Kang, H.M, Chung, N.I., Kim, J.S., Koh, S.C., Lee, S.K., Kim, Y.S.: Pattern Recognition of the Movement Tracks of Medaka (Oryzias latipes) in Response to Sub-lethal Treatments of an Insecticide by Using Artificial Neural Networks. Environmental Pollution, 120 (2002) 671-681
Novel Fault Class Detection Based on Novelty Detection Methods Jiafan Zhang1, 2, Qinghua Yan1, Yonglin Zhang1, and Zhichu Huang2 1
Department of Mechanical Engineering, Wuhan Polytechnic University, Wuhan 430023, China 2 School of Mechanical and Electrical Engineering, Wuhan University of Technology, Wuhan 430070, China {jfz, qhy, ylz22}@whpu.edu.cn, [email protected]
Abstract. The ability to detect a new fault class can be a useful feature for an intelligent fault classification and diagnosis system. In this paper, we adopt two novelty detection methods, the support vector data description (SVDD) and the Parzen density estimation, to represent known fault class samples, and to detect new fault class samples. The experiments on real multi-class bearing fault data show that the SVDD can give both high identification rates for the prescribed ‘unknown’ fault samples and the known fault samples, which shows an advantage over the Parzen density estimation method in our experiments, via choosing the appropriate SVDD algorithm parameters.
1 Introduction Statistical pattern recognition techniques, including neural networks, have been explored intensively for the machinery fault classification and diagnosis in the last decade. To construct such a classification and diagnosis system, some typical fault pattern data are required during the training phase. Obviously, it is possible that novel faults are evolving while a monitored machine is running. These faults are different from those that have been trained to the diagnostic system and need to be promptly detected. Therefore, it is desirable that such diagnostic system should not only correctly discriminate all trained faults, but also detect unseen faults. The latter involves so-called novelty detection (ND) problems. For a machine learning system the ability to identify novel pattern class is known as novelty detection [1, 2]. Different approaches to the problem of the ND have been proposed, using such as density estimation, k-nearest neighbour algorithm or neural networks. Most of them have been applied to classification problems as well. Detection of novel patterns can be done by estimating density of the known pattern data and rejecting novel pattern data, which are below a probability threshold, in low probability areas [3, 1]. The density estimation can be based on a model of the data, for instance a mixture of Gaussian distributions, or estimated by Parzen windows. Both methods require large numbers of training data to make reliable probability density estimation. D.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNCIS 345, pp. 982 – 987, 2006. © Springer-Verlag Berlin Heidelberg 2006
Novel Fault Class Detection Based on Novelty Detection Methods
983
Neural networks have been used extensively in novelty detection. Multi-layer perceptrons and radial basis function networks are the best-known and most widely used neural networks. One of the simplest approaches to novelty detection is based on thresholding output of these networks [4]. The rejection criterion can be based on, for instance, two thresholds. The first threshold is applied to the winning node rejecting patterns whose activation falls below this value, while the second threshold is applied on the difference between the activations of the winning and second winning nodes rejecting patterns that fall below it. Another approach of novelty detection is to find bounded regions that contain (almost) all data, and uses restricted shapes for their class boundaries like hyperspheres. Support vector data description (SVDD), which inspired on the support vector classifier and proposed by Tax and Duin [5, 6], seems to give a flexible and tight data description among the boundary approaches. Since the SVDD focuses on the boundary description and not on the complete data density, the required number of data is smaller than for, e.g., the Parzen density estimation. In this paper, the SVDD and the Parzen density estimation are adopted to model known fault class samples, and to detect novel fault class samples that are different from those that have been modeled.
2 Support Vector Data Description Given a target dataset containing n data samples {xi , i =1, 2, …, n }, each object in the dataset is mapped onto a high-dimensional feature space using a nonlinear mapping φ xi). A hypersphere with minimal volume, which has the center a and the radius R, is defined to enclose the mapped data. By minimizing the hypersphere volume, we hope to minimize the chance of accepting outlier samples. To allow the possibility of outliers in the training set, the distance from φ xi) to the center a must not be strictly smaller than R 2, but larger distances should be penalized. Therefore, the slack variables ξ i are introduced. The above optimization problem becomes then: 2
min F ( R, a, ξ ) = R + C ¦ ξ i i
(1)
i
with constraints φ ( xi ) − a
2
2
≤ R + ξ i , ξ i ≥ 0 , ∀i .
(2)
The parameter C controls the trade-off between the volume and the errors on the training target dataset. The constraints (2) can be incorporated in the equation (1) by applying Lagrange multipliers βi and optimizing the Lagrangian. Furthermore, the inner products (φ xi)·φ xj)) in the obtained results can be replaced by a kernel function K(xi, xj). Thus, we obtain the dual forms of the optimization problem as follows: max L = ¦ βi K ( xi , xi ) − ¦ βi β j K ( xi , x j ) i
i, j
(3)
984
J. Zhang et al
with constraints a = ¦ βiφ ( xi ) , ¦ βi = 1 , 0 ≤ β i ≤ C , ∀i . i
(4)
i
In practice, it appears that a large fraction of the βi becomes zero. For a small fraction, βi > 0, and the corresponding data samples xi are called support vector. These samples appear to lie on the boundary. An test sample z is then accepted by the description when:
φ ( z ) −a
2
2
= K ( z , z ) − 2 ¦ βi K (z , x ) + ¦ βi β j K ( x i , x ) ≤ R , i
i
j
i, j
(5)
where the radius R can be determined by calculating the distance from the center a to a support vector xi on the boundary. In most cases, the RBF kernel seems to work very well on the data descriptions [5], which has a form as follows:
K ( xi , x j ) = exp ( − xi −x j
2
2
/ b ), b > 0 .
(6)
Thus, for the SVDD the parameter C and kernel parameter b need to be userdefined. Their values commonly have the range, 0<1/nC<1, b = 1-25. Additionally, with decreasing C or b the number of support vectors increases, the covered volume of the data description decreases, i.e. the error on the target data class increases, however, the chance of accepting outlier samples decreases [6].
3 Novel Fault Class Examples Detection Using the SVDD In this section, we try to utilize the SVDD to detect novel fault class samples. As an example, four different operating conditions of the rolling bearing are considered, which are (i) normal conditions without faults; (ii) a ball fault; (iii) an inner race fault; (iv) a outer race fault. Four sets of vibration data corresponding to four operating conditions can be obtained from the Bearing Data Center Website (available at http://www.eecs.cwru.edu/laboratory/bearing/download.htm) of the Case Western Reserve University. 3.1 Feature Extraction of the Rolling Bearing The discrete wavelet transform (DWT) is selected for the vibration data analysis, which uses the Daubichies-2 wavelet by five levels, i.e. the approximation (a5), and five levels of details (d1-d5). A DWT feature vector is defined for a given vibration data as v = [v1, v2, …, v10]T with its element defined as: vi = σ i / σ ri
i = 1, 2, …, 6 ,
(7)
where i = 1, 2, …, 6, corresponds to d1, d2, …, d5, a5, respectively and ıi is the standard deviation of the ith decomposition, e.g. ı1 is the standard deviation of d1;
Novel Fault Class Detection Based on Novelty Detection Methods
985
ıri is the standard deviation of ith decomposition of a reference signal (in this case we have chosen a data set acquired under normal operating condition and zero load). v7 and v8 are Crest factor of d5, a5 respectively, v9 and v10 are Impulse factor of d1, d2 respectively. In our experiments, for each class 200 10-dimensional feature vectors (i.e. data samples) are extracted from each set of vibration data corresponding to one of four different operating conditions of the rolling bearing. 100 feature vectors among them are used for training, and one half of the rests for the validation, the other half for testing. In addition, our experiments are done using DD-Tools [7].
3.2 Novelty Detection Experiments For the first experiment, the goal is to identify the outer race fault as novelty; other three classes data that correspond to normal conditions without faults, a ball fault and a inner race fault are considered as target data and represented using the SVDD. Before proceed to do it, we briefly consider a two-layer perceptron trained on the target data, used for the novelty detection. This network has structures of 10-14-3 for each layer unit numbers, and has a total correct recognition rate 94% on the target testing dataset. Fig. 1 shows the output results when 50 examples of the outer race fault present to it. There are 33 unseen fault examples whose winning node has an activation larger than 0.9, and the difference between the activations of the winning and second winning nodes larger than 0.8. If the first threshold is set to be 0.9, the novelty detection rate is 34% (17/50). Although the threshold setup is not rigorous, and the network structures are not optimal, we can find that the multilayer perceptrons used for novelty detection based on thresholding output of these networks seem to give unsatisfactory results.
Fig. 1. The output of unseen fault examples input to a two-layer perceptron (“ƕ”-activation of output node 1, corresponding to normal conditions without faults; “+” -activation of output node 2, corresponding to a ball fault; “∗” -activation of output node 3, corresponding to a inner race fault)
986
J. Zhang et al
Now, we return to the SVDD on the target data above. It should be noted that a prespecified fraction r, which specifies how many the training target data will be rejected, needs to be user-defined in Matlab software DD-Tools instead of the parameter C in the SVDD. We choose trial values r = 1%, 5%, 10%, and b = 2-10 for RBF kernel parameter in all our experiments. Additionally, in order to evaluate the performance of the SVDD, two indices should be calculated: the true positive rate (TP) or target recognition rate measures how well the algorithm recognizes new examples of the known fault classes, and the true negative rate (TN) or novelty detection rate does the same for examples of an unknown novel fault class. Fig. 2(a) shows the performance of the SVDD on the validation dataset of the known fault classes and prescribed novel fault class under varying r and b. As shown in the figure, all novel fault examples in the validation dataset can be detected successfully, which is irrelevant to parameters r and b in our chosen ranges. The recognition rate (or TP) for new examples of the known fault classes also ranges from 93.33 % to 98.67 % for r = 1%, b = 3-10, and 83.33 % to 92 % for r = 5%, b = 3-10. Obviously, the SVDD gives better detection results for the novel fault class, i.e. the outer race fault, compared with those obtained from a two-layer perceptron discussed previously. For the second experiment, the inner race fault is assigned to be a novel fault class. As shown in Fig. 2(b), in contrast to the target recognition rate, the novelty detection rate is very sensitive to the kernel parameter b, and decreases sharply as b increases, especially for r = 1%, or 5%. This can be understood by looking at the distribution of the data, where the inner race fault class is between the ball fault class and the outer race fault class. In this case, the SVDD will be unable to find compact surrounding boundaries for data of each target class when the r is small. But, for r = 10 % and b = 3-5 the SVDD can also give good results with the novelty detection rate in the range of 90%-98% and the target recognition rate in the range of 82 %-86 %. For the testing dataset, when r = 10 % and b = 4, there are TN = 98 %, TP = 83.33 %; when r = 10 %, b = 5, TN = 88%, TP = 86.67%. Fig. 2(c) shows the SVDD performance on the third experiment, which assigns the ball fault to be a novel fault class. Further explanations will be omitted here.
(a)
(b)
(c)
Fig. 2. The novelty detection performance on unseen fault examples based on the SVDD. (a). the outer race fault as novel fault class; (b). the inner race fault as novel fault class; (c). the ball fault as novel fault class.
Novel Fault Class Detection Based on Novelty Detection Methods
987
The SVDD method is compared with the Parzen windows method here. Correspondingly used for three experiments above, the Parzen windows method gives TN = 100%, TP = 40%; TN = 100%, TP = 60% and TN = 98%, TP = 39.33% respectively on their testing datasets for r = 5 % or r = 10 %. Obviously, it provides a slightly better novelty detection rate, but a greatly lower target recognition rate in our experiments. The latter is caused because of the over-training of the Parzen density estimation. Thus it cannot entirely meet the performance requirement, i.e. not only a high novelty detection rate, but also a high target recognition rate.
4 Conclusions In this paper we find that the SVDD can gives both high identification rates for the novel fault class and the known fault class examples, which shows an advantage over the Parzen windows method in our experiments. Moreover, its performance is dependent on the selection of the appropriate algorithm parameters. When no novel fault examples are available, the optimal algorithm parameters cannot be determined. Therefore, it may be helpful for the determination of the appropriate parameters that some artificial outliers can be generated around the known fault class examples.
References 1. Markou, M., Singh, S.: Novelty Detection: A Review-part 1: Statistical Approaches. Signal Processing, 83 (12): (2003) 2481-2497 2. Markou, M., Singh, S.: Novelty Detection: A Review-part 2: Neural Network based Approaches. Signal Processing, 83 (12): (2003) 2499-2521 3. Barnett, V., Lewis, T.: Outliers in Statistical Data. John Wiley & Sons Ltd., 2nd edition, (1978) 4. Stefano, C. De, Sansone, C., Vento, M.: To Reject or Not to Reject: That is The Questionan Answer in Case of Neural Classifiers. IEEE Trans. Systems Man Cybernetics-Part C, 30 (1) (2000) 84-94 5. Tax, M. J. , Duin, P. W.: Support Vector Domain Description. Pattern Recognition Letters, 20(11-13 191-1199 6. Tax, M. J., Duin, P. W.: Support Vector Data Description. Machine Learning, 54: (2004) 45-66 7. Tax, M. J.: DD-tools, The Data Description Toolbox for Matlab, (2005)
Novel Scheme for Automatic Video Object Segmentation and Tracking in MPEG-2 Compressed Domain∗ Zhong-Jie Zhu1,2, Yu-Er Wang1, Zeng-Nian Zhang1, and Gang-Yi Jiang2,3 1 Ningbo
Key Lab. of DSP, Zhejiang Wanli University, Ningbo 31 51 00, China [email protected] 2 National Key Lab. of Machine Perception, Peking University, Beijing 10 08 71, China 3 Institute of Circuits and Systems, Ningbo University, Ningbo 31 52 11, China
Abstract. In this paper, a novel scheme for fast object segmentation and tracking in MPEG-2 compressed domain is proposed. The video object is finally extracted after steps of motion detection, vector-based watershed segmentation, fusing operation, and finally edge correcting and morphologic post-processing. The tracking algorithm is fast and simple. All the processes are mainly implemented in compressed domain without the need of fully decoding of compressed stream. The information of motion vectors and DCT coefficients used in the algorithm are directly extracted from the compressed stream. Experimental results reveal that the proposed algorithm can extract objects directly from compressed stream with accurate contours and the object tracking algorithm is also efficient.
1 Introduction Video object segmentation and tracking is a key technology in many video processing applications such as object-based coding and object-based retrieval [1, 2, 3]. Object segmentation can be implemented either in uncompressed domain or directly in compressed domain. Most of the currently proposed algorithms are operated in uncompressed pixel domain, where the compressed stream should be fully decoded before segmentation can be performed. As a result, the computational complexity is high and may not meet real-time requirement in some applications [4, 5, 6]. Compressed-domain methods extract video objects directly from the compressed stream without the need of fully decoding of video stream. Hence, those methods are comparatively simple and fast. But it is still a challenge to extract accurate video objects directly from compressed stream [7, 8]. ∗
This work was partially supported by the Natural Science Foundation of Ningbo, China (200601A6301025); the Open Project Foundation of National Key Laboratory of Machine Perception of Peking University (2005176); the Foundation of Education Ministry of Zhejiang Province, China (200502123); Key Scientific and Technological Project of Ningbo, China (2005B100003).
D.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNCIS 345, pp. 988 – 994, 2006. © Springer-Verlag Berlin Heidelberg 2006
Novel Scheme for Automatic Video Object Segmentation
989
The paper presents a novel scheme aiming at segmenting and tracking video object directly from MPEG-2 compressed stream. The whole scheme consists of five steps. Firstly, motion vectors of Macro-blocks (MB) are extracted from compressed stream and initial motion area is detected after noise removal. Secondly, a vector-based watershed algorithm is proposed to classify the DCT coefficient-blocks and output a number of homogeneous regions. Thirdly, the results of the motion detection and watershed segmentation are fused and initial video objects are detected. Following that edge correcting and morphologic post-processing are implemented to finally extract accurate objects. Finally, a new tracking algorithm is proposed for efficiently object tracking.
2 Proposed Algorithm In MPEG-2 standard, there are three types of frames: I-frames, B-frames and Pframes. In the proposed algorithm, only the DCT coefficients of I-frames and the MB motion vectors of P-frames are used. 2.1 Motion Detection The goal of this process is to detect the initial motion area. Since the existing of noise is inevitable, noise removing should be performed. We assume the noise to be 0-mean Gaussian distribution and the motion information of video object to be non-Gaussian signal. Then the motion detection can be viewed as a process of detecting nonGaussian signal from Gaussian signals. Here, we use the high-order statistical method to perform motion detection which is widely used for noise removal. Let V denote the motion vector of the i-th MB of a P-frame. The result of motion detection in essence is to extract a binary motion mask, B(i ) , where B(i) ∈ {0,1} . If B(i) = 0 , it means the i-th MB belongs to motion area, or else it belongs to still background. The motion detection process is briefly introduced as follows: i
1. First, for each V , compute its fourth moment: i
Vi ( 4 ) =
1 s w2
¦ || V
VP ∈w
p
− Vi || 4
(1)
where w is a s × s window centered at V and V is the mean vector of all the motion vectors within w . 2. Then let σ 2 denote noise variance calculated as follows: w
w
i
i
σ2 =
1 s w2
*
¦ || V
V p ∈w *
p
− V || 2
(2)
where w is a area with size of s selected from the still background. 3. Finally, the binary motion mask is derived: *
w*
1 Vi ( 4 ) ≥ ω (σ 2 ) 2 , B (i ) = ® ( 4) 2 2 ¯0 Vi < ω (σ ) ,
where ω is a weighted coefficient.
(3)
990
Z.-J. Zhu et al
When object’s motion velocity is too small, the above extracted mask derived only from a single P-frame often contains holes. Hence, multi-frames should used to acquire an ideal mask. Assume K is the number of frames used, then B(i) = max{Bk (i ) | k = 0 ~ K − 1} .
(4)
2.2 Vector-Based Watershed Segmentation The above detected B(i ) only has MB-level accuracy. In this part the DCT-coefficient blocks of I-frame are classified according to similarity using the classical watershed algorithm which is a classical image segmentation tool and has been widely in use. Traditional watershed algorithms are mainly implemented on pixel grey or color values in spatial-domain. In this paper a vector-based watershed algorithm is proposed to classify DCT-coefficient blocks. The watershed algorithm is only implemented in I-frame. First we write the vector form of a DCT block Di as: Di = (di 0 , ai1 , ai 2 ,, aik ,, ai 63 )
(k = 1,,63) ,
(5)
where d i and aik denote the DC and AC coefficients of the i-th block Di , respectively. Then, the similarity Oij between the i-th block and the j-th block is defined: 0
Oij = (di 0 − d j 0 )2 + (ai1 − a j1 )2 + (ai 2 − a j 2 )2 .
(6)
The steps of vector-based watershed algorithm are briefly introduced as follows: 1. For each Di , calculate its module || Di || . Assuming Ds has the smallest module, for all the other Di s, calculate their Ois s. Then a scalar p is determined according to the maximum and minimum of Ois s. According to p, the DCT-coefficient blocks of the whole I-frame are classified into several sets of grades. Each Di belongs to one grade and the one that Ds belongs to is defined as the first grade, and all those whose Ois s are no larger than p are classified into the first grade too. Similarly, with the Ois increasing, the grade increases. 2. Starting with all the Di s in the first grade, assign each Di a mark according to such a rule: assign adjacent Di s the same mark and isolated Di a new mark. 3. For all the Di s in the next grade, assign marks according to steps 4.-6. 4. In the neighborhood of Di , if there exit DCT-blocks having assigned the same mark except watershed then assign the same mark to Di . 5. In the neighborhood of Di , if there exit DCT-blocks having assigned different marks except watershed then assign a special mark to Di called watershed. 6. In the neighborhood of Di , if there exit no DCT-blocks having assigned a mark except watershed then assign a new mark to Di implying a new region emerges. 7. Iterate steps (3)–(6) till each Di of all grades has been assigned a mark. All the Di s that assigned the same mark belongs to the same region.
Novel Scheme for Automatic Video Object Segmentation
991
The whole frame is segmented into many homogenous regions Oq (q = 0,1, , Q ) after watershed segmentation, where Q is the number of regions. 2.3 Fusing Operation and Moving Object Extraction In this part, the results of both motion detection and watershed segmentation are fused and the initial video object is extracted. Let B( x, y ) denote the binary mask with pixel accuracy. Let Oq (q = 0,1, , Q ) denote the segmented result of watershed segmentation and
Nq
denote
the
size
of
each Oq
.
Define
the
matching
rate
ρ q B between Oq and B( x, y ) as ρq B =
1 Nq
¦ B ( x, y) (q = 0,1, , Q) .
(7)
( x , y )∈Oq
Given a threshold ρ , then the final moving object M ( x, y ) is extracted: T
o
M o ( x, y ) =
¦O
ρ q B > ρT
q
(q = 0,1, , Q) .
(8)
2.4 Edge Correcting and Morphologic Post-processing
The above extracted object M ( x, y ) only has block-level accuracy. In order to improve the accuracy, edge correcting and morphological post-processing are implemented. The steps of edge correcting and morphologic post-processing can be expressed as: o
1. Firstly, all the contour DCT-blocks {D }(i = 1,, E ) are extracted from M ( x, y ) , where D is a contour block means that D lies in the object boundary and in its eight adjacent areas there are two and only two contour blocks. 2. Then, all the contour blocks {D }(i = 1, , E ) are inversely transformed into spatial domain denoted as {D ( x, y )}(i = 1, , E ) . 3. Thirdly, feature detection is performed within {D ( x, y )}(i = 1, , E ) to extract all the feature points {P ( x, y )}(i = 1, , M ) using the classical moravec operator [9]. For each contour DCT-block, only up to three feature points are extracted. 4. Next, the linear interpolation is performed between two adjacent feature points. 5. Finally, the morphological post-processing is performed to finally obtain an accurate object. i
i
o
i
i
i
i
i
2.5 Object Tracking
Object tracking is another important step after object extraction. Here, a new blocklevel tracking algorithm based on object contour is proposed briefed as follows: 1. First, extract the block-level contour of object, where the extracted contour should meet such a requirement: any block of the contour should have and only have two contour blocks in its eight adjacent areas except the first and the last ones. An
992
Z.-J. Zhu et al
example is shown in Fig. 1, where Fig.1 (a) is not a part of right contour and should be processed into Fig.1 (b) or Fig.1 (c). 2. For each block C of the contour, search its matching block according to such a rule: only three blocks are searched to select a best matching block from the next frame for C according to the position relations between the two blocks in front of C , A and B , as shown in Fig.2.
(a)
(b)
(c)
Fig. 1. Extracted object contour
(a)
(b)
(c)
(d)
Fig. 2. Four position relations between A and B
In order to improve the accuracy of tracked objects, the aforementioned edge correction and morphological post-processing processes should also be performed.
3 Experimental Results To evaluate the performance of the proposed scheme, experiments are implemented. First, the initial motion area is detected after noise removal. Then the vector-based
(a)
(b)
(c)
(d)
Fig. 3. Segmentation results of video sequence Alex
a
b
c
Fig. 4. Segmentation results of Mom video sequence
d
Novel Scheme for Automatic Video Object Segmentation
993
watershed segmentation is performed based on DCT-blocks. Third, the results of motion detection and watershed segmentation are fused and the video objects are detected with raw boundaries. Forth, edge correcting and morphologic postprocessing is implemented to extract accurate objects. Finally, object tracking is implemented. Fig.3 and Fig.4 show the segmentation results of Alex and Mom video sequences respectively, where Fig.3 (a) and Fig.4 (a) are original images, Fig.3 (b) and Fig.4 (b) are the extracted objects after fusing operation, Fig.3 (c) and Fig.4 (c) are the extracted objects after edge correcting and morphologic post-processing and Fig.3 (d) and Fig.4 (d) are the corresponding spatial objects. Fig.5 shows the tracked results where Fig.5 (a) is the extracted object from the 30th frame, and Fig.5 (b), Fig.5 (c) and Fig.5 (d) are the tracked objects from the 35th frame, 40th frame and 45th frame, respectively.
(a)
(b)
(c)
(d)
Fig. 5. Tracked results of Alex video sequence
4
Conclusion
Compressed domain methods can extract video objects directly from the compressed stream, which is comparatively fast, simple and efficient. But since there is less useful information available, it is a challenge to extract accurate objects directly from compressed stream. In this paper, the proposed scheme can extract objects directly from compressed stream with fairly accurate contours and the tracking algorithm can efficiently track objects in following frames. The experimental results reveal that the whole scheme is efficient.
References 1. Girgensohn A., Adcock J., Cooper M., Wilcox L.: Interactive Search in Large Video Collections. CHI 2005 Extended Abstracts, ACM Press (2005)1395-1398 2. JungHwan O., Quan W., JeongKyu L., Sae H.: Video Data Management and Information Retrieval. Idea Group Inc.and IRM Press (2004) 321-346 3. Asaad H., Khurram S. , Mubarak S.: An Object-Based Video Coding Framework for Video Sequences Obtained from Static Cameras. ACM Multimedia (2005) 608-617 4. Yu, W. Y., Xie, S. L., Yu, Y. L. , Pan, X. Z.: Actuality and Development of Video Retrieval Based Semantic. Application Research of Computers (2005)1-7 5. Hauptmann A. G., Christel M. G.: Successful Approaches in the TREC Video Retrieval Evaluations. In: Proc.ACM Multimedia (2004) 668-675 6. Mezaris V., Kompatsiaris I. , Strintzis M. G.: An Ontology Approach to Object-based Image Retrieval. In: Proc. ICIP03 (2003)511-514
994
Z.-J. Zhu et al
7. Mezaris V., Kompatsiaris I., Boulgouris N. V. , Strintzis M. G.: Real-time CompressedDomain Spatiotemporal Segmentation and Ontologies for Video Indexing and Retrieval. IEEE Trans. Circuits Syst. Video Techn.(2004) 606-621 8. Mezaris V. , Strintzis M. G.: Object Segmentation and Ontologies for MPEG-2 Video Indexing and Retrieval. In: Proc. CIVR 2004(2004)573-581 9. Moravec H.: Towards Automatic Visual Obstacle Avoidance. In Proc. 5-th International Joint Conference on Artificial Intelligence (1977)584
Offline Chinese Signature Verification Based on Segmentation and RBFNN Classifier Zhenhua Wu, Xiaosu Chen, and Daoju Xiao College of Computer Science and Technology, Huazhong University of Science and Technology, 430074, Wuhan, China [email protected], {x_s_chen, d_j_xiao}@mail.hust.edu.cn
Abstract. A simple, low computational cost and robust segmentation method is proposed by means of having successful experiences of strokes extraction for handwritten Chinese character and taking into account the characteristics of signature verification. After segmented and feature extracted, each signature is represented by a series of 6-dimensions vectors. Then, the degree of similarity between the questioned sample and 4 genuine signature samples stored in the reference database is calculated using these vectors. At last, the similarity vectors are inputted into RBFNN Classifier to decide whether the question sample is a genuine sample or not. The promising results of experiments indicate the segmentation method is fitting for Chinese signature verification and the whole verification method distinguish forgeries from genuine signatures effectively.
1 Introduction Many approaches for solving the involved automatic off-line signature verification problems have been reported. In general, the proposed techniques use either a type of features (global, local, statistical, geometric, graphometirc, pseudo-dynamic etc.), or a combination of different types of features[1-2], which are extracted from the signature images. From the classifiers perspective, many new classifiers have been used. Classifiers like Hidden Markov Model[3-4] and Support Vector Machine[5] need a substantial number of samples to produce a robust model for each writer in the training phase. Handwritten Chinese character is a complicated square graph composed of different strokes organized in a highly artistic and imaginative manner. It is different from signature shape of alphabetic extraordinarily. This paper introduces a simple and efficient off-line approach that can be applied to Chinese signature verification. The approach is based on feature extraction of every segment of the signature skeleton and a general model of RBFNN classifier. Signature segmentation is a very crucial and difficult processing task, since different signatures of one writer can differ from each other by local stretching, compression, omission or additional parts. All the three different type of forgeries (random, simple and skilled) [5] are considered in our approach. D.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNCIS 345, pp. 995 – 1001, 2006. © Springer-Verlag Berlin Heidelberg 2006
2 Signature Database
A set of signatures from 40 writers makes up the signature database (30 genuine samples, 8 simple forgeries, and 5 simulated forgeries per writer), which is divided into two parts. The first part, composed of 8 writers, is used in the training stage. The second part, comprising the remaining writers, is used in the verification stage. For each writer, four genuine samples are used for reference and the remaining 26 genuine samples are used for testing. The random forgeries of each writer are composed of the simple forgeries of the other writers in the same part of the database. So, in the training database, each writer has 4 reference samples, 26 genuine samples, 8 simple forgeries, 5 simulated forgeries, and 56 (7 x 8) random forgeries. In the verification database, the number of random forgeries per writer is 248 (31 x 8); the other numbers are the same as in the first database.
3 Preprocessing, Segmentation and Feature Extraction
3.1 Preprocessing
To get the smallest bounding box of the signature, the well-known method of vertical and horizontal projections is employed, so that the white space surrounding the signature is discarded. After that, the thinning algorithm described in [6] is used to generate the skeleton. Figure 1 shows two examples of this thinning result.
3.2 Segmentation
Handwritten Chinese character recognition uses the structural similarity of the same character written by different writers to identify which character a character image represents, so extracting distinct and relatively standard strokes is crucial for structure-based handwritten Chinese character recognition. To extract such strokes, skeleton-based methods usually include two key processing steps: feature point extraction and broken stroke connection [7-9]. Unlike handwritten Chinese character recognition, Chinese signature verification uses the distinctions between forgeries and genuine signatures to detect forgeries; extracting distinct and relatively standard strokes is therefore unnecessary and should even be avoided, since these operations generally reduce the distinctions between forgeries and genuine signatures. In view of this difference between the two systems, we simplify the skeleton-based segmentation algorithm by adjusting the definition of feature points and getting rid of the broken stroke connection step. Let $S$ be the thinned skeleton image; a point $P$ is a black point if it is a skeleton point in $S$. Black points can be formalized by
$$S(P) = \begin{cases} 1 & \text{if } P \text{ is a black point} \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$
All the black points compose the set of skeleton points, denoted by $S_b$. Figure 2 shows a pixel $P$ and its eight neighbors.
Fig. 1. Sample results of skeletonization
Fig. 2. Pixel and its neighbours
According to the 8-adjacency shown in Fig. 2, the skeleton points in $S_b$ can be classified into three classes: (1) if $N_b(P) = 1$, $P$ is an end point; (2) if $N_b(P) = 2$, $P$ is a connective point; (3) if $N_b(P) > 2$, $P$ is a fork point; here $N_b(P)$ is the number of black points adjacent to the skeleton point $P$, i.e.,
$$N_b(P) = \sum_{i=0}^{7} S(P_i) \qquad (2)$$
The set of all fork points can be formalized by
$$S_f = \{\, P \mid N_b(P) > 2,\; P \in S_b \,\} \qquad (3)$$
The steps of the segmentation algorithm are described below (a code sketch follows the list):
• Step 1: Extract all fork points and add them to $S_f$.
• Step 2: Set the value of all fork points to 0, i.e., $S(P) = 0$ for all $P \in S_f$. Now only two kinds of points remain: end points and connective points.
• Step 3: Trace out all segment curves between end points to constitute the segment set, scanning the skeleton image (after fork point removal) in the natural writing order, top-to-bottom and left-to-right.
• Step 4: Discard the segments in which the number of black points is smaller than a certain value; in our experiments this value is 5.
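A minimal sketch of these four steps, assuming the thinned skeleton is supplied as a binary NumPy array (1 = black point). The helper names are ours, and Step 3's curve tracing is approximated by a connected-component walk; none of this code is from the paper.

```python
import numpy as np

def count_black_neighbors(S, y, x):
    """N_b(P) of Eq. (2): number of black points among the 8 neighbors."""
    h, w = S.shape
    return sum(S[y + dy, x + dx]
               for dy in (-1, 0, 1) for dx in (-1, 0, 1)
               if (dy, dx) != (0, 0) and 0 <= y + dy < h and 0 <= x + dx < w)

def segment_skeleton(S, min_points=5):
    S = S.copy()
    # Steps 1-2: collect the fork points (N_b > 2, Eq. (3)) and erase them.
    forks = [(y, x) for y, x in zip(*np.nonzero(S))
             if count_black_neighbors(S, y, x) > 2]
    for y, x in forks:
        S[y, x] = 0
    # Step 3: trace the remaining curves; np.nonzero scans top-to-bottom,
    # left-to-right, matching the natural writing order.
    visited = np.zeros(S.shape, dtype=bool)
    segments = []
    for y, x in zip(*np.nonzero(S)):
        if visited[y, x]:
            continue
        stack, seg = [(y, x)], []
        visited[y, x] = True
        while stack:
            cy, cx = stack.pop()
            seg.append((cy, cx))
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < S.shape[0] and 0 <= nx < S.shape[1]
                            and S[ny, nx] and not visited[ny, nx]):
                        visited[ny, nx] = True
                        stack.append((ny, nx))
        # Step 4: discard segments with fewer than min_points black points.
        if len(seg) >= min_points:
            segments.append(seg)
    return segments
```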
Fig. 3. Sample result of segmentation
This segmentation method differs from the stroke extraction method used for Chinese character recognition mainly in two aspects. First, the definition of a fork point is different: the definition in Eq. (3) enlarges the number of fork points. Because the fork points in our method serve only to disconnect multi-stroke junctions, expanding the number of fork points does not increase the computational cost; on the contrary, it removes some spurious points introduced at the thinning stage. Second, there is no broken stroke connection step in our segmentation method, since, as mentioned above, standard strokes are not necessary for Chinese signature verification. This simple segmentation algorithm has three advantages: (1) it decreases the computational cost, (2) it increases robustness to the large variations of Chinese signatures, and (3) it preserves more of the individuality of the signature. Figure 3 shows a sample result of this segmentation.
3.3 Feature Extraction
After preprocessing and segmentation, the signature is represented by a series of segments, each stored at the same size as the signature. In order to find the best matching segments between the test and the reference signature, every segment is represented by a set of six features for comparison. The first two features are the relative horizontal and vertical centers of the segment:
$$C_x^s = \frac{\sum_{y=1}^{N}\sum_{x=1}^{M} x \cdot s(x,y)}{\sum_{y=1}^{N}\sum_{x=1}^{M} s(x,y)} \Big/ M, \qquad C_y^s = \frac{\sum_{y=1}^{N}\sum_{x=1}^{M} y \cdot s(x,y)}{\sum_{y=1}^{N}\sum_{x=1}^{M} s(x,y)} \Big/ N \qquad (4)$$
where $M$ denotes the width of the signature, $N$ denotes the height of the signature, and $s(x,y)$ is the image of the segment $s$. The other four features reflect the trace, or slant, information of the segment $s$: (1) the number ($P_h^s$) of points that have a horizontal neighbor, (2) the number ($P_v^s$) of points that have a vertical neighbor, (3) the number ($P_o^s$) of points that have a positive-diagonal neighbor, and (4) the number ($P_e^s$) of points that have a negative-diagonal neighbor. The four kinds of neighbor are shown in Fig. 4. All four values are normalized to [0, 1] as
$$R_k^s = P_k^s \Big/ \sum_{k \in \{h,v,o,e\}} P_k^s, \qquad k \in \{h, v, o, e\} \qquad (5)$$
Fig. 4. Four kinds of neighbor
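The six features of Eqs. (4)-(5) can be computed directly from a segment image. Below is a sketch, assuming seg is a binary NumPy array of the same size as the signature; the 0-based pixel indices make the centers a close approximation of Eq. (4), and all names are illustrative rather than the authors'.

```python
import numpy as np

def points_with_neighbor(seg, dy, dx):
    """Count black points having a black neighbor along direction (dy, dx)."""
    pad = np.pad(seg, 1)
    fwd = pad[1 + dy:pad.shape[0] - 1 + dy, 1 + dx:pad.shape[1] - 1 + dx]
    bwd = pad[1 - dy:pad.shape[0] - 1 - dy, 1 - dx:pad.shape[1] - 1 - dx]
    return int(np.count_nonzero(seg & (fwd | bwd)))

def segment_features(seg):
    N, M = seg.shape                        # height and width of the signature
    ys, xs = np.nonzero(seg)
    cx, cy = xs.mean() / M, ys.mean() / N   # relative centers, Eq. (4)
    p = {"h": points_with_neighbor(seg, 0, 1),   # horizontal neighbor
         "v": points_with_neighbor(seg, 1, 0),   # vertical neighbor
         "o": points_with_neighbor(seg, -1, 1),  # positive diagonal
         "e": points_with_neighbor(seg, 1, 1)}   # negative diagonal
    total = sum(p.values()) or 1
    return (cx, cy) + tuple(p[k] / total for k in "hvoe")  # Eq. (5)
```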
4 Comparison and Experimental Results
4.1 Similarity Calculation Between Two Signatures
After segmentation and feature extraction, the signature is represented by a series of six-dimensional vectors, each composed of the six features above and each representing one segment of the signature. For a pair of signatures, the segment in signature B that best corresponds to each segment in signature A is found by a feature matching approach based on Euclidean distance. Suppose there are $n$ segments in signature A and $m$ segments in signature B. For each segment (segment $i$) in signature A, the $2m$ Euclidean distances are calculated as
$$d_{ij}^1 = \sqrt{(C_x^i - C_x^j)^2 + (C_y^i - C_y^j)^2}, \quad 1 \le j \le m \qquad (6)$$
$$d_{ij}^2 = \sqrt{\sum_{k \in \{h,v,o,e\}} (R_k^i - R_k^j)^2}, \quad 1 \le j \le m \qquad (7)$$
The segment in signature B corresponding to segment $i$ in signature A is then segment $q$, calculated by
$$q = \begin{cases} k, & \min\{d_{ik}^2, d_{il}^2, d_{is}^2\} = d_{ik}^2 \\ l, & \min\{d_{ik}^2, d_{il}^2, d_{is}^2\} = d_{il}^2 \\ s, & \min\{d_{ik}^2, d_{il}^2, d_{is}^2\} = d_{is}^2 \end{cases} \qquad \{k, l, s\} \subset [1, m] \qquad (8)$$
where $k$, $l$, and $s$ are the indices of the three segments in signature B that have the three smallest distances $d^1$ to segment $i$ of signature A, i.e., $d_{it}^1 \le d_{ij}^1$, $t \in \{k, l, s\}$, $j \in [1, m]$ and $j \ne k, l, s$.
For each segment $i$ in signature A, the value $v_i$ is given by
$$v_i = \begin{cases} 1, & d_{iq_i}^1 + d_{iq_i}^2 < T \\ 0, & \text{otherwise} \end{cases} \qquad (9)$$
where $v_i = 1$ means that segment $i$ of signature A matches segment $q_i$ of signature B, $v_i = 0$ means that segment $i$ of signature A has no matching segment in signature B, and $T$ is a distance threshold decided experimentally in the training stage. Let $mat_{ab}$ denote the number of matching segments of signature A:
$$mat_{ab} = \sum_{i=1}^{n} v_i \qquad (10)$$
The quantity $mat_{ba}$, the number of matching segments of signature B, is computed in the same way with the roles of A and B exchanged. The similarity degree between signatures A and B is then
$$sim_{ab} = \min\Big\{ \frac{mat_{ab}}{n} \times 100,\; \frac{mat_{ba}}{m} \times 100 \Big\} \qquad (11)$$
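A compact sketch of the whole matching procedure of Eqs. (6)-(11), assuming each signature is a list of the six-dimensional feature vectors above. The threshold default follows the value T = 0.4 reported in Section 4.2; the function names are ours.

```python
import math

def matching_count(src, dst, T=0.4):
    """mat of Eq. (10): number of segments of src matched in dst."""
    count = 0
    for f in src:
        d1 = [math.dist(f[:2], g[:2]) for g in dst]      # Eq. (6)
        d2 = [math.dist(f[2:], g[2:]) for g in dst]      # Eq. (7)
        # Eq. (8): among the three d1-closest segments, pick the d2-closest.
        candidates = sorted(range(len(dst)), key=d1.__getitem__)[:3]
        q = min(candidates, key=d2.__getitem__)
        if d1[q] + d2[q] < T:                            # Eq. (9)
            count += 1
    return count

def similarity(A, B, T=0.4):
    """sim_ab of Eq. (11), in [0, 100]."""
    return min(matching_count(A, B, T) / len(A) * 100,
               matching_count(B, A, T) / len(B) * 100)
```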
4.2 The RBFNN Classifier and Comparison Stage
The RBFNN classifier is a three-layer feedforward network: an input layer with four neurons, a hidden layer with four to nine neurons, and an output layer with two neurons. Each hidden unit uses a radially symmetric activation function; two kinds of basis functions, the Gaussian function and the thin plate spline function, are applied in the experiments. A two-stage training algorithm is used: in stage 1, 500 iterations of EM position the centers, and in stage 2 the pseudo-inverse of the design matrix yields the second-layer weights. The input of the classifier is a four-dimensional vector composed of the four similarity degrees between the questioned sample and the 4 reference samples. The output of the classifier is a simple linear discriminant, a weighted sum of the basis functions.
There are two stages in the comparison phase: training and verification. The training stage has two aims. One is to experimentally adjust the system parameter $T$, the distance threshold (see Eq. (9)) that decides whether two compared signature segments match; we set $T$ to 0.4 in our experiments. The other is to train the RBFNN classifier. In the verification stage, all 9184 (287 x 32) similarity-degree vectors in the second database are input to the RBFNN classifier. A type I error occurs when a genuine sample is identified as a forgery; conversely, a type II error occurs when a forgery is identified as a genuine sample.
4.3 Experimental Results
Table 1 shows the results obtained using the second database. The experiments show promising results in terms of general error rate. The simulated forgery acceptance rate is high because the extracted features are not sufficient to identify this type of forgery.
Table 1. Experimental results using the second signature database (GF: Gaussian function; TPSF: thin plate spline function; HNN: number of neurons in the hidden layer)
GF
HNN   Error Type I (%)   Error Type II (%): Random   Simple   Simulated
4     5.05               0.03                        6.25     13.75
5     5.29               0.03                        6.25     13.13
6     5.65               0.03                        5.47     11.87
7     6.13               0.03                        5.47     10.63
8     6.13               0.03                        5.47     10.63
9     6.37               0.03                        5.86     12.50

TPSF
HNN   Error Type I (%)   Error Type II (%): Random   Simple   Simulated
4     4.09               0.03                        6.64     11.25
5     5.05               0.03                        6.25     12.50
6     4.57               0.03                        6.25     12.50
7     3.85               0.04                        7.42     15.63
8     4.09               0.03                        8.59     15.63
9     3.00               0.04                        8.59     15.63
5 Conclusion
A novel off-line Chinese signature verification approach is proposed. The method is based on feature extraction from every segment of the signature skeleton and a general RBFNN classifier model. The proposed segmentation method requires lower computational cost and is more robust than the stroke extraction method, which is achieved by simplifying the definition of feature points and getting rid of the broken stroke connection step. By using a global model instead of setting up an independent model for each writer, the method reduces the number of genuine samples required from each writer in the training phase: only 4 genuine samples per writer are required as references.
References
1. Qi, Y. Y., Hunt, B. R.: Signature Verification Using Global and Grid Features. Pattern Recognition, 22 (12) (1994) 95-103
2. Sabourin, R., Genest, G., Prêteux, F.: Off-line Signature Verification by Local Granulometric Size Distributions. IEEE Trans. PAMI, 19 (9) (1997) 976-988
3. Justino, E. J. R., Bortolozzi, F., Sabourin, R.: Off-line Signature Verification Using HMM for Random, Simple and Skilled Forgeries. Proc. of the International Conference on Document Analysis and Recognition, Seattle, USA, IEEE Computer Society Press (2001) 105-110
4. Justino, E. J. R., El Yacoubi, A., Bortolozzi, F., Sabourin, R.: An Off-line Signature Verification System Using Hidden Markov Model and Cross-validation. Proc. XIII Brazilian Symposium on Computer Graphics and Image Processing, Gramado, Brazil (2000) 105-112
5. Justino, E. J. R., Bortolozzi, F., Sabourin, R.: An Off-line Signature Verification Method Based on SVM Classifier and Graphometric Features. Proc. of the 5th International Conference on Advances in Pattern Recognition, Calcutta, India (2003) 134-141
6. Lam, L., Lee, S. W., Suen, C. Y.: Thinning Methodologies: A Comprehensive Survey. IEEE Trans. PAMI, 14 (9) (1992) 869-885
7. Abuhaiba, I. S. I., Holt, M. J. J., Datta, S.: Processing of Binary Images of Handwritten Text Documents. Pattern Recognition, 29 (7) (1996) 1161-1177
8. Liu, K., Huang, Y. S., Suen, C. Y.: Robust Stroke Segmentation Method for Handwritten Chinese Character Recognition. Proc. of the Fourth International Conference on Document Analysis and Recognition, Ulm, Germany, 1 (1997) 211-215
9. Lin, F., Tang, X.: Offline Handwritten Chinese Character Stroke Extraction. Proc. of the 16th International Conference on Pattern Recognition, Quebec, Canada, 3 (2002) 249-252
On-Line Signature Verification Based on Wavelet Transform to Extract Characteristic Points
LiPing Zhang 1,2 and ZhongCheng Wu 1
1 Center for Biomimetic Sensing and Control Research, Institute of Intelligent Machines, CAS, Hefei, 230031 Anhui, China
2 Department of Automation, University of Science & Technology of China, Hefei, 230026 Anhui, China
[email protected], [email protected]
Abstract. On-line signature verification is one of the most widely accepted means of personal verification. This paper proposes an on-line signature verification method based on the Wavelet Transform (WT). First, the method uses the wavelet transform to extract characteristic points from the 3-axis force and 2-dimension coordinate information of a signature obtained with the F-Tablet. It then builds five-dimensional feature sequences and dynamically creates multiple templates using clustering. Finally, after fusing the decisions from the five feature sequences, whether the signature is genuine or not is decided by a majority voting scheme. In experiments on a signature database acquired with the F-Tablet, the performance in even EER (Equal Error Rate) was improved to 2.83%. The experimental results show that the method not only reduces the amount of data to be stored, but also shortens the whole authentication process and increases the efficiency of signature verification.
1 Introduction
Among biometric authentication methods, signature verification is considered the most convenient and non-intrusive. Moreover, the signature has been a primary form of identity verification throughout history and is widely accepted. Plamondon et al. [1] categorized signature verification methodologies into two types: the functional approach and the parametric approach. In parametric algorithms, the task of selecting the right set of parameters is not trivial, and a major issue of function-based approaches is how to compare two signature patterns despite their different durations and their non-linear distortion with respect to time. DTW (Dynamic Time Warping) has provided a major tool for overcoming this problem [2]. Although DTW has been highly successful in signature verification, it still has a high computational complexity due to the repetitive nature of its optimization process. Several authors have tried to improve it: Y. J. Bae et al. [3] propose a parallel DTW algorithm, which reduces the time complexity, and Hao Feng et al. [4] present a new extreme points warping technique for the functional approach, which improves the equal error rate and reduces the computation time. But the raw signature data are sizeable, so the computational complexity of DTW is hard to improve greatly. In this paper, we focus on the extraction of characteristic points representing signatures based on the wavelet transform, and fuse 3-axis force and 2-dimension coordinate information. This reduces the amount of signature data to be stored and makes the verification process as fast and as efficient as possible.
2 Signature Verification System
The block diagram of the proposed on-line handwritten signature verification system is shown in Fig. 1. Our method first extracts characteristic points of signatures by wavelet transform and builds five-dimensional feature sequences, then matches these sequences of the test signature against its corresponding templates, and finally fuses the five decisions by a majority voting strategy to make the final decision.
Fig. 1. Block diagram of the proposed on-line signature verification system
2.1 Signature Database and Preprocessing
Differing from other tablets, the F-Tablet captures the 3-axis force and 2-dimension coordinate information of the pen tip at 100 samples per second [5]. Fig. 2 shows the on-line signature system interface and the F-Tablet. There are 32 writers enrolled in our database. Writers signed in their most natural way; each signer supplied 30 genuine signatures in two sessions over a period of one month. We also generated 20 forgeries (10 random forgeries and 10 skilled ones) for each writer. To collect the skilled forgeries, forgers had free access to the trajectory and writing sequence of each signature, with no time limit for practice. Before verification, the method applies preprocessing such as discarding the empty head and tail strokes of signatures, and filtering and normalizing the 3-axis force and 2-dimension coordinate information. Fig. 3 displays the information sequences after preprocessing.
Fig. 2. Online signature system interface and the F-Tablet

Fig. 3. The 3-axis force and 2-dim coordinate information and the corresponding signatures: (a1) 3-axis force curves, (a2) 2-dim coordinate curves, (a3) genuine signature; (b1) 3-axis force curves, (b2) 2-dim coordinate curves, (b3) forgery
2.2 Extraction of Characteristic Points by Wavelet Transform
Wavelets are a family of functions that are able to cut up a signal into different frequency components and then study each component with the resolution matched to its scale [6]. Usually, a continuous wavelet transform can be written as
$$WT_x(a, b) = \frac{1}{\sqrt{a}} \int x(t)\, \psi^*\Big(\frac{t - b}{a}\Big)\, dt = \int x(t)\, \psi_{a,b}^*(t)\, dt = \langle x(t), \psi_{a,b}(t) \rangle \qquad (1)$$
where $\langle \cdot, \cdot \rangle$ is the inner product operation, $WT_x(a, b)$ is the wavelet coefficient of $x(t)$ with respect to the mother wavelet $\psi(t)$, $a$ is the dilation factor, and $b$ is the distance of translation. S. Mallat [7] described that sharp variation points are among the most meaningful features for characterizing signals, and that the zero-crossings of a wavelet transform provide the locations of the sharp variation points of a signal at different scales; the completeness and stability of a signal representation based on the zero-crossings of a wavelet transform at the scales $2^j$, for integer $j$, are studied there. Because the pen motion in the $y$-direction is the most prominent during writing, the function $y$ is decomposed by the wavelet transform. Then the zero-crossings of the detail at a certain level are extracted: $ZC_i^y$ ($1 \le i \le N$, where $N$ is the number of zero-crossings), and the corresponding points of the 3-axis force and 2-dimension coordinate sequences, namely the characteristic points, represented by $(F_{x_i}, F_{y_i}, F_{z_i}, x_i, y_i)$ ($1 \le i \le N$), serve as the feature functions of the signatures. Characteristic points extracted by the wavelet transform are shown in Fig. 4.
Fig. 4. Characteristic points extracted by WT: (a) genuine signature; (b) forgery of signature
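As an illustration of this extraction step, the sketch below finds the zero-crossings of the wavelet detail of the y sequence and picks the corresponding five-dimensional points. It assumes the PyWavelets package and NumPy arrays as inputs; the paper does not name an implementation, and the coefficient-to-sample index mapping here is only approximate.

```python
import numpy as np
import pywt

def characteristic_points(fx, fy, fz, x, y, wavelet="db10", level=3):
    # Decompose the y sequence; pen motion in y is the most salient (Sec. 2.2).
    coeffs = pywt.wavedec(y, wavelet, level=level)
    detail = coeffs[1]                 # detail coefficients at the coarsest level
    # Zero-crossings of the detail signal (sign changes between neighbors).
    zc = np.where(np.diff(np.signbit(detail)))[0]
    # Map coefficient indices back to approximate sample positions.
    idx = np.clip(zc * 2 ** level, 0, len(y) - 1)
    # The corresponding (Fx, Fy, Fz, x, y) characteristic points.
    return np.stack([fx[idx], fy[idx], fz[idx], x[idx], y[idx]], axis=1)
```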
2.3 Templates and Thresholds
Using the five feature sequences $(F_{x_i}, F_{y_i}, F_{z_i}, x_i, y_i)$ obtained above, the method creates reference templates and selects three reference templates from the ten training signatures by clustering. As a result, there are multi-reference templates $R_1, R_2, R_3, R_4$ and $R_5$ for the five feature sequences. At the enrollment stage, the verification system calculates the average distances between each training signature and the multi-reference templates, and obtains their expectation $\mu_j$ and standard deviation $\sigma_j$. The thresholds $TH_j$ are then given by
$$TH_j = \mu_j + w\sigma_j \qquad (2)$$
where the value $w$ is chosen to adjust the threshold so as to equalize the error rates of verification. As a consequence, the method obtains one threshold for each feature sequence, $TH_k$, $1 \le k \le 5$.
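For one feature sequence, Eq. (2) amounts to a two-line computation; a minimal sketch, assuming train_dists holds the average distances of the training signatures to the multi-reference templates:

```python
import numpy as np

def sequence_threshold(train_dists, w):
    """TH_j = mu_j + w * sigma_j of Eq. (2); w is tuned on the training set."""
    return np.mean(train_dists) + w * np.std(train_dists)
```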
2.4 Decision Fusion and Verification
For each feature sequence $T_k$ of a signature $T$, the average distance $d(T_k, R_k)$ between $T_k$ and the multi-reference $R_k$ is calculated and compared to the corresponding threshold. The decision $u_k$ is defined by
$$u_k = \begin{cases} 1 & \text{if } d(T_k, R_k) \le TH_k \\ 0 & \text{otherwise} \end{cases}, \quad 1 \le k \le 5 \qquad (3)$$
This paper uses the majority voting strategy to combine the five decisions $u_k$ into the final decision $u_0$. The total vote is $\alpha = \sum_{k=1}^{5} u_k$, which is compared to the majority voting threshold $TH$: if $\alpha$ is above $TH$, $u_0$ is 1, otherwise 0. Depending on the value of $u_0$, the verification system judges whether the signature is genuine or not.
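The fusion of Eq. (3) and the majority vote reduce to a few lines; a minimal sketch, assuming the five average distances and thresholds are already computed:

```python
def verify(distances, thresholds, TH=3):
    # u_k of Eq. (3): 1 iff feature sequence k is within its threshold.
    votes = [1 if d <= th else 0 for d, th in zip(distances, thresholds)]
    alpha = sum(votes)                # total votes over the five sequences
    return 1 if alpha > TH else 0     # u_0: 1 = genuine, 0 = forgery
```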
3 Experimentation and Results
The proposed method was tested on the above-mentioned database. In our experiments, $w \in (1, 5)$ and $TH \in \{3, 4, 5\}$ were adjusted to obtain the performance evaluation in EER (Equal Error Rate). Tests were executed with the mother wavelets Daubechies1 (Haar), Biorthogonal5.5 (Bior5.5), Daubechies10 (db10), and Symlets6 (sym6) at different levels of resolution (2, 3, 4). Results for this configuration are presented in Table 1 and Table 2. The mother wavelet and the resolution level clearly affect verification performance. The amount of signature data decreases as the resolution level increases: a low resolution level may include unimportant points, which can disturb the verification, while a high level loses some characteristic points. Therefore, in general, the resolution level is 3 in our experiments. We also investigated producing better results for each writer by substituting the mother wavelet. Table 3 shows the even EER of single information sequences and of information fusion, compared against the method that does not use the wavelet transform to extract characteristic points. The results indicate that, after extracting characteristic points, the data to be stored are reduced and the operating speed is higher; moreover, the authentication efficiency is enhanced, and information fusion further improves the verification performance. In this way, on-line signature verification can be applied well in practical authentication systems.
Table 1. EER of different mother wavelets (level = 3)
Mother wavelet   Writer 1   Writer 2   Writer 3   Writer 4   Writer 5
Db1 (Haar)       7.5%       10%        36.67%     13.33%     10%
Bior5.5          10%        5%         6.67%      23.33%     0%
Db10             6.67%      5%         0%         3.33%      3.33%
Sym6             6.67%      5%         6.67%      10%        10%

Table 2. EER of different resolution levels (using mother wavelet db10)

Resolution level   Writer 1   Writer 2   Writer 3   Writer 4   Writer 5
2                  15%        36.67%     3.33%      15%        15%
3                  6.67%      6.67%      0%         5%         5%
4                  12%        15%        5%         11%        6.67%

Table 3. Comparison of even EER between the proposed method and the one without WT

Method                fx        fy        fz        x         y         Even EER
Without WT            26.13%    26.97%    24.45%    25.74%    28.48%    11.98%
The proposed method   23.59%    22.29%    17.08%    20.25%    23.89%    2.83%
4 Conclusion and Future Work
This paper proposes an on-line signature verification method based on the wavelet transform. The method extracts the characteristic points of the 3-axis force and 2-dimension coordinate signature information and creates five-dimensional feature sequences. Using these feature sequences, it verifies on-line signatures with an even EER (Equal Error Rate) of just 2.83%. Experiments show that the method not only reduces the amount of stored data, but also shortens the authentication phase and increases the efficiency of signature verification. Future work will take the relationships between characteristic points into consideration to make the verification more robust, and will extract characteristic points from other information sequences, such as velocity and acceleration, for signature verification.
Acknowledgments
This research is supported by the National Natural Science Foundation of China under Grants No. 60475005, 60575058 and 10576033. The authors express their thanks for this support and would also like to thank Dr. Meng Ming, Mrs. Shen Fei, Mr. Wei Ming-xu and Mr. Kang Le for their support of this work.
References
1. Plamondon, R., Lorette, G.: Automatic Signature Verification and Writer Identification - The State of the Art. Pattern Recognition, 22 (2) (1989) 107-131
2. Sato, Y., Kogure, K.: On-line Signature Verification Based on Shape, Motion, and Writing Pressure. Proc. 6th Int. Conf. on Pattern Recognition (1982) 823-826
3. Bae, Y. J., Fairhurst, M. C.: Parallelism in Dynamic Time Warping for Automatic Signature Verification. In: ICDAR '95, Vol. 1 (1995) 426-429
4. Feng, H., Chan, C. W.: Online Signature Verification Using a New Extreme Points Warping Technique. Pattern Recognition Letters, 24 (2003) 2943-2951
5. Fang, P., Wu, Z. C., Meng, M., Ge, Y. J., Yu, Y.: A Novel Tablet for On-Line Handwriting Signal Capture. Proceedings of the 5th World Congress on Intelligent Control and Automation, Vol. 6 (2004) 3714-3717
6. Graps, A.: An Introduction to Wavelets. IEEE Computational Science and Engineering, 2 (2) (1995) 1-18
7. Mallat, S.: Zero-crossings of a Wavelet Transform. IEEE Transactions on Information Theory, 37 (4) (1991) 1019-1033
Parameter Estimation of Multicomponent Polynomial Phase Signals
Han-ling Zhang 1, Qing-yun Liu 2, and Zhi-shun Li 2
1 College of Computer and Communication, Hunan University, ChangSha, 410082, China
2 School of Marine Engineering, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
[email protected]
Abstract. This paper addresses the detection and parameter estimation of multicomponent polynomial phase signals (mc-PPSs) embedded in noise, based on the high-order ambiguity function (HAF). We first show how existing techniques based on the product HAF (PHAF) are inadequate, mainly in providing reliable detection for mc-PPSs. The main contribution of this paper is a novel parameter estimation method. First, given a set of time delays, it produces a set of estimates of the phase coefficients based on the HAF. It then produces a final estimate of each phase coefficient by means of voting. The new method improves the probability of detection and the estimation accuracy while avoiding the issue of threshold selection. Computer simulations are carried out to illustrate the advantage of the proposed method over existing techniques.
1 Introduction
In certain communication and signal processing applications, such as synthetic aperture radar imaging and mobile communications, the phase of the observed signal can be modeled as a polynomial function of time. Such signals are commonly known as polynomial phase signals (PPS). The analysis and estimation of PPSs has been an area of recent interest [1-5]. The existing HAF-based methods work very well when applied to a single-component PPS, but when applied to multicomponent polynomial phase signals (mc-PPS), due to the existence of spurious peaks in the HAF, these methods may fail. In order to suppress the spurious peaks, PHAF-based methods were proposed in [2, 3]. In this paper, a novel method providing more reliable detection and higher estimation accuracy than the PHAF-based methods is proposed. The paper is organized as follows. In Section 2, the limitation inherent in PHAF-based methods is briefly discussed. In Section 3, the novel method is presented. Final remarks are given in Section 4.
2 Signal Model and Limitations of PHAF-Based Methods
We assume an observation model composed of the sum of discrete-time polynomial phase signals embedded in additive white Gaussian noise:
$$x(n) = \sum_{l=1}^{L} b_l \exp\Big( j \sum_{i=0}^{M_l} a_{i,l}\, n^i \Big), \quad n = 0, 1, 2, \ldots, N-1 \qquad (1)$$
where $L$ is the number of PPS components, $M_l$ is the (highest) polynomial phase order of the $l$th component, and $\{a_{i,l}\}_{i=1}^{M_l}$ are the polynomial phase coefficients of the $l$th component. The amplitudes $b_l$ are assumed to be real and positive constants. The results developed here for the constant-amplitude case can also be extended to the time-varying amplitude case, provided that the amplitude variation is slow. The notation used here is the same as in [2].
If more than one PPS component shares the same leading coefficient $a_m$, the $m$th-order high-order ambiguity function (HAF), $P_m[x, \omega, \tau]$, at any frequency $\omega$, is a sum of vectors. For a given $\tau$ and $\omega = m!\,\tau^{m-1} a_m$, $P_m[x, \omega, \tau]$ may therefore be very small or even zero. So, for any $\tau$ of a given set of lags $\{\tau_h\}_{h=1}^{H}$, the assumption that $P_m[x, \omega, \tau]$ peaks at $\omega = m!\,\tau^{m-1} a_m$ is not always true. When this assumption fails, the product high-order ambiguity function (PHAF) of order $m$, $PM_m[x, \omega]$, will not peak at $\omega = m!\,\tau^{m-1} a_m$, so it is not possible to estimate $a_m$.
Here we give an example to illustrate the disadvantages of the existing PHAF-based methods. We suppose in (1) that $L = M = 2$, $b_1 = b_2$ and $a_{2,1} = a_{2,2} = K$; then the second-order instantaneous moment is
$$p_2[x(n), \tau] = \sum_{l=1}^{2} b_1^2 \exp\Big[ j2\pi \Big( K\tau n + a_{1,l}\tau + \tfrac{1}{2}K\tau^2 \Big) \Big] + \sum_{m=1}^{2} \sum_{\substack{l=1 \\ l \ne m}}^{2} b_1^2 \exp\Big\{ j2\pi \Big[ (a_{1,l} - a_{1,m} + K\tau) n + a_{1,l}\tau + \tfrac{1}{2}K\tau^2 + a_{0,l} - a_{0,m} \Big] \Big\} \qquad (2)$$
If $\tau$ meets the condition
$$\tau = \frac{2k + 1}{2(a_{1,1} - a_{1,2})} \qquad (3)$$
where $k$ is any integer, then no sinusoidal component with frequency $K\tau$ exists in $p_2[x(n), \tau]$, so $P_2[x, \omega, \tau]$ and $PM_2[x, \omega]$ will not peak at the frequency corresponding to $K$, let alone estimate $K$.

3 The New Detection and Estimation Method
In this paper, we focus on substituting a novel method for the PHAF; for the estimation procedure itself, please refer to [2]. For any $\tau_h$ of a given set of lags $\{\tau_h\}_{h=1}^{H}$ (we assume, without loss of generality, that $\tau_1 < \tau_2 < \cdots < \tau_H \le N/m$), compute $\{\hat{a}_{m,g,\tau_h}\}_{g=1}^{G}$, the estimates of the phase coefficients corresponding to the locations of the $G$ strongest peaks in $P_m[x, \omega, \tau_h]$ ($G$ is given in advance); we thus obtain a set of phase coefficient estimates $\{\{\hat{a}_{m,g,\tau_h}\}_{g=1}^{G}\}_{h=1}^{H}$. Intuitively speaking, if the given mc-PPS contains at least one $m$th-order PPS whose $m$th-order phase coefficient is $a_m$, peaks corresponding to $a_m$ should emerge in most of the $P_m[x, \omega, \tau_h]$, $h = 1, \ldots, H$; in other words, estimates of $a_m$ should appear in $\{\{\hat{a}_{m,g,\tau_h}\}_{g=1}^{G}\}_{h=1}^{H}$ time and again. If no $m$th-order PPS exists in the given mc-PPS, the elements of $\{\{\hat{a}_{m,g,\tau_h}\}_{g=1}^{G}\}_{h=1}^{H}$ will be disorderly and unsystematic. This is the basis of our proposed method.
It is well known that the larger the $\tau_h$, the higher the estimation accuracy of a phase coefficient. So we take all the elements of $\{\hat{a}_{m,g,\tau_H}\}_{g=1}^{G}$ as the final estimates of different phase coefficients. At the same time, all elements of $\{\{\hat{a}_{m,g,\tau_h}\}_{g=1}^{G}\}_{h=1}^{H-1}$ that lie within the immediate neighborhood $[\hat{a}_{m,g,\tau_H} - \delta,\; \hat{a}_{m,g,\tau_H} + \delta]$ of $\hat{a}_{m,g,\tau_H}$ ($\delta$ is determined by the physical frequency resolution of the DFT and by $\tau_H$) are considered different estimates of the same phase coefficient corresponding to $\hat{a}_{m,g,\tau_H}$. As to an element of $\{\hat{a}_{m,g,\tau_h}\}_{g=1}^{G}$ that is not an estimate of any phase coefficient already identified from $\{\hat{a}_{m,g,\tau_k}\}_{g=1}^{G}$ with $k > h$, we take it as the final estimate of another phase coefficient, and all elements of $\{\{\hat{a}_{m,g,\tau_d}\}_{g=1}^{G}\}_{d=1}^{h-1}$ ($d < h$) within the immediate neighborhood $[\hat{a}_{m,g,\tau_h} - \delta,\; \hat{a}_{m,g,\tau_h} + \delta]$ of $\hat{a}_{m,g,\tau_h}$ are treated as different estimates of that same phase coefficient. We denote by $q$ and $R$ the number of elements of $\{\{\hat{a}_{m,g,\tau_h}\}_{g=1}^{G}\}_{h=1}^{H}$ that are different estimates of the same phase coefficient and the product of the peak intensities of those estimates, respectively. If $q \ge \gamma$ ($\gamma$ is given in advance) and, at the same time, the intensity of at least one estimate satisfies the detection criterion proposed in [2], we decide that an $m$th-order PPS is present in $x(n)$, and we take the candidate whose product of $q$ and $R$ is largest as the estimate of the parameter of this component.
To demonstrate the validity of our proposed method, we consider the estimation of the second-order phase coefficients of a six-component PPS consisting of one harmonic and five chirps. The amplitudes and second-order phase parameters of the signal components are given in Table 1. The data length was $N = 512$, and the additive noise was white Gaussian noise with variance 1.0. The lag set is $\{64, 96, 128, 160, 192, 224, 256\}$ and $G = 12$. The sum of the second-order HAFs of the chirps whose chirp rate is 20 is zero when $\tau$ equals $N/4$.

Table 1. Amplitude and second-order phase parameters of a six-component LFM signal

l         1     2     3     4     5     6
b_l       1.0   0.7   0.7   1.0   1.0   1.0
a_{2,l}   20    20    20    35    35    0

Fig. 1 shows the second-order phase parameter estimates obtained from 300 independent Monte Carlo simulations: the estimates based on the novel method ($q > 4$) are shown in Fig. 1(a), and the estimates based on the PHAF are shown in Figs. 1(b) and (c). To compare the detection performance of the novel method and the PHAF-based methods used in Figs. 1(a), (b), and (c), Table 2 shows the number of missed detections and of false alarms of these methods when the noise variance is 1.414. If the algorithm fails to detect any one of the PPS components, the event is counted as a missed detection; similarly, if the algorithm estimates a non-existent PPS component, the event is counted as a false alarm.

Fig. 1. The second-order phase parameter estimates obtained from 300 independent Monte Carlo simulations. (a) Estimates based on the novel method ($q > 4$). (b) Estimates based on the PHAF using the lag set $\tau_1 = N/8$, $\tau_2 = N/4$. (c) Estimates based on the PHAF using the lag set $\tau_1 = N/4$, $\tau_2 = N/2$

From Table 2, we observe that the new method improves the probability of detection. Since we always take the phase coefficient estimate corresponding to the larger $\tau$ as the final estimate, we can also see from Fig. 1, roughly, that the new method improves the estimation accuracy.

Table 2. The number of missed detections and of false alarms of the methods used in Fig. 1 when the noise variance is 1.414

Method      Missed Detection   False Alarm
Fig. 1(a)   0                  0
Fig. 1(b)   32                 27
Fig. 1(c)   62                 62

As to the minimum signal-to-noise ratio (SNR) at which our proposed method still works, the answer depends on many factors, including the number of PPS components, their relative strengths, the highest PPS order in each component, and the data length. Taking the example used in Section 5 of [2] and using the same evaluation measure, our proposed method works down to -6 dB, far below the minimum SNR of 9.5 dB in [2].

4 Conclusion
In this paper, we investigated the detection and estimation of a sum of PPS components embedded in noise. The existing PHAF-based techniques do not provide reliable detection for mc-PPSs. We presented a novel detection and estimation method: given a set of time delays, it produces a set of estimates of the phase coefficients based on the HAF, and then produces a final estimate of each phase coefficient by means of voting. The new method improves the probability of detection and the estimation accuracy while avoiding the issue of threshold selection. Computer simulations illustrate the advantage of the proposed method over existing techniques.

References
1. O'Shea, P.: A New Technique for Instantaneous Frequency Rate Estimation. IEEE Signal Processing Letters, 9 (8) (2002) 251-252
2. Ikram, M. Z., Tong Zhou, G.: Estimation of Multicomponent Polynomial Phase Signals of Mixed Orders. Signal Processing, 81 (2001) 2293-2308
3. Barbarossa, S., et al.: Product High-order Ambiguity Function for Multicomponent Polynomial-phase Signal Modeling. IEEE Trans. Signal Processing, 46 (3) (1998) 691-708
4. Wang, Y., Tong Zhou, G.: On the Use of High-order Ambiguity Functions for Multicomponent Polynomial Phase Signals. Signal Processing, 65 (1998) 283-296
5. Peleg, S., Friedlander, B.: Multicomponent Signal Analysis Using the Polynomial Phase Transform. IEEE Trans. on Aerospace and Electronic Systems, 32 (1) (1996) 378-386
6. Ikram, M. Z., et al.: Estimating the Parameters of Chirp Signals: An Iterative Approach. IEEE Trans. Signal Processing, 46 (12) (1998) 3436-3440
Parameters Estimation of Multi-sine Signals Based on Genetic Algorithms*
Changzhe Song, Guixi Liu, and Di Zhao
Department of Automation, P.O. Box 185, Xidian University, Xi'an, 710071, China
[email protected], [email protected]
* This research is supported by the Preliminary Research Foundation of National Defence Science and Technology (51416060205DZ0147).
Abstract. An improved Genetic Algorithm (GA) for parameter estimation of multi-sine signals (PEMS) is proposed. The strategies of a self-adaptive elite criterion, two-point crossover, and cataclysmic mutation are employed in this algorithm to improve the performance of the GA. To simplify the computation, one complicated operating process is converted into several simple processes. A model of PEMS is also built that is convenient to apply to the GA. Simulation results show that the proposed method is effective and superior to the least-mean-squares (LMS) method.
1 Introduction
Parameter estimation of multi-sine signals is a classical problem in signal processing and is becoming increasingly important in radar, sonar, biomedical, and geophysical applications. There are two main kinds of methods for this problem. One is the FFT-based methods [1]; they are simple and convenient, but the disadvantages of the FFT, such as the fence effect and leakage, affect the estimation accuracy. The other is optimization algorithms based on time-domain searching, like the LMS algorithm [2]; these have certain weaknesses, such as local convergence and low accuracy. An improved GA is proposed in this paper to overcome these barriers. The strategies of a self-adaptive elite criterion [3], two-point crossover [4], and cataclysmic mutation [5] are introduced to improve the performance of the GA. To reduce the complexity of the GA, we estimate the sinusoids one by one. Simulation results illustrate the effectiveness of the proposed algorithm, and a comparison demonstrates the better performance of the GA over the LMS.
2 Problem Description and Model Building
In this paper, we consider the case where the sinusoidal signals are corrupted by additive Gaussian noise, i.e.,
$$y(t) = \sum_{i=1}^{K} A_i \sin(2\pi f_i t + \Phi_i) + w(t) \qquad (1)$$
where $A_i$, $f_i$, $\Phi_i$ are the parameters to be estimated and represent the amplitude, frequency, and initial phase, respectively. The parameter $K$ is the number of sine signals. The element $w(t)$ is Gaussian noise with mean 0 and variance $\sigma^2$. When $K = 1$ (a single sinusoid), let the vector $\tilde\theta = [\tilde A, \tilde f, \tilde\Phi]^T$. The criterion function is defined as
$$\sigma(\tilde\theta) = \big|\, y(t) - \tilde A \sin(2\pi \tilde f t + \tilde\Phi) \,\big| \qquad (2)$$
It is clear that when $\sigma(\tilde\theta)$ reaches its minimum, the vector $\tilde\theta$ gives the estimated values of the parameters. When $K > 1$ (multi-sine signals), we write Equation (2) as
$$\sigma_i(\tilde\theta_i) = \big|\, y(t) - \tilde A_i \sin(2\pi \tilde f_i t + \tilde\Phi_i) \,\big|, \quad i = 1, \ldots, K \qquad (3)$$
Following the same rule as above, Equation (3) is used as the criterion function for multi-sine signals. When $\sigma_i(\tilde\theta_i)$ achieves its minimum, the vector $\tilde\theta_i = [\tilde A_i, \tilde f_i, \tilde\Phi_i]^T$ gives the estimated parameters of sinusoid $i$. Sinusoid $i$ is then subtracted from the observation data, and the parameters of another sinusoid are estimated from the result. Running this operation in a loop, all parameters are estimated. It is well known that for more sinusoids and/or longer data, the computation of a GA is very complicated [6]; estimating the sinusoids one by one greatly reduces the computational complexity.
3 Parameter Estimation Method Based on Genetic Algorithms
3.1 Modified Genetic Algorithm
A GA is an iterative process of selection, crossover, and mutation. It is a kind of self-adaptive global searching optimization algorithm, different from conventional optimization algorithms [4]. To improve the performance of the GA, the following strategies are employed.
1) Self-adaptive elite criterion. In the selection operation, a self-adaptive elite criterion [3] is adopted in addition to roulette wheel selection [4]. According to the gap between the best fitness and the average fitness, the method self-adjusts the number of elitists that are reproduced directly into the offspring in every generation. This helps ensure that the GA converges to the global optimum.
2) Two-point crossover. One-point crossover has the major drawback that certain combinations of schemata cannot be formed in some situations. Two-point crossover is introduced to overcome this problem [4].
3) Cataclysmic mutation. In addition to the traditional mutation operator, cataclysmic mutation is used in the algorithm. When the individuals of a population are over-converged, mutation is applied with a probability much larger than usual. This retains the diversity of the population and effectively prevents premature convergence [5].
3.2 Proposed Genetic Algorithm Implementation
According to the model described above, the sinusoids are estimated one by one. Equation (3) is chosen as the objective function, and our purpose is to find its minimum, so the fitness function can be constructed as follows [4]:
$$F_i(\tilde\theta_i) = C_{\max} - \sigma_i(\tilde\theta_i) \qquad (4)$$
where $C_{\max}$ is a sufficiently large constant satisfying $C_{\max} \ge \max[\sigma_i(\tilde\theta_i)]$, to keep $F_i(\tilde\theta_i) \ge 0$. The complete process of the GA proposed in this paper is summarized in the following steps.
Step 1: Initialization. Choose the population size N, crossover rate $p_c$, mutation rate $p_m$, and the maximum generation G. Encode the parameters as binary strings.
Step 2: Set t = 0. Generate the initial population p(t) randomly.
Step 3: Calculate the fitness of the individuals of the population, and rank the individuals by their fitness values.
Step 4: Selection. Choose $N \times P_k$ individuals from the parent population and reproduce them directly into the offspring, where $P_k$ is proportional to the average fitness of the current population and inversely proportional to its maximum fitness. The other individuals are obtained by roulette wheel selection.
Step 5: Two-point crossover.
Step 6: Mutation, combining the traditional mutation operator and the cataclysmic mutation operator.
Step 7: If $t \le G$, let t = t + 1 and go to Step 3. If t > G, export the results.
Step 8: Subtract the estimated sine component from the sampled data. If a sine component is still left, go to Step 2; otherwise, terminate. A compact code sketch of this procedure follows.
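The sketch below condenses Steps 1-8, assuming a real-valued encoding in place of the paper's 15-bit binary strings and simplified crossover and elitism; the defaults echo the settings of Section 4, but every detail here is illustrative rather than the authors' exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(pop, t, y):
    # Eqs. (3)-(4): F_i = C_max - sigma_i for one candidate sinusoid per row.
    A, f, phi = pop[:, 0], pop[:, 1], pop[:, 2]
    resid = np.abs(y - A[:, None] * np.sin(2 * np.pi * f[:, None] * t
                                           + phi[:, None])).sum(axis=1)
    return resid.max() - resid + 1e-12        # shift so fitness is nonnegative

def estimate_one_sinusoid(t, y, n_pop=60, n_gen=400, pc=0.82, pm=0.045):
    lo = np.array([0.0, 0.0, 0.0])            # assumed parameter bounds
    hi = np.array([5.0, 10.0, 2 * np.pi])
    pop = rng.uniform(lo, hi, size=(n_pop, 3))
    for _ in range(n_gen):
        fit = fitness(pop, t, y)
        elite = pop[np.argsort(fit)[-2:]].copy()         # elitist survivors
        pop = pop[rng.choice(n_pop, n_pop - 2, p=fit / fit.sum())]
        for i in range(0, len(pop) - 1, 2):              # crude real-coded crossover
            if rng.random() < pc:
                pop[i, 1], pop[i + 1, 1] = pop[i + 1, 1], pop[i, 1]
        mask = rng.random(pop.shape) < pm                # mutation: random reset
        fresh = rng.uniform(np.broadcast_to(lo, pop.shape),
                            np.broadcast_to(hi, pop.shape))
        pop[mask] = fresh[mask]
        pop = np.vstack([pop, elite])
    return pop[np.argmax(fitness(pop, t, y))]

def estimate_all(t, y, k):
    # Step 8: estimate the sinusoids one by one, deflating the observation.
    params = []
    for _ in range(k):
        A, f, phi = estimate_one_sinusoid(t, y)
        params.append((A, f, phi))
        y = y - A * np.sin(2 * np.pi * f * t + phi)
    return params
```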
4 Simulations
In this section, a computer simulation is provided to show the performance of the proposed GA. An example of two sine signals corrupted by Gaussian noise is considered, expressed by Equation (5):
$$y(t) = \sum_{i=1}^{2} A_i \sin(2\pi f_i t + \Phi_i) + w(t) \qquad (5)$$
where $A_1 = 2$, $f_1 = 2.5\,\mathrm{Hz}$, $\Phi_1 = 1.5\,\mathrm{rad}$, $A_2 = 1$, $f_2 = 1.5\,\mathrm{Hz}$, $\Phi_2 = 2.5\,\mathrm{rad}$. The component $w(t)$ is Gaussian noise with mean 0 and variance 1. The sample frequency is $f = 20\,\mathrm{Hz}$. The binary coding length of each parameter is 15. The population size is N = 60, the crossover rate is $P_c = 0.82$, the mutation rate is $p_m = 0.045$, the maximum generation is G = 400, and the data length is L = 128. Fig. 1 shows the signal y(t) at SNR = -5 dB; clearly, the noise strongly affects the signal. Fig. 2 shows the curves of the best fitness and the average fitness. The two curves both converge to certain values and almost coincide eventually, which indicates that the proposed GA is a stable and convergent method.
Fig. 1. Waveform of the signal y(t)
Fig. 2. Curves of the best fitness and average fitness
Fig. 3. Evolution curves of parameters of component 1
Fig. 4. Evolution curves of parameters of component 2
The evolution curve of each parameter is shown in Fig. 3 and Fig. 4. Every parameter converges close to its true value by about generation 100, so the method converges quickly; these figures give a qualitative picture of the estimation accuracy. Each parameter was estimated 20 times, and the average was taken as the value of the parameter. Table 1 presents the estimation results of the GA and the LMS at SNR = -5 dB, 10 dB, 30 dB, and 100 dB, giving a quantitative picture of the estimation accuracy. The proposed method has a high degree of accuracy, and the accuracy of the proposed GA is consistently higher than that of the LMS. This confirms the global searching capability of the proposed GA.

Table 1. Estimated values of the parameters

Parameter      -5 dB GA   -5 dB LMS   10 dB GA   10 dB LMS   30 dB GA   30 dB LMS   100 dB GA   100 dB LMS
A1 = 2         2.0758     2.6140      1.9732     2.1710      1.9928     1.9897      1.9999      2.0003
f1 = 2.5 Hz    2.5073     2.3796      2.5042     2.4375      2.5034     2.5072      2.5005      2.5009
Φ1 = 1.5 rad   1.4101     1.0512      1.4354     1.7230      1.4554     1.5549      1.4966      1.4886
A2 = 1         1.0861     1.1041      1.0324     1.0167      1.0049     0.9880      1.0014      1.0027
f2 = 1.5 Hz    1.5103     1.5901      1.5074     1.4588      1.5013     1.5073      1.5001      1.4993
Φ2 = 2.5 rad   2.3125     2.6507      2.3441     2.6914      2.4762     2.5461      2.4809      2.5159
5 Conclusions
This paper introduces a modified GA to estimate the parameters of sinusoids. The GA is a fast and effective global searching optimization algorithm that overcomes weaknesses of conventional optimization algorithms, such as local convergence and low accuracy, but it is usually too complicated to apply in real-time operation. In this paper, we estimate the sinusoids one by one to reduce the complexity. Simulation results show the excellent performance of the improved GA and demonstrate that the proposed algorithm is effective for estimating the parameters of sine signals.
References
1. Zhang, X. D.: Modern Signal Processing. Beijing: Tsinghua University Press (1995)
2. Mayyas, K.: Performance Analysis of the Deficient Length LMS Adaptive Algorithm. IEEE Transactions on Signal Processing, 53 (8) (2005) 2727-2734
3. Yu, W., Nie, Y. F.: Genetic Algorithm Approach to Blind Source Separation. J. of Wuhan Uni. of Sci. & Tech. (Natural Science Edition), 26 (3) (2003) 297-300
4. Xuan, G. N., Cheng, R. W.: Genetic Algorithm and Engineering Optimization. Beijing: Tsinghua University Press (2004)
5. Liu, J.: Application in Parameter Estimation of Nonlinear System Based upon Improved Genetic Algorithms of Cataclysmic Mutation. Journal of Chongqing Normal University (Natural Science Edition), 21 (4) (2004) 13-16
6. Tang, K. S., Man, K. F., Wong, S. K., He, Q.: Genetic Algorithms and Their Applications. IEEE Signal Processing Magazine (1996) 22-37
Fast Vision-Based Camera Tracking for Augmented Environments
Bum-Jong Lee and Jong-Seung Park
Dept. of Computer Science & Engineering, University of Incheon, 177 Dohwa-dong, Nam-gu, Incheon, 402-749, Republic of Korea
{leeyanga, jong}@incheon.ac.kr
Abstract. This article describes a fast and stable camera tracking method aimed at real-time augmented reality applications. From the feature tracking of a known marker in a single frame, we estimate the camera rotation and translation parameters. The entire pose estimation process is linear, and initial estimates are not required. As an experimental setup, we implemented a video augmentation system that replaces detected markers with virtual 3D graphical objects. Experimental results show that the proposed camera tracking method is robust and fast, applicable to interactive augmented reality applications.
1 Introduction
Augmented reality applications involve the seamless insertion of computer-generated 3D graphical objects into a live-action video stream of unmodeled real scenes. The primary requirement for a practical augmented reality system is a method of accurate and reliable camera tracking. Most augmented reality applications require the camera pose online, to project computer-generated 3D models into the real-world view in real time. Hence, utilization of a fiducial marker is a natural choice for fast feature tracking as well as for the computation of the initial camera pose. The projective camera model is frequently used in computer vision algorithms, but it has eleven unknowns and is unnecessarily complex for many applications. Since camera tracking for an augmented reality application must be fast enough to handle real-time interactions, appropriate restrictions should be imposed on the camera model, as long as the approximation is not far from the optimal solution. This paper describes a stable real-time marker-based camera tracking method for augmented reality systems working in unknown environments. In the next sections, we propose a fast linear camera matchmoving algorithm that does not require initial estimates.
2 Camera Matchmoving
In the perspective model, the relations between the image coordinates ($u$ and $v$) and the model coordinates ($x$, $y$, and $z$) are expressed by non-linear equations. By imposing some restrictions on the projection matrix P, linearized approximations of the perspective model are possible. A well-known linearized approximation of the perspective camera model is the weak-perspective model [1], which can be used instead of the perspective model when the dimensions of the object are relatively small compared to the distance between the object and the camera. In the weak-perspective model, all object points lie on a plane parallel to the image plane and passing through the centroid $\bar{\mathbf{x}} = [\bar{x}\ \bar{y}\ \bar{z}\ 1]^T$ of the object points. Hence, all object points have the same depth $\bar{z} = (\bar{\mathbf{x}} - t) \cdot r_z$, where $R = [r_x\ r_y\ r_z]^T$ and $t = [t_x\ t_y\ t_z]^T$ represent the relative rotation and translation between the object and the camera. The projection equations for an object point $x$ to an image point $m = [u\ v\ 1]^T$ are $u = (1/\bar{z})(r_x \cdot (x - t))$ and $v = (1/\bar{z})(r_y \cdot (x - t))$. Assume $\bar{\mathbf{x}} = 0$; then $\bar{z} = -t \cdot r_z$, and we obtain the projection equations
$$u = \tilde{r}_x \cdot x + \bar{x}/\bar{z}, \qquad v = \tilde{r}_y \cdot x + \bar{y}/\bar{z} \qquad (1)$$
where $\tilde{r}_x = r_x/\bar{z}$, $\tilde{r}_y = r_y/\bar{z}$, $\bar{x} = -t \cdot r_x$, and $\bar{y} = -t \cdot r_y$. Equation (1) corresponds to an orthogonal projection of each model point onto a plane passing through the origin of the object space and parallel to the image plane, followed by a uniform scaling by the factor $1/\bar{z}$. Equation (1) can be solved by the orthographic factorization method [2] with the constraints $|\tilde{r}_x| = |\tilde{r}_y| = 1$ and $\tilde{r}_x \cdot \tilde{r}_y = 0$. Once $\tilde{r}_x$ and $\tilde{r}_y$ are obtained, the motion parameters $r_x$ and $r_y$ are computed by normalization, i.e., $r_x = \tilde{r}_x/|\tilde{r}_x|$ and $r_y = \tilde{r}_y/|\tilde{r}_y|$, and the translation along the optical axis is computed as $\bar{z} = 1/|\tilde{r}_x|$.
The weak-perspective model approximates the perspective projection by assuming that all the object points are roughly at the same distance from the camera, which becomes true if the distance between the object and the camera is much greater than the size of the object. We assume the depths of all points of the object are roughly equal, and set them to the depth of a specific object point, called the reference point. Let $x_0$ be such a reference point in the object, with all other points roughly at the same depth, denoted $z_0$. We set the reference point $x_0$ as the origin of the object space, and all other coordinates of the object points are defined relative to $x_0$. Consider an object point $x_i$ and its image $m_i = [u_i\ v_i\ 1]^T$, the scaled orthographic projection of $x_i$. From Equation (1), the relation can be written
$$\tilde{r}_x \cdot x_i = u_i - u_0, \qquad \tilde{r}_y \cdot x_i = v_i - v_0 \qquad (2)$$
where $\tilde{r}_x = r_x/z_0$, $\tilde{r}_y = r_y/z_0$, $u_0 = -t \cdot r_x/z_0$, $v_0 = -t \cdot r_y/z_0$, and $m_0 = [u_0\ v_0\ 1]^T$ is the image of the reference point $x_0$. Since the object points are known and their image coordinates $m_i$ ($0 \le i < N$) are available, Equations (2) are linear in the unknowns $\tilde{r}_x$ and $\tilde{r}_y$. For the $N-1$ object points $(x_1, \ldots, x_{N-1})$ and their image coordinates $(m_1, \ldots, m_{N-1})$, we construct a linear system using Equation (2) by introducing the $(N-1) \times 3$ argument matrix $A = [\tilde{x}_1\ \tilde{x}_2 \cdots \tilde{x}_{N-1}]^T$, the $(N-1)$-vector $u = [\tilde{u}_1\ \tilde{u}_2 \cdots \tilde{u}_{N-1}]^T$, and the $(N-1)$-vector $v = [\tilde{v}_1\ \tilde{v}_2 \cdots \tilde{v}_{N-1}]^T$, where $\tilde{x}_i = x_i - x_0$, $\tilde{u}_i = u_i - u_0$, and $\tilde{v}_i = v_i - v_0$. All the coordinates are given by column vectors in non-homogeneous form. The unknowns $\tilde{r}_x$ and $\tilde{r}_y$ can be obtained by solving the two linear least squares problems
$$A\tilde{r}_x = u, \qquad A\tilde{r}_y = v \qquad (3)$$
The solution is easily obtained using the singular value decomposition (SVD), and the parameters $r_x$ and $r_y$ are computed by normalizing $\tilde{r}_x$ and $\tilde{r}_y$. Once the unknowns $r_x$ and $r_y$ have been computed, more exact values can be obtained by an iterative algorithm. Dementhon [3] showed that the relation between the perspective image coordinates ($u_i$ and $v_i$) and the scaled orthographic image coordinates ($u_i'$ and $v_i'$) can be expressed by $u_i' = u_i + \alpha_i u_i$ and $v_i' = v_i + \alpha_i v_i$, in which $\alpha_i$ is defined as $\alpha_i = r_z \cdot x_i / z_0$, where $r_z = r_x \times r_y$. Hence, in Equations (2), we replace $u_i$ and $v_i$ by $u_i'$ and $v_i'$ and obtain
$$\tilde{r}_x \cdot x_i = (1 + \alpha_i) u_i - u_0, \qquad \tilde{r}_y \cdot x_i = (1 + \alpha_i) v_i - v_0 \qquad (4)$$
Once we have obtained initial estimates of $\tilde{r}_x$ and $\tilde{r}_y$, we can compute $\alpha_i$ for each $x_i$; hence, Equations (4) are linear in the unknowns $\tilde{r}_x$ and $\tilde{r}_y$. The term $\alpha_i$ is the z-coordinate of $x_i$ in the object space divided by the distance of the reference point from the camera. Since the ratio of the object size to $z_0$ is small, $\alpha_i$ is also small, which means that only a few iterations are enough for the approximation.
Once we have obtained initial estimates of ˜rx and ˜ry , we can compute αi for each xi . Hence, the equations (4) are linear with the unknowns, ˜rx and ˜ry . The term αi is the z-coordinate of xi in the object space, divided by the distance of the reference point from the camera. Since the ratio of object size to z0 is small, αi is also small, which means only several iterations may be enough for the approximation.
3
Marker-Based Pose Estimation
Assume the N object points x0 , x1 , . . . xN −1 are observed in a frame and their image coordinates are given by m0 , m1 , . . . mN −1 in a single frame. All the points are given by column vectors in the non-homogeneous form. We automatically choose the most preferable reference point which minimizes the depth variation. The focal lengths in two image directions (fx and fy ) and the coordinates of the principal point (px and py ) are also used for the accurate pose estimation. The overall steps of the algorithm are as follows: Step 1 (Selecting xk ): Choose a reference point xk satisfying arg mink i (zi − zk )2 where zi is the z-coordinate of xi . Step 2 (Normalizing coordinates): Translate all the input object points by −xk so that the reference point xk becomes the origin of the object space. Also, translate all the input image points by [−px − py ]T so that the location of principal point becomes the origin of the image space. Step 3 (Establishing A, u and v): Using object points xi and their corresponding image points mi (i = 1, . . . , N −1), build the (N −1)×3 argument matrix A, the (N − 1)-vector u, and the (N − 1)-vector v shown in equation (3). Set ut and vt by ut = u and vt = v. Step 4 (Computing ˜rx and ˜ry ): Solve the two linear least squares problems, A˜rx = ut and A˜ry = vt , for the unknowns ˜rx and ˜ry . The solution is easily obtained using the singular value decomposition (SVD).
Step 5 (Computing $z_k$, $\mathbf{r}_x$, and $\mathbf{r}_y$): Compute $z_k$ by $z_k = 2f_x f_y/(f_y|\tilde{\mathbf{r}}_x| + f_x|\tilde{\mathbf{r}}_y|)$, where $f_x$ and $f_y$ are the camera focal lengths along the x- and y-axes. Compute $\mathbf{r}_x$ and $\mathbf{r}_y$ by $\mathbf{r}_x = \tilde{\mathbf{r}}_x/|\tilde{\mathbf{r}}_x|$ and $\mathbf{r}_y = \tilde{\mathbf{r}}_y/|\tilde{\mathbf{r}}_y|$.

Step 6 (Computing $\alpha_i$): Compute $\alpha_i$ by $\alpha_i = \mathbf{r}_z\cdot\mathbf{x}_i/z_k$, where $\mathbf{r}_z = \mathbf{r}_x\times\mathbf{r}_y$. If every $\alpha_i$ is nearly the same as in the previous iteration, stop the iteration.

Step 7 (Updating $\mathbf{u}_t$ and $\mathbf{v}_t$): Update $\mathbf{u}_t$ and $\mathbf{v}_t$ by $\mathbf{u}_t = (1+\alpha_i)\mathbf{u}$ and $\mathbf{v}_t = (1+\alpha_i)\mathbf{v}$. Go to Step 4.

The rotation matrix R is the arrangement of the three orthonormal vectors: $R = [\mathbf{r}_x\ \mathbf{r}_y\ \mathbf{r}_z]^T$. The translation vector $\mathbf{t}$ is the vector from the origin of the camera space to the reference point. Hence, once $\tilde{\mathbf{r}}_x$ and $\tilde{\mathbf{r}}_y$ are found, the depth of the reference point is computed by $z_k = 2f_x f_y/(f_y|\tilde{\mathbf{r}}_x| + f_x|\tilde{\mathbf{r}}_y|)$, and the translation is obtained by $\mathbf{t} = [z_k u_k/f_x\ \ z_k v_k/f_y\ \ 2/(|\tilde{\mathbf{r}}_x| + |\tilde{\mathbf{r}}_y|)]^T$.
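A compact sketch of Steps 1-7 is given below, assuming numpy; the function and variable names are hypothetical. It folds the $\mathbf{u}_t, \mathbf{v}_t$ update of Step 7 into the right-hand sides of the least squares problems of Step 4 and stops when the $\alpha_i$ values converge (Step 6).

```python
import numpy as np

def marker_pose(X, m, fx, fy, px, py, tol=1e-6, max_iter=20):
    """Iterative marker pose estimation following Steps 1-7.
    X : (N, 3) object points, m : (N, 2) pixel coordinates."""
    z = X[:, 2]
    k = int(np.argmin([np.sum((z - zc) ** 2) for zc in z]))   # Step 1
    Xn = X - X[k]                                             # Step 2
    mn = m - np.array([px, py], dtype=float)
    u0, v0 = mn[k]
    keep = np.arange(len(X)) != k                             # Step 3
    A = Xn[keep]
    u, v = mn[keep, 0], mn[keep, 1]
    alpha = np.zeros(len(A))
    for _ in range(max_iter):
        # Step 4 with Step 7 folded in: right-hand sides (1+alpha)*u - u0
        rx_t, *_ = np.linalg.lstsq(A, (1 + alpha) * u - u0, rcond=None)
        ry_t, *_ = np.linalg.lstsq(A, (1 + alpha) * v - v0, rcond=None)
        # Step 5: reference depth and unit rotation rows
        zk = 2 * fx * fy / (fy * np.linalg.norm(rx_t) + fx * np.linalg.norm(ry_t))
        rx = rx_t / np.linalg.norm(rx_t)
        ry = ry_t / np.linalg.norm(ry_t)
        rz = np.cross(rx, ry)
        alpha_new = A @ rz / zk                               # Step 6
        if np.max(np.abs(alpha_new - alpha)) < tol:
            break
        alpha = alpha_new
    R = np.vstack([rx, ry, rz])
    t = np.array([zk * u0 / fx, zk * v0 / fy,
                  2.0 / (np.linalg.norm(rx_t) + np.linalg.norm(ry_t))])
    return R, t
```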
4 Experimental Results
To demonstrate the effectiveness of the proposed method, we implemented a camera pose tracking system that relies on known-marker tracking from a real video stream. For each frame, the camera pose is calculated using the feature points tracked on a marker in that single frame. The implemented system recognizes two types of markers (Cube and TagMarker, as shown in Fig. 1). The continuous marker tracking and re-initialization are robust and insensitive to illumination changes. From the marker features, we estimate the camera pose. Then we project 3D graphical objects onto the frame and render the projected virtual object together with the original input frame. Fig. 1 shows the AR application, which inserts a virtual object into a live video stream. The upper figure shows the insertion of a virtual flowerpot at the cube marker position, and the lower figure shows the insertion of a building with a helicopter attached to it at the AR marker position.

We compared the estimation accuracy of the proposed method with the linear scaled orthographic method (SOP) [4] and the iterative scaled orthographic method (POSIT) [3]. The projection error of the proposed method is under 2 pixels in most cases and is less than that of SOP and POSIT (see Table 1). The comparison of accuracy and stability is shown in Fig. 2. In the left figure, the error is measured as the average reprojection error as the depth variance of the scene points relative to the reference point increases. We also measured the error according to the relative distance of the marker from the camera divided by the marker radius, which is shown in the right figure.

Table 1. Accuracy and computing time with respect to marker types

marker type   #frames   #feature points   avg accuracy (pixel)   time (ms)
CubeSparse    1000      5                 1.64                   1.73
CubeDense     700       72                0.327                  2.113
TagMarker     8         6                 0.948                  1.756
Fig. 1. Augmented reality applications to insert virtual objects into a video stream

Fig. 2. Comparison of camera tracking accuracy of three different methods (left: error in pixels versus point distribution in m; right: error in pixels versus distance (m)/radius (m); curves: Proposed, POSIT, SOP)
The processing speed of the proposed camera tracking method is about 27 frames per second on a Pentium 4 (2.6 GHz) computer, including all steps of the system such as frame acquisition, marker detection, feature extraction, pose estimation, and 3D rendering. The camera pose estimation process and the registration process take roughly 4 ms and 8 ms, respectively, and this speed is sufficiently fast for real-time augmented reality applications. The processing time grows with the number of feature points, while the reprojection error shrinks as more feature points are used. Overall, the numerical values indicate that the type of marker does not critically affect the pose accuracy.
5 Conclusion
This article has presented a real-time camera pose estimation method assuming that a known marker is visible to the video camera. From the marker tracked in a single frame, the camera rotation and translation parameters are estimated using a linear approximation. The pose estimation process is fast enough for real-time applications since the entire process is linear and initial estimates are not required. Compared with previous fast camera pose estimation methods, the camera pose accuracy is greatly improved without extra computing time. As an application of the proposed method, we implemented an augmented reality application which inserts computer-generated 3D graphical objects into a live-action video stream of unmodeled real scenes. Using the recovered camera pose parameters, the marker in the image frames is replaced by a virtual 3D graphical object during marker tracking from a video stream. Experimental results showed that the proposed camera tracking method is robust and fast enough for interactive video-based applications.
Acknowledgement

This work was supported in part by the Ministry of Commerce, Industry and Energy (MOCIE) through the Incheon IT Promotion Agency (IITPA) and in part by the Brain Korea 21 Project in 2006.
References

1. Carceroni, R., Brown, C.: Numerical Methods for Model-Based Pose Recovery. (1997)
2. Tomasi, C., Kanade, T.: Shape and Motion from Image Streams Under Orthography: A Factorization Approach. IJCV, 9 (1992) 137-154
3. Dementhon, D., Davis, L.: Model-based Object Pose in 25 Lines of Code. IJCV, 15 (1995) 123-141
4. Poelman, C., Kanade, T.: A Paraperspective Factorization Method for Shape and Motion Recovery. IEEE T-PAMI, 19 (1997) 206-218
Recognition of 3D Objects from a Sequence of Images

Daesik Jang

Department of Computer Information Science, Kunsan National University, Gunsan, South Korea
[email protected]
Abstract. The recognition of relatively big and rarely movable objects, such as refrigerators and air conditioners, is necessary because these objects can serve as crucial global features for Simultaneous Localization and Map building (SLAM) in indoor environments. In this paper, we propose a novel method to recognize such big objects using a sequence of 3D scenes. Particles representing the object to be recognized are scattered into the 3D scene captured from the environment, and the probability of each particle is calculated by matching the 3D lines of the object model with those of the environment. Based on the probabilities and the degree of convergence of the particles, the object in the environment can be recognized and its position can be estimated. The experimental results show the feasibility of the suggested method based on particle filtering and its application to SLAM problems.
1 Introduction

Object recognition has been one of the most challenging issues in computer vision and has been intensively investigated for several decades. In particular, object recognition has played an important role in manipulation and SLAM in the robotics field. Many researchers have suggested various 3D object recognition approaches. Among them, the model-based approach discussed in this paper is the most general one for recognizing shapes and objects. It recognizes objects by matching features extracted from an object in the scene with features of the object stored in advance [1]. Some well-known model-based recognition studies are as follows. The method suggested by Fischler and Bolles [2] uses RANSAC for recognizing objects. It projects the points of all models onto the scene, decides whether the projected points are similar to those of the captured scenes, and recognizes the object based on the similarity. This method is not very efficient because the hypothesize-and-verify procedure is repeated many times to get an accurate result. In addition, Johnson and Hebert [4] proposed a spin-image based recognition algorithm for cluttered 3D scenes, and Frome et al. [3] compared the performance of 3D shape contexts with that of spin-images. Ponce et al. [5] introduced a 3D object recognition approach using affine-invariant patches. However, these methods work well only when accurate 3D data or fully textured environments are provided, while our approach makes it possible to recognize objects when the captured scenes contain substantial noise and uncertainty stemming from low-quality sensors.
In this paper we propose a new approach to recognize big and rarely movable objects in a sequence of images by applying a probabilistic model in noisy and textureless environments.
2 Extraction of 3D Line Features

We use 3D lines as key features for object recognition because 3D data can be obtained robustly on the boundaries of objects even in texture-less environments, as shown in Fig. 1(a).
Fig. 1. 3D data: (a) 3D data from stereo camera; (b) noisy 3D data

Fig. 2. Results of the line feature extraction in 2D and 3D: (a) experimental environment; (b) edges extracted; (c) 2D lines; (d) 3D lines
Due to the poor accuracy of the 3D data, as illustrated in Fig. 1(b), all lines are first extracted from the 2D images, and the 2D lines are then transformed to 3D lines by mapping their corresponding 3D points (the 2D images and their corresponding 3D points are captured at the same time from a stereo camera). We developed a simple algorithm to find 2D lines based on an edge-following approach so that we could find most of the lines in a scene efficiently. First, the edges are detected by the Canny edge detection algorithm. Then, we categorize the edges as
horizontal, vertical, and diagonal line segments based on the connectedness of the edges. 2D lines are found by connecting each line segment with adjoining line segments while taking the aliasing problem into account. 3D lines can be obtained if there are corresponding 3D points at the pixels of the 2D lines; the 2D lines are transformed into 3D lines by assigning the 3D positions of the corresponding 3D points. Fig. 2 shows the results of line extraction in 2D and 3D.
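The sketch below illustrates the 2D-to-3D line lifting described above, assuming OpenCV and numpy. Since the paper's edge-following grouping is not spelled out in detail, the probabilistic Hough transform is used here as a stand-in for that step, and lifting only the segment endpoints is a simplification of assigning 3D positions along the whole segment.

```python
import cv2
import numpy as np

def lines_2d_to_3d(gray, points3d):
    """Detect 2D lines and lift them to 3D via registered stereo points.
    gray     : (H, W) uint8 image
    points3d : (H, W, 3) per-pixel 3D coordinates (NaN where invalid)"""
    edges = cv2.Canny(gray, 50, 150)
    # probabilistic Hough transform as a stand-in for the paper's
    # edge-following segment grouping
    segs = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=40,
                           minLineLength=30, maxLineGap=5)
    lines3d = []
    for x1, y1, x2, y2 in (segs.reshape(-1, 4) if segs is not None else []):
        p, q = points3d[y1, x1], points3d[y2, x2]
        if not (np.isnan(p).any() or np.isnan(q).any()):
            lines3d.append((p, q))   # 3D endpoints of the 2D segment
    return lines3d
```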
3 Object Recognition Based on Particle Filter

3.1 Initial Particle Generation

An object to be recognized is modeled with 3D lines, and we consider this line set as a particle. Many particles representing the object are spread over possible object positions in a 3D scene during the initial particle generation. Fig. 3(a) illustrates the particle of an object. At every scene, particles are initially generated to find other possible positions of the object which could not be extracted in previous scenes.
Fig. 3. A particle of one object and the initial particle generation using lines (dot: particles generated, solid: 3D line in the scene): (a) the model of a refrigerator; (b) vertical line; (c) horizontal line; (d) relationship of lines
Fig. 3 shows how particles are generated by using the directional features of lines. In the case of (b), many particles can be generated by rotating vertical lines around the core vertical line. All particles located off the floor are eliminated, because a refrigerator does not stand with a gap above the floor.

3.2 Determination and Updating of the Probabilities of Particles

The probability of each particle is obtained from its positive and negative similarities, computed using the 3D lines in the scene. The positive similarity shows how well the lines composing a particle match the lines in the scene after projecting them into the scene. It is decided by the following two elements. The first element, S1, is the degree to which the lines composing a particle match the lines in the scene. S1 is determined as follows: it is tested whether there is a 3D line around each line of a particle, and the similarities in length, orientation, and distance of the corresponding lines are verified. The second element is S2. It
shows how many lines of a particle are matched with the lines in the scene. These two similarity elements, S1 and S2, are integrated by a weighted sum. On the other hand, the negative similarity shows how well the lines in the scene match those of the particles. For example, if an air conditioner with the same shape and dimensions as a refrigerator exists in the scene, the positive similarity of the air conditioner is identical to that of a refrigerator. In this case, however, the number of lines included in the air conditioner is greater than the number of lines comprising a refrigerator due to the difference in geometric shape. All particles at time t-1 are propagated to the scene at time t to update the particle probabilities. The probabilities of particles newly found at time t and of existing particles are updated based on (1).
$$P_t(n) = \sum_{n=0}^{N_p}\sum_{m=0}^{N_p} \frac{1}{d}\,P_{t-1}(m)\,P_t(n) \qquad (1)$$

where $d$ is the difference between the poses of $P_{t-1}(m)$ and $P_t(n)$.
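Equation (1) is dense; the following sketch shows one plausible reading of it, assuming numpy: each current particle's probability is reinforced by previous-frame particles whose poses are close (small pose difference d). The names and the exact pose-difference metric are our assumptions, not the author's.

```python
import numpy as np

def update_probabilities(prev_particles, prev_probs, particles, probs):
    """Propagate probabilities between frames: each current particle is
    reinforced by previous particles with a small pose difference d."""
    new_probs = np.empty(len(particles))
    for n, (pose_n, p_n) in enumerate(zip(particles, probs)):
        # d: Euclidean pose difference to every previous particle
        d = np.linalg.norm(prev_particles - pose_n, axis=1) + 1e-6
        new_probs[n] = np.sum(prev_probs * p_n / d)
    return new_probs / new_probs.sum()
```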
Fig. 4. The distribution of particles in a sequence of images (the particles are represented by green boxes): (a) 2D image of the environment; (b) the particles at the first scene; (c) the particles at the second scene; (d) the particles at the third scene
After that, the particles are resampled according to their probabilities: particles with high probability generate more particles, while those with low probability disappear. Fig. 4 shows how the particles are updated in continuous scenes.
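A minimal numpy sketch of this resampling step (the names and the jitter scheme are our assumptions): particles are drawn in proportion to their probabilities, and a small random perturbation keeps the surviving copies from collapsing onto identical poses.

```python
import numpy as np

def resample(particles, probs, n_out, jitter=0.02, rng=np.random):
    """Importance resampling: high-probability particles spawn more copies,
    low-probability ones die out; a small pose jitter keeps diversity."""
    w = probs / probs.sum()
    idx = rng.choice(len(particles), size=n_out, p=w)
    return particles[idx] + jitter * rng.standard_normal((n_out, particles.shape[1]))
```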
4 Experimental Results

This paper aims to enable a robot to know the location of large objects, such as refrigerators, while it is navigating. We assume that all objects to which this approach is applied have enough major straight-line features for recognition. In order to
get 2D images and 3D data, a stereo camera named Bumblebee is used. The stereo camera is mounted on the end effector of an arm in an eye-on-hand configuration. Fig. 5 shows the stereo camera and the robot used for the experiment and the eye-on-hand configuration.
Fig. 5. Equipment for the experiment: (a) stereo camera; (b) robot; (c) eye on hand
Fig. 6. The recognition results from different viewpoints: (a) 0°; (b) 30°; (c) 60°
Fig. 7. The recognition results with static occlusions: (a) occluded by mannequin; (b) occlusion with another view angle (30°)
Fig. 6 shows experimental results with different viewpoints. Three sequences of images were used, with the robot looking at and approaching the object from directions of 0°, 30°, and 60°, respectively. Although the probability of each particle becomes lower as the angle between the robot and the object increases from 0° to 60°, these results are acceptable since the probabilities of the particles are
over a predefined threshold of 0.7. The blue box in the scene marks the estimated model position after recognition. Fig. 7 shows the result in the presence of occlusions. A mannequin was put in front of the object to create occlusions. Even though the mannequin stands in front of the refrigerator, the position of the object was recognized successfully within 3 consecutive scenes.
5 Conclusion

In this paper, a method to recognize objects and estimate their poses using sequential scenes in noisy environments is suggested. Under the assumption that the object to be recognized is known in advance, particles of the object are scattered into the 3D scene, and the probability of each particle is determined by matching 3D lines. The probabilities of the particles are updated in the same way after reading the next scene, and the object is then detected and its pose estimated. This method can be applied to recognize large objects, such as refrigerators, air conditioners, and bookcases, that have many line features. The experiments prove that the method is robust to orientation changes and occlusions. Moreover, it can be used to perform SLAM more reliably by providing the positions and poses of the recognized objects as landmarks.
Acknowledgement

This work was supported by IITA through the IT Leading R&D Support Project.
References

1. Farias, M.F.S., Carvalho, J.M.: Multi-view Technique for 3D Polyhedral Object Recognition Using Surface Representation. Revista Controle & Automacao (1999) 107-117
2. Fischler, M.A., Bolles, R.C.: Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Comm. Assoc. Comp. Mach. Vol. 24 No. 6 (1981) 381-395
3. Frome, A., Huber, D., Kolluri, R., Bulow, T., Malik, J.: Recognizing Objects in Range Data Using Regional Point Descriptors. European Conference on Computer Vision, Prague, Czech Republic (2004)
4. Johnson, A.E., Hebert, M.: Using Spin Images for Efficient Object Recognition in Cluttered 3D Scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 21 No. 5 (1999) 433-449
5. Rothganger, F., Lazebnik, S., Schmid, C., Ponce, J.: 3D Object Modeling and Recognition Using Affine-Invariant Patches and Multi-View Spatial Constraints. CVPR (2003) 272-280
Reconstruction of Rectangular Plane in 3D Space Using Determination of Non-vertical Lines from Hyperboloidal Projection

Hyun-Deok Kang and Kang-Hyun Jo

Intelligent Systems Laboratory, School of Electrical Engineering, University of Ulsan, 680-749 San 29, Muger 2-dong, Ulsan, Korea
{hdkang,jkh2005}@islab.ulsan.ac.kr
Abstract. This paper describes the 3D reconstruction of planar objects using parallel lines from a single panoramic image. The determination of non-vertical lines depends on the positions of the vanishing points of pairs of lines in the panoramic image. A vertical 3D line is projected as a radial line in the panoramic image, and a horizontal line is projected as a curve or arc. Two parallel vertical lines converge at the center point in a calibrated panoramic image. In contrast, two parallel horizontal lines have a pair of vanishing points on the circle at infinity in the panoramic image. We reconstruct planar objects with parallel lines using the vanishing points and the parallelism properties of lines. Finally, we analyze and present the results of 3D reconstruction of planar objects from synthetic and real images.
1 Introduction

This paper describes the 3D reconstruction of objects from a single panoramic image. In general, we need two different views to acquire the 3D information of objects. We calculate the 3D information of objects using the geometric constraints of the camera and the curved mirror of the catadioptric imaging system. Following previous work on structure from motion with omnidirectional vision systems, geometric information has been reconstructed using two omnidirectional cameras or a single camera in a motion stereo approach [3,8,10]. Previous work on acquiring geometric information from a single panoramic image has used systems with a conical mirror and a camera; one of the properties of the conical mirror is that it is non-SVP (it violates the single-viewpoint constraint). Brassart measured the location of a robot and geometric information of features with the SYCLOP (Conical SYstem for LOcalization and Perception) system [6]. In particular, Pinciroli explained how the conical mirror and the non-SVP constraint are used in 3D line reconstruction [15]. As described by Pinciroli, a conical mirror system is good for reconstructing the spatial information of an environment. However, the image is frequently blurred due to the non-SVP property, and the vertical field of view is limited by the mirror's shape. A method for line reconstruction from single panoramic images has been presented, using prior information about parallelism and coplanarity [5]. In this paper, we present
the conditions under which straight lines in 3D space can be reconstructed from a single panoramic image using vanishing points and line properties such as parallelism and coplanarity [15]. In Section 2, we describe the camera model with a hyperboloidal mirror, which consists of a curved mirror combined with a conventional camera. A horizontal line is converted to an arc (curve) with a pair of vanishing points on the circle at infinity by the polar-coordinate transformation. In Section 3, we explain the estimation of the trajectories of features and the vanishing points. Given part of a vertical or horizontal line, we also discuss how to calculate the point where the extended line intersects the xy-plane and how to obtain the plane parallel to the xy-plane. In the experiments, we test the proposed method in synthetic and real environments. Finally, we discuss the measurement error based on the experimental data.
2 Geometric Model of Vertical Line Segments in Catadioptric Camera

Consider the triangle formed by the mirror focal point $F'$ and the 3D points $P, H, Q$, together with the figure formed by the mirror focal point $F'$ and the mirror surface points $P_m, H_m, Q_m$, as shown in Fig. 1. The lengths of the line segments $\overrightarrow{F'P_m}$ and $\overrightarrow{F'H_m}$ are calculated from the camera model [14]. Then the angle $\varphi$ is calculated from the inner product of the two vectors $\overrightarrow{F'P_m}$ and $\overrightarrow{F'H_m}$; in the same way, $\theta$ is calculated from the inner product of the two vectors $\overrightarrow{F'H_m}$ and $\overrightarrow{F'Q_m}$:

$$\varphi = \cos^{-1}\frac{\overrightarrow{F'P_m}\cdot\overrightarrow{F'H_m}}{|\overrightarrow{F'P_m}|\,|\overrightarrow{F'H_m}|}, \qquad \theta = \cos^{-1}\frac{\overrightarrow{F'H_m}\cdot\overrightarrow{F'Q_m}}{|\overrightarrow{F'H_m}|\,|\overrightarrow{F'Q_m}|}. \qquad (1)$$
Finally, we calculate the distance $R$ between the camera and the vertical line segment on the xy-plane and the height $Z_H$ of the vertical line segment, using the angles $\varphi, \theta$ and the distance $2e$ between the camera and mirror focal points:

$$R = 2e\cot\theta, \qquad Z_H = R\tan\varphi + 2e. \qquad (2)$$
Therefore, the 3D information of vertically located features in the environment can be recovered.
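A direct transcription of equation (2), assuming numpy and angles in radians (the function name is ours):

```python
import numpy as np

def vertical_segment_3d(phi, theta, e):
    """Equation (2): distance R on the xy-plane and height Z_H of a
    vertical line segment, from angles phi, theta (radians) and the
    focal-point separation 2e."""
    R = 2.0 * e / np.tan(theta)        # R = 2e cot(theta)
    Z_H = R * np.tan(phi) + 2.0 * e
    return R, Z_H
```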
3 The Estimation of Horizontal Lines and Vanishing Points

For the 3D localization of features, we have to know the intersection points between the ground plane and the vertical line segments on the wall. It is difficult to extract these intersection points because of occlusions or because only a partial arc of the horizontal curve appears in the image. Since we use a hyperboloidal mirror, we know that the curve lies on a circle. The shape of the corridor resembles a pseudo-sinusoidal function in the panoramic image. If we regard the curve of the corridor shape as a pseudo-sinusoidal function, we can derive the intersection points. The synthetic and panoramic images are shown in Fig. 2. The created corridor has a T-type junction and therefore three vanishing points. We calculate the
vanishing point located on the circle at infinity and calculate the 3D localization of the features in the image. The trajectory of the corridor shape is shown in Fig. 2; the arcs resemble pseudo-sinusoidal functions with different periods. For the estimation of the pseudo-sinusoidal function, we need to transform between Cartesian and polar coordinates. The arc is part of a circle in the omnidirectional image and is transformed into a pseudo-sinusoidal function in the panoramic image. Consider a set of feature points $p_i = (x_i, y_i)^T$, $i = 1, 2, \ldots, N$, on a circle:

$$(x_i - a)^2 + (y_i - b)^2 = \rho^2. \qquad (3)$$
The pseudo-sinusoidal function is

$$r_i = (a\cos\theta_i + b\sin\theta_i) \pm \sqrt{\rho^2 - (a^2 + b^2) + (a\cos\theta_i + b\sin\theta_i)^2}. \qquad (4)$$
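Equation (4) follows from substituting $x = r\cos\theta$, $y = r\sin\theta$ into (3) and solving the resulting quadratic in $r$. A small numpy sketch evaluating it (the names are ours; angles in radians):

```python
import numpy as np

def circle_to_pseudo_sinusoid(a, b, rho, thetas):
    """Equation (4): polar radius r(theta) of the circle
    (x - a)^2 + (y - b)^2 = rho^2, i.e. the pseudo-sinusoidal
    trajectory seen in the panoramic image."""
    c = a * np.cos(thetas) + b * np.sin(thetas)
    disc = rho ** 2 - (a ** 2 + b ** 2) + c ** 2
    disc = np.where(disc < 0, np.nan, disc)  # angle does not intersect circle
    return c + np.sqrt(disc), c - np.sqrt(disc)
```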
Fig. 1. Geometry of camera and feature points of vertical line
Fig. 2. Synthetic corridor (T-junction): (a) omnidirectional image with feature points that follow circular trajectories; (b) panoramic image and the pseudo-sinusoidal trajectories of the transformed points
4 Experiments

To illustrate and verify the method, experiments with synthetic and real images and a hyperbolic mirror are presented. We extract corner points or vertical edges as features. In order to obtain the vertical segments, the omnidirectional image is transformed to a panoramic image. The width of the transformed image represents the azimuth and elevation of the features around the robot, and vertical line segments are extracted with respect to angle. The extraction of vertical line segments as features is described in [9].
Fig. 3. The reconstruction of synthetic corridor: (a) arbitrary viewpoint, (b) xy plane, (c) xz plane, (d) yz plane
Fig. 4. Estimation of curve and vanishing points: (a) Input image with provided features, (b) Estimated curves. (c) The results of curves as pseudo-sinusoidal functions with different periods
4.1 Estimation of Curves Using Geometrical Constraints
We extract the curves of the provided features in the image, i.e., the distorted horizontal lines. The purpose of the preliminary experiments is to analyze where the intersection points are located on the estimated curve with the vanishing points. The estimated curve is regarded as a pseudo-sinusoidal function, and its intersection points are regarded as the vanishing points of the corridor. Geometrical constraints are properties such as coplanarity, perpendicularity, and parallelism of lines [15]. The results of the feature reconstruction are shown in Figs. 3 and 5.
Fig. 5. Result of reconstruction of features (up to scale)
5 Conclusion

We proposed a method to acquire the 3D geometric information of features using the geometric properties of the mirror and the circle at infinity in a single omnidirectional image. The location of features for 3D reconstruction is usually calculated using a motion stereo method or at least two images from different viewpoints. Alternatively, the spatial information of features located vertically with respect to the ground plane can be obtained using the constraints of planes and lines and the geometric configuration of the camera and mirror in an omnidirectional vision system. These methods have the merit that both the 3D spatial information of features and the location of the robot during navigation can be determined, because the 3D information of features is obtained from a single image. We used a hyperboloidal mirror with high curvature in order to view features located high in the corridor, and we tested the proposed method. We are now analyzing how to fit and estimate the curves of feature points in the image. We should also experiment with and analyze the estimation of curves by fitting pseudo-sinusoidal functions in real environments.
Acknowledgement

This work was originally motivated and supported in part by the Research Fund of the University of Ulsan. We would also like to thank the Ministry of Commerce, Industry and Energy and Ulsan Metropolitan City, which partly supported this research through the Network-based Automation Research Center (NARC) at the University of Ulsan.
References

1. Yamazawa, K., Yagi, Y., Yachida, M.: Omnidirectional Imaging with Hyperboloidal Projections. Proc. IROS (1993)
2. Baker, S., Nayar, S.K.: A Theory of Catadioptric Image Formation. Int. Conf. Computer Vision (1998) 35-42
3. Gluckman, J., Nayar, S.: Ego-motion and Omnidirectional Cameras. Int. Conf. Computer Vision (1998) 999-1005
4. Criminisi, A., Reid, I., Zisserman, A.: Single View Metrology. Proc. of the 7th Int. Conf. on Computer Vision (1999)
5. Sturm, P.: A Method for 3D Reconstruction of Piecewise Planar Objects from Single Panoramic Images. Proc. IEEE Workshop OMNIVIS, USA (2000)
6. Brassart, E., Delahoche, L., Cauchois, C., Drocourt, C., Pegard, C., Mouaddib, E.M.: Experimental Results Got with the Omnidirectional Vision Sensor: SYCLOP. Proc. IEEE Workshop OMNIVIS (2000) 145-152
7. Schaffalitzky, F., Zisserman, A.: Planar Grouping for Automatic Detection of Vanishing Lines and Points. Int'l Journal of Image and Vision Computing. Vol. 18 (2000) 647-658
8. Zhu, Z.: Omnidirectional Stereo Vision. Workshop on Omnidirectional Vision Applied to Robotic Orientation and Nondestructive Testing (NDT), The 10th IEEE Int'l Conf. on Advanced Robotics, Budapest, Hungary (invited talk) (2001)
9. Kang, H.D., Jo, K.H.: Self-localization of Autonomous Mobile Robot from the Multiple Candidates of Landmarks. Int. Conf. on Optomechatronic Systems III. Vol. 4092. Germany (2002) 428-435
10. Svoboda, T., Pajdla, T.: Epipolar Geometry for Central Catadioptric Cameras. Int. J. Computer Vision. Vol. 49. No. 1 (2002) 23-37
11. Cauchois, C., Brassart, E., Delahoche, L., Clerentin, A.: 3D Localization with Conical Vision. Proc. IEEE Workshop OMNIVIS (2003)
12. Ying, X., Hu, Z.: Catadioptric Line Feature Detection Using Hough Transform. Int. Conf. Pattern Recognition. Vol. 4 (2004) 839-842
13. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. 2nd Edition. Cambridge University Press (2004)
14. Caglioti, V., Gasparini, S.: On the Localization of Straight Lines in 3D Space from Single 2D Images. Proc. Conf. Computer Vision and Pattern Recognition. Vol. 1. USA (2005) 1129-1134
15. Pinciroli, C., Bonarini, A., Matteucci, M.: Robust Detection of 3D Scene Horizontal and Vertical Lines in Conical Catadioptric Sensors. Proc. IEEE Workshop OMNIVIS. China (2005)
Region-Based Fuzzy Shock Filter with Anisotropic Diffusion for Adaptive Image Enhancement∗

Shujun Fu¹,², Qiuqi Ruan², Wenqia Wang¹, and Jingnian Chen³

¹ School of Mathematics and System Sciences, Shandong University, Jinan, 250100, China
² Institute of Information Science, Beijing Jiaotong University, Beijing, 100044, China
³ School of Arts and Science, Shandong University of Finance, Jinan, 250014, China
[email protected]
Abstract. A region-based fuzzy shock filter with anisotropic diffusion is presented for image noise removal and edge sharpening. An image is divided into three different types of regions according to image features. For the different regions, a binary shock-type backward diffusion or a fuzzy backward diffusion is performed in the gradient direction, normal to the isophote line, together with a forward diffusion in the tangent direction. Gaussian smoothing of the second normal derivative makes the process robust against noise. Experiments on real images show that this method produces better visual results for the enhanced images than some related equations.
1 Introduction

Image enhancement and sharpening are important operations in image processing and computer vision. Many different methods have been put forth in the past [1]. However, a major drawback of these methods is that they also enhance noise in the image, and ringing artifacts may occur along both sides of an edge. More importantly, traditional image sharpening methods mainly increase the gray-level difference across an edge, while its width remains unchanged. For a wide and blurry edge, simply increasing its contrast produces only a very limited effect.

As an extension of conventional (crisp) set theory, L.A. Zadeh put forward fuzzy set theory to model vagueness and ambiguity in complex systems; it is a useful tool for handling the uncertainty associated with vagueness and/or imprecision. Images and their processing are fuzzy in nature. Therefore, fuzzy set theory has been successfully applied to image processing and computer vision [2].

In the past decades there has been a growing amount of research concerning partial differential equations in image enhancement, such as anisotropic diffusion filters [3-6] for edge-preserving noise removal and shock filters [7-9] for edge sharpening. Combining anisotropic diffusion with a shock filter, we present a region-based fuzzy shock filter with anisotropic diffusion to remove image noise and to sharpen edges by reducing their width simultaneously.∗
∗ This work is supported by the National Natural Science Foundation of China (No. 60472033), the Key Laboratory Project of Information Science & Engineering of Railway of the National Ministry of Railways, China (No. TDXX0510), and the Technological Innovation Fund for Excellent Doctoral Candidates of Beijing Jiaotong University, China (No. 48007).
2 Region-Based Fuzzy Shock Filter with Anisotropic Diffusion

2.1 Some Related Work

One of the most influential works on using partial differential equations (PDEs) in image processing is the anisotropic diffusion (AD) filter, which was proposed by P. Perona and J. Malik [4] for image denoising, enhancement, etc. Let $(x, y) \in \Omega \subset R^2$ and $t \in [0, +\infty)$; a multiscale image $u(x, y, t): \Omega \times [0, +\infty) \to R$ is evolved according to the following equation:

$$\frac{\partial u(x,y,t)}{\partial t} = \operatorname{div}\big(g(|\nabla u(x,y,t)|)\,\nabla u(x,y,t)\big), \qquad g(|\nabla u|) = \frac{1}{1 + (|\nabla u|/K)^2}. \qquad (1)$$
where K is a gradient threshold. The scalar diffusivity $g(|\nabla u|)$, chosen as a non-increasing function, governs the behaviour of the diffusion process. Performing a backward diffusion for $|\nabla u| > K$ along $N$, this equation can sharpen the edge. Departing from the nonlinear parabolic diffusion process, L. Alvarez and L. Mazorra [7] proposed an anisotropic diffusion with shock filter (ADSF) equation by adding a hyperbolic equation, called a shock filter, which was introduced by S.J. Osher and L.I. Rudin [8], for noise elimination and edge sharpening:

$$\frac{\partial u}{\partial t} = -\operatorname{sign}(G_\sigma * u_{NN})\operatorname{sign}(G_\sigma * u_N)\,|\nabla u| + c\,u_{TT}. \qquad (2)$$
where $G_\sigma$ is a Gaussian function with standard deviation $\sigma$, and c is a positive constant. A more advanced scheme was proposed by P. Kornprobst et al. [9], which combines image coupling, restoration and enhancement (CRE) in the following equation:

$$\frac{\partial u}{\partial t} = -a_f(u - u_0) + a_r(h_\tau u_{NN} + u_{TT}) - a_e(1 - h_\tau)\operatorname{sign}(G_\sigma * u_{NN})\,|\nabla u|. \qquad (3)$$
where $a_f$, $a_r$ and $a_e$ are constants, $u_0$ is the original image, and $h_\tau = h_\tau(|G_\sigma * u_N|) = 1$ if $|G_\sigma * u_N| < \tau$, and 0 elsewhere. The first term on the right is a fidelity term that provides a stabilization effect.

In order to reinforce robustness against noise, G. Gilboa et al. [10] generalized the real-valued diffusion to the complex domain by incorporating the free Schrödinger equation. They utilized the imaginary part to approximate the smoothed second derivative when the complex diffusion coefficient approaches the real axis, and proposed an interesting complex diffusion process (CDP):

$$\frac{\partial u}{\partial t} = -\frac{2}{\pi}\arctan\Big(a\,\operatorname{Im}\big(\tfrac{u}{\theta}\big)\Big)\,|\nabla u| + \lambda u_{NN} + \tilde{\lambda} u_{TT}. \qquad (4)$$

where Im(x) is the imaginary part of a complex variable x, $\lambda = re^{i\theta}$ is a complex scalar, $\theta$ is a small angle, $\tilde{\lambda}$ is a real scalar, and a is a parameter that controls the sharpness of the slope near zero.
2.2 Region-Based Fuzzy Shock Filter with Anisotropic Diffusion
In equations (2) and (3), however, enhancing an image using the sign function sign(x) is a binary decision process. This is a hard partition with no intermediate transition. Unfortunately, the result obtained is a false piecewise constant image with poor visual quality in some areas. In Fig. 1, zoomed parts of the results of applying the binary shock filter to blurry images, the Lena and the Peppers, are shown. One can clearly see unnatural illumination transitions and annoying artifacts produced in the image enhancement process.
Fig. 1. Zoomed part of results by the binary shock filter: left, the Lena; right, the Peppers
Fuzzy set theory captures the fuzziness of the information that humans receive from nature. Fuzzy techniques are powerful tools for knowledge representation and processing, and they can manage vagueness and ambiguity efficiently. In image processing applications, many difficulties arise because the data, tasks, and results are uncertain [2]. Denote the fuzzy set S on the region R as:
$$S = \int_{x \in R} \frac{\mu_S(x)}{x} \qquad (5)$$
where $\mu_S(x) \in [0, 1]$ is called the membership function of S on R. Chen et al. [11] extended this set further to the generalized fuzzy set, denoting a generalized membership function (GMF) $\mu_S(x) \in [-1, 1]$ to substitute for $\mu_S(x) \in [0, 1]$.

An image comprises regions with different features, such as edges, textures and details, and flat areas, which should be treated differently to obtain a better result in an image processing task. We divide an image into three types of regions by its smoothed gradient magnitude: big gradients (such as boundaries between different objects), medium gradients (such as textures and details), and small gradients (such as smoother segments inside different areas). In our algorithm, for edges between different objects, a shock-type backward diffusion is performed in the gradient direction, normal to the isophote line (edge), together with a forward diffusion in the isophote line direction. For textures and details, shock filters with the sign function enhance image features in a binary decision process, which unfortunately produces a false piecewise constant result. We notice that the variation of textures and details is fuzzy in these areas. In
order to approach this variation, we extend the binary decision to a fuzzy one, substituting for sign(x) a hyperbolic tangent membership function th(x), which guarantees a naturally smooth transition in these areas by softly controlling the changes of the gray levels of the image. As a result, a fuzzy shock-type backward diffusion is introduced to enhance these features while preserving a natural transition in these areas. The normal derivative of the smoothed image is used to detect image features. Finally, an isotropic diffusion is used to smooth flat areas. Thus, combining a shock filter with anisotropic diffusion, we develop a region-based fuzzy shock filter with anisotropic diffusion (RFSFAD) to reduce noise and to sharpen edges while enhancing image features simultaneously:
$$\begin{cases} v = G_\sigma * u \\[4pt] \dfrac{\partial u}{\partial t} = c_N u_{NN} + c_T u_{TT} - w(v_N)\operatorname{sign}(v_{NN})\,u_N \end{cases} \qquad (6)$$
with Neumann boundary condition, where the parameters are chosen as follows according to the different image regions:

region                    $c_N$   $c_T$                      $w(v_N)$
$v_N > T_1$               0       $1/(1 + l_1 u_{TT}^2)$     1
$T_2 < v_N \le T_1$       0       $1/(1 + l_1 u_{TT}^2)$     $\operatorname{th}(l_2 v_{NN})$
else                      1       1                          0
$c_N$ and $c_T$ are the normal and tangent flow control coefficients, respectively. The tangent flow control coefficient is used to prevent over-smoothing of smaller details; $l_2$ is a parameter that controls the gradient of the membership function th(x); $T_1$ and $T_2$ are the two thresholds that divide the image into the three different types of regions; $l_1$ and $l_2$ are constants.
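The sketch below shows one explicit time step of equation (6) with the region-dependent parameters of the table, assuming numpy and scipy. The finite-difference approximations of $u_N$, $u_{NN}$, $u_{TT}$ and the threshold values are our assumptions; this is not the authors' MS-limiter scheme of Section 3.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def directional_derivs(f):
    """Gradient magnitude and second derivatives of f along the normal (N)
    and tangent (T) directions, from central differences."""
    fx, fy = np.gradient(f)
    fxx = np.gradient(fx, axis=0)
    fyy = np.gradient(fy, axis=1)
    fxy = np.gradient(fx, axis=1)
    g2 = fx ** 2 + fy ** 2 + 1e-12
    f_NN = (fx ** 2 * fxx + 2 * fx * fy * fxy + fy ** 2 * fyy) / g2
    f_TT = (fy ** 2 * fxx - 2 * fx * fy * fxy + fx ** 2 * fyy) / g2
    return np.sqrt(g2), f_NN, f_TT

def rfsfad_step(u, dt=0.1, sigma=1.0, T1=30.0, T2=10.0, l1=0.01, l2=0.1):
    """One explicit update of equation (6); thresholds and constants
    are placeholders, not the paper's values."""
    v = gaussian_filter(u, sigma)
    u_N, u_NN, u_TT = directional_derivs(u)
    v_N, v_NN, _ = directional_derivs(v)
    edge_or_tex = v_N > T2
    cN = np.where(edge_or_tex, 0.0, 1.0)            # no normal smoothing on features
    cT = np.where(edge_or_tex, 1.0 / (1.0 + l1 * u_TT ** 2), 1.0)
    w = np.where(v_N > T1, 1.0,                     # strong edges: binary shock
                 np.where(edge_or_tex, np.tanh(l2 * v_NN), 0.0))  # textures: fuzzy
    return u + dt * (cN * u_NN + cT * u_TT - w * np.sign(v_NN) * u_N)
```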
3 Numerical Implementation and Experiments

We develop a fast shock-capturing scheme using the MS limiter function [12], and present results obtained with our scheme (6), comparing its performance with the related methods above. First, we compare the performance of the related methods on the blurred Peppers image (Gaussian blur, σ = 2.2) with added medium-level noise (SNR = 21 dB). In Fig. 2, locally enlarged results are shown. As can be seen, although AD denoises the image well, especially in the smoother segments, it produces a blurry image with unsharp edges; its ability to sharpen edges is limited because of its poor sharpening process with an improper diffusion coefficient along the gradient direction. Moreover, with a diffusion coefficient inversely proportional to the image gradient magnitude along the tangent direction, it does not diffuse fully in this direction and presents rough contours. ADSF and CRE sharpen edges very well, but through a binary decision process they yield false piecewise constant images, which look unnatural
Fig. 2. Zoomed parts of above results (from top-left to bottom-right): a noisy blurry image, results by AD, ADSF, CRE, CDP and RFSFAD respectively
with a discontinuous transition in the homogeneous areas. Further, ADSF cannot reduce noise well with only a single directional diffusion in the smoother regions. Performing a complex diffusion process, CDP presents a relatively good result. But on edges with big gradient magnitude between different objects, because the diffusion process is weighted by arctan(x), the sharpness of its result is somewhat lower than that obtained with sign(x). Moreover, it should be pointed out that image enhancement by complex computation is more time-consuming than by real computation.
4 Conclusions

This paper deals with image enhancement for noisy blurry images. By reducing the width of edges, a region-based fuzzy shock filter with an anisotropic diffusion process is proposed to remove noise and to sharpen edges. Our model performs a powerful process on noisy blurry images, by which we not only can remove noise and sharpen edges effectively, but also can smooth image contours even in the presence of high-level noise. Enhancing image features such as edges, textures, and details with a natural transition in interior areas, this method produces better visual quality than some related equations.
References

1. Castleman, K.R. (ed.): Digital Image Processing. Prentice Hall (1995)
2. Hamid, R.T.: Fuzzy Image Processing: Introduction in Theory and Applications. Springer-Verlag (1997)
3. Aubert, G., Kornprobst, P.: Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations. Springer-Verlag, Applied Mathematical Sciences, Vol. 147 (2001) 125-164
4. Perona, P., Malik, J.: Scale-space and Edge Detection Using Anisotropic Diffusion. IEEE Trans. Pattern Anal. Machine Intell. Vol. 12 (1990) 629-639
5. Nitzberg, M., Shiota, T.: Nonlinear Image Filtering with Edge and Corner Enhancement. IEEE Transactions on PAMI, Vol. 14 (1992) 826-833
6. You, Y.L., Xu, W., Tannenbaum, A., Kaveh, M.: Behavioral Analysis of Anisotropic Diffusion in Image Processing. IEEE Transactions on Image Processing, Vol. 5 (1996) 1539-1553
7. Alvarez, L., Mazorra, L.: Signal and Image Restoration Using Shock Filters and Anisotropic Diffusion. SIAM J. Numer. Anal. Vol. 31 (1994) 590-605
8. Osher, S.J., Rudin, L.I.: Feature-oriented Image Enhancement Using Shock Filters. SIAM J. Numer. Anal. 27 (1990) 919-940
9. Kornprobst, P., Deriche, R., Aubert, G.: Image Coupling, Restoration and Enhancement via PDEs. IEEE ICIP, 2 (1997) 458-461
10. Gilboa, G., Sochen, N., Zeevi, Y.Y.: Image Enhancement and Denoising by Complex Diffusion Processes. IEEE Transactions on PAMI, 26(8) (2004) 1020-1036
11. Chen, W.F., Lu, X.Q., Chen, J.J., Wu, G.X.: A New Algorithm of Edge Detection for Color Image: Generalized Fuzzy Operator. Science in China (Series A), 38(10) (1995) 1272-1280
12. Liu, R.X., Shu, Q.W.: Some New Methods in Computing Fluid Dynamics. Science Press of China, Beijing (2004)
Robust Feature Detection Using 2D Wavelet Transform Under Low Light Environment

Jihoon Lee¹, Youngouk Kim¹,², Changwoo Park², Changhan Park¹, and Joonki Paik¹

¹ Image Processing and Intelligent Systems Laboratory, Department of Image Engineering, Graduate School of Advanced Imaging Science, Multimedia, and Film, Chung-Ang University
² Precision Machinery Center, Korea Electronics Technology Institute, 401-402 B/D 193, Yakdae-Dong, WonMi-Gu, Puchon-Si, KyungGi-Do 420-140, Korea
[email protected]
Abstract. A novel local feature detection method is presented for a mobile robot's visual simultaneous localization and map building (v-SLAM). Camera-based visual localization can handle complicated problems, such as kidnapping and shadowing, that arise with other types of sensors. A fundamental requirement for robust self-localization is robust key-point extraction under affine transforms and illumination changes. In particular, localization under a low light environment is crucial for guidance and navigation. This paper presents an efficient local feature extraction method for low light environments. A more efficient local feature detector and a scheme for compensating the noise caused by low-contrast images are proposed. The proposed scene recognition method is robust against scale, rotation, and noise in the local feature space. We adopt the framework of the scale-invariant feature transform (SIFT), in which the difference-of-Gaussian (DoG) based scale-invariant feature detection module is replaced by the difference of wavelets (DoW).
1 Introduction

SLAM requires multi-modal sensors, such as ultrasound sensors, range sensors, infrared (IR) sensors, encoders (odometers), and multiple visual sensors. Recognition-based localization is considered the most promising method for image-based SLAM [1,2]. IR-LED cameras have recently been used to deal with such complicated conditions. Map building becomes more prone to illumination change and affine variation when the robot is moving randomly. The most popular solution for robust recognition is the scale-invariant feature transform (SIFT) approach, which transforms an input image into a large collection of local feature vectors, each of which is invariant to image translation, scaling, and rotation [3]. The feature vector is partially invariant to illumination changes and affine (or three-dimensional) projection. Such local descriptor-based approaches are generally robust against occlusion and scale variance. In spite of many promising factors, SIFT has many parameters to be controlled, and it requires the optimum Gaussian pyramid for acceptable performance. Intensity-based local feature extraction methods cannot avoid estimation error because of low light-level
noise. Corner detection [5] and local descriptor-based [2] methods fall into this category. An alternative approach is moment-based invariant feature extraction, which is robust against both geometric and photometric changes [9,11]. This approach is usually effective for still image recognition. While a robot is moving, however, the moment-based method frequently encounters non-planar objects and can hardly extract invariant regions under illumination change. This paper presents a real-time local keypoint extraction method in the two-dimensional wavelet transform domain. The proposed method is robust against illumination change and low light-level noise, and free from the manual adjustment of many parameters. The paper is organized as follows. In Section 2, a noise adaptive spatio-temporal filter (NAST) is proposed to remove low light-level noise as a preprocessing step [6]. Section 3 describes the proposed real-time local feature extraction method in the wavelet transform domain. Section 4 summarizes various experimental results comparing the DoW and SIFT methods, and Section 5 concludes the paper.
2 Noise Adaptive Spatio-temporal Filter

The proposed NAST algorithm adaptively processes the acquired image to remove low light level noise. Depending on the statistics of the image, the information of neighboring pixels, and motion, the NAST algorithm selects a proper filtering algorithm for each type of noise. A conceptual flowchart of the proposed algorithm is illustrated in Fig. 1. The proposed NAST algorithm has four different operations which are applied to the low light images.
Fig. 1. Conceptual flowchart of the proposed algorithm
2.1 Noise Detection Algorithm

The output of the noise detection block determines the operation of the filtering blocks. The proposed spatial hybrid filter (SHF) can be represented as

$$y(i,j) = n(i,j)\,\hat{x}(i,j) + \big(1 - n(i,j)\big)\,x(i,j) \qquad (1)$$
where $\hat{x}(i,j)$ represents a pixel filtered by the SHF, and $n(i,j)$, the result of the noise detection process, takes the value 1 at the positions of photon counting noise (PCN)
pixels and 0 elsewhere. In equation (1), $x(i,j)$ and $y(i,j)$ denote the (i,j)-th pixels in the noisy and filtered images, respectively. In the proposed noise detection scheme, $n(i,j)$ forms a binary noise map denoted by N, which is used to filter out uncorrelated noise and to indicate the reference points for the subsequent filtering of correlated noise.

2.2 Filtering Mechanism of SHF
If the central pixel in the window (W) is considered to be noise (i.e., $n(i,j) = 1$ in the noise map N), it is substituted by the median value of the window, as in a normal median filter. The noise cancellation scheme in the SHF is then extended to the correlated pixels in the local neighborhood ($x(i,j)$ where $n(i,j) \ne 1$ and at least one $n(k,l) = 1$ in W). In order to identify the correlated noise, the de-noised pixel value $x'(i,j)$ can be defined as

$$x'(i,j) = \frac{\sigma^2(i,j)\,x(i,j) + \bar{x}^2(i,j)}{\sigma^2(i,j) + \bar{x}^2(i,j)}, \qquad (2)$$

where $\bar{x}(i,j)$ and $\sigma^2(i,j)$ represent the mean and variance of the window W, respectively.
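A rough sketch of the SHF, assuming numpy (window handling and names are ours): PCN pixels from the noise map take the window median per equation (1), and their correlated neighbours are re-estimated with equation (2).

```python
import numpy as np

def shf(x, noise_map, win=3):
    """Spatial hybrid filter: PCN pixels (noise_map == 1) take the local
    median (equation (1)); pixels adjacent to them are re-estimated from
    local statistics (equation (2))."""
    pad = win // 2
    xp = np.pad(x.astype(float), pad, mode='reflect')
    y = x.astype(float)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            wnd = xp[i:i + win, j:j + win]
            if noise_map[i, j]:
                y[i, j] = np.median(wnd)
            elif noise_map[max(i - pad, 0):i + pad + 1,
                           max(j - pad, 0):j + pad + 1].any():
                m, s2 = wnd.mean(), wnd.var()
                y[i, j] = (s2 * x[i, j] + m ** 2) / (s2 + m ** 2 + 1e-12)
    return y
```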
2.3 Statistical Domain Temporal Filter (SDTF) for False Color Noise (FCN) Detection and Filtering
We use a new SDTF for removing FCN. The sum of absolute differences (SAD) between the working windows of consecutive frames is used for motion detection to avoid motion blur due to temporal averaging. Let $\hat{x}(i,j,t)$ and $\hat{x}(i,j,t-1)$ denote the intensity values at the (i,j)-th pixel of the spatially filtered frames at times t and t-1, respectively. The proposed temporal filter can then be realized as

$$y(i,j,t) = \begin{cases} \hat{x}(i,j,t-1), & S_T(i,j,t) > S_T(i,j,t-1) \\ \hat{x}(i,j,t), & S_T(i,j,t) \le S_T(i,j,t-1) \end{cases} \qquad (3)$$
where $y(i,j,t)$ represents the final result of the proposed NAST and $S_T$ is the local statistic defined as

$$S_T(i,j,t) = \big(x(i,j,t) - \bar{x}(i,j,t)\big)^2 - \sigma^2(i,j,t) \qquad (4)$$
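A compact sketch of the SDTF decision rule of equations (3)-(4), assuming numpy and scipy with float-valued frames; the motion/SAD gating mentioned above is omitted for brevity.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_statistic(x, win=3):
    """Equation (4): S_T = (x - local mean)^2 - local variance."""
    m = uniform_filter(x, win)
    var = uniform_filter(x * x, win) - m * m
    return (x - m) ** 2 - var

def sdtf(x_prev, x_curr, win=3):
    """Equation (3): keep the previous spatially filtered pixel where the
    local statistic has grown (suspected FCN), else keep the current one."""
    keep_prev = local_statistic(x_curr, win) > local_statistic(x_prev, win)
    return np.where(keep_prev, x_prev, x_curr)
```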
3 A New Method for Local Feature Detection Using 2D Discrete Wavelet Transform

In this section, the 2D discrete wavelet transform is briefly described as theoretical background [7]. Based on the theory and implementation of the 2D discrete wavelet transform, the DoW-based local extrema detection method is presented.
3.1 Characteristics of 2D Wavelet Transform
Human visual characteristics are widely used in image processing. One example is the use of the Laplacian pyramid for image coding. SIFT falls into the category of methods that use a Laplacian pyramid for scale-invariant feature extraction [3]. On the other hand, the wavelet transform is a multiresolution transform that repeatedly decomposes the input signal into lowpass and highpass components, like subband coding [7,8]. A wavelet-based scale-invariant feature extraction method does not increase the number of samples relative to the original image, which is the case for the Gaussian pyramid-based SIFT method. The wavelet transform can easily reflect the human visual system through multiresolution analysis using orthogonal bases [12]. Because the wavelet-based method does not increase the number of samples, computational redundancy is greatly reduced, and its implementation is suitable for parallel processing.

3.2 Difference of Wavelet in the Scale Space
The most popular wavelet functions include the Daubechies [7] and biorthogonal wavelets [10]. Although Daubechies designed a perfect-reconstruction wavelet filter, it is not symmetric. In general image processing applications a symmetric biorthogonal filter is particularly suitable [10], but we used the Daubechies coefficient set {DB2, DB10, DB18, DB26, DB34, DB42} purely for efficient feature extraction.
Fig. 2. Structure of Difference of Wavelet
A. Parameter Decision for Wavelet Pyramid. In order to construct the wavelet pyramid, we decide the number of Daubechies coefficients and approximation levels, which can be considered the counterpart of the DoG-based scale expansion. Fig. 3 shows that six scales (wavelet coefficient sets) provide the optimum local keypoints, and Fig. 4 shows that approximation level 3 is the most efficient for matching. Although larger coefficient sets have better decomposition ability, we used DB2 as the first filter and increased the filter length in steps of 8. Because all DB filters have even-numbered supports, the difference between the supports of adjacent DB filters is recommended to be at least 4 for easy alignment. In this work we used a difference of 8, because a difference of 4 produces almost identical filtered images. Table 1 summarizes experimental results of processing time and matching rate using different wavelet filters in the SIFT framework. The coefficient set in the first row provides the best keypoint extraction result with significantly reduced computational overhead. The combination given in the second row is the best in the sense of matching time and rate.
Fig. 3. The number of extracted keypoints versus the number of scales (wavelet coefficients)

Fig. 4. Matching rate (%) versus the number of approximation levels
Table 1. Various coefficient sets of Daubechies coefficients in the SIFT framework for measuring processing time and matching rate under low light (0.05 lux) conditions

Coefficient set                                        Processing time (msec)   Matching rate (%)
DB2, DB6, DB10, DB14, DB18, DB22                       121                      34.72
DB2, DB10, DB18, DB26, DB34, DB42                      130                      71.92
DB2, DB14, DB26, DB38, DB50, DB62                      173                      72.37
DB2, DB18, DB34, DB50, DB68, DB86                      213                      72.87
SIFT [4] (σ = 1.6, k = √2, 1D Gaussian kernel          925                      57.68
size = 11; images per octave = 6, number of
octaves = 3)
B. Wavelet-like Subband Transform. As shown in Fig. 2, the proposed wavelet pyramid is constructed using six Daubechies coefficient sets with three approximation levels. Because the length of each filter is an even number, we need an appropriate alignment method for matching different scales, as shown in Fig. 5, where DB10 is used for 320 × 240 input images.
Fig. 5. Proposed alignment method for different approximation levels
3.3 Local Extrema Detection and Local Image Descriptors
In the previous subsection we described the detailed construction method for the wavelet pyramid and the DoW. In the keypoint extraction step, we use min-max extrema [4], taking into account the alignment of the asymmetrically filtered scales.
Fig. 6. Maxima and minima of the difference-of-wavelet images are detected by comparing a pixel (marked with X) to its 26 neighbors in 3 × 3 regions at the current and adjacent scales (marked with circles)
In order to extract scale-invariant feature points, we compute the DoW in the scale space and locate the pixels that are minima or maxima among their 8 neighbors in the same scale and the 18 pixels in the upper- and lower-scale images. Such extrema become scale-invariant features. The DoW-based scale space is constructed as shown in Fig. 2. For each octave of scale space, the initial images are repeatedly convolved with the corresponding wavelet filters to produce the set of scale space images shown on the left. DoW images are shown in the center, and on the right, maxima and minima of the difference-of-wavelet images are detected by comparing a pixel, marked with ×, to its 26 neighbors in three 3 × 3 templates, marked with circles. For the discrete wavelet transform, we used six different sets of Daubechies coefficients to generate a single octave, and make each difference image by using three octaves as
DoW 2 = DB18 _ L1 − DB10 _ L1
DoW 3 = DB 26 _ L1 − DB18 _ L1,
DoW 4 = DB 34 _ L1 − DB 26 _ L1
DoW 5 = DB 42 _ L1 − DB 34 _ L1
(5)
Equation (5) defines how to make a DoW image from two wavelet-transformed images. Feature points obtained by the proposed method are mainly located in the neighborhood of strong edges. The DoW also has a computational advantage over the DoG because many octaves can be generated in parallel.
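The following sketch, assuming PyWavelets, scipy, and numpy, builds one octave of undecimated Daubechies approximations, forms the DoW images of equation (5), and applies the 26-neighbour extremum test of Fig. 6. Note that the paper's "DBn" counts filter taps, while PyWavelets' "dbk" has 2k taps, so DB10 corresponds to 'db5'; the alignment handling of Fig. 5 is omitted here.

```python
import numpy as np
import pywt
from scipy.ndimage import correlate1d

def wavelet_approx(img, n_taps):
    """Level-1 approximation with a Daubechies low-pass filter, applied
    separably and without downsampling so that all scales stay aligned.
    n_taps is the paper's 'DBn'; PyWavelets names the same filter db(n/2)."""
    lo = np.asarray(pywt.Wavelet('db%d' % (n_taps // 2)).dec_lo)
    tmp = correlate1d(img.astype(float), lo, axis=0, mode='reflect')
    return correlate1d(tmp, lo, axis=1, mode='reflect')

def dow_stack(img, taps=(2, 10, 18, 26, 34, 42)):
    """Equation (5): differences between adjacent Daubechies approximations."""
    approx = [wavelet_approx(img, n) for n in taps]
    return [a2 - a1 for a1, a2 in zip(approx, approx[1:])]

def is_extremum(dow, s, i, j):
    """26-neighbour min-max test of Fig. 6 over scales s-1, s, s+1
    (ties count as extrema in this simple version)."""
    val = dow[s][i, j]
    nbhd = np.stack([d[i - 1:i + 2, j - 1:j + 2] for d in dow[s - 1:s + 2]])
    return val == nbhd.max() or val == nbhd.min()
```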
4 Experimental Results

We first enhanced the low light-level image quality using the NAST filter, with the result shown in Fig. 7. A comparison between the DoG-based SIFT and the proposed DoW method is shown in Fig. 8. As shown in Fig. 8, the proposed DoW method outperforms the DoG-based SIFT in terms of both the stability of the extracted keypoints and computational efficiency. Fig. 9 compares the performance of the NAST-combined methods with the DoG-based SIFT algorithm.
Fig. 7. (a) Input low light-level image with significant noise and (b) NAST-filtered image
Fig. 8. Keypoint extraction results: (a) DoG, (b) DoW, and (c, d) translations of (a) and (b), respectively
Fig. 9. Keypoint extraction results under low light-level conditions using (a) DoG, (b) DoG with NAST, and (c) DoW with NAST
Table 2 shows the performance evaluation in terms of processing time, matching rate, and the PSNR in dB obtained by using the pre-filtering algorithms. A low pass filter (LPF) [13] was simulated for comparison with the NAST filter. In order to measure PSNR, we added synthetic noise (20 dB PCN and 15 dB FCN) to the acquired low light images. This work was tested on a personal computer with a 3.0 GHz Pentium processor.
Table 2. Performance evaluation of DoG and DoW with the NAST filter

Type of method               Processing time (msec)   PSNR (dB)   Matching rate (%)
DoG under low light                  925                  -             68.88
NAST + DoG under low light         1,104                39.48           70.98
LPF + DoW under low light            254                37.13           73.69
NAST + DoW under low light           355                39.50           77.24
5 Conclusion

The paper presents a local feature detection method for vSLAM-based self-localization of mobile robots. Extraction of strong feature points enables accurate self-localization under various conditions. We first proposed the NAST pre-processing filter to enhance low light-level input images. The SIFT algorithm was modified by adopting the wavelet transform instead of Gaussian pyramid construction. The wavelet-based pyramid outperformed the original SIFT in terms of processing time and the quality of extracted keypoints. A more efficient local feature detector and a compensation scheme for noise due to low-contrast images are also proposed. The proposed scene recognition method is robust against scale, rotation, and noise in the local feature space.
Acknowledgement

This research was supported by the Korean Ministry of Science and Technology under the National Research Laboratory Project, by the Korean Ministry of Education under the BK21 Project, and by the Seoul Future Content Convergence Cluster established by the Seoul Industry-Academy-Research Cooperation Project.
References
1. Dissanayake, M.W.M.G., et al.: A Solution to the Simultaneous Localization and Map Building (SLAM) Problem. IEEE Trans. (2001) 229-241
2. Lionis, G.S., Kyriakopoulos, K.J.: A Laser Scanner Based Mobile Robot SLAM Algorithm with Improved Convergence Properties. IEEE International Conference, 1 (2002) 582-587
3. Lowe, D.G.: Object Recognition from Local Scale-invariant Features. Proc. of 7th Int'l Conf. on Computer Vision, 2 (1999) 1150-1157
4. Lowe, D.G.: Distinctive Image Features from Scale Invariant Keypoints. Int'l Journal of Computer Vision, 60 (2004) 91-110
5. Zhang, Z., Deriche, R., Faugeras, O., Luong, Q.T.: A Robust Technique for Matching Two Uncalibrated Images through the Recovery of the Unknown Epipolar Geometry. Artificial Intelligence, (1995) 87-119
6. Lee, S., Maik, V., Jang, J., Shin, J., Paik, J.: Noise-Adaptive Spatio-Temporal Filter for Real-Time Noise Removal in Low Light Level Images. IEEE Trans. Consumer Electronics, 51 (2005) 648-653
7. Daubechies, I.: Orthonormal Bases of Compactly Supported Wavelets. Comm. Pure Appl. Math., 41 (1988) 909-996
8. Mallat, S.G.: Multifrequency Channel Decompositions of Images and Wavelet Models. IEEE Trans. on ASSP, 37 (1989) 2091-2110
9. Mindru, F., et al.: Moment Invariants for Recognition under Changing Viewpoint and Illumination. Computer Vision and Image Understanding, 94 (2003) 3-27
10. Feauveau, J.C., Mathieu, P., Barlaud, M., Antonini, M.: Recursive Biorthogonal Wavelet Transform for Image Coding. In Proc. IEEE ICASSP'91 (1991) 2649-2652
11. Heikkila, J.: Pattern Matching with Affine Moment Descriptors. Pattern Recognition, 37 (2004) 1825-1834
12. Irie, K., Kishimoto, R.: A Study on Perfect Reconstructive Subband Coding. IEEE Trans. on CAS for Video Technology, 1 (1991) 42-48
13. Richardson, I.: Video Codec Design. 1st ed., John Wiley & Sons, West Sussex (2002) 195-209
Robust Music Information Retrieval in Mobile Environment Won-Jung Yoon and Kyu-Sik Park Dankook University, Division of Information and Computer Science San 8, Hannam-Dong, Yongsan-Ku, Seoul Korea, 140-714 {helloril, kspark}@dankook.ac.kr
Abstract. In this paper, we propose a music information retrieval (MIR) system. In a real mobile environment, a query music signal is captured by a cellular phone. A major problem in this environment is the distortion of the query sound's features due to the mobile network and environmental noise. In order to alleviate this noise, a signal subspace noise reduction algorithm is applied. Then a robust feature extraction method called Multi-Feature Clustering (MFC), combined with SFS feature optimization, is implemented to improve and stabilize the system performance. The proposed system has been tested using cellular phones in the real world, and it shows an average retrieval success rate of about 65%.
1 Introduction

A number of content-based music retrieval methods are available in the literature, as in [1-3]. However, these studies mainly concern PC-based music retrieval systems under noise-free conditions. These methods tend to fail when the query music signal contains background noise and network errors, as in a mobile environment. MIR in the mobile environment is a relatively new field of study these days. Burges et al. [4] proposed an automatic dimensionality reduction algorithm called Distortion Discriminant Analysis (DDA) for a mobile audio fingerprinting system. Kurozumi et al. [5] combined local time-frequency-region normalization and robust subspace spanning to search for the music signal acquired by a cellular phone. Haitsma and Kalker [6] introduced an approach to audio fingerprinting that extracts 32-bit energy differences along the frequency and time axes to identify the query music. In contrast to previous works, this paper focuses on the following issues in a mobile music information retrieval system. Firstly, the proposed system accepts query sound captured by a cellular phone in a real mobile environment. In order to reduce the noise due to the mobile network and environment, a signal subspace noise reduction algorithm is applied, and a further effort to extract noise-robust features is made with the SFS (sequential forward selection) feature optimization method. Secondly, the music retrieval results corresponding to different input query patterns (or portions) within the same music file may differ considerably. In order to overcome this problem, a robust feature extraction method called MFC (multi-feature clustering) combined with SFS is proposed.
2 Robust Music Feature Extraction, Selection and Clustering

Before feature extraction, a well-known signal subspace noise reduction algorithm [8] is applied to the query signal acquired by the cellular phone to reduce the mobile noise. Then, at a sampling rate of 22 kHz, the music signals are divided into 23 ms frames with a Hamming window and 50% overlap between adjacent frames. Two types of features are computed from each frame. One comprises timbral features such as the spectral centroid, spectral rolloff, spectral flux, and zero crossing rate. The other comprises coefficient-domain features such as thirteen mel-frequency cepstral coefficients (MFCC) and ten linear predictive coefficients (LPC). The means and standard deviations of these six original features and their delta values are computed over all frames of each music file to form a 102-dimensional feature vector. In order to reduce the computational burden and so speed up the search process, an efficient feature selection method is desired. As described in [9], a sequential forward selection (SFS) method is used to meet these needs. Firstly, the best single feature is selected, and then one feature is added at a time, namely the one that, in combination with the previously selected features, maximizes the classification accuracy. This process continues until all 102 features have been ranked; after completing the process, we pick the best features, those that maximize the classification accuracy. As pointed out earlier, the classification results corresponding to different query patterns within the same music file may differ considerably, which can cause serious uncertainty in the system performance. In order to overcome this problem, a new robust feature extraction method called multi-feature clustering (MFC), combined with the previous feature selection procedure, is implemented. The key idea is to extract the pre-defined features over the full-length music signal in steps of a 20 sec large window and then cluster these features into four disjoint subsets (centroids) using the LBG-VQ clustering technique.
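The following sketch illustrates the greedy SFS loop described above (an illustration only; the function name and the accuracy callback are assumptions, e.g., cross-validated classification accuracy of the retrieval classifier):

def sfs_select(X, y, accuracy, n_features):
    # Greedy sequential forward selection: start empty and repeatedly
    # add the feature that, combined with those already chosen,
    # maximizes classification accuracy.
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_features:
        best = max(remaining,
                   key=lambda f: accuracy(X[:, selected + [f]], y))
        selected.append(best)
        remaining.remove(best)
    return selected

Here X would be the (files × 102) feature matrix; stopping at n_features = 20 corresponds to the cut-off chosen in Section 3.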
3 Experimental Setup and Simulation Results

The proposed algorithm has been implemented and used to retrieve music data from a database of 240 music files. Sixty music samples were collected for each of the four genres Classical, Hiphop, Jazz, and Rock, resulting in 240 music files in the database. The excerpts of the dataset were taken from radio, compact disks, and internet MP3 music files. Fig. 1 shows the block diagram of the experimental setup. In order to demonstrate the system performance, two sets of experiments have been performed. One is the system with the proposed signal subspace noise reduction technique and MFC-SFS feature optimization (dashed line). The other is the system without any noise reduction technique or feature optimization.
Fig. 1. Two sets of experimental setup
The proposed mobile MIR system works as follows. Firstly, a query music signal is picked up by the single microphone of the cellular phone and then transmitted to the MIR server, where the signal is acquired by an INTEL Dialogic D4PCI-U board at an 8 kHz sampling rate, 16 bit, mono. Secondly, the signal subspace noise reduction algorithm is applied to the query signal. Thirdly, the pre-defined set of features is extracted from the enhanced query signal. At this point, a trained music DB is available in which the music files were indexed by MFC feature clustering with the SFS feature selection method. Finally, the queried music is identified from the music DB using a simple similarity measure, and the retrieval result is transmitted via an SMS server. The similarity measure between the queried music and a music file in the DB is based on the minimum Euclidean distance. Two sets of experiments have been conducted in this paper. • Experiment 1: Demonstration of the retrieval performance of the proposed MIR system and comparison analysis. • Experiment 2: Retrieval test using the MFC method with different query patterns. Fig. 2 shows the average retrieval accuracy of the system with the noise reduction algorithm and the MFC-SFS feature optimization method with respect to music queries captured by cellular phone. From the figure, we can see that the retrieval performance increases with the number of features up to a certain number of features, while it remains almost constant and then decreases after that. Thus, based on the observation of these boundaries in Fig. 2, we select the first 20 features up to the boundary and ignore the rest.
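A sketch of the final matching step under these assumptions (hypothetical names; the database is assumed to store, per music file, the four MFC centroids of the selected 20-dimensional features):

import numpy as np

def retrieve(query, db):
    # db maps a music title to its (4, d) MFC centroid matrix; the
    # query is matched by the minimum Euclidean distance to any centroid.
    def min_dist(centroids):
        return np.linalg.norm(centroids - query, axis=1).min()
    return min(db, key=lambda title: min_dist(db[title]))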
Fig. 2. MFC-SFS feature selection procedure

Table 1. MIR statistics for the system with and without NR and MFC-SFS feature optimization
MIR system                   Retrieval accuracy   Feature dimension
Noisy query                        44.3%                 102
Query with NR and MFC-SFS          65.2%                  20
As seen in Table 1, the proposed method achieves more than 20% higher accuracy than the system without noise reduction and the MFC-SFS algorithm, even with only a 20-feature set. To verify the performance of the proposed MFC-SFS method, seven excerpts with a fixed duration of 5 sec were extracted at different positions in the same query
music: at the beginning of the music and at the 10%, 20%, 30%, 40%, 50%, and 80% positions after the beginning of the music signal. Fig. 3 shows the retrieval results for the seven excerpts at the prescribed query positions.
Fig. 3. Retrieval results at different query portions with MFC-SFS
As we expected, the retrieval results without MFC-SFS depend greatly on the query position, and the performance worsens as the query portion moves toward the two extremes of the beginning and ending positions of the music signal. On the other hand, we find quite stable retrieval performance with the MFC-SFS method, which yields a relatively high accuracy rate in the range of 55%-67%. Even at the two extreme cases of the beginning and ending positions, the system with MFC-SFS achieves a classification accuracy as high as 62%, which is more than a 20% improvement over the system without MFC-SFS. This is a consequence of the MFC property, which helps the system build a robust musical feature set over the full-length music signal.
4 Conclusion

In this paper, we propose a music information retrieval (MIR) system for the mobile environment. The proposed system has been tested using cellular phones in a real mobile environment, and it shows an average retrieval success rate of about 65.2%. Experimental comparisons for music retrieval with several query excerpts at different positions are presented, demonstrating the superiority of the MFC-SFS method.
Acknowledgment

This work was supported by grant No. R01-2004-000-10122-0 from the Basic Research Program of the Korea Science & Engineering Foundation.
References
1. Tzanetakis, G., Cook, P.: Musical Genre Classification of Audio Signals. IEEE Trans. on Speech and Audio Processing, vol. 10, no. 5 (2002) 293-302
2. Wold, E., Blum, T., Keislar, D., Wheaton, J.: Content-based Classification, Search, and Retrieval of Audio. IEEE Multimedia, vol. 3, no. 2 (1996) 26-39
3. Foote, J.: Content-based Retrieval of Music and Audio. In Proc. SPIE Multimedia Storage and Archiving Systems II, vol. 3229, C.C.J. Kuo et al., Eds. (1997) 138-147
4. Burges, C.J.C., Platt, J.C., Jana, S.: Extracting Noise Robust Features from Audio Data. Proceedings of ICASSP (2002) 1021-1024
5. Kurozumi, T., Kashino, K., Murase, H.: A Robust Audio Searching Method for Cellular-phone-based Music Information Retrieval. IAPR 16th ICPR, vol. 3 (2002) 991-994
6. Haitsma, J., Kalker, T.: A Highly Robust Audio Fingerprinting System. 3rd Int. Symposium on Music Information Retrieval (ISMIR), Oct. (2002) 14-17
7. Ephraim, Y.: A Signal Subspace Approach for Speech Enhancement. IEEE Transactions on Speech and Audio Processing, vol. 3, no. 4, July (1995) 251-266
8. Liu, M., Wan, C.: A Study on Content-Based Classification and Retrieval of Audio Database. Proc. of the International Database Engineering & Applications Symposium (2001) 339-345
Robust Speech Feature Extraction Based on Dynamic Minimum Subband Spectral Subtraction Xin Ma, Weidong Zhou, and Fang Ju College of Information Science and Engineering, Shandong University Jinan, Shandong, 250100, P.R. China {max, wdzhou, jufang}@sdu.edu.cn
Abstract. Based on a theoretical analysis of nonlinear feature extraction, we propose a new method called dynamic minimum subband spectral subtraction (DMSSS) and discuss its effects on the results of speech recognition. We illustrate the process of removing corrupted components by subtracting the estimated dynamic minimum of the short-time spectra. Experimental results show the proposed method is stable and yields good performance in ASR under noisy environments. If combined with the peak isolation method, DMSSS can improve the recognition performance significantly.
1 Introduction

Noise robustness is an important aspect of speech recognition research; it mainly involves noise-robust speech feature extraction [1], acoustic model adaptation [2], noise compensation [3], parallel model combination [4], etc. The aim of robust speech feature extraction is to extract noise-resistant features. Its main idea is to preserve the components that are insensitive to noise while carrying linguistic information, and to suppress the noise-sensitive components by taking advantage of the masking property of speech. Peak isolation [5] is one of the noise-robust speech feature extraction methods; its algorithm is simple and can improve the performance of a recognizer in a noisy environment. Noise compensation methods such as spectral subtraction [6] are usually realized by directly eliminating the noise. Model adaptation uses a certain amount of test data to adapt the HMM model parameters to the noisy environment. Parallel model combination uses parallel hidden Markov models to perform simultaneous processes for noise and speech. When the SNR decreases, or the fluctuation of the noise increases, some noise-robust speech feature extraction methods can lose more useful information than usual methods, especially for unvoiced sounds. Focusing on noise-robust speech feature extraction and adequately considering the nonstationarity of noise, we propose a method named dynamic minimum subband spectral subtraction (DMSSS), which can suppress the noise-sensitive components of speech and maintain noise-robust speech features at the same time. It can enhance the noise resistance of a recognizer with no prior knowledge of the noise, and its computational cost is low.
2 Theory and Realization of DMSSS

It is difficult to estimate the power spectrum accurately because the power spectrum is not time-invariant even for stationary noise. But for a short segment of noise, its fluctuation is limited. If the analysis window is short, the power spectrum of noise can be considered stationary within this window. We can increase the SNR by removing the prominent increments of the spectra caused by noise. If noise and speech are assumed to be independent in the additive noise model, then the noisy speech signal y(t) is the sum of the speech signal s(t) and the noise signal n(t),

y(t) = s(t) + n(t) .   (1)
After the spectral analysis we can obtain the power spectrum representation

Y(e^{jω}) = S(e^{jω}) + N(e^{jω}) .   (2)
But this summation does not hold for the amplitude spectra because their phases are not consistent. If the noise and speech are independent and we assume the distribution of the noise is Gaussian, then for one subband of the power spectrum of a given analysis window, the discrete short-term power spectra satisfy

E[|Y_k|^2] = E[|S_k|^2] + E[|N_k|^2] .   (3)
If we use λ(k) to stand for the expectation of |N_k|^2, for a short time we can assume

|Y_k|^2 = |S_k|^2 + λ(k) .   (4)
λ(k) cannot be obtained directly; however, the relation between the minima of the subband power spectra over a short time span is known:

min_T[Y_k]^2 = min_T[S_k]^2 + λ(k) .   (5)
and we have the following relation:

[Y_k]^2 − min_T[Y_k]^2 = [S_k]^2 − min_T[S_k]^2 .   (6)
where T is the time span over which the above relations are analyzed. After the above subtraction is made, the relative forms of the spectra are still retained while the noise effects on the spectra are suppressed. If T is selected reasonably, the minimum of the subband power spectrum will be close to the noise subband power spectrum in this time span. As Mel energy spectra are often used for recognition, we can use Mel energies instead of the power spectrum. First, time segments are constructed for estimating the minimum of the Mel energy. Supposing the number of frames for estimating the minimum of the Mel energy is N, we can select forward, backward, or bi-directional approaches. For example, the forward approach uses
the current frame and the past N−1 frames to make up one segment that is used for calculating min[Mel_{t−N+1}, Mel_{t−N+2}, ..., Mel_t], where the subscript t stands for the time of the current frame. The backward approach applies the current and the following N−1 frames, and the bi-directional approach uses the current, past, and following frames to form the required segment. As speech changes slowly, adjacent values of min Mel(t) should not change sharply, so after calculating min Mel(t) we smooth them along the time direction. A simple way of smoothing is to average the adjacent min Mel(t) values using formula (7):

min Mel^{SP}(t) = [min Mel(t − 1) + min Mel(t + 1)] / 2 .   (7)

where min Mel^{SP}(t) is the smoothed minimum of the Mel energy in one subband. Then we can subtract it from the subband Mel energy as in (8):
MelS(t) = Mel(t) − min Mel^{SP}(t) .   (8)
MelS(t) is the subband Mel energy after subtracting the dynamic minimum. To avoid MelS(t) becoming too small, we must set a positive threshold ε and modify formula (8) as follows:
MelS(t) = max{Mel(t) − min Mel^{SP}(t), ε} .   (9)
where the function max(·, ·) selects the larger of its two parameters. According to our experience, ε is chosen between 10 and 100. Fig. 1 shows the three-dimensional Mel energy spectrograms of noisy speech (corrupted by 0 dB additive noise) before and after the DMSSS process.
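A compact sketch of the whole DMSSS procedure of Eqs. (7)-(9), using the forward approach (our own illustration; the names and the vectorized minimum search are assumptions):

import numpy as np

def dmsss(mel, N=32, eps=50.0):
    # mel: (frames, subbands) Mel-energy matrix.  Forward approach:
    # dynamic minimum over the current and the past N-1 frames.
    T = mel.shape[0]
    dyn_min = np.array([mel[max(0, t - N + 1):t + 1].min(axis=0)
                        for t in range(T)])
    # Smooth adjacent minima along the time direction (Eq. 7).
    smooth = dyn_min.copy()
    smooth[1:-1] = 0.5 * (dyn_min[:-2] + dyn_min[2:])
    # Subtract the smoothed minimum with a positive floor eps (Eq. 9).
    return np.maximum(mel - smooth, eps)

N = 32 and an eps inside the 10-100 range follow the parameter choices reported later in the paper.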
Fig. 1. Three-dimensional spectrogram of noisy speech. (a) Noisy speech before DMSSS enhancement, (b) Noisy speech after DMSSS enhancement.
Fig. 2 compares the log Mel energy spectra of the clean, noisy, and DMSSS-processed noisy signals. It can be seen that the effects of noise mainly concentrate in the valleys of the log energy spectra. After DMSSS, the noise effects are noticeably alleviated.
Fig. 2. Log Mel energy amplitude of one frame in the speech "zhiyue"; the dotted, dash-dotted, and solid lines show the noisy speech, enhanced noisy speech, and clean speech, respectively
4 Experiments and Results

Speech recognition experiments are conducted on the "Chinese 863" speech recognition database (a database widely used for Chinese speech recognition) [7]. The input speech is sampled at 16 kHz and segmented into 25 ms frames with an overlap of 15 ms; the preemphasis coefficient is 0.95. Triphone HMMs with 5 states and a single Gaussian per state are trained with HTK 3.2 [8]. The features evaluated in these experiments include MFCCs, and MFCCs enhanced with DMSSS, PKISO, and DMSSS combined with PKISO. Feature vectors of the baselines have 39 elements consisting of 12 MFCCs, the 0th cepstral parameter, and their delta and acceleration coefficients. The DMSSS and PKISO methods use the improved MFCCs introduced by us and by Strope et al. [5], respectively. Training is conducted with clean signals, and recognition is done with noisy speech signals at different SNRs. The noisy speech signals are generated artificially by mixing speech signals with Gaussian white and babble noises. To evaluate the results, the recognition accuracy rate defined by the HTK book [8] is used as our criterion. The syllable recognition rates are shown in Table 1.

Table 1. The syllable recognition accuracy rates of BASELINE, PKISO, DMSSS (N=32), and PKISO+DMSSS (N=32)

           Babble                                         Gaussian white
SNR (dB)   BASELINE  PKISO   DMSSS  PKISO+DMSSS           BASELINE  PKISO   DMSSS  PKISO+DMSSS
0            0.07     1.82   25.56     28.10                2.86    -3.10   25.99     31.56
10          35.55    42.76   58.36     66.04               27.16    35.56   53.26     69.66
20          75.42    61.26   72.78     78.88               60.82    62.77   69.77     70.13
30          82.80    76.41   76.56     80.22               80.30    77.65   74.38     80.83
Clearly, for syllable recognition, PKISO, DMSSS, and the combination of PKISO with DMSSS can improve recognition performance on speech corrupted by noise. The performance of the PKISO and DMSSS combination is the best among these methods. To further examine the performance of the above methods on different phones, we compiled statistics of the recognition results for unvoiced consonants, voiced consonants, and vowels. The results are shown in Table 2. We find that when the SNR is high, PKISO and DMSSS become a little poorer than plain MFCCs for unvoiced consonants, but for voiced consonants
and vowels the performance of these techniques is similar. With decreasing SNR, DMSSS and PKISO degrade more slowly than plain MFCCs. When the SNR is less than 20 dB, the best results are always obtained with the PKISO and DMSSS combination, whether for voiced consonants, unvoiced consonants, or vowels.

Table 2. The recognition accuracy rates of BASELINE, PKISO, DMSSS (N=32), and PKISO+DMSSS (N=32) for unvoiced consonants, voiced consonants, and vowels

Type of phone       SNR (dB)   Babble: BASELINE / PKISO / DMSSS / PKISO+DMSSS   Gaussian white: BASELINE / PKISO / DMSSS / PKISO+DMSSS
unvoiced consonant      0      -0.54 /  0.03 /  6.86 / 16.80                    -1.41 /  5.58 /  8.32 / 20.46
                       10      35.45 / 42.84 / 58.89 / 59.25                    11.55 / 33.14 / 34.66 / 57.32
                       20      62.38 / 51.32 / 67.08 / 69.66                    51.92 / 72.25 / 71.68 / 76.70
                       30      78.88 / 61.73 / 70.05 / 73.98                    82.38 / 78.61 / 78.33 / 84.34
voiced consonant        0       0.80 /  2.03 / 16.70 / 47.12                    -0.43 /  2.87 / 15.17 / 44.09
                       10      56.78 / 72.65 / 67.16 / 76.12                    38.17 / 53.58 / 58.95 / 75.50
                       20      67.58 / 89.63 / 78.08 / 91.00                    75.35 / 80.45 / 80.06 / 90.01
                       30      90.32 / 91.76 / 84.55 / 91.98                    86.98 / 88.76 / 86.30 / 90.34
vowel                   0      53.63 / 50.13 / 65.78 / 66.09                    50.81 / 55.10 / 42.94 / 61.65
                       10      67.91 / 72.36 / 76.88 / 78.57                    59.42 / 75.50 / 63.52 / 82.38
                       20      82.88 / 91.06 / 86.00 / 91.02                    75.79 / 82.78 / 81.68 / 90.85
                       30      88.84 / 92.13 / 88.50 / 91.81                    84.30 / 91.61 / 84.30 / 95.45
5 The Effect of the Length of the Short-Time Segment in DMSSS

The syllable recognition rates at different lengths of the short-time segment (N) for the two noises are illustrated in Fig. 3. For babble noise, the best average recognition rates emerge at approximately N = 32, but for Gaussian white noise the best average recognition rates emerge at approximately N = 24. Clearly, different values of N for different noises change the recognition rate. Generally, if N is too large, the minimum estimated with DMSSS will tend to zero and the ability of DMSSS to suppress noise will decrease. If N is too small, the minimum may change considerably and render the method ineffective.
Fig. 3. The effects of the length of the short-time segment on the recognition results with DMSSS at different SNRs (0, 10, 20, 30 dB): (a) for babble noise, (b) for Gaussian white noise
6 Conclusions

This paper presents a novel nonlinear approach for speech recognition, named dynamic minimum subband spectral subtraction (DMSSS). Theoretical analysis indicates the proposed method is easily realized, and experimental results demonstrate its effectiveness in improving robustness in automatic speech recognition. The experiments also show that the length of the short-time segment for the dynamic minimum affects noise suppression. When the length is chosen reasonably, our algorithm yields a significant performance improvement.
References
1. Tyagi, V., Wellekens, C.: On De-emphasizing the Spurious Components in the Spectral Modulation for Robust Speech Recognition. Robust 2004, COST278 and ISCA Tutorial and Research Workshop on Robustness Issues in Conversational Interaction, August (2004) 30-31
2. Nolazco-Flores, J., Young, S.: Continuous Speech Recognition in Noise Using Spectral Subtraction and HMM Adaptation. ICASSP, vol. I (1994) 409-412
3. Raj, B., Seltzer, M., Stern, R.: Reconstruction of Damaged Spectrographic Features for Robust Speech Recognition. Proceedings ICSLP, volume 1, Beijing, China (2000) 375-360
4. Gales, M.J.F., Young, S.J.: Robust Continuous Speech Recognition Using Parallel Model Combination. IEEE Transactions on Speech and Audio Processing 4 (1996) 352-359
5. Strope, B., Alwan, A.: A Model of Dynamic Auditory Perception and Its Application to Robust Word Recognition. IEEE Transactions on Speech and Audio Processing 5 (1997) 451-464
6. Boll, S.F.: Suppression of Acoustic Noise in Speech Using Spectral Subtraction. IEEE Transactions on Acoustics, Speech and Signal Processing 27(2) (1979) 113-120
7. http://www.cass.net.cn/chinese/s18_yys/yuyin/product/product_2.htm
8. http://htk.eng.cam.ac.uk/
Searching Algorithm for Shadow Areas Using Correlation in Fourier Domain and Its Application Choong Ho Lee Graduate School of Hanbat National University, San 16-1 Deokmyeong-dong Yuseong-gu Daejeon 305-719 Korea [email protected]
Abstract. Searching for and enhancing shadow areas in satellite imagery is of growing interest because of possible new applications in this field. This paper proposes an algorithm to search for the shadow areas caused by buildings, which are very common in satellite imagery of urban areas in Korea. Binarization using a histogram threshold has the drawback of producing scattered small shadow areas, which should be ignored for some applications. The proposed searching algorithm uses the fast Fourier transform and computes the correlation in the frequency domain. We search for the correlation threshold that is appropriate for obtaining shadow areas that do not include scattered small dark areas. Experimental results show this method is valid for extracting shadow areas from satellite imagery.
1 Introduction

There has been considerable recent interest in searching for and enhancing shadow areas in 1-m satellite imagery [1, 2]. It is reported that the shadow areas in satellite imagery are useful for detecting building images semi-automatically [3, 4]. Sohn et al. reported a searching and enhancement algorithm for shadow areas which cannot be performed automatically [1, 2]. K. L. Kim et al. reported a more complex method to search for special areas which uses clustering, labeling, segmentation, feature extraction, and fuzzy theory [5]. K. L. Kim et al. also reported a feature extraction method which can be performed semi-automatically or automatically by comparing color images with grey-scale images [6]. However, no methods to search for and enhance shadow areas semi-automatically or automatically have been reported, as far as the authors know. We present a searching and enhancement algorithm which can be performed semi-automatically or automatically. The searching algorithm uses correlation to obtain a template for extracting the shadow area from a satellite image. The algorithm preserves the bright (sunny) area because we perform the enhancement algorithm only on the shadow area, which is separated from the bright area.
2 Searching and Enhancement Algorithm

Enhancing the picture quality of the shadow area tends to degrade that of the bright area in the satellite imagery, as shown in Fig. 1, when we use enhancement
algorithms such as histogram equalization, histogram specification, or contrast stretching, etc., which are suitable for processing dark areas in images.

2.1 Extraction of Shadow Area

To prevent the degradation of picture quality in the bright area, we need to separate an image into two parts, the sunny area and the shadow area, apply the algorithms only in the shadow area, and preserve the sunny area as it is.
Fig. 1. A 500x500 satellite image which includes shadow area
Binarization algorithms based on grey levels can lead to the problem of diffuse shadow areas, as shown in Fig. 2. We introduce correlation to make a template that can be used to extract the shadow area from the satellite image. The correlation C(u, v) can be computed by solving

C(u, v) = Re{F(u, v) * G(u, v)} .   (1)
where f(x, y) and g(x, y) are two image signals and F(u, v) and G(u, v) are their Fourier transforms. In practice, the correlation C(u, v) can be computed by first rotating an image m by 180 degrees and then using FFT-based convolution techniques as follows:

C(u, v) = Re[F^{-1}{F(M) G^π(m)}] .   (2)

where F^{-1} means the inverse Fourier transform, F(M) denotes the Fourier transform of the image M, and G^π is the Fourier transform of the image m rotated by 180 degrees. Using the template in Fig. 3, the shadow area can be extracted as shown in Fig. 4. Likewise, the bright area can be extracted as shown in Fig. 5.
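A sketch of this template computation with NumPy (an assumption; the paper used the MATLAB toolbox of [7]); the 8 × 8 shadow block and the threshold come from the experiment in Section 3:

import numpy as np

def shadow_template(image, block, threshold):
    # Eq. (2): rotate the shadow block by 180 degrees, multiply the
    # Fourier transforms, and take the real part of the inverse FFT.
    h, w = image.shape
    g = np.fft.fft2(np.rot90(block, 2), s=(h, w))  # zero-padded to image size
    corr = np.real(np.fft.ifft2(np.fft.fft2(image) * g))
    return corr > threshold  # binary template for the shadow area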
Fig. 2. After binarization at grey level 128
Fig. 3. A template for shadow area obtained by correlation of 8x8 shadow block and original image
Fig. 4. Shadow areas which are extracted from the original image
Fig. 5. Sunny areas which are extracted
2.2 Enhancement of Picture Quality of Shadow Area

To enhance the picture quality of the shadow area, histogram equalization is used. Fig. 6 shows the result.
Fig. 6. Shadow area after histogram equalization
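A sketch of histogram equalization restricted to the extracted shadow area (an illustration under the assumption of 8-bit grey-scale input; the bright area is left untouched):

import numpy as np

def equalize_shadow(img, shadow_mask):
    # Equalize only the pixels inside the shadow mask so that the
    # picture quality of the sunny area is preserved as it is.
    out = img.copy()
    vals = img[shadow_mask]
    hist = np.bincount(vals, minlength=256).astype(np.float64)
    cdf = hist.cumsum() / vals.size
    out[shadow_mask] = (255 * cdf[vals]).astype(img.dtype)
    return out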
3 Simulation Results

In the experiment, we used an 8 × 8 block of shadow area to compute the correlation with the satellite imagery. In practice, we computed the correlation using Eq. 2 (see [7]). Fig. 3 is obtained using the threshold 1,500,000, which is a little less than the maximum value of 2,229,975. The correlation provides the template for the shadow areas, which can extract the shadow areas from the original satellite imagery. The shadow area obtained by using the template does not contain diffuse point areas. The image quality of the shadow area is improved by histogram equalization while the bright area is preserved as it is. After that process, the shadow area and the bright area are added back together.
Fig. 7. Reconstructed image
Fig. 7 is the resultant image obtained by histogram equalization of the original image in Fig. 1. Fig. 8 shows the resultant image using the algorithm we suggest. After histogram equalization without our algorithm, the objective picture qualities of the shadow area and the bright area are degraded to 27.99 dB and 22.79 dB, respectively, in PSNR (peak signal-to-noise ratio); the bright area degrades more than the shadow area does. Although the subjective picture quality of the shadow area improves, the degradation of the bright area in the satellite imagery can sometimes be critical. Thus, the algorithm we propose is useful for improving the picture quality of the shadow area while preserving that of the bright area.
4 Conclusions

The correlation of a small block of shadow area with the satellite imagery provides a template for shadow areas that is not diffuse. Using the template, the shadow area and the bright area can be separated. While conventional enhancement algorithms degrade the picture quality of the bright area, our algorithm can enhance the picture quality of the shadow area more effectively, preserving that of the bright area as it is.
References
1. Sohn, H.G., Yun, K.H., Park, H.K.: Enhanced Urban Information Recognition through Correction of Shadow Effects. In: Proceedings of Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, Busan, Korea (2003) 187-190
2. Sohn, H.G., Yun, K.H., Lee, D.C.: A Study of the Correction of Shadow Effects in Aerial Color Photos (Focusing on Roads in Urban Areas). In: Proceedings of Joint Conf. of Korean Society of GIS and Korean Society of Remote Sensing (2003) 383-387
3. Ye, C.S., Lee, K.H.: Detection Using Shadow Information in KOMSAT Satellite Imagery. In: Proceedings of Joint Conf. of Korean Society of GIS and Korean Society of Remote Sensing, Vol. 16, No. 3 (2000) 383-387
4. Yoon, T.H.: Semi-automatic Building Segmentation from 1 m Resolution Aerial Images. Master Thesis, Korea Advanced Institute of Science and Technology, Daejeon, Korea
5. Kim, K.L., Kim, U.N., Kim, H.J.: Methods on Recognition and Recovery Process of Censored Areas in Digital Image. In: Proceedings of Korean Society of Surveying, Geodesy, Photogrammetry and Cartography (2002) 1-11
6. Kim, K.L., Kim, U.N., Chun, H.W.: A Study on Semi-automatic Feature Extraction Using False Color Aerial Images. In: Proceedings of Korean Society of Surveying, Geodesy, Photogrammetry and Cartography (2002) 109-115
7. Image Processing Toolbox User's Guide. The MathWorks Inc., MA (1997)
Shadow Detection Based on rgb Color Model Baisheng Chen and Duansheng Chen Department of Computer Science of Huaqiao University, Quanzhou, 362021, China {samchen, dschen}@hqu.edu.cn
Abstract. A shadow detection scheme based on the photometric invariant rgb color model is proposed. We first study the photometric invariance of the rgb color model and deduce some important properties. The algorithm combines brightness and chromaticity cues of moving cast shadows successively to detect candidate shadow regions in rgb color space; finally, a post-processing step exploits region-based geometry information to exclude pseudo shadow segments. Results are presented for several video sequences representing a variety of illumination conditions and ground materials, with shadows cast on different surface types. The results show our approach is robust to widely different backgrounds and illuminations.
1 Introduction

Moving cast shadows can cause object merging, object shape distortion, and even object losses (due to the shadow cast over another object). For this reason, moving shadow detection is critical for accurate object detection in vision surveillance applications. Many algorithms have been proposed in the literature that deal with shadows. These approaches are mostly classified as model-based and feature-based. The first approach comprises methods designed for special applications, such as aerial image understanding [1] and surveillance [2]. They exploit a priori knowledge of the 3-D geometry of the scene, the objects, and the illumination. These model-based approaches have two major limitations: simple rectilinear models can be used only for simple objects, such as buildings and vehicles, and the a priori knowledge of the illumination and 3-D geometry of the scene is not always available. The second approach overcomes these limitations by exploiting shadow geometry, intensity, and color properties. For example, [3] utilizes the rationale that shadows have similar chromaticity but lower brightness than the background to remove shadows in HSV color space. The approach proposed here is shadow-feature based. We exploit the brightness and chromaticity features of shadows to detect them, and implement the algorithm in the photometric invariant normalized rgb color space. The remainder of this paper is organized as follows. In Section 2, we focus on the photometric invariance of the normalized rgb color model. In the next section, the normalized rgb color model based shadow detection scheme is described in detail. Experimental results and analysis are given in Section 4. In the last section, we draw a conclusion about our work.
2 rgb Color Model

In our work, we focus on the normalized color model defined as follows:
r = R/(R + G + B) ;   g = G/(R + G + B) ;   b = B/(R + G + B) .   (1)
where r + g + b = 1. The normalized rgb color model defined above possesses photometric invariant features and is insensitive to surface orientation, illumination direction, and intensity. Photometric invariant features are functions describing the color configuration of each image coordinate discounting local illumination variations, such as shadings and shadows. Given red, green, and blue sensors with spectral sensitivities f_R(λ), f_G(λ), and f_B(λ) respectively, for an image of a surface patch illuminated by incident light with SPD e(λ), the measured sensor values will be given [4] as
C = m_b(n, s) ∫_λ f_C(λ) e(λ) c_b(λ) dλ + m_s(n, s, v) ∫_λ f_C(λ) e(λ) c_s(λ) dλ .   (2)
where c_b(λ) and c_s(λ) are the albedo and Fresnel reflectance, respectively; λ denotes the wavelength, n is the surface patch normal, s is the direction of the illumination source, and v is the direction of the viewer. The geometric terms m_b and m_s denote the geometric dependencies of the body and surface reflection, respectively. Considering the neutral interface reflection (NIR) model and white illumination, it holds that e(λ) = e and c_s(λ) = c_s. Then the measured sensor values are given by

C_w = e m_b(n, s) k_C + e m_s(n, s, v) c_s ∫_λ f_C(λ) dλ .   (3)
for C_w = {R_w, G_w, B_w} giving the red, green, and blue sensor responses under the assumption of a white light source. k_C = ∫_λ f_C(λ) c_b(λ) dλ is a compact formulation depending on the sensors and the surface albedo. If the integrated white condition ∫_λ f_R(λ) dλ = ∫_λ f_G(λ) dλ = ∫_λ f_B(λ) dλ = f holds, we have

C_w = C_b + C_s = e m_b(n, s) k_C + e m_s(n, s, v) c_s f .   (4)
According to the body reflection term of eq. (3), C_b = e m_b(n, s) k_C, so the normalized rgb color model is insensitive to surface orientation, illumination direction, and intensity, as can be seen from

r(R_b, G_b, B_b) = e m_b(n, s) k_R / [e m_b(n, s)(k_R + k_G + k_B)] = k_R / (k_R + k_G + k_B) .   (5)
It only depends on the sensors and the surface albedo. Equal arguments also hold for g and b components.
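In code, the conversion of Eq. (1) is a one-liner; the sketch below (our own illustration) adds a small constant to avoid division by zero at black pixels:

import numpy as np

def normalized_rgb(img):
    # img: (H, W, 3) RGB array; returns the photometric-invariant
    # normalized rgb representation with r + g + b = 1 of Eq. (1).
    s = img.sum(axis=2, keepdims=True).astype(np.float64)
    return img / np.maximum(s, 1e-8)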
3 Normalized rgb Color Model Based Shadow Detection

Our approach, depicted in Fig. 1, is a multistage approach in which each stage of the algorithm removes moving object pixels that cannot be shadow pixels. The input video frame is passed through the system, and Mi is the binary mask of potential shadow pixels updated after each step.
Fig. 1. Steps of the shadow detection algorithm
Step 1—Moving Object Detection: To model the background, the recent history of each pixel x is modeled by a mixture of K Gaussians: P(x) = Σ_{i=1}^{K} W_i · η(x, μ_i, Σ_i), where for each R, G, B channel, P(x) is the probability of observing pixel value x, and η is the Gaussian function whose ith mixture component is characterized by the mean μ_i, covariance Σ_i, and weight W_i. The distortion of brightness for each pixel between the incoming frame and the reference image is computed to extract the candidate pixels of moving objects. This process is performed as follows:
F(x) = 1 if |I(x) − B(x)| ≥ 2σ(x), and F(x) = 0 otherwise ,   (6)

where σ(x) is the mean value of the brightness distortion for the pixel at position x, computed as

σ(x, i) = max(σ_min, α|I(x, i) − B(x)| + (1 − α)σ(x, i − 1)) .   (7)
where a minimum distortion value σ_min is introduced as a noise threshold to prevent the distortion value from decreasing below a minimum should the background measurements remain strictly constant over a long period. After the initial detection, the binary mask (M1 in Fig. 1) contains the moving object, its shadow, and noisy isolated pixels.
Step 2—Luminance Ratio Test: Research in [5] states that the ratio between pixels when illuminated and the same pixels under shadow can be roughly linear. Step 2 exploits this observation to initially segment shadow regions. The following intensity test is applied to moving object and shadow pixels to further reduce their number. Let p(x) be the pixel at position x, where I(x) and B(x) are the corresponding pixel values in the input image and in the background model, respectively:

∀p(x) ∈ M1: if (α < B(x)/I(x) < β) then M2 = M2 ∪ p(x) .   (8)
where the upper bound β is used to define a maximum value for the darkening effect of shadows on the background and is approximately proportional to the light source intensity. Instead, the lower bound α prevents the system from identifying as shadows those points where the background was darkened too little with respect to the expected effect of shadows.
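A sketch of this luminance ratio test (hypothetical names; α = 1 and β = 3 follow the parameter choices in Section 4.1):

import numpy as np

def luminance_ratio_test(I, B, m1, alpha=1.0, beta=3.0):
    # Keep Step-1 candidate pixels whose background/input luminance
    # ratio lies in (alpha, beta): shadows darken the background,
    # so B(x)/I(x) is expected to be moderately greater than one.
    ratio = B.astype(np.float64) / np.maximum(I.astype(np.float64), 1.0)
    return m1 & (ratio > alpha) & (ratio < beta)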
Step 3—Chrominance Homogeneity Analysis: This step performs a pixel-wise operation to calculate the chrominance correlation between the points of the input image and those of the background model in normalized rgb color space; further, the pixels with a chrominance correlation larger than a given threshold are segmented to form candidate shadow regions. Let p(x) be a pixel whose value is (r, g, b) in the background model and (r′, g′, b′) in the foreground frame; then
α = 1 − |r − r′| / (r + r′) ;   β = 1 − |g − g′| / (g + g′) ;   γ = 1 − |b − b′| / (b + b′) .   (9)
The chrominance correlation H is calculated as H(r, g, b) = α ∗ β ∗ γ. Let H_T be the threshold used to scale color similarity, and let M3 = ∅; the chrominance homogeneity analysis is performed according to the following equation:

∀p ∈ M2: if (H > H_T) then M3 = M3 ∪ p
(10)
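A sketch of the per-pixel computation of Eq. (9) and the test of Eq. (10) (our own illustration; the array layout and names are assumptions):

import numpy as np

def chrominance_mask(bg, fg, m2, H_T=0.86):
    # bg, fg: (H, W, 3) normalized rgb images of the background model
    # and the input frame.  H = alpha * beta * gamma per Eq. (9).
    sim = 1.0 - np.abs(bg - fg) / (bg + fg + 1e-8)
    H = sim[..., 0] * sim[..., 1] * sim[..., 2]
    return m2 & (H > H_T)  # Eq. (10)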
This process excludes most of the candidate shadow points remaining after the luminance ratio test. However, some points are still preserved because they are highly similar to true shadow points; these will be further removed by the subsequent region-based geometry analysis.
Step 4—Region-Based Geometry Analysis: This step performs a region-based process to exclude the pseudo shadow regions misclassified in Step 3 by exploiting shadow geometry information. So far, most of the object regions have been removed, and the remaining candidate shadow regions exhibit an important spatial cue: the segments of a shadow region's boundary that are connected to the object cover a small percentage of the region's total boundary. Let {R(i)} be the shadow regions segmented in Steps 2 and 3, and let C(i) and L(i) denote the perimeter and the boundary connected to the object, respectively, of the corresponding candidate shadow regions; the ratio L/C then tends to be rather small for true shadow regions. We hypothesize a patch in the image under the mask {R(i)} to be a shadow region and generate a mask M4 of the output detected shadows as:
∀R ∈ {R(i)}: if (L(i)/C(i) < L_T) then M4 = M4 ∪ R
(11)
where L_T is the threshold on the shared-boundary segment ratio of candidate shadow regions. Those candidate shadow regions whose shared-boundary ratio is less than L_T are incorporated into the mask M4.
4 Experiment and Analysis

4.1 Data and Parameters
The testing data consist of moving people under a variety of illumination conditions and background surfaces found in outdoor areas, which contain structures such as
buildings, houses, walkways, parks, and playgrounds. The range of physical conditions in our experiments (shown in the heading of the examples) includes:
− background surface materials—grass (park), wood (trees), concrete (walkways), granites (buildings), color dope concrete (playground);
− color—typical uniform surface colors, textured colors, saturated and neutral;
− surface slope—vertical, horizontal, slopes (such as a hill);
− sun angle—morning, high noon, early afternoon, late afternoon.
The number of Gaussians (K in Step 1) is fixed at 3. The five parameters that affect our algorithm are: 1) the threshold σ_min = 10 selected for background subtraction (Step 1), 2) the lower bound α = 1 and the upper bound β = 3 selected for the luminance ratio test (Step 2), 3) H_T = 0.86 for the chrominance homogeneity analysis (Step 3), and 4) the shared-boundary segment ratio threshold L_T = 0.4 selected for the region-based geometry analysis (Step 4).
Sequence2: Time: 8:30a.m. Strength: medium; Size: medium; Surface: textured concrete, flat
230
95
Sequence3: Time: 1:40a.m.; Strength: low; Size: very large; Surface: textured concrete, flat
200
380
415
Steps Input video
Step 1
Step 2
Step 3
Step 4 % of shadow detected % of object detected S
FS
S
FS
S
FS
S
FS
S
FS
S
FS
94.
5.2
93.1
7.8
92.4
5.7
90.5
9.2
87.
15.
82.
12.
Fig. 2. Results on shadows of different strength
confusion matrix
Shadow Detection Based on rgb Color Model
1073
4.2 Results and Performance Evaluation

In order to evaluate the performance of our algorithm, ground truth data are obtained by manually drawing a contour around the moving objects and their shadows. Quantitative evaluations are shown using a confusion matrix for the shadow detection rate (S), object detection rate (O), and false alarm rate (FS). Experimental results are given in Fig. 2, which represents shadows of different strength, and in Fig. 3, where shadows of different sizes are cast on several kinds of background surfaces.
In order to evaluate the performance of our algorithm, the ground truth data are obtained by manually drawing a contour around the moving objects and their shadows. Quantitive evaluations are shown in using a confusion matrix for shadow detection rate (S), object detection rate (O), and false alert rate (FS). Experimental results are given in Fig.2 which represents shadows of different strength and in Fig.3 where shadows of different sizes cast on several kinds of background surfaces. Sequence4: Time: 3:00p.m. Strength: high; Size: medium; Surface: grass, inclined Frame#190
Sequence5: Time: 4:00p.m. Strength: high; Size: large; Surface: textured granite bricks, vertical
235
85
Sequence6: Time: 1:30p.m.; Strength: high; Size: very small; Surface: textured color polished concrete, flat
120
390
470
Steps Input video
Step 1
Step 2
Step 3
Step 4 % of shadow detected % of object detected S 84. 8
FS
S
FS
S
FS
S
FS
6.7
94.2
11.4
82.9
11.7
85.4
17.6
S 85. 7
FS 8.6
S 92. 1
FS 10. 3
confusion matrix
Fig. 3. Results on shadows cast on different surfaces
Sequence 1: the shadow cast on the textured granite walkway is of high strength in this example. In Sequence1, most of the object pixels are successfully eliminated by the luminance ratio test and the chrominance homogeneity analysis in normalized rgb space. On average, over 90 percent of the shadow pixels are correctly detected, while the percentage of object pixels classified as shadow is minimal.
Sequence 2: Fig. 2 (Sequence2) shows multiple moving objects at different distances from the camera, in which shadows appear isolated as well as overlapped. The detection scheme turns out to deal well with both cases. Sequence 3: this video was shot in the late afternoon, and the background scene is cluttered and noisy. Due to the slight luminance contrast between foreground and background, the luminance ratio test fails to exclude object pixels well, while the subsequent chrominance homogeneity analysis succeeds in removing most of the remaining object pixels. Sequence 4: Fig. 3 (Sequence4) represents an inclined and curved grass surface. This is also an example of a surface that exhibits highly saturated color and specularities due to the grass surface type and the angles of incidence. Our algorithm deals well with this case. Sequence 5: Sequence5 illustrates a difficult shadow and object color situation. In this case, parts of the object and the shadow have the same diffuse color as the background due to the obscure illumination, and the object is heavily self-shadowed. As a result, the self-shadowed region of the subject is preserved after the luminance ratio test and chrominance homogeneity analysis, possessing a large percentage of boundary segments shared with the detected object regions. The experimental results show the post-processing works well on self-shadowed regions and does little harm to the completeness of the object contour. Sequence 6: Sequence6 is representative of a small shadow size and shows a highly saturated color surface. The background is a uniformly red and green polished playground.
5 Conclusion

In this paper, a novel approach combining multiple techniques to detect shadows and moving objects based on the photometric invariant rgb color model was presented. The algorithm was tested on a wide variety of video data consisting of different illumination conditions and background surfaces, and the results show the proposed algorithm is robust and efficient at removing moving shadows.
References
1. Wang, C., Huang, L., Rosenfeld, A.: Detecting Clouds and Cloud Shadows on Aerial Photographs. Pattern Recognition Letters 12 (1991) 55-64
2. Yoneyama, A., Yeh, C.-H., Kuo, C.-J.: Moving Cast Shadow Elimination for Robust Vehicle Extraction Based on 2D Joint Vehicle/Shadow Models. In: Proc. of IEEE Conf. on AVSS'03. IEEE Computer Society, New York (2003) 229-236
3. Cucchiara, R., Grana, C., Piccardi, M., Prati, A.: Detecting Moving Objects, Ghosts, and Shadows in Video Streams. IEEE Transactions on Pattern Analysis and Machine Intelligence 10 (2003) 1337-1342
4. Shafer, S.A.: Using Color to Separate Reflection Components. Color Res. Applicat. 4 (1985) 210-218
5. Rosin, P.L., Ellis, T.: Image Difference Threshold Strategies and Shadow Detection. In: Proc. of 6th BMVC. Butterworth-Heinemann, Oxford (1995) 347-356
Shape Analysis for Planar Barefoot Impression Li Tong, Lei Li, and Xijian Ping Information Science Department, Zhengzhou Information Science and Technology Institute No.837, P.O.Box 1001, Zhengzhou, Henan, China, 450002 [email protected] {aerolite_l, pingxijian}@yahoo.com.cn
Abstract. The shape, or outline, of a person's planar barefoot impression has been found useful in many cases for personal identification. This paper performs an automatic analysis of the shape of barefoot impressions; in total, 28 numerical feature measurements are extracted to describe the geometric and structural properties of the barefoot impression's shape. Each measurement on the footprint outlines extracted in this work is examined for distinguishability and consistency by the ratio of its population standard deviation to its interpersonal standard deviation (SDR). The experimental results show that we can obtain feature measurements from barefoot impression images reliably.
1 Introduction

As a kind of common trace information provided by the human body, barefoot impressions have proved extremely useful for identifying a person in many cases [1-3]. Barefoot impression shape features, such as the size, position, direction, and relative distribution of the weight-bearing areas of the footprint, play the most important roles in barefoot shape analysis. However, the task of footprint analysis is carried out manually or semi-manually by someone trained in forensics, whether an orthopedist or a forensic identification specialist, and there is a need to develop footprint verification techniques based on automatic and reliable shape analysis that provide consistent and precise feature measurements. Previous footprint analysis research, mainly in the forensic area, has provided some basic methods for analyzing footprint shape characteristics. In Kennedy's [4, 5] study, CAD software was used to obtain geometric measurements from manually traced footprint graphics, and some shape landmarks still needed to be located in an alternate way. Kazuki et al. [6] developed an automatic footprint-based personal identification system; under low image resolution, only the direction and position features of whole footprints were extracted. The main difficulty in footprint recognition is caused by the variability of footprint features. Though a barefoot impression is mainly determined by the distribution of a person's foot bones, a person leaves different barefoot impressions under different behaviours. In order to identify characteristics that do not change, this paper studies the barefoot impressions left by persons standing normally on rigid carriers with no load bearing. This paper presents an automatic approach to extract precise numerical shape feature measurements from scanned barefoot images based on computational geometry.
2 Preprocess for Planar Barefoot Impression Image

2.1 Image Collection and Segmentation

In order to obtain a collection of footprint samples, each person was asked to step onto a flat scanning device and stand normally, and each pair of barefoot impression images was scanned at 100 × 100 dpi. Two levels of shape information are found in a footprint image: one is the boundary of the whole foot area, and the other is the shape of the weight-bearing area produced by the body weight. Corresponding to this visual information, the histogram of a barefoot image shows three distinct peaks. Hereby, a multi-threshold segmentation method [7] is first performed to yield two thresholds for extracting the area of the whole foot and the weight-bearing areas, respectively (Fig. 1).
Fig. 1. Multi-threshold segmentation for footprint: (a) original image, (b)weight bearing impression image,(c) whole foot image
2.2 Alignment and the Coordinate System of the Footprint

There are no landmarks that can be readily identified in a barefoot impression. The robustness of the feature measurements is determined by whether impressions of the same subject give the same or similar results. Hence, it is necessary to develop a method to seek a correct alignment and set up a coordinate system for the footprint. For a barefoot impression, the shape of the sole area is considered more stable than that of the toe areas [2], and hence it is used as the main clue for finding the alignment. The convex hull of the sole can be drawn by tracing the outline; it is unique. The line segment of maximal length in the convex hull on each side of the sole is selected as the tangent cone. The coordinate system for the footprint is set up as follows: the central axis of the footprint outline is defined as the bisector of the tangent cone, i.e., the line bisecting the cone angle through the apex. The vertical axis of the coordinate system is then defined as the central axis. The origin of the coordinate system is defined as the intersection of the central axis with the line joining the two points of tangency of the tangent cone to the heel. This has been confirmed to agree with visual perception, and many other measurements depend on it.
3 Feature Measurements for Barefoot Impression Shape 3.1 Landmarks’ Definition and Location On the basis of the coordinate system having been set up, landmarks in the sole can easily to be located (see Fig.2(a)). Moreover, the information about toe is embodied in the outlines of the whole foot with reliability. In order to make a consistent measure, we put the shape image of whole foot into the coordinate system setting up for sole, and the five metatarsal heads form the semicircular bumps at different heights and are segmented by six concave corner points (see Fig.2(b)). Metatarsal ridge curve
Fig. 2. (a) Landmarks and shape measurements in the sole; (b) landmarks and shape measurements in whole print
3.2 Measurement Definitions and Computation After each collected footprint image is preprocessed and marked with landmarks on its outlines, feature measurements are defined and computed using digital geometry algorithms. All the numerical measurements are divided into three groups as follows. Table 1. Numerical measurements for the whole footprint
Name       Definition
LLength    Length of whole foot, defined as the length of the minimum exterior rectangle along the vertical axis
LHWidth    Width of whole foot, defined as the width of the minimum exterior rectangle along the horizontal axis
LDi        Distance from LHC to LTPi (i = 1, ..., 5)
LOi        Orientation from the horizontal axis to the line between LHC and LTPi (i = 1, ..., 5)
LPW        Perimeter of whole footprint
LAW        Area of whole footprint
RH         Ratio of LLength to LHWidth
(1) Structure and geometry features for the whole footprint. In recognizing a footprint, the primary knowledge about the foot comes from its structural distribution and size, which involve the intrinsic information about the foot bone distribution. The numerical measurements for the left footprint are listed in Table 1. (2) Structure and geometry features for the sole. Since feature measurements of the sole are applied as the main clues to verify a footprint in forensic science, a group of commonly used numerical measurements are explained in Fig. 2(b) and listed in Table 2. Table 2. Numerical measurements for the sole
Name       Definition
LSLength   Length of the sole, distance from LFP to LHP
LSFWidth   Width of the half sole, distance from LF1 to LF2
LSHWidth   Width of the heel, distance from LH1 to LH2
LDT1       Distance from LH1 to LF1
LDT2       Distance from LH2 to LF2
LOT        Inclination angle between the two tangent lines
LOF        Orientation from the horizontal axis to the line connecting LF1 and LF2
LOH        Orientation from the horizontal axis to the line connecting LH1 and LH2
LPS        Perimeter of the sole
LAS        Area of the sole
(3) Curve features based on curvature analysis. In previous literature, geometrical measurements were mainly computed from dominant points or subsampled points on the footprint outlines, while curve segments with unique local variation were discarded or described in text by forensic experts. In this paper, the average curvature energy of boundary curve segments is included in the footprint measurement set. Among the four curve segments described in Section 3.1, the metatarsal ridge segment and the inner arch outline involve most of the unique information. Moreover, a medial axis transform is performed, resulting in a curve that serves as the sketch of the sole. For the purpose of measuring the variation along a curve, we use the average bending energy value to assess the degree of ordinate change of the points on the curve. For a sole outline in a footprint, the bending energy measurements for each of the three important curve segments, i.e., BEmr for the metatarsal ridge curve, BEia for the inner arch outline, and BEma for the medial axis curve, are included in the shape feature measurement set.
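As a rough illustration of the bending-energy measurements BEmr, BEia, and BEma, the following sketch (our own, with illustrative names, assuming roughly uniform sampling of the curve) computes the average squared curvature of a digitized curve segment given as ordered (x, y) samples:

import numpy as np

def average_bending_energy(points):
    """Mean squared curvature of a digitized curve given as an (N, 2) array."""
    x, y = points[:, 0], points[:, 1]
    dx, dy = np.gradient(x), np.gradient(y)        # first derivatives
    ddx, ddy = np.gradient(dx), np.gradient(dy)    # second derivatives
    curvature = (dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5
    return np.mean(curvature ** 2)                 # average bending energy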
4 Experiment and Statistical Variance Analysis for Feature Measurements Based on our work on planar footprint shape analysis, a total of 28 different numerical measurements are taken from each barefoot impression. We have carried out the
experiment of automatic feature extraction for 76 subjects. 71 out of 76 footprint images were processed correctly. The main reason for the failure cases is a distorted outline of the sole due to a dim edge between the weight-bearing area and the whole footprint at the inner arch. To develop footprint verification technology, it is desirable to find precise and consistent numerical feature measurements that distinguish the footprints of different subjects. Hence, a variance analysis is conducted for all measurements. Firstly, the feature measurement sets computed from the barefoot impressions of 76 different subjects are collected to estimate the population variance component σp². Theoretically, the larger σp² is, the higher the distinguishing ability of the corresponding feature measurement. Secondly, 11 subjects from our subject group, treated as randomly selected samples of the general population, are selected for repeated measurement; 14 samples are collected from each subject daily. The individual variance component σi² estimated from these samples weighs the variable's reliability, i.e., with a smaller σi², a shape measurement has a higher probability of being consistent for a single subject. Then the Standard Deviation Ratio (SDR), defined as σp/σi, is used to weigh the overall ability of each measurement variable. The larger the standard deviation ratio, the more useful the measure is in footprint matching (Table 3).
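The SDR computation described above can be sketched as follows (hypothetical code, not the authors'); `population` is assumed to hold one value of a measurement for each of the 76 subjects, and `repeats` the 14 repeated values for each of the 11 re-measured subjects:

import numpy as np

def sdr(population, repeats):
    """SDR = sigma_p / sigma_i for one measurement variable.
    population: shape (76,); repeats: shape (11, 14)."""
    sigma_p = population.std(ddof=1)                  # between-subject spread
    # pooled within-subject standard deviation over the repeated measures
    sigma_i = np.sqrt(np.mean(repeats.var(axis=1, ddof=1)))
    return sigma_p / sigma_i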
Table 3. Standard deviation ratio of footprint measurements

Name       SDR        Name       SDR        Name       SDR        Name       SDR
LLength    8.6052     LO1        6.1274     RH         10.0993    LDT2       6.4273
LHWidth    9.6567     LO2        6.3524     LSHWidth   5.7355     LOT        4.6324
LD1        10.6953    LO3        6.7015     LSFWidth   7.3420     LOH        5.0473
LD2        10.7620    LO4        6.0060     LSLength   14.8139    LOF        5.5288
LD3        10.1962    LO5        6.1040     LAS        7.4571     BEia       2.6896
LD4        9.5344     LPW        12.6280    LPS        8.1037     BEma       3.5054
LD5        9.4368     LAW        37.5366    LDT1       5.0416     BEmr       4.6718
As indicated in Table 3, ignoring correlations with other features, each feature plays a different role in distinguishing different footprints with consistent measurement. Among the shape feature measurements, the area and perimeter of the whole foot (LAW and LPW) and the length of the sole (LSLength), with estimated SDRs of 37.5366, 12.6280, and 14.8139 respectively, clearly have the best properties. Following them are the measurements describing the figure of the sole, e.g., the ratio of LLength to LHWidth (10.0993) and the perimeter of the sole (8.1037). Measurements extracted from the whole footprint have comparatively better utility than those extracted from the sole. Moreover, with respect to the average bending energy, the metatarsal ridge curve plays the most important role among the three analyzed curves, but more information about the local properties of the curves is still desired.
5 Conclusion As the first step towards planar footprint verification technology, this paper develops an automatic and reliable method of shape analysis for barefoot impression outlines and
presents a set of precise shape feature measurements. Compared with Kennedy's work, we deal with planar barefoot images and extract features from them without any human interaction, yielding more reliable measurements for further analysis. Furthermore, besides common geometrical shape features, curve characteristics are suggested for inclusion in footprint comparison. Experimental results show that our algorithm can perform feature extraction from planar barefoot images reliably. The distinguishing ability and robustness of each measurement are examined through SDRs, which are considered a reference for feature selection in footprint verification.
Acknowledgement This research is supported by the National Natural Science Foundation of China under Grant No. 60272004.
References 1. DiMaggio, J.A.: The Forensic Podiatrist and Barefoot Evidence. First International Conference on Forensic Human Identification, London, England (1999) 2. Laskowski, G.E., Kyle, V.L.: Barefoot Impressions - a Preliminary Study of Identification Characteristics and Population Frequency of Their Morphological Features. Journal of Forensic Science, Vol. 33(2) (1988) 378-388 3. Patil, M.K., Vasanth Bhat, V., Bhatia, M., et al.: New Methods and Parameters for Dynamic Foot Pressure Analysis in Diabetic Neuropathy. Proceedings of the 19th International Conference IEEE/EMBS (1997) 1826-1828 4. Kennedy, R.B.: Uniqueness of Bare Feet and Its Use as a Possible Means of Identification. Forensic Science International, Vol. 82(1) (1996) 81-87 5. Kennedy, R.B.: Statistical Analysis of Barefoot Impressions. Journal of Forensic Science, Vol. 48(1) (2003) 55-63 6. Nakajima, K., Mizukami, Y., et al.: Footprint-Based Personal Recognition. IEEE Transactions on Biomedical Engineering, Vol. 47(11) (2000) 1534-1537 7. Chang, J.S., et al.: New Automatic Multi-level Threshold Technique for Segmentation of Thermal Images. Image and Vision Computing, Vol. 15 (1997) 23-34
Statistical Neural Network Based Classifiers for Letter Recognition Burcu Erkmen and Tulay Yildirim Yıldız Technical University, Department of Electronics and Communications Engineering, 34349 Beşiktaş, İstanbul, Turkey {bkapan,tulay}@yildiz.edu.tr
Abstract. In this paper, statistical neural networks are shown to be effective classifiers for the large-sample, high-dimensional letter recognition problem. For this purpose, the Probabilistic Neural Network (PNN) and the General Regression Neural Network (GRNN) have been applied to classify the 26 capital letters of the English alphabet. Principal Component Analysis (PCA) has been employed as a feature extraction and data compression method to achieve lower computational complexity. The low computational complexity obtained by PCA provides a solution to the high-dimensional letter recognition problem for online operations. Simulation results illustrate that GRNN and PNN are suitable and effective methods for solving classification problems, with higher classification accuracy and better generalization performance than their counterparts.
1 Introduction
Letter recognition is a challenging classification problem due to the complexity of the large-sample, high-dimensional database. The identification task is also difficult due to the wide diversity among the different fonts and the primitive nature of the attributes. The benchmark data in this paper were taken from the Machine Learning Repository at the University of California, contributed by David J. Slate [1]. The paper by Frey and Slate [2] was the first article to use this dataset in a classification experiment; they investigated the ability of several variations of Holland-style adaptive classifier systems on the letter dataset and achieved a testing performance rate of 82.7%. Daqi et al. [3] recognized the English letters using modular adaptive RBF-type neural networks with 91.85% accuracy; they also achieved a recognition rate of 94.13% using combinative neural-network-based classifiers in [4]. Williamson [5] obtained a classification accuracy of 95.95% with the Gaussian ARTMAP. Bermejo et al. [6] reached a classification result of 94.29% with 14 attributes and 10,000 patterns using a principal component analysis and 1-Nearest-Neighbour (1-NN) cascade classifier. Fogarty et al. [7] achieved 95.7% accuracy with a minor variation of the k-Nearest Neighbor technique. In [8], a Bayesian decision scheme on a local ICA representation of the letter image recognition data was implemented.
The classification accuracy was 90.3% using this feature extraction technique (Bayesian Independent Component Analysis: BICA). The decision tree induction method [9], the multilayer perceptron [10], the Bayesian Maximum Likelihood classifier with one Gaussian per class (BML) [8], and the k-Nearest Neighbor classifier (k-NN) [8] have also been applied to the letter database, with accuracies under 90%. In this paper, statistical neural networks are applied to the letter recognition problem. The best classification and generalization performance was obtained using PNN, with 96.18% accuracy among the counterparts; GRNN also gave comparable results, achieving a recognition performance rate of 94.78%. The data set was compressed using the PCA feature extraction technique to reduce the computational complexity of the neural classification system. This paper is organized as follows. Section 2 briefly describes the PNN and GRNN structures. Section 3 presents some concepts related to the PCA feature extraction technique. Section 4 presents the data set used here, the GRNN and PNN classification performance, and the application of feature extraction to the data set. Finally, Section 5 outlines some conclusions.
2 Overview of Statistical Neural Networks and Dimensionality Reduction Technique
2.1 Probabilistic Neural Networks (PNN)
The probabilistic neural network (PNN) is inspired by Bayesian classification and classical estimators of probability density functions. The basic operation performed by the PNN is an estimation of the probability density function of the features of each class from the provided training samples using Gaussian kernels. These estimated densities are then used in a Bayesian decision rule to perform the classification. These operations take place in the four layers of the PNN. Since the PNN is a one-pass learning algorithm, its training time is very short in comparison to iterative backpropagation learning as used in the MLP. Detailed information about the PNN can be found in [11].
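As an illustration of this decision rule, a minimal PNN classifier can be sketched as below (our own simplification, assuming equal class priors and a single shared spread parameter):

import numpy as np

def pnn_classify(x, X_train, y_train, spread=1.0):
    """Parzen-window estimate of each class density at x; pick the largest."""
    scores = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]                  # training samples of class c
        d2 = ((Xc - x) ** 2).sum(axis=1)            # squared distances to x
        scores[c] = np.exp(-d2 / (2 * spread**2)).mean()  # Gaussian kernel average
    return max(scores, key=scores.get)              # Bayes decision (equal priors)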
2.2 General Regression Neural Networks (GRNN)
General Regression Neural Networks (GRNN) are memory-based networks that provide estimates of continuous variables and converge to the underlying regression surface. The GRNN is a one-pass learning algorithm with a highly parallel structure. The principal advantages of GRNN are fast learning and convergence to the optimal regression surface as the number of samples becomes very large. GRNN approximates any arbitrary function between input and output vectors, drawing the function estimate directly from the training data. Furthermore, it is consistent; that is, as the training set size becomes large, the estimation error approaches zero, with only mild restrictions on the function. GRNN is particularly advantageous with sparse data in a real-time environment,
because the regression surface is instantly defined everywhere, even with just one sample. The GRNN topology consists of four layers: the input layer, the hidden layer, the summation layer, and the output layer. Detailed information about the GRNN can be found in [12].
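A minimal sketch of the GRNN estimate (our own, with illustrative names) is shown below; for classification, y_train can be a one-hot matrix so that the kernel-weighted average yields class scores:

import numpy as np

def grnn_predict(x, X_train, y_train, spread=0.4):
    """Kernel-weighted mean of the training targets (Nadaraya-Watson style)."""
    d2 = ((X_train - x) ** 2).sum(axis=1)      # squared distances to all samples
    w = np.exp(-d2 / (2 * spread**2))          # Gaussian kernel weights
    return (w @ y_train) / w.sum()             # regression estimate at x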
3 Principal Component Analysis
Principal Component Analysis is a multi-variable statistical analysis technique for data compression and feature extraction. Feature extraction refers to a process whereby a data space is transformed into a feature space that, in theory, has exactly the same dimension as the original data space. However, the transformation is designed in such a way that the data set may be represented by a reduced number of effective features while retaining most of the intrinsic information content of the data; in other words, the data set undergoes a dimensionality reduction. Envisaging practical applications that require online operation of the classifiers, an attractive approach for the classifier design is to reduce the dimensionality of the N-dimensional input space by projecting the input data onto a reduced number of M directions that can facilitate the classification task. PCA describes the original data space in a basis of eigenvectors (computed from the process covariance matrix); the corresponding eigenvalues account for the energy of the process in the eigenvector directions. By restricting the data projection to the M eigenvectors with the highest eigenvalues, an effective reduction in the dimensionality of the original input space can be achieved with minimum information loss. Detailed calculations for PCA can be found in [13].
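The projection step can be sketched as follows (illustrative code, not the authors' implementation), reducing N-dimensional inputs to the M leading eigenvector directions of the covariance matrix:

import numpy as np

def pca_reduce(X, M):
    """Project data X (samples x features) onto the top-M principal directions."""
    Xc = X - X.mean(axis=0)                      # center the data
    cov = np.cov(Xc, rowvar=False)               # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:M]]  # M largest-eigenvalue vectors
    return Xc @ top                              # reduced representation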
4 Letter Recognition Problem
The aim of this study is to employ statistical neural networks to classify English letters. The letter image recognition data set [1] consists of the 26 capital letters (A-Z) in different fonts. The letters were derived from 20 fonts and randomly distorted to form black-and-white images, each a rectangular pixel display representing one of the 26 capital letters. Each character image was scanned, pixel by pixel, to extract 16 numerical attributes, such as the position and dimensions of the bounding box and the number of black pixels. These attributes represent primitive statistical features of the pixel distribution and were scaled to fit into a range of integer values from 0 through 15. The data set consists of 20,000 samples in total. The entire data set was split into training and test sets: the first 16,000 exemplars for the training set and the remaining 4,000 exemplars for the test set, as in the previous studies. Information on the 16 primitive attributes can be found in [2].
4.1 Method and Simulation Results
GRNN and PNN have been used in the classification scheme. The classification process was performed using MATLAB 7.0.
Fig. 1. Examples of the character images [2]

Table 1. Classification rate (%) comparisons for PNN and GRNN

Classifier            Train Accuracy (%)   Test Accuracy (%)
PNN (spread = 1)      100                  96.18
GRNN (spread = 0.4)   100                  94.78
Table 2. Classification results of the PNN classifier for each letter

Letter   Misclassified   Samples   Accuracy (%)
A        2               156       98.7
B        4               136       97.1
C        6               142       95.8
D        5               167       97.0
E        12              152       92.1
F        10              153       93.5
G        4               164       97.6
H        17              151       88.7
I        5               165       97.0
J        9               148       94.0
K        10              146       93.1
L        5               157       96.8
M        4               144       97.0
N        10              166       94.0
O        1               139       99.2
P        11              168       93.5
Q        5               168       97.0
R        11              161       93.2
S        1               161       99.4
T        4               151       97.4
U        2               168       98.8
V        1               136       99.3
W        3               139       97.8
X        6               159       96.3
Y        2               145       98.6
Z        3               158       98.1
The recognition percentages for these classifiers are presented in Table 1. The performance of these classifiers is affected by the choice of the spread parameter; the spread values were chosen as 0.4 for the GRNN and 1 for the PNN to achieve the maximum classification rate. The classification results of the PNN and GRNN classifiers for each letter are given in Table 2 and Table 3 respectively. PCA was applied as a feature extraction method: using PCA, the data set with a 16-dimensional input space was reduced to a 10-dimensional input space. The classification rates of the statistical neural networks are not improved by PCA; however, they remain in an acceptable range. The classification results are given in Table 4. In previous works, different methods were applied to the letter database; a comparison of the recognition accuracy of these methods and the statistical neural networks is presented in Table 5.
Table 3. Classification results of the GRNN classifier for each letter

Letter   Misclassified   Samples   Accuracy (%)
A        2               156       98.7
B        9               136       93.4
C        6               142       95.8
D        7               167       95.8
E        15              152       90.1
F        14              153       90.8
G        6               164       96.3
H        24              151       84.1
I        6               165       96.4
J        10              148       93.2
K        16              146       89.0
L        5               157       96.8
M        5               144       96.5
N        11              166       93.4
O        6               139       95.7
P        12              168       92.9
Q        8               168       95.2
R        14              161       91.3
S        5               161       96.9
T        6               151       96.0
U        2               168       98.8
V        2               136       98.5
W        3               139       97.8
X        7               159       95.6
Y        5               145       96.6
Z        3               158       98.1
Table 4. Classification rate (%) comparisons for PNN and GRNN with PCA

Method      Train Accuracy (%)   Test Accuracy (%)
PNN+PCA     100                  93.58
GRNN+PCA    100                  91.63

Table 5. Recognition Accuracy (%) Comparison of Different Methods

Method          Accuracy (%)    Method          Accuracy (%)
G-ARTMAP [5]    95.95           MLP [10]        79.03
PCA+1-NN [6]    94.29           IB1 [7]         95.7
BICA [8]        90.3            M-RBF [3]       91.85
BML [8]         87.5            Comb-NN [4]     94.13
k-NN [8]        89.9            GRNN            94.78
C4.5 [9]        77.7            PNN             96.18
5 Conclusions
In this paper, GRNN and PNN classifiers have been employed for the first time in the literature for the recognition of the capital letters of the English alphabet. When the performance of the statistical neural networks is compared with the different methods used in previous studies in terms of successful classification rates, PNN, with 96.18% accuracy, performs better than all the other methods, and GRNN, with 94.78% accuracy, also gives a comparable result. These networks have proven to be effective classifiers for this large-sample, high-dimensional dataset. The emphasis here is not only on the classifier itself: the PCA feature extraction step also achieves low computational complexity. Although the results are slightly worse than those obtained without PCA, the low computational complexity achieved by PCA helps in designing a compact neural classifier, which is attractive for online operations and hardware realizations.
References 1. Slate, D.J.: UCI Repository of Machine Learning Databases. (1991) 2. Frey, P.W., Slate, D.J.: Letter Recognition Using Holland-Style Adaptive Classifiers. Machine Learning, Vol. 6 (1991) 161-182 3. Daqi, G., Chengyin, L., Changwu, L.: Modular Adaptive RBF-Type Neural Networks for Letter Recognition. Proceedings of the International Joint Conference on Neural Networks 2003, Vol. 1 (2003) 571-576 4. Daqi, G., Chao, X., Guiping, N.: Combinative Neural-Network-Based Classifiers for Optical Handwritten Character and Letter Recognition. Proceedings of the International Joint Conference on Neural Networks 2003, Vol. 3 (2003) 2232-2237 5. Williamson, J.R.: Gaussian ARTMAP: A Neural Network for Fast Incremental Learning of Noisy Multidimensional Maps. Neural Networks, Vol. 9 (1996) 881-897 6. Bermejo, S., Cabestany, J.: Oriented Principal Component Analysis for Large Margin Classifiers. Neural Networks, Vol. 14 (2001) 1447-1461 7. Fogarty, T.: First Nearest Neighbor Classification on Frey and Slate's Letter Recognition Problem. Machine Learning, Vol. 9 (1992) 387-388 8. Bressan, M., Guillamet, D., Vitria, J.: Using an ICA Representation of High Dimensional Data for Object Recognition and Classification. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 1 (2001) I-1004 - I-1009 9. Quinlan, J.: C4.5: Programs for Machine Learning. Morgan Kaufmann, CA (1993) 10. Anand, R., Mehrotra, K.M., Mohan, C.K., et al.: Efficient Classification for Multiclass Problems Using Modular Neural Networks. IEEE Transactions on Neural Networks, Vol. 14 (1995) 117-124 11. Specht, D.F.: Probabilistic Neural Networks. Neural Networks, Vol. 3 (1990) 109-118 12. Specht, D.F.: A General Regression Neural Network. IEEE Transactions on Neural Networks, 2(6) (1991) 568-576 13. Haykin, S.: Neural Networks: A Comprehensive Foundation. New York: Macmillan College Publishing Company (1994) 363-370
The Study of Character Recognition Based on Fuzzy Support Vector Machine∗ Yongjun Ma College of Computer Science and Information Engineering, TUST. Tianjin University of Science and Technology, Tianjin, China [email protected]
Abstract. The support vector machine (SVM) and v-SVM are novel types of learning machines, which have been shown to provide better generalization performance than traditional techniques. This paper introduces a new type of fuzzy support vector machine (Fv-SVM) based on v-SVM. The new algorithm considers that the input samples make different contributions to the final result, so fuzzy memberships are used to determine the effects of the input samples. The paper also discusses in detail the core algorithms that determine the fuzzy memberships based on kernel methods. In the experiments, Fv-SVM is used for character recognition. The results show that Fv-SVM has a low error rate and better generalization ability.
1 Introduction The support vector machine (SVM) is a novel type of learning machine based on statistical learning theory; that is, an SVM is an approximate implementation of the method of structural risk minimization. SVM has been shown to provide better generalization performance than traditional techniques, including neural networks [1]. The v-support vector machine (v-SVM) is proposed on the basis of SVM and inherits its advantages; v-SVM introduces a new parameter v to control the classification accuracy instead of the parameter C in SVM [2]. Both SVM and v-SVM are fed with input samples of equal importance. In fact, the contributions of the samples to classification are not the same. Some samples are support vectors (SVs) and decide the classification results; others contribute little to the final result, may even become noise, and lower the classification accuracy. So it is necessary for the classifier to have the ability to evaluate the input samples. Unfortunately, v-SVM has no ability to abandon these meaningless samples. If the contributions of the input samples are estimated properly before feeding them into v-SVM, the classification performance of v-SVM will be improved. In this paper, we construct a new classifier named Fv-SVM by introducing fuzzy membership into v-SVM. Fuzzy membership can be used to estimate the importance of input samples. By marking input samples with different fuzzy memberships according to their different contributions, we can lower the influence of outliers and noise, and finally improve the classification performance [3]. ∗
This research is sponsored by a grant of Tianjin Key Technologies R&D Program under contract 04310951R.
This paper is organized as follows. In Section 2 we discuss the construction of Fv-SVM. Section 3 shows the kernel-method-based fuzzy membership determination. The obtained experimental results are illustrated in Section 4. Finally, Section 5 summarizes the conclusions that can be drawn from the presented research.
2 Fuzzy Support Vector Machine: Fv-SVM 2.1 The Construction of Fv-SVM Given input training vector x i ∈ R d , i = 1, … , n , xi belongs to one of two classes, which marked with yi yi, where 0 ≤ τ i ≤ 1 .
{-1,1}, IJi is membership that indicates the extent xi belongs to
x 1 y1 IJ 1 … x i yi IJ i … x n yn IJ n Given z = ϕ (x ) is the corresponding character vector input space vector x mapping into feature space Z by function ϕ . IJi is Fuzzy membership which indicates the extent xi belongs to one of the two classes. Just like SVM, the purpose of Fv-SVM is to construct classification hyperplane which can maximise distance to two classes, Therefore, the optimization problem is converted as following. min 1 w
2
subject to
2
− vρ +
1 n ¦τ iξ i n i =1
y i ( w ⋅ x i + b) ≥ ρ − ξ i , i = 1, … , n , ξ i ≥ 0,
(1)
ρ ≥0
(2)
Notice that $\tau_i$ reduces the impact of the slack variable $\xi_i$ on the SVM; at the same time $\tau_i$ also controls the impact of the corresponding input sample $x_i$, so the smaller $\tau_i$ is, the less important the input sample $x_i$ becomes. Thus we can determine the effect of each input sample on the classification according to some rules. These rules are directly based on $\tau_i$, so the value of $\tau_i$ is very important. To solve the above optimization problem, we construct the following Lagrangian:

$$L_p(w, \xi, b, \rho, \tau, \alpha, \beta, \delta) = \frac{1}{2}\|w\|^2 - \nu\rho + \frac{1}{n}\sum_{i=1}^{n}\tau_i\xi_i - \sum_{i=1}^{n}\alpha_i\{y_i[(w \cdot x_i) + b] - \rho + \xi_i\} + \sum_{i=1}^{n}\beta_i\xi_i - \delta\rho \qquad (3)$$

$$\text{s.t.} \quad 0 \le \tau_i \le 1, \quad \alpha_i, \beta_i, \delta \ge 0 . \qquad (4)$$

In order to find the saddle point of $L_p$, we differentiate equation (3) with respect to $w, \xi, b, \rho$ and set the results equal to zero; then we obtain:
$$w = \sum_{i=1}^{n}\alpha_i y_i x_i, \quad \alpha_i + \beta_i = \frac{\tau_i}{n}, \quad \sum_{i=1}^{n}\alpha_i y_i = 0, \quad \sum_{i=1}^{n}\alpha_i - \delta = \nu . \qquad (5)$$
By noting $\alpha_i, \beta_i, \delta \ge 0$ and $0 \le \tau_i \le 1$, considering the kernel function, and substituting the four equations above into (3), the problem above becomes the QP problem

$$\max \; Q_p(\alpha) = -\frac{1}{2}\sum_{i,j=1}^{n}\alpha_i\alpha_j y_i y_j k(x_i, x_j) \qquad (6)$$

$$\text{subject to} \quad 0 \le \alpha_i \le \frac{\tau_i}{n}, \quad \sum_{i=1}^{n}\alpha_i y_i = 0, \quad \sum_{i=1}^{n}\alpha_i \ge \nu , \qquad (7)$$

where $\alpha_i$ is the Lagrange multiplier corresponding to each sample.
Using the superscript * to denote the optimal values of the cost function, from equations (5) and (2) we derive

$$w^* = \sum_{i=1}^{n}\alpha_i^* y_i x_i, \qquad b^* = \frac{\rho - \xi_i^*}{y_i} - w^* \cdot x_i , \qquad (8)$$

where $b^*$ is the classification threshold. Note that most of the $\alpha_i^*$ are usually null; only the samples with nonzero $\alpha_i^*$ satisfy the constraints with equality. These points are termed support vectors because they are the only points of the input set needed to determine the optimal separating hyperplane, and the decision function becomes

$$f_p(x) = \mathrm{sgn}\left(\sum_{i=1}^{n}\alpha_i^* y_i k(x_i, x) + b^*\right) . \qquad (9)$$
From the above we can see that the decision function consists only of inner products $(x_i \cdot x)$, which can be replaced by the kernel $K(x_i, x)$; thus the input space is mapped into a high-dimensional feature space through the nonlinear mapping defined by the kernel function, and the linearly nonseparable problem becomes linearly separable. 2.2 Algorithm Complexity Analysis The computational complexity of an SVM classifier is decided by the size $N$ of the training set, the dimension $d$ of the input space, the number $N_{SV}$ of SVs, and their distribution. Experiments show that in most cases $N_{SV}/N \ll 1$, and in the support vector set most support vectors are not boundary support vectors, so the computational complexity of the SVM classifier is $O(N_{SV}^3 + N N_{SV}^2 + d N N_{SV})$ [4]. The parameter $\nu$ of $\nu$-SVM does not change the computational complexity. Furthermore, the parameter $\tau_i$ of Fv-SVM does not add any computational burden to the classifier; on the contrary, Fv-SVM can abandon some input samples, which are mostly not SVs, when $\tau = 0$ or $\tau \to 0$. The training set thus becomes smaller, that is, $N' < N$, and the computational complexity is $O(N_{SV}^3 + N' N_{SV}^2 + d N' N_{SV})$, where $N_{SV}$ remains constant. From the above we can see that the computational complexity is lower than that of $\nu$-SVM and SVM. In Fv-SVM, $\xi_i$ is used to measure the extent to which a sample is misclassified, and $\tau_i$ is the fuzzy membership that indicates the contribution of $x_i$ to the classification, so $\tau_i\xi_i$ is a weighted measure of the classification error. From the constraint in equation (7), we can see that $\alpha_i$ is influenced by $\tau_i$, while $\alpha_i$ is the weight corresponding to the SVs. So we can control the weights of the SVs by tuning $\tau_i$, and finally control the classification results.
3 The Computation of τi Based on Kernel Methods
The purpose of SVM is to maximize the margin between two classes, that is, to minimize $\|w\|$. From this point of view, the Euclidean distance between the two classes has a close relationship with the decision function: the nearest points between the two classes will be SVs, while far points have a low possibility of being SVs [5]. Given two sample sets $S^+$ and $S^-$ of size $s_0$, where $x_i$ is an input sample, $y_i = \pm 1$, and $\Phi$ is the map from input space to feature space, the squared Euclidean distance between two samples $\Phi(x_i)$ and $\Phi(x_j)$ in the high-dimensional feature space is

$$d^2(\Phi(x_i), \Phi(x_j)) = \|\Phi(x_i) - \Phi(x_j)\|^2 = K(x_i, x_i) + K(x_j, x_j) - 2K(x_i, x_j) . \qquad (10)$$

For each sample in one class, we first find the $k_s$ nearest samples in the other class based on the kernel function. These $k_s$ samples form a new set, marked as $S_i^0$, $i = 1, 2, 3, \ldots$ After each sample is processed by the method above, all the $S_i^0$ are merged into a new set named $S_{new}$, written as $S_{new} = \bigcup_i S_i^0$. In most cases it includes only the samples near the optimal separating plane. Let $\tau_i$ of the samples lying outside $S_{new}$ be zero. Considering the distance from each sample of $S_{new}$ to its $k_s$ nearest samples of the other class, let $x_i \in S^+$, $x_j \in S^-$, $i, j = 1, 2, \ldots, s_0$; the fuzzy membership of $x_i$ can then be defined as

$$\tau_i = \frac{\sum_{j=1}^{k_s} d^2(\Phi(x_i), \Phi(x_j))}{\sum_{j=1}^{s_0} d^2(\Phi(x_i), \Phi(x_j))} , \qquad (11)$$

where $0 \le \tau_i \le 1$.
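A sketch of this membership computation is given below (our own reading of equations (10)-(11); in particular, the normalization over all opposite-class samples in the last line is an assumption made to keep each membership within [0, 1]):

import numpy as np

def kernel_distance2(k, xi, xj):
    """Squared feature-space distance via the kernel trick (equation (10))."""
    return k(xi, xi) + k(xj, xj) - 2 * k(xi, xj)

def fuzzy_memberships(S_pos, S_neg, k, ks=5):
    """Fuzzy membership of each positive sample from kernel distances."""
    taus = []
    for xi in S_pos:
        d2 = np.array([kernel_distance2(k, xi, xj) for xj in S_neg])
        near = np.sort(d2)[:ks].sum()      # k_s nearest samples of the other class
        taus.append(near / d2.sum())       # assumed normalization, cf. equation (11)
    return np.array(taus)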
4 Experiments Character recognition of car cards (license plates) is widely used in intelligent transportation systems. The accuracy of character recognition is important in such applications, especially when the image quality of the car card is poor. In the following experiments, Fv-SVM-based character recognition is studied in two cases: car card images of good quality and images blurred by noise [6]. The experimental environment is PIV 1.7G / 512M memory / Win2000 / Matlab 6.5 / VC++ 6.0. The input samples are car card characters consisting of 90 groups: 10 numbers, 26 letters, and 54 Chinese characters. The features are extracted from the samples by grid net density. From each group, 20 samples are selected to build up the input sample set, which contains 1800 samples. In the experiments we choose a second-order polynomial function as the kernel function. The original input images are shown in figure 1(a); in figure 1(b) the original images have been corrupted with Gaussian noise and pepper & salt noise.
Fig. 1. Input samples. (a) Original images. (b) Noise-corrupted images, with pepper & salt noise of density 0.6.
We use SVM, v-SVM, and Fv-SVM to classify the samples shown in figure 1(a). The results are shown in Table 1.

Table 1. Comparison among Fv-SVM, v-SVM and SVM

Classifier         Number of SVs   Time (s)   Accuracy (%)
Fv-SVM (v = 0.3)   196             126        97.3
v-SVM (v = 0.3)    135             93         84.5
SVM (C = 200)      137             81         84.2
In Table 1, there are 35 SVs in Fv-SVM; its accuracy rate is the best among the three classifiers, reaching 97.3%, but Fv-SVM is also time-consuming. In Table 2 we compare the classification results for original and noise-corrupted images.

Table 2. Comparison between original images and noise-corrupted images

Noise                  Number of SVs   Accuracy (%)
Original images        196             97.3
Pepper & salt 0.2      189             94.8
Pepper & salt 0.6      125             87.1
Pepper & salt 0.8      93              45.8
Gaussian noise 0.2     195             92.3
Gaussian noise 0.6     198             79.2
Gaussian noise 0.8     143             40.8
From the table we can conclude that Fv-SVM has good tolerance to noise: when pepper & salt noise or Gaussian noise reaches a density of 0.6, the classification accuracy is still acceptable.
5 Conclusions In this paper, Fv-SVM is proposed on the basis of v-SVM, considering that different samples make different contributions to the final classification results. The different contributions are measured by fuzzy memberships, which can be determined by kernel methods; the classification accuracy is thereby improved. The experiments show that Fv-SVM has high accuracy. On the other hand, Fv-SVM is also time-consuming; how to balance accuracy and time consumption is worth further study.
References 1. Ma, Y.J., Kong, B.: A Study of Object Detection Based on Fuzzy Support Vector Machine and Template Matching. IEEE Proceedings of the 5th World Congress on Intelligent Control and Automation, Vol. 5, Hangzhou, P.R. China (2004) 4137-4140 2. Scholkopf, B., Smola, A.J.: New Support Vector Algorithms. NeuroCOLT2 Technical Report NC2-TR-1998-031, GMD First and Australian National University (1998) 3. Lin, C.F., Wang, S.D.: Fuzzy Support Vector Machines. IEEE Transactions on Neural Networks, Vol. 13, No. 2 (2002) 464-471 4. Burges, C.J.C.: A Tutorial on Support Vector Machines for Pattern Recognition. Knowledge Discovery and Data Mining, Vol. 2, No. 2 (1998) 235-244 5. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer-Verlag Press (2004) 6. Ma, Y.J.: The Study of SVM and Its Applications in Image Analysis. Ph.D. Dissertation (in Chinese). Hefei: University of Science and Technology of China (2002)
Tracking, Record, and Analysis System of Animal’s Motion for the Clinic Experiment Jae-Hyuk Han , Young-Jun Song , Dong-Jin Kwon , and Jae-Hyeong Ahn Chungbuk National University, 12 Gaeshin-dong, Heungduk-gu, Chungbuk, Korea {[email protected], [email protected], [email protected], [email protected]}
Abstract. This paper presents a system that accurately records an animal's motion during clinical experiments, using a camera connected to a computer, and analyzing and serving data. Using images input through a general CCD camera, the system separates background and animal, and stores the location and state. Existing systems support tracking of a single animal and analysis of path, velocity, and so on. However, the proposed system supports multiple animal tracking and analysis of animal conditions by modeling the shape of the animal.
1 Introduction The necessity of automatic systems, such as surveillance systems, has increased. In traditional computer vision, systems that recognize moving objects from digital images have been researched. This research can be applied to various automated tracking and detection systems, such as car tracking, traffic control systems, precision automation systems, military operations, and industrial automation [1][2]. Tracking systems for animal motion are gradually becoming popular in the field of clinical experiments. In the majority of animal experiments, recording and writing down animal motion depends entirely on human sight. In order to create an automatic system, new experimental equipment replacing human sight is required [3]. The focus of this paper is the creation of a system using a general CCD camera for animal motion tracking. Among the various methods of tracking animal motion, the most common include correlation, spatio-temporal gradient, block matching [4], using difference images between frames, and so on. In this paper, an adaptive threshold method is used: from the difference images between frames, the moving animal is separated from the background, only the parts corresponding to the animal are extracted, and data modeling is employed in order to analyze and record the result. Hu's moment invariants (HMI), truncated to the first four values [5][6], are used for modeling the experimental animal contour. In section 2, the system composition is explained. In section 3, the tracking and analyzing system is described. Finally, in section 4, the conclusion is presented.
2 System Composition The proposed system consists of a frame grabber for digitalizing images input from a camera, as presented in figure 1. The tracking program manages the digital images,
separates the experimental animals from the background, and models and stores the result. The analysis program analyzes the tracking data and then extracts the experimental data. In the majority of cases, a 4-arm or 8-arm maze is used, and the camera is installed on a stand. The camera on the stand can photograph the maze vertically and move easily. The image capture board converts the analog TV signal input into digital data. The suggested system manufactures and supplies its own image board to speed up processing time and reduce the risk of illegal software duplication.
Fig. 1. The tracking system for animal motion and the image capture board, where (a) is the tracking system and (b) is the image capture board
3 Tracking System and Analysis System 3.1 Tracking System The experiment is conducted in a controlled environment: the illumination and background are set up artificially. Consequently, the experimental animal can be extracted from the input image given an image of the background without the animal. The proposed system captures the background in advance and tracks animals by comparing it with the images containing the animal. At first, an image without the animal is stored as the background image. When the real experiment begins, the absolute difference between the input image and the background image is computed; a pixel whose absolute difference from the background image is greater than the experimental threshold is recognized as pertaining to the experimental animal and stored. In case the experimental animal enters a shaded area or the animal color is similar to the background color, the animal may not be recognized well because the absolute difference between the animal and the background is less than the threshold. In order to solve this problem, a white and black maze is used as the background. However, on such a black background, shade occurs along the wall of the maze. When the animal enters the shaded area, it can happen that the absolute difference between the background and the animal
Fig. 2. Tracking Program Processing Screen for 2 animals
cannot be recognized. In the proposed system, to solve this problem, the threshold is varied adaptively. In the two-dimensional discrete image space $(i, j)$, let the pixel values of the $M \times N$ background image be $f_B(i, j)$ and those of the input image in which the animal exists be $f_I(i, j)$. The binarization is given by equation (1), which compares the absolute difference of the two images against the threshold $T$:

$$f_o(i, j) = \begin{cases} 1 & \text{if } |f_B(i, j) - f_I(i, j)| > T \\ 0 & \text{else} \end{cases} \qquad (1)$$

A pixel with $f_o(i, j) = 1$ is regarded as pertaining to the animal. However, in case $f_B(i, j)$ has a middle-level value corresponding to a shaded area, a lower threshold should be applied. Accordingly, the threshold $T$ is adapted as in equation (2), according to the value of the background image pixel:

$$T' = T\left(1 - \frac{128 - |f_B(i, j) - 128|}{256}\right) \qquad (2)$$
Once the threshold is applied by the above equation, the animal can be extracted well even if it enters the shaded area. The program processing screen of the suggested tracking system is presented in figure 2. Users can set the experimental time and frequency. In addition, the view can be divided into areas, making it possible to trace several moving animals simultaneously; the maximum number of tracking areas is four, as shown in figure 3. Similar to the water maze test, for experiments measuring the time the experimental animal requires to reach a specific area, if the goal area is set up, the arrival times can be compared. When the animal reaches the goal area, the experiment is finished automatically.
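A compact sketch of the segmentation of equations (1)-(2) (our own illustrative code, assuming 8-bit grayscale frames as NumPy arrays) is:

import numpy as np

def segment_animal(background, frame, T=30):
    """Adaptive-threshold background differencing per equations (1)-(2):
    the threshold is lowered for mid-level (shaded) background pixels."""
    fb = background.astype(float)
    T_prime = T * (1 - (128 - np.abs(fb - 128)) / 256)   # equation (2)
    diff = np.abs(fb - frame.astype(float))
    return (diff > T_prime).astype(np.uint8)             # 1 = animal pixel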
3.2 Analysis System 3.2.1 Basic Feature Expression of the Object through Shape Descriptors The features used to compare and analyze both the form and the behaviour of the experimental animal are its size, location, and direction. Most importantly, prior to obtaining the features, it is assumed that a binary animal image $B$ of size $m \times n$ is given. The size feature is expressed as the number of pixels occupied by the animal:

$$A = \sum_{i=1}^{n}\sum_{j=1}^{m} B(i, j) \qquad (3)$$
Fig. 3. Shape descriptor
In a binary image, as the experimental animal's central location is the same as the centroid of its area, the central location $(x_c, y_c)$ can be obtained by equation (4):

$$x_c = \frac{\sum_{i=1}^{n}\sum_{j=1}^{m} j\,B(i, j)}{A}, \qquad y_c = \frac{\sum_{i=1}^{n}\sum_{j=1}^{m} i\,B(i, j)}{A} \qquad (4)$$
The shape descriptor used in this paper is presented in figure 3 [7]. Once the animal's central location is fixed, the animal's pose can be calculated by using the central location and obtaining its major axis, minor axis, and slope as in equation (5). In the animal experiment, as the same animal's motion is recorded for the whole experimental time, the long and short lengths are measured in the labeled area; therefore, by comparing the long and short lengths, it can be determined whether the animal stands, crouches, or moves. Equations (5a) and (5b) are the second moments about the x and y axes, equation (5c) is the mixed second moment, equations (5d) and (5e) are the major and minor axes respectively, and equation (5f) is the direction:

$$\mu_{xx} = \frac{1}{A}\sum_{(x,y)\in R}(x - x_c)^2 \qquad (5a)$$

$$\mu_{yy} = \frac{1}{A}\sum_{(x,y)\in R}(y - y_c)^2 \qquad (5b)$$

$$\mu_{xy} = \frac{1}{A}\sum_{(x,y)\in R}(x - x_c)(y - y_c) \qquad (5c)$$

$$a = 2\sqrt{2}\sqrt{\mu_{xx} + \mu_{yy} + \sqrt{(\mu_{xx} - \mu_{yy})^2 + 4\mu_{xy}^2}} \qquad (5d)$$

$$b = 2\sqrt{2}\sqrt{\mu_{xx} + \mu_{yy} - \sqrt{(\mu_{xx} - \mu_{yy})^2 + 4\mu_{xy}^2}} \qquad (5e)$$

$$\theta = \begin{cases} \tan^{-1}\dfrac{-2\mu_{xy}}{(\mu_{xx} - \mu_{yy}) + \sqrt{(\mu_{xx} - \mu_{yy})^2 + 4\mu_{xy}^2}}, & \text{if } \mu_{yy} \ge \mu_{xx} \\ \tan^{-1}\dfrac{(\mu_{yy} - \mu_{xx}) + \sqrt{(\mu_{yy} - \mu_{xx})^2 + 4\mu_{xy}^2}}{-2\mu_{xy}}, & \text{otherwise} \end{cases} \qquad (5f)$$
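The following sketch (our own; the single arctan2 expression is a compact equivalent of the two-case equation (5f)) computes the axes and orientation of a binary region:

import numpy as np

def region_axes(mask):
    """Major/minor axis lengths and orientation of a binary region (eqs (3)-(5))."""
    ys, xs = np.nonzero(mask)                         # foreground pixel coordinates
    xc, yc = xs.mean(), ys.mean()                     # centroid, equation (4)
    mxx = ((xs - xc) ** 2).mean()                     # (5a)
    myy = ((ys - yc) ** 2).mean()                     # (5b)
    mxy = ((xs - xc) * (ys - yc)).mean()              # (5c)
    root = np.sqrt((mxx - myy) ** 2 + 4 * mxy ** 2)
    a = 2 * np.sqrt(2) * np.sqrt(mxx + myy + root)    # major axis, (5d)
    b = 2 * np.sqrt(2) * np.sqrt(mxx + myy - root)    # minor axis, (5e)
    theta = 0.5 * np.arctan2(2 * mxy, mxx - myy)      # orientation, cf. (5f)
    return a, b, theta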
The shape descriptor used in the proposed system is the HMI, derived by Hu as nonlinear combinations of geometrical moments. The main feature of the HMI is its invariance under various transformations of the object, such as location, rotation, and scale. In a two-dimensional discrete image space, if the pixel values of an $M \times N$ image are $f(i, j)$, the moment $m_{pq}$ of order $(p + q)$ is

$$m_{pq} = \sum_{i=0}^{M}\sum_{j=0}^{N} i^p j^q f(i, j) \qquad (6)$$
If only one object exists and all background values in the image are zero, the center $(x_c, y_c)$ of the object can be determined using equation (7):

$$x_c = m_{10}/m_{00}, \qquad y_c = m_{01}/m_{00} \qquad (7)$$
Here the zeroth-order moment is the object area, and the first-order moments $(m_{10}, m_{01})$ carry the distribution along the $i$ and $j$ axes. After obtaining the center, the central moment and the normalized central moment are found by summing the pixel values weighted by coordinates relative to the center, as in equation (8):

$$\mu_{pq} = \sum_{x}\sum_{y}(x - x_c)^p (y - y_c)^q f(x, y) \qquad (8a)$$

$$\eta_{pq} = \mu_{pq} / \mu_{00}^{(p+q+2)/2} \qquad (8b)$$
Using the normalized central moments, the HMI can be determined [5][6]. The HMI originally consists of 7 values. Even though the values from M5 to M7 can describe the object more precisely, these components are individually very sensitive to noise. In this system, only the four values from M1 to M4 are used, to reduce computation and obtain a fast response time while still perceiving variations of the animal's outer shape.
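For reference, the first four Hu invariants in terms of the normalized central moments are M1 = η20 + η02, M2 = (η20 − η02)² + 4η11², M3 = (η30 − 3η12)² + (3η21 − η03)², and M4 = (η30 + η12)² + (η21 + η03)²; a sketch of their computation for a binary mask (our own code) is:

import numpy as np

def hu_m1_to_m4(mask):
    """First four Hu moment invariants of a binary region."""
    ys, xs = np.nonzero(mask)
    xc, yc = xs.mean(), ys.mean()
    m00 = float(xs.size)                              # area = mu_00 for binary f
    def eta(p, q):                                    # normalized central moment (8b)
        mu = ((xs - xc) ** p * (ys - yc) ** q).sum()
        return mu / m00 ** ((p + q + 2) / 2)
    M1 = eta(2, 0) + eta(0, 2)
    M2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    M3 = (eta(3, 0) - 3 * eta(1, 2)) ** 2 + (3 * eta(2, 1) - eta(0, 3)) ** 2
    M4 = (eta(3, 0) + eta(1, 2)) ** 2 + (eta(2, 1) + eta(0, 3)) ** 2
    return M1, M2, M3, M4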
3.2.2 Functions of the Analysis System First, the animal's trace over the experimental time can be seen. As the starting point appears red and the final point appears blue, the trace of a moving object can be visually distinguished. Second, the total distance the animal moves and its average motion speed can be measured. For the speed, the user can define speed sections and measure them; for example, the low-speed section can be separated from the high-speed section and the motion time measured for each. Third, to make it possible to analyze the distance the animal moves in each part of the maze, the tracking area is divided into districts, the motion distance and occupancy time are measured, and the share of each district relative to the total is shown as a percentage. Fourth, every analysis is conducted on the tracking area designated by the user; in the analysis program, the user can freely designate and analyze the area. Fifth, after the experiment finishes, new and additional analyses are possible, as the user can replay and watch the experiment. Existing systems do not suggest a method to measure and analyze object shape: only the object's central location is recorded, which allows analysis only of motion distance, speed, and trace. The developed system proposes a method to model and record the animal's shape, analyzing this data and extending the analysis functions. This makes it possible to analyze data of specific forms such as the following and offer them to users.
Fig. 4. Analysis program processing screen
First, the long and short axes of the animal are measured and a graph is offered to compare their relative ratio. The short axis, the narrowest horizontal extent of the trunk, hardly changes during the experimental time; the long axis, however, covers the body length from tail to head and changes continuously. If the variation of the relative ratio between the long and short axes is relatively low, it
can be concluded that the animal is performing a specific behavior. When the animal stands or crouches, the body length becomes short, which can be used as a criterion to distinguish abnormal animal behavior. This behavior is called rearing, and it can be utilized as analysis data by offering a graph as presented in figure 5.
Fig. 5. Result graphs, where (a) is the long and short axis ratio graph and (b) is the HMI graph
Second, the developed system uses the moment invariants as the method of modeling the experimental animal contour. The user can grasp a change in the contour through the variation of the HMI values and detect abnormal behavior.
4 Conclusion The system proposed in this paper consists of a general image capture board, a tracking program which operates with its driver in a Windows environment, and an analysis program. An automated experiment is made possible when the proposed system is deployed; therefore, the system can save labor and continue the experiment without limitation as long as disk space is sufficient. Existing systems can only measure in restricted situations; since the proposed system can distinguish the animal regardless of background and illumination, it makes it possible to conduct several kinds of experiments, measured in any situation. In addition, when an animal experiment is conducted using manual records by hand, the experimental data are not objective, and the experimental result is sometimes not acknowledged externally. Since the measuring method of this system extracts object data using an instrument, the experimental result can be confirmed, granted objectivity, and, if needed, analyzed several times. The developed technique extracts a moving animal from an image and models, recognizes, and tracks it in various situations. Accordingly, the result of this study is directly applicable to surveillance and monitoring systems, and so on.
References 1. Naohiro, A., Akihiro, F.: Detection of Obstructions and Tracking of Moving Objects by Image Processing Technique. Electronics and Communications in Japan, Vol. 82 (1999) 28-33 2. Betke, M., Haritaoglu, E., Davis, L.S.: Real-Time Multiple Vehicle Detection and Tracking from a Moving Vehicle. Machine Vision and Applications, 12:2 (2000) 69-72 3. http://vision.fe.uni-lj.si/research/trackan/index.html 4. Gharavi, H., Mills, M.: Block-Matching Motion Estimation. IEEE Transactions on Communications, Vol. 38 (1990) 950-953 5. Gouda, I.S., Lynn, A.A.: Moment Invariants and Quantization Effects. IEEE Proceedings (1998) 157-163 6. Mamistvalov, A.G.: N-Dimensional Moment Invariants and Conceptual Mathematical Theory of Recognition of N-Dimensional Solids. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 8 (1998) 7. Haralick, R.M., Shapiro, L.G.: Computer and Robot Vision. Addison-Wesley, Vol. 1 (1992)
VEP Estimation with Feature Enhancement by Whiten Filter for Brain Computer Interface* Jin-an Guan School of Electronic Engineering, South-Central University for Nationalities, Wuhan, 430074, China [email protected]
Abstract. An imitating-natural-reading paradigm was used to induce robust visual evoked potentials (VEPs), which serve as carriers for a brain-computer-interface-based mental speller. A support vector machine (SVM) was adopted for single-trial classification of the features. To improve the accuracy of pattern recognition, and thus to boost the bit rate of the whole system, a 300 ms window was used to estimate the accurate time of target stimulus presentation from the EEG signals. As spontaneous EEG can be regarded as a stationary random process over a short period, a whitening filter was constructed from the AR parameters calculated from the non-target-induced signals. The recorded signals were then passed through this filter, whitening the spontaneous EEG. Finally, a wavelet method was applied to filter the whitened signals. The results boosted the classification accuracy by enhancing the target signals.
1 Introduction We are now constructing a mental speller based on a novel technique called the Brain-Computer Interface (BCI). A BCI provides a direct communication channel from the user's brain to the external world by reading the electrical signatures of the brain's activity and its responses to external stimuli. These responses can then be translated into computer commands, thus providing a communication link, particularly for people with severe disabilities [1], [2]. Visual evoked potentials (VEPs) are usually exploited as communication carriers between brain and computer in a mental speller paradigm. The measured responses are often considered a combination of electrical activity resulting from multiple brain generators, active in association with the eliciting event, and noise, which is brain activity unrelated to the stimulus, together with interference from non-neural sources such as eye blinks and other artifacts. Even though they are dominated by lower frequencies compared to the background electroencephalogram (EEG), their form is difficult to estimate on a trial-by-trial basis due to poor signal-to-noise ratio (SNR) conditions [3],[4]. As spontaneous EEG can be regarded as a stationary random process over a short period, we can process the signals using stationary signal models [5]. In order to
This work is supported by NSF of South-Central University for Nationalities Grant # YZZ05015 to Jin-an Guan.
identify patterns from EEG signals for the speller application, we use a support vector machine (SVM) as our classifier [6]. To speed up the bit rate of our system, a relatively short time window for detecting the evoked potentials is preferred. This paper presents a novel VEP component, the N2, which has not been reported in the literature as a feature for BCI. Before the features are input into the classifier, an AR whitening filter procedure is applied to enhance the N2 components.
2 Methods 2.1 Experimental Setup and Data Acquisition The experimental model and data come from the cognitive laboratory at the South-Central University for Nationalities of China. The objective of the experimental data acquisition was to obtain EEG signals during the imitating-natural-reading paradigm with target onset and non-target onset. EEG activity was recorded from Fz, Cz, Pz, and Oz. Subjects viewed a monitor with a window in the center, 16×16 pixels in size, containing gray patterns against a black background. A continuous symbol string consisting of target and non-target symbols moved through the window smoothly from right to left at a speed of 160 ms/symbol. Each epoch started with a short tone reminding the subject to focus on the window, where non-target symbols were moving continuously. The delay between the start time and the appearance of the target symbol varied randomly between 1.5-3 s. In each trial, acquisition of EEG started 320 ms (for subject H; 210 ms for subjects M and T) before target onset and halted 880 ms (for subject H; 990 ms for subjects M and T) after target presentation; thus 512 samples were collected over 1.2 seconds. A more detailed description of the experiment can be found in [7]. 2.2 Feature Enhancement Using a Whitening Filter Spontaneous EEG is the comprehensive electro-physiological reflection of the neural system on the cortex or scalp. It runs ceaselessly as long as a person is alive, no matter whether he or she is actively thinking, passively stimulated, or unconscious. Its generating mechanism is very complex, but spontaneous EEG can be regarded as a stationary random process over a short period, so we can process the signals using stationary signal models. For an autoregressive (AR) model, denoting the white noise by w(n), the output of the spontaneous EEG (denoted E(n)) through the system A(z) is white noise:
w( n) = ¦ ak E (n − k ) . k =0
(1)
where $a_k$ are the parameters of the AR model and $p$ is its order. Now presume that the trial recordings are only a linear mixture of the event-related potential (ERP) and the spontaneous EEG, and further hypothesize that the two are independent of each other; then the recorded signal $x(n)$ of every stimulus can be represented as $x(n) = s(n) + E(n)$. (2)
where $s(n)$ is the ERP component and $E(n)$ the spontaneous EEG. In order to discriminate the EEG of target and non-target stimuli, we take the non-target EEG as the spontaneous EEG, i.e. $E(n)$. To wipe off $E(n)$, three steps are taken: (1) calculate the parameters of the AR model using the non-target stimulus signals to construct a whitening filter; (2) filter $x(n)$ by the whitening filter:
p
p
k =0
k =0
k =0
y(n) = ¦ ak x(n − k ) = ¦ ak {s(n − k ) + E (n − k )} = ¦ ak s(n − k ) + w(n) .
(3)
p
Let Y (n) = ¦ ak s (n − k ) as the result of ERP through the whiten filter, we have: k =0
y ( n ) = Y ( n ) + w( n ) .
(4)
(3) filter the whitened signals with the Mallat wavelet algorithm. Fig. 1 shows the 84-trial averaged waveforms of the target (thick dashed line) and non-target stimuli from subject H. Fig. 2 shows the whitened signals after wavelet filtering. Comparing Fig. 1 with Fig. 2, we find that, apart from a few points at the beginning affected by the cutting-edge effect, the non-target signal becomes smoother and the target signal becomes more prominent. This suggests that the spontaneous EEG was rejected effectively.
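As a concrete illustration of steps (1) and (2), the sketch below fits the AR parameters by least squares on non-target EEG and applies the resulting whitening filter; the AR order and the least-squares fit are our illustrative choices, and the wavelet step (3) is only indicated in a comment.

```python
import numpy as np

def fit_ar(e, p):
    """Fit AR(p) parameters to spontaneous (non-target) EEG e(n) by least squares.

    Returns a = [a_0 = 1, a_1, ..., a_p] such that w(n) = sum_k a_k * e(n - k)
    is approximately white, matching Eq. (1)."""
    # Regress e(n) on its p past values: e(n) ~ -a_1 e(n-1) - ... - a_p e(n-p)
    X = np.column_stack([e[p - k:len(e) - k] for k in range(1, p + 1)])
    y = e[p:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.concatenate(([1.0], -coeffs))

def whiten(x, a):
    """Whitening filter of Eq. (3): y(n) = sum_k a_k * x(n - k)."""
    return np.convolve(x, a)[:len(x)]

# a = fit_ar(non_target_epoch, p=10)   # step (1); non_target_epoch is hypothetical data
# y = whiten(target_epoch, a)          # step (2)
# step (3) would wavelet-filter y, e.g. with PyWavelets (pywt.wavedec / pywt.waverec)
```

Here the non-target epochs play the role of $E(n)$; filtering a target epoch $x(n)$ with the same coefficients then yields $y(n) = Y(n) + w(n)$ as in Eq. (4).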
Fig. 1. 84-trial averaged waveforms of target (thick dashed line) and non-target (solid line) stimuli; subject H, channel Oz, sample rate 427 Hz; amplitude (microvolt) vs. samples (time interval 150 ms–450 ms)
Fig. 2. The whitened signals of Fig. 1 after wavelet filtering; subject H, channel Oz, relative amplitude vs. samples; target time interval 150 ms–450 ms, non-target time interval -300 ms–0 ms
2.3 Single Trial Estimation of ERP Using SVM
We now use the N2 components enhanced by the methods described above as features, to be classified by an SVM classifier. In our experiments, a Matlab 6 toolbox was used to perform the classification, with the radial basis function as the kernel. To prevent overfitting and underestimation of the generalization error during training, the dataset of all trials was divided equally into a training set and a testing set. The model parameters of the ν-SVM and the generalization errors were estimated by a 10 × 10-fold cross-validation procedure performed only on the training set. The values of gamma and ν, and the expected correct classification rate, were determined by the run with the best generalization performance; these parameters were then applied to the testing set, which had not appeared in the training stage. Finally, we performed a leave-one-out procedure 30 times to evaluate the averaged classification accuracy on the testing set. At this stage, two steps were carried out: first, using all training data but one left-out sample, a set of classification parameters was obtained; second, these parameters were used to classify the testing set. These steps were repeated 30 times to obtain the averaged performance.
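The paper does not give the exact toolbox calls; the sketch below reproduces the described protocol with scikit-learn's NuSVC standing in for the ν-SVM, and the candidate grids for nu and gamma are placeholder values.

```python
from sklearn.svm import NuSVC
from sklearn.model_selection import GridSearchCV, StratifiedKFold

def select_nu_svm(X_train, y_train):
    """10-fold cross-validated grid search for nu-SVM parameters (RBF kernel),
    performed only on the training set, as described above."""
    search = GridSearchCV(
        NuSVC(kernel='rbf'),
        param_grid={'nu': [0.1, 0.3, 0.5], 'gamma': [0.01, 0.1, 1.0]},
        cv=StratifiedKFold(n_splits=10),
    )
    search.fit(X_train, y_train)
    return search.best_estimator_   # then evaluate on the held-out testing set
```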
3 Results and Discussion

The results tabulated in Table 1 show the differences in classification effectiveness among the three subjects when N2 is used as the feature. The best averaged correct classification rate, 94.1%, is obtained for subject H; the results for subjects M and T are 84.8% and 82.3%, respectively.
Table 1. Averaged results (accuracy, %) of 30 leave-one-out cross-validation runs for the three subjects

Subject    max     min     avg
H          96.0    91.8    94.1
M          87.6    82.3    84.8
T          83.5    79.2    82.3
The signals from Oz are exploited because the N2 waves at Oz are stronger than at the other sites. Fig. 3, the grand average over all trials, shows that this choice is reasonable. The figure also shows another interesting feature: from the forehead to the occiput, the P2 decreases while the N2 increases. The averaged peak amplitude at Oz is -9.1 microvolts, occurring at 245 ms. The P3 wave starts at about 300 ms and peaks at about 420 ms.
Fig. 3. Grand averaged potentials of all trials from subject H

Fig. 4. Comparison of grand averaged EEG signals at electrode Oz from the three subjects (H, M, T); amplitude (microvolt) vs. time (ms), with the N2 peak at 245 ms
Therefore, in order to evaluate the feasibility of using N2 waves as features for a brain-computer interface, we intercept only the segment from 0 ms to 300 ms as the feature input to the ν-SVM classifier. Fig. 4 shows the grand averaged EEG signals at electrode Oz from the three subjects. The N2 amplitude of subject H is the greatest of the three, that of subject T is smaller, and subject M shows almost no elicited N2. This implies that the effectiveness of N2 features for BCI differs between subjects, so using N2 as the classifier feature is a subject-specific option.
4 Conclusion

This paper introduced a novel VEP component, the N2, as a carrier for constructing a BCI-based speller. In this experiment, the best averaged correct classification rate with single-trial EEG reaches 94%. Compared with our previous work [7], applying an AR whitening filter to enhance the N2 features is shown to be a feasible approach: in [7], without the AR whitening preprocessing, the best classification accuracy was only 90%. The results also suggest that the imitated-natural-reading VEP-inducing paradigm can improve the signal-to-noise ratio significantly and thus achieve a higher correct classification rate than VEP-inducing paradigms based on flashing stimulation. Owing to this robust performance, the N2 components of the VEP can be exploited as features for brain-computer interfaces, although these features are subject-specific because the N2 amplitude differs between subjects.
References

1. Wolpaw, J.R., Birbaumer, N., McFarland, D.J., et al.: Brain-Computer Interfaces for Communication and Control. Clin. Neurophysiol., 113 (2002) 767–791
2. Thulasidas, M., Guan, C., Wu, J.K.: Robust Classification of EEG Signal for Brain-Computer Interface. IEEE Trans. Neural Syst. Rehab. Eng., 14 (2006) 24–29
3. Garrett, D., Peterson, D.A., Anderson, C.W., et al.: Comparison of Linear, Nonlinear, and Feature Selection Methods for EEG Signal Classification. IEEE Trans. Neural Syst. Rehab. Eng., 11 (2003) 141–144
4. Blankertz, B., Muller, K.R., Curio, G., et al.: The BCI Competition 2003: Progress and Perspectives in Detection and Discrimination of EEG Single Trials. IEEE Trans. Biomed. Eng., 51 (2004) 1044–1051
5. Gao, K.F.: Carriers Extraction for Brain Computer Interface Using Wavelet. Master dissertation of SCUEC (2004)
6. Chang, C.-C., Lin, C.-J.: Training Support Vector Classifiers: Theory and Algorithms. Neural Computation, 13 (2001) 2119–2147
7. Guan, J.A., Chen, Y.G., Lin, J.R.: N2 Components as Features for Brain Computer Interface. Proc. of the First Int. Conf. on Neural Interface & Control, IEEE Press, Wuhan (2005) 45–49
Weight Estimation for Audio-Visual Multi-level Fusion in Bimodal Speaker Identification

Zhiyong Wu 1,2, Lianhong Cai 2, and Helen M. Meng 1

1 Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China; [email protected], [email protected]
2 Department of Computer, Tsinghua University, Beijing 100084, China; [email protected]
Abstract. This paper investigates the estimation of fusion weights under varying acoustic noise conditions for audio-visual multi-level hybrid fusion strategy in speaker identification. The multi-level fusion combines model level and decision level fusion via dynamic Bayesian networks (DBNs). A novel methodology known as support vector regression (SVR) is utilized to estimate the fusion weights directly from audio features; Sigma-Pi network sampling method is also incorporated to reduce feature dimensions. Experiments on the homegrown Chinese database and CMU English database both demonstrate that the method improves the accuracies of audio-visual bimodal speaker identification under dynamically varying acoustic noise conditions.
1 Introduction

Human speech is bimodal. While audio is the major source of speech information, the visual component is a valuable supplement in noisy environments because it remains unaffected by acoustic noise. Many studies show that fusion of audio-visual features leads to more accurate speaker identification in noisy environments [1-3]. Audio-visual fusion can be performed at the feature level, decision level, or model level [2-3]. We have proposed a multi-level hybrid fusion strategy based on dynamic Bayesian networks (DBNs) that combines model-level and decision-level fusion to achieve improved performance [4]. In such a strategy, the fusion weights are of great importance, as they must capture the reliability of the inputs, which may vary dynamically. In the literature, fusion weights have usually been determined during training and remain fixed for all subsequent testing [2-3]. The weights may therefore not match the testing inputs well, leading to accuracies inferior even to mono-modal identification, since speech can vary dramatically at the temporal level (e.g., noise bursts) in practice. This paper attempts to estimate the fusion weights directly from the audio stream so as to capture dynamic variations of the acoustic noise. Support vector regression (SVR) [5], which performs function approximation based on structural risk minimization, is utilized, and Sigma-Pi network [6] sampling is incorporated for feature dimension reduction.
2 Multi-level Fusion of Audio-Visual Features The audio-visual features can complement each other. Furthermore, different levels of fusion strategies can reinforce each other too. For example, model level fusion outperforms decision level fusion in most cases; and performance of decision level fusion may be better than that of model level in the very noisy environments [2, 4].
Fig. 1. DBN based audio-visual multi-level fusion
In view of the advantages of model-level and decision-level fusion, we proposed a multi-level fusion strategy via DBNs, as illustrated in Fig. 1. There are three models: audio-only, video-only, and the audio-visual correlative model (AVCM) that performs model-level fusion. These models are further combined at the decision level to deliver the final speaker identification result. The AVCM captures the inter-dependencies between audio and visual features and the loose temporal synchronicity between them. Further details of the multi-level fusion are given in [4]. The fusion formula is

$$P(O_A, O_V \mid M_A, M_V, M_{AV}) = [P(O_A \mid M_A)]^{\lambda_A}\,[P(O_V \mid M_V)]^{\lambda_V}\,[P(O_A, O_V \mid M_{AV})]^{\lambda_{AV}} \quad (1)$$

where $P(O_A \mid M_A)$ is the likelihood of the audio observation $O_A$ under the audio-only model $M_A$, $P(O_V \mid M_V)$ is the likelihood of the video observation $O_V$ under the video-only model $M_V$, and $P(O_A, O_V \mid M_{AV})$ is the likelihood under the AVCM model $M_{AV}$; $\lambda_A$, $\lambda_V$ and $\lambda_{AV}$ are the fusion weights.
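In the log domain, Eq. (1) is just a weighted sum of the three model scores; a minimal sketch (the score variables are hypothetical) is:

```python
import numpy as np

def fused_score(ll_audio, ll_video, ll_avcm, w_a, w_v, w_av):
    """Decision-level combination of Eq. (1) in the log domain:
    w_a*log P(O_A|M_A) + w_v*log P(O_V|M_V) + w_av*log P(O_A,O_V|M_AV)."""
    return w_a * ll_audio + w_v * ll_video + w_av * ll_avcm

def identify(ll_a, ll_v, ll_av, w_a, w_v, w_av):
    """One score per enrolled speaker; the identified speaker maximizes the fused score."""
    scores = fused_score(np.asarray(ll_a), np.asarray(ll_v), np.asarray(ll_av),
                         w_a, w_v, w_av)
    return int(np.argmax(scores))
```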
3 Estimating Fusion Weight with Support Vector Regression

The estimation of the fusion weights is a key issue. We enforce the constraints λA + λV + λAV = 1 and λA, λV, λAV ≥ 0. We also impose λA = λAV, assuming the performances of the audio-only and AVCM models are equally dependent on the quality of the acoustic speech. Support vector regression (SVR) is used to estimate the audio weight λA directly from the original audio features, because of its powerful learning ability and the high degree of generalization it achieves through structural risk minimization [5].

3.1 Fusion Weight Estimating Strategy

Figure 2 depicts the processing steps for estimating the audio fusion weight with SVR. Primary audio features are first extracted from the original audio speech.
Fig. 2. Processing steps for fusion weight estimation
These features are then re-sampled by Sigma-Pi sampling [6] to obtain secondary distribution features that describe the distributions of the original audio features. Finally, SVR is used to predict the fusion weight. The quality of the input audio is assessed over a relatively long time span (e.g., 1500 ms) of primary audio features. If these features were input directly into the SVR module to estimate λA, there would be too many dimensions to compute (e.g., a 28-order speech feature vector sampled with a frame shift of 11 ms gives 28 × 1500/11 = 3818 dimensions!). To reduce the amount of computation, we propose to sample the primary audio features with Sigma-Pi networks prior to further processing.

3.2 Dimension Reduction with Sigma-Pi Sampling

Sigma-Pi sampling is defined on sequences of primary audio features, where the horizontal axis is time and the vertical axis represents the primary features. It consists of two windows of different sizes at a constant distance in time and feature position. The size of the small window is fixed to 1 and the size of the large window is variable.
Fig. 3. Schematic overview of Sigma-Pi sampling [6]
If the primary audio feature values are $p(t, f)$, the secondary distribution features $s(f_1, f_2, t_0, \Delta t, \Delta f)$ are calculated as follows:

$$s = \sum_t \left[ p(t, f_1) \cdot \frac{1}{\Delta t\, \Delta f} \sum_{t'=0}^{\Delta t - 1} \sum_{f'=0}^{\Delta f - 1} p(t + t_0 + t',\ f_2 + f') \right] \quad (2)$$
where $f_1$ is the feature of the small window, $f_2$ is the feature at the bottom-left corner of the large window, $t_0$ is the time difference between the two windows, and $\Delta t$, $\Delta f$ are the extensions of the large window in time and feature. The small-window value is multiplied by the mean of the large window, and the products are then integrated over time, resulting in a single secondary feature value that reflects the distribution of the original primary audio features.
We assume that different orders of primary features are independent, so the parameters $f_2 = f_1$ and $\Delta f = 0$ of the Sigma-Pi sampling are fixed and only $t_0$ and $\Delta t$ are variable. Sigma-Pi sampling reduces the feature dimensionality greatly: only 28 secondary distribution feature values are calculated from the 3818 primary audio features.
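With these simplifications, Eq. (2) reduces to one secondary value per feature order; a sketch of that reduced computation (the variable names are ours) is:

```python
import numpy as np

def sigma_pi(p, t0, dt):
    """Reduced Sigma-Pi sampling of Eq. (2) with f2 = f1 and delta-f = 0.

    p:  primary features, array of shape (T, F) over T frames,
    t0: frame offset between the small and the large window,
    dt: width of the large window in frames.
    Returns one secondary feature value per feature order (length F)."""
    s = np.zeros(p.shape[1])
    for t in range(p.shape[0] - t0 - dt):
        # small-window value times the mean of the large window, summed over time
        s += p[t] * p[t + t0:t + t0 + dt].mean(axis=0)
    return s
```

With 28 feature orders and, e.g., t0 = 500 ms and Δt = 150 ms (Sect. 5.2, converted to frames), this yields the 28 secondary values that feed the SVR.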
4 Databases and Setup

We perform the weight estimation experiments within the scope of audio-visual text-prompted speaker identification. Two databases are used. One is our homegrown audio-visual bimodal database of 60 subjects (38 males, 22 females, aged 20 to 65), in which each subject speaks 30 continuous Chinese digit strings (up to 6 digits per utterance), with each utterance repeated 3 times at intervals of 1 month. The other is CMU's bimodal database [7], which has 10 subjects (7 males, 3 females) speaking 78 English words repeated 10 times. Artificial white Gaussian noise was added to the original audio data (SNR = 30 dB) to simulate various SNR levels. The fusion models were trained at 30 dB SNR and tested under all SNR levels. We applied cross-validation to every subject's data: 90% of the data are used as the training set and the remaining 10% as the test set, with the partitioning repeated until all the data have been covered by the test set. The acoustic features include 13 Mel-frequency cepstral coefficients (MFCCs) and 1 energy term (frame size 25 ms, frame shift 11 ms) together with their delta parameters. The visual features include the mouth width, upper lip height, and lower lip height [7] and their delta values. The frame rate of the visual features is 30 frames per second (fps), which is up-sampled to 90 fps (11 ms) to match the audio features by copying and inserting two frames between every two original visual feature frames.
5 Experiments

5.1 Learning SVR Parameters

Weight estimation is carried out with ν-SVR [5], whose parameters are trained in the following steps. First, the multi-level AVCM DBNs are trained. A DBN with a left-to-right non-skipping topology is built for each word; the audio sub-model has 5 states and the video sub-model has 3 states, each state being modeled by a Gaussian mixture model (GMM) with 3 mixtures. All DBNs are implemented with the GMTK toolkit [8]. Then, for each test set at a specific SNR level and for each value of the audio fusion weight λA, varied from 0 to 1 in steps of 0.02, speaker identification is performed. The word DBNs are connected to form a whole-sentence model, which is then used to identify the speakers. For each SNR level, the fusion weight λA giving the best identification accuracy is recorded as the target weight value for SVR training.

5.2 Choosing Sigma-Pi Parameters

The parameters t0 and Δt of the Sigma-Pi sampling are chosen first. In this stage, t0 varies from 100 ms to 1000 ms at 100 ms intervals, and Δt varies from 50 ms to
300 ms at 50 ms intervals. The tests are carried out for all combinations of t0 and Δt. The results show that the performance is best at t0 = 500 ms and Δt = 150 ms; these two values are taken as the basic parameters for the following experiments.

5.3 Speaker Identification Results

We conduct speaker identification experiments under fixed-noise and randomly varying-noise conditions (with the mean acoustic SNR varying from 30 dB to 0 dB at 10 dB intervals) over whole sentences. Two weight estimation methods are tested: (1) fixed weight, where the fusion weight remains fixed for the test set after training, as is traditional [2-3]; and (2) our proposed method, where the estimated weight changes automatically according to the acoustic noise conditions. The experimental results on our homegrown Chinese database are summarized in Table 1. The experiments are also conducted on the CMU English database to validate the proposed method; the results are summarized in Table 2. The proposed method improves speaker identification accuracy at the different acoustic SNR levels when the noise varies dynamically, compared with the traditional fixed-weight method. When the acoustic noise changes, i.e., the noise condition of the test set does not match the training set, the performance of the traditional fixed-weight method degrades dramatically, whereas the differences are not significant for the proposed method. This indicates that the proposed method can predict the fusion weight well under dynamically varying acoustic noise conditions and can improve the performance of audio-visual bimodal speaker identification.

Table 1. Accuracies of speaker identification on our own Chinese database
mean SNR          30 dB            20 dB            10 dB            0 dB
                  fixed  varying   fixed  varying   fixed  varying   fixed  varying
fixed weight      100%   98%       91%    85%       79%    72%       76%    70%
proposed method   100%   100%      91%    90%       80%    78%       77%    75%

Table 2. Accuracies of speaker identification on the CMU English database

mean SNR          30 dB            20 dB            10 dB            0 dB
                  fixed  varying   fixed  varying   fixed  varying   fixed  varying
fixed weight      100%   99%       92%    86%       81%    77%       77%    73%
proposed method   100%   100%      93%    92%       81%    81%       79%    78%
6 Conclusions

We investigate a fusion weight estimation method for multi-level hybrid fusion in audio-visual speaker identification by means of support vector regression (SVR). The proposed method estimates the fusion weights directly from the audio features. In the method, Sigma-Pi network re-sampling is introduced to reduce the dimensions of the
audio features. The experiments show that the method improves the speaker identification performance at different acoustic SNR levels under varying acoustic noise conditions, which indicates that the proposed method can predict the fusion weight well under such circumstances.
Acknowledgments This work is supported by the joint research fund of NSFC-RGC (National Natural Science Foundation of China - Research Grant Council of Hong Kong) under grant No. 60418012 and N-CUHK417/04.
References

1. Senior, A., Neti, C., Maison, B.: On the Use of Visual Information for Improving Audio-based Speaker Recognition. In: Audio-visual Speech Processing Conf. (1999) 108–111
2. Nefian, A.V., Liang, L.H., Fu, T.Y., Liu, X.X.: A Bayesian Approach to Audio-Visual Speaker Identification. In: Proc. 4th Int. Conf. AVBPA, Vol. 2688 (2003) 761–769
3. Chibelushi, C.C., Deravi, F., Mason, J.S.D.: A Review of Speech-based Bimodal Recognition. IEEE Trans. Multimedia 4 (2002) 23–37
4. Wu, Z.Y., Cai, L.H., Meng, M.H.: Multi-level Fusion of Audio and Visual Features for Speaker Identification. In: Proc. Int. Conf. Biometrics, LNCS 3832 (2006) 493–499
5. Scholkopf, B., Smola, A.J., Williamson, R.C., Bartlett, P.L.: New Support Vector Algorithms. Neural Computation 12 (2000) 1083–1121
6. Gramß, T., Strube, H.W.: Recognition of Isolated Words based on Psychoacoustics and Neurobiology. Speech Communication 9 (1990) 35–40
7. Chen, T.: Audiovisual Speech Processing. IEEE Trans. Signal Processing 18 (2001) 9–21
8. Bilmes, J., Zweig, G.: The Graphical Models Toolkit: An Open Source Software System for Speech and Time-series Processing. In: Proc. Int. Conf. ICASSP (2002) 3916–3919
A Study on Optimal Configuration for the Mobile Manipulator Considering the Minimal Movement

Jin-Gu Kang 1, Kwan-Houng Lee 2, and Jane-Jin Kim 3

1 Dept. of Visual Broadcasting Media, Keukdong College, DanPyung-Ri 154-1, Gamgog Myun, Eumsung Gun, Chungbuk, 467-900, Republic of Korea; [email protected]
2 School of Electronics & Information Engineering, Cheongju University, Naedok-Dong, Sangdang-Gu, Cheongju-City, Chungbuk, 360-764, Republic of Korea; [email protected]
3 Dept. of Computer Information, Keukdong College, DanPyung-Ri 154-1, Gamgog Myun, Eumsung Gun, Chungbuk, 467-900, Republic of Korea; [email protected]
Abstract. A mobile manipulator--a serial connection of a mobile robot and a task robot--is redundant by itself. Using its redundant degrees of freedom, a mobile manipulator can perform various tasks. In this paper, to improve task execution efficiency by exploiting this redundancy, optimal configurations of the mobile manipulator are maintained while it moves to a new task point, and the job performed by the mobile manipulator is optimized using a cost function defined as a combination of the squared errors between the desired and actual configurations of the mobile robot and of the task robot. The proposed algorithm is experimentally verified and discussed with a mobile manipulator, PURL-II.
1 Introduction

While a mobile robot can expand the size of the workspace, it does no work; a vertical multi-joint robot, or manipulator, cannot move but can do work. There has been much research on redundant robots, which have more degrees of freedom than non-combined robots in a given workspace and can therefore attain optimal positions and optimized job performance [6][13]. While much work has been done on control for mobile robot navigation and for fixed manipulator motion, there are few reports on cooperative control of a robot with both movement and manipulation abilities [4]. Unlike a fixed redundant robot, the mobile manipulator offers, with respect to the given working environment, the merits of abnormal-movement avoidance, collision avoidance, efficient use of the corresponding mechanical parts, and improved adaptability. Because of these characteristics, a mobile manipulator with transportation ability and dexterous handling is desirable in difficult working environments [5]. This paper describes the mobile manipulator PURL-II, a serial combination of a mobile robot with 3 degrees of freedom, for efficient job accomplishment, and a task robot with 5 degrees of freedom. We have analyzed the kinematics and inverse kinematics of each robot to
define the 'Mobility' of the mobile robot as the most important feature of the mobile manipulator. We investigated the optimal position and movement of the robot so that the combined robot can perform the task with minimal joint displacement, adjusting the weighting values using this 'Mobility'. For jobs performed by the mobile robot in cooperation with the task robot, we investigated optimization criteria based on the gradient method to minimize the movement of the whole robot. The results acquired by implementing the proposed algorithm in computer simulation and in experiments with PURL-II are demonstrated.
2 Mobile Manipulator

2.1 Configuration of the Mobile Manipulator

The robot used in our research is shown in Fig. 1.
Fig. 1. Complete PURL-II
The robot PURL-II consists of a task robot with 5 degrees of freedom and a mobile robot with 3 degrees of freedom. The ROB3, with 5 joints, is mounted as the task robot, and a gripper is installed at the end-effector so that it can grasp objects. In addition, a portable PC is mounted as the host computer to supervise the controller of the mobile manipulator and to monitor the states of the robot.

2.2 Kinematics Analysis of the Mobile Robot

We analyzed the kinematics to calculate the position in the Cartesian coordinate system from the variables of the mobile robot [3]. The coordinate system and model for the kinematics are shown in Fig. 2. Let us denote the present position of the mobile robot by $p_m$, its velocity by $\dot{p}_m$, the average velocity of the center of gravity by $v_{m,c}$, the angular velocity of the mobile robot by $\omega$, and the angle between the X axis and the mobile robot heading by $\theta_m$. The Cartesian velocity $\dot{p}_m$ is represented in terms of the joint variables as follows:
$$\dot{p}_m = J(p_m)\,\dot{q}_m \quad (1)$$

$$\begin{bmatrix} \dot{x}_m \\ \dot{y}_m \\ \dot{z}_m \\ \dot{\theta}_m \end{bmatrix} = \begin{bmatrix} \cos\theta_m & 0 & 0 \\ \sin\theta_m & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} v_{m,c} \\ v_{m,z} \\ \omega \end{bmatrix} \quad (2)$$
where $v_{m,z}$ is provided by a ball-screw joint along the Z axis. Under the pure-rolling and non-slipping conditions, denoting the wheel radius by $r$ and the distance between each wheel and the center by $l$, $\dot{x}_m$, $\dot{y}_m$, and $\dot{\theta}_m$ are calculated by (3) as follows [15]:
$$\dot{x}_m = \frac{r}{2}\,(\dot{q}_{m,l} + \dot{q}_{m,r})\cos\theta_m \quad (3a)$$

$$\dot{y}_m = \frac{r}{2}\,(\dot{q}_{m,l} + \dot{q}_{m,r})\sin\theta_m \quad (3b)$$

$$\dot{\theta}_m = \frac{r}{2l}\,(\dot{q}_{m,l} - \dot{q}_{m,r}) \quad (3c)$$
2.3 Kinematics Analysis of the Mobile Manipulator

Each robot, designed to accomplish its own objective concurrently, should perform its corresponding movement to complete the task. The trajectory is needed for the kinematics analysis of the whole system, so that the combined robot can perform the task efficiently using the redundant degrees of freedom generated by combining the two robots [9][10]. Fig. 3 shows the Cartesian coordinates of the implemented mobile/task robot system and the link coordinate system of each joint in space [3]. The system is an independent, untethered mobile manipulator. The joint-variable vector $q$ of the whole system is composed of $q_t = [q_{t1}\; q_{t2}\; q_{t3}\; q_{t4}\; q_{t5}]$, the joint variables of the task robot, and $q_m = [q_{m6}\; q_{m7}\; q_{m8}\; q_{m9}]$, the joint variables of the mobile robot. That is,
$$q = \begin{bmatrix} q_t \\ q_m \end{bmatrix} = [q_{t1}\; q_{t2}\; q_{t3}\; q_{t4}\; q_{t5}\; q_{m6}\; q_{m7}\; q_{m8}\; q_{m9}]^T \quad (4)$$
The linear and angular velocities of the mobile robot in Cartesian space, with respect to the fixed world frame, can be expressed as

$${}^0\dot{P}_m = \begin{bmatrix} {}^0V_m \\ {}^0\omega_m \end{bmatrix} = \begin{bmatrix} {}^0J_{m,v} \\ {}^0J_{m,\omega} \end{bmatrix}\dot{q}_m = {}^0J_m\,\dot{q}_m \quad (5)$$
Fig. 2. Mobile robot modeling and coordinate system
In view of Fig. 3, the Jacobian of the vector $q_t$ (the task robot joint variables) with respect to frame {1} gives the result shown in (6) [8]:
$${}^m\dot{P}_t = \begin{bmatrix} {}^mV_t \\ {}^m\omega_t \end{bmatrix} = \begin{bmatrix} {}^mJ_{t,v} \\ {}^mJ_{t,\omega} \end{bmatrix}\dot{q}_t = {}^mJ_t\,\dot{q}_t \quad (6)$$
Given the Jacobians ${}^0J_m$ and ${}^mJ_t$ of the two robots, if we express the Jacobian of the mobile manipulator as ${}^0J_t$, then the velocity ${}^0\dot{P}_t = [{}^0V_t\;\; {}^0\omega_t]^T$ of the end-effector with respect to the world frame, comprising the linear and angular velocities, is represented as

$${}^0\dot{P}_t = \begin{bmatrix} {}^0V_t \\ {}^0\omega_t \end{bmatrix} = \begin{bmatrix} {}^0V_m \\ {}^0\omega_m \end{bmatrix} + \begin{bmatrix} {}^0\omega_m + {}^0R_1\,{}^1V_t \\ {}^0R_1\,{}^1\omega_t \end{bmatrix} = {}^0J_m\,\dot{q}_m + {}^0J_t\,\dot{q}_t = \begin{bmatrix} {}^0J_m & {}^0J_t \end{bmatrix} \begin{bmatrix} \dot{q}_m \\ \dot{q}_t \end{bmatrix} \quad (7)$$
3 Algorithm for System Application 3.1 Task Planning for Minimal Movement Because the position of base frame of task robot varies according to the movement of mobile robot, through inverse kinematics, the task planning has many solutions with respect to the robot movement. and we must find the accurate solution to satisfy both the optimal accomplishment of the task and the efficient completion of the task.
Fig. 3. Coordinate system of the mobile manipulator
In this paper, the objective is to minimize the movement of the whole robot in performing the task, so we express the state vector of the mobile manipulator as

$$q = \begin{bmatrix} q_m \\ q_t \end{bmatrix} \quad (8)$$
where $q_m = [x_m\; y_m\; z_m\; \theta_m]^T$ and $q_t = [\theta_1\; \theta_2\; \theta_3\; \theta_4\; \theta_5]^T$. Here $q$ consists of $q_m$, representing the position and orientation of the mobile robot in Cartesian space, and $q_t$, the joint variables of the $n$ links of the task robot. To plan the task so as to minimize the whole movement of the mobile manipulator, a cost function $L$ is defined as

$$L = \Delta q^T \Delta q = (q_f - q_i)^T (q_f - q_i) = (q_{m,f} - q_{m,i})^T (q_{m,f} - q_{m,i}) + (q_{t,f} - q_{t,i})^T (q_{t,f} - q_{t,i}) \quad (9)$$

Here $q_i = [q_{m,i}\;\; q_{t,i}]^T$ represents the initial state of the mobile manipulator and $q_f = [q_{m,f}\;\; q_{t,f}]^T$ the final state after the task has been accomplished.
In the final state, the end-effector of the task robot must be placed at the desired position $X_{t,d}$; for that, equation (10) must be satisfied. In (10), $R(\theta_{m,f})$ and $f(q_{t,f})$ denote, respectively, the rotational transformation in the X-Y plane and the kinematic equation of the task robot [14]:
$$X_{t,d} = R(\theta_{m,f})\, f(q_{t,f}) + X_{m,f} \quad (10)$$
where $X_{t,d}$ represents the desired position of the task robot and $X_{m,f}$ is the final position of the mobile robot. Expressing the final position of the mobile robot $X_{m,f}$ as a function of the desired coordinate $X_{t,d}$ and the joint variables $\theta_{m,f}$ and $q_{t,f}$, the cost function representing the robot movement is expressed as an $n \times 1$-space function of $\theta_{m,f}$ and $q_{t,f}$, as in (11):
$$L = \{X_{t,d} - R(\theta_{m,f}) f(q_{t,f}) - X_{m,i}\}^T \{X_{t,d} - R(\theta_{m,f}) f(q_{t,f}) - X_{m,i}\} + \{q_{t,f} - q_{t,i}\}^T \{q_{t,f} - q_{t,i}\} \quad (11)$$
In equation (11), the $\theta_{m,f}$ and $q_{t,f}$ that minimize the cost function $L$ must satisfy the condition in (12):
$$\nabla L = \begin{bmatrix} \dfrac{\partial L}{\partial \theta_{m,f}} \\[2mm] \dfrac{\partial L}{\partial q_{t,f}} \end{bmatrix} = 0 \quad (12)$$
Because the cost function is nonlinear, it is difficult to find the optimum solution of (12) analytically. In this paper we therefore find the solution numerically, using the gradient method described by (13):

$$\begin{bmatrix} \theta_{m,f}(k+1) \\ q_{t,f}(k+1) \end{bmatrix} = \begin{bmatrix} \theta_{m,f}(k) \\ q_{t,f}(k) \end{bmatrix} - \eta\, \nabla L\, \Big|_{\theta_{m,f}(k),\, q_{t,f}(k)} \quad (13)$$
This recursive process stops when $\|\nabla L\| < \varepsilon \approx 0$; the resulting $\theta_{m,f}(k)$ and $q_{t,f}(k)$ are the optimum solutions. From the optimum $\theta_{m,f}$ and $q_{t,f}$, the final robot state $q_f$ is calculated as

$$q_f = \begin{bmatrix} q_{m,f} \\ q_{t,f} \end{bmatrix} = \begin{bmatrix} X_{t,d} - R(\theta_{m,f})\, f(q_{t,f}) \\ q_{t,f} \end{bmatrix} \quad (14)$$
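A minimal Python sketch of the recursion (13)-(14), assuming the cost L of Eq. (11) is supplied as a function and approximating its gradient by central finite differences (the step size, tolerance, and iteration limit are illustrative):

```python
import numpy as np

def optimize_configuration(L, z0, eta=0.01, eps=1e-6, max_iter=10000):
    """Gradient-method search of Eq. (13) for z = [theta_mf, q_tf_1, ..., q_tf_n]."""
    z = np.asarray(z0, dtype=float)
    h = 1e-5
    for _ in range(max_iter):
        g = np.zeros_like(z)
        for i in range(len(z)):           # central-difference estimate of grad L
            dz = np.zeros_like(z)
            dz[i] = h
            g[i] = (L(z + dz) - L(z - dz)) / (2 * h)
        if np.linalg.norm(g) < eps:       # stop when ||grad L|| is about 0
            break
        z -= eta * g                      # update of Eq. (13)
    return z                              # optimal [theta_mf, q_tf]; q_f then follows Eq. (14)
```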
There are several efficient search algorithms; however, the simple gradient method is applied in this case.

3.2 Mobility of the Mobile Robot

In this research, we define the 'mobility' of the mobile robot as the amount of movement of the mobile robot when the input magnitude of the wheel velocities is unity; that is, mobility quantifies the quality of movement in any direction [1]. The mobile robot used in this research both translates and rotates because each wheel is driven independently. With the left and right wheel velocities $(\dot{q}_{m,l}, \dot{q}_{m,r})$ and the linear and angular velocities $(v_m, \omega)$, the kinematics noted above give (15):

$$v_m = r\,\frac{\dot{q}_{m,l} + \dot{q}_{m,r}}{2} \quad (15a)$$

$$\omega = \frac{r}{2}\,\frac{\dot{q}_{m,l} - \dot{q}_{m,r}}{l} \quad (15b)$$
Rewriting (15a), (15b), we get (16a) and (16b).
$$\dot{q}_{m,r} = \frac{v_m + \omega l}{r} \quad (16a)$$

$$\dot{q}_{m,l} = \frac{v_m - \omega l}{r} \quad (16b)$$
Mobility is the output-to-input ratio for a unity-norm input vector, i.e., $\dot{q}_{m,l}^2 + \dot{q}_{m,r}^2 = 1$; the mobility $v_m$ at any angular velocity $\omega$ is then calculated by (17):

$$v_m = r\sqrt{\frac{1}{2} - \omega^2\,\frac{l^2}{r^2}} \quad (17)$$
When the mobile robot has an input of unity norm, its mobility is as represented in Fig. 4, which shows the output $v$ and $\omega$ in the workspace for inputs in all directions, i.e., for all variations of robot heading and movement. For any input, the direction of maximum movement is the current robot heading, obtained when the velocities of the two wheels are equal [7]; in that situation no angular movement of the robot occurs.

3.3 Assigning Weighting Values Using Mobility

From the mobility we can determine the movement capability of the robot in any direction, and hence the adaptability of the present posture of the mobile robot to a given task.
Fig. 4. Motion generation efficiency
If the present posture of the mobile robot suits the task, that is, if the mobility is large in a certain direction, we impose a lower weighting value on the corresponding term in the cost function of (18), assigning a larger share of the movement to the mobile robot along that direction; if not, imposing a higher weighting value keeps the movement of the mobile robot small. Equation (18) is the cost function with weighting values:

$$L = \{X_{t,d} - R(\theta_{m,f}) f(q_{t,f}) - X_{m,i}\}^T W_m \{X_{t,d} - R(\theta_{m,f}) f(q_{t,f}) - X_{m,i}\} + \{q_{t,f} - q_{t,i}\}^T W_t \{q_{t,f} - q_{t,i}\} \quad (18)$$
Here, Wm and Wt are weighting matrices imposed on the movement of the mobile robot and task robot, respectively. In the cost function, the mobility of mobile robot is
expressed in the Cartesian coordinate space, so the weighting matrix $W_m$ of the mobile robot must be applied after decomposing each component along each axis of the Cartesian coordinate system, as shown in Fig. 5; it is represented as

$$W_m = \begin{bmatrix} \omega_x & 0 & 0 & 0 \\ 0 & \omega_y & 0 & 0 \\ 0 & 0 & \omega_z & 0 \\ 0 & 0 & 0 & \omega_\theta \end{bmatrix} \quad (19)$$

where $\omega_x = \dfrac{1}{v\cos(\phi)\cos(\alpha) + e}$, $\omega_y = \dfrac{1}{v\sin(\phi)\sin(\alpha) + e}$, $\omega_z = \dfrac{k_1}{(z_d - f_z(q_t))^2}$, and $\omega_\theta = 1$.
3.4 Mobile Robot Control
The mobile robot carries the task robot to within the reachable workspace of the goal position. We establish the coordinate system shown in Fig. 6 so that the robot can adopt the desired posture and position, moving from the initial position to the desired position according to the assigned weighting values. Starting from the present position $(x_i, y_i)$, the robot reaches the desired position $(x_d, y_d)$. Here $\phi$ is the current robot heading, $\alpha$ the bearing error from the present position to the desired position, $e$ the distance error to the desired position, and $\theta$ the heading of the mobile robot at the desired position [7,11]. The error dynamics are

$$\dot{e} = -v\cos\alpha \quad (20a)$$

$$\dot{\alpha} = -\omega + \frac{v\sin\alpha}{e} \quad (20b)$$

$$\dot{\theta} = \frac{v\sin\alpha}{e} \quad (20c)$$
A Lyapunov candidate function is defined as in (21).
$$V = V_1 + V_2 = \tfrac{1}{2}\lambda e^2 + \tfrac{1}{2}(\alpha^2 + h\theta^2) \quad (21)$$
where $V_1$ is the error energy in distance and $V_2$ the error energy in direction. Differentiating both sides of (21) with respect to time gives

$$\dot{V} = \dot{V}_1 + \dot{V}_2 = \lambda e\dot{e} + \alpha\dot{\alpha} + h\theta\dot{\theta} \quad (22)$$
A Study on Optimal Configuration
1121
Y
θ
( xd , yd )
e α
ω
v
φ ( xi , yi )
Fig. 5. Decomposing mobility
X
Fig. 6. Position movement of mobile robot by imposed weighting value
v = γ ( e cos α ) , ( γ > 0 ) ω = kα + γ
cos α sin α
α
(24a)
( α + hθ ) , ( k , h > 0 )
(24b)
Therefore, using this controller for the mobile robot, V approaches to zero as t → ∞ ; e and α also approach almost to zero as shown in (25). V = − λ ( γ cos 2 α ) e 2 − kα 2 ≤ 0
(25)
(a)
(c)
(b)
(d)
Fig. 7. The optimal position planning to move a point of action of a robot to (1, 0.5, 0.7)
1122
J.-G. Kang, K.-H. Lee, and J.-J. Kim
4 Simulation For verifying the proposed algorithm, simulations were performed with PURL-II. Fig. 7 shows the simulation results with a 3 DOF task robot and a 3 DOF mobile robot. The goal is positioning the end-effect to (1, 0.5, 0.7), while initial configuration of mobile robot is (-1.0, -1.0, 1.3, 60°) and that of task robot is (18°, 60°, 90°). The optimally determined configuration of mobile robot is (0.0368, -0.497, 1.14, 44.1°) and that of task robot is (1.99°, 25.57°, 86.63°). Fig. 7 shows movements of the task robot in different view points.
5 Experiment and Conclusion Before the real experiments, assumptions for moving manipulators operational condition are set as follows: 1. In the initial stage, the object is in the end-effect of the task robot. 2. The mobile robot satisfies the pure rolling and non-slippage conditions. 3. There is no obstacle in the mobile robot path. 4. There is no disturbance of the total system. And the task robot is configured as the joint angles of (18°, 60°, 90°), then the coordinate of the end-effect is set up for (0.02, 0.04, 1.3). From this location, the mobile manipulator must bring the object to (1, 1.5, 0.5). An optimal path which is calculated using the algorithm which is stated in the previous section has Wx = 10.0, Wy = 10.0, and Wz = 2.66. And at last the mobile robots angle is 76.52° from the X axis; the difference is coming from the moving of the right wheels 0.8m and the moving of the left wheels 1.4m. Next time, the mobile robot is different with the X axis by 11.64°with right wheels 0.4m moving and the left wheels 0.5m moving. Hence, the total moving distance of mobile robot is (1.2m, 1.9m), the total angle is 88.16°, and the each joint angle of task robot are (-6.45°, 9.87°, 34.92°). The experimental results are shown by the photography in Fig. 8. For the real experiment, the wheel pure rolling condition is not satisfied, also by the control the velocity through the robot kinematics, the distance error occurs from the cumulative velocity error. Using a timer in CPU for estimating velocity, timer error also causes velocity error. Hence consequently, the final position of end-effect is placed at (1.2, 1.5, 0.8) on object. A new redundancy resolution scheme for a mobile manipulator is proposed in this paper. While the mobile robot is moving from one task (starting) point to the next task point, the task robot is controlled to have the posture proper to the next task, which can be pre-determined based upon TOMM[2][16]. Minimization of the cost function following the gradient method leads a mobile manipulator an optimal configuration at the new task point. These schemes can be also applied to the robot trajectory planning. The efficiency of this scheme is verified through the real experiments with PURL-II. The different of the result between simulation and experiment is caused by the error between the control input and the action of the mobile robot because of the roughness of the bottom, and is caused by the summation of position error through velocity control. In further study, it is necessary that a proper control algorithm should be developed to improve the control accuracy as well as efficiency in utilizing redundancy.
A Study on Optimal Configuration
1123
Fig. 8. Response of robot posture
References 1. Mason, M. T.: Compliance and Force Control for Computer Controlled Manipulators. IEEE Transaction on Systems, Man, Cybernetics, 11 (1981) 418~432 2. Lee, S.K., Lee, Jang M..: Task-Oriented Dual-Arm Manipulability and Its Application to Configuration Optimization. in Proceeding 27th IEEE International Conference on Decision and Control Austin, TX, (1988) 3. Spong, Mark W., Robot Dynamics and Control, John Wiley & Sons, (1989) 92-101 4. Francois G. P.: Using Minimax Approaches to Plan Optimal Task Commutation Configuration for Combined Mobile Platform-Manipulator System. IEEE Transaction on Robotics and Automation, 10 (1994) 44-53 5. Lewis F. L., Control of Robot Manipulators, Macmillan Publishing, (1993) 136-140 6. Tsuneo Y.: Manipulability of Robotic Mechan- isms. The International Journal of Robotics Robotics Research, 4 (1994) 3-9 7. Aicardi, M.: Closed-Loop Steering of Unicycle-like Vehicles via Lyapunov Techniques. IEEE Robotics and Automation Magazine, 10 (1995) 27-35 8. You, S.S.: A Unified Dynamic Model and Control System for Robotic Mainpulator with Geometric End-Effector Constraints. KSME International Journal, 10 (1996) 203-212 9. Jang, J.H., Han, C.S.: The State Sensitivity Analysis of the Front Wheel Steering Vehicle: In the Time Domain,” KSME International Journal, 11 (1997)595-604 10. Hong, K.S., Kim, Y.M., Choi, C.: Inverse Kinematics of a Reclaimer: Closed-Form Solution by Exploiting Geometric Constraints. KSME International Journal, 11 (1997) 629-638 11. Hare, N., Fing, Y.: Mobile Robot Path Planning and Tracking an Optimal Control Approach. International Conference on Control, Automation Robotics and Vision, (1997) 9-11 12. Lee J., Cho, H. S.: Mobile Manipulator Motion Planning for Multiple Task Using Global Optimization Approach. Journal of Intelligent and Robotics System, (1997)169-190 13. Stephen, L. C.: Task Compatibility of Manipulator Postures. The International Journal of Robotics Research, 7 (1998) 13-21
1124
J.-G. Kang, K.-H. Lee, and J.-J. Kim
14. You, S.S., Jeong, S.K.: Kinematics and Dynamic Modeling for Holonomic Constrained Multiple Robot System through Principle of Workspace Orthogonalization. KSME International Journal, 12 (1998) 170-180 15. James, C. A., John, H. M.: Shortest Distance Path for Wheeled Mobile Robot. IEEE Transactions on Robotics and Automation, 14 (1998) 657-662 16. Lee, J. M.: Dynamic Modeling and Cooperative Control of a Redundant Manipulator Based on Decomposition. KSME International Journal. 12 (1998) 642-658
Multi-objective Flow Shop Scheduling Using Differential Evolution Bin Qian, Ling Wang, De-Xian Huang, and Xiong Wang Dept. of Automation, Tsinghua Uinv Beijing, 100084, P.R. China [email protected]
Abstract. This paper proposes an effective Differential Evolution (DE) based hybrid algorithm for Multi-objective Permutation Flow Shop Scheduling Problem (MPFSSP), which is a typical NP-hard combinatorial optimization problem. In the proposed Multi-objective Hybrid DE (MOHDE), both DE-based searching operators and some special local searching operators are designed to balance the exploration and exploitation abilities. Firstly, to make DE suitable for solving MPFSSP, a largest-order-value (LOV) rule based on random key representation is developed to convert the continuous values of individuals in DE to job permutations. Then, to enrich the searching behaviors and to avoid premature convergence, a Variable Neighborhood Search (VNS) based local search with multiple different neighborhoods is designed and incorporated into the MOHDE. Simulation results and comparisons with the famous random-weight genetic algorithm (RWGA) demonstrate the effectiveness and robustness of our proposed MOHDE.
1 Introduction Flow Shop Scheduling Problems (FSSP) are one of the most well known problems in the area of scheduling and has been proved to be strongly NP-hard [1]. Due to its importance in many industrial areas, FSSP has attracted much attention and wide research in both Computer Science and Operation Research fields. The Permutation FSSP with n jobs and m machines is commonly defined as follows: each of n jobs is to be sequentially processed on machine 1, , m . The processing time p i , j of job i
on machine j is given. At any time, each machine can process at most one job and each job can be processed on at most one machine. The sequence in which the jobs are to be processed is the same for each machine. There are various scheduling objectives to be considered. Among them are makespan, maximum tardiness, total tardiness, maximum flowtime, and total flowtime [2]. Most real-world optimization problems in manufacturing systems are multiobjective. Some researchers have tackled multi-objective FSSP. For example, Daniels and Chambers [3] considered the tradeoff between the makespan and the maximum tardiness. Rajendran [4] presented a branch-and-bound algorithm and two heuristic algorithms to minimize the total flowtime with a constraint condition on the makespan. Ishibuchi and Murata [5] applied Genetic Algorithms based algorithm to the two-objective and three-objective FSSP. D.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNCIS 345, pp. 1125 – 1136, 2006. © Springer-Verlag Berlin Heidelberg 2006
1126
B. Qian et al.
Differential Evolution (DE) algorithm [6] is a novel parallel direct search method, which has been proved a simple and efficient heuristic for global optimization over continuous spaces. VNS [7] is very effective and can be easily applied to a variety of problems. Recently, Onwubolu and Davendra [8] described a novel optimization method based on a differential evolution (exploration) algorithm for single objective FSSP. Tasgetiren [9] presented a PSO algorithm with VNS to solve FSSP. Both of their algorithms achieved good results. Recently, hybrid heuristics have been a hot topic in the fields of both Computer Science and Operational Research [15]. It is well known that the performances of evolutionary algorithms can be improved by combining problem-dependent local search. Memetic algorithms (MAs) [16] may be considered as a union of a population-based global search and local improvements. In MAs, several studies [16][17] have been focused on how to achieve a reasonable combination of global search and local search, and how to make a good balance between exploration and exploitation. Inspired by MAs’ spirit, a hybrid algorithm based on DE and VNS is proposed for MPFSSP in this paper.
2 Formulation of MPFSSP The MPFSSP can be described as follows: denote ci , j as the complete time of job i on machine j , ci is the completion time of job i , Ti is the tardiness time of job i , d i is the due date of job i , and let π = ( σ 1 , σ 2 , , σ n ) be any a processing sequence
of all jobs. In our study, we minimize two objectives: makespan C max and maximum tardiness Tmax . Then the mathematical formulation of the MPFSSP to minimize C max and Tmax can be described in (1): cσ 1 ,1 = pσ 1 ,1, °c ° σ j ,1 = cσ j −1 ,1 + pσ 1 ,1 , j = 2, , n ° cσ ,i = cσ ,i −1 + pσ , i , i = 2, , m 1 1 ° 1 ° cσ j ,i = max cσ j −1 , i , cσ j , i −1 + pσ j ,i , i = 2, , m; j = 2, , n . ® ° Cmax = cσ n , m ° ci = ci , m ° ° Ti = max{ci − d i ,0} °T ¯ max = max{T1 , , Tn }
{
}
(1)
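The recursion in (1) translates directly into code; the sketch below evaluates both objectives for a given permutation (array names are ours):

```python
import numpy as np

def evaluate(pi, p, d):
    """Return (C_max, T_max) of Eq. (1) for job permutation pi.

    p[i, j]: processing time of job i on machine j; d[i]: due date of job i."""
    n, m = len(pi), p.shape[1]
    c = np.zeros((n, m))
    for j in range(n):                    # position in the permutation
        for i in range(m):                # machine index
            prev_job = c[j - 1, i] if j > 0 else 0.0
            prev_mach = c[j, i - 1] if i > 0 else 0.0
            c[j, i] = max(prev_job, prev_mach) + p[pi[j], i]
    cmax = c[n - 1, m - 1]
    tmax = max(max(c[j, m - 1] - d[pi[j]], 0.0) for j in range(n))
    return cmax, tmax
```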
3 Brief Review of DE

The DE algorithm introduced by Storn and Price [6] is a branch of evolutionary algorithms for optimization problems over continuous domains. DE is a population-based evolutionary computation technique that uses a simple differential operator to create new candidate solutions and a one-to-one competition scheme to greedily select each new
candidate. The theoretical framework of DE is very simple, and DE is easy to code and implement on a computer; it is also computationally inexpensive in terms of memory requirements and CPU time. DE has therefore attracted much attention and found wide application in various fields. DE starts with the random initialization of a population of individuals in the search space and works on the cooperative behavior of the individuals in the population, finding the globally best solution by utilizing the distance and direction information given by the differences among population members. The search behavior of each individual is adjusted by dynamically altering the direction and step length of this differential operation. At each generation, the mutation and crossover operators are applied to the individuals, and a new population arises; then selection takes place, and the corresponding individuals of the two populations compete to form the next generation. Several variants of the DE algorithm can be found at http://www.icsi.berkeley.edu/%7Estorn/code.html.
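For reference, the basic differential mutation and binomial crossover used by such schemes can be sketched as follows (here in the classic rand/1/bin form; F and CR are the usual control parameters):

```python
import numpy as np

def de_rand_1_bin(pop, i, F=0.5, CR=0.9, rng=np.random.default_rng()):
    """One DE/rand/1/bin trial vector for individual i of population pop (M x n)."""
    M, n = pop.shape
    r1, r2, r3 = rng.choice([j for j in range(M) if j != i], size=3, replace=False)
    mutant = pop[r1] + F * (pop[r2] - pop[r3])   # differential mutation
    cross = rng.random(n) < CR                   # binomial crossover mask
    cross[rng.integers(n)] = True                # guarantee at least one component
    return np.where(cross, mutant, pop[i])
```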
4 MOHDE for MPFSSP

A general multi-objective optimization problem (MOP) with $w$ objectives can be expressed as follows:

$$\text{minimize } f_1(x), f_2(x), \ldots, f_w(x) \quad (2)$$

where $f_1(x), f_2(x), \ldots, f_w(x)$ are the $w$ objectives to be minimized and $x$ belongs to the solution set. Considering two solutions $a$ and $b$, solution $a$ is said to dominate $b$ if

$$\forall i \in \{1, 2, \ldots, w\}: f_i(a) \le f_i(b) \quad \text{and} \quad \exists j \in \{1, 2, \ldots, w\}: f_j(a) < f_j(b) \quad (3)$$
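The dominance relation (3) is straightforward to state in code; a minimal sketch:

```python
def dominates(a, b):
    """True if objective vector a dominates b per Eq. (3): a is no worse in
    every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))
```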
If a solution is not dominated by any other solution of the MOP, it is called a nondominated solution. The solutions that are nondominated over the entire solution space are the Pareto solutions and comprise the so-called Pareto trade-off front. In recent years, various evolutionary algorithms have been presented to solve different MOPs; for a review, see [21][22]. The aim of MOHDE is to obtain all Pareto solutions of the MPFSSP. In this section, we explain the implementation of MOHDE for the MPFSSP in detail by illustrating its key techniques, including the solution representation, the VNS-based local search, and the solution repair mechanism.

4.1 Solution Representation
Because of DE's continuous nature, it cannot be adopted directly for the FSSP, and applications of DE to combinatorial optimization problems remain limited. The key problem in applying DE to the MPFSSP is to find a suitable mapping between job sequences and the individuals (continuous vectors) of DE. For the $n$-job, $m$-machine problem, each vector contains $n$ dimensions corresponding to the $n$ operations. In this paper, we propose a largest-order-value (LOV) rule based on the random key representation [10] to convert DE's individual $X_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,n}]$ into the job solution/permutation vector $\pi_i = (\pi_{i,1}, \pi_{i,2}, \ldots, \pi_{i,n})$.
According to the LOV rule, $X_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,n}]$ is first ranked in descending order to obtain the rank sequence $\varphi_i = [\varphi_{i,1}, \varphi_{i,2}, \ldots, \varphi_{i,n}]$. The job permutation $\pi_i$ is then calculated by the following formula:

$$\pi_{i,\varphi_{i,k}} = k \quad (4)$$
We provide a simple example of the LOV rule in Table 1. In this instance ($n = 6$), when $k = 1$, $\varphi_{i,1} = 4$ and $\pi_{i,\varphi_{i,1}} = \pi_{i,4} = 1$; when $k = 5$, $\varphi_{i,5} = 2$ and $\pi_{i,\varphi_{i,5}} = \pi_{i,2} = 5$; and so on. This representation is unique and simple in terms of finding new permutations.

Table 1. Solution representation

Dimension k    1      2      3      4      5      6
x_{i,k}        1.36   3.85   2.55   0.63   2.68   0.82
phi_{i,k}      4      1      3      6      2      5
pi_{i,k}       2      5      3      1      6      4
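A sketch of the LOV conversion using descending argsort (function and variable names are ours):

```python
import numpy as np

def lov(x):
    """Largest-order-value rule: continuous vector x -> job permutation pi.

    phi[k] is the descending-order rank of x[k]; pi then satisfies
    pi[phi[k]] = k, i.e. Eq. (4). Jobs are numbered 1..n as in Table 1."""
    x = np.asarray(x)
    order = np.argsort(-x)                   # dimensions by decreasing value
    phi = np.empty(len(x), dtype=int)
    phi[order] = np.arange(1, len(x) + 1)    # rank of each dimension
    pi = np.empty(len(x), dtype=int)
    pi[phi - 1] = np.arange(1, len(x) + 1)   # Eq. (4)
    return pi

# lov([1.36, 3.85, 2.55, 0.63, 2.68, 0.82]) -> [2, 5, 3, 1, 6, 4], as in Table 1
```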
4.2 VNS-Based Local Search
In Reeves [19][20] it is observed that, in the context of the FSSP, the solution-space landscape induced by certain operators (insert, interchange, inverse, etc.) has a 'big valley', in which locally optimal solutions tend to be relatively close to each other and to the globally optimal solution. This encourages us to develop a local search method that exploits this structure and guides DE's population toward the bottom region of the big valley, which contains the global optimum and the better local optima. In addition, the huge search space and the multi-objective nature of the MPFSSP make it difficult to achieve good results with only one neighborhood, so we design a VNS-based local search with multiple different neighborhoods to enrich the local search behavior and to avoid premature convergence. The neighborhoods of the local search are based on the insert+inverse+interchange variant of the VNS method proposed in [7][9]. Pseudocode of the local search is given as follows:

Convert an individual X_i(t) to a job permutation pi_i according to the LOV rule;
Set loop = 1;
do
  k = 0; k_max = 3;
  do
    randomly select u and v, where u != v;
    if (k = 0) then pi_i_1 = insert(pi_i, u, v);
    if (k = 1) then pi_i_1 = inverse(pi_i, u, v);
    if (k = 2) then pi_i_1 = interchange(pi_i, u, v);
    if pi_i_1 dominates pi_i then
      k = 0; pi_i = pi_i_1;
    else
      k = k + 1;
    endif;
  while (k < k_max);
  loop = loop + 1;
while (loop <= loop_max);

After the local search, the improved permutation pi_i must be converted back into the individual X_i(t) so that the DE search can continue in the continuous space. Step 1: the rank sequence phi_i is obtained from pi_i by

$$\varphi_{i,\pi_{i,k}} = k \quad (5)$$

Step 2: the values in x_i are rearranged to be consistent with phi_i.
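A compact Python rendering of this local search, assuming an objective function f returning (C_max, T_max) and the dominance test of Eq. (3) (both hypothetical helpers here):

```python
import random

def insert(pi, u, v):
    s = list(pi); job = s.pop(u); s.insert(v, job); return s

def inverse(pi, u, v):
    u, v = min(u, v), max(u, v)
    return list(pi[:u]) + list(pi[u:v + 1])[::-1] + list(pi[v + 1:])

def interchange(pi, u, v):
    s = list(pi); s[u], s[v] = s[v], s[u]; return s

def vns_local_search(pi, f, dominates, loops=10):
    """VNS over the insert/inverse/interchange neighborhoods (k_max = 3)."""
    moves = [insert, inverse, interchange]
    pi = list(pi)
    for _ in range(loops):
        k = 0
        while k < 3:
            u, v = random.sample(range(len(pi)), 2)
            candidate = moves[k](pi, u, v)
            if dominates(f(candidate), f(pi)):
                pi, k = candidate, 0   # improvement: restart with the first move
            else:
                k += 1
    return pi
```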
4.3 MOHDE for MPFSSP

MOHDE is designed on the basis of the DE/rand-to-best/1/exp scheme, whose base vector is the best individual of the current population; MOHDE can thus share the best information among the population. To solve the MPFSSP, MOHDE follows the canonical DE/rand-to-best/1/exp algorithm with the following modifications: (1) Since the exploration ability of DE depends on population diversity, two Gaussian distributions are used to initialize the population and to generate the scaling factor F: the population is initialized according to Gaussian(0.5, 0.2) and the scaling factor F is generated from Gaussian(0.7, 0.2). (2) To balance exploration and exploitation, local search is applied to only 1/5 of the individuals in the population in each generation. (3) Results [18] show that an elitist strategy can speed up a multi-objective GA significantly and keeps good solutions once they are found. A tentative nondominated solutions set A, containing nondominated job solutions and their corresponding individuals, is therefore added to MOHDE. A is updated with the current DE population at every generation: if a solution of the current population is not dominated by any other solution in the current population or in A, it is added to A, and all solutions dominated by the added one are then deleted from A. (4) In a multi-objective problem there is no single absolutely best solution, and the goal of MOHDE is to find all Pareto solutions; MOHDE therefore treats all
nondominated solutions as having equal value and randomly selects one from A as the best individual bestit (the base vector) of the current population. (5) To obtain uniformly distributed nondominated solutions, the DE selection step of MOHDE can accept a worse solution with a small probability; that is, the condition f(tmp) <= f(X_{r1}(t-1)) is changed to ((tmp dominates X_{r1}(t-1)) or (random(0,1) < 0.05)) in the DE selection step. Based on the DE/rand-to-best/1/exp scheme, the LOV rule presented above, the DE-based global search, and the VNS-based local search, the pseudocode of MOHDE is as follows:

step0: Let G denote a generation, P a population of size M, X_i(t) the i-th individual of dimension n in population P at generation t, x_{i,k}(t) the k-th dimension of individual X_i(t), tmp_k the k-th dimension of tmp, and CR the crossover probability.
step1: Input n, M >= 3, CR in [0,1], and the initial bounds low(x_{i,k}) = 0, upper(x_{i,k}) = 4, k = 1, ..., n.
step2: Initialize P_{G=0} = {X_1(0), ..., X_M(0)}:
  for each individual
    x_{i,k}(0) = low(x_{i,k}) + Gaussian(0.5, 0.15) * (upper(x_{i,k}) - low(x_{i,k})), i = 1, ..., M, k = 1, ..., n;
    repair x_{i,k}(0) if any variable is outside its bounds;
  end for each;
step3: Initialize the nondominated solutions set A and t = 1;
step4: do // evolution phase
    for i = 1 to M // apply DE search and local search to each individual
      tmp = X_i(t-1);
      bestit = an individual randomly selected from A;
      randomly select r1, r2 in (1..M), where r1 != r2 != i;
      randomly select k in (1..n); L = 0;
      do // DE mutation and crossover
        tmp_k = tmp_k + Gaussian(0.7, 0.2) * (bestit_k - tmp_k)
                      + Gaussian(0.7, 0.2) * (x_{r1,k}(t-1) - x_{r2,k}(t-1));
        repair tmp_k if it is outside its bounds;
        k = (k + 1) mod n; L = L + 1;
      while (random(0,1) < CR) and (L < n);
      if (tmp dominates X_{r1}(t-1)) or (random(0,1) < 0.05) then
        X_i(t) = tmp
      else
        X_i(t) = X_i(t-1); // DE selection
      if i mod 5 = 1 then apply the VNS local search to X_i(t);
    end for i;
    update the nondominated solutions set A and t = t + 1;
  while the stopping criterion is not satisfied;
step5: Output the nondominated solutions set A.

It can be seen that MOHDE not only applies the DE-based evolutionary search mechanism to perform effective exploration for promising solutions over the entire region, but also applies problem-dependent local searches to perform exploitation for solution improvement within sub-regions. Owing to the hardness of the MPFSSP, MOHDE applies several neighborhoods simultaneously. Since exploration and exploitation are both stressed and balanced, good results can be expected for the MPFSSP. In the next section we investigate the performance of MOHDE.
5 Experiments

5.1 Testing Problems
Ishibuchi et al. [5] provided a random-weight genetic algorithm (RWGA) for MOFSSP, which performed better than the vector evaluated genetic algorithm (VEGA) and a constant-weight genetic algorithm (CWGA). In this paper, we compare our proposed MOHDE with RWGA on 29 well-known benchmark problems. The first eight problems, car1 through car8, are due to Carlier [11]. The other 21 problems, rec01, rec03 through rec41, are due to Reeves [12]. In our study, we minimize two objectives: the makespan Cmax and the maximum tardiness Tmax. The due date of each job was specified as follows: (1) For each problem j, randomly generate a permutation of the jobs. (2) Calculate the completion time of each job in the permutation specified in (1). (3) Specify the due date of each job by
d_{i,j} = c_{i,j} + random[−C*_j/10, C*_j/10],   (6)

where d_{i,j} is the due date of job i for problem j, c_{i,j} is the completion time of job i for problem j, C*_j is the optimal makespan of problem j, and random[−C*_j/10, C*_j/10] is a random value drawn from the interval [−C*_j/10, C*_j/10].
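As a concrete illustration of this due-date scheme, a minimal Python sketch follows (ours, not from the paper; the function name and signature are hypothetical):

import random

def assign_due_dates(c, C_star):
    # c[i] = c_{i,j}: completion time of job i in the random permutation of problem j
    # C_star = C*_j: the optimal makespan of problem j
    return [c_i + random.uniform(-C_star / 10.0, C_star / 10.0) for c_i in c]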
For these 29 problems, the makespan Cmax and the maximum tardiness Tmax are used as criteria. Because the variance of Cmax is much smaller than that of Tmax, we set the two objectives as in [5]:

Minimize f1(s) = 5 * Cmax,   (7)
Minimize f2(s) = 2 * Tmax,   (8)

where s is a job solution (permutation). The constant multipliers in (7) and (8) are used so that the two criteria are treated comparably.

5.2 Statistical Analysis
Multi-objective problems require multiple, uniformly distributed nondominated solutions to form a Pareto trade-off front, so properly comparing the results of two algorithms is a complex matter: both the number of alternative solutions and their distribution must be considered. A statistical comparison method derived from Knowles and Corne [13] is used in our experiments. It works as illustrated in Fig. 1(a). For two-objective problems, the attainment surface is defined by the lines joining the solutions in the nondominated solution set. Therefore, for two algorithms A and B, A's attainment surface is LA (solid) and B's attainment surface is LB (dashed). A collection of sampling lines drawn from the origin is chosen so as to intersect the attainment surfaces across the full range of the Pareto frontier; examples of such lines are indicated by L1-L5 in Fig. 1(a). For a given sampling line, the algorithm whose intersection is closer to the origin (both objectives being minimized) is the winner. Line L2, for example, intersects A's attainment surface at I2 and B's at I3; because I2 is closer to the origin than I3, algorithm A outperforms algorithm B on line L2. Line L1 intersects A's attainment surface at I1 and has no intersection point with B's, which also means that algorithm A outperforms algorithm B on line L1.
Fig. 1(a). Sampling the Pareto frontier
Fig. 1(b). RWGA vs MOHDE
Given two attainment surfaces, LA from algorithm A and LB from algorithm B, a single sampling line L yields 0, 1, or 2 points of intersection with LA and LB, and an evaluation can be made as to whether algorithm A outperforms B on L or not. Such an evaluation is performed for each of several lines covering the Pareto trade-off area. Insofar as the lines provide a uniform sampling of the Pareto surface, this analysis yields two numbers: the percentage of the surface on which algorithm A outperforms algorithm B (A_better_B) and the percentage on which algorithm B outperforms algorithm A (B_better_A). Repeating the evaluation a number of times gives average values of A_better_B and B_better_A with some statistical significance. For the number of lines, we find 200 to be adequate, although, obviously, the more lines the better.
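A minimal Python sketch of this sampling comparison for two minimization objectives follows (our illustration, not the authors' code); spreading the line angles uniformly over the first quadrant is an assumption, since the text only states that the lines are drawn from the origin across the Pareto range:

import math

def surface_distance(phi, front):
    # front: nondominated points (f1, f2); the attainment surface is the
    # polyline joining them sorted by f1, as in Fig. 1(a). Returns the
    # smallest positive ray parameter along (cos phi, sin phi), or None.
    dx, dy = math.cos(phi), math.sin(phi)
    pts = sorted(front)
    best = None
    for p, q in zip(pts, pts[1:]):
        ex, ey = q[0] - p[0], q[1] - p[1]
        det = ex * dy - dx * ey
        if abs(det) < 1e-12:
            continue
        t = (ex * p[1] - ey * p[0]) / det   # ray parameter at the intersection
        s = (dx * p[1] - dy * p[0]) / det   # position along the segment
        if t >= 0.0 and 0.0 <= s <= 1.0 and (best is None or t < best):
            best = t
    return best

def compare_fronts(front_a, front_b, n_lines=200):
    # Percentage of sampling lines on which A beats B, and vice versa
    a_wins = b_wins = 0
    for i in range(n_lines):
        phi = (i + 0.5) * (math.pi / 2.0) / n_lines
        ta, tb = surface_distance(phi, front_a), surface_distance(phi, front_b)
        if ta is not None and (tb is None or ta < tb):
            a_wins += 1
        elif tb is not None and (ta is None or tb < ta):
            b_wins += 1
    return 100.0 * a_wins / n_lines, 100.0 * b_wins / n_lines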
5.3 Test Results and Comparisons

We set the population size of both MOHDE and RWGA to 2 × n, the maximum generation (the stopping condition of MOHDE) to 300, RWGA's other parameters the same as in [5], algorithm A as MOHDE, algorithm B as RWGA, the number of repetitions to 20, and the number of sampling lines to 300. For a fair comparison, RWGA and MOHDE are run for the same computation time.

Table 2. The statistical results of the tested algorithms
                               MOHDE (strategy 1, CR=0.1)    RWGA
Problem   n,m     C*     T*    DBG       DNDS                GBD      GNDS
Car1      11,5    7038   337   29.050    7                   0.783    6
Car2      13,4    7166   684   30.650    12                  2.417    9
Car3      12,5    7312   707   31.383    8                   8.917    6
Car4      14,4    8003   152   54.900    9                   2.433    5
Car5      10,6    7720   218   31.983    20                  1.950    15
Car6      8,9     8505   397   5.350     9                   1.183    8
Car7      7,7     6590   572   0.000     3                   0.000    3
Car8      8,8     8366   425   5.050     9                   0.333    8
Rec01     20,5    1247   22    70.167    17                  9.067    12
Rec03     20,5    1109   82    79.267    11                  6.667    6
Rec05     20,5    1242   48    77.133    13                  15.367   8
Rec07     20,10   1566   89    44.900    9                   7.967    5
Rec09     20,10   1537   128   73.167    16                  17.867   12
Rec11     20,10   1431   42    58.567    15                  24.600   11
Rec13     20,15   1930   45    47.267    11                  10.767   9
Rec15     20,15   1950   34    62.200    11                  13.067   8
Rec17     20,15   1902   92    42.267    13                  34.933   13
Rec19     30,10   2093   129   62.700    14                  26.067   7
Rec21     30,10   2017   67    28.733    15                  26.433   10
Rec23     30,10   2011   103   47.367    16                  32.033   10
Rec25     30,15   2513   0     34.767    17                  31.333   11
Rec27     30,15   2373   167   40.767    12                  20.867   9
Rec29     30,15   2287   148   42.133    21                  34.767   19
Rec31     50,10   3045   243   41.767    18                  17.033   9
Rec33     50,10   3114   93    34.400    17                  6.967    11
Rec35     50,10   3277   212   31.700    18                  22.900   10
Rec37     75,20   4951   302   24.333    14                  20.733   10
Rec39     75,20   5087   37    19.067    18                  12.033   14
Rec41     75,20   4960   114   24.467    17                  17.433   14
The test program was coded in Delphi 6.0 and run on a Pentium IV 2.8 GHz PC. We ran the comparison procedure for every problem, and the statistical results are summarized in Table 2, where DBG, GBD, DNDS and GNDS denote the average percentage of the surface on which MOHDE outperforms RWGA, the average percentage on which RWGA outperforms MOHDE, the average number of nondominated solutions obtained by MOHDE, and the average number of nondominated solutions obtained by RWGA, respectively. C* denotes the optimal makespan and T* denotes the optimal maximum tardiness. From Table 2, it can be seen that the results obtained by MOHDE are much better than those of RWGA: DBG is obviously larger than GBD, and DNDS is also larger than GNDS for many problems. Secondly, the sum of DBG and GBD for each problem is less than 100, which indicates that some sampling lines have no intersection with either MOHDE's or RWGA's attainment surface. Running MOHDE and RWGA once on Rec01, the nondominated solutions of RWGA (star points) and the nondominated solutions of MOHDE (circle points) are shown in Fig. 1(b). The values of DBG, GBD, DNDS and GNDS are 59.33, 1.33, 15 and 11, respectively; MOHDE's solutions are better than RWGA's. Results are similar for the other problems.

5.4 Discussions
From the experimental results with different parameters, we find that the solution quality of MOHDE varies with CR. For every problem, we ran the comparison 20 times to obtain statistical results. For each comparison, we ran RWGA once to get a nondominated set ND_RWGA, ran MOHDE for the same computation time with different CR values to get a series of nondominated sets ND_MOHDE_CR_I, and then compared ND_RWGA with each ND_MOHDE_CR_I. The statistical results for some problems are given in Table 3. It is observed in Table 3 that the solution quality of MOHDE decreases as CR increases. In our experience, a CR between 0 and 0.3 is a good choice for these problems.

Table 3. The statistical results of the tested algorithms with different CR (strategy=1)
          CR=0.2          CR=0.4          CR=0.8          CR=1
Problem   DBG    GBD      DBG    GBD      DBG    GBD      DBG    GBD
Car5      29.3   1.7      30.7   3.0      30.9   3.1      36.3   1.8
Rec09     58.9   26.2     56.8   31.5     32.2   55.6     25.4   68.4
Rec19     48.7   42.9     43.5   48.2     29.8   57.6     18.6   63.2
Rec29     43.7   37.9     28.2   46.1     13.6   58.1     13.2   62.8
Rec39     19.3   16.1     13.4   17.5     9.6    17.5     10.9   17.5
6 Conclusion and Future Research

This paper proposes a hybrid algorithm based on DE and VNS-based local search for MPFSSP. Simulation results and comparisons with a prevailing algorithm on well-
known benchmarks demonstrate the effectiveness of MOHDE. To the best of our knowledge, MOHDE is the first DE-based algorithm for the multi-objective scheduling problem, and there is still potential to develop other versions of DE to improve solution quality for scheduling problems. For future work, we intend to study DE for other kinds of scheduling problems, such as the multi-objective job shop [14].
Acknowledgements

This research is partially supported by the National Natural Science Foundation of China (Grant No. 60204008, 60374060 and 60574072) as well as the 973 Program (Grant No. 2002CB312200).
References

1. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, San Francisco (1979)
2. Wang, L.: Shop Scheduling with Genetic Algorithms. Tsinghua University & Springer Press, Beijing (2003)
3. Daniels, R.L., Chambers, R.J.: Multiobjective Flow-shop Scheduling. Naval Res. Logistics Quart., 37 (1990) 981-995
4. Rajendran, C.: Two-stage Flowshop Scheduling Problem with Bicriteria. J. Oper. Res. Soc., 43(9) (1992) 871-884
5. Ishibuchi, H., Murata, T.: A Multiobjective Genetic Local Search Algorithm and Its Application to Flowshop Scheduling. IEEE T. Syst. Man. Cy. C., 28(3) (1998) 392-403
6. Storn, R., Price, K.: Differential Evolution - A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. J. Global Optim., 11 (1997) 341-359
7. Mladenovic, N., Hansen, P.: Variable Neighborhood Search. Comput. Oper. Res., 24 (1997) 1097-1100
8. Onwubolu, G., Davendra, D.: Scheduling Flow Shops Using Differential Evolution Algorithm. Eur. J. Oper. Res., 171(2) (2006) 674-692
9. Tasgetiren, M.F., Sevkli, M., Liang, Y.C., Gencyilmaz, G.: Particle Swarm Optimization Algorithm for Permutation Flowshop Sequencing Problem. Lecture Notes in Computer Science, 3172 (2004) 382-389
10. Bean, J.C.: Genetic Algorithms and Random Keys for Sequencing and Optimization. ORSA Journal on Computing, 6(2) (1994) 154-160
11. Carlier, J.: Ordonnancements a Contraintes Disjonctives. Oper. Res., 12 (1978) 333-351
12. Reeves, C.R.: A Genetic Algorithm for Flowshop Sequencing. Comput. Oper. Res., 22 (1995) 5-13
13. Knowles, J.D., Corne, D.W.: Approximating the Nondominated Front Using the Pareto Archived Evolution Strategy. Evol. Comput., 8(2) (2000) 149-172
14. Wang, L., Zheng, D.Z.: An Effective Hybrid Optimization Strategy for Job-shop Scheduling Problems. Comput. Oper. Res., 28 (2001) 585-596
15. Ruiz, R., Maroto, C.: A Comprehensive Review and Evaluation of Permutation Flowshop Heuristics. Eur. J. Oper. Res., 165 (2005) 479-494
16. Hart, W.E., Krasnogor, N., Smith, J.E.: Recent Advances in Memetic Algorithms. Springer, Heidelberg (2004)
17. Ishibuchi, H., Yoshida, T., Murata, T.: Balance between Genetic Search and Local Search in Memetic Algorithms for Multiobjective Permutation Flowshop Scheduling. IEEE T. Evol. Comput., 7 (2003) 204-223
18. Zitzler, E., Deb, K., Thiele, L.: Comparison of Multiobjective Evolutionary Algorithms: Empirical Results. Evol. Comput., 8(2) (2000) 173-195
19. Reeves, C.R., Yamada, T.: Genetic Algorithms, Path Relinking and the Flowshop Sequencing Problem. Evol. Comput., 6 (1998) 45-60
20. Reeves, C.R.: Landscapes, Operators and Heuristic Search. Ann. Oper. Res., 86 (1999) 473-490
21. Coello, C.A.C.: A Comprehensive Survey of Evolutionary-based Multiobjective Optimization Techniques. Knowledge and Inform. Syst., 1(3) (1999) 269-308
22. Coello, C.A.C., Veldhuizen, D.A.V., Lamont, G.B.: Evolutionary Algorithms for Solving Multi-Objective Problems. Kluwer, Boston, MA (2002)
A Genetic Algorithm for the Batch Scheduling with Sequence-Dependent Setup Times TsiuShuang Chen1, Lei Long1, and Richard Y.K. Fung2 1
College of Information Technology Science, Nankai University, Tianjin, 300071, P.R. China [email protected] 2 Department of Manufacturing Engineering & Engineering Management, City University of Hong Kong, Hong Kong, P.R. China
Abstract. This paper considers a single machine scheduling problem with sequence-dependent setup times to minimize the maximum lateness. A genetic algorithm is developed in which an effective binary coding based on the problem properties is presented, and a heuristic for sequencing the batches given the batching structure is proposed. Computational experiments show that the proposed genetic algorithm performs well in solving the problem, and is capable of effectively solving large problems involving 400 jobs.
1 Introduction

The batch scheduling problem with setup times arises frequently in process industries, parts manufacturing environments and cellular assembly systems (such as chemical, pharmaceutical, food processing, metal processing, printing industries and semiconductor testing facilities). The setup times may reflect the need to change tools or to clean the machines. The Batch Scheduling Problem (BSP) with setup times can be described as follows. Suppose that there are n jobs belonging to F, F ≥ 1, job families. A family f, 1 ≤ f ≤ F, contains n_f jobs, n = Σ_{f=1}^{F} n_f. These jobs are ready at time zero and have to be processed without interruption on a single machine. Job J_i, 1 ≤ i ≤ n, has a processing time p_i and a due date d_i. Let A_k be the subset of the jobs that belong to family k, 1 ≤ k ≤ F. Assume that the jobs belonging to the same family are indexed in non-decreasing order of their due dates. J_i ~ J_j denotes that J_i and J_j are of the same family, and J_i -> J_j denotes that J_i ~ J_j and d_j = min_v {d_v | J_v ~ J_i, d_v > d_i}. A setup time s_fg is required when two jobs J_i and J_k belonging to separate families f and g are processed consecutively in that order; s_ff = 0. Let c_i be the completion time of job J_i; the lateness of J_i in the sequence π is L_i(π) = c_i − d_i. The objective is to minimize the
maximum lateness of the jobs, i.e., L_max(π) = max_i {L_i(π)}. For the maximum lateness problem, Monma and Potts [1] put forward a dynamic programming approach that can solve 1/s_f/L_max and 1/s_fg/L_max. Unfortunately, this
algorithm is not of practical use unless F is very small. They indicated, as Bruno and Downey [2] did before them, that 1/s_fg/L_max is strongly NP-hard. Bruno and Downey [2] also showed that 1/s_f/L_max is NP-hard as well, but is solvable in pseudo-polynomial time if the number of distinct due dates is fixed. Ghosh and Gupta [3] gave an improved dynamic programming approach for the 1/s_fg/L_max problem, whose computational efficiency is superior to that of Monma and Potts [1]. Uzsoy et al. [4] and Uzsoy et al. [5] proposed a branch-and-bound algorithm and a dynamic programming procedure, respectively, for 1/prec, s_fg/L_max. Uzsoy et al. [5] analyzed the performance of the myopic Earliest Due Date (EDD) dispatching rule for the 1/r_i, s_fg/L_max problem. Assuming that the setup times are bounded by the processing times, they developed tight worst-case error bounds for their heuristic. To overcome the large computational time, Ovacik and Uzsoy [6] presented a rolling horizon procedure for the 1/r_j, s_fg/L_max problem, in which the problem is decomposed into a series of small subproblems over a rolling horizon. Unal and Kiran [7] considered the problem with sequence-independent setup times; a heuristic and an exact algorithm were proposed to find feasible schedules in which all due dates are met. This paper presents a genetic algorithm to find solutions that minimize the maximum lateness. The remainder of the paper is organized as follows. In Section 2, two useful structural properties of optimal solutions are discussed. A genetic algorithm (GA) with an effective binary coding is proposed in Section 3. In Section 4, the results of computational experiments performed to evaluate the performance of the GA are reported. Finally, some concluding remarks are drawn in Section 5.
2 Analysis of the Problem BSP

Let us consider two structural properties required to yield an optimal solution to the 1/s_fg/L_max problem.

Theorem 1. There is an optimal schedule for the 1/s_fg/L_max problem in which all jobs from a given family are processed in non-decreasing order of their due dates (EDD).
The proof was given by Monma and Potts [1]. To solve a BSP, partitions of jobs, denoted B_kl, k = 1, 2, …, F, l = 1, 2, …, L_k, should be obtained such that ∪_{l=1}^{L_k} B_kl = A_k, where L_k is the number of partitions belonging to family k. The jobs in a partition B_kl are processed one after another without any setup in between, and compose a batch. According to Theorem 1, for the 1/s_fg/L_max problem there is an optimal solution in which the jobs in a batch are processed in EDD order. For the 1/s_fg/L_max problem, a composite job J_h for a sequence of two jobs (J_i, J_j) can be defined by p_h = p_i + p_j and d_h = min{d_j, d_i + p_j}. This definition can easily be extended to a longer sequence of jobs. Substituting a composite job for a (partial) sequence does not affect the overall cost of any schedule for the maximum lateness problem [1]. In this paper, a batch is treated as a composite job, and its due date is defined as
d_{B_kl} = min_{J_i ∈ B_kl} { d_i + Σ_{J_j ∈ B_kl, i < j} p_j }.   (1)
Theorem 2. Let C_{B_kl} be the completion time of batch B_kl; then

L_max = max_{k,l} { C_{B_kl} − d_{B_kl} }.   (2)

The theorem is a generalization of a result for 1/s_f/L_max given in Webster and Baker [8]. Therefore,

L_max = max_i {c_i − d_i} = max_{k,l} { max_{J_i ∈ B_kl} {c_i − d_i} } = max_{k,l} { L_{B_kl} } = max_{k,l} { C_{B_kl} − d_{B_kl} }.
This theorem can significantly reduce the computation required to evaluate the maximum lateness of any schedule.
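As an illustration, equations (1) and (2) translate directly into code; the following Python sketch is ours (the data structures are hypothetical), and it assumes no setup before the first batch:

def batch_due_date(jobs):
    # jobs: list of (p_i, d_i) pairs in EDD order within the batch; implements
    # equation (1): d_B = min over i of (d_i + sum of p_j for later jobs j in B)
    tail, best = 0.0, float('inf')
    for p, d in reversed(jobs):
        best = min(best, d + tail)
        tail += p
    return best

def max_lateness(batches, families, setup):
    # batches[b]: list of (p_i, d_i); families[b]: family index of batch b;
    # setup[f][g]: sequence-dependent setup time between families f and g (s_ff = 0)
    t, prev, lmax = 0.0, None, float('-inf')
    for jobs, f in zip(batches, families):
        if prev is not None:
            t += setup[prev][f]
        t += sum(p for p, _ in jobs)                  # batch completion time C_B
        lmax = max(lmax, t - batch_due_date(jobs))    # Theorem 2, equation (2)
        prev = f
    return lmax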
3 Genetic Algorithm

A genetic algorithm (GA) is an "intelligent" probabilistic search algorithm which can be applied to a variety of combinatorial optimization problems. In this paper, a binary-coded GA is proposed to solve the 1/s_fg/L_max problem.

3.1 Coding Scheme

In the proposed GA, a binary string is used to represent a solution. Let

x_i = 1, if job J_i is processed immediately after a new setup;
x_i = 0, if job J_i is in the same batch as job J_j, J_j -> J_i.

A chromosome is defined as

(x_2, …, x_{n_1}, x_{n_1+2}, …, x_{n_1+n_2}, …, x_{n_1+…+n_{F−1}+2}, …, x_n).   (3)
The first job of each family is eliminated from the chromosome because x_i ≡ 1 if job J_i is the job with the earliest due date in A_k, J_i ∈ A_k. Let LC denote the length of the chromosome, LC = n − F. For any J_i ∈ A_k, let l(i) = Σ_{J_j ∈ A_k, j ≤ i} x_j; then job J_i belongs to B_{k,l(i)}.
That means {x_i, i = 1, …, n} determines the batching structure of the jobs.

3.2 Heuristic for Sequencing the Batches
In this paper, a heuristic (H1) for sequencing the batches, given the batching structure of the jobs, is proposed: first sequence the batches in non-decreasing order of their due dates d_{B_kl}, and then conduct a series of batch-swapping procedures to improve the batch sequence. A small code sketch of the decoding and of this first step is given below.
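Continuing the sketch above (ours; H1's batch-swapping improvement phase is omitted), decoding a chromosome and forming the initial EDD batch order might look as follows, assuming each family's jobs are already listed in EDD order:

def decode(x, family_jobs):
    # x[j] in {0, 1} for each job id j, with x fixed to 1 for the first
    # (earliest-due-date) job of each family; family_jobs[f]: job ids of
    # family f in EDD order. Returns batches as (family, [job ids]) pairs.
    batches = []
    for fam, jobs in enumerate(family_jobs):
        current = None
        for j in jobs:
            if current is None or x[j] == 1:   # start a new batch B_{k,l}
                current = (fam, [j])
                batches.append(current)
            else:                              # same batch as its predecessor
                current[1].append(j)
    return batches

def h1_initial_order(batches, p, d):
    # First step of H1: non-decreasing order of batch due dates d_B from
    # equation (1), computed with batch_due_date from the earlier sketch
    return sorted(batches, key=lambda b: batch_due_date([(p[j], d[j]) for j in b[1]]))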
3.3 Fitness Function
Let F(x(i,k)) denote the fitness of individual x(i,k) in population P(k), i = 1, …, N, k = 1, …, K_max, where N is the population size and K_max is the maximum number of evolution generations. Let k = 0 for the initial generation. Define

F(x(i,k)) = f_max(k) − f(x(i,k)),   (4)

where f_max(k) = max_i { f(x(i,k)) } and f(x(i,k)) is the L_max value of individual x(i,k).
3.4 Construction of Initial Population
In the GA proposed in this paper, the initial population is generated randomly. In order to guarantee the diversity of the initial population, a newly generated chromosome is added to the initial population only when the Hamming distance between the new chromosome and every chromosome already in the initial population is greater than some threshold (e.g. LC/15). The procedure is repeated until the initial population reaches the population size N.
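A minimal Python sketch of this diversity-preserving initialization follows (ours; only the LC/15 threshold is taken from the text):

import random

def init_population(N, LC, threshold=None):
    # Accept a random chromosome only if its Hamming distance to every
    # chromosome already in the population exceeds the threshold (e.g. LC/15).
    # Note: if the threshold is set too large, this rejection loop may stall.
    if threshold is None:
        threshold = LC / 15.0
    pop = []
    while len(pop) < N:
        cand = [random.randint(0, 1) for _ in range(LC)]
        if all(sum(a != b for a, b in zip(cand, c)) > threshold for c in pop):
            pop.append(cand)
    return pop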
3.5 The Procedure of the GA

Step 1. Generate the initial population; set k = 0.
Step 2. Use heuristic H1 to determine the sequence of the batches. If the best L_max value does not change for 5 generations, then stop.
Step 3. Apply the selection operator (fitness proportional method), and use the crossover and mutation operators (one-point crossover and one-point mutation) to generate the new population; set k = k + 1.
Step 4. Apply the elitist policy: replace the worst individual in this generation by the best individual from the previous generation.
Step 5. If k reaches the maximum evolution number K_max, stop. Otherwise, go to Step 2.
4 Computational Analysis

The purpose of this section is to report the results of the computational experiments carried out to analyze the performance of the GA. The method for generating problem instances is similar to that suggested in Unal and Kiran [7]. Note that the problem discussed in this paper involves sequence-dependent setup times, which differs from that of Unal and Kiran [7].

4.1 Generation of Problem Instances
In the computational experiments, a large number of random problem instances are generated. Let nb(i) and a(i), i = 1, …, LB, denote the number of jobs in batch i and the family type of batch i in the initial schedule, respectively. A problem instance is represented
by a set of values of the parameters n, F, p_i, d_i, A_f and s_fg, where i = 1, 2, …, n and f, g = 1, 2, …, F. Besides, let c_j be the completion time of job J_j. These parameters are generated as follows:

Assign values to F, G, LB;
n = 0; A_f = ∅;
s_fg ~ U[20,50] or U[50,80], f, g = 1, …, F, f ≠ g (let s_fg = 0 for f = g);
for (i = 1; i <= LB; i++) {
  nb(i) ~ U[1,G]; a(i) ~ U[1,F];
  for (j = n+1; j <= n + nb(i); j++) {
    A_a(i) = A_a(i) ∪ {j};
    p_j ~ U[20,50] or U[50,80];
    if (j == n+1) c_j = c_{j-1} + p_j + s_{a(i-1),a(i)};
    else c_j = c_{j-1} + p_j;
  }
  n = n + nb(i);
}

Depending on the different values of the parameters p_i and s_fg, six data types are defined as follows:

I: p_i ~ U[20,50], s_fg ~ U[20,50];
II: p_i ~ U[20,50], s_fg ~ U[50,80];
III: p_i ~ U[50,80], s_fg ~ U[20,50];
IV: p_i ~ U[50,80], s_fg ~ U[50,80];
V: p_i ~ U[20,50], s_fg ~ U[50,80] or s_fg ~ U[20,50], determined randomly;
VI: p_i ~ U[50,80], s_fg ~ U[50,80] or s_fg ~ U[20,50], determined randomly.

Due to the consideration of sequence-dependent setup times, the two new parameter types V and VI are introduced in this paper.

4.2 The Effectiveness of the GA—Experiment 1
In Experiment 1, the effectiveness of the GA is compared with the backward dynamic programming (DP) algorithm proposed by Ghosh and Gupta [3]. Both algorithms are coded in C++, and all runs are done on an AMD XP1800+ CPU. Data type II (p_i ~ U[20,50], s_fg ~ U[50,80]) is used in this experiment, and 5 problem types are defined for different values of the parameters LB and F. For each type, 20 instances are generated and solved by the GA and the DP algorithm respectively. Due to the large computational times of the DP algorithm, nb(i) ~ U[1,3] is adopted in this experiment. Let (L_max^GA − L_max^opt) / L_max^opt be the relative gap of the GA. In Table 1 the due dates are d_j ~ U[c_j, c_n], for j = 1, 2, …, n, where c_j and c_n are the completion times of jobs j and n in the initial schedule generated in Subsection 4.1. In Table 2 the due dates are d_j ~ U[c_j/2, (c_j + c_n)/2], for j = 1, 2, …, n. Tables 1 and 2 give the average and maximum relative gaps of the GA over the 20 instances, and the average and maximum computation times of the DP algorithm and the GA over the 20 instances.
Table 1. The Effectiveness of the DP and GA (Situation I)

LB   F    DP Avg(s)   DP Max(s)   GA Avg(s)   GA Max(s)   Avg gap   Max gap
4    2    0.5         1           <1          <1          0.0020    0.0031
4    3    4.1         10          <1          <1          0.0289    0.0305
5    3    20.2        46          <1          <1          0.0104    0.0217
5    4    29.8        72          <1          1           0.0424    0.0510
6    4    237.5       384         <1          1           0.0360    0.0380
Table 2. The Effectiveness of the DP and GA (Situation II)

LB   F    DP Avg(s)   DP Max(s)   GA Avg(s)   GA Max(s)   Avg gap   Max gap
4    2    0.7         1           <1          <1          0.0258    0.0307
4    3    3.2         8           <1          <1          0.0383    0.0396
5    3    24.7        53          <1          <1          0.0213    0.0335
5    4    33.4        75          <1          1           0.0275    0.0401
6    4    242.5       392         <1          1           0.0433    0.0512
It can be seen from Table 1 and Table 2 that the computation times of the GA are much less than those of the DP algorithm, and that the computational burden of the DP algorithm increases rapidly. These results demonstrate that the GA is capable of generating near-optimal solutions. We did not test larger problems due to the excessive computational time needed to obtain optimal solutions by DP.

4.3 The Effectiveness of the GA—Experiment 2
In this set of experiments, we verify the ability of the GA to suggest feasible schedules in which all delivery due dates are met, i.e., L_max(π) = 0. Let d_j ~ U[c_j, c_n + Z] and nb(i) ~ U[1,10]. 90 problem types are defined for different values of the parameters LB, F, p_i, s_fg and Z. For each type, 20 instances are generated and solved by the GA. In the simulation, the population size is N = 100, and the crossover and mutation probabilities are 0.85 and 0.015 respectively. The GA succeeds in suggesting feasible schedules for all 1800 instances. The computational results are presented in Table 3, where the column 'MAX' refers to the maximum number of generations used by the GA in solving the 20 instances of each problem type, and the column 'AVE' refers to the average computation times. It can be seen from Table 3 that the performance of the GA is quite good: the maximum generations do not exceed 21, and the average computation times do not exceed 56 s. The experiment shows that the GA has a certain potential for solving larger problems. Another experiment is also conducted in which more restrictive due dates are considered. Let d_j ~ U[c_j, c_j + k·P] and nb(i) ~ U[1,10], where k = 1, 2, 3, 4, 5 and P is the average processing time of the batches in the initial schedule. 60 problem types are defined. For each type, 20 instances are generated and solved by the GA. The
simulation results show that the performance of the GA is also quite good even with restrictive due dates: the algorithm failed to find feasible schedules for only 16 problem instances out of 1200. The average computation times are also satisfactory.

Table 3. Results of Computational Experiment 2

                       Z=0            Z=20           Z=50
LB   F    Data Type   MAX   AVE(s)   MAX   AVE(s)   MAX   AVE(s)
30   10   I           4     2.60     5     1.90     8     4.30
30   10   II          5     2.10     13    3.20     5     3.15
30   10   III         9     4.45     6     3.35     4     2.10
30   10   IV          9     3.15     4     3.55     6     2.50
30   10   V           8     2.15     8     1.45     6     1.15
30   10   VI          7     2.15     3     1.05     2     1.20
30   15   I           14    3.20     13    5.25     9     3.50
30   15   II          14    6.20     12    6.05     13    5.40
30   15   III         8     7.00     9     4.85     4     3.45
30   15   IV          13    6.35     12    4.50     12    4.30
30   15   V           8     1.65     7     3.30     8     2.85
30   15   VI          10    3.90     6     2.60     4     2.90
40   15   I           20    8.55     11    11.90    15    18.25
40   15   II          13    11.35    11    9.95     10    11.10
40   15   III         13    11.70    8     14.25    12    11.15
40   15   IV          16    13.55    17    17.00    16    16.05
40   15   V           9     7.50     6     8.25     7     4.10
40   15   VI          9     6.00     8     8.05     5     7.70
40   20   I           18    22.15    17    14.05    12    19.25
40   20   II          18    25.40    13    21.50    16    25.00
40   20   III         16    16.90    12    16.20    12    9.85
40   20   IV          18    17.70    19    22.30    13    16.20
40   20   V           16    14.05    19    12.85    12    13.60
40   20   VI          13    8.85     13    15.30    15    12.75
50   20   I           13    41.60    21    23.95    14    33.85
50   20   II          14    36.40    15    55.90    16    47.20
50   20   III         14    42.50    13    35.90    13    34.00
50   20   IV          19    38.30    15    42.60    15    34.60
50   20   V           19    27.30    16    35.90    9     29.65
50   20   VI          16    19.20    14    28.10    18    25.40
5 Conclusion

This paper discusses a single machine scheduling problem with sequence-dependent setup times, with the goal of finding a sequence for processing all the batches of jobs that meets the required due dates. Since the problem is NP-hard, a genetic algorithm is developed to obtain solutions that minimize the maximum lateness, in which an effective binary coding scheme based on the problem properties is given and a heuristic for sequencing the batches for a given batching structure is proposed. Computational experiments
show that the proposed genetic algorithm can generate near-optimal solutions and performs very well in suggesting feasible schedules. It can effectively solve large problems with up to 400 jobs.
Acknowledgement

The research described in this paper was partly supported by the National Natural Science Foundation of China (Grant No. 60274042) and a Strategic Research Grant (SRG) from City University of Hong Kong (Project No. 7001227).
References

1. Monma, C.L., Potts, C.N.: On the Complexity of Scheduling with Batch Setups. Oper. Res., 37 (1989) 798-804
2. Bruno, J., Downey, P.: Complexity of Task Sequencing with Deadlines, Set-up Times and Changeover Costs. SIAM J. Comput., 7 (1978) 393-404
3. Ghosh, J.B., Gupta, J.N.D.: Batch Scheduling to Minimize Maximum Lateness. Oper. Res. Lett., 21 (1997) 77-80
4. Uzsoy, R., Martin-Vega, L.A., Lee, C.Y., Leonard, P.A.: Production Scheduling Algorithms for a Semiconductor Test Facility. IEEE Trans. Semicond. Manufact., 4 (1991) 270-280
5. Uzsoy, R., Lee, C.Y., Martin-Vega, L.A.: Scheduling Semiconductor Test Operations: Minimizing Maximum Lateness and Number of Tardy Jobs on a Single Machine. Naval Res. Logist., 39 (1992) 369-388
6. Ovacik, I.M., Uzsoy, R.: Rolling Horizon Algorithms for a Single-Machine Dynamic Scheduling Problem with Sequence-Dependent Setup Times. Int. J. Prod. Res., 32 (1994) 1243-1263
7. Unal, A.T., Kiran, A.S.: Batch Sequencing. IIE Trans., 24 (1992) 73-83
8. Webster, S., Baker, K.R.: Scheduling Groups of Jobs on a Single Machine. Oper. Res., 43 (1995) 692-703
A Study on the Configuration Control of a Mobile Manipulator Based Upon the Optimal Cost Function Kwan-Houng Lee Division of Electronics & Information Engineering, Cheongju University, Naedok-Dong Sangdang-Gu Cheongju-City, Chungbuk, 360-764, Republic of Korea [email protected]
Abstract. A mobile manipulator (a serial connection of a mobile robot and a task robot) has the abilities of both moving and performing a task. In this paper, a mobile manipulator transfers an arbitrary object, and we work out the optimal configuration of the mobile manipulator for a series of tasks. For this, we first define the task vector for contiguous tasks. After making the major (longest principal) axis of the manipulability ellipsoid generated by the task robot correspond to the task vector, we can find the optimal posture of the mobile manipulator by considering the posture of the task robot for the configuration of a task conversion and computing the position of the mobile robot from the kinematics. In addition, we decrease the distance error and direction error of the mobile robot using a Lyapunov equation, and we focus on the control of the task robot using the inverse kinematics equation.
1 Introduction

While a mobile robot can expand the size of the workspace, it performs no manipulation task; a vertical multi-joint robot (manipulator), on the other hand, is limited to a fixed workspace because of its fixed-base structure. To overcome these constraints, we construct a mobile manipulator system and control the robot so that it efficiently takes an appropriate posture for the next task using a redundant joint and then performs the task. We define a mobile manipulator as a robot that can accomplish various tasks with both mobility and manipulation. A mobile manipulator generates redundant degrees of freedom (DOF) in its kinematics, unlike other redundant robots with a fixed-base structure and at least six DOF. However, there remains a difficulty in the control method: one should find an appropriate solution among the many possible solutions for one task [1]. Fig. 1 shows the mobile manipulator implemented for the experiment. Here the mobile robot is wheel-driven and capable of moving its platform up and down about its center of gravity, and the task robot is mounted on the center of the platform. We control each robot (task robot and mobile robot) separately. For the mobile robot movement, we reduce the distance error and position error using a Lyapunov equation so that the desired position can be included in the workspace of the task robot, with the mobile robot moving along a curved path from the initial position [2]. After inclusion of the desired position
in the workspace, for the end-effector movement to the desired position, we control the optimal configuration of the mobile manipulator for performing contiguous tasks by aligning the main-axis vector of the manipulability ellipsoid with the working direction [3].
2 Kinematics Analysis of the Mobile Manipulator

The mobile robot is a nonholonomic system and the task robot is a holonomic system, so the kinematics of the mobile manipulator can be analyzed as velocity kinematics [1][4]. In the kinematics analysis shown in Fig. 2, once the principal coordinate system {W} is defined, the coordinate system of the mobile robot {P} can be configured, and the base coordinate system {B} can be generated along the linear link Z_P. In addition, the coordinate system {E} of the end-effector installed on the task robot can be configured with respect to the coordinate system {B} [5]. The total transformation matrix is T_WE = T_WP · T_PB · T_BE, where T_WP is a function of q_p = {x, y, α}^T, T_PB a function of q_b = {z}^T, and T_BE a function of q_e = {θ1, θ2, θ3}^T. So the system variable vector of the mobile manipulator can be represented as q = {q_p, q_b, q_e} ∈ R^6. If the vector of the end-effector with respect to the principal coordinate system is denoted X_WE, the velocity kinematics can be written as equation (1) [6]:

Ẋ_WE = J_WE(q) · q̇,   (1)
where J_WE(q) is the Jacobian of T_WE.

Fig. 1. Mobile Manipulator PURL-II
Fig. 2. Kinematics analysis of the mobile manipulator
3 System Operation Algorithm

3.1 Definition of Task Vector

First, defining the path for a series of tasks in order to define the task vector, we obtain equation (2):
X_traj(t) = (x_d(t), y_d(t), z_d(t)).   (2)

From equation (2), the temporal derivative for each axis is represented in vector form as equation (3):

X_task(t) = (dx_d(t)/dt, dy_d(t)/dt, dz_d(t)/dt).   (3)
Fig. 3. Definition of task vector
Fig. 4. Movement of the position
Fig. 3 shows the task vectors of a specified trajectory. The curve represents the trajectory and the straight lines represent the task vectors at each point. Where the trajectory is close to a curved line, the norm of the task vector is small; where the trajectory is close to a straight line, the norm of the task vector is large. If the lengths of the task vector are assigned to the major and minor axes of the ellipsoid, then on a curved trajectory the difference between the major-axis and minor-axis lengths is relatively small, which is suitable for direction changes, while on a straight trajectory the difference is relatively large, which is suitable for moving along the major axis of the ellipsoid. Therefore, by using the defined task vector, the optimal posture of the task robot can be obtained.

3.2 Path Planning of the Mobile Robot

For the robot to move and take a posture from the initial position of the mobile robot into the workspace, we establish the coordinate system shown in Fig. 4. The mobile robot starts at the current position X_i = (x_i, y_i) and moves to a position X_p = (x_p, y_p) that brings the final end-effector position X_d = (x_d, y_d), X_d ∈ X_traj, within the workspace. As shown in Fig. 4, we denote the current robot direction, the direction error from the current position to the desired position, the distance error to the desired point, and the mobile robot direction at the desired position by φ, α, e and θ, respectively. When the mobile robot moves from (x_i, y_i) to (x_d, y_d), we make α and e be minimized and θ become the direction of the mobile robot at the desired position.
The kinematics equations are ẋ = −v·cos α, ẏ = v·sin α, θ̇ = ω, with e > 0. These are represented as equation (4) [2]:

ė = −v·cos α,   α̇ = −ω + (v·sin α)/e,   θ̇ = (v·sin α)/e.   (4)
Referring to equation (4), we use a Lyapunov function to find a controller that drives the errors in equation (4) to zero. The Lyapunov candidate function is given by equation (5):

V = V1 + V2 = (1/2)·λ·e^2 + (1/2)·(α^2 + h·θ^2),   (5)

where λ and h are arbitrary positive real numbers. From equation (5), we obtain equation (6):

V̇ = −λ·e·v·cos α + α·( −ω + (v·sin α)/e · (α + h·θ)/α ).   (6)
In order to converge stably, V̇ must satisfy V̇ < 0. Hence, the nonlinear mobile robot controller is designed as equation (7):

v = γ·e·cos α, (γ > 0),
ω = k·α + γ·(cos α·sin α / α)·(α + h·θ), (k, h > 0).   (7)

Under this controller, V̇ satisfies the convergence condition of equation (8), so that e, α ≅ 0 and φ ≅ θ as t → ∞:

V̇ = −λ·(γ·cos^2 α)·e^2 − k·α^2 ≤ 0.   (8)
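As an illustration of the closed loop formed by (4) and (7), a simple Euler simulation can be written as follows (a minimal Python sketch of ours; the gains, step size and iteration count are illustrative assumptions, not values from the paper):

import math

def simulate(e, alpha, theta, gamma=1.0, k=2.0, h=1.0, dt=0.01, steps=3000):
    # Integrates equation (4) under controller (7); e, alpha and theta
    # should all converge toward zero, as guaranteed by equation (8).
    for _ in range(steps):
        v = gamma * e * math.cos(alpha)                    # equation (7)
        if abs(alpha) > 1e-9:
            w = k * alpha + gamma * (math.cos(alpha) * math.sin(alpha) / alpha) * (alpha + h * theta)
        else:
            w = k * alpha + gamma * (alpha + h * theta)    # sin(a)/a -> 1
        # since v = gamma*e*cos(alpha), the term v*sin(alpha)/e in (4)
        # simplifies to gamma*cos(alpha)*sin(alpha), avoiding division by e
        g = gamma * math.cos(alpha) * math.sin(alpha)
        e += (-v * math.cos(alpha)) * dt
        alpha += (-w + g) * dt
        theta += g * dt
    return e, alpha, theta

# usage sketch with hypothetical initial errors:
# simulate(4.24, 0.3, 0.5)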
Fig. 5 shows the simulation result. We set the initial position of the mobile robot to X_i = (0, 0, 0.3) and the initial angle φ with respect to the x axis to π/8. The mobile robot navigates toward the target point X_d = (3, 3, 0.5), reducing the direction error α
Fig. 5. Simulation result
and the position error e according to equation (8). When X_d is included within the workspace of the task robot, the mobile robot finishes by stopping at X_p. In this posture, the end-effector of the mobile manipulator can be located at X_d, but it may be near a singularity such that changing to other postures is difficult. The final coordinate is X_p = (2.5125, 2.8951), the distance between the target point and the current point is L = 0.4982, and the angle between the mobile robot and the X axis is θ = 0.5753 rad.
4 Experiments

In this experiment, we fixed the initial point of the mobile robot to X_i = (0, 0, 0.5), with φ = π/8 and X_d = (3, 3, 0.8). Fig. 6 shows the result of this experiment; the desired point is the top of the box on the right side of the picture. The mobile robot completed its motion at X_p = (2.42, 2.79), and the angle between the X axis and the mobile robot is θ = 0.61 rad.
Fig. 6. Result of the experiment
In this process, θ3 = 1.64 rad, θ2 = 1.08 rad, X_f = (2.52, 2.83, 0.4), and the final direction is θ = 0.17 rad. The difference between the simulation and experimental results is caused by the error between the control input and the actual motion of the mobile robot due to the roughness of the floor, and by the accumulation of position error through velocity control. In the result, the desired position is X_d = (3, 3, 0.5), but the resulting position of the end-effector is (2.72, 2.85, 0.43).
5 Conclusions

A mobile manipulator is more effective than a redundant robot with a fixed-base structure for work beyond the current workspace. In this paper, we structured a mobile manipulator system combining a vertical multi-joint robot as the task robot with a mobile robot, used the redundant joints of the system efficiently, and controlled
the task robot's task execution within a workspace changed by driving the mobile robot. When controlling the mobile robot to change the workspace, we decreased the distance error and direction error using a Lyapunov function. The task robot took an appropriate posture to execute a task using the manipulability ellipsoid in the workspace, and could execute the task by moving the mobile robot into that posture so that the end-effector of the task robot reached the desired position. Therefore, the mobile manipulator executed a task beyond the current workspace with a posture appropriate to the given task, in cooperation between the mobile robot and the task robot. In further study, exact position control using a sensor is needed to calibrate the position error of velocity control, and we will pursue cooperative control of the two local robots by forming a cost function through weighting values for the mobility of the mobile robot and the manipulability ellipsoid of the task robot.
References

1. Lee, J.K.: Mobile Manipulator Motion Planning for Multiple Tasks Using Global Optimization Approach. Journal of Intelligent and Robotic Systems, 18 (1997) 169-190
2. Aicardi, M.: Closed Loop Steering of Unicycle-like Vehicles via Lyapunov Techniques. IEEE Robotics and Automation Magazine, (1995) 27-35
3. Tsuneo, Y.: Manipulability of Robotic Mechanisms. The International Journal of Robotics Research, 4(2) (1985) 3-9
4. Homayoun, S.: A Unified Approach to Motion Control of Mobile Manipulators. The International Journal of Robotics Research, 17(2) (1998) 107-118
5. Jea, H.C.: Interaction Control of a Redundant Mobile Manipulator. The International Journal of Robotics Research, 17(12) (1998) 1302-1309
6. Alessandro, D.L.: Nonholonomic Behavior in Redundant Robots Under Kinematic Control. IEEE Transactions on Robotics and Automation, 13(5) (1997) 776-782
7. David, L.: Configuration Control of a Mobile Dexterous Robot: Real-Time Implementation and Experimentation. The International Journal of Robotics Research, 16(5) (1997) 601-618
8. Tsuneo, Y.: Foundation of Robotics. MIT Press (1990) 133-135
9. Mark, W.S.: Robot Dynamics and Control. John Wiley & Sons (1989)
10. Akira, M.: Sub-Optimal Trajectory Planning of Mobile Manipulator. International Conference on Robotics & Automation (2001) 21-26
An Effective PSO-Based Memetic Algorithm for TSP Bo Liu, Ling Wang, Yi-hui Jin, and De-xian Huang Department of Automation, Tsinghua University, Beijing 100084, China [email protected], [email protected]
Abstract. This paper proposes an effective Particle Swarm Optimization (PSO) based Memetic Algorithm (MA) for the Traveling Salesman Problem (TSP), a typical NP-hard combinatorial optimization problem with strong engineering background. In the proposed PSO-based MA (PSOMA), a novel encoding scheme is developed, and an effective local search based on Simulated Annealing (SA) with an adaptive Meta-Lamarckian learning strategy is proposed and incorporated into PSO. Simulation results based on well-known benchmarks and comparisons with some existing algorithms demonstrate the effectiveness of the proposed hybrid algorithm for the TSP.
1 Introduction

The Traveling Salesman Problem (TSP) [1] is a widely studied class of NP-hard combinatorial problems arising in communication and transportation, and has earned a reputation for being difficult to solve. To address the difficulty of the TSP, a large number of techniques have been proposed, such as mathematical programming [2], constructive heuristics [3], and improvement heuristics [4], [5]. Recently, particle swarm optimization (PSO) [6] was proposed, which has gained wide application in different fields, mainly for continuous optimization problems [7], [8], [9], [10], [11]. To the best of our knowledge, little work has been published on PSO for the TSP [12]. In this paper, we propose a PSO-based MA (PSOMA) for the TSP. PSOMA applies the evolutionary searching mechanism of PSO, characterized by individual improvement, population cooperation and competition, to effectively perform exploration; on the other hand, PSOMA utilizes an adaptive local search to perform exploitation. The features of PSOMA can be summarized as follows. First, to make PSO suitable for solving the TSP, a ranked-order-value (ROV) rule based on the random key representation [13] is presented to convert the continuous position values of a particle to a tour permutation. Then an SA-based local search with multiple different neighborhoods is designed and incorporated to enrich the searching behaviors and to avoid premature convergence, and an effective adaptive Meta-Lamarckian learning strategy is employed to decide which neighborhood to use.
2 Formulation of TSP

The TSP can be described as the problem of finding a shortest closed tour visiting each city once and only once. Given a set {c_1, c_2, ..., c_n} of n cities and a symmetric
distance function d(c_i, c_j), which gives the distance between cities c_i and c_j, the goal is to find a permutation π of these n cities that minimizes the following function:

Σ_{i=1}^{n−1} d(c_{π(i)}, c_{π(i+1)}) + d(c_{π(n)}, c_{π(1)}).   (1)
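In code, objective (1) is simply the length of the closed tour; a minimal Python sketch of ours:

def tour_length(perm, dist):
    # perm: a permutation of city indices 0..n-1; dist[i][j] = d(c_i, c_j)
    n = len(perm)
    return sum(dist[perm[i]][perm[(i + 1) % n]] for i in range(n))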
3 Brief Review of PSO

In a PSO system, the position and the velocity of the i-th particle in the d-dimensional search space can be represented as X_i = [x_{i,1}, x_{i,2}, ..., x_{i,d}] and V_i = [v_{i,1}, v_{i,2}, ..., v_{i,d}] respectively. Each particle has its own best position (pbest) P_i = [p_{i,1}, p_{i,2}, ..., p_{i,d}] corresponding to the personal best objective value obtained so far at time t. The global best particle (gbest) is denoted by P_g, which represents the best particle found so far at time t in the entire swarm. The new velocity of each particle is calculated as follows:

v_{i,j}(t+1) = w·v_{i,j}(t) + c1·r1·(p_{i,j} − x_{i,j}(t)) + c2·r2·(p_{g,j} − x_{i,j}(t)), j = 1, 2, ..., d,   (2)

where c1 and c2 are acceleration coefficients, w is the inertia factor, and r1 and r2 are two independent random numbers uniformly distributed in the range [0, 1]. Then the position is updated according to the following equation:

x_{i,j}(t+1) = x_{i,j}(t) + v_{i,j}(t+1), j = 1, 2, ..., d.   (3)

This process is repeated until a user-defined stopping criterion is reached.
4 PSOMA for STSP

4.1 Solution Representation

Due to the continuous nature of particle positions in PSO, the standard encoding scheme of PSO cannot be directly adopted for the TSP. In this paper, a ranked-order-value (ROV) rule based on the random key representation [13] is presented to convert the continuous position X_i = [x_{i,1}, x_{i,2}, ..., x_{i,n}] of a particle in PSO to a permutation of cities π = {c_1, c_2, ..., c_n}, so that the performance of the particle can be evaluated. In particular, the rank of each position value of a particle represents a city index so as to construct a permutation of cities. In the ROV rule, the smallest position value of a particle is first picked and assigned the smallest rank value 1; then the second smallest position value is picked and assigned rank value 2. In the same way, all the position values are handled to convert the position information of a particle to a city permutation. For instance (n = 6), suppose the position is X_i = [0.06, 2.99, 1.86, 3.73, 2.13, 0.67]. Because x_{i,1} = 0.06 is the smallest position value, x_{i,1} is picked first and assigned rank value 1, and then x_{i,6} = 0.67 is assigned rank value 2. Thus, based on the ROV rule, the city permutation π = [1, 5, 3, 6, 4, 2] is obtained. In our PSOMA, the SA-based local search is applied to the city
permutation. So, when a local search procedure is completed, the particle's position information should be repaired to guarantee that the permutation produced by the ROV rule for the new position information is the same as the permutation produced by that local search. That is to say, when applying the local search to the city permutation, the position information should be adjusted correspondingly, and the adjustment of the position information mirrors the operation on the permutation. For example, if the SWAP operator [14] is used as the local search, swapping city 5 and city 6 obviously corresponds to swapping position value 2.99 and value 3.73. For the other operators (INVERSE and INSERT [14]), the adjustment is similar.
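The ROV conversion itself is compact; the following Python sketch (ours, not the authors' implementation) reproduces the example above:

def rov(position):
    # Assign rank 1 to the smallest position value, rank 2 to the next, ...
    order = sorted(range(len(position)), key=lambda j: position[j])
    perm = [0] * len(position)
    for rank, j in enumerate(order, start=1):
        perm[j] = rank
    return perm

# rov([0.06, 2.99, 1.86, 3.73, 2.13, 0.67]) returns [1, 5, 3, 6, 4, 2]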
4.2 PSO-Based Search

In this paper, the PSO-based searches, i.e., (2) and (3), are applied for exploration. Note that the PSO-based evolution is performed in continuous space; when evaluating the performance of a particle, the position information is converted to a city permutation using the above ROV rule.

4.3 SA-Based Local Search Combining Meta-Lamarckian Learning Strategy
In this paper, an SA-based local search with multiple different neighborhoods is developed to enrich the local searching behaviors and avoid premature convergence. The adaptive Meta-Lamarckian learning strategy in [15] is employed to decide which neighborhood to use. Three different kinds of neighborhoods are utilized: SWAP (select two distinct elements from an n-city permutation randomly and swap them), INSERT (choose two distinct elements and insert the back one before the front one), and INVERSE (invert the subsequence between two different random positions). The adaptive Meta-Lamarckian learning strategy [15] decides which neighborhood is chosen for local search by rewarding the neighborhoods whose use results in solution improvement, as follows. The SA-based search is divided into a training phase and a non-training phase. During the training phase, the SA-based local search with each different neighborhood is applied for the same number of times, i.e., n(n−1) steps (one generation of Metropolis sampling). Then the reward η of each neighborhood is determined as

η = |pf − cf| / (n(n−1)),   (4)

where n denotes the number of cities, pf is the objective value of the old permutation, and cf is the objective value of the best permutation found by the local search with the given neighborhood during the consecutive n(n−1) steps. After the reward of each neighborhood is determined, the utilization probability p_ut of each neighborhood is adjusted using the following equation:
p_{ut,i} = η_i / Σ_{j=1}^{K} η_j,   (5)
where η_i is the reward value of the i-th neighborhood and K is the total number of neighborhoods. In the non-training phase, according to the utilization probability of each neighborhood, a roulette wheel rule [16] is used to decide which neighborhood to use for the SA-
based local search. If the i-th neighborhood is used, its reward is updated by η_i = η_i + Δη_i, where Δη_i is the reward value of the i-th neighborhood calculated during a non-training phase (i.e., SA-based searching with the i-th neighborhood for consecutive n(n−1) steps). Of course, the utilization probability p_ut of each neighborhood is then adjusted again for the next generation of PSOMA. Moreover, an exponential cooling schedule [17], t_k = λ·t_{k−1}, 0 < λ < 1, is applied, and the number of Metropolis sampling steps is set to n(n−1). The SA-based local search is applied only to the best solution found so far, i.e., gbest; and if a solution of better quality is found during the local search, the gbest of the whole swarm is updated.
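For illustration, the reward rule (4), the utilization probabilities (5) and the roulette wheel selection amount to only a few lines; the following Python sketch is ours, not the authors' implementation:

import random

def reward(pf, cf, n):
    # equation (4): improvement per step over n(n-1) Metropolis sampling steps
    return abs(pf - cf) / (n * (n - 1))

def utilization_probabilities(eta):
    # equation (5): eta maps neighborhood name -> accumulated reward
    total = sum(eta.values())
    return {name: r / total for name, r in eta.items()}

def select_neighborhood(probs):
    # roulette wheel rule [16] over SWAP, INSERT and INVERSE
    r, acc, last = random.random(), 0.0, None
    for name, p in probs.items():
        acc += p
        last = name
        if r <= acc:
            return name
    return last  # guard against floating-point round-off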
4.4 PSOMA

Based on the proposed ROV rule, the PSO-based search, and the SA-based local search combining the Meta-Lamarckian learning strategy, PSOMA is described as follows:

Step 1: Generate Ps particles with random position values, and determine the corresponding city permutations by the ROV rule.
Step 2: Evaluate the initial population, determine pbest and gbest, and set the initial temperature.
Step 3: If gbest stays fixed for L consecutive steps, then output the best solution; otherwise, go to Step 4.
Step 4: Update the swarm using the PSO-based operation, determine the permutations using the ROV rule, and evaluate the swarm.
Step 5: Update the pbest of each particle and gbest.
Step 6: Perform the SA-based local search combining the adaptive Meta-Lamarckian learning strategy on gbest.
Step 7: Update the temperature and return to Step 3.
5 Numerical Tests

In this paper, five problems [18], named Pr76, Pr107, Pr124, Pr136 and Pr152, are employed for testing. First, to validate the effectiveness of each part of PSOMA, two approaches, PSOMA and PSOMA_NOSA, are compared; in PSOMA_NOSA, the SA-based local search combining the adaptive Meta-Lamarckian learning strategy is removed from PSOMA. We set the swarm size ps = 20, w = 1.0, c1 = c2 = 2.0, xmin = 0, xmax = 4.0, vmin = −4.0, vmax = 4.0, initial temperature T0 = 3.0, annealing rate λ = 0.9, and stopping parameter L = 30 for all the problems. Each approach is run independently 20 times for every problem, and the statistical results are listed in Table 1, where n denotes the number of cities, C* the best known solution for the symmetric TSP, and BEM, AEM and WEM the relative errors of the best, average and worst solutions with respect to the best known solution C*. From Table 1, it is shown that the BEM, AEM and WEM values obtained by PSOMA are much better than those obtained by PSOMA_NOSA, which demonstrates the effectiveness of
incorporating the local search into PSO. That is to say, the superiority of PSOMA in terms of searching quality and robustness owes to the combination of global search and local search, i.e., the balance of exploration and exploitation. Since local search elements are employed, the computational time of PSOMA is larger than that of algorithms without local search, but this can be regarded as the cost of avoiding premature convergence so as to achieve better optimization performance. To further show the effectiveness of PSOMA, we carried out simulations comparing PSOMA with the hybrid PSO based on VNS (PSOVNS) [19]. In our simulation, the parameters of PSOVNS are the same as those used in [19], while PSOVNS uses the same number of evaluations as PSOMA for each instance. The statistical results of the two algorithms are listed in Table 2.

Table 1. Comparisons between PSOMA and PSOMA_NOSA
                             PSOMA                      PSOMA_NOSA
Problem   n     C*       BEM     AEM     WEM        BEM      AEM      WEM
Pr76      76    108159   0.80    3.87    12.08      119.39   190.38   223.30
Pr107     107   44303    0.63    5.05    13.09      510.55   626.77   766.59
Pr124     124   59030    1.09    6.69    17.97      445.06   513.17   597.66
Pr136     136   96772    1.64    8.91    16.88      367.29   419.70   468.54
Pr152     152   73682    0.52    7.85    19.45      569.19   652.25   764.49
Table 2. Comparisons between PSOMA and PSOVNS

                             PSOMA                      PSOVNS
Problem   n     C*       BEM     AEM     WEM        BEM     AEM     WEM
Pr76      76    108159   0.80    3.87    12.08      17.56   23.82   30.76
Pr107     107   44303    0.63    5.05    13.09      22.30   49.68   66.77
Pr124     124   59030    1.09    6.69    17.97      37.17   58.57   82.07
Pr136     136   96772    1.64    8.91    16.88      24.74   37.60   51.47
Pr152     152   73682    0.52    7.85    19.45      47.60   68.82   83.82
From Table 2, it is shown that the BEM, AEM and WEM values obtained by PSOMA are much better than those obtained by PSOVNS for all the instances. So, it is concluded that our proposed local search method, especially its utilization in a hybrid sense, is more effective than the pure VNS-based local search in [19]. In brief, our PSOMA is more effective than PSOVNS.
6 Conclusions

In this paper, by hybridizing the population-based evolutionary searching ability of PSO with the local improvement ability of local search heuristics to balance exploration and exploitation, an effective PSO-based MA (PSOMA) is proposed for the TSP. The simulation results and comparisons demonstrate the superiority of the proposed PSOMA in terms of searching quality and robustness.
Acknowledgements

This research is partially supported by the National Natural Science Foundation of China (Grant No. 60204008, 60374060 and 60574072) as well as the 973 Program (Grant No. 2002CB312200).
References

1. Wang, L.: Intelligent Optimization Algorithms with Applications. 4th edn. Tsinghua University & Springer Press, Beijing (2001)
2. Padberg, M., Rinaldi, G.: A Branch-and-Cut Algorithm for the Resolution of Large-Scale Symmetric Traveling Salesman Problems. SIAM Review, 33 (1991) 60-100
3. Glover, F., Gutin, G., Yeo, A., Zverovich, A.: Construction Heuristics for the Asymmetric TSP. European Journal of Operational Research, 129 (2001) 555-568
4. Chatterjee, S., Carrera, C., Lynch, L.A.: Genetic Algorithms and Traveling Salesman Problems. European Journal of Operational Research, 93 (1996) 490-510
5. Cowling, P.I., Keuthen, R.: Embedded Local Search Approaches for Routing Optimization. Computers & Operations Research, 32 (2005) 465-490
6. Kennedy, J., Eberhart, R.C., Shi, Y.: Swarm Intelligence. Morgan Kaufmann Publishers, San Francisco (2001)
7. Liu, B., Wang, L., Jin, Y.H., Huang, D.X.: Advances in Particle Swarm Optimization Algorithm. Control and Instruments in Chemical Industry, 32 (2005) 1-6
8. Liu, B., Wang, L., Jin, Y.H., Tang, F., Huang, D.X.: Improved Particle Swarm Optimization Combined with Chaos. Chaos, Solitons and Fractals, 25 (2005) 1261-1271
9. Liu, B., Wang, L., Jin, Y.H., Huang, D.X.: Designing Neural Networks Using Hybrid Particle Swarm Optimization. Lecture Notes in Computer Science, Vol. 3496. Springer-Verlag, Berlin Heidelberg (2005) 391-397
10. Liu, B., Wang, L., Jin, Y.H., Tang, F., Huang, D.X.: Directing Orbits of Chaotic Systems by Particle Swarm Optimization. Chaos, Solitons and Fractals, 29 (2006) 454-461
11. Liu, B., Wang, L., Jin, Y.H.: Hybrid Particle Swarm Optimization for Flow Shop Scheduling with Stochastic Processing Time. Lecture Notes in Artificial Intelligence, Vol. 3801. Springer-Verlag, Berlin Heidelberg (2005) 630-637
12. Clerc, M.: Discrete Particle Swarm Optimization, Illustrated by the Traveling Salesman Problem. In: New Optimization Techniques in Engineering. Springer-Verlag, Berlin (2004)
13. Bean, J.C.: Genetic Algorithms and Random Keys for Sequencing and Optimization. ORSA Journal on Computing, 6 (1994) 154-160
14. Wang, L., Zheng, D.Z.: An Effective Hybrid Heuristic for Flow Shop Scheduling. Int. J. Adv. Manuf. Technol., 21 (2003) 38-44
15. Ong, Y.S., Keane, A.J.: Meta-Lamarckian Learning in Memetic Algorithms. IEEE Trans. Evol. Comput., 8 (2004) 99-110
16. Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, MA (1989)
17. Wang, L., Zheng, D.Z.: An Effective Hybrid Optimization Strategy for Job-shop Scheduling Problems. Comput. Oper. Res., 28 (2001) 585-596
18. Reinelt, G.: TSPLIB - A Traveling Salesman Problem Library. ORSA Journal on Computing, 3(4) (1991) 376-384
19. Tasgetiren, M.F., Sevkli, M., Liang, Y.C., Gencyilmaz, G.: Particle Swarm Optimization Algorithm for Permutation Flowshop Sequencing Problem. Lecture Notes in Computer Science, Vol. 3172. Springer-Verlag, Berlin Heidelberg (2004) 382-389
Dual-Mode Control Algorithm for Wiener-Typed Nonlinear Systems

Haitao Zhang and Yongji Wang

Department of Control Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, P.R. China
{zht, wangyjch}@mail.hust.edu.cn
Abstract. Wiener-typed nonlinear systems with hard input constraints are ubiquitous in industrial processes. However, because of their complex structure, there are only a few results on control algorithms for constrained Wiener-typed systems. An improved dual-mode control algorithm is put forward. Firstly, the Zeroin algorithm is applied to obtain the inverse of the static nonlinear block of the Wiener-typed nonlinear system. Then, we define invariant ellipsoid sets for the estimated state and the estimation error, respectively, and guarantee the feasibility, stability and convergence of the algorithm by invariant-set theory combined with the dual-mode approach [1]. In contrast to traditional algorithms, this one has the advantages of a larger initial stable region in state space and higher tracking accuracy. Finally, the feasibility and superiority of the proposed algorithm are validated by case studies.
1 Introduction

In many real industrial processes, such as distillation [2], pH neutralization control [3,4], hydro-control, and chemical reaction systems [5], there widely exists a type of nonlinear system that can be described by the Wiener model. This model consists of a linear dynamic element followed by a static nonlinear element [2]; Wiener models thus correspond to processes with linear dynamics composed with general nonlinear operators. In recent years, the control of Wiener-typed nonlinear systems has become one of the most important and difficult tasks in the nonlinear control field [2-5]. Suny and Lee [6] studied a method that approximates the nonlinear element with polynomials, based on the assumption that the output of the linear block is available; in practical industrial control, however, this assumption cannot always be satisfied. In 1997, Kalafatis and Wang [7] proposed a method identifying the two parts of the Wiener model at the same time, but it requires the assumption that the inverse of the nonlinear element can be approximated by P-th order polynomials with satisfying precision, which is its limitation. Thus, most of the existing control algorithms for Wiener-typed nonlinear systems have limitations of one kind or another, and there are few results so far on dealing with hard input constraints. Therefore, it is valuable to find a solution to this problem. In 2000, Yang et al. [8] proposed an approach based on a dual-mode algorithm to control a Hammerstein-typed nonlinear system with hard input constraints, and obtained excellent control
performances. Nevertheless, because of the complexity of Wiener-typed nonlinear systems, hard input constraints had not been taken into account for them. In this paper, based on Yang's idea [1,8,9], an improved output feedback control method is presented. The main contribution of this paper is that, for constrained Wiener-typed systems, the dual-mode control approach [9] is adopted to effectively enlarge the closed-loop stable region. This paper is organized as follows. In Section 2, the problem descriptions in the Z-domain and in state space are given. The algorithm for systems subject to constraints is presented in Section 3, together with the stability and convergence analyses. In Section 4, simulation results are shown, which validate the feasibility and superiority of the proposed algorithm. Finally, concluding remarks are drawn in Section 5.
2 Problem Description

The Wiener model consists of a linear dynamic element followed by a static nonlinear element; the discrete structure of this model is shown in Fig. 1. The difference equations are
    a(z⁻¹) η(k) = z⁻ᵈ b(z⁻¹) u(k),
    y(k) = f[η(k)],    (1)
where a(z⁻¹) and b(z⁻¹) are polynomials in z⁻¹, d is the system time delay, f(·) is a static nonlinear function, and u(k), y(k) and η(k) are the input, the output, and the output of the dynamic linear block, respectively. Assume the linear element of our Wiener-typed plant can be described by
    x(k+1) = A x(k) + B u(k),    (2)
    η(k) = C x(k).    (3)
Now, the problems this paper addresses can be summarized as follows. Given at least partial prior knowledge of the static nonlinear block, whose inverse is assumed to exist, how can the closed-loop stability of the Wiener-typed system be guaranteed under the hard input constraints u_min ≤ u ≤ u_max? Furthermore, how can the closed-loop stable region be enlarged?
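To make this plant class concrete, here is a minimal simulation sketch of a constrained Wiener plant of the form (1)-(3); the matrices, the cubic nonlinearity f, and the bounds are illustrative assumptions, not the paper's data.

    import numpy as np

    # Illustrative Wiener plant: linear state-space block followed by a
    # static nonlinearity f, with the input clipped to [u_min, u_max].
    A = np.array([[0.9, 0.1],
                  [0.0, 0.8]])          # assumed linear dynamics
    B = np.array([1.0, 0.5])            # assumed input vector
    C = np.array([1.0, 0.0])            # assumed output vector

    f = lambda eta: eta + 0.1 * eta**3  # assumed invertible static nonlinearity
    u_min, u_max = -1.5, 3.0            # hard input constraints

    def step(x, u):
        """One step of the constrained Wiener plant (1)-(3)."""
        u_sat = np.clip(u, u_min, u_max)  # hard input constraint
        eta = float(C @ x)                # linear-block output, eq. (3)
        y = f(eta)                        # static nonlinearity, eq. (1)
        x_next = A @ x + B * u_sat        # linear dynamics, eq. (2)
        return x_next, y

    x = np.array([2.0, 2.5])
    for k in range(5):
        x, y = step(x, 1.0)
        print(k, y)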
3 Output Feedback Control for the Constrained Wiener-Typed System

The state observer is given by

    x̂(k+1) = A x̂(k) + B u(k) + L (η̃(k) − C x̂(k)),    (4)
where x̂(k) is the estimated state in the k-th sampling period, and η̃(k) = f⁻¹_Zeroin(y(k)), in which f⁻¹_Zeroin denotes the inverse of f computed by the Zeroin algorithm [10]. The state estimation error is defined by

    e(k) ≜ x(k) − x̂(k).    (5)
Then, we can define the invariant ellipsoid sets of the estimated state and of the estimation error, respectively, as

    S ≜ { x̂ | x̂ᵀ P x̂ ≤ 1 },    (6)

    S_e ≜ { e | eᵀ P_e e ≤ ē² },  0 ≤ ē² ≤ 1,    (7)

where P_e and P are positive definite matrices. If x̂(k) ∈ S and e(k) ∈ S_e, then in order to make these ellipsoid sets invariant it is required that x̂ᵀ(k+1) P x̂(k+1) ≤ 1 and eᵀ(k+1) P_e e(k+1) ≤ ē².
For a Wiener-typed nonlinear system subject to the hard input constraints u_min ≤ u ≤ u_max, we extend the estimated state x̂(k) by an auxiliary vector D(k) = [d₁(k), …, d_{n_d}(k)]ᵀ, so that the state vector becomes ẑ(k) = [x̂ᵀ(k), Dᵀ(k)]ᵀ. Then we can design the stable state feedback control law

    u(k) = K x̂(k) + E D(k),    (8)

where E = [1, 0, …, 0]_{1×n_d}. Assume that η̃(k) − η(k) = δ[η(k)] C e(k) with |δ[η(k)]| ≤ σ; then substituting (8) into (4) yields

    x̂(k+1) = Ψ x̂(k) + B E D(k) + L C [1 + δ(·)] e(k).    (9)

Taking the extended state ẑ(k) into consideration, we obtain

    ẑ(k+1) = Ξ ẑ(k) + [ (L C [1 + δ(·)])ᵀ  0 ]ᵀ e(k),    (10)

with

    Ξ = [ A + BK   BE ;  0   M ],   M = [ ϑ   I ;  0   ϑᵀ ]_{n_d×n_d},   ϑ = [0 ⋯ 0]ᵀ_{(n_d−1)×1}.

Define the ellipsoid invariant set of the estimated extended state as

    S ≜ { ẑ | ẑᵀ P̄ ẑ ≤ 1 },    (11)

where P̄ is a positive definite matrix.
Lemma 1 [8]: For any μ > 1 and τ = 1 + 1/(μ − 1), and any matrices a, b of the same size,

    (a + b)ᵀ P (a + b) ≤ μ aᵀ P a + τ bᵀ P b.    (12)
Theorem 1: Let the linear time-invariant block of the Wiener-typed nonlinear system be given by (2) and (3), and the state observer by (4), where the feedback gain K and the observer gain L are stabilizing; in addition, |δ[η(k)]| ≤ σ. Then, if the following assumption A1 is satisfied, S and S_e are invariant sets in the sense of (11) and (7), respectively. Moreover, the control law (8) converges to the unconstrained stable feedback control law u(k) = K x̂(k), and the closed-loop system is asymptotically stable.

A1: There exist μ > 1 and μ̃ > 1 (μ, μ̃ ∈ ℝ) such that

    μ̃ Ξᵀ P̄ Ξ ≤ (1 − ē²) P̄,    (13)

    τ̃ (1 + σ²) Cᵀ Lᵀ E_xᵀ P̄ E_x L C ≤ P_e,    (14)

    μ Ψᵀ P_e Ψ + τ σ² Cᵀ Lᵀ P_e L C ≤ P_e,    (15)

where τ = 1 + 1/(μ − 1), τ̃ = 1 + 1/(μ̃ − 1), and E_x is the transform factor with x̂ = E_xᵀ ẑ.

Proof: For any ẑ(k) ∈ S, we conclude from (11) and (13) that

    μ̃ ẑᵀ(k) Ξᵀ P̄ Ξ ẑ(k) ≤ (1 − ē²) ẑᵀ(k) P̄ ẑ(k) ≤ 1 − ē².    (16)

For any e(k) ∈ S_e, from (14), x̂ᵀ(k+1) P x̂(k+1) ≤ 1 and |δ[η(k)]| ≤ σ, we have

    τ̃ eᵀ(k) [1 + δ(·)]² Cᵀ Lᵀ E_xᵀ P̄ E_x L C e(k) ≤ eᵀ(k) P_e e(k) ≤ ē².    (17)

Moreover, applying Lemma 1, for any ẑ(k) ∈ S we have

    ẑᵀ(k+1) P̄ ẑ(k+1) ≤ μ̃ ẑᵀ(k) Ξᵀ P̄ Ξ ẑ(k) + τ̃ eᵀ(k) [1 + δ(·)]² Cᵀ Lᵀ E_xᵀ P̄ E_x L C e(k) ≤ 1.    (18)

Hence S is an invariant set of the extended state ẑ(k). In a similar way, we can show from (15) that S_e is an invariant set. Moreover, following the same approach as [8], the closed-loop stability of the algorithm can be proved.
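Once μ and μ̃ are fixed, (13)-(15) are matrix inequalities that can be checked by verifying that the difference of the two sides is positive semidefinite. The sketch below does exactly that with small random stand-ins for Ξ, Ψ, LC, E_x, P̄ and P_e; none of these values comes from the paper.

    import numpy as np

    def is_psd(M, tol=1e-9):
        """True if the symmetric matrix M is positive semidefinite."""
        return np.linalg.eigvalsh((M + M.T) / 2).min() >= -tol

    rng = np.random.default_rng(1)
    n, nd = 2, 3
    mu, mu_t = 1.1, 1.5
    tau = 1 + 1 / (mu - 1)
    tau_t = 1 + 1 / (mu_t - 1)
    sigma, e_bar = 0.1, 0.4

    Xi = 0.1 * rng.standard_normal((n + nd, n + nd))
    Psi = 0.1 * rng.standard_normal((n, n))
    LC = 0.1 * rng.standard_normal((n, n))          # stands for L @ C
    Ex = np.vstack([np.eye(n), np.zeros((nd, n))])  # x_hat = Ex.T @ z_hat
    P_bar = np.eye(n + nd)
    P_e = np.eye(n)

    ok13 = is_psd((1 - e_bar**2) * P_bar - mu_t * Xi.T @ P_bar @ Xi)
    ok14 = is_psd(P_e - tau_t * (1 + sigma**2) * LC.T @ Ex.T @ P_bar @ Ex @ LC)
    ok15 = is_psd(P_e - mu * Psi.T @ P_e @ Psi - tau * sigma**2 * LC.T @ P_e @ LC)
    print(ok13, ok14, ok15)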
4 Case Study

Plant:

    x(k+1) = A x(k) + B u(k),  η(k) = C x(k),    (19)
    y(k) = η⁴(k) sin[η(k)] − η⁵(k),    (20)

where −1.5 ≤ u(k) ≤ 3, A = [2.3  −1.2; 1  0], B = [1, 0]ᵀ, C = [1  0], and x(0) = [2.0, 2.5]ᵀ.

Firstly, we set K = [−2.4179  1.1495] and L = [1.0556  0.3704]ᵀ. The control performance when tracking {40, −40} double-step signals is shown in Fig. 1; the parameters are n_d = 5, ē = 0.4, μ = 1.1, μ̃ = 1.5, σ = 0.1. The upper subfigures show the curves of r (dash-dot line: set points), y (solid line), and η (dashed line), respectively; the lower subfigure shows the curve of u. These results validate the feasibility of the proposed algorithm.
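To reproduce the flavour of this case study, the sketch below simulates the plant (19) with the quoted gains K and L. It deliberately simplifies the controller: the observer is driven by the true η rather than by f⁻¹_Zeroin(y), and the auxiliary vector D(k) is dropped (n_d = 0), so it illustrates the unconstrained-mode feedback (8) under saturation rather than the full dual-mode scheme.

    import numpy as np

    A = np.array([[2.3, -1.2], [1.0, 0.0]])
    B = np.array([1.0, 0.0])
    C = np.array([1.0, 0.0])
    K = np.array([-2.4179, 1.1495])
    L = np.array([1.0556, 0.3704])
    u_min, u_max = -1.5, 3.0

    x = np.array([2.0, 2.5])        # true state, x(0) from the paper
    x_hat = np.zeros(2)             # observer state

    for k in range(30):
        u = float(np.clip(K @ x_hat, u_min, u_max))  # saturated feedback (8), D = 0
        eta = float(C @ x)
        # observer (4), driven here by eta directly instead of f^{-1}_Zeroin(y)
        x_hat = A @ x_hat + B * u + L * (eta - float(C @ x_hat))
        x = A @ x + B * u
    print(x, x_hat)   # both approach the origin if the gains stabilize the loop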
Fig. 1. Control performance tracking step signals
Fig. 2. Tracks and ellipsoid invariant sets of system state and estimated state
Fig. 2 shows the tracks and ellipsoid invariant sets of x and x̂, where the solid and dashed ellipsoids refer to the invariant sets S for n_d = 5 and n_d = 0, respectively. The dashed lines and circular points are the track of x̂, and the solid lines and star points denote the track of x. Fig. 2 illustrates the power of the auxiliary vector: the tracks of x and x̂ converge to the origin quickly, which validates the superiority of the proposed algorithm.
5 Conclusion

For Wiener-typed nonlinear systems subject to hard input constraints, a control algorithm based on the dual-mode technique is proposed. This algorithm has two advantages: 1) high precision, and 2) a large closed-loop stability region. Its feasibility and superiority are validated by simulation results.
References
1. Chen, H., Allgöwer, F.: A Quasi-Infinite Horizon Nonlinear Model Predictive Control Scheme with Guaranteed Stability. Automatica 34 (1998) 1205-1217
2. Bloemen, H.H.J., et al.: Wiener Model Identification and Predictive Control for Dual Composition Control of a Distillation Column. J. Process Control 11 (2001) 601-620
3. Kalafatis, A., et al.: A New Approach to the Identification of pH Processes Based on the Wiener Model. Chem. Eng. Science 50 (1995) 3693-3701
4. Pajunen, G.A., et al.: Identification of a pH Process Represented by a Nonlinear Wiener Model. IFAC Adaptive Syst. Control Signal Processing (1983) 91-95
5. Norquay, S.J., et al.: Application of Wiener Model Predictive Control (WMPC) to an Industrial C2-Splitter. J. Process Control 9 (1999) 461-473
6. Wang, X.J., et al.: Weighting Adaptive Control of Wiener Model Based on Multilayer Feedforward Neural Networks. Proc. 4th World Congress on Intelligent Control and Automation, June 10-14, 2002, Shanghai, China
7. Kalafatis, A.D., Wang, L.: Identification of Wiener-Type Nonlinear Systems in a Noisy Environment. Int. J. Control 66 (1997) 923-941
8. Yang, J.J.: Study on the Model Predictive Control Method of Systems with Input Constraints. Doctoral Thesis, Northeastern University, China, June 2000
9. Sznaier, M., et al.: Suboptimal Control of Linear Systems with State and Control Inequality Constraints. Proc. IEEE Conf. Dec. Contr. (1997) 761-762
10. Forsythe, G.E., Malcolm, M.A., Moler, C.B.: Computer Methods for Mathematical Computations. Prentice-Hall, Englewood Cliffs, New Jersey (1977) 156-166
NDP Methods for Multi-chain MDPs
Hao Tang¹,², Lei Zhou¹, and Arai Tamio²

¹ School of Computer and Information, Hefei University of Technology, Tunxi Road No. 193, Hefei, Anhui 230009, P.R. China
² Dept. of Precision Engineering, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
[email protected] / {htang,arai}@prince.pe.u-tokyo.ac.jp
Abstract. Simulation optimization techniques based on the learning of performance potentials are discussed for multichain Markov decision processes (MDPs). Different from ergodic or unichain models, where a single sample path suffices for the learning of potentials, in the multichain case there is more than one recurrent class in the underlying Markov chain, so the sample path has to be restarted often in order not to circulate in only one recurrent class. As in unichain models, temporal difference (TD) learning algorithms can be developed for learning potentials. In addition, by representing the estimates of the potentials with a neural network, a neuro-dynamic programming (NDP) method, i.e., the critic algorithm, is derived as has been proposed for unichain models. The obtained results are also applicable to general multichain semi-Markov decision processes (SMDPs), and we use a numerical example to illustrate the extension.
1 Introduction
For an ergodic or unichain MDP, each stationary policy is assumed to have only one recurrent class. However, this assumption may not hold for some practical problems, so it is valuable to discuss the more general case, i.e., multi-chain MDPs, where the underlying Markov chain corresponding to at least one stationary policy consists of two or more recurrent classes [1]. Performance potentials, introduced mainly by Cao for MDPs, have proved efficient in optimizing ergodic MDPs [5]. Recently, Cao and Guo extended them to discrete-time multichain models [2]. The potentials of multi-chain MDPs can also be estimated from sample paths as in unichain cases,
Partially supported by the National Natural Science Foundation of China (60404009), the Natural Science Foundation of Anhui Province (050420303), and the Support Project of Hefei University of Technology for Science and Technology Innovation Groups. Corresponding author: received his Ph.D. degree in 2002 from the University of Science and Technology of China, thereafter an associate professor at Hefei University of Technology, P.R. China, and currently a visiting scholar at the Advanced Robotics with Artificial Intelligence Lab of the University of Tokyo, Japan.
so that neuro-dynamic programming (NDP) methods can be developed [3,4]. We have shown their successful application to ergodic MDPs and SMDPs by using potentials [6]. In this paper, we discuss one NDP model, the critic algorithm, for multichain processes, where the potentials are learned by TD(λ) learning and approximated by a neural network, after which approximate policy iteration follows.
2 Continuous-Time Multi-chain MDPs
Consider an MDP {X(t), t ≥ 0} with state space Φ = {1, 2, ⋯, M}, and let {Xₙ : Xₙ ∈ Φ, n = 0, 1, 2, ⋯} be its underlying Markov chain, where Xₙ denotes the system state at the n-th decision epoch. The set of stationary policies is denoted by Ω = {v | v : Φ → D}, with D being the compact action set. Associated with any policy v is an M × M matrix Aᵛ whose ij-element a_{ij}(v(i)) gives the transition rate to state j upon taking action v(i) in state i. Now suppose the process corresponding to policy v has m recurrent classes Φ₁, Φ₂, ⋯, Φₘ, and an additional class Φ_{m+1} of transient states, so that Φ = Φ₁ ∪ Φ₂ ∪ ⋯ ∪ Φₘ ∪ Φ_{m+1}. We can assume matrix Aᵛ has the canonical form

    Aᵛ = [ Aᵛ_R   0 ;  Lᵛ_R   Lᵛ_{m+1} ],    (1)

where Aᵛ_R = diag(Aᵛ₁, Aᵛ₂, ⋯, Aᵛₘ), with each Aᵛᵢ serving as the infinitesimal generator of the recurrent class Φᵢ. Our goal is to choose a policy v* from Ω that minimizes a given criterion in expectation, that is, for any v ∈ Ω, η_α^{v*} ≤ η_α^v under the discounted criterion (α > 0 is a discount factor) or η^{v*} ≤ η^v under the average criterion. For every i, j ∈ Φ, let a_{ij}(v(i)) be a continuous function defined on the compact set D. Since Φ is finite, we can select a constant μ ≥ max_{i∈Φ, v(i)∈D} {−a_{ii}(v(i))}. Aᵛ and the constant μ yield a stochastic matrix P̄ᵛ = Aᵛ/μ + I, where I is the identity matrix. Moreover, P̄ᵛ determines a uniformized Markov chain of the original continuous-time process with discount factor β = μ/(μ + α).
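As a concrete illustration of the uniformization P̄ᵛ = Aᵛ/μ + I just described, here is a minimal sketch; the 3-state generator is an assumed toy example, not from the paper.

    import numpy as np

    # Assumed toy infinitesimal generator (rows sum to zero).
    Av = np.array([[-2.0,  2.0,  0.0],
                   [ 1.0, -3.0,  2.0],
                   [ 0.0,  4.0, -4.0]])

    mu = -np.diag(Av).min()               # any mu >= max_i(-a_ii) works
    P_bar = Av / mu + np.eye(len(Av))     # stochastic matrix of the uniformized chain

    assert (P_bar >= -1e-12).all() and np.allclose(P_bar.sum(axis=1), 1.0)
    print(P_bar)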
3 Multiple Sample Paths-Based NDP Optimization by the Critic Algorithm

3.1 Learning of Performance Potentials
In this paper, we only consider the average criterion for multichain models. For α = 0, let the performance potential vector gᵛ = (gᵛ(1), gᵛ(2), ⋯, gᵛ(M))ᵀ satisfy (−Aᵛ + μPᵛ) gᵛ = fᵛ, where Pᵛ is defined as the Cesaro limit of P̄ᵛ [1]. By the definitions of P̄ᵛ and gᵛ, let

    (I − P̄ᵛ + Pᵛ) ḡᵛ = fᵛ,    (2)

with ḡᵛ = μ gᵛ denoting the potential vector of the uniformized chain. By (2), we get

    ḡᵛ(i) = f(i, v(i)) + P̄ᵢ(v(i)) ḡᵛ − Pᵛᵢ ḡᵛ,    (3)

where P̄ᵢ(v(i)) and Pᵛᵢ denote the i-th rows of P̄ᵛ and Pᵛ, respectively. It is easy to prove that ηᵛ = μ Pᵛ gᵛ = Pᵛ ḡᵛ.

The main idea of NDP is to approximate some values through parametric architectures. Here, we use a neural network to represent the potentials, and train the architecture parameters along sample paths by TD(λ) learning. TD(λ) learning is a multi-step method, where λ refers to the use of eligibility traces. Suppose the network output g̃ᵛ(i, r) approximates ḡᵛ(i) when we input i, where r is the parameter vector of the network. Then the parametric TD formula for the potentials can be derived from (3) as

    dₙ = d(Xₙ, Xₙ₊₁, r) = f(Xₙ, v(Xₙ)) − η̃ᵛ(Xₙ) + g̃ᵛ(Xₙ₊₁, r) − g̃ᵛ(Xₙ, r),    (4)

where η̃ᵛ(Xₙ) is the estimate of the average cost ηᵛ(Xₙ). We consider accumulating traces for TD(λ) learning, which take the form

    Zₙ(i) = βλ Zₙ₋₁(i) + 1  if Xₙ = i;   Zₙ(i) = βλ Zₙ₋₁(i)  otherwise,    (5)

where Zₙ(i) denotes the trace of state i at the n-th decision epoch. (4) and (5) yield the following parameterized TD(λ) learning rule:

    r := r + γ Zₙ dₙ,    (6)

    Zₙ := λ Zₙ₋₁ + ∇g̃ᵛ(Xₙ, r),    (7)

where γ is a stepsize and ∇ denotes the gradient with respect to r.
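The update rules (4)-(7) are easy to state in code. The sketch below uses one-hot features, so the parameter vector r is itself the table of potential estimates; with a real neural network, r would be the weights and the gradient in (7) would come from backpropagation. P_bar, cost, eta_hat and the stepsizes are assumed inputs of this sketch.

    import numpy as np

    def td_lambda_potentials(P_bar, cost, eta_hat, n_steps=10000,
                             lam=0.2, gamma=0.01, seed=0):
        """TD(lambda) learning of uniformized potentials along one sample path."""
        rng = np.random.default_rng(seed)
        M = P_bar.shape[0]
        r = np.zeros(M)                  # g_tilde(i, r) = r[i] with one-hot features
        z = np.zeros(M)                  # eligibility traces
        x = int(rng.integers(M))
        for _ in range(n_steps):
            x_next = int(rng.choice(M, p=P_bar[x]))
            grad = np.zeros(M)
            grad[x] = 1.0                # gradient of g_tilde(x, r) w.r.t. r
            z = lam * z + grad           # trace update, eq. (7)
            d = cost[x] - eta_hat[x] + r[x_next] - r[x]   # TD error, eq. (4)
            r = r + gamma * z * d        # parameter update, eq. (6)
            x = x_next
        return r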
3.2 Difficulty of the Learning in Multichain Cases
When we simulate a multichain process, it ultimately falls into one recurrent class and circulates in that class forever, so only part of the states can be visited by a single sample path. Therefore we have to restart the process often, or use other approaches to derive multiple sample paths, in order to approximate the potentials. Another important characteristic is that there is neither a unique stationary distribution nor a unique average cost, so ηᵛ(Xₙ) may differ across states Xₙ rather than being identical. The learning or estimation of the potentials and average costs therefore becomes more difficult than for ergodic or unichain processes.

First, if we know the classification of states for any given policy, the learning is easier to handle. Since the average cost is identical for any two states of the same recurrent class, we only need m units to store the m average costs corresponding to Φ₁, Φ₂, ⋯, Φₘ. Each sample path then behaves like a unichain with recurrent class Φ_z, z ∈ {1, 2, ⋯, m}, and the average cost of the recurrent class, i.e., ηᵛ_z, is estimated according to

    η̃ᵛ_z := (1 − δ) η̃ᵛ_z + δ f(Xₙ, v(Xₙ)),    (8)
where δ denotes the stepsize. Note that, no matter which state the sample path starts from, (8) will generate a good approximation of ηᵛ_z after sufficiently many steps. For a transient state, the average cost is mainly determined by the values of all the recurrent classes and by the ultimate transition probabilities to each recurrent class. The learning of average costs for the transient class at the end of a sample path can then take the form

    η̃ᵛ(Xₙ) := (1 − δ) η̃ᵛ(Xₙ) + δ η̃ᵛ_z,    (9)

where δ can be viewed as the empirical probability of transition from Xₙ to recurrent class Φ_z. On the other hand, it is very difficult to deal with the situation where the multichain structure is unknown. The straightforward methods are to memorize the average-cost value of every state, or to determine the classification for every policy directly when the model parameters are known; however, these are impractical in large-scale systems because of "the curse of dimensionality" and "the curse of modeling", and there is no good approach to overcome these obstacles in our learning. The only heuristic we may suggest is to still use (8) for all states visited by a sample path, and to use the average of the values learned in past paths as the initial cost of the next sample path.
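The two stochastic-approximation updates (8) and (9) amount to one line each; the helper names below are hypothetical, and delta is the stepsize (interpreted in (9) as an empirical transition probability).

    def update_recurrent_cost(eta_z, f_x, delta=0.01):
        # eq. (8): pull the class-z average cost toward the observed stage cost
        return (1 - delta) * eta_z + delta * f_x

    def update_transient_cost(eta_x, eta_z, delta=0.1):
        # eq. (9): pull a transient state's cost toward the class the path entered
        return (1 - delta) * eta_x + delta * eta_z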
3.3 Critic Algorithm Based on Potential Learning
For an optimal policy v* of the uniformized chain of a multichain MDP, the average costs and potentials satisfy the system of two optimality equations, that is,

    0 = min_{v∈Ω} { P̄ᵛ η^{v*} − η^{v*} }   and   0 = min_{v(i)∈B̄ᵢ*} { f(i, v(i)) + P̄ᵢ(v(i)) ḡ^{v*} − ḡ^{v*}(i) − η^{v*}(i) },

with B̄ᵢ* = {d | d ∈ D, P̄ᵢ(d) η^{v*} = η^{v*}(i)}. Obviously, they are similar to the optimality equations in [2,1]. The algorithms are as follows.

Algorithm 1. Policy Evaluation
Step 1. Select a neural network and initial parameters r, λ and ηᵛ.
Step 2. Select an integer N, let n = 0, and choose the initial state Xₙ.
Step 3. Simulate the next state Xₙ₊₁ according to P̄ᵛ.
Step 4. Calculate η̃ᵛ(Xₙ), η̃ᵛ_z and r through (8) or (9), (4), (6) and (7).
Step 5. If n < N, let n := n + 1 and go to Step 3.
Step 6. If the stopping criterion is satisfied, exit; otherwise, go to Step 2.

Algorithm 2. Policy Improvement
Step 1. Let k = 0 and select an arbitrary initial policy v₀.
Step 2. Evaluate policy vₖ by estimating η̃^{vₖ} and g̃^{vₖ} through Algorithm 1.
Step 3. Try to improve the policy by the following substeps:
Substep 3.1. Choose a policy ṽₖ₊₁ satisfying

    ṽₖ₊₁ ∈ arg min_{v∈Ω} { P̄ᵛ η̃^{vₖ} }.    (10)

Substep 3.2. For every i ∈ Φ, if P̄ᵢ(ṽₖ₊₁(i)) η̃^{vₖ} = η̃^{vₖ}(i), select an action vₖ₊₁(i) such that

    vₖ₊₁(i) ∈ arg min_{v(i)∈B̄ᵢᵏ} { f(i, v(i)) + P̄ᵢ(v(i)) g̃^{vₖ} },    (11)

where B̄ᵢᵏ = {d | d ∈ D, P̄ᵢ(d) η̃^{vₖ} = η̃^{vₖ}(i)}; otherwise, let vₖ₊₁(i) = ṽₖ₊₁(i).
Step 4. If some stopping criterion is satisfied, exit; otherwise, let k := k + 1 and go to Step 2.
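A structural sketch of Step 3 of Algorithm 2 for a discretized action set follows; here P_bar[a] and f[i, a] are assumed model arrays, eta_hat and g_hat come from Algorithm 1, and the candidate set of Substep 3.2 is simplified to the (near-)gain-optimal actions.

    import numpy as np

    def improve_policy(P_bar, f, eta_hat, g_hat, n_actions, tol=1e-8):
        """One policy-improvement pass (Algorithm 2, Step 3), finite action grid."""
        M = len(eta_hat)
        v_next = np.zeros(M, dtype=int)
        for i in range(M):
            # Substep 3.1: gain improvement, minimize P_bar_i(a) @ eta_hat.
            gains = np.array([P_bar[a][i] @ eta_hat for a in range(n_actions)])
            # Substep 3.2: among near-gain-optimal actions, improve the bias term.
            cands = [a for a in range(n_actions) if gains[a] <= gains.min() + tol]
            v_next[i] = min(cands, key=lambda a: f[i, a] + P_bar[a][i] @ g_hat)
        return v_next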
4 A Numerical Example of an SMDP
Since a semi-Markov decision process can be treated as an equivalent continuous-time MDP by using an infinitesimal generator [7,8], our results can also be extended to multichain SMDPs. An example follows. Consider an SMDP with finite state space Φ = {1, 2, ⋯, 25} and compact action set D = [1, 5]. Here there are two recurrent classes, Φ₁ = {1, 2, ⋯, 10} and Φ₂ = {11, 12, ⋯, 20}, and a transient class Φ₃ = {21, 22, ⋯, 25}. For i, j ∈ Φ₁, the transition probabilities of the underlying embedded Markov chain satisfy p_{ij}(v(i)) = exp(−v(i)/j)/[M(1 + exp(−v(i)))] for j ≠ nextᵢ; otherwise, p_{ij}(v(i)) = 1 − Σ_{j≠nextᵢ} p_{ij}(v(i)). Here nextᵢ denotes the next state of i, with next_{i=10} = 1. The sojourn time of state i ∈ Φ₁ follows a 3rd-order Erlang distribution with parameter 3v(i), and the performance cost of state i satisfies f(i, v(i)) = ln[(1 + i)v(i)] + √i/(2v(i)). For i, j ∈ Φ₂, p_{ij}(v(i)) = exp(−v(i)/j)/Σ_{k=11}^{20} exp(−v(i)/k), the sojourn time distribution is F_{ij}(t, v(i)) = 1 − x₁ exp(G^{v(i)} t) e, and f(i, v(i)) = ln(i/v(i)) + (v(i) + 1)v(i)/i, where x₁ = [5/8, 3/8] and G^{v(i)} = v(i)[−1, 0; 0, −3]. For i ∈ Φ₃ and j ∈ Φ, if j ≠ i, p_{ij}(v(i)) = exp[−(v(i) − 50/i)²/j]/25; otherwise, p_{ij}(v(i)) = 1 − Σ_{j≠i} p_{ij}(v(i)). In addition, F_{ij}(t, v(i)) = 1 − x₁ exp(G^{v(i)} t) e, and f(i, v(i)) = 0.5v²(i) + v(i)/i.

For simplicity, we choose a BP network with a 5 × 3 × 1 topology, a sigmoid hidden layer, and a linear output layer. We first transform the SMDP into an equivalent MDP [7,8] and then, using the proposed critic algorithm, obtain the optimization results shown in Table 1. With the computation-based policy iteration algorithm, the average costs of the recurrent classes Φ₁ and Φ₂ are 2.54140 and 2.37472, respectively, and the average costs of the transient states are 2.42300, 2.42177, 2.42068, 2.41970, and 2.41882, respectively, with the stopping parameter ε = 0.00001; the whole computation takes only a few seconds. The proposed NDP algorithm yields similar optimization results, and less memory is needed to store the potentials, though at the cost of much longer computation time. We also illustrate the optimization process in Fig. 1, which shows the generated sequence of average costs under TD(0) learning; Fig. 2 shows the optimization process of TD(λ) learning with λ = 0.2.
Table 1. Optimization results using the NDP-based algorithm

ε        η^{v_ε}(i), i ∈ Φ₁   η^{v_ε}(i), i ∈ Φ₂   η^{v_ε}(i), i ∈ Φ₃                              t_s (s)
0.01     2.73396              2.37736              2.48065, 2.47802, 2.47567, 2.47358, 2.47170     176.8
0.001    2.71005              2.37647              2.47309, 2.47063, 2.46844, 2.46648, 2.46473     238.4
0.00001  2.62910              2.37982              2.45202, 2.45019, 2.44854, 2.44709, 2.44577     903.3
[Two figures: curves of average cost versus iteration times for states 1-10, 11-20, and 21-25.]

Fig. 1. Average case as α = 0 and λ = 0

Fig. 2. Average case as α = 0 and λ = 0.2
5 Conclusions
Using multiple sample paths, we can solve the optimization problems of multi-chain MDPs through potential-based NDP; this is more complex than for ergodic or unichain models. In addition, many other issues are worth discussing, such as robust decision schemes for uncertain multichain processes.
References
1. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York (1994)
2. Cao, X.R., Guo, X.P.: A Unified Approach to Markov Decision Problems and Performance Sensitivity Analysis with Discounted and Average Criteria: Multichain Cases. Automatica 40 (2004) 1749-1759
3. Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Belmont, Massachusetts (1996)
4. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA (1998)
5. Cao, X.R.: From Perturbation Analysis to Markov Decision Processes and Reinforcement Learning. Discrete Event Dynamic Systems: Theory and Applications 13 (2003) 9-39
6. Tang, H., Yuan, J.B., Lu, Y., Cheng, W.J.: Performance Potential-Based Neuro-Dynamic Programming for SMDPs. Acta Automatica Sinica (in Chinese) 31 (2005) 642-645
7. Cao, X.R.: Semi-Markov Decision Problems and Performance Sensitivity Analysis. IEEE Trans. on Automatic Control 48 (2003) 758-769
8. Yin, B.Q., Xi, H.S., Zhou, Y.P.: Queueing System Performance Analysis and Markov Control Processes. Press of University of Science and Technology of China, Hefei (2004)
Research of an Omnibearing Sun Locating Method with Fisheye Picture Based on Transform Domain Algorithm

Xi-hui Wang¹, Jian-ping Wang², and Chong-wei Zhang²

¹ Hefei University of Technology, School of Electrical Engineering and Automation, Graduate Student, P.O. Box 545, HFUT South Campus, Hefei, Anhui Province, China
[email protected]
² Hefei University of Technology, School of Electrical Engineering and Automation, Professor
Abstract. In this paper, a novel method of locating the sun spot is presented. A mathematical transform domain algorithm is used to emphasize the brightness of the sun area in the fisheye picture; this optical color-filter effect simulates the brightness sensitivity of human vision. The small sun region in the fisheye picture is segmented and transformed to the plane picture, instead of transforming the whole picture. It is then easy to map the coordinates of the barycenter of the sun area from the plane picture back to the fisheye picture, after which the azimuth angle and the vertical angle between the vision point and the sun are calculated. Experimental results show that the computational load of the algorithm is greatly reduced, and that it is accurate, fast, and real-time.
1 Introduction

In research on mobile sun tracking, one of the key technologies is to judge the position of the sun omnidirectionally, quickly, accurately, and dynamically. It is difficult, however, to locate the sun dynamically with the traditional latitude-and-longitude orientation method; a large-angle, three-dimensional picture is hard to obtain from a plane picture; and locating the sun with a multi-hole locating system is imprecise [1]. Observed from the ground, the track of the sun's movement is approximately a 0°-180° curve from east to west. The process of simulating a human judging the position of the sun can be described by the following steps: first, take a picture of the sky from the holding point; then locate the brightest point or area in the picture; finally, calculate the azimuth angle and the vertical angle between the vision point and the sun. A fast locating algorithm based on the fisheye picture is presented here. A picture of the sky is taken with a fisheye lens from the vision point. The brightness sensitivity of human vision is simulated by emphasizing the high-brightness area of the fisheye picture with the (H, S, I) transform domain algorithm. The small sun region in the fisheye picture is segmented and transformed to the plane picture instead of the whole picture. It is then easy to map the coordinates of the barycenter of the sun area from the plane picture back to the fisheye picture, after which the azimuth angle and the vertical angle between the vision point and the sun are calculated. Finally, omnibearing, dynamic, fast tracking of the sun is achieved.
2 A Fast Algorithm for Judging the Brightest Point in a Picture Based on the Transform Domain Algorithm

It can be assumed that the brightest point in a fisheye picture of the sky is the sun, and the naked eye is more sensitive to brightness than to chroma. An optical color filter, which intensifies the brightness information of the picture while suppressing the chroma information of the background, is realized by the mathematical transform domain algorithm [2]. The functions transforming an RGB picture into an HSI picture are given below:
    I = (R + G + B)/3,  S = 1 − 3·min(R, G, B)/(R + G + B),    (1)

    H = θ  if G ≥ B;  H = 2π − θ  if G < B.    (2)

In (2),

    θ = cos⁻¹{ [(R − G) + (R − B)]/2 / √[(R − G)² + (R − B)(G − B)] }.    (3)
The three-dimensional vector (F_h, F_s, F_i) identifies the space F; in other words, F = {F_h, F_s, F_i}, where F_h = W_H·H, F_s = W_S·S, F_i = W_I·I, and W_H, W_S, W_I are the three weights.

After analyzing the experimental statistics, in order to emphasize the sun area, which is strong in brightness and changes little in color, we commonly set W_H = 0.12, W_S = 0.14, W_I = 0.65. In this way, the weakly bright background color is suppressed, and the brightest point of interest is emphasized. The position of the sun is a bright spot in the fisheye picture. In an HSI picture, this spot can simply be marked as the pixel area satisfying both H ≤ 50 and I ≥ 150:

    P = { pᵢ | H_{pᵢ} ≤ 50 ∩ I_{pᵢ} ≥ 150 },  i = 1, 2, …, N.    (4)

It is hard to locate the exact center of the sun area in the fisheye picture because of fisheye distortion, so it is necessary to transform the fisheye picture to a plane picture. Considering the tremendous distance between the vision point and the sun, only the small area in the set (4) needs to be transformed.
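A sketch of the color-filter segmentation (1)-(4) on an RGB image follows; the thresholds of (4) are applied directly to H and I, and reading H in degrees and I on the 0-255 scale is an assumption of this sketch.

    import numpy as np

    def sun_mask(rgb):
        """Segment candidate sun pixels in an RGB image (H, W, 3), uint8."""
        r, g, b = [rgb[..., i].astype(float) for i in range(3)]
        i_ch = (r + g + b) / 3.0                                      # intensity, eq. (1)
        num = 0.5 * ((r - g) + (r - b))
        den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + 1e-9
        theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))  # eq. (3)
        h = np.where(g >= b, theta, 360.0 - theta)                    # hue, eq. (2)
        return (h <= 50.0) & (i_ch >= 150.0)                          # eq. (4)

    img = np.zeros((4, 4, 3), dtype=np.uint8)
    img[1, 2] = (255, 240, 200)      # a bright, warm pixel: sun-like
    print(sun_mask(img))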
3 The Transform Algorithm between the Fisheye Picture and the Plane Picture

In a fisheye picture f(x, y) (x = 1, 2, …, n; y = 1, 2, …, n), the shooting angle is φ (obtained from the data sheet of the fisheye lens). From these parameters, we can easily derive the paraboloid equation of the fisheye lens [3]:

    (x − Σx/T_t)² + (y − Σy/T_t)² = (r₁ cot(φ/2))² − 2 r₁ cot(φ/2) z,    (5)

where T_t is the number of effective pixels, Σx and Σy are the sums of the pixel coordinates, and r₁ = |OE|.
Fig. 1. Relationship between the fisheye picture and a random-distance plane picture
The relationship between the fisheye picture and a random-distance plane picture is shown in Fig. 1. P1 is an arbitrary point of the sun area in the fisheye picture. With equation (5), it is easy to calculate the corresponding point P2 on the elliptic paraboloid. Connecting point O with point P2 and extending the line OP2 until it crosses the plane ABCD gives point P3, the unique point corresponding to P1. Mapping every point of the sun-spot set P into the plane coordinate system yields the sun-spot area in the plane picture. To locate the position of the sun, it is necessary to calculate the barycenter of the sun area in the plane picture:

    x_o = (Σᵢ₌₀ᴺ x_{pᵢ})/N,  y_o = (Σᵢ₌₀ᴺ y_{pᵢ})/N.    (6)

In equations (6), x_{pᵢ} is the transverse coordinate and y_{pᵢ} the vertical coordinate of point pᵢ, and (x_o, y_o) are the coordinates of the barycenter of the sun spot in the plane picture.
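Given the boolean mask from the earlier sketch (after mapping to the plane picture), the barycenter (6) is just the mean of the member coordinates; the mask argument is an assumption of this sketch.

    import numpy as np

    def barycenter(mask):
        """Barycenter (x_o, y_o) of the True pixels of a boolean mask, eq. (6)."""
        ys, xs = np.nonzero(mask)    # row = vertical, column = transverse
        return xs.mean(), ys.mean()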
4 The Locating Algorithm for the Angles between the Sun and the Vision Point

The azimuth angle and the vertical angle are the two key elements in a tracking and locating system, and the aim of this paper is to derive the functions giving these two parameters. Because of the characteristics of the fisheye picture, positions are easy to locate in it, so it is necessary to transform (x_o, y_o) back to coordinates in the fisheye picture; the transform process is as in Section 3. Suppose the coordinates in the fisheye picture corresponding to (x_o, y_o) are p₁(x₁, y₁).
Fig. 2. Point location in fisheye picture
As shown in Fig. 2, every point p₁(x₁, y₁) in the two-dimensional projection plane OXY has a corresponding point p₂(x₂, y₂, z₂) on the paraboloid. According to equation (5), the function of Z is

    Z = [R² − (X² + Y²)]/(2R).    (7)

The relationship between p₁(x₁, y₁) and p₂(x₂, y₂, z₂) is

    x₂ = x₁,  y₂ = y₁,  z₂ = [R² − (x₁² + y₁²)]/(2R).    (8)
Now the functions of the azimuth angle α and the vertical angle β can be concluded:

    β = arctg(z₂/x₂) = arctg{ [R² − (x₁² + y₁²)] / (2R x₁) },

    α = arctg( y₂ / √(x₂² + z₂²) ) = arctg{ y₁ / √( x₁² + [ (R² − (x₁² + y₁²)) / (2R) ]² ) }.    (9)
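Finally, (8)-(9) in code: x1, y1 are the barycenter coordinates in the fisheye picture taken relative to the image center (an assumption of this sketch), R is the paraboloid parameter of (7), and arctan2/hypot replace the plain arctangent for numerical robustness.

    import numpy as np

    def sun_angles(x1, y1, R):
        """Azimuth and vertical angles from eqs. (8)-(9), in radians."""
        z2 = (R**2 - (x1**2 + y1**2)) / (2.0 * R)   # eq. (8)
        beta = np.arctan2(z2, x1)                   # vertical angle, eq. (9)
        alpha = np.arctan2(y1, np.hypot(x1, z2))    # azimuth angle, eq. (9)
        return alpha, beta

    print(sun_angles(30.0, 40.0, 200.0))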
5 Experiment Analysis

Fig. 3 is the original picture captured by the fisheye lens. Fig. 4 is the result of processing Fig. 3 with the transform domain algorithm, which achieves the optical color-filter effect. As seen from Fig. 4, the figure of the sun stands out clearly.
Fig. 3. Original picture captured by the fisheye lens

Fig. 4. Picture after processing
6 Conclusion

The proposed algorithm for quickly determining the azimuth and vertical angles between the sun and the vision point has several advantages. 1) With the fisheye lens, the whole trace of the sun's movement is captured in one picture. 2) With the transform domain algorithm, which achieves the optical color-filter effect, the high-brightness part of the picture is quickly segmented. 3) Exploiting the relationship between the fisheye picture and the plane picture, only the high-brightness part needs to be transformed, which greatly reduces the amount of computation. 4) With the help of this algorithm, the azimuth and vertical angles between the sun and the vision point can be calculated quickly and accurately. In conclusion, the merits of the algorithm presented in this paper are a small amount of computation, high accuracy and, most importantly, mobile omnibearing fast locating of the sun. Further work should focus on simplifying the algorithm and enhancing its real-time performance.
References
1. Chen, S.E.: QuickTime VR - An Image-Based Approach to Virtual Environment Navigation. Proceedings of SIGGRAPH '95, Los Angeles, CA, USA (1995) 29-38
2. Wang, J.P., Qian, B., Jiang, T.: Research on the Segmentation Method of License Plates Based on Space Transform Analysis. Journal of Hefei University of Technology 27(3) (2004) 251-255
3. Wang, J.Y., Yang, X.Q., Zhang, C.M.: Environments of Full View Navigation Based on Pictures Taken by a Fisheye Lens. Journal of System Simulation, Vol. 13, Suppl. (2001) 66-68
4. Shah, S., Aggarwal, J.K.: A Simple Calibration Procedure for Fish-Eye (High Distortion) Lens Cameras. In: Proceedings of the IEEE International Conference on Robotics and Automation, San Diego, CA, USA, 3 (1994) 3422-3427
Author Index
Ahn, Jae-Hyeong 566, 676, 1093 An, GaoYun 90 An, Senjian 450 Baek, Seong-Joon 488, 735 Barajas-Montiel, Sandra E. 876 Bashar, Md. Rezaul 9 Bayarsaikhan, Battulga 201 Bi, Houjie 938 Cai, Lianhong 1107 Cao, Xue 556 Cao, Yongfeng 528 Chang, Un-Dong 566, 676 Chen, Baisheng 1068 Chen, Duansheng 1068 Chen, Guobin 211 Chen, Huajie 882 Chen, Hui 631 Chen, Jianhua 809 Chen, Jiayu 410 Chen, Jingnian 1036 Chen, Min-Jiang 364 Chen, Tao 141 Chen, Tianding 100, 120, 263, 285 Chen, TsiuShuang 1137 Chen, Weidong 932 Chen, Wen-Sheng 191, 294, 547 Chen, Xiaosu 995 Chen, Yanquan 285 Cheng, Long 430 Cho, Youn-Ho 470 Choi, Hyunsoo 970 Choi, Soo-Mi 945 Dai, Li 888 Dai, Ru-Wei 131 Dimililer, Kamil 913 Dong, Shugang 715 Dong, Yuan 906 Duan, Huilong 639, 803 Erkmen, Burcu
779, 1081
Fang, Bin 294, 547 Fang, Zhijun 211, 958 Feng, Chen 715, 761 Feng, Yueping 600 Fu, Shujun 1036 Fung, Richard Y.K. 1137 Gao, Ming 1 Gao, Xiao-Shan 191 Gao, Xue 657 Gao, Yong 172 Gomez, Emilia 951 Gu, Juan-juan 663 Gu, Xuemai 741 Guan, Jin-an 1101 Guo, Ge 689 Guo, Jun 773, 906 Guo, Qing 741 Guo, Xiaoxin 600, 815 Ha, Jong-Eun 478, 606, 728 Han, Bo 840 Han, Jae-Hyuk 1093 Han, Jialing 72 Han, Xiao 870 Hazan, Amaury 951 He, Hongjie 374 He, Yong 42 Hong, Hyunki 827 Hong, Sung-Hoon 488 Hou, Gang 72 Hou, Zeng-Guang 430 Hu, Dewen 645, 864 Hu, Dongchuan 689 Hu, Min 421 Hu, Zhijian 251 Huang, De-Xian 1125, 1151 Huang, Dezhi 906 Huang, Zhichu 982 Hwang, Yongho 827 Jang, Daesik 1024 Jang, Dong-Young 230 Jeong, Mun-Ho 478, 606, 728 Ji, Guangrong 715, 761
Ji, Zhenhai 846 Jia, Chunfu 709 Jia, Xiuping 791 Jiang, Gang-Yi 626, 988 Jiang, Julang 421 Jiang, Tao 421, 932 Jiao, Liangbao 938 Jin, Weidong 150 Jin, Yi-hui 1151 Jing, Zhong 220 Jo, Kang-Hyun 1030 Ju, Fang 1056 Kang, Dong-Joong 478, 606, 728 Kang, Hang-Bong 852 Kang, Hyun-Deok 1030 Kang, Jiayin 797 Kang, Jin-Gu 1113 Kang, Sanggil 54 Khashman, Adnan 913 Kim, Cheol-Ki 976 Kim, Daejin 488 Kim, Dong Kook 488 Kim, Dong-Woo 566, 676 Kim, Hyung Moo 440 Kim, Hyun-Joong 945 Kim, Jane-Jin 1113 Kim, Jeong-Sik 945 Kim, Kap-Sung 821 Kim, Wonil 54, 894 Kim, Yong-Deak 626 Kim, Young-Gil 566 Kim, Youngouk 1042 Kong, Jun 19, 62, 72 Kong, Min 900 Kwak, Hoon Sung 334, 440 Kwon, Dong-Jin 1093 Lee, Lee, Lee, Lee, Lee, Lee, Lee, Lee, Lee, Lee, Lee, Lee,
Bae-Ho 488 Bum Ju 721 Bum-Jong 1018 Choong Ho 1062 Chulhee 970 DoHoon 976 Eunjae 970 Han Jeong 334 Han-Ku 894 Heon Gyu 721 Hwa-Sei 976 Jihoon 1042
Lee, Kang-Kue 470 Lee, Kwan-Houng 1113, 1145 Lee, Kyoung-Mi 182 Lee, Wang-Heon 478 Lei, Jianjun 773 Li, Fucui 626 Li, Hua 572 Li, Hui 241 Li, Jiao 755 Li, Lei 1075 Li, Miao 220 Li, Xiao 797 Li, Yao-Dong 497 Li, Yongjie 834 Li, Zheng-fang 683, 696 Li, Zhi-shun 1008 Lin, Xinggang 517 Liu, Bo 1151 Liu, Gang 773 Liu, Guixi 1013 Liu, Heng 578 Liu, Jian 303, 620 Liu, Jinzhu 797 Liu, Ping 858 Liu, Qingbao 864 Liu, Qing-yun 1008 Liu, Wanquan 450 Liu, Wei 919 Liu, Yang 864 Long, Lei 1137 Lu, Chong 450 Lu, Yan-Sheng 364 Lu, Yinghua 19, 62, 72 Luan, Qingxian 797 Luo, Bin 312, 900 Luo, Jie 241 Luo, Xiaobin 631 Lv, Yujie 241 Ma, Jixin 312 Ma, Si-liang 870 Ma, Xin 1056 Ma, Yongjun 1087 Maestre, Esteban 951 Meng, Helen M. 1107 Min, Lequan 797 Moon, Cheol-Hong 230, 821 Nam, Mi Young 201 Nguyen, Q. 322
Author Index Nian, Rui 715, 761 Nie, Xiangfei 773 Niu, Yifeng 343 Noh, Kiyong 721 Oh, Sangyoon
54
Paik, Joonki 1042 Pang, Yunjie 815 Park, Aaron 488, 735 Park, Changhan 1042 Park, Changwoo 1042 Park, Jong-Seung 1018 Park, Kyu-Sik 470, 1051 Peng, Fuyuan 620 Peng, Yuhua 353 Ping, Xijian 689, 1075 Polat, Övünç 402 Premaratne, P. 322 Puiggros, Montserrat 951 Qi, Miao 62 Qian, Bin 1125 Ramirez, Rafael 951 Reyes-García, Carlos A. 876 Rhee, Phill Kyu 9, 201 Rong, Haina 150 Ruan, QiuQi 90, 1036 Ryu, Keun Ho 721 Safaei, F. 322 Sedai, Suman 201 Sekeroglu, Boran 913 Seo, Duck Won 440 Shang, Yan 461, 517 Shao, Wenze 925 Shao, Yongni 42 Shen, Lincheng 343 Shi, Chaojian 30, 651 Shi, Xinling 809 Shi, Zhongchao 702 Shon, Ho-Sun 721 Song, Changzhe 1013 Song, Huazhu 840 Song, Jian-She 162 Song, Jiatao 211 Song, Young-Jun 566, 676, 1093 Su, Guangda 461, 517
Sun, Hong 392, 410, 528 Sun, Ning 846
Tamio, Arai 1163 Tan, Min 430 Tan, Yihua 303 Tang, Hao 1163 Tang, Jianliang 191 Tang, Yuan Yan 547 Tao, Liang 663 Tian, Jie 241 Tian, Jinwen 303 Tian, Yan 620 Tian, Zheng 749 Tong, Li 1075 Uwamahoro, Diane Rurangirwa 728 Vicente, Veronica 951
Wang, Bin 30, 651 Wang, Chun-Dong 1 Wang, Chun-Heng 131, 497 Wang, Hai-Hui 364 Wang, Haila 906 Wang, Hong 614 Wang, Jian 773 Wang, Jian-Li 141 Wang, Jian-ping 1169 Wang, Jianzhong 19 Wang, Jue 858 Wang, Junyan 517 Wang, Kuanquan 964 Wang, Lei 670 Wang, Liguo 755, 767, 791 Wang, Ling 1125, 1151 Wang, Lu 670 Wang, Shuhua 62 Wang, Tao 275 Wang, Wei 72 Wang, Wenqia 1036 Wang, Wenyuan 670 Wang, Xi-hui 1169 Wang, Xin 815 Wang, Xiong 1125 Wang, Xiu-Feng 1 Wang, Yangsheng 172, 593, 702 Wang, Yongji 1157 Wang, Yu-Er 988
Wang, Yunxiao 600, 815 Wang, Yuru 62 Wang, Zhang 303 Wang, Zhengxuan 600 Wang, Zhengyou 211, 958 Wei, Wei 882 Wei, Zhihui 584, 925 Wen, Jing 547 Wen, Xian-Bin 749 Wu, Dan 741 Wu, Shaoxiong 112 Wu, Shiqian 211, 958 Wu, Xiangqian 964 Wu, Yan 888 Wu, Zhenhua 995 Wu, Zhiyong 1107 Wu, ZhongCheng 1002 Xia, Shunren 639, 803 Xia, Yong 497 Xiang, Youjun 696 Xiang, Zhiyu 785 Xiao, Bai-Hua 131, 497 Xiao, Daoju 995 Xiao, Huan 670 Xiao, Yi 81 Xie, Ling 374 Xie, Sheng-Li 507 Xie, Yubo 620 Xin, Guan 81 Xing, Guobo 620 Xu, Dong 572 Xu, Ge 392, 410 Xu, Jun 162 Xu, Lei 131 Xu, Liang 858 Xu, Min 141 Xu, Shenghua 958 Xu, Weidong 639, 803 Xu, Xin 528 Xu, Yong 220 Xu, Zhi-liang 683, 696 Xu, Zhiwen 600 Xue, Feng 421 Xue, Quan 211 Yan, Jingqi 578 Yan, Qinghua 982 Yang, Jianwei 294 Yang, Jing-Yu 220, 556
Yang, Juanqi 689 Yang, Miao 809 Yang, Ming 353 Yang, Shiming 761 Yang, Wen 392, 410 Yang, Xi 932 Yang, Zhen 773 Yao, Dezhong 834 Yi, Wenjuan 626 Yıldırım, Tülay 402, 779, 1081 Yoon, Kyoungro 894 Yoon, Kyungro 54 Yoon, Won-Jung 1051 You, Bum-Jae 728 You, He 81 You, Kang Soo 334, 440 You, Xinge 547 Young, Nam Mi 9 Yu, Mei 626 Yuan, Xiaoliang 906 Zeng, De-lu 683 Zeng, Weiming 211, 958 Zhang, Baixing 461 Zhang, Chengxue 251 Zhang, Chong-wei 1169 Zhang, David 220, 578, 964 Zhang, De 938 Zhang, Fan 538 Zhang, Gexiang 150 Zhang, Haitao 1157 Zhang, Han-ling 1008 Zhang, Hua 749 Zhang, Jiafan 982 Zhang, Jiashu 374, 631 Zhang, Jingbo 19 Zhang, Jingdan 19 Zhang, Jingwei 645 Zhang, Jun 584 Zhang, Li-bao 382 Zhang, LiPing 1002 Zhang, Shanwen 614 Zhang, Tai-Ping 547 Zhang, Tong 858 Zhang, Xinhong 538 Zhang, Xinxiang 294 Zhang, Yan 275 Zhang, Ye 755, 767, 791 Zhang, Yonglin 982 Zhang, Yong Sheng 275
Zhang, Yousheng 421 Zhang, Yu 870 Zhang, Yufeng 809 Zhang, Zanchao 803 Zhang, Zeng-Nian 988 Zhang, Zutao 631 Zhao, Chunhui 767 Zhao, Di 1013 Zhao, Guoxing 312 Zhao, Haifeng 900 Zhao, Li 846 Zhao, Xuying 593, 702 Zhao, YuHong 919 Zheng, Wenming 846 Zheng, Xiaolong 593, 702 Zheng, Yong-An 162
Zheng, Yuan F. 932 Zhong, Anming 709 Zhong, Luo 840 Zhou, Lei 1163 Zhou, Lijian 715, 761 Zhou, Ming-quan 382 Zhou, Weidong 1056 Zhou, Wenhui 785 Zhou, Wen-Ming 162 Zhou, Xinhong 353 Zhou, Zhi-Heng 507 Zhou, Zongtan 645, 864 Zhu, Liangjia 645 Zhu, Zhong-Jie 988 Zhuo, Qing 670 Zou, Cairong 846