Intelligent Control and Automation: International Conference on Intelligent Computing, ICIC 2006, Kunming, China, August, 2006 (Lecture Notes in Control and Information Sciences)
Library of Congress Control Number: 2006930913 ISSN print edition: 0170-8643 ISSN electronic edition: 1610-7411 ISBN-10 3-540-37255-5 Springer Berlin Heidelberg New York ISBN-13 978-3-540-37255-4 Springer Berlin Heidelberg New York This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable for prosecution under the German Copyright Law. Springer is a part of Springer Science+Business Media springer.com c Springer-Verlag Berlin Heidelberg 2006 The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Cover design: design & production GmbH, Heidelberg Printed on acid-free paper
Preface
The International Conference on Intelligent Computing (ICIC) was formed to provide an annual forum dedicated to emerging and challenging topics in artificial intelligence, machine learning, bioinformatics, computational biology, and related fields. It aims to bring together researchers and practitioners from both academia and industry to share ideas, problems and solutions related to the multifaceted aspects of intelligent computing. ICIC 2006, held in Kunming, Yunnan, China, August 16-19, 2006, was the second International Conference on Intelligent Computing, building upon the success of ICIC 2005 held in Hefei, China. This year, the conference concentrated mainly on the theories and methodologies as well as the emerging applications of intelligent computing. It intended to unify the contemporary intelligent computing techniques within an integral framework that highlights the trends in advanced computational intelligence and bridges theoretical research with applications. In particular, bio-inspired computing has emerged in recent years as playing a key role in the pursuit of novel technologies. The resulting techniques vitalize life science engineering and daily life applications. In light of this trend, the theme for this conference was “Emerging Intelligent Computing Technology and Applications”. Papers related to this theme were especially solicited, including theories, methodologies, and applications in science and technology. ICIC 2006 received over 3000 submissions from 36 countries and regions. All papers went through a rigorous peer review procedure and each paper received at least three review reports. Based on the review reports, the Program Committee finally selected 703 high-quality papers for presentation at ICIC 2006.
These papers cover 29 topics and 16 special sessions, and are included in five volumes of proceedings published by Springer: one volume of Lecture Notes in Computer Science (LNCS), one volume of Lecture Notes in Artificial Intelligence (LNAI), one volume of Lecture Notes in Bioinformatics (LNBI), and two volumes of Lecture Notes in Control and Information Sciences (LNCIS). This volume of Lecture Notes in Control and Information Sciences (LNCIS) includes 142 papers covering 4 relevant topics and 1 special session topic. The organizers of ICIC 2006, including Yunnan University, the Institute of Intelligent Machines of the Chinese Academy of Sciences, and Queen’s University Belfast, made an enormous effort to ensure the success of ICIC 2006. We would like to thank the members of the ICIC 2006 Advisory Committee for their guidance and advice, the members of the Program Committee and the referees for their collective effort in reviewing and soliciting the papers, and the members of the Publication Committee for their significant editorial work. We would like to thank Alfred Hofmann, executive editor from Springer, for his frank and helpful advice and guidance throughout and for his support in publishing the proceedings in the Lecture Notes series. In particular, we would like to thank all the authors for contributing their
papers. Without the high-quality submissions from the authors, the success of the conference would not have been possible. Finally, we are especially grateful to the IEEE Computational Intelligence Society, the International Neural Network Society, and the National Science Foundation of China for their sponsorship.

June 2006
De-Shuang Huang
Institute of Intelligent Machines, Chinese Academy of Sciences, China

Kang Li
Queen’s University Belfast, UK

George William Irwin
Queen’s University Belfast, UK
ICIC 2006 Organization
General Chairs:
De-Shuang Huang, China Song Wu, China George W. Irwin, UK
International Advisory Committee Aike Guo, China Alfred Hofmann, Germany DeLiang Wang, USA Erke Mao, China Fuchu He, China George W. Irwin, UK Guangjun Yang, China Guanrong Chen, Hong Kong Guoliang Chen, China Harold Szu, USA John L. Casti, USA Marios M. Polycarpou, USA
Mengchu Zhou, USA Michael R. Lyu, Hong Kong MuDer Jeng, Taiwan Nanning Zheng, China Okyay Kaynak, Turkey Paul Werbos, USA Qingshi Zhu, China Ruwei Dai, China Sam Shuzhi GE, Singapore Sheng Zhang, China Shoujue Wang, China Songde Ma, China
Stephen Thompson, UK Tom Heskes, Netherlands Xiangfan He, China Xingui He, China Xueren Wang, China Yanda Li, China Yixin Zhong, China Youshou Wu, China Yuanyan Tang, Hong Kong Yunyu Shi, China Zheng Bao, China
Program Committee Chairs:
Kang Li, UK Prashan Premaratne, Australia
Steering Committee Chairs:
Sheng Chen, UK Xiaoyi Jiang, Germany Xiao-Ping Zhang, Canada
Organizing Committee Chairs:
Yongkun Li, China Hanchun Yang, China Guanghua Hu, China
Special Session Chair:
Wen Yu, Mexico
Tutorial Chair:
Sudharman K. Jayaweera, USA
Publication Chair:
Xiaoou Li, Mexico
International Liaison Chair:
Liyanage C. De Silva, New Zealand
Publicity Chairs:
Simon X. Yang, Canada Jun Zhang, China
Exhibition Chair:
Cheng Peng, China
Program Committee Aili Han, China Arit Thammano, Thailand Baogang Hu, China Bin Luo, China Bin Zhu, China Bing Wang, China Bo Yan, USA Byoung-Tak Zhang, Korea Caoan Wang, Canada Chao Hai Zhang, Japan Chao-Xue Wang, China Cheng-Xiang Wang, UK Cheol-Hong Moon, Korea Chi-Cheng Cheng, Taiwan Clement Leung, Australia Daniel Coca, UK Daqi Zhu, China David Stirling, Australia Dechang Chen, USA Derong Liu, USA Dewen Hu, China Dianhui Wang, Australia Dimitri Androutsos, Canada Donald C. Wunsch, USA Dong Chun Lee, Korea Du-Wu Cui, China Fengling Han, Australia Fuchun Sun, China Girijesh Prasad, UK Guang-Bin Huang, Singapore Guangrong Ji, China Hairong Qi, USA Hong Qiao, China
Hong Wang, China Hongtao Lu, China Hongyong Zhao, China Huaguang Zhang, China Hui Wang, China Vitoantonio Bevilacqua, Italy Jiangtao Xi, Australia Jianguo Zhu, Australia Jianhua Xu, China Jiankun Hu, Australia Jian-Xun Peng, UK Jiatao Song, China Jie Tian, China Jie Yang, China Jin Li, UK Jin Wu, UK Jinde Cao, China Jinwen Ma, China Jochen Till, Germany John Q. Gan, UK Ju Liu, China K. R. McMenemy, UK Key-Sun Choi, Korea Liangmin Li, UK Luigi Piroddi, Italy Maolin Tang, Australia Marko Hočevar, Slovenia Mehdi Shafiei, Canada Mei-Ching Chen, Taiwan Mian Muhammad Awais, Pakistan Michael Granitzer, Austria Michael J. Watts, New Zealand
Michiharu Maeda, Japan Minrui Fei, China Muhammad Jamil Anwas, Pakistan Muhammad Khurram Khan, China Naiqin Feng, China Nuanwan Soonthornphisaj, Thailand Paolo Lino, Italy Peihua Li, China Ping Guo, China Qianchuan Zhao, China Qiangfu Zhao, Japan Qing Zhao, Canada Roberto Tagliaferri, Italy Rong-Chang Chen, Taiwan RuiXiang Sun, China Saeed Hashemi, Canada Sanjay Sharma, UK Seán McLoone, Ireland Seong G. Kong, USA Shaoning Pang, New Zealand Shaoyuan Li, China Shuang-Hua Yang, UK Shunren Xia, China Stefanie Lindstaedt, Austria Sylvia Encheva, Norway Tai-hoon Kim, Korea Tai-Wen Yue, Taiwan Takashi Kuremoto, Japan Tarık Veli Mumcu, Turkey
Tian Xiang Mei, UK Tim. B. Littler, UK Tommy W. S. Chow, Hong Kong Uwe Kruger, UK Wei Dong Chen, China Wenming Cao, China Wensheng Chen, China Willi Richert, Germany Worapoj Kreesuradej, Thailand
Xiao Zhi Gao, Finland Xiaoguang Zhao, China Xiaojun Wu, China Xiaolong Shi, China Xiaoou Li, Mexico Xinge You, Hong Kong Xiwen Zhang, China Xiyuan Chen, China Xun Wang, UK Yanhong Zhou, China Yi Shen, China
Yong Dong Wu, Singapore Yuhua Peng, China Zengguang Hou, China Zhao-Hui Jiang, Japan Zhen Liu, Japan Zhi Wang, China Zhi-Cheng Chen, China Zhi-Cheng Ji, China Zhigang Zeng, China Ziping Chiang, Taiwan
Reviewers Xiaodan Wang, Lei Wang, Arjun Chandra, Angelo Ciaramella, Adam Kalam, Arun Sathish, Ali Gunes, Jin Tang, Aiguo He, Arpad Kelemen, Andreas Koschan, Anis Koubaa, Alan Gupta, Alice Wang, Ali Ozen, Hong Fang, Muhammad Amir Yousuf, An-Min Zou, Andre Döring, Andreas Juffinger, Angel Sappa, Angelica Li, Anhua Wan, Bing Wang, Rong Fei, Antonio Pedone, Zhengqiang Liang , Qiusheng An, Alon Shalev Housfater, Siu-Yeung Cho, Atif Gulzar, Armin Ulbrich, Awhan Patnaik, Muhammad Babar, Costin Badica, Peng Bai, Banu Diri, Bin Cao, Riccardo Attimonelli, Baohua Wang, Guangguo Bi, Bin Zhu, Brendon Woodford, Haoran Feng, Bo Ma, Bojian Liang, Boris Bacic, Brane Sirok, Binrong Jin, Bin Tian, Christian Sonntag, Galip Cansever, Chun-Chi Lo, ErKui Chen, Chengguo Lv, Changwon Kim, Chaojin Fu, Anping Chen, Chen Chun , C.C. Cheng, Qiming Cheng, Guobin Chen, Chengxiang Wang, Hao Chen, Qiushuang Chen, Tianding Chen, Tierui Chen, Ying Chen, Mo-Yuen Chow, Christian Ritz, Chunmei Liu, Zhongyi Chu, Feipeng Da, Cigdem Turhan, Cihan Karakuzu, Chandana Jayasooriya, Nini Rao, Chuan-Min Zhai, Ching-Nung Yang, Quang Anh Nguyen, Roberto Cordone, Changqing Xu, Christian Schindler, Qijun Zhao, Wei Lu, Zhihua Cui, Changwen Zheng, David Antory, Dirk Lieftucht, Dedy Loebis, Kouichi Sakamoto, Lu Chuanfeng, Jun-Heng Yeh, Dacheng Tao, Shiang-Chun Liou, Ju Dai , Dan Yu, Jianwu Dang, Dayeh Tan, Yang Xiao, Dondong Cao, Denis Stajnko, Liya De Silva, Damien Coyle, Dian-Hui Wang, Dahai Zhang, Di Huang, Dikai Liu, D. Kumar, Dipak Lal Shrestha, Dan Lin, DongMyung Shin, Ning Ding, DongFeng Wang, Li Dong, Dou Wanchun, Dongqing Feng, Dingsheng Wan, Yongwen Du, Weiwei Du, Wei Deng, Dun-wei Gong, DaYong Xu, Dar-Ying Jan, Zhen Duan, Daniela Zaharie, ZhongQiang Wu, Esther Koller-Meier, Anding Zhu, Feng Pan, Neil Eklund, Kezhi Mao, HaiYan Zhang, Sim-Heng Ong, Antonio Eleuteri, Bang Wang, Vincent Emanuele, Michael Emmerich, Hong Fu, Eduardo Hruschka, Erika Lino, Estevam Rafael Hruschka Jr, D.W. 
Cui, Fang Liu, Alessandro Farinelli, Fausto Acernese, Bin Fang, Chen Feng, Huimin Guo, Qing Hua, Fei Zhang, Fei Ge, Arnon Rungsawang, Feng Jing, Min Feng, Feiyi Wang, Fengfeng Zhou, Fuhai Li, Filippo Menolascina, Fengli Ren, Mei Guo, Andrés Ferreyra, Francesco Pappalardo, Chuleerat Charasskulchai, Siyao Fu, Wenpeng Ding, Fuzhen Huang, Amal Punchihewa,
Geoffrey Macintyre, Xue Feng He, Gang Leng, Lijuan Gao, Ray Gao, Andrey Gaynulin, Gabriella Dellino, D.W. Ggenetic, Geoffrey Wang, YuRong Ge, Guohui He, Gwang Hyun Kim, Gianluca Cena, Giancarlo Raiconi, Ashutosh Goyal, Guan Luo, Guido Maione, Guido Maione, Grigorios Dimitriadis, Haijing Wang, Kayhan Gulez, Tiantai Guo, Chun-Hung Hsieh, Xuan Guo, Yuantao Gu, Huanhuan Chen, Hongwei Zhang, Jurgen Hahn, Qing Han, Aili Han, Dianfei Han, Fei Hao, Qing-Hua Ling, Hang-kon Kim, Han-Lin He, Yunjun Han, Li Zhang, Hathai Tanta-ngai, HangBong Kang, Hsin-Chang Yang, Hongtao Du, Hazem Elbakry, Hao Mei, Zhao L, Yang Yun, Michael Hild, Heajo Kang, Hongjie Xing, Hailli Wang, Hoh In, Peng Bai, Hong-Ming Wang, Hongxing Bai, Hongyu Liu, Weiyan Hou, Huaping Liu, H.Q. Wang, Hyungsuck Cho, Hsun-Li Chang, Hua Zhang, Xia Huang, Hui Chen, Huiqing Liu, Heeun Park, Hong-Wei Ji, Haixian Wang, Hoyeal Kwon, H.Y. Shen, Jonghyuk Park, Turgay Ibrikci, Mary Martin, Pei-Chann Chang, Shouyi Yang, Xiaomin Mu, Melanie Ashley, Ismail Altas, Muhammad Usman Ilyas, Indrani Kar, Jinghui Zhong, Ian Mack, Il-Young Moon, J.X. Peng , Jochen Till, Jian Wang, Quan Xue, James Govindhasamy, José Andrés Moreno Pérez, Jorge Tavares, S. K. 
Jayaweera, Su Jay, Jeanne Chen, Jim Harkin, Yongji Jia, Li Jia, Zhao-Hui Jiang, Gangyi Jiang, Zhenran Jiang, Jianjun Ran, Jiankun Hu, Qing-Shan Jia, Hong Guo, Jin Liu, Jinling Liang, Jin Wu, Jing Jie, Jinkyung Ryeu, Jing Liu, Jiming Chen, Jiann-Ming Wu, James Niblock, Jianguo Zhu, Joel Pitt, Joe Zhu, John Thompson, Mingguang Shi, Joaquin Peralta, Si Bao Chen, Tinglong Pan, Juan Ramón González González, JingRu Zhang, Jianliang Tang, Joaquin Torres, Junaid Akhtar, Ratthachat Chatpatanasiri, Junpeng Yuan, Jun Zhang, Jianyong Sun, Junying Gan, Jyh-Tyng Yau, Junying Zhang, Jiayin Zhou, Karen Rosemary McMenemy, Kai Yu, Akimoto Kamiya, Xin Kang, Ya-Li Ji, GuoShiang Lin, Muhammad Khurram, Kevin Curran, Karl Neuhold, Kyongnam Jeon, Kunikazu Kobayashi, Nagahisa Kogawa, Fanwei Kong, Kyu-Sik Park, Lily D. Li, Lara Giordano, Laxmidhar Behera, Luca Cernuzzi, Luis Almeida, Agostino Lecci, Yan Zuo, Lei Li, Alberto Leva, Feng Liang, Bin Li, Jinmei Liao, Liang Tang, Bo Lee, Chuandong Li, Lidija Janezic, Jian Li, Jiang-Hai Li, Jianxun Li, Limei Song, Ping Li, Jie Liu, Fei Liu, Jianfeng Liu, Jianwei Liu, Jihong Liu, Lin Liu, Manxi Liu, Yi Liu, Xiaoou Li, Zhu Li, Kun-hong Liu, Li Min Cui, Lidan Miao, Long Cheng , Huaizhong Zhang, Marco Lovera, Liam Maguire, Liping Liu, Liping Zhang, Feng Lu, Luo Xiaobin, Xin-ping Xie, Wanlong Li, Liwei Yang, Xinrui Liu, Xiao Wei Li, Ying Li, Yongquan Liang, Yang Bai, Margherita Bresco, Mingxing Hu, Ming Li, Runnian Ma, Meta-Montero Manrique, Zheng Gao, Mingyi Mao, Mario Vigliar, Marios Savvides, Masahiro Takatsuka, Matevz Dular, Mathias Lux, Mutlu Avci, Zhifeng Hao, Zhifeng Hao, Ming-Bin Li, Tao Mei, Carlo Meloni, Gennaro Miele, Mike Watts, Ming Yang, Jia Ma, Myong K. 
Jeong, Michael Watts, Markus Koch, Markus Koch, Mario Koeppen, Mark Kröll, Hui Wang, Haigeng Luo, Malrey Lee, Tiedong Ma, Mingqiang Yang, Yang Ming, Rick Chang, Nihat Adar, Natalie Schellenberg, Naveed Iqbal, Nur Bekiroglu, Jinsong Hu, Nesan Aluha, Nesan K Aluha, Natascha Esau, Yanhong Luo, N.H. Siddique, Rui Nian, Kai Nickel, Nihat Adar, Ben Niu, Yifeng Niu, Nizar Tayem, Nanlin Jin, Hong-Wei Ji, Dongjun Yu, Norton Abrew, Ronghua Yao, Marco Moreno-Armendariz, Osman Kaan Erol, Oh Kyu Kwon, Ahmet Onat, Pawel Herman, Peter Hung, Ping Sun, Parag Kulkarni, Patrick Connally, Paul Gillard, Yehu Shen,
Paul Conilione, Pi-Chung Wang, Panfeng Huang, Peter Hung, Massimo Pica Ciamarra, Ping Fang, Pingkang Li, Peiming Bao, Pedro Melo-Pinto, Maria Prandini, Serguei Primak, Peter Scheir, Shaoning Pang, Qian Chen, Qinghao Rong, QingXiang Wu, Quanbing Zhang, Qifu Fan, Qian Liu, Qinglai Wei, Shiqun Yin, Jianlong Qiu, Qingshan Liu, Quang Ha, SangWoon Lee , Huaijing Qu, Quanxiong Zhou , Qingxian Gong, Qingyuan He, M.K.M. Rahman, Fengyuan Ren, Guang Ren, Qingsheng Ren, Wei Zhang, Rasoul Milasi, Rasoul Milasi, Roberto Amato, Roberto Marmo, P. Chen, Roderick Bloem, Hai-Jun Rong, Ron Von Schyndel, Robin Ferguson, Runhe Huang, Rui Zhang, Robin Ferguson, Simon Johnston, Sina Rezvani, Siang Yew Chong, Cristiano Cucco, Dar-Ying Jan, Sonya Coleman, Samuel Rodman, Sancho SalcedoSanz, Sangyiel Baik, Sangmin Lee, Savitri Bevinakoppa, Chengyi Sun, Hua Li, Seamus McLoone, Sean McLoone, Shafayat Abrar, Aamir Shahzad, Shangmin Luan, Xiaowei Shao, Shen Yanxia, Zhen Shen, Seung Ho Hong, Hayaru Shouno, Shujuan Li, Si Eng Ling, Anonymous, Shiliang Guo, Guiyu Feng, Serafin Martinez Jaramillo, Sangwoo Moon, Xuefeng Liu, Yinglei Song, Songul Albayrak, Shwu-Ping Guo, Chunyan Zhang, Sheng Chen, Qiankun Song, Seok-soo Kim, Antonino Staiano, Steven Su, Sitao Wu, Lei Huang, Feng Su, Jie Su, Sukree Sinthupinyo, Sulan Zhai, Jin Sun, Limin Sun, Zengshun Zhao, Tao Sun, Wenhong Sun, Yonghui Sun, Supakpong Jinarat, Srinivas Rao Vadali, Sven Meyer zu Eissen, Xiaohong Su, Xinghua Sun, Zongying Shi, Tony Abou-Assaleh, Youngsu Park, Tai Yang, Yeongtak Jo, Chunming Tang, Jiufei Tang, Taizhe Tan, Tao Xu, Liang Tao, Xiaofeng Tao, Weidong Xu, Yueh-Tsun Chang, Fang Wang, Timo Lindemann, Tina Yu, Ting Hu, Tung-Kuan Liu, Tianming Liu, Tin Lay Nwe, Thomas Neidhart, Tony Chan, Toon Calders, Yi Wang, Thao Tran, Kyungjin Hong, Tariq Qureshi, Tung-Shou Chen, Tsz Kin Tsui, Tiantian Sun, Guoyu Tu, Tulay Yildirim, Dandan Zhang, Xuqing Tang, Yuangang Tang, Uday Chakraborty, Luciana Cariello, Vasily Aristarkhov, Jose-Luis 
Verdegay, Vijanth Sagayan Asirvadam, Vincent Lee, Markus Vincze, Duo Chen, Viktoria Pammer, Vedran Sabol, Wajeeha Akram, Cao Wang , Xutao Wang, Winlen Wang, Zhuang Znuang, Feng Wang, Haifeng Wang, Le Wang, Wang Linkun, Meng Wang, Rongbo Wang, Xin Wang, Xue Wang, Yan-Feng Wang, Yong Wang, Yongcai Wang, Yongquan Wang, Xu-Qin Li, Wenbin Liu, Wudai Liao, Weidong Zhou, Wei Li, Wei Zhang, Wei Liang, Weiwei Zhang, Wen Xu, Wenbing Yao, Xiaojun Ban, Fengge Wu, Weihua Mao, Shaoming Li, Qing Wu, Jie Wang, Wei Jiang, W Jiang, Wolfgang Kienreich, Linshan Wang, Wasif Naeem, Worasait Suwannik, Wolfgang Slany, Shijun Wang , Wooyoung Soh, Teng Wang, Takashi Kuremoto, Hanguang Wu, Licheng Wu, Xugang Wang, Xiaopei Wu, ZhengDao Zhang, Wei Yen, Yan-Guo Wang, Daoud Ait-Kadi, Xiaolin Hu, Xiaoli Li, Xun Wang, Xingqi Wang, Yong Feng, Xiucui Guan, Xiao-Dong Li, Xingfa Shen, Xuemin Hong, Xiaodi Huang, Xi Yang, Li Xia, Zhiyu Xiang, Xiaodong Li, Xiaoguang Zhao, Xiaoling Wang, Min Xiao, Xiaonan Wu, Xiaosi Zhan, Lei Xie, Guangming Xie, Xiuqing Wang, Xiwen Zhang, XueJun Li, Xiaojun Zong, Xie Linbo, Xiaolin Li, Xin Ma, Xiangqian Wu, Xiangrong Liu, Fei Xing, Xu Shuzheng, Xudong Xie, Bindang Xue, Xuelong Li, Zhanao Xue, Xun Kruger, Xunxian Wang, Xusheng Wei, Yi Xu, Xiaowei Yang, Xiaoying Wang, Xiaoyan Sun, YingLiang Ma, Yong Xu, Jongpil Yang, Lei Yang, Yang Tian, Zhi Yang, Yao Qian, Chao-bo Yan, Shiren Ye,
Yong Fang, Yanfei Wang, Young-Gun Jang, Yuehui Chen, Yuh-Jyh Hu, Yingsong Hu, Zuoyou Yin, Yipan Deng, Yugang Jiang, Jianwei Yang, Yujie Zheng, Ykung Chen, Yan-Kwang Chen, Ye Mei, Yongki Min, Yongqing Yang, Yong Wu, Yongzheng Zhang, Yiping Cheng, Yongpan Liu, Yanqiu Bi, Shengbao Yao, Yongsheng Ding, Haodi Yuan, Liang Yuan, Qingyuan He, Mei Yu, Yunchu Zhang, Yu Shi, Wenwu Yu, Yu Wen, Younghwan Lee, Ming Kong, Yingyue Xu, Xin Yuan, Xing Yang, Yan Zhou, Yizhong Wang, Zanchao Zhang, Ji Zhicheng, Zheng Du, Hai Ying Zhang, An Zhang, Qiang Zhang, Shanwen Zhang, Shanwen Zhang, Zhang Tao, Yue Zhao, R.J. Zhao, Li Zhao, Ming Zhao, Yan Zhao, Bojin Zheng, Haiyong Zheng, Hong Zheng, Zhengyou Wang, Zhongjie Zhu, Shangping Zhong, Xiaobo Zhou, Lijian Zhou, Lei Zhu, Lin Zhu, Weihua Zhu, Wumei Zhu, Zhihong Yao, Yumin Zhang, Ziyuan Huang, Chengqing Li, Z. Liu, Zaiqing Nie, Jiebin Zong, Zunshui Cheng, Zhongsheng Wang, Yin Zhixiang, Zhenyu He, Yisheng Zhong, Tso-Chung Lee, Takashi Kuremoto Tao Jianhua, Liu Wenjue, Pan Cunhong, Li Shi, Xing Hongjie, Yang Shuanghong, Wang Yong, Zhang Hua, Ma Jianchun, Li Xiaocui, Peng Changping, Qi Rui, Guozheng Li, Hui Liu, Yongsheng Ding, Xiaojun Liu, Qinhua Huang
A Unified Framework of Morphological Associative Memories

Naiqin Feng1,2, Yuhui Qiu2, Fang Wang2, and Yuqiang Sun1

1 College of Computer & Information Technology, Henan Normal University, Xinxiang 453007
2 Faculty of Computer & Information Science, Southwest China University, Chongqing 400715
[email protected]
Abstract. Morphological neural network models, including morphological associative memories (MAM), fuzzy morphological associative memories (FMAM), enhanced fuzzy morphological associative memories (EFMAM), etc., are relatively new artificial neural networks. They have many attractive advantages, such as unlimited storage capacity, one-shot recall speed, and good noise tolerance to either erosive or dilative noise. Although MAM, FMAM and EFMAM are different and easily distinguishable from each other, they share the same morphological theory base. Therefore, this paper presents a unified theoretical framework for them. The significance of the framework is twofold: (1) it can help us find new methods, definitions and theorems for morphological neural networks; (2) it gives us a deeper understanding of MAM, FMAM and EFMAM.
1 Introduction

Ritter et al. proposed the concept of morphological associative memories (MAM) and the concept of morphological auto-associative memories (auto-MAM) [7], which constitute a class of networks not previously discussed in detail. MAM is based on the algebraic lattice structure (R, ∧, ∨, +) and morphological operations. MAM behaves more like human associative memory than traditional semilinear models such as the Hopfield net. Once a pattern has been memorized, recall is instantaneous when the MAM is presented with the pattern. In the absence of noise, an auto-MAM will always provide perfect recall for any number of patterns programmed into its memory. The auto-MAM M_XX is extremely robust in recalling patterns that are distorted by dilative changes, while the auto-MAM W_XX is extremely robust in recalling patterns that are distorted by erosive changes.

In 2003, Wang and Chen presented the model of fuzzy morphological associative memories (FMAM). Originating from the basic ideas of MAM, FMAM uses the two basic morphological operations (∧, ·) and (∨, ·) instead of the fuzzy operations (∧, ∨) used in fuzzy associative memory [13]. FMAM solves the fuzzy-rule memory problem of MAM. Under certain conditions, FMAM can be viewed as a new encoding of fuzzy associative memory, so that it can embody fuzzy operators and the concepts of fuzzy membership values and fuzzy rules. Both auto-FMAM and auto-MAM have the same attractive advantages, such as unlimited storage capacity, one-shot recall speed and good noise tolerance to either erosive or dilative noise. However, they suffer from extreme vulnerability to noise mixing erosion and dilation, resulting in great degradation of recall performance. To overcome this shortcoming, in 2005 Wang and Chen further presented an enhanced FMAM (EFMAM) based on the empirical kernel map [14].

Although MAM, FMAM and EFMAM are different and easily distinguishable from each other, we think that they have the same theoretical base, i.e. the same morphological base; therefore they can be unified. This paper tries to establish a unified theoretical framework of MAM, FMAM and EFMAM. The more a concept is abstracted, the more deeply it is understood, and it thus becomes possible to obtain new methods and theorems. This is why we research and propose the unified theoretical framework of MAM, FMAM and EFMAM.
2 Unified Computational Base of MAM, FMAM and EFMAM

Traditional artificial neural network models are specified by the network topology, node characteristics, and training or learning rules. The underlying algebraic system used in these models is the set of real numbers R together with the operations of addition and multiplication and the laws governing these operations. This algebraic system, known as a ring, is commonly denoted by (R, +, ×). The basic computations occurring in the morphological network proposed by Ritter et al. are based on the algebraic lattice structure (R, ∧, ∨, +), where the symbols ∧ and ∨ denote the binary operations of minimum and maximum, respectively, while the basic computations used in FMAM and EFMAM are based on the algebraic lattice structure (R⁺, ∧, ∨, ·), where R⁺ = (0, ∞).
In unified morphological associative memories (UMAM), the basic computations are based on the algebraic lattice structure (U, ∧, ∨, O), where U = R or U = R⁺, and O = +, −, ·, or /. If U = R and O = +, then (U, ∧, ∨, O) = (R, ∧, ∨, +), which is the computational base of MAM; if U = R⁺ and O = ·, then (U, ∧, ∨, O) = (R⁺, ∧, ∨, ·), which is the computational base of FMAM and EFMAM. Of course, the symbol O can also stand for other appropriate operators.
3 Unified Morphological-Norm Operators

3.1 Operators in MAM, FMAM and EFMAM

As described in the preceding section, morphological associative memories are based on the lattice algebra structure (R, ∧, ∨, +). Suppose we are given a vector pair x = (x_1, …, x_n)′ ∈ Rⁿ and y = (y_1, …, y_m)′ ∈ Rᵐ. An associative morphological memory that will recall the vector y when presented with the vector x is given by

W = y ⊞ (−x)′,  (1)

with entries

w_ij = y_i − x_j,  i = 1, …, m, j = 1, …, n.  (2)

W is called the max product of y and x. The min product of y and x can be denoted analogously, using the operator ⊟ in place of ⊞ in (1) and (2). Similarly, let (x¹, y¹), …, (x^k, y^k) be k vector pairs with x^ξ = (x_1^ξ, …, x_n^ξ)′ ∈ Rⁿ and y^ξ = (y_1^ξ, …, y_m^ξ)′ ∈ Rᵐ for ξ = 1, …, k. For a given set of pattern associations {(x^ξ, y^ξ): ξ = 1, …, k} we define a pair of associated pattern matrices (X, Y), where X = (x¹, …, x^k) and Y = (y¹, …, y^k). Thus, X is of dimension n×k with (i, j)th entry x_i^j, and Y is of dimension m×k with (i, j)th entry y_i^j. With each pair of matrices (X, Y), two natural morphological m×n memories W_XY and M_XY are defined by

W_XY = ∧_{ξ=1}^k [y^ξ ⊟ (−x^ξ)′]  and  M_XY = ∨_{ξ=1}^k [y^ξ ⊞ (−x^ξ)′].  (3)

Obviously, y^ξ ⊟ (−x^ξ)′ = y^ξ ⊞ (−x^ξ)′. It therefore follows from this definition that

W_XY ≤ y^ξ ⊟ (−x^ξ)′ = y^ξ ⊞ (−x^ξ)′ ≤ M_XY,  ∀ξ = 1, …, k.  (4)

In view of equations (2) and (3), this last set of inequalities implies that

W_XY ⊞ x^ξ ≤ [y^ξ ⊟ (−x^ξ)′] ⊞ x^ξ = y^ξ = [y^ξ ⊞ (−x^ξ)′] ⊟ x^ξ ≤ M_XY ⊟ x^ξ,  ∀ξ = 1, …, k,  (5)

or, equivalently, that

W_XY ⊞ X ≤ Y ≤ M_XY ⊟ X.  (6)
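To make the memories W_XY and M_XY and the morphological recall above concrete, here is a minimal NumPy sketch (ours, not code from the paper; the pattern matrices are arbitrary illustrative values):

```python
import numpy as np

def max_plus(A, B):
    # morphological max product: c_ij = max_k (a_ik + b_kj)
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

def min_plus(A, B):
    # morphological min product: c_ij = min_k (a_ik + b_kj)
    return np.min(A[:, :, None] + B[None, :, :], axis=1)

# k = 2 pattern pairs stored as columns of X (n x k) and Y (m x k)
X = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, 0.0]])   # n = 3
Y = np.array([[2.0, 1.0], [1.0, 3.0]])               # m = 2

# W_XY[i, j] = min over patterns of (y_i - x_j); M_XY uses max instead
W = np.min(Y[:, None, :] - X[None, :, :], axis=2)
M = np.max(Y[:, None, :] - X[None, :, :], axis=2)

# the sandwich W ⊞ X <= Y <= M ⊟ X always holds; perfect recall of every
# hetero-associative pair is not guaranteed for arbitrary patterns
assert np.all(max_plus(W, X) <= Y)
assert np.all(min_plus(M, X) >= Y)
```

The broadcasting trick (adding an (m, p, 1) array to a (1, p, n) array and reducing over the middle axis) is just one convenient way to express the max/min products in NumPy.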
If W_XY ⊞ X = Y, then W_XY is called a ⊞-perfect memory for (X, Y); if M_XY ⊟ X = Y, then M_XY is called a ⊟-perfect memory for (X, Y).

The basic computations used in FMAM and EFMAM are based on the algebraic lattice structure (R⁺, ∧, ∨, ·), where R⁺ = (0, ∞). If the input vector x^l = (x_1^l, …, x_n^l)′ and the output vector y^l = (y_1^l, …, y_m^l)′ are defined in Rⁿ and Rᵐ, then by using some transformation, for example exp(x) and exp(y) (acting on each component of x and y), the input and output vectors can be transformed into (R⁺)ⁿ and (R⁺)ᵐ, respectively. Set X = (x¹, …, x^k) and Y = (y¹, …, y^k). With each pair of matrices (X, Y), two new morphological m×n memories A_XY and B_XY are defined as follows:

A_XY = ∧_{l=1}^k [y^l ⊟∘ ((x^l)⁻¹)′]  and  B_XY = ∨_{l=1}^k [y^l ⊞∘ ((x^l)⁻¹)′],

where ⊟∘ and ⊞∘ denote the fuzzy composite operations (∧, ·) and (∨, ·) often used in fuzzy set theory, respectively, and (x^l)⁻¹ is the componentwise reciprocal of x^l. In FMAM and EFMAM, the recall is given by

A_XY ⊞∘ x^l = (∧_{l=1}^k [y^l ⊟∘ ((x^l)⁻¹)′]) ⊞∘ x^l  and  B_XY ⊟∘ x^l = (∨_{l=1}^k [y^l ⊞∘ ((x^l)⁻¹)′]) ⊟∘ x^l.  (10)
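An analogous sketch for the FMAM composite operations (again ours, not the authors' code; small positive toy patterns, with a_ij = ∧_l (y_i^l / x_j^l) as the min-memory entries):

```python
import numpy as np

# k = 2 positive pattern pairs as columns (entries in R+ = (0, inf))
X = np.array([[1.0, 2.0], [4.0, 1.0]])      # n = 2
Y = np.array([[2.0, 1.0], [8.0, 2.0]])      # m = 2

# FMAM memories: A_XY[i, j] = min_l (y_i^l / x_j^l); B_XY uses max instead
A = np.min(Y[:, None, :] / X[None, :, :], axis=2)
B = np.max(Y[:, None, :] / X[None, :, :], axis=2)

def max_times(W, x):
    # fuzzy composite recall (∨, ·): out_i = max_j (w_ij * x_j)
    return np.max(W * x[None, :], axis=1)

def min_times(W, x):
    # fuzzy composite recall (∧, ·): out_i = min_j (w_ij * x_j)
    return np.min(W * x[None, :], axis=1)

for l in range(X.shape[1]):
    # for this small positive pattern set, both memories happen to
    # recall every stored pair exactly
    assert np.allclose(max_times(A, X[:, l]), Y[:, l])
    assert np.allclose(min_times(B, X[:, l]), Y[:, l])
```

Note the structural symmetry with the MAM case: division plays the role of subtraction, and multiplication the role of addition.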
Analyzing MAM, FMAM and EFMAM, we can easily see that reversible operators exist in memory and recall. For MAM, the reversible operators in memory and recall are − and +, respectively; for FMAM and EFMAM, they are / and ×, respectively. We unify them with the following definitions.
3.2 Unified Morphological-Norm Operators

Definition 1. For an m×p matrix A and a p×n matrix B with entries from U, the matrix product C = A ⊞_O B, also called the morphological max product norm of A and B, is defined by

c_ij = ∨_{k=1}^p (a_ik O b_kj) = (a_i1 O b_1j) ∨ (a_i2 O b_2j) ∨ … ∨ (a_ip O b_pj).  (11)

Here ⊞_O is a unified morphological operator, which represents one of ⊞_+, ⊞_−, ⊞_×, and ⊞_/. The symbol O represents a reversible operation, such as +, −, ×, or /.
Definition 2. For an m×p matrix A and a p×n matrix B with entries from U, the matrix product C = A ⊟_O B, also called the morphological min product norm of A and B, is defined by

c_ij = ∧_{k=1}^p (a_ik O b_kj) = (a_i1 O b_1j) ∧ (a_i2 O b_2j) ∧ … ∧ (a_ip O b_pj).  (12)

Here ⊟_O is a unified morphological operator, which represents one of ⊟_+, ⊟_−, ⊟_×, and ⊟_/. The symbol O represents a reversible operation, such as +, −, ×, or /.
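Definitions 1 and 2 differ only in the reducing operation (∨ vs. ∧) and the reversible operation O, so a single routine can cover all eight unified operators. This is a sketch of ours, not an implementation from the paper:

```python
import numpy as np
import operator

# the four reversible operations O of the unified framework
OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul, '/': operator.truediv}

def morph_product(A, B, O='+', mode='max'):
    """Unified morphological product norm:
    c_ij = max_k (a_ik O b_kj) for mode='max'  (Definition 1)
    c_ij = min_k (a_ik O b_kj) for mode='min'  (Definition 2)."""
    op = OPS[O]
    C = op(A[:, :, None], B[None, :, :])          # shape (m, p, n)
    return C.max(axis=1) if mode == 'max' else C.min(axis=1)

A = np.array([[1.0, 2.0]])        # 1 x 2
B = np.array([[3.0], [0.5]])      # 2 x 1
print(morph_product(A, B, '+', 'max'))   # max(1+3, 2+0.5) = [[4.0]]
print(morph_product(A, B, '*', 'min'))   # min(1*3, 2*0.5) = [[1.0]]
```

Choosing `O` and `mode` recovers ⊞_+, ⊟_+ (MAM), ⊞_×, ⊟_× (FMAM/EFMAM), and the remaining operators.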
Definition 3. For an m×p matrix A and a p×n matrix B with entries from U and the max product C = A ⊞_+ B, the morphological operator ⊞_+ is defined by

c_ij = ∨_{k=1}^p (a_ik + b_kj).  (13)

Definition 4. Analogously, for the min product C = A ⊟_+ B, the morphological operator ⊟_+ is defined by

c_ij = ∧_{k=1}^p (a_ik + b_kj).  (14)

Similarly, we can define the morphological operators ⊞_−, ⊞_×, and ⊞_/, as well as ⊟_−, ⊟_×, and ⊟_/.
Definition 5. Let (x¹, y¹), …, (x^k, y^k) be k vector pairs with x^ξ = (x_1^ξ, …, x_n^ξ)′ ∈ Rⁿ and y^ξ = (y_1^ξ, …, y_m^ξ)′ ∈ Rᵐ for ξ = 1, …, k. For a given set of pattern associations {(x^ξ, y^ξ): ξ = 1, …, k} and a pair of associated pattern matrices (X, Y), where X = (x¹, …, x^k) and Y = (y¹, …, y^k), the morphological min-product memory W_XY and max-product memory M_XY are defined by

W_XY = ∧_{ξ=1}^k [y^ξ ⊟_Θ (x^ξ)′]  (15)

and

M_XY = ∨_{ξ=1}^k [y^ξ ⊞_Θ (x^ξ)′].  (16)

Since y^ξ ⊟_Θ (x^ξ)′ = y^ξ ⊞_Θ (x^ξ)′, it follows from this definition that

W_XY ≤ y^ξ ⊟_Θ (x^ξ)′ = y^ξ ⊞_Θ (x^ξ)′ ≤ M_XY,  ∀ξ = 1, …, k.  (17)

Here ⊟_Θ represents the reverse of ⊟_O, and ⊞_Θ represents the reverse of ⊞_O; that is, O and Θ are inverse to each other: if O = + or ×, then Θ = − or ÷; conversely, if O = − or ÷, then Θ = + or ×. Then W_XY and M_XY satisfy
W_XY ⊞_O x^ξ ≤ [y^ξ ⊟_Θ (x^ξ)′] ⊞_O x^ξ = y^ξ = [y^ξ ⊞_Θ (x^ξ)′] ⊟_O x^ξ ≤ M_XY ⊟_O x^ξ,  ∀ξ = 1, …, k,  (18)

or, equivalently,

W_XY ⊞_O X ≤ Y ≤ M_XY ⊟_O X.  (19)
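The sandwich inequalities (18)–(19) can be checked numerically for both instantiations. In this sketch (ours, under our reading that the memories are built with the reverse operation Θ while recall uses O: Θ = −, O = + for MAM; Θ = /, O = × for FMAM):

```python
import numpy as np

def build_W(X, Y, theta):
    # W_XY[i, j] = min over patterns of (y_i Θ x_j); theta is the reverse op
    return theta(Y[:, None, :], X[None, :, :]).min(axis=2)

def build_M(X, Y, theta):
    return theta(Y[:, None, :], X[None, :, :]).max(axis=2)

def recall_max(W, X, O):
    # (W ⊞_O X)_ij = max_k (w_ik O x_kj)
    return O(W[:, :, None], X[None, :, :]).max(axis=1)

def recall_min(M, X, O):
    return O(M[:, :, None], X[None, :, :]).min(axis=1)

X = np.array([[1.0, 2.0], [0.0, 1.0]])
Y = np.array([[2.0, 1.0], [1.0, 3.0]])

# MAM: Theta = subtraction, O = addition
W = build_W(X, Y, np.subtract); M = build_M(X, Y, np.subtract)
assert np.all(recall_max(W, X, np.add) <= Y) and np.all(Y <= recall_min(M, X, np.add))

# FMAM: Theta = division, O = multiplication (shift patterns to be positive)
Xp, Yp = X + 1.0, Y + 1.0
Wf = build_W(Xp, Yp, np.divide); Mf = build_M(Xp, Yp, np.divide)
assert np.all(recall_max(Wf, Xp, np.multiply) <= Yp) and np.all(Yp <= recall_min(Mf, Xp, np.multiply))
```

Passing `np.subtract`/`np.add` or `np.divide`/`np.multiply` as the (Θ, O) pair is exactly the unification the section describes: the same code path covers MAM and FMAM.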
Definition 6. A matrix A = (a_ij)_{m×n} is said to be a ⊞_O-perfect memory for (X, Y) if and only if A ⊞_O X = Y. The matrix A = (a_ij)_{m×n} is said to be a ⊟_O-perfect memory for (X, Y) if and only if A ⊟_O X = Y.

In fact, in the existing MAM there are only two memories, W_XY and M_XY, defined by using the operators ⊟_− and ⊞_−, respectively. The same is true of the existing FMAM and EFMAM, whose two memories W_XY and M_XY are defined by using the operators ⊟_/ and ⊞_/, respectively. But according to Definitions 1 to 6, there are four memories in each of MAM, FMAM and EFMAM. The two additional memories are defined by using the operators ⊟_+ and ⊞_+ (for MAM), and ⊟_× and ⊞_× (for FMAM or EFMAM), respectively. That is to say, there are more methods in the unified framework than in MAM, FMAM and EFMAM.
4 Unified Morphological Theorems

Ritter gave 7 theorems with respect to MAM in [7]. Wang et al. also proved 6 theorems with respect to FMAM in [13] and 4 theorems with respect to EFMAM in [14]. Our research shows that these theorems can be unified. We give two of them and their proofs as examples.

Theorem 1: If A is a ⊞_O-perfect memory for (X, Y) and B is a ⊟_O-perfect memory for (X, Y), then

A ≤ W_XY ≤ M_XY ≤ B  and  W_XY ⊞_O X = Y = M_XY ⊟_O X.

Proof of Theorem 1: If A is a ⊞_O-perfect memory for (X, Y), then (A ⊞_O x^ξ)_i = y_i^ξ for all ξ = 1, …, k and all i = 1, …, m. Equivalently,

∨_{j=1}^n (a_ij O x_j^ξ) = y_i^ξ,  ∀ξ = 1, …, k and ∀i = 1, …, m.

For MAM, U = R, O = + or −, Θ = − or +; it follows that for an arbitrary index j ∈ {1, …, n} we have

a_ij O x_j^ξ ≤ y_i^ξ, ∀ξ = 1, …, k  ⇔  a_ij ≤ y_i^ξ Θ x_j^ξ, ∀ξ = 1, …, k  ⇔  a_ij ≤ ∧_{ξ=1}^k (y_i^ξ Θ x_j^ξ) = w_ij.  (20)

For FMAM and EFMAM, U = R⁺, O = × or /, Θ = / or ×, and the set of inequalities (20) can also be derived.
This shows that A ≤ W_XY. In view of (19), we now have Y = A ⊞_O X ≤ W_XY ⊞_O X ≤ Y, and therefore W_XY ⊞_O X = Y. A similar argument shows that if B is a ⊟_O-perfect memory for (X, Y), then M_XY ≤ B and M_XY ⊟_O X = Y. Consequently, we have A ≤ W_XY ≤ M_XY ≤ B and W_XY ⊞_O X = Y = M_XY ⊟_O X. □
Theorem 2: W_XY is a ⊞_O-perfect memory for (X, Y) if and only if for each ξ = 1, …, k, each row of the matrix [y^ξ ⊞_Θ (x^ξ)′] − W_XY contains a zero entry. Similarly, M_XY is a ⊟_O-perfect memory for (X, Y) if and only if for each ξ = 1, …, k, each row of the matrix M_XY − [y^ξ ⊞_Θ (x^ξ)′] contains a zero entry.
Proof of Theorem 2: We only prove the theorem in one domain for the memory $W_{XY}$; the proof for the other memory can be derived in an analogous fashion.

$W_{XY}$ is a $\overset{\circ}{\vee}$-perfect memory for $(X, Y)$
$\iff (W_{XY} \overset{\circ}{\vee} x^{\xi})_i = y_i^{\xi}$
$\iff y_i^{\xi} - (W_{XY} \overset{\circ}{\vee} x^{\xi})_i = 0$
$\iff y_i^{\xi} - \bigvee_{j=1}^{n} (w_{ij} \circ x_j^{\xi}) = 0$
$\iff \bigwedge_{j=1}^{n} \big( y_i^{\xi} - (w_{ij} \circ x_j^{\xi}) \big) = 0$
$\iff \bigwedge_{j=1}^{n} \big( y_i^{\xi} \Theta x_j^{\xi} - w_{ij} \big) = 0$
$\iff \bigwedge_{j=1}^{n} \big( [y^{\xi} \overset{\Theta}{\vee} (x^{\xi})'] - W_{XY} \big)_{ij} = 0$,

each equivalence holding $\forall \xi = 1,\dots,k$ and $\forall i = 1,\dots,m$. This last set of equations is true if and only if, for each $\xi = 1,\dots,k$ and each integer $i = 1,\dots,m$, the $i$th row of $[y^{\xi} \overset{\Theta}{\vee} (x^{\xi})'] - W_{XY}$ contains at least one zero entry. ∎
We need to note that the conditions under which the equation set given above holds differ between MAM and FMAM or EFMAM: for MAM it holds in $U = \mathbb{R}$; for FMAM or EFMAM it holds in $U = \mathbb{R}^{+}$.
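For the MAM instance of the framework ($U = \mathbb{R}$, $\circ = +$, $\Theta = -$), the memory constructions and recall products used in Theorems 1 and 2 can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code; the perfect-recall check is done for the autoassociative case $X = Y$, where it is guaranteed for uncorrupted patterns.

```python
import numpy as np

def mam_memories(X, Y):
    """W_XY and M_XY for the additive (MAM) case:
    w_ij = min over patterns xi of (y_i - x_j), m_ij = max over patterns."""
    cross = Y[:, None, :] - X[None, :, :]        # shape (m, n, k): y_i Theta x_j per pattern
    return cross.min(axis=2), cross.max(axis=2)

def max_product(A, x):
    """Recall with the max product: (A v x)_i = max_j (a_ij + x_j)."""
    return (A + x[None, :]).max(axis=1)

def min_product(A, x):
    """Recall with the min product: (A ^ x)_i = min_j (a_ij + x_j)."""
    return (A + x[None, :]).min(axis=1)
```

In this setting `W <= M` holds entrywise, and both memories recall every stored pattern exactly, consistent with Theorem 1.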
5 Discussions

What are the advantages of the unified framework of morphological associative memories? We see at least three benefits.

Firstly, the unified theoretical framework is beneficial to understanding MAM, FMAM and EFMAM. This paper analyzes the common properties of MAM, FMAM and EFMAM, and establishes the theoretical framework of the unified morphological associative memory (UMAM) by extracting these common properties. The more a thing is abstracted, the more deeply it is understood. The UMAM is therefore of great benefit in research and applications involving MAM, FMAM and EFMAM.

Secondly, the UMAM can help us find new methods. In fact, the method of defining the morphological memory $W_{XY}$ or $M_{XY}$ in MAM, FMAM or EFMAM is not unique. For example, according to (15) and (16), $W_{XY}$ and $M_{XY}$ can also be defined by

$$W_{XY} = \bigwedge_{\xi=1}^{K} \left( y^{\xi} \overset{+}{\wedge} (x^{\xi})' \right) \quad\text{or}\quad W_{XY} = \bigwedge_{\xi=1}^{K} \left( y^{\xi} \overset{\Phi}{\wedge} (x^{\xi})' \right) \tag{21}$$

and

$$M_{XY} = \bigvee_{\xi=1}^{K} \left( y^{\xi} \overset{+}{\vee} (x^{\xi})' \right) \quad\text{or}\quad M_{XY} = \bigvee_{\xi=1}^{K} \left( y^{\xi} \overset{\Phi}{\vee} (x^{\xi})' \right) \tag{22}$$

Consequently, there are more methods for defining the memories $W_{XY}$ and $M_{XY}$ in the UMAM.

Finally, the methods in the UMAM are complementary rather than competitive. For this reason, it is frequently advantageous to use these methods in combination rather than exclusively.
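To make the "more methods" point concrete, the same construction can be specialized to the multiplicative domain $U = \mathbb{R}^{+}$, $\circ = \times$, $\Theta = /$ used by FMAM and EFMAM. The sketch below only mimics the algebraic shape of those definitions with plain real arithmetic on strictly positive data; the actual FMAM/EFMAM operators in [13],[14] are fuzzy and differ in detail.

```python
import numpy as np

def multiplicative_memories(X, Y):
    """Multiplicative analogue of the memories: w_ij = min over patterns of
    (y_i / x_j), m_ij = max over patterns. Assumes strictly positive entries."""
    cross = Y[:, None, :] / X[None, :, :]
    return cross.min(axis=2), cross.max(axis=2)

def max_times(A, x):
    """Multiplicative max product: (A v x)_i = max_j (a_ij * x_j)."""
    return (A * x[None, :]).max(axis=1)
```

As in the additive case, autoassociative recall of uncorrupted positive patterns is exact, since the diagonal entries of the min-memory equal 1.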
The three experiments given above show that the methods in the UMAM are complementary, and therefore the UMAM can solve more associative memory problems, especially for hetero-MAM, hetero-FMAM and hetero-EFMAM.
7 Conclusions

This paper introduced a new unified theoretical framework of neural-network computing based on lattice algebra. The main emphasis was on the unification of morphological associative memories, fuzzy morphological associative memories, and enhanced fuzzy morphological associative memories. Our research and experiments showed that MAM, FMAM and EFMAM can be unified within the same theoretical framework. The significance of the unified framework is twofold: on the one hand, we get a better and deeper understanding of MAM, FMAM and EFMAM from the unified framework UMAM; on the other hand, we obtain some new methods from it. Therefore the UMAM can solve more associative memory problems than MAM, FMAM and EFMAM do. The lattice-algebraic approach to neural-network theory is new, and a multitude of open questions await exploration: for example, new methods of morphological associative memory need further investigation, and the application base of the unified framework needs expanding. It is our hope that these problems will be better solved in the future.
Acknowledgments

This research is supported by the Science Fund of Henan Province, China (0511012300) and the key project of the Information and Industry Department of Chongqing City, China (200311014).
References

1. Raducanu, B., Grana, M., Albizuri, F. X.: Morphological Scale Spaces and Associative Memories: Results on Robustness and Practical Applications. J. Math. Imaging Vis., vol. 19, no. 2 (2003) 113-131
2. Suarez-Araujo, C. P.: Novel Neural Network Models for Computing Homothetic Invariances: An Image Algebra Notation. J. Math. Imaging Vis., vol. 7, no. 1 (1997) 69-83
3. Huang, D. S.: Systematic Theory of Neural Networks for Pattern Recognition (in Chinese). Publishing House of Electronic Industry of China, May (1996)
4. Huang, D. S.: On the Modular Associative Neural Network Classifiers. The 5th National United Conf. on Computer and Application, Beijing, Vol. 3, Dec. (1999) 7.285-7.290
5. Ritter, G. X., Urcid, G.: Lattice Algebra Approach to Single-Neuron Computation. IEEE Transactions on Neural Networks, Vol. 14, No. 2 (2003) 282-295
6. Ritter, G. X., Sussner, P.: An Introduction to Morphological Neural Networks. In: Proc. 13th Int. Conf. Pattern Recognition, Vienna, Austria (1996) 709-717
7. Ritter, G. X., Sussner, P., Diaz-de-Leon, J. L.: Morphological Associative Memories. IEEE Transactions on Neural Networks, Vol. 9, No. 2 (1998) 281-293
8. Ritter, G. X.: Recent Developments in Image Algebra. In: Hawkes, P. (ed.): Advances in Electronics and Electron Physics, vol. 80. Academic, New York (1991) 243-380
9. Davidson, J. L., Hummer, F.: Morphology Neural Networks: An Introduction with Applications. IEEE System Signal Processing, vol. 12, no. 2 (1993) 177-210
10. Davidson, J. L., Ritter, G. X.: A Theory of Morphological Neural Networks. In: Digital Optical Computing, vol. 1215 of Proc. SPIE, July (1990) 378-388
11. Davidson, J. L., Strivastava, R.: Fuzzy Image Algebra Neural Network for Template Identification. In: 2nd Annu. Midwest Electrotechnol. Conf., Ames, IA, Apr. (1993) 68-71
12. Pessoa, L. F. C., Maragos, P.: Neural Networks with Hybrid Morphological/Rank/Linear Nodes: A Unifying Framework with Applications to Handwritten Character Recognition. Pattern Recognition, vol. 33, Jun. (2000) 945-960
13. Wang, M., Wang, S. T., Wu, X. J.: Initial Results on Fuzzy Morphological Associative Memories. Acta Electronica Sinica (in Chinese), vol. 31, May (2003) 690-693
14. Wang, M., Chen, S. C.: Enhanced FMAM Based on Empirical Kernel Map. IEEE Transactions on Neural Networks, vol. 16, no. 3 (2005) 557-564
15. Gader, P. D., Khabou, M. A., Koldobsky, A.: Morphological Regularization Neural Networks. Pattern Recognition, vol. 33, Jun. (2000) 935-944
16. Sussner, P.: Generalizing Operations of Binary Morphological Associative Memories Using Fuzzy Set Theory. J. Math. Imaging Vis., vol. 19, no. 2 (2003) 81-93
A New Speech Denoising Method Based on WPD-ICA Feature Extraction Qinghua Huang, Jie Yang, and Yue Zhou Institute of Image Processing & Pattern Recognition, Shanghai Jiaotong University, Shanghai, China, 200240 {qinghua, jieyang, zhouyue}@sjtu.edu.cn
Abstract. Independent Component Analysis (ICA) feature extraction is an efficient sparse coding method for noise suppression. However, a single-channel signal cannot be applied directly to ICA feature extraction. In this paper, we propose a new method that uses wavelet packet decomposition (WPD) as preprocessing for single-channel data. The wavelet packet coefficients (WPCs) provide multi-channel data used as input to learn the ICA basis vectors. We then project the input data onto the basis vectors to get sparser, independent coefficients, and apply an appropriate nonlinear shrinkage function to the components of the sparse coefficients so as to reduce the noise. The proposed approach is very efficient at recovering signals from noisy data, because not only are the projection coefficients sparser (being based on WPCs), but both the features and the shrinkage function are estimated directly from the observed data. The experimental results show that it has excellent performance on signal-to-noise ratio (SNR) enhancement compared with other filtering methods.
1 Introduction

Data decomposition and representation are widely used in signal processing. One of the simplest methods is to use a linear transformation of the observed data. Given an observation (often called sensor or data) matrix $X \in \mathbb{R}^{m \times N}$, perform the linear transformation

$$X = A S + \eta. \tag{1}$$
independence and sparsity are consistent, since sparsity is equivalent to super-Gaussianity (leptokurtosis). The coefficients of ICA basis vectors usually have a sparse distribution, resulting in statistically efficient codes. Therefore ICA as a feature extraction method has been widely used to extract efficient speech features and reduce noise [2,3]. However, ICA requires signals from at least two separate sensors. For single-channel data, many ICA feature extraction methods directly segment the data into a data matrix [4,5] and learn the ICA feature basis using a noise-free training signal as prior knowledge [6]. Direct segmentation is based on the assumption that the signal is stationary within a short time, but this assumption is only approximate; and a noise-free signal cannot be obtained in many practical applications. To overcome these limitations, we propose to use WPD to pre-process the observed data from a single sensor and then use the WPCs at different frequencies as input to learn the ICA basis vectors. Furthermore, we exploit the sparsity of the projection coefficients to reduce noise. Our method has the advantage over other filtering methods that both the speech features and the nonlinear shrinkage function are estimated directly from the signals. Based on the sparsity of the WPCs, the projection coefficients onto the ICA basis vectors are sparser. Experimental results show that the presented method reduces additive Gaussian noise more efficiently than other denoising methods.

The paper is organized as follows. In Section 2, the WPD preprocessing and the ICA feature extraction are described. In Section 3 the detailed speech denoising procedure is introduced. Experiments and conclusions are presented in Section 4 and Section 5, respectively.
2 WPD-ICA Feature Extraction

2.1 Wavelet Packet Decomposition

Standard ICA requires at least as many sensor signals as sources. In many applications, only a single-sensor signal is available. We develop a new method to decompose the data
Fig. 1. Wavelet packet decomposition: the root S(0,0) splits into approximation A(1,0) and detail D(1,1), which split in turn into AA(2,0), AD(2,1), DA(2,2) and DD(2,3)
from a single sensor and then use the coefficients at different frequencies as the input matrix to ICA. The projection of a signal onto the wavelet packet basis functions is called wavelet packet decomposition. WPD offers strong frequency resolution together with high time resolution, and is a full binary-tree decomposition [7,8,9] (Fig. 1). WPD is used as a preprocessor to decompose the signal into a set of narrow-band signals. If a signal $X$ is analyzed by an $n$-level full binary-tree decomposition, we can get the wavelet packet coefficient matrix (WPCM) $X_p$ using the following transformation
$$X_p = W_p \cdot X. \tag{2}$$
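In code, an $n$-level full binary-tree decomposition producing this coefficient matrix can be sketched as follows. The Haar filter pair is used only to keep the sketch self-contained; the experiments in Section 4 use a Daubechies-8 wavelet, which would normally come from a wavelet library.

```python
import numpy as np

def wpd_matrix(x, levels):
    """Full binary-tree (wavelet packet) decomposition with Haar filters.
    Returns a (2**levels) x (len(x) // 2**levels) matrix whose row i holds
    the coefficients C(i, .) of node i at the deepest level."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        nxt = []
        for c in nodes:
            nxt.append((c[0::2] + c[1::2]) / np.sqrt(2.0))   # low-pass half-band
            nxt.append((c[0::2] - c[1::2]) / np.sqrt(2.0))   # high-pass half-band
        nodes = nxt
    return np.vstack(nodes)
```

Because the Haar pair is orthonormal, the transform preserves signal energy, matching the orthogonality assumption on $W_p$ made in Section 2.2.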
The wavelet packet coefficient matrix is

$$X_p = \begin{bmatrix} C(0,1) & C(0,2) & \cdots & C(0,N_1) \\ C(1,1) & C(1,2) & \cdots & C(1,N_1) \\ \vdots & \vdots & & \vdots \\ C(m,1) & C(m,2) & \cdots & C(m,N_1) \end{bmatrix},$$

where $C(i,j)$ is the $j$th WPC of the $i$th node in the $n$-level decomposition, $m = 0, 1, \dots, 2^n - 1$ is the node index at level $n$, and $N_1 = N / 2^n$ ($N$ is the number of samples) is the length of each node's coefficient sequence. We use the matrix $X_p$ to extract the ICA features. WPD is applied to decompose the signal together with the noise. One of the primary properties of WPD is sparsity: small WPCs are dominated by noise, while coefficients with a large absolute value carry more signal information than noise. The ICA feature vectors are learned from the WPCs, so the projection coefficients with a large absolute value also carry more signal information than noise. Therefore the choice of wavelet and the level of decomposition play a critical role in this section and the following analysis; the concrete choice depends on the problem at hand.

2.2 ICA Feature Extraction
ICA as a feature extraction method can be written as

$$X = A S = \sum_{i=1}^{n} a_i s_i, \tag{3}$$

where $a_i$ (each column of $A$) is a basis vector, and the columns of $A$ span a feature space in which all projection coefficients $s_1, s_2, \dots, s_n$ are mutually independent. The idea of WPD-ICA is based on the following proposition [2].

Proposition 1: A (component-wise) linear operator $T$ leaves the property of linear independence unaffected.

Therefore we can use the ICA algorithm on the wavelet packet coefficient space to extract features.
We apply the WPD defined in Eq. (2) to both sides of Eq. (1) to get

$$X_p = W_p X, \qquad S_p = W_p S, \qquad \eta_p = W_p \eta, \qquad X_p = A S_p + \eta_p = A (S_p + \tilde{\eta}_p) = A \tilde{S}_p, \tag{4}$$

where $X_p$, $S_p$, $\eta_p$ are the WPCMs of the signal, the projection coefficients and the noise respectively, and $\tilde{S}_p = S_p + \tilde{\eta}_p$, $\tilde{\eta}_p = A^{-1} \eta_p$. The covariance matrix of the noise in the wavelet packet domain equals $C_{\eta_p} = E[\eta_p \eta_p^T] = E[W_p \eta \eta^T W_p^T] = W_p E[\eta \eta^T] W_p^T$. If $W_p$ is orthogonal and $C_{\eta} = \sigma^2 I$, then $C_{\eta_p} = \sigma^2 I$; in the same way, if $A$ is orthogonal, $C_{\tilde{\eta}_p} = \sigma^2 I$. This means that orthogonal transformations leave the Gaussian noise structure intact, which makes the problem more tractable. The sparsity of $X_p$ means that the distribution of $X_p$ is more highly peaked and heavier-tailed than the Gaussian distribution. This property gives us an advantage in using ICA feature extraction: by the central limit theorem, it enforces super-Gaussian distributions on the coefficients of the ICA basis $\tilde{S}_p$, and since the inputs of the ICA are already described by highly peaked distributions, the basis coefficients will be described by even more highly peaked ones. So we can learn a better basis representation for the signal. The ICA feature extraction algorithm is performed to obtain the estimate of the projection coefficient matrix $\tilde{S}_p$ and the basis matrix $A$ via the unmixing matrix $W$ in the following equation
$$Y_p = W X_p, \tag{5}$$
where $Y_p$ is the estimate of $\tilde{S}_p$. The ICA basis matrix can be calculated by the relation $A = W^{-1}$. By maximizing the log likelihood of the projection coefficients, both the independent coefficients and the basis matrix can be inferred at the same time. The learning rule is

$$\Delta W \propto \frac{\partial \log p(Y_p)}{\partial W} W^{T} W = \eta \left[ I - \varphi(Y_p) Y_p^{T} \right] W, \tag{6}$$

where the update uses the natural gradient method, which speeds up convergence considerably [2]. Here $\varphi(y)$ is the score function, defined as

$$\varphi(y) = -\frac{p'(y)}{p(y)} = -\frac{\partial \log p(y)}{\partial y}.$$

In this paper, we use the generalized Gaussian distribution to estimate the p.d.f. of $y$, that is, $p(y) \propto \exp(-|y|^{q})$, where $q$ can be
learned from the data [10]. Combining this with the learning rule in Eq. (6), the unmixing matrix is iterated until convergence is achieved; the basis function matrix is then obtained.
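A minimal natural-gradient ICA loop implementing the update of Eq. (6) might look as follows. Here the score function is fixed to $\varphi(y) = \tanh(y)$, a common stand-in for super-Gaussian sources; the paper instead derives $\varphi$ from a generalized Gaussian density with exponent $q$ fitted to the data, so this is a simplified sketch.

```python
import numpy as np

def ica_natural_gradient(Xp, n_iter=1500, lr=0.05):
    """Natural-gradient ML ICA: W <- W + lr * (I - phi(Y) Y^T / N) W,
    with Y = W @ Xp and phi(y) = tanh(y) as a super-Gaussian score."""
    m, N = Xp.shape
    W = np.eye(m)
    for _ in range(n_iter):
        Y = W @ Xp
        W += lr * (np.eye(m) - np.tanh(Y) @ Y.T / N) @ W
    return W
```

On a toy mixture of sparse (Laplacian) sources, the product of the learned $W$ with the true mixing matrix approaches a scaled permutation, i.e. the sources are recovered up to order and scale.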
3 Speech Denoising

The speech feature basis and the sparse projection coefficients onto these basis vectors were acquired in Section 2.2. In the noisy environment, $Y_p$ denotes the noisy coefficients, $S_p$ the original noise-free coefficients, and $\tilde{\eta}_p$ the projection coefficients of the Gaussian noise. The relation between them can be described as

$$Y_p = S_p + \tilde{\eta}_p. \tag{7}$$

We want to estimate $S_p$ from the noisy coefficients $Y_p$, and can use the Maximum Likelihood (ML) estimation method. The ML estimate has the form $\hat{S}_p = h(Y_p)$, where the nonlinear function $h(\cdot)$ is called the shrinkage function and its inverse is given by $h^{-1}(S_p) = S_p + \sigma^2 \varphi(S_p)$, with $\sigma$ the standard deviation of the Gaussian noise [6,11]. In general, a model for the elimination of noise or other undesirable components from single-sensor data is depicted in the following steps and Fig. 2.
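For a concrete instance, assume a Laplacian prior $p(s) \propto \exp(-\sqrt{2}\,|s|/d)$ for the sparse coefficients, a common model in sparse code shrinkage [6]. Then $\varphi(s) = (\sqrt{2}/d)\,\mathrm{sign}(s)$, and inverting $h^{-1}(S_p) = S_p + \sigma^2 \varphi(S_p)$ yields a soft-thresholding rule. This is a hedged sketch of one possible shrinkage, not necessarily the paper's exact estimator.

```python
import numpy as np

def laplacian_shrink(y, sigma, d=1.0):
    """ML shrinkage for a Laplacian prior with scale d: soft thresholding
    at t = sqrt(2) * sigma**2 / d (sigma = noise standard deviation)."""
    t = np.sqrt(2.0) * sigma**2 / d
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)
```

Coefficients smaller in magnitude than the threshold are treated as noise and zeroed; larger coefficients are shrunk toward zero by the threshold.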
Fig. 2. Basic model for removing noise from single-sensor data: X → (WPD) → X_p → (ICA, W) → Y_p → (denoising) → Ŝ_p → (inverse ICA, A) → X̂_p → (inverse WPD) → X̂
(1) Choose an appropriate wavelet function and the best level of WPD, and use the WPCM of the observed noisy signal to learn the ICA basis vectors and the sparse projection coefficients.
(2) Apply the nonlinear shrinkage function to the noisy coefficients to get the estimated noise-free coefficients.
(3) Invert the ICA and the WPD to obtain the recovered signal from the noisy signal.
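Step (3) requires inverting the WPD; with an orthonormal filter pair the inverse simply merges sibling nodes back up the tree. A self-contained round-trip sketch, with Haar filters standing in for the Daubechies-8 wavelet used in the experiments:

```python
import numpy as np

def wpd(x, levels):
    """Forward full-tree Haar WPD, as sketched in Section 2.1."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        nodes = [h for c in nodes
                 for h in ((c[0::2] + c[1::2]) / np.sqrt(2.0),
                           (c[0::2] - c[1::2]) / np.sqrt(2.0))]
    return np.vstack(nodes)

def iwpd(Xp):
    """Inverse WPD: repeatedly merge adjacent (approximation, detail) rows."""
    nodes = [row for row in Xp]
    while len(nodes) > 1:
        merged = []
        for a, d in zip(nodes[0::2], nodes[1::2]):
            c = np.empty(2 * a.size)
            c[0::2] = (a + d) / np.sqrt(2.0)
            c[1::2] = (a - d) / np.sqrt(2.0)
            merged.append(c)
        nodes = merged
    return nodes[0]
```

The round trip `iwpd(wpd(x, n))` reproduces the input exactly, which is what makes the reconstruction in step (3) lossless apart from the shrinkage itself.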
4 Experiments

In our experiments, male and female speech signals with added Gaussian noise are used to test the performance of the proposed method. The sampling frequency is 8 kHz and 40000 samples of each signal are used. A signal with added white Gaussian noise is represented as

$$X = X_s + n_w, \qquad n_w \sim N(0, \sigma^2). \tag{8}$$
A noisy signal with colored Gaussian noise is described as

$$X = X_s + n_c, \tag{9}$$

where $n_c$ is colored Gaussian noise, modeled by a second-order autoregressive process AR(2).
Firstly, the Daubechies wavelet of order 8 is chosen as the wavelet function, and the speech signal is analyzed by WPD through six levels of decomposition. The wavelet packet coefficients are represented as

$$X_p = \begin{bmatrix} C(0,1) & C(0,2) & \cdots & C(0,625) \\ C(1,1) & C(1,2) & \cdots & C(1,625) \\ \vdots & \vdots & & \vdots \\ C(63,1) & C(63,2) & \cdots & C(63,625) \end{bmatrix}.$$
The unmixing matrix $W$ is initialized to the 64×64 identity matrix and the learning rate is gradually decreased during the iterations. $W$ is learned by the algorithm in Eq. (6) and is used as the filter to obtain the sparse coefficients. The estimated noise-free coefficients are obtained by denoising the sparse coefficients, and the enhanced signal is reconstructed from them. To judge the performance of the noise suppression, the signal-to-noise ratio is used:
$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_t \mathrm{signal}^2(t)}{\sum_t \mathrm{noise}^2(t)}. \tag{11}$$
As a measure of the signal approximation, the root mean squared error (RMSE), defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( S_i^{\mathrm{ideal}} - S_i^{\mathrm{reconstructed}} \right)^2}, \tag{12}$$
can be used. The RMSE is only an overall measure of the performance. In the first experiment, noisy male speech signals corrupted by four different intensities of additive white Gaussian noise are used to test the method. The SNRs of the input noisy signals are 0.1175, -6.2592, -9.1718 and -13.6174 dB respectively. We obtain high SNRs and satisfactory reconstructed signals: the output SNRs of the recovered male speech are 4.8094, 3.1309, 0.0918 and -0.8782 dB, and the RMSE results are 0.0398, 0.0433, 0.0504 and 0.0529, respectively. It can be seen that the SNR is much improved. Fig. 3 shows the denoising results for the noisy male speech with an input SNR of -13.6174 dB, compared with the filtering results of the median filter and the wavelet filter method. Table 1 gives the SNR and RMSE of the denoised signal under additive white Gaussian noise.
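The two quality measures of Eqs. (11) and (12) are straightforward to compute; a small sketch (the function names are ours):

```python
import numpy as np

def snr_db(signal, noise):
    """Eq. (11): SNR = 10 log10( sum signal^2 / sum noise^2 ), in dB."""
    signal, noise = np.asarray(signal, float), np.asarray(noise, float)
    return 10.0 * np.log10((signal**2).sum() / (noise**2).sum())

def rmse(ideal, reconstructed):
    """Eq. (12): root mean squared error between ideal and reconstruction."""
    diff = np.asarray(ideal, float) - np.asarray(reconstructed, float)
    return np.sqrt((diff**2).mean())
```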
Fig. 3. The denoising results of noisy male speech when the input SNR is -13.6174 dB: (a) the clean male speech signal; (b) the noisy signal with additive white Gaussian noise; (c) the denoising result of our method; (d) the denoising result of the wavelet filter; (e) the denoising result of the median filter with n = 5

Table 1. Denoising results of male speech with white Gaussian noise

| Input SNR (dB) | WPD-ICA SNR (dB) | WPD-ICA RMSE | Wavelet SNR (dB) | Wavelet RMSE | Median SNR (dB) | Median RMSE |
| -13.6174 | -0.8782 | 0.0529 | -2.2804 | 0.0568 | -8.0949 | 0.0759 |
| -9.1718 | 0.0918 | 0.0504 | -1.7821 | 0.0554 | -6.0853 | 0.0687 |
| -6.2592 | 3.1309 | 0.0433 | 1.2803 | 0.0475 | -4.8516 | 0.0646 |
| 0.1175 | 4.8094 | 0.0398 | 1.4406 | 0.0471 | -1.8285 | 0.0555 |
Female speech signals with four different intensities of additive colored Gaussian noise are used in another experiment. The SNRs of the input noisy signals are 4.8004, 0.4854, -5.0242 and -12.6541 dB respectively. Fig. 4 shows the results of the three methods in suppressing the additive colored Gaussian noise. The SNR and RMSE of the denoised female speech can be seen in Table 2.
Fig. 4. The denoising results of noisy female speech when the input SNR is -3.2423 dB: (a) the clean female speech signal; (b) the noisy signal with additive colored Gaussian noise; (c) the denoising result of our method; (d) the denoising result of the wavelet filter; (e) the denoising result of the median filter with n = 5

Table 2. Denoising results of female speech corrupted by colored Gaussian noise

| Input SNR (dB) | WPD-ICA SNR (dB) | WPD-ICA RMSE | Wavelet SNR (dB) | Wavelet RMSE | Median SNR (dB) | Median RMSE |
| -17.1052 | -8.1328 | 0.0797 | -12.9301 | 0.1013 | -13.0024 | 0.1016 |
| -11.3516 | -1.3560 | 0.0568 | -3.7092 | 0.0639 | -9.3792 | 0.0848 |
| -3.2423 | 2.1785 | 0.0476 | 1.3112 | 0.0497 | -5.8208 | 0.0710 |
| 2.5113 | 6.4278 | 0.0347 | 2.7661 | 0.0486 | -4.2815 | 0.0657 |
5 Conclusions

How to extract basis vectors directly from a single-channel speech signal is the key problem in noisy speech denoising. In this paper we therefore present a new approach that combines ICA feature extraction with WPD so as to extract basis functions
directly from single-channel data. WPD-ICA learns basis vectors using the higher-order statistics of the data. The projection coefficients onto the learned basis vectors are sparser and more suitable for reducing noise, and the shrinkage function can also be obtained from the data. Experiments on real speech signals with added Gaussian noise have shown that the proposed method can efficiently suppress noise and enhance the signals.
References

1. Comon, P.: Independent Component Analysis, A New Concept? Signal Processing, Vol. 36 (1994) 287-314
2. Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. Wiley, New York (2001)
3. Roberts, S., Everson, R.: Independent Component Analysis: Principles and Practice. Cambridge University Press, Cambridge (2001)
4. Lee, T.-W., Jang, G.-J.: The Statistical Structures of Male and Female Speech Signals. Proc. ICASSP, Salt Lake City, Utah, May (2001) 105-108
5. Lee, J.-H., Jung, H.-Y., Lee, T.-W., Lee, S.-Y.: Speech Feature Extraction Using Independent Component Analysis. Proc. ICASSP, Istanbul, Turkey, Vol. 3, June (2000) 1631-1634
6. Hyvärinen, A.: Sparse Code Shrinkage: Denoising of Nongaussian Data by Maximum Likelihood Estimation. Technical Report A51, Helsinki University of Technology, Laboratory of Computer and Information Science (1998)
7. Mallat, S.: A Wavelet Tour of Signal Processing. Academic Press, 2nd edn. (1999)
8. Ravier, P., Amblard, P. O.: Wavelet Packets and De-noising Based on Higher-Order Statistics for Transient Detection. Signal Processing, Vol. 81 (2001) 1909-1926
9. Donoho, D. L., Johnstone, I.: Adapting to Unknown Smoothness Via Wavelet Shrinkage. J. Amer. Stat. Assoc., Vol. 90, Dec. (1995) 1200-1224
10. Lee, T.-W., Lewicki, M. S.: The Generalized Gaussian Mixture Model Using ICA. International Workshop on Independent Component Analysis (ICA'00), Helsinki, Finland, June (2000) 239-244
11. Donoho, D. L.: De-noising by Soft-Thresholding. IEEE Trans. Inf. Theory, Vol. 41, No. 3 (1995) 613-627
An Efficient Algorithm for Blind Separation of Binary Symmetrical Signals* Wenbo Xia and Beihai Tan School of Electronic and Communication Engineering, South China University of Technology 510640, China {WBX, northsea80}@163.com
Abstract. An efficient algorithm for blind separation of binary symmetrical signals is proposed, which does not depend on the statistical characteristics of the source signals. In the noise-free case the mixture matrix is estimated accurately by using the relations among the sensor signals, as is proved in this paper; the estimated mixture matrix is a primary column transformation of the original mixture matrix, so the recovered source signals differ from the originals by permutations and sign changes of their rows. In practice, these can be corrected by introducing headers in the bit-streams and encoding them differently. The algorithm is shown to be simple and efficient in the concluding simulations.
1 Introduction

Blind separation has been a hot topic in recent years and has gained much attention; see, e.g., [1],[2],[3],[4],[5],[6],[13],[15], etc. Blind source separation (BSS) recovers the source signals without any information about either the source signals or the channels. In much previous research, because the independent components of the sources are recovered from the sensor signals, this kind of BSS problem is also called independent component analysis (ICA). Many algorithms and applications of BSS now exist. In particular, in paper [10], Xie's conjecture corrected the famous Stone's conjecture; BSS algorithms based on Xie's conjecture therefore rest on a sound theoretical basis, and researchers have a reliable foundation for studying BSS in both theory and algorithm design. At the same time, the applications of BSS cover many areas, such as array processing, multi-user communication and biomedicine. For blind separation of digital signals there have also been many algorithms, such as AlleJan's analysis method [7], K. Anand's two-step clustering method [8], Li Yuanqing's underdetermined algorithm [9] and others [11],[12],[14], etc. But these algorithms are*
The work is supported by the National Natural Science Foundation of China for Excellent Youth (Grant 60325310), the Guangdong Province Science Foundation for Program of Research Team (grant 04205783), the National Natural Science Foundation of China (Grant 60505005), the Natural Science Fund of Guangdong Province, China (Grant 05103553) and (Grant 05006508), the Specialized Prophasic Basic Research Projects of Ministry of Science and Technology, China (Grant 2005CCA04100).
computationally complicated and imprecise for complete restoration. This paper proposes a novel blind separation algorithm for binary symmetrical signals, which uses only the relations among the sensor signals to estimate the mixture matrix and recover the source signals; its good performance is also verified by the concluding simulations. For the sake of simplicity, we suppose in this paper that the binary symmetrical signals are BPSK signals.
2 Model of Blind Separation of BPSK Signals

In digital signal blind separation, let $m$ narrowband BPSK signals from $m$ different users arrive at an array of $d$ antennas [8]. The measured baseband signal at the $p$th element is given by

$$x_p(t) = \sum_{i=1}^{m} q_i a_{pi} \sum_{j=1}^{\infty} b_i(j)\, s(t - jT - \tau_i) + w_p(t), \tag{1}$$

where $T$ is the baud period, $q_i$ the amplitude of the $i$th user's signal, $a_{pi}$ the response of the $p$th sensor to the $i$th user's signal, $b_i(\cdot) = \pm 1$ the bit-stream transmitted by the $i$th user, $s(\cdot)$ the signal waveform of unit energy, $\tau_i$ the time delay of the $i$th signal to the array, and $w_p(\cdot)$ additive white noise at the $p$th sensor. We assume that the time taken for electromagnetic waves to traverse the array is small compared to $\tau_i$, and that the maximum multi-path delay is small compared to $T$. Here we absorb the multipath effect into the coefficients $a_{pi}$; hence the $a_{pi}$ are not explicitly parameterized in terms of the directions of arrival (DOAs). The $a_{pi}$ are unknown coefficients to be estimated as we estimate the bit-streams of the users. If the $\tau_i$ are all equal (a simplifying assumption that is not necessarily true in practice and deserves more study), one can perform matched filtering over a symbol period $T$ to obtain [8]

$$x_p(n) = \sum_{i=1}^{m} q_i a_{pi} b_i(n) + w_p(n), \tag{2}$$

where $w_p(n)$ is white noise with zero mean and variance $\sigma^2$. In vector form,

$$x(n) = A s(n) + w(n), \tag{3}$$

where $s(n) = [b_1(n) \cdots b_m(n)]^T$, $x(n) = [x_1(n) \cdots x_d(n)]^T$, $w(n) = [w_1(n) \cdots w_d(n)]^T$, $A = [q_1 a_1 \; q_2 a_2 \; \cdots \; q_m a_m]$, and $a_r = [a_{1r} \; a_{2r} \; \cdots \; a_{dr}]^T$. If we have $N$ snapshots, (3) can be written in matrix form as

$$X(N) = A S(N) + W(N), \tag{4}$$

where $X(N) = [x(1) \cdots x(N)]$, $S(N) = [s(1), s(2), \dots, s(N)]$, and $W(N) = [w(1) \cdots w(N)]$.
Next, we suppose that there exists no noise, that the mixture matrix $A$ is nonsingular, and that $d = m$; that is to say, the number of sensor signals equals the number of source signals. Combining this with (3) and (4), we have

$$x(n) = A s(n), \tag{5}$$

$$X(N) = A S(N). \tag{6}$$
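Under the noise-free assumption just made, the snapshot model (4)–(6) is easy to simulate; a small sketch (the function name and parameters are ours):

```python
import numpy as np

def bpsk_mixtures(A, N, sigma=0.0, seed=0):
    """Simulate X(N) = A S(N) + W(N) of Eq. (4): S has i.i.d. +/-1 (BPSK)
    entries; W is white Gaussian noise with standard deviation sigma."""
    rng = np.random.default_rng(seed)
    S = rng.choice([-1.0, 1.0], size=(A.shape[1], N))
    X = A @ S + sigma * rng.normal(size=(A.shape[0], N))
    return X, S
```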
3 An Efficient Algorithm for BPSK Signal Blind Separation

In order to separate the digital signals, we must first estimate the mixture matrix. When $N$ is large enough, the column vector $s(n)$ in (3) takes $2^m$ distinct values, denoted $V = \{s_1, s_2, \dots, s_{2^m}\}$; that is to say, all column vectors of the matrix $S(N)$ in (4) come from the set $V$. Similarly, through (5) we get $2^m$ distinct sensor-signal vectors, denoted $U = \{x_1, x_2, \dots, x_{2^m}\}$, and obviously all column vectors of $X(N)$ come from the set $U$, namely

$$x_i = A s_i \quad (i = 1, 2, \dots, 2^m). \tag{7}$$

Componentwise this reads

$$x_{ki} = a_{k1} s_{1i} + \dots + a_{km} s_{mi}, \quad k = 1, \dots, m \quad (i = 1, 2, \dots, 2^m), \tag{8}$$

where $x_i = [x_{1i} \cdots x_{mi}]^T$ and $s_i = [s_{1i} \cdots s_{mi}]^T$. Because $s_{ij} \in \{+1, -1\}$, $i = 1, \dots, m$, $j = 1, \dots, 2^m$, we have

$$x_i + x_j = A (s_i + s_j) \quad (i \ne j;\ i, j = 1, 2, \dots, 2^m), \tag{9}$$

which, written out as in (8), is

$$x_{ki} + x_{kj} = a_{k1} (s_{1i} + s_{1j}) + \dots + a_{km} (s_{mi} + s_{mj}), \quad k = 1, \dots, m. \tag{10}$$

From (10) we see that if $(s_{1i} + s_{1j}) = +2$ or $(s_{1i} + s_{1j}) = -2$ while $(s_{ki} + s_{kj}) = 0$ for all $k \ne 1$, $k \in \{1, 2, \dots, m\}$, then $x_i + x_j = (+2) a_1$ or $x_i + x_j = (-2) a_1$; that is to say, the sum of the $i$th and $j$th vectors in the set $U$ is $(+2)$ or $(-2)$ times the first column $a_1$ of the mixture matrix $A$. Similarly, when $(s_{qi} + s_{qj}) = +2$ or $-2$ but $(s_{ki} + s_{kj}) = 0$ for $k \ne q$, then $x_i + x_j = (+2) a_q$ or $(-2) a_q$; that means the sum of the $i$th and $j$th vectors in $U$ is $(\pm 2)$ times the column $a_q$ of $A$. Next, in order to look for all column vectors of the mixture matrix $A$, we take every two vectors of the set $U$ and add them up:

$$y_l = x_i + x_j, \quad l = 1, \dots, C_{2^m}^2 \quad (i \ne j;\ i, j = 1, 2, \dots, 2^m). \tag{11}$$
Finally, let $Y = \{y_l,\ l = 1, \dots, C_{2^m}^2\}$.

Definition 1: In the set $Y = \{y_l,\ l = 1, \dots, C_{2^m}^2\}$, if $y_a = y_b$ or $y_a = (-1) y_b$, $a \ne b$, we regard them as belonging to the same cluster $G_r$, where $r$ is the index of the cluster. Again, we define a set $S = \{s_l,\ l = 1, \dots, C_{2^m}^2\}$,

$$s_l = s_i + s_j, \quad l = 1, \dots, C_{2^m}^2 \quad (i \ne j;\ i, j = 1, 2, \dots, 2^m), \tag{12}$$

so that

$$y_l = A s_l, \quad l = 1, \dots, C_{2^m}^2. \tag{13}$$
At the same time, according to definition 1, we also can cluster in the set S . When y a = y b or y a = (−1) y b , a ≠ b , we let their corresponding s a and s b into the same cluster H r . From definition 1 and combining equations (10), we can know that all column vectors of A must come from m clusters of Gr , and they are only different from the corresponding column vectors of m clusters of Gr by (+2) or (−2) times. Theorem 1: According to definition 1, when we classify Y = { yl , l = 1C22m } into different clusters, among all the clusters the m clusters which contain the column vectors of the mixture matrix A or the cluster whose element is zero vector, then the number of elements of them is most, and they are 2 m−1 respectively, but the number of elements of the other clusters is less than 2 m−1 . Proof: Let s l = s i + s j
γ
㧧 l = 1C 㧘 (i ≠ j, i = 1,2, 2 2 2m
m
, j = 1,2, 2m ) ;
< >; When sl = 0 , si can be taken 2 m distinct vectors. For every si , there exists a vector s j to make s i + s j = 0 , but because of symmetry, the number of appearance
2m = 2 m −1 . 2 < >; When sl = e1r , where e1r denotes a m × 1 vector whose r th element is (+2)
of sl = 0 is
δ
or (−2) , the other elements are all 0, and r is arbitrary. when e1r denotes the vector whose r th element is (+2) , the other elements are all 0, according to < >, the numsl = e1r
m −2
γ
e1r
is 2 . Similarly, when denotes the vector whose ber of appearance of r th element is (−2) , the other elements are all 0, the number of appearance of sl = e1r is 2 m −2 . So we can arrive a conclusion that the number of appearance of sl = e1r is 2 m −2 + 2 m− 2 = 2 m −1 , where e1r denotes a m × 1 vector whose r th element is (+2) or (−2) , the other elements are all 0.
An Efficient Algorithm for Blind Separation of Binary Symmetrical Signals
(3) When s_l = e^2_{rk}, where e^2_{rk} denotes an m × 1 vector whose r-th element is (+2) and k-th element is (−2), or whose r-th element is (−2) and k-th element is (+2), the other elements being all 0, with r, k arbitrary: when e^2_{rk} denotes the vector whose r-th element is (+2) and k-th element is (−2), by the counting argument of (1), the number of occurrences of s_l = e^2_{rk} is 2^{m−3}; similarly, when e^2_{rk} denotes the vector whose r-th element is (−2) and k-th element is (+2), the number of occurrences of s_l = e^2_{rk} is 2^{m−3}. So the number of occurrences of s_l = e^2_{rk} is 2^{m−3} + 2^{m−3} = 2^{m−2}.

Similarly, if s_l = e^2_{rk} where e^2_{rk} denotes an m × 1 vector whose r-th and k-th elements are both (+2), or both (−2), the other elements being all 0, with r, k arbitrary, the number of occurrences of s_l = e^2_{rk} is also 2^{m−3} + 2^{m−3} = 2^{m−2}.

Obviously, as the number of nonzero elements of the vector s_l increases, the number of occurrences of s_l decreases. Because s_l corresponds to y_l, when s_l is the zero vector or has exactly one nonzero element, its number of occurrences is largest, namely 2^{m−1}, and the number of elements of the corresponding cluster of y_l is likewise largest, namely 2^{m−1}.
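The counting in the proof is easy to verify exhaustively for small m. The snippet below is an illustrative check we add here (the choice m = 4 is an assumption for the demonstration, not part of the original argument): it enumerates all unordered sums s_i + s_j of distinct binary symmetric vectors and confirms the cluster sizes claimed by Theorem 1.

```python
import numpy as np
from itertools import combinations, product

# Brute-force check of Theorem 1 for m = 4: over all unordered sums
# s_l = s_i + s_j of distinct binary symmetric source vectors, the zero
# vector occurs 2^(m-1) times, each cluster {e, -e} with a single nonzero
# (+/-2) entry collects 2^(m-1) sums, and every other pattern occurs
# fewer than 2^(m-1) times.
m = 4
S = [np.array(v) for v in product([-1, 1], repeat=m)]   # the 2^m source vectors
counts = {}
for si, sj in combinations(S, 2):                       # unordered pairs, i != j
    key = tuple(si + sj)
    counts[key] = counts.get(key, 0) + 1

top = 2 ** (m - 1)
assert counts[(0,) * m] == top                          # case (1): zero cluster
for r in range(m):                                      # case (2): one nonzero entry
    plus = tuple(2 if i == r else 0 for i in range(m))
    minus = tuple(-2 if i == r else 0 for i in range(m))
    assert counts[plus] == counts[minus] == 2 ** (m - 2)
    assert counts[plus] + counts[minus] == top          # the merged cluster {e, -e}
for key, c in counts.items():                           # cases (3) and beyond
    if sum(v != 0 for v in key) >= 2:
        assert c < top
```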
So when y_l is a nonzero vector whose number of occurrences is 2^{m−1}, it must be (±2) times a column vector of the mixture matrix A.

From Theorem 1, in order to recover A we should find the m nonzero clusters whose numbers of elements are largest. We denote these m clusters as a new set Ĝ = { Ĝ_1, Ĝ_2, …, Ĝ_m }. We take one vector from every Ĝ_i, divide it by 2, and denote the result â_i (i = 1, 2, …, m). A new matrix Â is composed of the â_i (i = 1, 2, …, m), and it is obvious that Â is only an elementary column transformation of the mixture matrix A, so

    Â = A P,        (14)

where P is an elementary matrix. Substituting (14) into (6),

    X(N) = Â P^{−1} S(N).        (15)

Letting Ŝ(N) = P^{−1} S(N), we have

    X(N) = Â Ŝ(N).        (16)

Because P is an elementary matrix, P^{−1} is also an elementary matrix, and Ŝ(N) is only an elementary row transformation of S(N). From (16),

    Ŝ(N) = Â^{−1} X(N),        (17)

so the source signals can be recovered from Ŝ(N) up to permutations and sign changes.
W. Xia and B. Tan
Algorithm summary:
1. Find the 2^m distinct sensor signal vectors, denoted U = { x_1, x_2, …, x_{2^m} }, from the N sensor signal samples.
2. Obtain the set Y = { y_l, l = 1, 2, …, C_{2^m}^2 } through equation (11) and cluster it using Definition 1.
3. Among these clusters, find the m nonzero clusters whose numbers of elements all equal 2^{m−1}, and denote the set of them Ĝ = { Ĝ_1, Ĝ_2, …, Ĝ_m }. Take one vector from every cluster Ĝ_i (i = 1, 2, …, m), divide it by 2, and denote the result â_i (i = 1, 2, …, m). Form a new matrix Â from the â_i (i = 1, 2, …, m).
4. Restore the source signals by (17).
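The four steps above can be sketched for the noise-free case; the snippet below is an illustration we add (the function name, the tolerance, and the random 3 × 3 demo mixture are all assumptions, not from the paper):

```python
import numpy as np
from itertools import combinations, product

def estimate_mixing(X, tol=1e-6):
    """Steps 1-3: X holds the 2^m distinct sensor vectors as columns.
    Cluster the pairwise sums y = x_i + x_j under the relation y_a = +/- y_b
    (Definition 1) and keep the m largest nonzero clusters; each cluster
    representative is (+/-2) times one column of A."""
    m, n = X.shape                                   # n == 2**m
    reps, sizes = [], []
    for i, j in combinations(range(n), 2):
        y = X[:, i] + X[:, j]
        if np.linalg.norm(y) < tol:                  # skip the zero cluster
            continue
        for r, rep in enumerate(reps):
            if np.allclose(y, rep, atol=tol) or np.allclose(y, -rep, atol=tol):
                sizes[r] += 1
                break
        else:
            reps.append(y)
            sizes.append(1)
    biggest = np.argsort(sizes)[::-1][:m]            # the m clusters of size 2^(m-1)
    return np.column_stack([reps[r] / 2.0 for r in biggest])

# Demo: three binary symmetric sources, all 2^3 source vectors, random mixture.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
S = np.array(list(product([-1, 1], repeat=3))).T     # 3 x 8 source vectors
A_hat = estimate_mixing(A @ S)                       # columns of A up to sign/order
S_hat = np.linalg.solve(A_hat, A @ S)                # step 4, equation (17)
```

Here A_hat recovers the columns of A up to permutation and sign, so S_hat equals the source matrix up to the same indeterminacies, with entries ±1.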
4 Simulation Results

In the experiment, we suppose there are three BPSK source signals, shown in Fig. 1, and take N = 1000 samples in the noise-free case. Here, a 3 × 3 random mixture matrix

        [  0.8304   0.0490  −1.8211 ]
    A = [ −0.0938  −1.3631   1.4675 ]
        [ −0.4591  −0.2131  −0.4641 ]

is generated. The mixed signals are obtained by equation (6), and the three mixtures are shown in Fig. 2. Then the mixture matrix

        [ −0.0490  −1.8211   0.8304 ]
    Â = [  1.3631   1.4675  −0.0938 ]
        [  0.2131  −0.4641  −0.4591 ]

is obtained accurately by the above algorithm, and the source signals are restored by equation (17) and shown in Fig. 3.
Fig. 1. Three source signals
Fig. 2. Three mixture signals
Fig. 3. Three restored source signals
We find that the estimated mixture matrix Â obtained by the algorithm is an elementary column transformation (a permutation with sign changes) of the original mixture matrix A, so the restored source signals differ from the original source signals only by permutations and signs, and the source signals are restored successfully. The example of BPSK signals shows that the algorithm can likewise be applied to blind separation of general binary symmetrical signals.
5 Conclusions

This paper gives a novel algorithm for blind separation of binary symmetrical signals that does not depend on the statistical independence of the source signals. Exploiting the characteristics of binary symmetrical signals, we estimate the nonsingular mixture matrix, and the estimation is proved correct in the paper. The simulations show that the estimated matrix is accurate and that the algorithm is simple, requiring little computation. Therefore, it offers good performance and precision for blind separation of binary symmetrical signals.
A New Blind Source Separation Algorithm Based on Second-Order Statistics for TITO ZhenLi Wang, XiongWei Zhang, and TieYong Cao Nanjing Institute of Communications Engineering P.O. Box 7, Nanjing, Jiangsu, 210007 China [email protected], [email protected]
Abstract. In this paper, we investigate a new blind source separation algorithm for a TITO (two-input, two-output) channel system. Considering the case of a noisy instantaneous linear mixture of source signals, we form a matrix pair of two 2×2 symmetric, positive definite matrices from two covariance matrices. We apply a set of transforms, namely the Cholesky factorization and the SVD (singular value decomposition), to this matrix pair, and thereby obtain a unitary matrix that is an accurate diagonalizer of each matrix of the pair. Compared with the JADE algorithm and the SOBI algorithm, numerical results show the better performance of the new algorithm.
1 Introduction

Blind source separation (BSS), aiming at recovering unobserved signals or "sources" from observed mixtures, has recently received a lot of attention. This is due to the many potential application areas, such as communication [1], [2], biomedical measurements [3], [4], etc. It is often called "blind" because it relies only on the assumption of mutual independence between the sources, without any a priori knowledge about the mixing matrix. In our work, we are concerned only with the separation of noisy linear combinations of two source signals obtained from a TITO channel system: one observes two sequences X_1(n), X_2(n) recorded from two sensors, each observation X_i(n) being a noisy linear combination of the two sources S_1(n), S_2(n).
In this paper, we propose a new blind source separation algorithm based on the accurate diagonalization of second-order statistics for the TITO channel system. The remainder of this paper is organized as follows: Section 2 presents the new blind separation algorithm based on accurate diagonalization of second-order statistics. Some numerical results are given in Section 3 to illustrate the validity of this algorithm. Finally, the conclusion is presented in Section 4.
2 The New Blind Source Separation Algorithm

We consider exploiting the second-order statistics of the whitened observation signal X_w(n). For independent sources, the sample covariance matrix and its delayed counterparts are defined as follows:

    R̂(0) = ⟨X_w(n) X_w^T(n)⟩ = A R̂_s(0) A^T + σ² I,        (1)

    R̂(k) = ⟨X_w(n + k) X_w^T(n)⟩,  k ≥ 1, k ∈ Z,        (2)

where T denotes the transpose of a vector or a matrix. Under the white noise assumption, the JADE algorithm and the SOBI algorithm both obtain an estimate σ̂² of the noise variance as the average of the m − n smallest eigenvalues of R̂(0), where m and n denote the numbers of sensors and sources, respectively. For a TITO channel system this variance estimation cannot be performed, since m = n. In this paper, the presented algorithm reduces the influence of the disturbing noise via a series of transforms. It proceeds in the following steps:

Step 1. Form the matrix pair (P, Q) of two 2×2 symmetric positive definite matrices as follows:

    P = R̂(1) R̂^T(1),        (3)

    Q = R̂(2) R̂^T(2).        (4)

Step 2. Compute the Cholesky factorizations of the matrices P and Q, respectively:

    P = R_P^T R_P,        (5)

    Q = R_Q^T R_Q.        (6)

Using the upper-triangular matrices R_P and R_Q, a new matrix F is then defined by the following equation:

    F = R_P · R_Q^{−1}.        (7)

Step 3. Compute the SVD (singular value decomposition) of the matrix F:

    Σ = U_F^T F V_F,        (8)

where Σ = diag(σ_1, σ_2), σ_1 ≥ σ_2 > 0.
Step 4. Form a unitary matrix U according to equation (9); it is an accurate diagonalizer of each matrix of the pair (P, Q). Namely, U satisfies U^T P U = D_1 and U^T Q U = D_2, where D_1 and D_2 are both diagonal matrices:

    U = R_Q^{−1} · V_F.        (9)
Proof. Applying (5) and (9) to the matrix product P U, we get

    P U = R_P^T R_P · R_Q^{−1} V_F = R_P^T · (R_P R_Q^{−1}) · V_F.        (10)

(7) and (8) are then applied to (10):

    P U = R_P^T · F · V_F = R_P^T · U_F Σ V_F^T · V_F = R_P^T · U_F Σ        (V_F^T V_F = I)
        = R_Q^T · (R_P R_Q^{−1})^T · U_F Σ = R_Q^T · F^T · U_F Σ
        = R_Q^T · V_F Σ^T U_F^T · U_F Σ        (U_F^T U_F = I)
        = R_Q^T · V_F Σ^T Σ = R_Q^T V_F · Σ² = (R_Q^{−1} V_F)^{−T} Σ² = U^{−T} Σ².        (11)

From (11) we obtain the expression U^T P U = Σ². Similarly, (6) and (9) are applied to the matrix product Q U:

    Q U = R_Q^T R_Q · R_Q^{−1} V_F = R_Q^T · V_F = (R_Q^{−1} V_F)^{−T} = U^{−T},        (12)

and the other expression U^T Q U = I follows from (12). Now we can easily see that D_1 = Σ² and D_2 = I. Clearly, the global minimum of the nonnegative function

    C(U) = off(U^T P U + U^T Q U)        (13)

is achieved when the matrix U simultaneously and accurately diagonalizes the pair (P, Q), and this minimum value equals zero. In equation (13), "off" is defined as off(H) = Σ_{1 ≤ i ≠ j ≤ n} |H_ij|². The proof of the uniqueness of the matrix U can be found in Appendix B of reference [6].

Step 5. The source signals are estimated as Ŝ(n) = U^T X_w(n), and the demixing matrix is estimated as M = U^T W, where W denotes the whitening matrix.
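Steps 2-4 can be sketched numerically. The snippet below is an illustration we add (the function name and the random demo pair are assumptions); it builds the diagonalizer U for an arbitrary symmetric positive definite pair:

```python
import numpy as np

def pair_diagonalizer(P, Q):
    """Steps 2-4: Cholesky factorizations P = R_P^T R_P and Q = R_Q^T R_Q,
    F = R_P R_Q^{-1}, SVD of F, then U = R_Q^{-1} V_F.  The returned U
    satisfies U^T P U = Sigma^2 (diagonal) and U^T Q U = I."""
    R_P = np.linalg.cholesky(P).T            # upper-triangular factor of P
    R_Q = np.linalg.cholesky(Q).T            # upper-triangular factor of Q
    F = R_P @ np.linalg.inv(R_Q)             # eq. (7)
    _, _, VFt = np.linalg.svd(F)             # eq. (8): F = U_F Sigma V_F^T
    return np.linalg.inv(R_Q) @ VFt.T        # eq. (9)

# Demo on a random symmetric positive definite pair.
rng = np.random.default_rng(0)
B, C = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))
P, Q = B @ B.T + 2 * np.eye(2), C @ C.T + 2 * np.eye(2)
U = pair_diagonalizer(P, Q)
D1, D2 = U.T @ P @ U, U.T @ Q @ U            # both diagonal; D2 is the identity
```

In the algorithm itself, P and Q would be the products R̂(1)R̂^T(1) and R̂(2)R̂^T(2) of Step 1.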
3 Simulation Results

The experiment in this section is intended to illustrate the superiority of our algorithm compared to the JADE algorithm and the SOBI algorithm. In this test, the JADE algorithm and the SOBI algorithm use 3 fourth-order cumulant matrices and 150 covariance matrices for joint diagonalization, respectively. In order to evaluate the performance of the three algorithms, we calculate the error measure proposed by Amari et al. [9]:

    E = Σ_{i=1}^{N} ( Σ_{j=1}^{N} |g_ij| / max_k |g_ik| − 1 ) + Σ_{j=1}^{N} ( Σ_{i=1}^{N} |g_ij| / max_k |g_kj| − 1 ),        (14)

where g_ij is the (i, j) element of the global system matrix G = M A, max_k |g_ik| represents the maximum absolute value among the elements of the i-th row of G, and max_k |g_kj| denotes the maximum absolute value among the elements of the j-th column of G. The data X(n) = [x_1(n) x_2(n)]^T are synthesized by mixing two independent sources s5, s6 [8] through the matrix A, which is randomly generated in the interval [0, 1]. The synthesized X(n) is then corrupted with white noise. For noise levels ranging from −50 dB to 0 dB, Fig. 1 shows three curves, obtained by averaging over ten runs, corresponding to the JADE algorithm, the SOBI algorithm and the proposed algorithm, respectively. The main conclusion that can be drawn from this figure is that the new algorithm performs better than the other two algorithms when the noise power is less than −20 dB. Moreover, the performance of the new algorithm remains superior to that of the JADE algorithm as the noise power increases from −20 dB to 0 dB.
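Equation (14) is straightforward to implement. The helper below is an illustration we add (the function name is an assumption); it vanishes exactly when G is a scaled permutation, i.e. when separation is perfect up to the usual indeterminacies:

```python
import numpy as np

def amari_index(G):
    """Amari error measure of eq. (14): zero iff each row and each column of
    |G| has a single nonzero entry (G = M A is a scaled permutation)."""
    g = np.abs(np.asarray(G, dtype=float))
    row_term = (g.sum(axis=1) / g.max(axis=1) - 1.0).sum()
    col_term = (g.sum(axis=0) / g.max(axis=0) - 1.0).sum()
    return row_term + col_term
```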
Fig. 1. Noise power versus error measure for three algorithms: the JADE algorithm (dashed line with diamond), the SOBI algorithm (the dotted line with triangle-right) and the proposed algorithm (the dash-dot line with circle)
Fig. 2 shows a set of speech spectrograms for the case of noise power equal to −25 dB and the mixing matrix

    A = [ 0.8349  0.6305
          0.2258  0.7041 ].

From this picture we can see that, compared to the previous algorithms, the proposed algorithm achieves comparable performance while using only a small amount of second-order statistics information, which reduces the computational cost of the new algorithm.
Fig. 2. Speech spectrograms. (a), (b): the two source signals. (c), (d): the two mixed signals corrupted with white noise. (e), (f): the two separated signals of the JADE algorithm. (g), (h): the two separated signals of the SOBI algorithm. (i), (j): the two separated signals of the proposed algorithm.
4 Conclusion

A new algorithm applicable to the TITO channel system has been introduced for blind source separation. In the proposed algorithm, a series of transforms is applied to the formed matrix pair, exploiting second-order statistics information, and a proof of the accurate diagonalization of this pair is also presented. The separation of two noisy source signals is studied in simulation experiments. The results show that our algorithm performs better than the JADE algorithm and the SOBI algorithm at low noise power. Moreover, our algorithm still keeps better performance compared with the JADE algorithm as the disturbing noise power increases.
References
1. Anand, K., Mathew, G., Reddy, V.: Blind Separation of Multiple Co-channel BPSK Signals Arriving at an Antenna Array. IEEE Signal Processing Letters 2 (1995) 176-178
2. Chaumette, E., Comon, P., Muller, D.: ICA-based Technique for Radiating Sources Estimation: Application to Airport Surveillance. IEE Proceedings-F 140 (1993) 395-401
3. Karhunen, J., Hyvarinen, A., Vigario, R. (ed.): Applications of Neural Blind Separation to Signal and Image Processing. In Proc. ICASSP 1 (1997) 131-134
4. Makeig, S., Bell, A., Jung, T.P., Sejnowski, T.J.: Independent Component Analysis of Electroencephalographic Data. In Advances in Neural Information Processing Systems 8, MIT Press (1995)
5. Belouchrani, A., Cichocki, J.F.: Robust Whitening Procedure in Blind Source Separation Context. Electronics Letters 36 (2000) 2050-2053
6. Belouchrani, A., Abed, M.K., Cardoso, J.F. (ed.): A Blind Source Separation Technique Using Second-order Statistics. IEEE Trans. Signal Processing 45 (1997) 434-444
7. Cardoso, J.F., Souloumiac, A.: Blind Beamforming for Non-Gaussian Signals. IEE Proc. F (Radar and Signal Processing) 140 (1993) 362-370
8. http://www.kecl.ntt.co.jp/icl/signal/mukai/demo/hscma2005
9. Amari, S.I., Cichocki, A., Yang, H.H.: A New Learning Algorithm for Blind Signal Separation. In D.S. Touretzky, M.C. Mozer & M.E. Hasselmo (Eds.), Advances in Neural Information Processing Systems, Cambridge, MA: MIT Press (1996) 757-763
A New Step-Adaptive Natural Gradient Algorithm for Blind Source Separation Huan Tao, Jian-yun Zhang, and Lin Yu Information Engineering Dept., Electronic Engineering Institute, 230037 HeFei, China [email protected]
Abstract. Mixed signals differ from the original signals in three main generic ways: Gaussianity of the probability density function, statistical independence and temporal predictability. Existing BSS algorithms are mainly derived from the Gaussian probability density function and statistical independence. A new adaptive method is proposed in this paper. The method uses temporal predictability as the cost function, a property that has not been studied as much as the other generic differences between the properties of signals and their mixtures. A step-adaptive natural gradient algorithm is proposed to separate the signals, which is more robust and effective. Compared to the fixed-step natural gradient algorithm, simulations show the good performance of the algorithm.
In this paper, a new BSS method based on maximizing the temporal predictability of the recovered signals is introduced, which has better separation performance.
2 Preliminaries

2.1 BSS Model

As a general model for BSS, let L observed signals be related to N independent source signals s_i(t) (i = 1, …, N) by an L × N unknown channel matrix A:

    x(t) = A s(t) + n(t),        (1)

where s(t) = [s_1(t), s_2(t), …, s_N(t)]^T, A is a full-column-rank matrix, and n(t) = (n_1(t), n_2(t), …, n_L(t))^T is the vector of additive noise. Without loss of generality we assume in the derivation that the signals are real-valued, that L = N, and that there is no noise. BSS can operate in two steps. The first step is to pre-whiten the observations with a whitening matrix B, which results in a set of uncorrelated and normalized signals. Pre-whitening can be carried out by any of the known methods and is not dealt with here. After pre-whitening, an appropriate cost function based on higher-order statistics can separate the sources by forcing their independence.

2.2 Temporal Predictability

In reference [1], the definition of the signal predictability F is

    F(W_i, x) = log( V(W_i, x) / U(W_i, x) ) = log( V_i / U_i ) = log( Σ_{i=1}^{n} (ȳ_i − y_i)² / Σ_{i=1}^{n} (ỹ_i − y_i)² ),        (2)

where y_i = W_i x_i is the value of the signal y at time i. The term U_i reflects the extent to which y_i is predicted by a short-term "moving average" ỹ_i of values in y. In contrast, the term V_i is a measure of the overall variability of y, as measured by the extent to which y_i is predicted by a long-term "moving average" ȳ_i of values in y. The predicted values ỹ_i and ȳ_i of y_i are both exponentially weighted sums of signal values measured up to time i − 1, such that recent values have a larger weighting than those in the distant past:

    ỹ_i = λ_S ỹ_{i−1} + (1 − λ_S) y_{i−1},  0 ≤ λ_S ≤ 1,
    ȳ_i = λ_L ȳ_{i−1} + (1 − λ_L) y_{i−1},  0 ≤ λ_L ≤ 1.        (3)

The half-life h_L of λ_L is much longer than the corresponding half-life h_S of λ_S. The relation between a half-life h and the parameter λ is defined by λ = 2^{−1/h}.
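Equations (2)-(3) can be sketched directly. The function below is an illustration we add (the name and the default half-lives are assumptions); it computes F for a single signal using the two exponentially weighted predictors:

```python
import numpy as np

def predictability(y, h_S=2.0, h_L=2000.0):
    """Temporal predictability F of eq. (2), with the short-term and
    long-term exponentially weighted predictors of eq. (3); half-lives
    map to smoothing parameters via lambda = 2**(-1/h)."""
    lam_S, lam_L = 2.0 ** (-1.0 / h_S), 2.0 ** (-1.0 / h_L)
    y_tilde = y_bar = y[0]          # short-term and long-term predictions
    U = V = 0.0
    for yi in y[1:]:
        U += (y_tilde - yi) ** 2    # short-term prediction error
        V += (y_bar - yi) ** 2      # long-term variability
        y_tilde = lam_S * y_tilde + (1 - lam_S) * yi
        y_bar = lam_L * y_bar + (1 - lam_L) * yi
    return np.log(V / U)

# A smooth, predictable signal scores higher than white noise.
t = np.arange(5000)
F_sine = predictability(np.sin(0.01 * t))
F_noise = predictability(np.random.default_rng(0).standard_normal(5000))
```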
3 Step-Adaptive Natural Gradient Algorithm

Equation (2) can be rewritten as

    F = log( (W_i C̄ W_i^T) / (W_i C̃ W_i^T) ),        (4)

where C̄ is the long-term covariance between the signal mixtures and C̃ is the short-term covariance. Reference [3] proved that gradient ascent on F with respect to W_i can be used to maximize F:

    ∇F = (2 W_i / V_i) C̄ − (2 W_i / U_i) C̃.        (5)

W_i is updated iteratively until a maximum of F is located:

    W_{i+1} = W_i + µ ∇F.        (6)

Commonly, the convergence time and the stability depend on the proper selection of the step size µ. Reference [4] has analyzed the stability conditions of the natural gradient algorithm. A step-adaptive algorithm is therefore desirable, and here we propose a new one. Intuitively, we could use the distance between the separation matrix and the optimal separation matrix to adjust the step adaptively, but the optimal separation matrix is unknown before the signals are separated. As an alternative we use

    ΔW(k) = ‖W_{i+1} − W_i‖_F².        (7)

To smooth ΔW(k), its running mean E(ΔW(k)) is used to perform the step-adaptive adjustment. In the process of adaptation, an increase of E(ΔW(k)) indicates fluctuation of the algorithm, so a smaller step is desirable; on the contrary, a decrease of E(ΔW(k)) means a larger step is wanted to accelerate the convergence of the algorithm. The updating expression of the step is

    µ(k + 1) = α(k) µ(k),        (8)

where α can be expressed as follows:

    α(k) = 1 + γ E(ΔW(k)),          if E(ΔW(k)) < E(ΔW(k − 1)),
    α(k) = 1 / (1 + β E(ΔW(k))),    if E(ΔW(k)) > E(ΔW(k − 1)),
    α(k) = 1,                        otherwise,        (9)

where 0 < β < 1 and 0 < γ < 1; γ is in charge of the convergence speed and β controls the steady-state error at convergence. E(ΔW(k)) can be obtained from

    E(ΔW(k + 1)) = (k / (k + 1)) E(ΔW(k)) + (1 / (k + 1)) ΔW(k + 1).        (10)
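A minimal sketch of the update rules (8)-(10); this is an illustration we add, with assumed function names and with defaults β = 0.5, γ = 0.06 taken from the simulation settings of the next section:

```python
def adapt_step(mu, E_curr, E_prev, beta=0.5, gamma=0.06):
    """Eqs. (8)-(9): enlarge the step when the smoothed weight change is
    decreasing, damp it when the change is increasing (fluctuation)."""
    if E_curr < E_prev:
        alpha = 1.0 + gamma * E_curr
    elif E_curr > E_prev:
        alpha = 1.0 / (1.0 + beta * E_curr)
    else:
        alpha = 1.0
    return alpha * mu

def smooth_dW(E_prev, dW_new, k):
    """Eq. (10): running mean E(dW(k+1)) = k/(k+1) E(dW(k)) + dW(k+1)/(k+1)."""
    return (k * E_prev + dW_new) / (k + 1)
```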
4 Simulation and Performance

The performance of the step-adaptive natural gradient algorithm is evaluated through simulations and compared to the fixed-step natural gradient algorithm. Here we use three source signals with 5000 sample points each. The mixing matrix A is generated randomly. The simulation parameters are as follows: λ_L = 0.9, λ_S = 0.004, µ_0 = 0.001, β = 0.5, γ = 0.06. After separation, the separated result is

    [ 0.02   0.04   1     ]
    [ 1      0.1    0.06  ]
    [ 1      0.008  0.003 ]

To evaluate the separation performance and the convergence speed of the different algorithms, we use the correlation coefficient between the original signals and the recovered signals, defined by (11):

    ρ_ij = cov(s_i, s_j) / sqrt( cov(s_i, s_i) cov(s_j, s_j) ).        (11)

A comparison between the fixed-step natural gradient algorithm with different step sizes and the step-adaptive natural gradient algorithm, based on the correlation coefficient, is depicted in Fig. 1.
Fig. 1. Comparison of step-adaptation with fixed steps of 0.0005, 0.002 and 0.01 (ρ(k) in dB versus the number of iterations k)
From Fig. 1 we can clearly see that the step-adaptive natural gradient algorithm is superior to the fixed-step natural gradient algorithm in convergence speed.
5 Conclusion

A new adaptive separation algorithm is proposed based on maximizing the temporal predictability of signals, a property that has not been studied as much as the other generic differences between the properties of signals and their mixtures. The algorithm is step-adaptive, so it is more robust compared to the fixed-step natural gradient algorithm. Simulations show that it is effective and achieves good separation precision. The step-adaptive natural gradient algorithm can also be applied to other BSS methods based on different cost functions.
References
1. Stone, J.V.: Blind Source Separation Using Temporal Predictability. Neural Computation (2001) 1196-1199
2. Belouchrani, A., Abed-Meraim, K., Cardoso, J.F.: A Blind Source Separation Using Second Order Statistics. IEEE Trans. on Signal Processing 45 (1997) 434-444
3. Amari, S.I.: Natural Gradient Works Efficiently in Learning. Neural Computation (1998) 251-276
4. Amari, S.I., Chen, T.P., Cichocki, A.: Stability Analysis of Adaptive Blind Source Separation. Neural Networks (1997) 1345-1351
5. Cruces-Alvarez, S.A., Cichocki, A., Amari, S.I.: On a New Blind Signal Extraction Algorithm: Different Criteria and Stability Analysis. IEEE Signal Processing Letters 9(8), August (2002)
6. Li, Y., Wen, P., Powers, D.: Methods for the Blind Signal Separation Problem. IEEE Int. Conf. Neural Networks & Signal Processing, December (2002)
An Efficient Blind SIMO Channel Identification Algorithm Via Eigenvalue Decomposition* Min Shi and Qingming Yi Department of Electronic Engineering, Jinan University, Guangzhou, 510632, PR China [email protected]
Abstract. An effective blind multichannel identification algorithm is proposed in this paper. Unlike the prediction error method, the new algorithm does not require the input signal to be independently and identically distributed; the input signal can even be non-stationary. Compared with the least-squares approach, the new algorithm is more robust to overestimation of the channel order. Finally, experiments demonstrate the good performance of the proposed algorithm.
1 Introduction

Blind identification of single-input multiple-output (SIMO) systems has many applications, or potential applications, in wireless communications, equalization, seismic data deconvolution, speech coding, image deblurring, echo cancellation [1-8], etc. For an FIR SIMO system, as long as the FIR channels do not share common zeros and all channels are fully activated, the SIMO system can be identified from just the second-order statistics of the output [1], which makes the blind identification of SIMO systems all the more important, and many researchers have paid much attention to this problem. Because of their predominant advantage in computational cost and their weak requirements on the number of data samples of the received signals, the second-order statistics (SOS)-based methods are very attractive and have received much attention. Among them, the least-squares approach (LSA) [1], the linear prediction methods (LP) [2] and the subspace methods (SS) [3] are the three main classes. When the channel order is known, the channels can be very precisely estimated by SS-based approaches and LSA methods; however, these are very sensitive to estimation errors in the channel order. In contrast, LP methods are not as accurate as the former two methods but are robust to channel order overestimation. LP methods usually require the input signal to be independently and identically distributed (i.i.d.), while the other two methods are not limited by this requirement. Relatively, the LS approaches are a little simpler than SS ones.
* This work was supported by the National Natural Science Foundation of China (Grant 60505005), the Guangdong Provincial Natural Science Foundation (Grant 05103553), and the Guangdong Province Science and Technology Project (Grant 2005B10101013).
In this paper, we present a new blind identification algorithm for SIMO FIR systems by improving the LS approaches. The proposed algorithm is simply based on generalized eigenvalue decomposition. The new algorithm can be easily implemented and is more robust to channel order overestimation than the SS and LS approaches.
2 Problem Statement

The single-input m-output channel can be formulated as

    x(t) = Σ_{τ=0}^{L} h(τ) s(t − τ) + n(t),  t = 1, 2, …, T,        (1)

where x(t) = (x_1(t), …, x_m(t))^T ∈ R^{m×1} is the observed signal vector, s(t) is the input signal, and h(τ) = (h_1(τ), …, h_m(τ))^T (τ = 0, …, L) denotes the FIR channel impulse response. The order of the convolution is L. The additive noise is denoted by the vector n(t) = (n_1(t), …, n_m(t))^T ∈ R^{m×1}. The blind identification problem can be stated as follows: given the received signals { x_i(t), i = 1, …, m; t = 1, …, T }, we aim to determine the channels { ĥ_i(·) }_{i=1}^{m} up to a nonzero scaling factor, i.e. { ĥ_i(·) }_{i=1}^{m} = c { h_i(·) }_{i=1}^{m} (c ≠ 0); then we can further recover the input signal s(·).

Xu, Tong, et al. point out that, if the channel order is known in advance, the necessary and sufficient identifiability condition for the SIMO system (1) is that the FIR channels have no common zero [1]. So we assume that the FIR channels of system (1) do not share common zeros.
3 Identification Equations

According to reference [1], we have the following equations:

    x_i(t) = h_i(t) ∗ s(t),  x_j(t) = h_j(t) ∗ s(t),        (2)

where ∗ stands for the convolution operation. Thus h_j(t) ∗ x_i(t) = h_j(t) ∗ [h_i(t) ∗ s(t)] = h_i(t) ∗ [h_j(t) ∗ s(t)] = h_i(t) ∗ x_j(t), i.e.

    h_j(t) ∗ x_i(t) = h_i(t) ∗ x_j(t),  (i ≠ j; i, j = 1, …, m).        (3)

From equation (3), we have

    [ X_i(L)  −X_j(L) ] [ h_j ; h_i ] = 0,        (4)

where h_k = (h_k(L), …, h_k(0))^T and

    X_k(L) = [ x_k(L)     x_k(L+1)     …   x_k(2L)
               x_k(L+1)   x_k(L+2)     …   x_k(2L+1)
               ⋮           ⋮                 ⋮
               x_k(T−L)   x_k(T−L+1)   …   x_k(T)   ].        (5)

Denote h = [h_1^T, …, h_m^T]^T, and construct the following matrices:

    𝒳_i(L) = [ 0 … 0   X_{i+1}(L)   −X_i(L)     0         …   0
               0 … 0   X_{i+2}(L)    0          −X_i(L)   …   0
               ⋮         ⋮                                     ⋮
               0 … 0   X_m(L)        0           0        …   −X_i(L) ],        (6)

where i = 1, …, m; 𝒳_i(L) has m − i block rows, in each block row the first i − 1 blocks are zero, the blocks X_{i+1}(L), …, X_m(L) occupy block column i, and the blocks −X_i(L) occupy block columns i + 1, …, m, respectively. In equations (6), each block, e.g. 0 or { X_k(L), k = 1, …, m }, has the size (T − L + 1) × (L + 1). In the noise-free case, from the SIMO system (1) we derive the following equations:

    X(L) · h = 0,        (7)

where the matrix X(L) is {(T − L + 1)[m(m − 1)/2]} × [m(L + 1)], and it is given by

    X(L) = [ 𝒳_1(L)
             ⋮
             𝒳_{m−1}(L) ],        (8)

which stacks m(m − 1)/2 block rows of m blocks each. Now the blind identification problem (1) boils down to solving equations (7).
4 Blind Identification Algorithm The solution of equation (7) is not unique. To find the practical solution, we usually add some appropriate constraints, e.g., h = 1 or c H h = 1 for a constant vector c . 2 The LS approaches identify the channels of system (1) by solving the following optimization problem with constraints: min J ( h ) = min X ( L ) ⋅ h 2 , 1 ° 2 h ® °¯ st : h 2 = 1.
(9)
44
M. Shi and Q. Yi
Xu, Tong et al [1] use the singular value decomposition (SVD) or fast subspace decomposition (FSD) to solve optimization problem (9). Of course, we can replace the constraint h = 1 by constraint c H h = 1 .Since the accurate channel order of 2
system (1) is unknown and estimating it is a challenging work in practice. Usually what we can do is overestimating the order. Without the loss of generality, we overestimate the channels order of system (1) as Lh ( Lh ≥ L ) . As mentioned in section 1, LSA algorithm is not robust to overestimation of channel order. To overcome this drawback, we attempt to improve the LSA algorithm, which intend to not only keep advantage of LSA algorithm, but also be robust to overestimation of channels order. Denote the ª¬ m ( Lh + 1) º¼ ×1 vector hˆ to be the estimation of h . Considering Lh ≥ L , if hˆ satisfies hk (τ ) = 0, (τ = L + 1,", Lh ; k = 1,"m ) , the overestimation of channel order will have not any influence on the channel identification of system (1). Hence the desirable estimation hˆ of h should be hˆ = ª hˆT ," , hˆT ºT , m¼ ¬ 1 ° ° T h −L § L · ® T °hˆk = c ¨ 0," , 0, hk ¸ = c ( 0," , 0, hk ( L ) ," , hk ( 0 ) ) , k = 1," , m,
To make ĥ robust to overestimation of the channel order and satisfy expression (10) as closely as possible, define the [m(L_h+1)]×1 weight vector

    µ = (1, μ, …, μ^{L_h}, …, 1, μ, …, μ^{L_h})^T,   0 < μ < 1,    (11)

i.e., m stacked copies of (1, μ, …, μ^{L_h}), and solve the following constrained optimization problem:

    min_ĥ J(ĥ) = min_ĥ [ ||X(L_h) · ĥ||_2^2 + ĥ^T (diag(µ))^l ĥ ],   s.t. ||ĥ||_2 = 1,    (12)

where l is a positive integer. Because 0 < μ < 1, it is easy to see that 1 > μ > ⋯ > μ^{L_h} and 1 > μ^l > ⋯ > μ^{l L_h}. So, under the constraints ||ĥ||_2 = 1 and X(L_h)ĥ = 0, minimizing ĥ^T (diag(µ))^l ĥ forces ĥ to approximately satisfy expression (10). The constraint ||ĥ||_2 = 1 means ĥ^T ĥ = 1. Thus the optimization problem (12) can be formulated as the following unconstrained one:
An Efficient Blind SIMO Channel Identification Algorithm
    min_ĥ e(ĥ) = min_ĥ [ ||X(L_h) · ĥ||_2^2 + ĥ^T (diag(µ))^l ĥ ] / (ĥ^T ĥ)
               = min_ĥ ĥ^T [X^T(L_h) X(L_h) + (diag(µ))^l] ĥ / (ĥ^T ĥ).    (13)

From expression (13), we have

    e(ĥ) · ĥ^T ĥ = ĥ^T [X^T(L_h) X(L_h) + (diag(µ))^l] ĥ.    (14)
Differentiating both sides of equation (14) with respect to ĥ, we get

    (∂e(ĥ)/∂ĥ) · ĥ^T ĥ + 2 e(ĥ) ĥ = 2 [X^T(L_h) X(L_h) + (diag(µ))^l] ĥ.    (15)

Letting ∂e(ĥ)/∂ĥ = 0, from equation (15) we have

    { [X^T(L_h) X(L_h) + (diag(µ))^l] − e(ĥ) · I } ĥ = 0.    (16)
Equation (16) means that one can estimate ĥ by the eigenvalue decomposition of the matrix X^T(L_h) X(L_h) + (diag(µ))^l; the eigenvector corresponding to the smallest eigenvalue is exactly the estimate of ĥ. We thus obtain the proposed algorithm:

1) Input the received signals x(t) = (x_1(t), …, x_m(t))^T, t = 1, …, T. Set μ, the positive integer l, and the channel order L_h.
2) Construct the matrix X(L_h) and the weight vector µ.
3) Compute the eigenvalues and corresponding eigenvectors of the matrix X^T(L_h) X(L_h) + (diag(µ))^l.
4) The eigenvector corresponding to the smallest eigenvalue is the estimate ĥ of h.
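Steps 2–4 above can be sketched as follows, assuming the data matrix X(L_h) has already been built from the received signals; the ordering of the weight vector (largest penalty on the taps that should vanish under overestimation) is an assumption consistent with expression (10), and the function name is illustrative:

```python
import numpy as np

def penalized_channel_estimate(X, m, Lh, mu=0.99, l=2):
    """Sketch of steps 2-4 of the proposed algorithm.

    X  : data matrix X(L_h) built from the received signals
    m  : number of channels; Lh : overestimated channel order
    Returns the unit-norm eigenvector of X^T X + diag(mu_vec)^l
    associated with the smallest eigenvalue.
    """
    # Weight vector: (1, mu, ..., mu^Lh) repeated for each channel, so
    # the taps that should vanish carry the largest penalty (assumed
    # tap ordering).
    mu_vec = np.tile(mu ** np.arange(Lh + 1), m)
    M = X.T @ X + np.diag(mu_vec ** l)
    # Symmetric matrix: eigh returns eigenvalues in ascending order,
    # so column 0 of V is the desired eigenvector.
    w, V = np.linalg.eigh(M)
    return V[:, 0]
```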
5 Numerical Experiments and Result Analysis

The root-mean-square error (RMSE) is employed as the performance measure of channel estimation. Usually, when RMSE < 0.8 the channels are well identified; when RMSE > 1.0 the channel estimate is not reliable. The input signal is assumed to be independent and identically distributed (i.i.d.) in the experiments. Computer simulations were conducted to evaluate the performance of the proposed algorithm in comparison with the Least-Squares Approach (LSA) and the Prediction Error Method (PEM). In the
following two experiments, the related parameters of the proposed algorithm are set empirically as T = 1000, μ = 0.99 and l = 2. All input signals are i.i.d. Gaussian signals generated by the Matlab command randn(·). The channel coefficients are listed below:

    h_1(z) = -0.4326 + 0.1253 z^{-1} - 1.1465 z^{-2},
    h_2(z) = -1.6656 + 0.2877 z^{-1} + 1.1909 z^{-2}.
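As a sketch of this experimental setup, the following NumPy fragment generates noise-free outputs for the two channels listed above from an i.i.d. Gaussian input. The scale-invariant RMSE below is a common convention, labeled here as an assumption since the paper does not spell out its exact normalization:

```python
import numpy as np

# Channel coefficients from the experiment: (h_k(0), h_k(1), h_k(2)).
h1 = np.array([-0.4326, 0.1253, -1.1465])
h2 = np.array([-1.6656, 0.2877, 1.1909])

T = 1000
rng = np.random.default_rng(0)
s = rng.standard_normal(T)          # i.i.d. Gaussian input signal
x1 = np.convolve(s, h1)[:T]         # noise-free output of channel 1
x2 = np.convolve(s, h2)[:T]         # noise-free output of channel 2

def rmse(h_true, h_est):
    """RMSE between true and estimated channel taps, after removing
    the inherent scale/sign ambiguity by least-squares scaling
    (an assumed convention)."""
    scale = (h_est @ h_true) / (h_est @ h_est)
    return np.sqrt(np.mean((h_true - scale * h_est) ** 2))
```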
Table 1. The overestimation of channel order and corresponding RMSE for i.i.d. input signal

    L_h  | 2           | 3    | 4    | 5    | 6    | 7    | 8    | 9
    -----|-------------|------|------|------|------|------|------|-----
    LSA  | 9.0984e-016 | 0.99 | 0.97 | 1.04 | 0.10 | 1.03 | 1.08 | 1.10
    PEM  | 0.0424      | 0.08 | 0.26 | 0.27 | 0.31 | 0.31 | 0.32 | 0.32
    Our  | 9.3014e-006 | 0.03 | 0.14 | 0.14 | 0.14 | 0.14 | 0.14 | 0.14
Fig. 1. Performance comparison between LSA, PEM and the proposed algorithm: (a) noise free; (b) the received signals with added white Gaussian noise, SNR = 40 dB
From Table 1 and Fig. 1(a), when the channel order is given accurately, LSA obtains a precise estimate of the channels. In the overestimation case without noise, however, both the PEM algorithm and the proposed algorithm identify the channels well, whereas LSA fails. Additionally, Fig. 1(b) shows the comparison in the same simulation environment except that white Gaussian noise is added to the received signals, with all SNRs equal to 40 dB. In this situation, only the proposed algorithm obtains a relatively satisfactory estimate (Fig. 1(b)).
6 Conclusion

Based on matrix eigenvalue decomposition, an effective blind multichannel identification algorithm is proposed in this paper. Unlike the Prediction Error Method, the new algorithm does not require the input signal to be independent and
identically distributed; the input signal can even be non-stationary. Compared with the Least-Squares Approach, the new algorithm is more robust to overestimation of the channel order and much faster.
References

1. Xu, G.H., Liu, H., Tong, L., Kailath, T.: A Least-Squares Approach to Blind Channel Identification. IEEE Trans. on Signal Processing, Vol. 43 (1995) 2982-2993
2. Abed-Meraim, K., Moulines, E., Loubaton, P.: Prediction Error Methods for Second-Order Blind Identification. IEEE Trans. on Signal Processing, Vol. 45 (1997) 694-705
3. Moulines, E., Duhamel, P., Cardoso, J.F., Mayrargue, S.: Subspace Methods for the Blind Identification of Multichannel FIR Filters. IEEE Trans. on Signal Processing, Vol. 43 (1995) 516-525
4. Xie, S.L., He, Z.S., Fu, Y.L.: A Note on Stone's Conjecture of Blind Signal Separation. Neural Computation, Vol. 17 (2005) 321-330
5. He, Z.S., Xie, S.L., Fu, Y.L.: A Novel Framework of Multiphonic Acoustic Echo Cancellation. Progress in Natural Science (2005)
6. He, Z.S., Xie, S.L., Fu, Y.L.: A New Blind Deconvolution Algorithm for SIMO Channel Based on Neural Network. In: Proceedings of the 2005 International Conference on Machine Learning and Cybernetics, Vol. 6 (2005) 3602-3616
7. Gazzah, H., Regalia, P.A., Delmas, J.P., Abed-Meraim, K.: A Blind Multichannel Identification Algorithm Robust to Order Overestimation. IEEE Trans. on Signal Processing, Vol. 50 (2002) 1449-1458
8. Gazzah, H., Regalia, P.A., Delmas, J.P.: Asymptotic Eigenvalue Distribution of Block Toeplitz Matrices: Application to Blind SIMO Channel Identification. IEEE Trans. on Information Theory, Vol. 47 (2001) 1243-1251
An Improved Independent Component Analysis Algorithm and Its Application in Preprocessing of Bearing Sounds Guangrui Wen, Liangsheng Qu, and Xining Zhang College of Mechanical Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China {grwen, lsqu, zhangxining}@mail.xjtu.edu.cn
Abstract. Independent Component Analysis (ICA) is known as an efficient technique to separate individual signals from various sources without knowing their prior characteristics. Firstly, the basic principle of ICA is reviewed in Section 2, and an improved ICA algorithm based on coordinate rotation (CR-ICA) is proposed. Secondly, two advantages of the CR-ICA algorithm are discussed: one is that the separation can be carried out without iteration, and the other is that less computation is needed to achieve the same effect. Finally, an experiment on recognition of mixed sounds and a practical application to preprocessing of bearing sounds show that the CR-ICA algorithm outperforms the traditional ICA algorithm in separation precision and computation speed. Moreover, the advantages of the method and its potential for further applications are discussed in the conclusion.
gradient descent algorithm to maximize the output entropy. Later, many researchers, including T.W. Lee et al., extended the work of Bell and Sejnowski and developed an improved extended ICA algorithm [3]. The algorithm was useful for both super-Gaussian and sub-Gaussian signals. However, these algorithms were essentially computationally demanding: the computation was carried out iteratively and needed a long computation time. In addition, in practice mixed signals seldom satisfy an ideal symmetric distribution; they generally present skewed distributions [4]. This paper is a first attempt to apply ICA in the engineering diagnosis area. Case studies in this paper reveal its advantages, and its potential applications are also discussed.
2 Basic ICA Principle and Improved Algorithm

2.1 Basic ICA Principle

ICA was originally developed to deal with problems closely related to the cocktail-party problem [5,6]. Given the recent progress in ICA, it has become clear that this method will find widespread applications. The mixing model is

    X = W S,    (1)

where X is the observed vector, W is the mixing matrix, and S is the source vector. Obviously, if we can get the inverse matrix of W, denoted W^{-1}, we may easily obtain the source signal matrix S from the observed signal matrix X:

    S = W^{-1} X.    (2)
ICA can be used to estimate the source signals from the mixtures based on the information of their independence. As we know, independence of two random variables means that the joint probability density function (PDF) is equal to the product of the individual ones, as in Equation (3):

    p(x_1, x_2) = p_1(x_1) p_2(x_2).    (3)
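Equation (3) can be checked empirically: for samples of independent variables, the joint histogram approximately factors into the outer product of the two marginal histograms. A NumPy sketch:

```python
import numpy as np

# Two independent standard-normal samples.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(100_000)
x2 = rng.standard_normal(100_000)

# Joint density estimate p(x1, x2) ...
joint, xe, ye = np.histogram2d(x1, x2, bins=20, density=True)
# ... and the product of the marginal density estimates p1(x1) p2(x2).
p1, _ = np.histogram(x1, bins=xe, density=True)
p2, _ = np.histogram(x2, bins=ye, density=True)
product = np.outer(p1, p2)

# For independent sources the two agree up to sampling noise.
err = np.abs(joint - product).max()
```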
Basically speaking, ICA is an optimization problem; its objective is to optimize the coefficient matrix W so as to obtain components S that are statistically as independent of each other as possible. Based on traditional ICA algorithms, this paper presents a new improved ICA algorithm and applies it in the engineering diagnostics area.

2.2 An Improved ICA Algorithm Based on Coordinate Rotation (CR-ICA)

2.2.1 Preprocessing for CR-ICA

In the preceding section, we discussed the principle of the ICA algorithm. Before applying an ICA algorithm to the data, it is usually very useful to do
some preprocessing. In this section, we discuss preprocessing techniques that make the problem of ICA estimation simpler and better conditioned.

(a) Centering. The most basic and necessary preprocessing is to center X, i.e., subtract its mean vector M = E{X} so as to make X a zero-mean variable. This implies that S is zero-mean as well, as can be seen by taking expectations on both sides of Equation (1). This preprocessing is made solely to simplify the ICA algorithms.

(b) Whitening. Another useful preprocessing method is to whiten the observed variables. This means that before the application of the ICA algorithm (and after centering), we transform the observed vector X linearly to obtain a new vector X̃ which is white, i.e., its components are uncorrelated and their variances equal unity. With the original signals whitened, the correlation between the mixed signals is eliminated, the independent component extraction algorithm can be simplified, and its performance is improved. Sometimes the whitening process alone may recover the waveforms of the source signals. In the rest of this paper, we assume that the data have been preprocessed by centering and whitening.

2.2.2 Algorithm Flow of CR-ICA

After the mixed signals X are preprocessed, X becomes a unit-covariance vector X̃, and the components of X̃ are mutually orthogonal. A new improved Independent Component Analysis algorithm, based on coordinate rotation theory, is then proposed to process this vector X̃; it searches for the optimum rotation angle with the help of an optimization procedure. The detailed steps of the algorithm are as follows:

Step 1: Select the rotation matrix R. By the rotation transform, the matrix S is obtained:

    R = [ cos α   −sin α
          sin α    cos α ],   S = R · X̃.    (4)
In order to obtain the optimum rotation angle, the objective function Q is built:

    Q = Σ_i (cos α · x_i − sin α · y_i)^3,    (5)

where x_i and y_i are the two elements of the i-th column of the matrix X̃_{2×n} and n is the column number of X̃.
Step 2: Obtain the derivative Q' of the objective function Q:

    Q' = −3 Σ_i [ (cos α · x_i − sin α · y_i)^2 · (sin α · x_i + cos α · y_i) ].    (6)

Step 3: To find the extrema, Q' is set to zero. According to Equation (6), this gives

    sin α cos²α Σ_i x_i³ + cos³α Σ_i x_i² y_i − 2 sin²α cos α Σ_i x_i² y_i
      − 2 sin α cos²α Σ_i x_i y_i² + sin³α Σ_i x_i y_i² + sin²α cos α Σ_i y_i³ = 0.    (7)

Step 4: Let a = Σ_i x_i³, b = Σ_i x_i² y_i, c = Σ_i x_i y_i², d = Σ_i y_i³. Dividing (7) by cos³α, it simplifies to

    c · tan³α + (d − 2b) tan²α + (a − 2c) tan α + b = 0.    (8)

Step 5: Obtain the roots of Equation (8), taking tan α as the unknown.

Step 6: Among all the angles obtained in Step 5, search for the optimum angle that makes the objective function attain its minimum.

Step 7: Use Equation (4) to perform the rotation transform; the independent components are then obtained.
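Steps 1–7, together with the centering and whitening preprocessing, can be sketched for a two-channel mixture as follows. This is a minimal NumPy illustration, not the authors' code: the function name and the eigendecomposition-based whitening are assumptions.

```python
import numpy as np

def cr_ica(X):
    """Sketch of CR-ICA separation for a 2 x n observation matrix X,
    following the skewness objective Q = sum (cos a * x - sin a * y)^3
    and the cubic equation in tan(alpha) from the steps above."""
    # Centering and whitening (eigendecomposition of the covariance).
    X = X - X.mean(axis=1, keepdims=True)
    w, E = np.linalg.eigh(np.cov(X))
    Xw = E @ np.diag(w ** -0.5) @ E.T @ X   # unit-covariance data
    x, y = Xw
    # Coefficients of c*t^3 + (d - 2b)*t^2 + (a - 2c)*t + b = 0,
    # with t = tan(alpha)  (equation (8)).
    a, b = np.sum(x ** 3), np.sum(y * x ** 2)
    c, d = np.sum(x * y ** 2), np.sum(y ** 3)
    roots = np.roots([c, d - 2 * b, a - 2 * c, b])
    angles = [np.arctan(t.real) for t in roots if abs(t.imag) < 1e-8]
    # Step 6: pick the angle minimizing the objective Q.
    Q = lambda al: np.sum((np.cos(al) * x - np.sin(al) * y) ** 3)
    alpha = min(angles, key=Q)
    # Step 7: rotate the whitened data.
    R = np.array([[np.cos(alpha), -np.sin(alpha)],
                  [np.sin(alpha),  np.cos(alpha)]])
    return R @ Xw
```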
3 Experiments

In practical signal recognition, sound recognition is one classical task [8]. The mixed sounds are made up of a human voice and an alarm whistle sound. The sounds are collected by two recorders, and inevitably each recorder also picks up information from the other sound. Fig. 1(a) and (b) show the
Fig. 1. (a), (b): the original mixed sounds; (c), (d): the separated results
Table 1. The performance of two algorithms in recognition of mixed sounds

    Algorithm | SNR of y1 /dB | SNR of y2 /dB | Computation time /s
    ----------|---------------|---------------|--------------------
    CR-ICA    | 103.81        | 102.87        | 0.806
    FastICA   | 110.39        | 107.43        | 1.560
mixed sounds. By whitening the mixed sounds and then applying the improved ICA algorithm, the independent signals can be obtained, as shown in Fig. 1(c) and (d). Table 1 displays the SNR results of the two algorithms. It is obvious that the proposed CR-ICA algorithm is better than the traditional FastICA algorithm in separation precision and computation speed under the same conditions.
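SNR figures like those in Table 1 can be computed, for a separated output y aligned with its source s, along the following lines. This is a common convention and an assumption here, since the paper does not give its exact formula:

```python
import numpy as np

def separation_snr_db(s, y):
    """SNR in dB of a separated output y against its source s, after
    optimal least-squares scaling to remove the scale ambiguity
    (an assumed convention)."""
    scale = (y @ s) / (y @ y)
    err = s - scale * y
    return 10 * np.log10(np.sum(s ** 2) / np.sum(err ** 2))
```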
4 Applications

The condition monitoring and fault diagnosis of rolling bearings have been investigated for a long time, and many efficient methods have been proposed, such as resonance demodulation and ferrography. Herein, we recognize bearing faults by sampling the bearing sound. In the experiment, two sound level meters were mounted to pick up the machine sound: one aimed at the motor, the other at the bearing. Each collected sound inevitably contains sound information from the other part. We use the CR-ICA method to preprocess the mixed sounds. The original collected signals are shown in Fig. 2(a,b), and the preprocessing results are shown in Fig. 2(c,d).
Fig. 2. The observed signals are shown in (a), (b); the preprocessing results are shown in (c), (d)
As shown in Fig. 2(c,d), the separated source resembling white noise is due to the motor, while the impulsive signal with periodic impacts originated from the spall in the inner race of the tested bearing.
5 Conclusions

This paper proposes an improved ICA algorithm (CR-ICA) and applies it to the following problems in experiments and engineering diagnosis: recognition of mixed sounds and preprocessing of bearing sounds. The case studies show that the CR-ICA method performs better than traditional ICA algorithms.
References

1. Comon, P.: Independent Component Analysis, a New Concept? Signal Processing, Vol. 36 (1994) 287-314
2. Bell, A.J., Sejnowski, T.J.: An Information-Maximization Approach to Blind Separation and Blind Deconvolution. Neural Computation, Vol. 7 (1995) 1129-1159
3. Lee, T.W., Girolami, M., Sejnowski, T.J.: Independent Component Analysis Using an Extended Infomax Algorithm for Mixed Sub-Gaussian and Super-Gaussian Sources. Neural Computation, Vol. 11(2) (1999) 417-441
4. Li, X.F., Wen, G.R.: Analysis Method of Skewness Component in Blind Separation. Journal of Xi'an Jiaotong University, Vol. 37 (2003) 703-707
5. Zhang, H., Qu, L.: Partially Blind Source Separation of the Diagnostic Signals with Prior Knowledge. In: Proceedings of the 14th International Congress on Condition Monitoring and Diagnostic Engineering Management, Manchester, UK. Elsevier (2001) 177-184
6. Hyvarinen, A., Oja, E.: Independent Component Analysis: Algorithms and Applications. Neural Networks, Vol. 13 (2000) 411-430
7. Qu, L., He, Z.: Mechanical Fault Diagnostics. Shanghai Science & Technology Press (1986) 86-87
8. Hyvarinen, A., Oja, E.: Independent Component Analysis by General Nonlinear Hebbian-Like Learning Rules. Signal Processing, Vol. 64 (1998) 301-313
9. Xu, Y.: Mechanical Dynamic Signal Processing. Doctoral dissertation, Xi'an Jiaotong University (2003)
10. Qu, L., Xu, G.: The Fault Recognition Problem in Engineering Diagnostics. Insight, Vol. 39(8) (1997) 569-574
Array Signal MP Decomposition and Its Preliminary Applications to DOA Estimation Jianying Wang, Lei Chen, and Zhongke Yin School of Information Sci. & Tech., Southwest Jiaotong University, Chengdu, 610031, China {jywang, chan, zkyin}@home.swjtu.edu.cn
Abstract. The idea of sparse decomposition is introduced into array signal processing, and a novel approach to DOA estimation is presented in this paper. The approach decomposes the array signal over an over-complete dictionary whose atoms are vectors established according to the array geometry. The sparse decomposition is implemented by matching pursuit (MP) in the proposed algorithm, and high-resolution DOA estimates are obtained from the parameters of the atoms selected by MP. The DOA estimation resolution is shown to be much higher than that of MUSIC and ESPRIT, especially with fewer array elements and lower SNR. Furthermore, the performance is not affected by the correlation of the signals to be resolved. Computer simulations confirm its validity.
more and more research interest in recent years [8-10]. This paper introduces non-orthonormal decomposition into the array signal processing area. By establishing an over-complete family of basis functions and adjusting the atom density of the over-complete dictionary, the received array signal can be projected onto one basis vector that closely approximates the desired signal. Based on sparse representations of the array signal, high-resolution spatial estimation is implemented. In this paper, using the idea of sparse decomposition and based on matching pursuit (MP) [8], a new method of high-resolution DOA estimation is proposed. Computer simulations show that, for a sufficient density of atom vectors, the new algorithm obtains higher resolution than conventional DOA estimation algorithms and performs particularly well at low SNR.
2 Array Signal Model

Consider D far-field narrow-band sources with known center frequency ω_0 impinging on the array (as shown in Fig. 1). In such an array, the distance d between two elements causes propagation delays τ. The complex output of the l-th element at time t can be written as

    x_l(t) = Σ_{i=1}^{D} a_{li} s_i(t − τ_{li}(θ_i)) + n_l(t),   l = 1, 2, …, M,    (1)

where a_{li} is the corresponding sensor element complex response at frequency ω_0, τ_{li} is the propagation delay between a reference point and the l-th sensor element for the i-th wavefront impinging on the array from direction θ_i, and n_l(t) is additive noise assumed to be a stationary zero-mean random process.
Fig. 1. Array geometry
The received data vectors of the array can be written as

    X(t) = A S(t) + N(t),    (2)

where X(t) is the M×1 snapshot data vector, S(t) is the D×1 vector of impinging signals, and N(t) is the M×1 vector of additive noise.
A is the array steering matrix A = [a_1 a_2 ⋯ a_D]. For the i-th signal, the vector a_i in the matrix A is given by

    a_i = [exp(−jω_0 τ_{1i}), exp(−jω_0 τ_{2i}), …, exp(−jω_0 τ_{Mi})]^T,   i = 1, 2, …, D,    (3)

where τ_{li} is the propagation delay between the reference point and the l-th sensor for the i-th wavefront impinging on the array from direction θ_i, given by

    τ_{li}(θ_i) = (l − 1) d sin θ_i / c.    (4)
According to the array signal model described above, the DOA can be calculated through (4), as long as τ_{li} is estimated by some method.
3 The DOA Estimation Based on MP Decomposition

Conventional array signal processing methods are almost all based on orthonormal decompositions of signals, with the limitations mentioned above. In this paper, we introduce a new array-processing method based on matching pursuit. Matching pursuit is a greedy algorithm that, at each step of the decomposition process, chooses the waveform that best approximates a part of the signal. Using MP, the array signal can be decomposed over a family of functions chosen flexibly according to the characteristics of the signal itself, and the expansion coefficients can then be exploited to extract the information of interest. According to the array signal model, in order to obtain the DOA estimate through Equation (4), the atom vectors can be written as

    G_{θ_m}(a_i, t) = S(t) [1, exp(−jω_0 d sin θ_m / c), …, exp(−jω_0 (M − 1) d sin θ_m / c)]^T,   m = 1, 2, …, M,    (5)
where θ_m is the DOA parameter, which can be set according to the required search precision, M is the total number of atoms in the dictionary, and the parameters of the atom vectors are determined only by θ_m. According to Equation (5), we can establish an over-complete vector family and decompose the array signal over it. By using MP, the array signal x can be decomposed into

    x = P_G x + R x,    (6)

where x is the signal received by the array sensors, P_G x is the signal's projection onto the atom vector that best matches the source, namely ||P_{G_{θ_i}} x|| = sup_m ||P_{G_{θ_m}} x||, and R x is the residual after approximating x with G.
In the MP decomposition, we must select the atom vector that best matches x. This selection follows the restriction given by the vector projection theorem:

    ||x − G_{θ_i}|| = inf_m ||x − G_{θ_m}||.    (7)

Obviously, the atom vector that best matches the original array signal can be obtained by searching over the values of θ_m, and the DOA estimate is then given by the atom parameter θ_i. In contrast, the noise does not share this structure, so the projection of the noise onto the atom vector approaches zero; the method therefore also yields a de-noised signal.
4 Simulation Results

In this section, we present simulation results comparing the performance of the new DOA estimation algorithm with the conventional algorithms (ESPRIT, MUSIC). We use a uniform linear array with element spacing λ/2. The signal is a narrow-band signal of 256 samples with added white Gaussian noise, and the source is located at 60°. All results are averaged over 128 simulation runs at each point. Fig. 2 shows the DOA estimation result with a 3-element array. The simulation shows that the new DOA estimation method based on array signal MP decomposition performs better than the conventional methods with fewer array elements, so the new algorithm is an efficient way to reduce hardware costs. In order to improve the algorithm's performance at low SNR, an array of 10 elements is used in the next simulation. Fig. 3 shows the DOA estimation STD versus the signal-to-noise ratio (SNR) for 256 snapshots. The simulation results indicate that the
Fig. 2. DOA Estimation STD versus SNR
Fig. 3. DOA Estimation STD versus SNR
new DOA estimation method has clearly higher resolution than conventional methods such as ESPRIT and MUSIC, especially at low SNR. Another main advantage of the proposed algorithm is that the MP-based DOA estimation uses the powers of the received signals instead of signal subspaces; hence the performance is robust to correlation between inputs from different angles.
5 Conclusion

A central problem in DOA estimation from array signals is how to estimate the time delay accurately. By decomposing the array signals over an over-complete dictionary, the time delay estimation is clearly improved compared with decomposition over an orthonormal basis. As a result, higher resolution is achieved with MP decomposition of the array signals. The new algorithm works well with fewer array elements and can therefore reduce hardware costs. It also performs well in very low SNR circumstances and can be used when the signals are correlated. The newly proposed method should be beneficial to radar and sonar systems. On the other hand, the new method is only a preliminary probe into array signal sparse decomposition; nevertheless, it was shown that the technique can achieve higher resolution in parameter estimation. From the analysis above, the method is quite promising, and further research on the algorithm and its performance is needed.
References

1. Capon, J.: High-Resolution Frequency-Wavenumber Spectrum Analysis. Proc. of IEEE, Vol. 57(8) (1969) 1408-1418
2. Burg, J.P.: Maximum Entropy Spectral Analysis. PhD Thesis, Stanford University, Stanford, USA (1975)
3. Schmidt, R.O.: Multiple Emitter Location and Signal Parameter Estimation. IEEE Trans. Antennas and Propagation, Vol. 34(3) (1986) 276-280
4. Roy, R., Kailath, T.: ESPRIT - Estimation of Signal Parameters via Rotational Invariance Techniques. IEEE Trans. Acoustics, Speech, and Signal Processing, Vol. 37(7) (1989) 984-995
5. Mendel, J.M.: Tutorial on Higher-Order Statistics (Spectra) in Signal Processing and System Theory: Theoretical Results and Some Applications. Proc. of IEEE, Vol. 79(3) (1991) 278-305
6. Southall, H.L., Simmers, J.A., O'Donnell, T.H.: Direction Finding in Phased Arrays with a Neural Network Beamformer. IEEE Trans. Antennas and Propagation, Vol. 43(12) (1995) 1369-1374
7. Xu, W., Liu, T., Schmidt, H.: Beamforming Based on Spatial-Wavelet Decomposition. In: Sensor Array and Multichannel Signal Processing Workshop Proceedings (2002) 480-484
8. Mallat, S., Zhang, Z.: Matching Pursuits with Time-Frequency Dictionaries. IEEE Trans. Signal Processing, Vol. 41(12) (1993) 3397-3415
9. Eldar, Y.C., Oppenheim, A.V.: MMSE Whitening and Subspace Whitening. IEEE Trans. Information Theory, Vol. 49(7) (2003) 1846-1851
10. Arthur, P.L., Philipos, C.L.: Voiced/Unvoiced Speech Discrimination in Noise Using Gabor Atomic Decomposition. In: Proc. of IEEE ICASSP, Hong Kong, Vol. I (2003) 820-828
Mixture Matrix Identification of Underdetermined Blind Source Separation Based on Plane Clustering Algorithm* Beihai Tan and Yuli Fu College of Electronic and Communication Engineering, South China University of Technology 510640, China [email protected], [email protected]
Abstract. Underdetermined blind source separation and sparse component analysis aim to recover unknown source signals under the assumption that the observations are fewer than the source signals and that the source signals admit a sparse representation. Many methods for this problem rely on clustering. For the underdetermined blind source separation model, this paper gives a new plane clustering algorithm to estimate the mixture matrix based on sparse source information. Good performance of the method is shown by simulations.
1 Introduction

Blind source separation (BSS) has been applied to many fields, such as digital communication, image processing, array processing and biomedicine, and it has many further potential applications. Therefore, it has been a hot topic in the signal processing and neural networks fields [1-6]. Blind separation comes from the cocktail-party problem [7]; that is, we can only restore the source signals from the observed sensor signals, while the mixing channel and the source signals' distributions are unknown. So the mathematical model of BSS is

    X(t) = A S(t) + N(t),   t = 1, …, T,    (1)

where X(t) = [x_1(t), x_2(t), …, x_m(t)]^T are the sensor signals, A ∈ R^{m×n} is the mixture matrix, S(t) = [s_1(t), s_2(t), …, s_n(t)]^T are the source signals, and N(t) = [n_1(t), n_2(t), …, n_m(t)]^T is noise. BSS aims at restoring the source signals from the sensor signals alone; generally, we suppose the noise is absent. In general, if m is larger than n, that is, the number of sensor signals exceeds the number of source signals [8], the problem is overdetermined BSS. We consider the case where m is less than n in this paper, namely underdetermined BSS. Although it is then difficult to restore the source signals, we can use additional information, such as sparseness of

* The work is supported by the National Natural Science Foundation of China for Excellent Youth (Grant 60325310), the Guangdong Province Science Foundation for Program of Research Team (Grant 04205783), the Natural Science Fund of Guangdong Province, China (Grant 05006508), and the Specialized Prophasic Basic Research Projects of the Ministry of Science and Technology, China (Grant 2005CCA04100).
the source signals, to restore them; and if some source signals are not sparse in the time domain, we can make them sparse through some transformation, such as the Fourier transform or wavelet transform. So the BSS model can also be written as

    x(t) = a_1 s_1(t) + a_2 s_2(t) + ⋯ + a_n s_n(t),   t = 1, …, T,    (2)

where x(t) = [x_1(t), …, x_m(t)]^T and a_i = [a_{1i}, …, a_{mi}]^T.
2 Sparse Representation of Underdetermined Blind Separation

For underdetermined BSS, blind extraction algorithms [9], [10] were generally used in the past, but such algorithms cannot restore all the source signals. In order to restore all source signals in underdetermined BSS, researchers make use of particular characteristics of the signals; for example, sparse analysis is adopted to obtain sparse representations of the signals, and several underdetermined BSS algorithms built on it have been successful. Among the good algorithms are Belouchrani's maximum likelihood algorithm [11] for discrete sources, Zibulevsky's sparse decomposition algorithm [3], the overcomplete representation algorithms of Lee [12], Lewicki [13] and Li [5], and Bofill's sparse representation in the frequency domain [14]. Generally, a sparse signal is one in which most sample points are zero or near zero, and only a few sample points are far from zero. Here, we suppose that at time t the source signal s_i(t) is nonzero and the other source signals are zero or near zero. So equation (2) can be written as

    x(t) = a_i s_i(t).    (3)

From this equation, we know that a_i and x(t) are collinear, so we can estimate the mixture matrix A = [a_1, a_2, …, a_n] by clustering x(t) over all times. A very important sparse component analysis algorithm for underdetermined BSS is k-means clustering, which includes two steps [5],[14]: first, the clustering centers are estimated by k-means clustering; second, the source signals are estimated from the known mixture matrix through linear programming. Because the above algorithms require the source signals to be very sparse, their applications are rather restricted. Recently, Pando Georgiev put forward a new sparse component analysis method for underdetermined BSS based on the following conditions [15]:

A1) The mixture matrix A ∈ R^{m×n} has the property that any square m×m submatrix of it is nonsingular.
A2) Each column of the source matrix S(t) has at most m−1 nonzero elements.

A3) The sources are sufficiently richly represented in the following sense: for any index set of n−m+1 elements I = {i_1, i_2, …, i_{n−m+1}} ⊂ {1, 2, …, n} there exist at least m column vectors of the matrix S such that each of them has zero elements in the places with indexes in I, and each m−1 of them are linearly independent.
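Condition A1) can be verified numerically for a given mixing matrix by checking every m×m column submatrix; a NumPy sketch (function name illustrative):

```python
import numpy as np
from itertools import combinations

def satisfies_A1(A, tol=1e-10):
    """Check condition A1): every m x m submatrix of the m x n mixing
    matrix A, formed by choosing m columns, is nonsingular."""
    m, n = A.shape
    return all(abs(np.linalg.det(A[:, list(cols)])) > tol
               for cols in combinations(range(n), m))
```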
For simplicity, we suppose m = 3, n = 4 to explain this paper's algorithm. If m = 3, n = 4, equation (2) can be written as

    x(t) = a_1 s_1(t) + a_2 s_2(t) + a_3 s_3(t) + a_4 s_4(t),   t = 1, …, T,    (4)

where x(t) = [x_1(t), x_2(t), x_3(t)]^T and a_i = [a_{1i}, a_{2i}, a_{3i}]^T. According to A2), if only the i-th and j-th source signals are nonzero at time t, then

    x(t) = a_i s_i(t) + a_j s_j(t),   t = 1, …, T.    (5)

From equation (5), we can see that the sensor signal vector lies in the same plane as the vectors a_i and a_j. Moreover, according to A1), every two columns of the mixture matrix are linearly independent, so the pairs of columns define C_4^2 different planes. From equation (5), the mixture matrix A = [a_1, a_2, a_3, a_4] can be estimated through plane clustering of the sensor signals in the noise-free or low-noise case. Next, the plane clustering algorithm is given in detail and the source signals are restored by it.
3 Mixture Matrix Identification Based on Plane Clustering

Pando Georgiev proved that the mixture matrix is identifiable when conditions A1), A2), A3) are met, but he did not give a concrete algorithm for estimating it. Since the mixture matrix is essential, this paper gives a concrete novel algorithm for its estimation. For simplicity, we again suppose m = 3, n = 4 to explain the algorithm. To identify the C_4^2 = 6 planes, we instead identify their six normal lines; once the normal lines are identified, the planes are identified. To begin the plane clustering, the sensor signals x(t), t = 1, …, T are normalized. With m = 3, each sensor signal corresponds to one point on the unit sphere, and the points on the lower half of the sphere are reflected symmetrically onto the upper half. The new sensor signals are
x̂(t) = x(t)/‖x(t)‖ if x_3(t) ≥ 0, and x̂(t) = −x(t)/‖x(t)‖ if x_3(t) < 0, t = 1, …, T. (6)
Clustering x̂(t) corresponds to clustering x(t), and the points lie on the upper half sphere, in the same planes as those spanned by every two columns of the mixture matrix. Similar to k-means clustering, normal-line clustering estimates the normal lines and modifies them iteratively. For example, suppose some initialized points y(t) = [y_1(t), y_2(t), y_3(t)]^T, t = 1, 2, …, N_0 lie in one plane. To identify this plane, we suppose its normal line is n_0 = [n_01, n_02, n_03]^T. By the definition of the inner product,

(n_0, y(t)) = n_01·y_1(t) + n_02·y_2(t) + n_03·y_3(t) = ‖n_0‖·‖y(t)‖·cos θ_(n_0, y(t)), (7)
Mixture Matrix Identification of Underdetermined Blind Source Separation
63
where θ_(n_0, y(t)) is the angle between the normal line n_0 and the point y(t), so 0 ≤ θ_(n_0, y(t)) ≤ π and −1 ≤ cos θ_(n_0, y(t)) ≤ 1. From equation (7), to identify the plane composed of the points y(t), t = 1, 2, …, N_0, the normal line n_0 = [n_01, n_02, n_03]^T must be found such that θ_(n_0, y(t)) tends to π/2 for every t ∈ {1, 2, …, N_0}. Because ‖n_0‖ = 1 and ‖y(t)‖ = 1, this amounts to

n_0 = arg min_(n_0) Σ_{t=1}^{N_0} |(n_0, y(t))|  s.t. (n_01)² + (n_02)² + (n_03)² = 1. (8)
Based on equation (8), the plane clustering algorithm proceeds as follows.

1) Initialize the sensor signals x(t), t = 1, …, T using equation (6) to get the new sensor signals x̂(t), t = 1, …, T.
2) Generate six initial normal lines at random: n_1, n_2, n_3, n_4, n_5, n_6.
3) Compute the inner products of x̂(t), t = 1, …, T with each n_i, i = 1, …, 6, take their absolute values, and let X_i = { x̂(t) : |(x̂(t), n_i)| < |(x̂(t), n_j)|, j ≠ i }.
4) Modify the normal lines. Let n = [sin θ cos φ, sin θ sin φ, cos θ]^T, 0 ≤ θ ≤ π/2, 0 ≤ φ ≤ π. For the sake of simplicity, this step is shown as the following Matlab-style program:

   for i = 1 : 6
     n̂_i = n_i;
     for θ = 0 : η_1 : π/2
       for φ = 0 : η_2 : π
         if (X_i, n) < (X_i, n_i)
           n_i = n;
         end
       end
     end
   end

   where η_1, η_2 denote the step sizes, and (X_i, n), (X_i, n_i) denote the sums of the absolute inner products between all elements of the set X_i and the normal lines n and n_i, respectively.
5) If ‖n̂_i − n_i‖ < ε_i for i = 1, …, 6, where each ε_i is a given small value, the algorithm stops; otherwise, continue from step 3).

Because each column vector a_i of the mixture matrix spans a plane with every other column a_j (j ≠ i), a_i must be orthogonal to three of the normal lines n_1, n_2, n_3, n_4, n_5, n_6, and those three normal lines must be coplanar. That is to say, once any three coplanar normal lines are found, the columns a_i (i = 1, …, 4) can be estimated.
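As a concrete illustration, the clustering steps above can be sketched in Python/NumPy. This is a minimal sketch and not the authors' implementation: the function name, the random initialization, the step sizes and the stopping tolerance are all illustrative choices.

```python
import numpy as np

def plane_cluster(X, n_planes=6, eta1=0.05, eta2=0.05, eps=1e-3, max_iter=50):
    """Sketch of the plane clustering algorithm (steps 1-5).

    X: (3, T) array of sensor signals x(t). Step sizes eta1/eta2, the
    tolerance eps and the random initialization are illustrative choices.
    Returns a (3, n_planes) array of estimated unit normal lines.
    """
    # Step 1 / eq. (6): normalize and reflect onto the upper hemisphere.
    Xh = X / np.linalg.norm(X, axis=0, keepdims=True)
    Xh[:, Xh[2] < 0] *= -1.0

    # Step 2: random initial normal lines.
    rng = np.random.default_rng(0)
    normals = rng.normal(size=(3, n_planes))
    normals /= np.linalg.norm(normals, axis=0, keepdims=True)

    thetas = np.arange(0.0, np.pi / 2, eta1)
    phis = np.arange(0.0, np.pi, eta2)
    for _ in range(max_iter):
        old = normals.copy()
        # Step 3: assign each point to the normal with the smallest |<x, n_i>|.
        labels = np.argmin(np.abs(normals.T @ Xh), axis=0)
        # Step 4: grid search over the hemisphere, minimizing the sum of eq. (8).
        for i in range(n_planes):
            Xi = Xh[:, labels == i]
            if Xi.shape[1] == 0:
                continue
            best = normals[:, i]
            best_cost = np.sum(np.abs(best @ Xi))
            for th in thetas:
                for ph in phis:
                    n = np.array([np.sin(th) * np.cos(ph),
                                  np.sin(th) * np.sin(ph),
                                  np.cos(th)])
                    cost = np.sum(np.abs(n @ Xi))
                    if cost < best_cost:
                        best, best_cost = n, cost
            normals[:, i] = best
        # Step 5: stop when the normals no longer move.
        if np.linalg.norm(normals - old) < eps:
            break
    return normals
```

With data drawn from a single plane, the recovered normal is orthogonal to that plane up to sign, which is exactly the ambiguity the paper allows.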
4 Restoring Source Signals

Now suppose that n_k (k ∈ {1, …, 6}) is the normal line of the plane spanned by a_i, a_j (i ≠ j), and that X_l (l ∈ {1, …, 6}) is the set of sensor signals coplanar with a_i and a_j. For any x(t) ∈ X_l,

x(t) = a_i s_i(t) + a_j s_j(t), (9)

or

x(t) = A_ij s_ij(t), (10)

where A_ij = [a_i, a_j] and s_ij(t) = [s_i(t), s_j(t)]^T, so

s_ij(t) = A_ij^# x(t), (11)

where A_ij^# denotes the generalized inverse of A_ij. Thus, at time t, only the i-th and j-th source signals obtained from equation (11) have nonzero values; the other source signals are zero at time t.
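Equation (11) is a one-line computation with a pseudoinverse. A minimal sketch (the helper name is ours; NumPy's `pinv` plays the role of A_ij^#):

```python
import numpy as np

def restore_pair(x_t, A, i, j):
    """Eq. (11): recover the two sources active at time t.

    x_t: observed 3-vector x(t); A: estimated 3x4 mixture matrix;
    i, j: indices of the two columns assumed active at time t.
    """
    Aij = A[:, [i, j]]                  # A_ij = [a_i, a_j]
    sij = np.linalg.pinv(Aij) @ x_t     # s_ij(t) = A_ij^# x(t)
    s = np.zeros(A.shape[1])
    s[i], s[j] = sij                    # all other sources stay zero
    return s
```

Because x(t) lies in the column space of A_ij, the least-squares solution is exact in the noise-free case.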
5 Simulation Results

In the experiment, a random 3 × 4 mixture matrix satisfying condition A1) is generated for the simulation, with N = 1000 samples; the four source signals are shown in Fig. 1. The initialized mixture matrix is

A = [ -0.27574   0.18977  -0.67493   0.86583
       0.59016   0.28866  -0.72862  -0.12535
       0.75874  -0.93844   0.11652   0.48439 ],

and the mixture matrix estimated by the above algorithm is

Â = [  0.67479   0.86533  -0.27554  -0.19024
       0.72880  -0.12444   0.59033  -0.28914
      -0.11622   0.48551   0.75867   0.93819 ].

Fig. 1. Four source signals

Fig. 2. Restored source signals
From the estimated mixture matrix and the above figures of the restored source signals, the algorithm is successful; the only discrepancy is that the first and fourth restored signals differ in sign from the third and second source signals, respectively, which is allowed in BSS.
6 Conclusions

This paper gives a novel, concrete algorithm for estimating the mixture matrix and restoring the sparse source signals in underdetermined BSS. The simulation results show that the algorithm is feasible and performs well, and it is also easy to extend the algorithm to higher-dimensional underdetermined BSS by sparse component analysis.
References

1. Hyvarinen, A., Oja, E.: Independent Component Analysis: Algorithms and Applications. Neural Networks, 13 (2000) 411-430
2. Xie, S. L., Zhang, J. L.: Blind Separation Algorithm of Minimal Mutual Information Based on Rotating Transform. Acta Electronica Sinica, 30 (5) (2002) 628-631
3. Zibulevsky, M., Pearlmutter, B. A.: Blind Source Separation by Sparse Decomposition in a Signal Dictionary. Neural Computation, 13 (4) (2001) 863-882
4. Xie, S. L., He, Z. S., Gao, Y.: Adaptive Theory of Signal Processing. 1st ed. Chinese Science Press, Beijing (2006) 130-223
5. Li, Y., Cichocki, A., Amari, S.: Analysis of Sparse Representation and Blind Source Separation. Neural Computation, 16 (2004) 1193-1234
6. Zhang, J. L., Xie, S. L., He, Z. S.: Separability Theory for Blind Signal Separation. Acta Automatica Sinica, 30 (3) (2004) 337-344
7. Jutten, C., Herault, J.: Blind Separation of Sources, Part I: An Adaptive Algorithm Based on Neuromimetic Architecture. Signal Processing, 24 (1991) 1-10
8. Zhang, J. L., Xie, S. L.: Multi-input Single-output Neural Network Blind Separation Algorithm Based on Penalty Function. Intelligent and Complex Systems, 2 (2003) 353-362
9. Li, Y., Wang, J., Zurada, J. M.: Blind Extraction of Singularly Mixed Source Signals. IEEE Trans. on Neural Networks, 11 (2000) 1413-1422
10. Li, Y., Wang, J.: Sequential Blind Extraction of Instantaneously Mixed Sources. IEEE Trans. Signal Processing, 50 (5) (2002) 997-1006
11. Belouchrani, A., Cardoso, J. F.: Maximum Likelihood Source Separation for Discrete Sources. In: Proc. EUSIPCO, Edinburgh, Scotland (1994) 768-771
12. Lee, T. W., Lewicki, M. S., Girolami, M., Sejnowski, T. J.: Blind Source Separation of More Sources Than Mixtures Using Overcomplete Representations. IEEE Signal Processing Letters, 6 (1999) 87-90
13. Lewicki, M. S., Sejnowski, T. J.: Learning Overcomplete Representations. Neural Computation, 12 (2000) 337-365
14. Bofill, P., Zibulevsky, M.: Underdetermined Source Separation Using Sparse Representations. Signal Processing, 81 (2001) 2353-2362
15. Georgiev, P., Theis, F., Cichocki, A.: Sparse Component Analysis and Blind Separation of Underdetermined Mixtures. IEEE Transactions on Neural Networks, 16 (4) (2005) 992-996
Non-linear Blind Source Separation Using Constrained Genetic Algorithm Zuyuan Yang and Yongle Wan School of Electrics & Information Engineering, South China University of Technology, Guangzhou 510641, Guangdong, China [email protected], [email protected]
Abstract. In this paper, a novel adaptive algorithm based on constrained genetic algorithm (GA) is presented for solving non-linear blind source separation (BSS), which can both get out of the trap of local minima and restrict the stochastic decision of GA. The approach utilizes odd polynomials to approximate the inverse of non-linear mixing functions and encodes the separating matrix and the coefficients of the polynomials simultaneously. A novel objective function based on mutual information is used with the constraints to the separating matrix and the coefficients of the polynomials respectively. The experimental results demonstrate the feasibility, robustness and parallel superiority of the proposed method.
in order. In [8], a constraint on the estimations was used, but this approach adopted a sigmoid function with only one parameter, which may limit the approximation of the non-linear demixing functions. In [10], assumptions on the coefficients of the polynomials simplified the contrast function, but without corresponding constraints in the algorithm the results violated those assumptions. The post-nonlinear model is as follows:
x(t ) = f ( A ⋅ s (t )) , y (t ) = W ⋅ g ( x(t )) .
(1)
where x(t) are the mixtures of the signals s(t), and y(t) are the estimations of s(t) (see Fig. 1).
Fig. 1. Post-nonlinear mixing and demixing model
In this work, a novel objective function based on mutual information with the constraints to the separating matrix and the coefficients of the polynomials is used for post-nonlinear model. Instead of stochastic gradient descent method, GA is utilized to solve non-linear BSS. The parallel superiority of GA is used to get out of the trap of local minima for both separating matrix and coefficients of the polynomials, and the constraints are used to restrict the stochastic decision of GA. The paper is organized as follows: in Section 2, the algorithm to solve non-linear BSS is described in detail, including the construction of the fitness function. The experimental results are shown in Section 3. Finally, a conclusion is given in Section 4.
2 Blind Separation Using GA

2.1 Fitness Function

The selection of the fitness function is based on an information-theoretic criterion. The mutual information between y_1, …, y_n is defined as follows [11]:

I(y_1, …, y_n) = −H(y_1, …, y_n) + Σ_{i=1}^{n} H(y_i). (2)
By the Weierstrass approximation theorem, the function g is approximated as

g_j(x_j) = Σ_{k=1}^{P} g_jk · x_j^(2k−1), j = 1, …, n, (3)

where g_jk are adjustable parameters. Therefore, the following fitness function is used [10]:
L(W, g) = −ln|det W| − Σ_{j=1}^{n} E[log|g′_j(x_j)|] − Σ_{i=1}^{n} H(y_i),
H(y_i) ≈ C − (k_3^i)²/12 − (k_4^i)²/48 + 3(k_3^i)²·k_4^i/8 + (k_4^i)³/16, (4)

where h_jk = g_jk/g_j1 for k ≥ 2, and E[y_i²] = 1, E[y_i] = 0 so that WW^T = I; C is a constant, k_3^i = E[y_i³], k_4^i = E[y_i⁴] − 3. Suppose that g_j1 = 1 and h_jk ≪ 1 (k ≥ 2); then from (4)
we can obtain the following non-linear programming problem with constraints:

Max: f = ln|det W| + Σ_{j=1}^{n} Σ_{k=2}^{P} g_jk · E[(2k − 1) x_j^(2k−2)] + Σ_{i=1}^{n} H(y_i), (5)

s.t. h(W) = WW^T − I = 0,  h(g) = g_jk − ε ≤ 0, k ≥ 2, (6)
where ε < 0.1 is a positive constant and H(y_i) is estimated from (4). Constraint (6) is equivalent to

h_j(w) = 0, h_i(g) ≤ 0, j = 1, …, m_1, i = 1, …, m_2, (7)

where m_1, m_2 are given in (9). Under the constraints, the feasible domain Q can be defined as follows [9]:

Q = {(w, g) | h_j(w) = 0, h_i(g) ≤ 0, j = 1, 2, …, m_1, i = 1, 2, …, m_2}. (8)

Definitions: w = [W_11, W_12, …, W_nn]^T, m_1 = (n² + n)/2, m_2 = n·p − n. (9)
Def. 1:

H_j(w) = |h_j(w)| / ‖∇h_j(w)‖,  H_max(w) = max_j {H_j(w)},
H_i(g) = h_i(g) / ‖∇h_i(g)‖,  H_max(g) = max{0, H_i(g)},
k_1 = arg{j | H_j(w) = H_max(w)}, j = 1, 2, …, m_1,
k_2 = arg{i | H_i(g) = H_max(g)}, i = 1, 2, …, m_2. (10)

Def. 2:

DSFD(w, g) = H_max(g)·∇h_(k_2)(g) + sgn(h_(k_1)(w))·H_max(w)·∇h_(k_1)(w). (11)
Def. 3:

d(w, g) = v_0·∇_w f − Σ_{j=1}^{m_1} v_j·∇h_j(w), (12)

where v_j is the weight of the gradient direction; in general, v_0 = 0.5.
Def. 4:

FD = d(w, g)^T · (−DSFD(w, g)). (14)

Def. 5:

eval(w, g) = f(w, g) / (1 + 1/FD)^p  if f ≥ 0;  eval(w, g) = f(w, g) · (1 + 1/FD)^p  if f < 0, (15)

where p ≥ 1, usually p = 2. Then the fitness function F(w, g) is given as

F(w, g) = e_1·f(w, g) + e_2  if (w, g) ∈ Q;  F(w, g) = e_1·eval(w, g) + e_2  otherwise, (16)
where e_1, e_2 are positive real numbers such that F(w, g) ≥ 0.

2.2 Operations

Initial population: Select a proper population size N, and encode with real numbers the genes of the chromosome corresponding to the separating matrix W and the coefficients of the non-linear function g. Set proper parameters for the fitness function, the crossover probability, the mutation probability, the maximum iteration number, and so on.

Selection: Fitness-proportionate selection by roulette wheel is adopted, and the new generations come from the combined chromosomes with better fitness.

Crossover: In this paper, the arithmetic combinatorial crossover operator is suggested:
w_i^(k+1) = α·w_i^(k) + (1 − α)·w_j^(k),   w_j^(k+1) = α·w_j^(k) + (1 − α)·w_i^(k),
g_i^(k+1) = α·g_i^(k) + (1 − α)·g_j^(k),   g_j^(k+1) = α·g_j^(k) + (1 − α)·g_i^(k). (17)
Mutation: The weighted gradient direction from (12) is introduced for w:

w^(k+1) = w^(k) + β^(k)·d(w^(k), g^(k))  and  g^(k+1) = Mean(g_i^(k)), i = 1, 2, …, m_2, (18)

where β^(k) is the learning rate and Mean(x) denotes the average of x.

Stop rule: A maximum iteration number is determined to trigger the stop rule.
3 Experimental Results

To provide an experimental demonstration of the validity of BSS with the constrained GA, three sources are used in the post-nonlinear model. The MSE and the residual crosstalk in decibels (Ct) [11] are used to evaluate the accuracy of the algorithm:

MSE_i = E[(s_i(t) − y_i(t))²],  Ct_i = 10 log E[(y_i − s_i)²], (19)

where y, s have unit variance.
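The two metrics of eq. (19) can be sketched as follows (the helper name is ours, and the base-10 logarithm for the dB value is our assumption):

```python
import numpy as np

def separation_metrics(s, y):
    """MSE_i and residual crosstalk Ct_i (in dB) of eq. (19) for one
    source/estimate pair, both normalized to unit variance."""
    s = s / s.std()
    y = y / y.std()
    mse = np.mean((s - y) ** 2)
    ct = 10.0 * np.log10(np.mean((y - s) ** 2))
    return mse, ct
```

A good estimate gives a small MSE and a strongly negative Ct, as in Table 1 below.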
The linear mixing matrix and the three nonlinear functions are

A = [ 0.1870  ⋯
      0.1624  ⋯
      0.1286  ⋯ ],

f_1 = tanh(x), f_2 = tanh(0.8x), f_3 = tanh(0.5x). (20)

Fifth-order polynomials were used as the approximations for g = f⁻¹; according to the algorithm, we obtained the following results:

g_1 = x + 0.094569 x³ + 0.039137 x⁵,
g_2 = x + 0.087650 x³ + 0.092012 x⁵,
g_3 = x + 0.098845 x³ + 0.045751 x⁵.
Table 1. Crosstalk (Ct) and MSE corresponding to sources

             s1         s2         s3
  Ct (dB)  -26.7264   -64.5408   -26.6678
  MSE        0.0691     0.0016     0.0695
Fig. 2. g_i denotes the estimation of the non-linear demixing function according to the algorithm, and h_i = f_i⁻¹ denotes the inverse of the non-linear mixing function
Fig. 3. Original sources and the estimations
4 Conclusion

In this paper, the post-nonlinear blind source separation model has been solved using a constrained GA. The novelty of this approach lies in the use of reasonable constraints in the novel contrast function and in the construction of a new fitness function. The experimental results show the validity of this method; the original sources are recovered acceptably, up to scalings. It is proper to use constrained odd polynomials to approximate the inverse of the non-linear distortion when it is under control. However, this may not work well under other conditions, since estimating the inverse of a non-linear function remains quite an open question, and there is still a long way to go to overcome it.
Acknowledgement This work is supported by the National Natural Science Foundation of China for Excellent Youth (60325310), the Guangdong Province Science Foundation for Program of Research Team (04205783), the Natural Science Fund of Guangdong Province, China (05103553), the Specialized Prophasic Basic Research Projects of Ministry of Science and Technology, China (2005CCA04100).
References

1. Li, Y., Cichocki, A., Amari, S.: Analysis of Sparse Representation and Blind Source Separation. Neural Computation, 16 (2004) 1193-1234
2. Gao, Y., Xie, S. L.: An Algorithm for Nonlinear Blind Source Separation Based on Signal Sparse Property and Kernel Function. Computer Engineering and Applications, 22 (2005) 33-35
3. Bell, A. J., Sejnowski, T. J.: An Information-maximization Approach to Blind Separation and Blind Deconvolution. Neural Computation, 7 (1995) 1129-1159
4. Tan, Y., Wang, J.: Nonlinear Blind Source Separation Using Higher Order Statistics and a Genetic Algorithm. IEEE Trans. on Evolutionary Computation, 5 (2001) 600-612
5. Xie, S. L., He, Z. S., Gao, Y.: Adaptive Theory of Signal Processing. 1st ed. Chinese Science Press, Beijing (2006) 136-155
6. Gao, Y., Xie, S. L.: Two Algorithms of Blind Signal Separation Based on Nonlinear PCA Criterion. Computer Engineering and Applications, 22 (2005) 24-26
7. Zhang, J. L., He, Z. S., Xie, S. L.: Sequential Blind Signal Extraction in Order Based on Genetic Algorithm. Acta Electronica Sinica, 32 (2004) 616-619
8. Liu, H. L., Xie, S. L.: Nonlinear Blind Separation Algorithm Based on Multiobjective Evolutionary Algorithm. Systems Engineering and Electronics, 27 (2005) 1576-1579
9. Fung, R. Y. K., Tang, J. F., Wang, D. W.: Extension of a Hybrid Genetic Algorithm for Nonlinear Programming Problems with Equality and Inequality Constraints. Computers & Operations Research, 29 (2002) 261-274
10. Martin-Clemente, R., Puntonet, C. G., Rojas, F.: Post-nonlinear Blind Source Separation Using Metaheuristics. Electronics Letters, 39 (2003) 1765-1766
11. Taleb, A., Jutten, C.: Source Separation in Post-nonlinear Mixtures. IEEE Trans. on Signal Processing, 47 (1999) 2807-2820
12. Xie, S. L., He, Z. S., Fu, Y. L.: A Note on Stone's Conjecture of Blind Signal Separation. Neural Computation, 17 (2005) 321-330
A Distributed Wavelet-Based Image Coding for Wireless Sensor Networks Hui Dong, Jiangang Lu, and Youxian Sun National Laboratory of Industrial Control Technology Zhejiang University, Hangzhou 310027, China {dongh, jglu, yxsun}@iipc.zju.edu.cn
Abstract. The strict constraints of wireless sensor networks (WSN) on an individual sensor node's resources bring great challenges to information processing, especially in image-capture sensor networks. A Simple Wavelet Compression (SWC) processing of image coding is proposed to maximize compression and minimize energy cost in WSN. Because activity in WSN is low, we employ a Low-complexity Change Detection Algorithm (LCDA) to mark active blocks, and we encode only these active regions to save energy. In particular, a position estimation and compensation method is presented to exploit the inherent correlations that exist between sensor readings, based on a distributed lifting scheme. The impact of this scheme on image signal quality is presented at the end. The simulation results show that these approaches achieve significant energy savings without sacrificing the quality of the image reconstruction.
and Energy Efficient Wavelet Image Compression algorithm (EEWIC) for lossy compression of still images, which consists of two techniques attempting to conserve energy by avoiding the computation and communication of high-pass coefficients: the “HH elimination” and “H* elimination” techniques. Another energy-efficient, power-aware image compression study [5] noted that maximum compression before transmission does not always provide minimal energy consumption, and presented a heuristic algorithm for it. The heuristic algorithm tries to minimize total energy dissipation by selecting the optimal image compression parameters under the given network conditions and image quality constraints. However, these approaches mainly focus on power-efficient techniques for individual components and cannot provide a favorable energy-performance trade-off in the case of WSN. Fundamentally different from conventional image sensing, image sequences in WSN for environmental monitoring are often characterized by low activity and high spatial correlation. These differences call for novel approaches for WSN, in particular in-network data processing to save energy consumption in transmission and computation. Based on the fast lifting scheme, we propose an energy-efficient distributed spatial-frequency wavelet transform algorithm for image compression, enabling significant reductions in the computation as well as communication energy needed, with minimal degradation in image quality. Finally, the superior performance of this algorithm is demonstrated by comparing it with several other popular image compression techniques. The paper is organized as follows. Section 2 introduces the background and the proposed algorithm. A comparison of the scheme is addressed in Section 3. Section 4 presents some preliminary results. In Section 5 we present our conclusion and discuss future work.
2 Background and Proposed Algorithm

We consider a wireless network composed of a set of stationary, battery-powered sensor nodes, developed as part of the low-power wireless sensor project at MIT (AMPS) [6]. Each sensor is equipped with CLOCK, SENSOR, ADC, LED, RADIO and APPLICATION modules. The system set-up is shown in Fig. 1.

Fig. 1. The architecture of the wireless system
74
H. Dong, J. Lu, and Y. Sun
In order to reduce the communication cost, WSNs can be organized according to a cluster architecture. For the sake of simplicity, we assume that the area under observation is a 2-D model plane where the sensors are located. Data sampled by the sensors are collected at a source sensor node and clustered by a head node, which is either a central controller or a gateway to the fixed network. The sensors and the central node are assumed to be placed as in Fig. 1, where “N” is the number of hops on the shortest path between the sources and the sink, and “n” is the number of sources, which capture the image signal from the environment. Fig. 2 shows the block diagram of the image coding with the proposed position estimation and compensation in the wavelet domain. In the proposed coding scheme, an input image signal is decomposed by the integer wavelet transform and transmitted to the cluster head node. The position estimation finds a similar block in a neighborhood sensor that matches the current block, and then obtains the position vector. A wavelet block consists of the wavelet coefficients that are related only to a local region of the image. According to the position vector, we can shift the wavelet-coefficient buffer so that there is a strong correlation between the coefficients of different sensors. Finally, we encode the similar coefficient blocks with the proposed simple wavelet compression. The residual signal can be quantized and encoded by the embedded zerotree wavelet (EZW) coder [7] or by the set partitioning in hierarchical trees (SPIHT) coder [8].
Fig. 2. Block diagram of proposed image coding scheme
2.1 Change Detection

Unlike typical multimedia video, image sequences in WSN for environmental monitoring are often characterized by low motion: no object is expected to move within the scene except in the case of anomalies. In this section we present a low-complexity algorithm that scans the image and marks the active regions within one frame, and we encode only these active regions to save energy consumption. Each input image is divided into 8×8 blocks. In order to decrease complexity, the pixels in each block are hierarchically subdivided into subsets numbered 1, 2, and 3 in order of importance [2]. The algorithm scans the values in the block according to the order of importance, computing the difference between each value and the one in the same position in the reference frame; then, it attempts to classify the difference as noise, shadow or illumination change, or object motion. Accordingly, D_i, R, U are defined as follows:
D_i = x_ori(i) − x_ref(i), (1)

R = Σ_{i=1}^{n} D_i / Σ_{i=1}^{n} x_ori(i), (2)

U = max |x_ori(i) − x_ref(i)|, i = 1, …, n. (3)
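These quantities drive the block classification of the paper's Algorithm 1 (its thresholds N, M and the sensitivity S are introduced just below). The sketch below is a simplified single-pass version: the importance-ordered scan is skipped, and the threshold values and exact comparison logic are our assumptions.

```python
import numpy as np

def classify_block(block, ref, N_t=0.05, M_t=12.0, S=4):
    """Classify one 8x8 block against the reference frame as
    'active' (object motion), 'shadow' (illumination change) or 'noise'.
    N_t, M_t, S are assumed threshold values; R uses absolute values
    for robustness (a slight simplification of eq. (2))."""
    block_f = np.asarray(block, dtype=float)
    d = block_f - np.asarray(ref, dtype=float)        # D_i, eq. (1)
    R = np.abs(d).sum() / np.abs(block_f).sum()       # relative change, eq. (2)
    U = np.abs(d).max()                               # max difference, eq. (3)
    P = np.count_nonzero(np.abs(d) > M_t)             # pixels where D_i exceeds M
    if P > S or (R > N_t and U > M_t):
        return "active"                               # encode this block
    if R > N_t:
        return "shadow"                               # moderate, uniform change
    return "noise"
```

A small uniform brightness change raises R without raising U, so it lands in the shadow/illumination class and is not encoded.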
Moreover, two values N and M are defined and used as thresholds to classify the signal. A sensitivity parameter S is defined as the maximum allowed number of “active” bits in an inactive block, and P is defined as the number of pixels for which D_i exceeds M. The simplified code is presented in Algorithm 1. The reference frame is updated by copying the marked blocks of the marked frame onto the current reference frame. The thresholds are automatically computed and updated during the analysis and encoding process [2].

Algorithm 1
  for i = 0 to n
    scan each pixel according to the order of importance
    if P > S then
      mark the block as active
      encode this block
      proceed with the next block
    end if
  end for
  calculate R
  if R > N and U > M then
    mark the block as active
    encode this block
    proceed with the next block
  end if
  if R > N and U ≤ M then
    classify the block content as shadow or illumination change
    proceed with the next block
  end if
  update the reference frame
  compute the thresholds and update them

2.2 The Wavelet Transform Based on the Lifting Scheme
The lifting scheme is an alternative method to compute wavelet transforms. It allows a faster implementation, along with a fully in-place calculation of the coefficients [10, 11]. The lifting scheme contains three stages: split, prediction and update. The data is
split into two disjoint sets, followed by a series of prediction and update steps to transform the coefficient data (Fig. 3), where d_n denotes the high-pass (detail) data and s_n denotes the low-pass data after the transform is computed.
Fig. 3. Block diagram of the lifting process
The 5/3 filter structure is an ideal choice for low-energy systems, as it greatly relaxes the computational requirements. It is given by

d_1(n) = s_0(2n+1) − ⌊(s_0(2n) + s_0(2n+2)) / 2⌋,
s_1(n) = s_0(2n) + ⌊(d_1(n−1) + d_1(n)) / 4 + 1/2⌋. (4)
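Equation (4) maps directly onto integer code. The sketch below is our own (helper names and symmetric border extension are assumptions, following common 5/3 practice); the inverse steps are included to check that the integer transform is perfectly reversible.

```python
import numpy as np

def lift53(x):
    """Forward 5/3 lifting of eq. (4); x has even length.
    d is the high-pass (detail) band, s is the low-pass band."""
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2], x[1::2]
    even_next = np.append(even[1:], even[-1])   # s0(2n+2) at the right border
    d = odd - (even + even_next) // 2           # predict step
    d_prev = np.insert(d[:-1], 0, d[0])         # d1(n-1) at the left border
    s = even + (d_prev + d + 2) // 4            # update step (+2 implements +1/2)
    return s, d

def unlift53(s, d):
    """Inverse lifting: undo the update step, then the predict step."""
    d_prev = np.insert(d[:-1], 0, d[0])
    even = s - (d_prev + d + 2) // 4
    even_next = np.append(even[1:], even[-1])
    odd = d + (even + even_next) // 2
    out = np.empty(2 * len(s), dtype=np.int64)
    out[0::2], out[1::2] = even, odd
    return out
```

Because each lifting step only adds an integer function of the other band, the inverse simply subtracts it back, which is why the integer transform is lossless.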
The wavelet transform based on the lifting scheme has received widespread recognition in the field of image processing for its notable success in coding and its very good compression performance. The proposed wavelet transform is an extension of the standard wavelet transform: the entire process is carried out by executing the standard wavelet transform twice, once inside the sensor node (to reduce the temporal redundancy), and once in the cluster node (to reduce the spatial redundancy).

2.3 Position Estimation and Compensation
The high spatial density of sensor networks induces a high level of data redundancy: spatially proximal sensor readings are highly correlated. A sensor node can compress its data based on the fact that another sensor is measuring data correlated with it. In video coding, several types of interframe prediction have been used to reduce the interframe redundancy, and motion-compensated prediction has been an efficient scheme for temporal prediction. Similarly, a position estimation and compensation method is proposed to fully exploit the inherent spatial correlations that exist between sensor readings. Different from conventional motion estimation, the proposed position compensation is executed in the wavelet domain, which can overcome the shortcoming of the shift-variant property [9]. In this section we present an algorithm for exploiting the correlation between sensors using position estimation and compensation. The correlation degree between sensors is determined by the overlapping sensing area of the correlated nodes. We consider a 2-D model for the sensing area of image sensors, illustrated by Fig. 4a. Here S1, S2 are the locations of the sensor nodes, R is the sensing radius, V is the center line of sight of the camera's field of view, which will be termed the sensing direction, and α is the offset angle of the field of view on both sides of V. Fig. 4b is the experimental
result of sensing, which illustrates the similarity of two sensor readings. We can examine every block therein and determine whether it also falls in other sensors' sensing areas, and then use a block-matching algorithm (position estimation and compensation) to reduce the spatial redundancy.
Fig. 4. a. Sensing model of image sensors. b. Experimental image.
The image block matching scheme is based on mapping coordinate locations in one image to corresponding locations in a second image. Assume that the distance between the location and the image plane is the same for all sensors and denote it by d, which characterizes the correlation between sensor readings. (x1, y1) and (x2, y2) are the locations of the block b in S1 and S2 respectively, and b′(x′, y′) is the virtual location of the block b(x2, y2) after coordinate transformation. In the 2-D model, the coordinate transformation formula is

[x′, y′, 1]^T = T · [x2, y2, 1]^T, (5)

with T the homogeneous coordinate-transformation matrix.
Here (a, b) is the location of node S2 in the S1 coordinate system. Given the locations of b and b′, we can get the position estimation vector bb′. Fig. 4b illustrates the mapping. Following the same approach we can also determine the mapping relation between S1 and Sn. It is worth emphasizing that the position estimation algorithm can be executed offline, so it is not an energy burden for the wireless sensor network. Different from the position estimation, the position compensation is executed in the wavelet domain, which can overcome the shortcoming of the shift-variant property. Like the motion compensation introduced in the MPEG standards, block-based position compensation often produces discontinuities between blocks because the neighboring motion vectors are not coherent. These discontinuities lead to high-frequency components in the residual signals and generate large wavelet coefficients in the high bands, so the coding efficiency can be degraded. The wavelet transform decomposes an image into four bands, LL, HL, LH, and HH, which are the low-low, high-low, low-high, and high-high bands along the horizontal and vertical directions, respectively. The so-called wavelet coefficient block (WCB) consists of those wavelet coefficients of an image that are
related only to a local region of the image (as shown in Fig. 5). We first perform position estimation in the spatial domain and obtain a position vector bb′ for each prediction block. Then, taking advantage of the local spatial-frequency characteristic of the wavelet coefficients, we shift the WCBs' order in the coefficient buffer according to bb′ to compensate the WCB of the prediction block with that of the reference block.
Fig. 5. a. Original image. b. Coefficients after the transform.
3 Comparison of the Scheme In order to fairly compare other distributed approach and the proposed algorithm, a cost function that takes into account both processing costs and transmission costs have to be defined. The total energy dissipated at each sensor will be split into three main components: E = E p + Et + Er
(6)
where E p is the energy consumption due to wavelet transform processing, Et , Er is energy dissipation for radio transmission and reception, which has also been developed to model by a sensor node when transmitting and receiving data [12]:
E_T = E_elec · k + ε_amp · k · d²    (7)

E_R = E_elec · k    (8)
Equations (7) and (8) give the energy dissipated to transmit a k-bit packet over a distance d and to receive the k-bit packet, respectively, where E_elec is the energy dissipated to run the transmit or receive electronics, and ε_amp is the energy dissipated by the transmit power amplifier to achieve an acceptable E/N at the receiver. We assume that the energy used to transmit a bit is equivalent to the energy used to receive a bit over very short distances. For our radio, we use the parameters E_elec = 50 nJ/b and ε_amp = 100 pJ/b/m². To determine the energy efficiency of each algorithm, we take a closer look at the computational complexity of the wavelet transform computed using lifting [10]. We analyze energy efficiency by determining the number of times certain basic operations are performed for a given input, which in turn determines the amount of switching activity, and hence the energy consumption. For the standard wavelet algorithm, in the forward wavelet decomposition using the above filter (the 5/3 filter), 2 shift and 4 add
A Distributed Wavelet-Based Image Coding for Wireless Sensor Networks
operations are required to convert a sample image pixel into a low-pass coefficient. Similarly, the high-pass decomposition requires 2 shifts and 4 adds. We model the energy consumption of the low/high-pass decomposition by counting the number of operations and denote this as the computational load. Thus, for a given input image of M × N pixels and a wavelet decomposition applied through L transform levels, we can estimate the total computational load as follows:

N_DWC = MN(8A + 4S) Σ_{l=1}^{L} 1/4^(l−1)    (9)
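The shift/add counts above come from the lifting factorization of the 5/3 filter, whose predict and update steps use only integer shifts and adds. A minimal 1-D sketch of one decomposition level (our own illustration, not the authors' code; the function name and border handling are assumptions):

```python
def lifting_53_forward(x):
    """One decomposition level of the reversible 5/3 wavelet
    transform, written as lifting steps (predict then update).
    Each output coefficient costs only shifts and adds, which is
    what the operation counts in the text are based on.

    x: even-length list of integer samples; borders use a simple
    symmetric extension (an assumption of this sketch).
    """
    n = len(x)
    half = n // 2
    # Predict step: detail (high-pass) coefficients
    d = []
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        d.append(x[2 * i + 1] - ((left + right) >> 1))
    # Update step: approximation (low-pass) coefficients
    s = []
    for i in range(half):
        d_left = d[i - 1] if i > 0 else d[0]
        s.append(x[2 * i] + ((d_left + d[i] + 2) >> 2))
    return s, d
```

On a smooth ramp such as [1..8] the detail band is nearly zero, which is why low-motion scenes yield small high-band coefficients and compress well.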
Besides various arithmetic operations, the transform step involves a large number of memory accesses. Since the energy consumed in external and internal data transfers can be significant, we estimate the data-access load by counting the total number of memory accesses during the wavelet transform. At each transform level, each pixel is read twice and written twice. Hence, under the same conditions as the above estimation, the total data-access load is given by the number of read and write operations:

N_R_DWC = N_W_DWC = 2MN Σ_{l=1}^{L} 1/4^(l−1)    (10)
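Equations (9) and (10) can be evaluated directly; a small sketch (the function name and unit-cost defaults are illustrative assumptions):

```python
def transform_loads(M, N, L, A=1, S=1):
    """Computational and data-access loads of an L-level DWT on an
    M x N-pixel image, per Eqs. (9) and (10).

    A and S are the (relative) costs of one add and one shift;
    the defaults of 1 simply count operations.
    """
    # Each level works on an image a quarter of the previous size
    geo = sum(1 / 4 ** (l - 1) for l in range(1, L + 1))
    n_dwc = M * N * (8 * A + 4 * S) * geo      # Eq. (9): arithmetic load
    n_read = n_write = 2 * M * N * geo         # Eq. (10): memory accesses
    return n_dwc, n_read, n_write
```

For the 176 × 144 image used later with L = 3 levels, this gives roughly 4.0 × 10^5 arithmetic operations and 6.7 × 10^4 reads and writes each.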
The overall computation energy is computed as a weighted sum of the computational load and the data-access load. A simple energy model can be used to model the active energy dissipation due to computation on the SA-1100 as a function of supply voltage [12]:

E_p = N C V_dd²    (11)
where N is the number of clock cycles per task, which is determined by N_DWC, N_R_DWC and N_W_DWC, C is the average capacitance switched per cycle, and
V_dd is the supply voltage. For the StrongARM SA-1100, C is approximately 0.67 nF. Asymptotically, the cost of the proposed lifting algorithm for computing the wavelet transform is one half of the cost of the standard algorithm. For the HH elimination technique in EEWIC [3], the result is given as follows:

C_R_HH = C_W_HH = (7/4) MN Σ_{l=1}^{E} 1/4^(l−1) + 2MN Σ_{l=E+1}^{L} 1/4^(l−1)    (12)

C_HH = (MN(22A + 19S)/2) Σ_{l=1}^{E} 1/4^(l−1) + MN(12A + 10S) Σ_{l=E+1}^{L} 1/4^(l−1)    (13)
where the technique is applied to the first E transform levels out of the L total transform levels [3]. To get an idea of the impact on image data quality, we also measured the distortion introduced by the wavelet algorithm. Reconstruction data quality is often measured
using a metric known as the Peak Signal to Noise Ratio (PSNR), defined (in decibels) as:

PSNR = 20 log₁₀( (2^b − 1) / sqrt(E[(x(i,j) − y(i,j))²]) )    (14)
where x(i,j) is a pixel value of the original image, y(i,j) is the corresponding pixel of the reconstructed image, and b is the bit-depth (bpp) of the original image. We recognize that the PSNR does not always accurately model perceptual image quality, but we use it because it is a commonly used metric in the literature.
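As a worked illustration of Eq. (14), PSNR can be computed from flattened pixel lists; the `psnr` helper below is our own sketch, not part of the paper's implementation:

```python
import math

def psnr(original, reconstructed, bit_depth=8):
    """PSNR in dB between two equal-sized images (Eq. 14),
    given as flat lists of pixel values."""
    mse = sum((x - y) ** 2
              for x, y in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return float("inf")          # identical images
    peak = 2 ** bit_depth - 1        # 255 for 8-bit images
    return 20 * math.log10(peak / math.sqrt(mse))
```

A uniform error of 16 gray levels on an 8-bit image, for example, gives a PSNR of about 24 dB.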
4 Simulation Results

We performed a set of experiments on a 2-D model as a proof of concept of our approach. In particular, the quality of images coded with the proposed method is studied. The Peak Signal to Noise Ratio (PSNR) in dB between the original and reconstructed signal was calculated for objective quality assessment. In the experiments, all sensors have the same parameter settings. The size of the image captured by a sensor is 176 × 144 pixels. The sensing offset angle is π/8 and the angle between the sensing directions of the two sensors is π/4. The sampling frequency of the image sensor is 1 frame/s.
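With the radio parameters given in Section 3 (E_elec = 50 nJ/b, ε_amp = 100 pJ/b/m²), the per-packet costs of Eqs. (7) and (8) used throughout these experiments can be sketched as:

```python
E_ELEC = 50e-9     # J/bit, radio electronics (Section 3)
EPS_AMP = 100e-12  # J/bit/m^2, transmit power amplifier

def tx_energy(k_bits, d_m):
    """Energy (J) to transmit a k-bit packet over d metres, Eq. (7)."""
    return E_ELEC * k_bits + EPS_AMP * k_bits * d_m ** 2

def rx_energy(k_bits):
    """Energy (J) to receive a k-bit packet, Eq. (8)."""
    return E_ELEC * k_bits

# e.g. a 1000-bit packet over 10 m costs 60 uJ to send, 50 uJ to receive
```

The quadratic d² term is what makes reducing transmitted bits (via distributed compression) pay off quickly as inter-node distance grows.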
Fig. 6. Energy difference between EEWIC and SWC without the change detection algorithm, with respect to the number of sources (n) and the degree of correlation (d)
Fig. 6 shows the difference in energy used by EEWIC and the proposed SWC without the change detection algorithm, where the number of nodes N is 1000, the sampling time T = 4 s, and the wavelet decomposition level k = 3. The number of sources (n) and the distance (d) are varied from 0 to 100 and from 0 to 10 m, respectively. Note that for the case of n = 10 sources, the energy expenditure of our algorithm changes little, because the proposed algorithm concentrates on exploiting the correlation between sensors. On the other hand, we can see that the degree of correlation in the data from different sources can be expected to be a function of the distance (d) between them. The figure shows a steeply decreasing convex curve of the energy difference that reaches saturation when the sensor distance (d) exceeds 5 m. Thus, under the experimental conditions, we
define two sensors as strongly correlated when the distance (d) between them is within [0 m, 5 m]; otherwise they are weakly correlated. To get an idea of the impact on image quality, we present a comparison of the proposed algorithm and the EEWIC algorithm. Fig. 7 shows the PSNR versus the compression ratio (CR) for a specific sensor data reading with wavelet decomposition level k = 3. As can be seen in the figure, as the compression ratio increases, the quality of the reconstructed data degrades. However, applying the SWC and WTIC techniques to the same case yields different results. For compression ratios CR < 10, there is no perceivable difference in the quality of the two approaches. But as CR increases, the reconstruction quality of the EEWIC algorithm drops sharply, and the SWC algorithm outperforms it.
Fig. 7. PSNR of reconstructed data for EEWIC and SWC with respect to CR
5 Conclusions

We have proposed a method of reducing energy consumption by using simple distributed wavelet compression for WSN image capture in low-motion scenarios. The algorithm exploits the inherent correlations between sensor readings using position estimation and compensation. We also proposed a change detection algorithm to reduce computational complexity without sacrificing the quality of the image reconstruction. Experimental results show that the proposed scheme offers not only high energy efficiency in transmission but also graceful degradation of PSNR performance in terms of image reconstruction quality. Several extensions of the problem studied in this paper are worth further investigation. The above experiments were run without the low-complexity change detection algorithm (LCDA), since the other algorithms (such as MPEG and EEWIC) are not designed to work at very low bit-rates and in low-motion scenarios (environment monitoring), so the comparison would otherwise be unfair. Additionally, we have not considered a 3-D sensing model in this work, in which the locations and sensing directions of two sensors may not lie in the same plane; whether our conclusions hold up under these circumstances remains to be seen.
Acknowledgment

This work was supported by the National Natural Science Foundation of China (No. 20206027), the Key Technologies R&D Program in the 10th Five-year Plan of China (No. 2004BA210A01), the Key Technologies R&D Programs of Zhejiang Province (No. 2005C21087 and No. 2006C31051), and the Academician Foundation of Zhejiang Province (No. 2005A1001-13).
References

1. Akyildiz, I.F., Su, W., Sankarasubramaniam, Y., Cayirci, E.: Wireless Sensor Networks: a Survey. Computer Networks, 38 (2002) 393-422
2. Magli, E., Mancin, M., Merello, L.: Low-complexity Video Compression for Wireless Sensor Networks. Multimedia and Expo, 2003 (ICME '03), Proceedings of the 2003 International Conference on, Volume 3, 6-9 July (2003) 585-588
3. Lee, D.G., Dey, S.: Adaptive and Energy Efficient Wavelet Image Compression for Mobile Multimedia Data Services. Communications, 2002 (ICC 2002), IEEE International Conference on, Volume 4, 28 April-2 May (2002) 2484-2490
4. Martina, M., Masera, G., Piccinini, G.: Embedded IWT Evaluation in Reconfigurable Wireless Sensor Network. Electronics, Circuits and Systems, 2002, 9th International Conference on, Volume 3, 15-18 Sept. (2002) 855-858
5. Huaming, W., Abouzeid, A.A.: Power Aware Image Transmission in Energy Constrained Wireless Networks. Computers and Communications, 2004 (ISCC 2004), Proceedings of the Ninth International Symposium on, Volume 1, 28 June-1 July (2004) 202-207
6. Min, R., Bhardwaj, M., et al.: An Architecture for a Power-aware Distributed Microsensor Node. Proc. IEEE Workshop on Signal Processing Systems (SiPS '00), Oct. (2000) 581-590
7. Shapiro, J.M.: Embedded Image Coding using Zerotrees of Wavelet Coefficients. IEEE Trans. on Signal Processing, 41 (1993) 3445-3462
8. Said, A., Pearlman, W.A.: A New, Fast and Efficient Image Codec based on Set Partitioning in Hierarchical Trees. IEEE Trans. Circuits Syst. Video Technol., 6 (1996) 243-250
9. Park, H.W., Kim, H.S.: Motion Estimation Using Low-band-shift Method for Wavelet-based Moving-picture Coding. IEEE Transactions on Image Processing, Volume 9, Issue 4, April (2000) 577-587
10. Sweldens, W.: The Lifting Scheme: A Construction of Second Generation Wavelets. SIAM Journal on Mathematical Analysis, 29 (1998) 511-546
11. Daubechies, I., Sweldens, W.: Factoring Wavelet Transforms into Lifting Steps. Journal of Fourier Analysis and Applications, 4(3) (1998) 245-267
12. Wang, A., Chandrakasan, A.: Energy-efficient DSPs for Wireless Sensor Networks. IEEE Signal Processing Magazine, Volume 19, Issue 4, July (2002) 68-78
Development of Secure Event Service for Ubiquitous Computing

Younglok Lee¹, Seungyong Lee¹, and Hyunghyo Lee²

¹ Dept. of Information Security, Chonnam National University, Gwangju, 500-757, Korea
[email protected], [email protected]
² Div. of Information and EC, Wonkwang University, Iksan, 570-749, Korea
[email protected]
Abstract. In ubiquitous computing, an application should adapt itself to the environment in accordance with context information. A context manager is able to transfer context information to applications by using an event service. Existing event services are mainly implemented using RPC or CORBA. However, since conventional distributed systems concentrate on transparency, hiding network intricacies from programmers as implementation details that the programmer must nevertheless implicitly be aware of and deal with, it is not easy to develop reliable distributed services. Jini provides some novel solutions to many of the problems that classical systems have focused on, and makes some of the problems that those systems have addressed simply vanish. But there is no event service in Jini. In this paper, we design and implement a secure event service, SeJES, based on Jini in order to provide a reliable ubiquitous environment. By using the proposed event service, event consumers are able to retrieve events based on their content. In addition, it enables only authorized suppliers and consumers to exchange events with each other. We use SPKI/SDSI certificates in order to provide authentication and authorization, and extend the JavaSpaces package in order to provide a content-based event retrieval service.
1 Introduction

In a ubiquitous computing environment, an application should be able to properly adapt itself according to its own context information coming from ubiquitous sensors. Most existing communications are based on the request-reply communication model. However, many ubiquitous computing applications require a more flexible, indirect or asynchronous communication mechanism. The Event Service [1] is one that can be used for such asynchronous communications. By using the CORBA Event Service, a number of event suppliers and consumers can communicate asynchronously even with no background knowledge of each other. Suppliers and consumers never directly connect to each other; they communicate through the event channel.
Generally, the Event Service of CORBA (Common Object Request Broker Architecture) can be applied to many applications. However, when using the event service of CORBA, problems such as persistency and filtering have to be solved. Also, since existing distributed systems such as CORBA cannot provide a reliable environment for developing distributed services, Jini emerged to solve this problem. In addition, several features and services of Jini support the characteristics of a ubiquitous computing environment. Among those Jini services is JavaSpaces [2]; but there is no event service in Jini. JavaSpaces™ is a networked repository for Java objects, which provides methods for sharing and transferring objects even if Jini applications have no knowledge of each other. Even though applications can utilize the object-transferring functions provided by JavaSpaces to communicate asynchronously, two problems remain. One is that the remote listener which receives an event from JavaSpaces has to call a read() method on the JavaSpaces proxy. The other is that the event acquired by that remote read() method is not guaranteed to be the latest. By modifying and extending JavaSpaces, we implemented JES (JavaSpaces based Event Service) [3]; but there is no security service in JES. In this paper, we implement SeJES (Secure JavaSpaces based Event Service) by extending JES to be more applicable to ubiquitous environments. No matter what degree of computing power event consumers have, the proposed SeJES provides secure communication and content-based filtering services. Our SeJES performs basic but essential security services such as authentication and authorization using SPKI/SDSI certificates as well. Also, our SeJES provides enhanced security by transmitting a renewed secure session key to event consumers and suppliers, along with some QoS functions.
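The content-based filtering that JES/SeJES builds on follows the JavaSpaces template-matching idea, where unset template fields act as wildcards. A simplified sketch (the dictionary-based `matches` helper is only an illustration of the idea, not the actual JES API, which is Java):

```python
# JavaSpaces-style template matching: an event matches a template
# when every non-None template field equals the corresponding
# event field; None fields in the template act as wildcards.
def matches(template, event):
    return all(event.get(k) == v
               for k, v in template.items() if v is not None)

events = [
    {"type": "location", "subject": "Alice", "room": "101"},
    {"type": "location", "subject": "Bob", "room": "202"},
]
# Consumer asks for Alice's location events, any room
template = {"type": "location", "subject": "Alice", "room": None}
hits = [e for e in events if matches(template, e)]
```

Only Alice's event survives the filter, so only the interested consumer is notified rather than every listener on a channel.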
The rest of this paper is organized as follows: Section 2 briefly reviews related work, and Section 3 explains the system model of the SeJES and briefly describes how each component of the system model is implemented. Section 4 describes how the SeJES is implemented; in this section, we define the interfaces furnished to consumers and suppliers that use the event service, and explain the implementation scenario applied to our system. Finally, conclusions and further research work are given in Section 5.
2 Related Work

Many CORBA vendors have developed Event Services compliant with the OMG specification. Some have even added their own functionality to overcome the drawbacks, depending on the application domain in which the Event Service is to be used. This section describes some commercial and academic event models that are CORBA based, and looks at their advantages and drawbacks [4]. IONA Technologies provides two different message service products: OrbixEvents [5] and OrbixTalk [6]. OrbixEvents is a C++ implementation of the OMG CORBA Event Service and the only commercially available Event Service that supports
both untyped and typed events for the push and pull communication models. OrbixTalk is used for distributing IDL-based operations over UDP using either simple or reliable multicasting, which is ideal in systems that have many consumers and suppliers, since with UDP there is no need to maintain a connection between each consumer and supplier. However, since it is based on UDP, it cannot interoperate with any other ORB system. jEVENTS [7] is a Java implementation of the CORBA Event Service for untyped messages, produced by a company called OutBack Resource Group Inc. jEVENTS supports both push- and pull-style communication between suppliers and consumers, is IIOP compliant, and may be used with any ORB that supports IIOP. The VisiBroker Event Service is available in both C++ and Java versions from Inprise Corporation. Both versions are compliant with the OMG CORBA Event Service, implementing untyped events for the push and pull communication models. When used with the VisiBroker ORB and its vendor-specific Smart Agent architecture, it becomes a highly available, self-recovering service. ICL has developed a multicast Event Service called DAIS [8]. A multicast service was developed due to the requirements of a customer that needed to communicate sixty messages per second, each a few hundred bytes in size, to over eight hundred consumers. With these requirements, a standard CORBA Event Channel would have to produce nearly half a million individual messages per second, clearly infeasible for a distributed system. To minimize the amount of network traffic between supplier and consumer applications, messages are collected into packets and sent in a single UDP packet. However, existing distributed systems such as CORBA do not consider transfer delay and system performance as part of the programming model. They also treat problems concerning network programming as something the programmer should deal with by himself.
Therefore, existing distributed systems have difficulty providing reliable distributed services. But Jini [11], a ubiquitous middleware, provides some novel solutions to many of the problems that classical systems have focused on, and makes some of the problems that those systems have addressed simply vanish. Jini supports serendipitous interactions among services and users of those services. Jini allows services to come and go without requiring any static configuration or administration. Also, communities of Jini services are largely self-healing [12]. But in Jini, there is no event service.
3 System Model

The system model of the SeJES, consisting of four components, can be distributed among hosts across the network as shown in Figure 3.1. The main function of the SeJES is that only authorized event consumers and producers can exchange events, by verifying their identities and capabilities. The event service registers itself with the lookup service, which allows event producers and consumers to get the event service proxy. We implement each component of the model described above as follows:
[Fig. 3.1. System Model of SeJES — four hosts: a Lookup Service (Host A), the Event Service (Host B), a Consumer Application (Host C), and a Supplier Application (Host D). (1) The event service registers itself with the lookup service; (2) applications look up the event service; (3) applications communicate with events after checking securities.]
• Discovery Service – Event producer and event supplier applications discover the event service by using the Lookup Service. We use reggie, developed by Sun, which complies with the discovery service specification of Jini.
• Secure Event Service – Our SeJES (Secure JavaSpaces based Event Service) consists of four components (LRC, JES, SSCM, and ERC) as shown in Figure 3.2. The major tasks of the LRC (Listener Registration Controller) are not only to retrieve events based on event types and contents, but also to register consumers' listeners with the JES (JavaSpaces based Event Service) so that only authorized consumers can listen to events. The JES plays a central role in notifying its registered consumers of events provided by event suppliers and in storing the events for content-based retrieval. The SSCM (SPKI/SDSI Certificate Manager) proves whether the certificate list given by a consumer is correct or not, and then returns the ACLs (Access Control Lists) related to the events which consumers wish to get. The ERC (Event Registration Controller) registers the event types and ACLs sent by event suppliers with the SSCM, and checks whether supplied events are authorized or not.

[Fig. 3.2. Functional Components of SeJES — consumers call retrieval(), registerConsumer(), and registerListener() on the LRC; the LRC calls validateCert() on the SSCM and registerListener() on the JES; producers call registerProducer() and publish() on the ERC, which calls validateCert()/store() on the SSCM and write() on the JES; the JES provides notify().]
• Event consumer – Event consumer applications are of two types: one runs on machines with powerful computing capability, and the other on low-cost equipment. In the former case, the event consumer owns the certificate chain discovery algorithm and directly calculates the certificate paths which prove that it may get the event, and then delivers them to the SSCM. The latter sends all of its name certificates and authorization certificates to the SSCM, which discovers the certificate chain lists that authorize the consumer to get the event. After finding the event service through the discovery service, event consumer applications are able to retrieve events based on content. Furthermore, a consumer can be notified of what it wants in real time by registering a remote listener with the event service.
• Event supplier – Before sending event object instances to the SeJES, the event supplier sends an event type and the ACL corresponding to that event type. After getting a secure session ID from the SeJES, the event supplier asks the SeJES whether it may send an event by using the session ID; if permitted, it sends the event object instance.
4 The Design and Implementation of SeJES

4.1 Components of the SeJES System

This section describes the functions of each component of our SeJES and explains the interfaces provided by each module. In addition, we define the ACLs which event suppliers provide and the SPKI/SDSI certificates [9] which event consumers use. Finally, we show the details of SeJES operation by providing an event service usage scenario.

4.1.1 ACL and SPKI/SDSI Certificate

An ACL (Access Control List) is a form of expressing security policy that defines which event supplier (issuer) delegates authorization to the event consumers (subjects) who will get its events:

< issuer, subject, delegation-bit, authorization tag, validity >

The event consumer is granted the following SPKI/SDSI name and authorization certificates by the event supplier:

Name certificate -
Authorization certificate -

Figure 3.3 shows the S-expression of the authorization certificate by which Bob grants the authority "get event 2" to a subject called XMan in his local name space, from Nov. 20, 2005 to June 18, 2006. Name and authorization certificates are sent to the event service, together with their event types, when an event consumer registers its remote listener with the event service. That is, after finding an ACL which fits an
Fig. 3.3. S-expression of authorization certificate
event type by calling SeJES.retrieval() on the SeJES proxy, the event consumer invokes SeJES.registerConsumer() with the first input "eType" and the second input "certificate-Path".

4.1.2 LRC

The LRC (Listener Registration Controller) is responsible for registering the event listeners which event consumers wish to register with the JES. Using parameter information provided by the event consumer, the LRC retrieves the ACLs corresponding to the event type and provides methods which can retrieve events based on event content. It requests an authorization check by sending the event type and a bundle of certificates to the SSCM, and decides whether to register the event listener depending on the returned value.
ListenerRegistrationController
  retrieval(eType)
  retrieval(eTemplate, principal, sessionID)
  registerConsumer(eType, certificate-Path)
  registerConsumer(eType, certificate-List)
  registerListener(eTemplate, Listener, principal, sessionID)

Fig. 3.4. Interfaces of the LRC
• retrieval(eType), retrieval(eTemplate, principal, sessionID)
An event consumer invokes the method retrieval(eType) on the LRC proxy so that it can determine whether it has authorization concerning the event type "eType". After
finding the ACL which represents the appropriate authorization related to the event type, this method returns the ACL to the consumer as its result value. In addition, after checking authorization, the consumer can invoke the method retrieval(eTemplate, principal, sessionID) of the event service proxy in order to retrieve events based on content. Using the three parameters, this method proves that the consumer is properly authorized, updates the session key to a new value, and returns the events which coincide with the eTemplate to the consumer.

LRC.retrieval(eType)
  Input:  eType
  Output: eSessionKey, lease-Time

LRC.retrieval(eTemplate, principal, sessionID)
  Input:  eTemplate, principal, eSessionID
  Output: Object-List
• registerConsumer(eType, cert-Path), registerConsumer(eType, cert-List)
An event consumer running on a machine with computing power calls the method registerConsumer(eType, certificate-Path) on the LRC proxy in order to send the first parameter "eType" and the second parameter "certificate-Path", by which the event consumer can prove its authorization for the event type "eType", to the LRC. After verifying that the certificate path is valid, the LRC creates a session key and encodes it. Then it stores the event type, the public key of the event consumer, the just-created session key, a nonce, and the lease-time in order to check the consumer's authorization later on. As result values, this method returns the encoded session key and the lease-time to the consumer.

LRC.registerConsumer(eType, certificate-Path)
  Input:  eType, certificate-Path
  Output: eSessionKey, lease-Time
However, an event consumer running on low-cost equipment calls registerConsumer(eType, cert-List) to send all of the certificates which it owns.
• registerListener(eTemplate, Listener, principal, sessionID)
After checking the consumer's authorization, and if the result is true, the LRC updates the nonce and registers the remote listener with the JES.

LRC.registerListener(eTempl, Listener, principal, sessionID)
  Input:  eTempl, Listener, principal, sessionID
  Output: boolean
4.1.3 SSCM

Using the certificate list sent by an event consumer on low-cost equipment, the SSCM proves that the consumer may receive event instances of the event type. It also checks the proof of the certificate-Path sent directly by an event consumer on a machine with powerful computing capability. To this end, the SSCM includes the certificate chain discovery algorithm [10] and returns a Boolean value for each case to the LRC (Listener Registration Controller). The SSCM interfaces are shown in Figure 3.5.
• storeACLs(eType, ACLs, leaseTime)
The ERC calls this method of the SSCM module in order to store an ACL, which is a collection of authorizations for consumers to obtain events provided by an event supplier.
• validateCert(eType, Certificate-Path), validateCert(eType, Certificate-List)
These methods are invoked by the LRC in order to request an authorization proof for an event consumer. The first takes a Certificate-Path parameter, a collection of certificates proven by the consumer, and the second takes a Certificate-List, a collection of all name and authorization certificates held by the consumer.
• getACLs(eType)
This method finds and returns the ACL which is necessary for consumers to obtain events of the designated type.

4.1.4 ERC

The ERC (Event Registration Controller) is responsible for checking the proof of the event types and ACLs which are provided by event suppliers. It also checks for the replication of events. Furthermore, the ERC has the responsibility of transferring events
provided by event suppliers to the JES. The interfaces provided by the ERC are shown in Figure 3.6.
• registerProducer(eType, ACLs, certificate-Path)
By sending an event type "eType", ACLs and the authorization proof information "certificate-Path" to the SSCM, this method lets the SSCM store them in its cache. Furthermore, this method creates an encoded session key from two parameters: the producer's principal, derived from the certificate-Path, and a session key, a random nonce. This encoded session key is used to prove whether an event is authorized to be provided to the JES or not.
• publish(event, principal, sessionID)
This is the method that publishes a supplier's event. It checks whether the event sent by the supplier is authorized, and only if the return value is true does it send the event to the JES.

4.2 Testing of the SeJES Service

This section summarizes how a consumer and a supplier use our SeJES service, implemented in the Jini environment. In addition, it shows a GUI screen that indicates whether a listener's registration with the SeJES, made in order to get the events the consumer is interested in, is successfully committed or not. It also exhibits a GUI screen including the procedures and results necessary for suppliers to publish events.
In the meantime, unauthorized Charlie also tries to register his remote listener to get the event1 of sensor1. In this case, since he can’t prove his authorization, his request is rejected. Purchasing sensor2, Eve tries to use event service in order to publish event2, but it is rejected as well. Figure 4.1 shows the order of methods invoked in order for event consumer to be notified and for supplier to provide the event.
[Fig. 4.1. Method invocation sequence among the event consumer on gadget 1 (holding SPKI/SDSI certificates and a remote event listener), the SeJES, and a web server.]
Figure 4.2 exhibits a screen dump showing that the consumer Bob registered his listener with the SeJES in order to get "Alice location" events, and that the SeJES notified Bob of those events.
Figure 4.3 shows that the SSCM proves whether the certificate-path provided by the event consumer “Bob” is correct or not.
Fig. 4.3. Proving process of authorization certificate-path
Figure 4.4 shows the event publishing log of Susan's event producer, gadget1.
Fig. 4.4. Example screen of event publishing
5 Characteristics of Our SeJES System

SeJES is distinguished from CORBA in three respects. First, the event service based on CORBA uses an event channel; accordingly, it is not a totally isolated model, because event consumers and event suppliers communicate with each other through the channel. Second, the event service based on CORBA transfers its filtering responsibility to consumer programmers. Therefore, any events written to the channel are transferred to every consumer listening to the channel, which is inefficient because it incurs excessive communication costs. Our system, however, can deliver only the necessary events to a particular consumer in need, because content-based filtering using JavaSpaces is available in the SeJES system. Finally, there is no persistency in the CORBA-based event service. If the channel goes down, it loses all information concerning the consumers and suppliers connected to the channel. For that reason, there is no
way for consumers to obtain events provided while the channel is disconnected, even after they have successfully reconnected. However, our SeJES system is able to do so because of its lease() function.
6 Conclusions and Further Work

In this paper, we designed and implemented a secure event service, SeJES. Event consumers can register their listeners with the SeJES and disconnect them at any time. Furthermore, the SeJES allows only authorized consumers and suppliers to exchange events securely, using SPKI/SDSI certificates. In addition, SeJES guarantees that events are delivered only to authorized event consumers, and it enhances security by delivering a secure session key to event consumers and suppliers. The proposed service can store an event for its lease time, during which an event consumer can perform content-based filtering. However, SeJES cannot yet federate event servers and does not provide prioritized event service. In future work, we plan to extend SeJES with additional QoS and filtering functions.
Energy Efficient Connectivity Maintenance in Wireless Sensor Networks Yanxiang He and Yuanyuan Zeng School of Computer, Wuhan University, 430072, Hubei, P.R. China [email protected], [email protected]
Abstract. Connectivity maintenance in energy-stringent wireless sensor networks is a very important problem. Constructing a connected dominating set (CDS) is a widely used topology-control strategy for reducing network communication overhead. In this paper, a novel energy-efficient distributed backbone construction algorithm based on a connected dominating set is presented; it keeps the network connected while prolonging the network lifetime and balancing energy consumption. The algorithm has O(n) time complexity and O(n) message complexity. The results show that our algorithm outperforms several existing algorithms in terms of network lifetime and backbone performance.
backbone. The length of this period depends on the residual energy of the MCDS nodes. When the period expires, the sensor nodes coordinate with each other again and compute the next MCDS as the backbone for the next period of time. In this way, the energy consumption of all nodes in the network is balanced and the lifetime of the network is extended. The remainder of this paper is organized as follows. Section 2 briefly introduces the related work in the literature. Section 3 discusses our distributed algorithm for constructing the CDS as a backbone to maintain connectivity. Section 4 presents the performance analysis and simulations. Section 5 is the conclusion and future work.
2 Related Work

In the last few years, researchers have actively explored power-conserving topology control approaches for wireless sensor networks, and extensive work has been done on the connectivity maintenance issue. Research in [4] focused on energy conservation by controlling sensor transmission power so as to maintain network connectivity; it demonstrated that connectivity can be maintained if each sensor has at least one neighbor in every cone of angle 2π/3. Xu et al. [5] proposed two algorithms that conserve energy by identifying nodes redundant for connectivity. In GAF [6], nodes use geographic location information to divide the plane into fixed square grids; nodes within a grid switch between sleeping and listening, with the guarantee that one node in each grid stays up to route packets. SPAN [7] is another protocol that achieves energy efficiency for wireless sensor networks by introducing off-duty and on-duty cycles for sensor nodes. Dominating-set-based topology control yields a virtual backbone for the deployed ad hoc or sensor network, formed by taking the connected routing nodes as a connected dominating set (CDS). Since the minimum CDS problem is NP-hard, most previous work has focused on distributed heuristics for reducing the size of the CDS. Current MCDS approximation algorithms are either centralized or distributed; with the increased interest in wireless ad hoc and sensor networks, many distributed approaches have been proposed because they require no global knowledge of the network topology. These algorithms fall into two types. One type first finds a CDS and then prunes redundant nodes to approach an MCDS. Wu and Li proposed in [11] a distributed algorithm of this type with O(m) message complexity and O(Δ³) time complexity, whose approximation factor is at most n/2. Butenko et al. [12] construct a CDS starting from a feasible solution and recursively remove nodes until an MCDS approximation is found. The other type first forms a maximal independent set (MIS) and then finds connectors to join the independent nodes. Wan et al. [9] propose a distributed algorithm with a performance ratio of 8. Min et al. [11] propose an improved algorithm that employs a Steiner tree in the second step to connect the MIS nodes, with a performance ratio of 6.8.
3 Distributed Algorithm for Energy Efficient MCDS

When all sensor nodes have the same transmission range, the network topology is modeled as a unit disk graph (UDG) G = (V, E). An edge (u, v) indicates that nodes u and v are within each other's transmission range. Each node u has a unique ID, denoted id(u) (for instance, its IP or MAC address). The aim of our algorithm is to compute a sub-optimal MCDS to serve as a backbone for the wireless sensor network.

Our algorithm consists of two phases. In the first phase, we compute a maximal independent set (MIS) of the network graph. An independent set of a graph is a subset of V in which no two nodes are adjacent; an MIS is an independent set to which no further node of V can be added. An MIS is a dominating set (DS) of the graph, but this DS may not be connected. The second phase of the algorithm therefore chooses a minimal number of additional nodes (called connectors) to make the DS connected, i.e., to obtain a CDS.

Each time a CDS is constructed, its operating time is determined by the residual energy of the CDS nodes; when this operating time expires, the next CDS is computed. To extend the lifetime of the network, we always give higher priority to nodes with higher residual energy to act as backbone nodes. Nodes thus take turns acting as backbone nodes, and the energy consumption of the nodes is well balanced. For each node u, we define the weight w(u) = {energy(u), degree(u)}. The more significant component of w(u) is the residual energy of u; when two nodes have the same energy, the node with the higher degree has higher priority. This policy keeps the size of the CDS small (under the energy-balance condition).

3.1 MIS Construction

No two nodes of an MIS can be adjacent; that is, once a node is in the MIS, none of its neighbors can be included. We use colors to indicate whether a node is in the MIS.
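The weight w(u) is compared lexicographically, which can be expressed directly as a tuple comparison. A minimal Python sketch (the node fields and values are illustrative, not taken from the paper):

```python
def weight(node):
    # w(u) = (energy(u), degree(u)): residual energy is the more significant
    # field; degree breaks ties between nodes of equal energy.
    return (node["energy"], node["degree"])

# Illustrative nodes (field values invented for the example)
u = {"id": 1, "energy": 0.8, "degree": 3}
v = {"id": 2, "energy": 0.8, "degree": 5}
x = {"id": 3, "energy": 0.9, "degree": 1}

best = max([u, v, x], key=weight)   # x wins: higher energy dominates degree
```

Between u and v, which have equal energy, the higher-degree node v wins, exactly as the tie-breaking rule prescribes.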
The algorithm starts from a node that initiates (invokes) its execution, called the initiator. Black indicates MIS nodes and grey indicates non-MIS nodes. Each node is in one of four states: white, black, grey, or transition. Initially all nodes are white; at the completion of the algorithm every node in the network must be either black (an MIS node) or grey (a non-MIS node). A node changes state in response to the messages it receives. There are three types of messages: 1) a BLACK message, sent out when a node becomes black; 2) a GREY message, sent out when a node becomes grey; 3) an INQUIRY message, sent out when a node inquires about the weights and states of its neighbors. Each message carries the sender's state, id, and weight. At the start of the algorithm, the initiator colors itself black. A node that colors itself black broadcasts a BLACK message to its neighbors (to announce itself as an MIS node). A neighbor that receives a BLACK message becomes grey (a non-MIS node) and broadcasts a GREY message to its neighbors. A node that receives a GREY message must compete to become black, so it broadcasts an INQUIRY message to its neighbors to learn their states and weights, and sets a timeout to wait for the replies. During this timeout period the node is in the transition state, because it cannot determine whether it would become
black or grey. If, based on the replies from all its neighbors, it finds that it has the highest weight among its transition-state neighbors, it changes its color to black and behaves as the other black nodes do. If the node is still in the transition state when the timeout expires, its color changes back to white. The algorithm is fully distributed, and all nodes execute it concurrently. Any node whose neighbors are all colored black or grey terminates; the MIS construction ends when every node has terminated.

MIS construction algorithm:

initiator() {
    Color itself black;
    Broadcast a BLACK msg; }

Each node i responds to the messages it receives:

MIS-algorithm {
    Receive a msg;
    If i is black/grey then
        Ignore the msg;
        If its neighbors have no white neighbors then Return; end if
    else
        Switch on message-type {
        BLACK:   Color itself grey; Broadcast a GREY msg;
        GREY:    Broadcast an INQUIRY msg;
                 Enter the transition state;
                 Set a timeout waiting for replies;
                 If w(i) is the highest then
                     Color itself black; Broadcast a BLACK msg; end if
                 If still in transition after the timeout then
                     Color itself white; end if
        INQUIRY: Reply with its own color and w(i);
        } end Switch
    end if }

Theorem 1: The set B of black nodes computed by the first-phase algorithm is an MIS of the network graph.

Proof: The algorithm colors the nodes of the graph layer by layer, propagating out from the initiator until it reaches all nodes in the network, with one layer black and the next grey. At each layer, black nodes are selected by the grey nodes of the previous layer and are marked black. The construction incrementally enlarges the black node set by adding black nodes two hops away from the previous black node set. Also the
newly colored black nodes cannot be adjacent to one another, because of the interleaving layers of black and grey nodes. Hence no two black nodes are adjacent, which implies that B is an independent set. Further, the algorithm ends with only black or grey nodes, and each grey node has at least one black neighbor; thus coloring any grey node black would destroy independence. Hence B is a maximal independent set.

Theorem 2: Consider the propagation layers of the MIS construction, and let Bi and Gi be the sets of nodes marked black and grey at the ith layer. Every MIS node in Bi has a neighbor in Gi that also connects it to at least one MIS node in Bi+1.

Proof: Any node g ∈ Gi is a non-MIS node formed at the ith layer; during the construction it was marked grey (from the white state) on receiving a BLACK message from a black neighbor in Bi. After determining its state, the grey node g sends a GREY message to all its neighbors in the (i+1)th layer, and the neighbor that finds itself with the highest weight among its transition-state neighbors becomes a black node in Bi+1. Hence the non-MIS node g ∈ Gi is adjacent to at least two MIS nodes, one in Bi and one in Bi+1. So every MIS node in Bi has a neighbor in Gi connecting it to at least one MIS node in Bi+1.

3.2 Connected Dominating Set Construction

In this phase we interlace the selection of interconnecting nodes (called connectors) with the MIS construction: connectors between black nodes are established in an interlaced fashion while the MIS is being built. When all grey neighbors of a black node have terminated the MIS procedure (i.e., the first-phase algorithm has finished for the black node and its grey neighbors), the black node enters the second phase, CDS construction, to find connectors.
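Before turning to the connector phase, the first-phase layered coloring can be mimicked by a centralized greedy sketch. This is an assumption-laden simplification: it reproduces the selection result (the highest-weight white node adjacent to the grey frontier turns black) but not the message passing or timeouts of the distributed algorithm:

```python
def build_mis(adj, weight, initiator):
    """Centralized mimic of the layered MIS coloring.
    adj: dict node -> list of neighbors; weight: dict node -> (energy, degree)."""
    color = {v: "white" for v in adj}
    color[initiator] = "black"            # initiator colors itself black first
    for n in adj[initiator]:
        color[n] = "grey"                 # its neighbors become non-MIS nodes
    while any(c == "white" for c in color.values()):
        # Competitors: white nodes adjacent to the grey frontier
        cands = [v for v in adj if color[v] == "white"
                 and any(color[n] == "grey" for n in adj[v])]
        if not cands:                     # frontier cannot advance (disconnected graph)
            break
        v = max(cands, key=lambda c: weight[c])   # highest weight wins the competition
        color[v] = "black"
        for n in adj[v]:
            if color[n] == "white":
                color[n] = "grey"
    return {v for v in adj if color[v] == "black"}
```

On a 5-node path with the initiator at one end, the sketch yields the alternating black layers the proof of Theorem 1 describes.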
Clearly, the CDS phase also starts from the MIS initiator, because of the propagation order of the MIS procedure. Our main idea is to employ a Steiner tree to connect the nodes of the MIS. In a graph, a Steiner tree spans a given subset of nodes, called terminals; every node of the tree other than a terminal is called a Steiner node. Here the constructed MIS nodes are the terminals, and the connectors selected from the non-MIS nodes are the Steiner nodes; the internal nodes of the Steiner tree form a CDS. We aim to select a small number of Steiner nodes with higher power from among the non-MIS nodes, in order to obtain an efficient CDS. We use a greedy approximation: every black node selects as a connector the grey node with the largest number of black neighbors; if two grey nodes have the same number of black neighbors, the one with the higher energy level has priority. Each MIS node is in one of three states: black, transition, or blue. Each non-MIS node is in one of three states: grey, compete, or blue. Black and grey are the initial states of the CDS procedure (entered after finishing the MIS procedure), and blue is the final state, indicating that a node is in the CDS. Transition and compete are intermediate states in which a node cannot yet decide whether it is a CDS node. There are three types of messages: 1) an INQUIRY message, sent out when a black node asks its grey neighbors for their states and numbers of black neighbors; 2) an INVITE message, sent
out to invite a grey neighbor to be a connector; 3) a BLUE message, sent out when a node turns blue. At the start of the algorithm, the MIS initiator colors itself blue. A node that colors itself blue broadcasts a BLUE message to its neighbors (to announce itself as a CDS node). The black and grey nodes (after finishing the MIS procedure) then execute the corresponding state transitions. A node that is initially black broadcasts an INQUIRY message and enters the transition state once it and its grey neighbors have finished the MIS procedure; it sets a timeout to wait for the replies. The node stays in the transition state during this period because it cannot yet determine which node should be selected as a connector. If a node in the transition state receives a BLUE message, it enters the blue state: it already has a grey neighbor acting as a connector. Otherwise, a node that still has no connector neighbor tries to select one. The selection is based on the replies to the INQUIRY message, which carry the number of black neighbors and the energy level of each grey neighbor. The connector selection rules are: 1) the selected neighbor must be adjacent to at least one blue node; 2) the selected grey node must have the largest number of black neighbors, with node energy level as the tie-breaker (the higher-energy node wins). Rule 1 ensures that the constructed CDS merges into a single connected component of the Steiner tree; rule 2 keeps the CDS small and energy efficient. The purpose of the transition state is to wait for the INQUIRY replies from the neighbors and then decide on a connector. When the node has selected a grey neighbor satisfying the two conditions above, it sends that neighbor an INVITE message and changes itself to the blue state.
On entering the blue state, a node broadcasts a BLUE message to announce itself as a CDS node. A node that is initially grey responds to the messages it receives. A grey node that receives a BLUE message updates its information about its number of black neighbors. A grey node that receives an INQUIRY message replies to the sender with its number of black neighbors and its energy, and then enters the compete state; the purpose of the compete state is to probe the network to see whether the node is suitable as a connector. If a grey node in the compete state receives an INVITE message, it is invited to be a connector and colors itself blue; on entering the blue state, it broadcasts a BLUE message to its neighbors. The CDS construction continues until: 1) every MIS node that is colored blue and has no white neighbors has terminated; and 2) every non-MIS node whose neighbors are all colored blue or grey has terminated. The CDS algorithm ends when every node has terminated.

CDS construction:

Initiator() {
    Color itself blue;
    Broadcast a BLUE msg; }

Each node i executes the operation according to its state:

CDS-algorithm {
    Switch on state-type {
    Black:      If all neighbors terminate then
                    Broadcast an INQUIRY msg;
                    Enter the transition state;
                end if
    Grey:       Receive a msg;
                If it is an INQUIRY msg then
                    Reply with its black neighbor number and energy(i);
                    Enter the compete state;
                end if
                If it is a BLUE msg then
                    Update its black neighbor number and energy(i);
                If all neighbors are colored blue or grey then Return;
    Transition: Set a timeout waiting for replies;
                Receive a msg;
                If it is a BLUE msg then
                    Color itself blue;
                else
                    Find a neighbor as a connector;
                    Send an INVITE msg to that node;
                    Color itself blue;
                end if
    Compete:    Receive a msg;
                If it is an INVITE msg then
                    Color itself blue;
    Blue:       Broadcast a BLUE msg;
                If all neighbors are colored blue or grey then Return;
    } end Switch }

Theorem 3: The set of blue nodes computed by the algorithm is a CDS of the network graph.

Proof: The set of blue nodes consists of the MIS nodes and the connectors. The MIS is a dominating set, so we only need to prove connectivity. Let {b0, b1, …, bn} be the independent set, with elements listed in construction order, and let Hi be the graph over {b0, b1, …, bi} (i ≤ n) in which pairs of nodes are interconnected by connectors. We prove by induction on j that Hj is connected. H1 consists of a single node, so it is trivially connected. Assume that Hj-1 is connected for some j ≥ 2. Considering the message propagation layers of our algorithm, let Bi-1 and Gi-1 be the sets of nodes marked black and grey at the (i-1)th layer, respectively. The grey node in Gi-1 with the largest number of black neighbors that is adjacent to a blue node is selected as a connector; by Theorem 2, this suffices to find grey nodes that interconnect the Bi-1 nodes of the (i-1)th layer with the Bi nodes of the ith layer. Since Hj-1 is connected, so is Hj. Thus the nodes of the MIS and the connector set are connected together, and they also form a dominating set. Therefore the set of blue nodes computed by the algorithm is a CDS.
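The connector-selection rules can likewise be mimicked centrally. The sketch below grows the blue set greedily from the initiator, inviting the non-MIS node that touches the blue set (rule 1) and covers the most still-unconnected MIS nodes (rule 2, ties broken by residual energy); it reproduces the outcome of the phase, not the distributed message protocol:

```python
def connect_mis(adj, mis, energy, initiator):
    """Centralized mimic of the connector (Steiner-node) phase.
    Returns the CDS = MIS nodes plus invited connectors."""
    blue = {initiator}               # the initiator (an MIS node) starts blue
    pending = set(mis) - blue        # MIS nodes not yet joined to the blue set
    while pending:
        best = None
        for g in adj:
            if g in mis or g in blue:
                continue
            if not any(b in blue for b in adj[g]):   # rule 1: adjacent to blue
                continue
            gain = len(pending & set(adj[g]))        # rule 2: black coverage
            if gain and (best is None or (gain, energy[g]) > best[0]):
                best = ((gain, energy[g]), g)
        if best is None:
            break                    # remaining MIS nodes unreachable
        g = best[1]
        blue.add(g)                  # invite g as a connector
        newly = pending & set(adj[g])
        blue |= newly                # its black MIS neighbors join the backbone
        pending -= newly
    return blue
```

On the 5-node path with MIS {0, 2, 4}, the sketch invites nodes 1 and 3 as connectors, yielding a connected dominating set, as Theorem 3 asserts.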
4 Performance Evaluation and Simulations

We first analyze the message and time complexity of our distributed algorithm. Since each node sends out a constant number of messages, the total number of messages is O(n), and handling a linear number of messages takes at most linear time.

Theorem 4: Our distributed algorithm has O(n) time complexity and O(n) message complexity.

Next, we analyze the size of the energy efficient CDS, using the following properties of independent sets:

Lemma 1: In a unit disk graph, every node is adjacent to at most five independent nodes.

Lemma 2: In any unit disk graph, the size of every maximal independent set is upper-bounded by 3.8opt + 1.2, where opt is the size of the minimum connected dominating set of the graph.

Theorem 5: In the CDS construction phase, the number of connectors does not exceed 3.8opt, where opt is the size of the MCDS.

Proof: Let B be the independent set and S the connector set of a graph. From Lemma 2, |B| ≤ 3.8opt + 1.2. From Theorem 2 and Lemma 1, each connector has between 2 and 5 black neighbors. The worst case occurs when all nodes are distributed along a line; by analyzing this extreme situation, the number of grey connecting nodes must be less than the number of MIS nodes (details omitted), so |S| < |B|, and the number of output connecting nodes does not exceed 3.8opt.

Theorem 6: The approximation factor of our algorithm does not exceed 7.6.

Proof: Our distributed algorithm has two phases: MIS construction, and CDS formation via Steiner nodes. By Lemma 2 the performance ratio of the first phase is 3.8, and by Theorem 5 the performance ratio of the second phase is 3.8, so the resulting CDS has size bounded by 7.6opt.

In our algorithm, nodes with higher power have a greater chance of becoming CDS nodes, and the reconstruction mechanism balances energy consumption across the network as energy levels change.
Our algorithm guarantees that the CDS nodes have good energy efficiency, extending the network lifetime. In the simulations, the network size ranges from 100 to 300 nodes in increments of 50; nodes are randomly placed in a 160×160 square area so as to generate connected graphs. The radio transmission range is 30 m or 50 m, and each node starts with an initial energy of 1 Joule (J). A simple radio model is used: Eelec is the energy for actuation, sensing, and signal emission/reception; Eamp is the energy for communication, which varies with the distance d between sender and receiver: Eamp = εfs·d² when d < d0 (free-space model), and Eamp = εmp·d⁴ otherwise (multipath model).
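The truncated radio model above appears to be the standard first-order model from the WSN literature; a hedged Python sketch, with illustrative parameter values that are assumptions, not taken from the paper:

```python
# First-order radio model (standard in the WSN literature); the parameter
# values below are illustrative assumptions, not the paper's settings.
E_ELEC = 50e-9       # J/bit, electronics energy (Eelec)
EPS_FS = 10e-12      # J/bit/m^2, free-space amplifier (eps_fs)
EPS_MP = 0.0013e-12  # J/bit/m^4, multipath amplifier (eps_mp)
D0 = (EPS_FS / EPS_MP) ** 0.5   # crossover distance d0

def tx_energy(k_bits, d):
    """Energy to transmit k_bits over distance d metres."""
    if d < D0:
        return k_bits * (E_ELEC + EPS_FS * d * d)      # Eamp = eps_fs * d^2
    return k_bits * (E_ELEC + EPS_MP * d ** 4)         # Eamp = eps_mp * d^4

def rx_energy(k_bits):
    """Energy to receive k_bits (electronics only)."""
    return k_bits * E_ELEC
```

With these illustrative constants the crossover distance d0 works out to roughly 87 m, so a 30 m or 50 m transmission range falls in the free-space regime.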
The data routing uses a flooding protocol, and the timeout parameter of the MIS construction algorithm is set to 0.5 s. The simulation results are averaged over 30 randomly generated scenes. Figs. 1 and 2 show the size of the dominating set as the number of nodes in the network increases, for a given transmission radius: ECDS performs well, producing a smaller CDS than WAA and WLA as the network size grows. Fig. 3 shows the average residual energy of the CDS nodes as the network size increases, after the network has operated for 150 s (one event every 0.5 s) with r = 50 m: ECDS achieves better energy efficiency, with much higher residual energy than WAA and WLA, while WLA has the worst energy efficiency because of its large dominating set. Fig. 4 shows the network lifetime (the length of working time until a backbone can no longer be constructed) as the network size increases from 100 to 300 nodes with r = 50 m: ECDS clearly has the best energy performance, working the longest before no backbone can be constructed any more.

Fig. 1. Size of CDS as the network size increases when r=30m

Fig. 2. Size of CDS as the network size increases when r=50m

Fig. 3. Average CDS residual energy as the network size increases when r=50m

Fig. 4. Network lifetime as the network size increases when r=50m
5 Conclusion and Future Work

A distributed, energy efficient, connected-dominating-set-based backbone algorithm for connectivity maintenance in wireless sensor networks has been presented. Nodes with higher weight have a greater chance of being selected as backbone nodes to manage the network efficiently, and the algorithm balances energy consumption by computing a new CDS whenever the residual energy of the backbone nodes drops to a certain level. Both the time complexity and the message complexity of the algorithm are O(n), and its performance ratio is 7.6. Moreover, the algorithm is fully distributed, using only simple local node behavior to achieve the desired global objective. The simulation results show that, compared with existing classic algorithms, the algorithm efficiently prolongs network lifetime and balances node energy consumption while producing a smaller backbone. Future work will focus on simulations under various settings and on MCDS improvements with QoS considerations.
References
1. Pottie, G.J., Kaiser, W.J.: Wireless Integrated Network Sensors. Communications of the ACM, 43 (2000) 51-58
2. Akyildiz, I.F., Su, W., Sankarasubramaniam, Y., Cayirci, E.: A Survey on Sensor Networks. IEEE Communications Magazine, 40 (2002) 102-114
3. Ephremides, A., Wieselthier, J., Baker, D.: A Design Concept for Reliable Mobile Radio Networks with Frequency Hopping Signaling. Proc. IEEE, 75
4. Wattenhofer, R., Li, L., Bahl, P., Wang, Y.: Distributed Topology Control for Power Efficient Operation in Multihop Wireless Ad Hoc Networks. Proc. of INFOCOM (2001)
5. Xu, Y., Bien, S., Mori, Y., Heidemann, J., Estrin, D.: Topology Control Protocols to Conserve Energy in Wireless Ad Hoc Networks. Technical Report 6, University of California, Los Angeles (2003)
6. Xu, Y., Heidemann, J., Estrin, D.: Geography-Informed Energy Conservation for Ad Hoc Routing. Proc. of the 7th Annual Int'l Conf. on Mobile Computing and Networking (MobiCom), Rome, Italy (2001) 70-84
7. Chen, B., Jamieson, K., Balakrishnan, H., Morris, R.: Span: An Energy-Efficient Coordination Algorithm for Topology Maintenance in Ad Hoc Wireless Networks. Proc. MobiCom (2001) 85-96
8. Clark, B.N., Colbourn, C.J., Johnson, D.S.: Unit Disk Graphs. Discrete Mathematics, 86 (1990) 165-177
9. Wan, P.J., Alzoubi, K.M., Frieder, O.: Distributed Construction of Connected Dominating Set in Wireless Ad Hoc Networks. Proc. of INFOCOM (2002)
10. Alzoubi, K.M., Wan, P.J.: New Distributed Algorithm for Connected Dominating Set in Wireless Ad Hoc Networks. Proc. 35th Hawaii Int'l Conf. on System Sciences (2002) 3881-3887
11. Min, M., Huang, C.X., Huang, S.C.-H., Wu, W., Du, H., Jia, X.: Improving Construction for Connected Dominating Set with Steiner Tree in Wireless Sensor Networks. Journal of Global Optimization (to appear, 2004)
12. Wu, W., Du, H., Jia, X., Li, Y., Huang, C.-H., Du, D.-Z.: Minimum Connected Dominating Sets and Maximal Independent Sets in Unit Disk Graphs. Technical Report 04-047, Department of Computer Science and Engineering, University of Minnesota (2004)
The Critical Speeds and Radii for Coverage in Sensor Networks* Chuanzhi Zang1,2, Wei Liang1, and Haibin Yu1 1
Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China 2 Graduate School of the Chinese Academy of Sciences, Beijing 100049, China {zangcz, weiliang, yhb}@sia.cn
Abstract. In sensor networks the coverage problem is of great importance to topology control, energy saving, routing, and so on. Compared with the well-known sensor model and exposure model, a modified sensor model and a modified exposure model are used to analyze the critical parameters for coverage: the modified sensor model can handle more general conditions, and the modified exposure model is more reasonable than the known ones. Based on these models, the physical characteristics of the sensors, and the properties of the target, we analyze the coverage problem mathematically and identify two critical speeds and two critical radii of influence of a sensor node. Using these results, it is easy to estimate the critical sensor density, or the critical number of sensor nodes, required to cover a given area. Keywords: sensor network, coverage, exposure.
1 Introduction

Recent technological advances in distributed embedded systems have prompted significant research efforts in both industry and academia. Among such systems, wireless ad-hoc sensor networks are particularly noteworthy for their potentially numerous, economically attractive applications and their ability to bridge the interface between the user and the physical world [1,2]. Unlike traditional embedded systems, the new wireless sensor network nodes have remarkable computational and storage capabilities. An important problem receiving increased attention recently is the sensor coverage problem, centered on a fundamental question: how well do the sensors observe the physical space? In some sense, it is one measure of the quality of service (QoS) of a sensor network. The coverage concept admits a wide range of interpretations, owing to the variety of sensors and applications, and different coverage formulations have been proposed based on the subject to be covered (area versus *
This paper is supported by Natural Science Foundation of China under contract 60434030 and 60374072.
discrete points) [3,4], the sensor deployment mechanism (random versus deterministic), and other wireless sensor network properties (e.g., network connectivity and minimum energy consumption). For example, on a battlefield, sensor nodes are randomly deployed to detect enemy movement; upon detection, nodes transmit the information to the user via multi-hop communication. An important question in such scenarios is how many sensors must be deployed so that the entire area is covered and the probability of detection is high. Deploying too few nodes may leave blind spots or sensing holes through which the enemy can pass. Knowing the sensing capacity as a function of the number of deployed nodes is therefore crucial for the design of sensor networks. Node density is also a crucial parameter in scenarios where a network is deployed to monitor environmental variables, since blind spots can reduce the accuracy of the results obtained. In the example described above, the target is typically a signal source, and the nodes receive its signal via a channel; depending on the strength of the received signal, a node detects the target. The sensing capacity of the network thus depends on the target characteristics as well as on sensor sensitivity and calibration, so the density evaluation must take into account the nature and characteristics of both the sensor and the target. In this paper we focus on area coverage with random sensor deployment, where all sensors have the same characteristics. Compared with the well-known sensor model and exposure model of [3-7], a modified sensor model and a modified exposure model are used to analyze the critical parameters for coverage; the modified models can handle more general conditions.
Based on the physical characteristics of the sensors and the properties of the target, we analyze the coverage problem mathematically and identify two critical speeds: the undetectable speed and the partially detectable speed. A target moving faster than the undetectable speed cannot be detected; a target that originates at a sensor and moves faster than the partially detectable speed cannot be detected either. We also identify two critical radii: the radius of complete influence and the radius of no influence. A target within the radius of complete influence is always detected, while a target beyond the radius of no influence can escape detection. Using these results, it is easy to estimate the critical sensor density, or the critical number of sensor nodes, required to cover a given area. The remainder of the paper is organized as follows. The next section summarizes related work. Section 3 presents the models used to analyze the coverage problem. Section 4 analytically evaluates the critical speeds and radii, and thus estimates the critical sensor density and the number of nodes required to cover a given area. Section 5 presents simulations that verify the theoretical results. The last section concludes the paper.
2 Related Work The computational geometry method is often used to solve the coverage problems. The Art Gallery Problem [8] deals with determining the number of observers necessary to cover an art gallery room such that every point is seen by at least one observer. It has found several applications in many domains such as the optimal antenna placement
C. Zang, W. Liang, and H. Yu
problems for wireless communication. The Art Gallery problem was solved optimally in 2D and shown to be NP-hard in the 3D case; reference [8] proposes heuristics for the 3D case using Delaunay triangulation. Sensor coverage for detecting global ocean color, where sensors observe the distribution and abundance of oceanic phytoplankton [9], is approached by assembling and merging data from satellites in different orbits. Meguerdichian et al. [4] were among the first researchers to identify the importance of using the Delaunay triangulation and the Voronoi diagram in sensor network coverage. Given a wireless sensor network, it is interesting to design a localized algorithm that finds a path connecting a point s and a point t which maximizes the smallest observability of all points on the path; this is called the best coverage problem [4]. Meguerdichian et al. presented a centralized method using the Delaunay triangulation to solve the best coverage problem; their algorithm has the best possible time complexity among centralized algorithms. Compared to the best coverage problem, the worst coverage problem is to find the path that maximizes the distance of the path to all sensor nodes; Meguerdichian et al. presented a centralized method using the Voronoi diagram to solve it. Several related problems have also been studied recently. The minimum exposure problem [5,6] is to find a path connecting two points in the domain that minimizes the integral of observability over the time traveled from the source point to the destination point. Using a multiresolution technique and the Dijkstra and/or Floyd-Warshall shortest path algorithms, Meguerdichian et al. [5,6] presented an efficient and effective algorithm for minimal exposure paths for any given distribution and characteristics of sensor networks. The algorithm works for arbitrary sensing and intensity models and provides an unbounded level of accuracy as a function of run time.
Adlakha et al. [7] researched the critical density thresholds for coverage in wireless sensor networks. In [7], they evaluated the critical number of nodes required for target detection in a sensor network, using the physical characteristics of the sensors and the target to derive an equation for an effective sensor radius; from this effective radius they estimated the critical density for coverage. Incorporating the physical characteristics of sensor and target in evaluating the sensing capacity enables sensor network design in which the user can decide the density of nodes to be used depending upon the characteristics of the target to be detected as well as the nature of the sensors deployed. However, the sensor models used in [4-7] have no definition when the sensor and the target are at the same position, and when the target is very close to the sensor, the value received by the sensor can exceed any given positive real number, which is not physically reasonable. With this in mind, one can see why the result in [7] breaks down when the sensor is very close to the target. Compared to the result in [7], ours is more general and understandable.
3 Sensor Network Models

Before analyzing the critical speeds and radii, we describe the sensor model, the exposure model and the target model.
The Critical Speeds and Radii for Coverage in Sensor Networks
3.1 Sensor Model

Sensing devices generally have widely different theoretical and physical characteristics; thus, numerous models of varying complexity can be constructed based on application needs and device features. However, for most kinds of sensors, the sensing ability diminishes as distance increases. Given a sensor s and a target located at point p, Meguerdichian et al. [5,6] defined the sensor model as
S(s, p) = λ / [d(s, p)]^k    (1)
where d(s, p) is the Euclidean distance between the sensor s and the point p, the positive constant λ is the signal amplitude, and k is a sensor-technology-dependent parameter. From (1), one can easily find the shortcomings of this sensor model, as follows:

lim_{d(s,p)→0} S(s, p) = ∞    (2)
So when the target and the sensor are at the same position, model (1) is undefined. On the other hand, for a given sensor, the value that the sensor can read has an upper bound, denoted F_max. From (2), one can easily see that S has no upper bound, which is infeasible. To overcome these shortcomings, we give a modified sensor model:

S(s, p) = λ / [d(s, p) + 1]^k    (3)
where λ denotes the signal amplitude; for simplicity, we only consider a signal with a constant positive amplitude. When the sensor and the target are at the same position, (3) gives S = λ, the original signal value. Since each sensor requires a certain signal-to-noise ratio (SNR) to detect the signal, below a certain noise figure F_min the signal cannot be detected, because the signal strength falls below the noise floor. Thus if S(s, p) < F_min, the sensor does not detect the signal. The effective domain of S is therefore F_min < S(s, p) < F_max, so we have:

S(s, p) = { 0,                      if λ/[d(s, p) + 1]^k < F_min
          { F_max,                  if λ/[d(s, p) + 1]^k > F_max
          { λ/[d(s, p) + 1]^k,      otherwise
    (4)
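As a concrete illustration, the modified model (4) can be sketched in a few lines of Python (a sketch with our own function and parameter names; the defaults are the values used later in Section 5):

```python
def sensor_signal(d, lam=1.0, k=2, f_min=0.001, f_max=1.0):
    """Modified sensor model (4): reading at distance d from the target.

    Unlike model (1), the value is defined at d = 0 (it equals lam),
    is clipped to the saturation level f_max, and is zero below the
    noise floor f_min.
    """
    s = lam / (d + 1.0) ** k
    if s < f_min:
        return 0.0
    return min(s, f_max)
```

For example, `sensor_signal(0.0)` returns λ itself rather than diverging as model (1) would.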
3.2 Exposure Model
Depending upon the type of detection and the application, Meguerdichian et al. [5,6] and Adlakha et al. [7] identified two kinds of exposure model: the integrator model and the derivative model. These models determine how the signal received from the target is processed to make a decision. For example, an acoustic sensor can sense the target for a fixed period of time and integrate the acoustic energy; if the energy exceeds a threshold, it declares that the target is detected. This is the integrator model, first introduced in [5,6]. Suppose an object O moves in the field F from point p(t1) to point p(t2) along the curve (or path) p(t). Meguerdichian et al. [5,6] define the integrator model as

E(p(t), t1, t2) = ∫_{t1}^{t2} S(s, p(t)) |dp(t)/dt| dt    (5)
Exposure (5) depends on the arc length, which means that the exposure depends on the target speed. In the real world, the exposure sometimes has no relationship with the target speed. For example, for an acoustic sensor, when the target's trace is a circle around the sensor, the signal strength received by the sensor is unchanged regardless of the target speed (see figure 1).
Fig. 1. The target runs along a circle around the sensor
So in this paper we only consider the exposure model (see (7)), which does not depend on the target speed. Note that the integrator model pertains to sensors that are energy detectors: when the total signal energy, or exposure (the total signal strength over time), exceeds a threshold, the sensor declares the target detected.

E = ∫_{t1}^{t2} S(s, p) dt    (7)
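The exposure model (7) can likewise be sketched in Python (a numerical midpoint-rule approximation; names and the step count are our choices). As the circular-path example above suggests, a path that keeps a constant distance from the sensor yields the same exposure regardless of how fast it is traversed:

```python
import math

def exposure(sensor_xy, path, T, lam=1.0, k=2, f_min=0.001, f_max=1.0,
             steps=10000):
    """Exposure model (7): integrate the sensed signal over [0, T] with
    the midpoint rule. `path(t)` returns the target position at time t;
    the result depends only on the distances occupied over time, not on
    arc length."""
    dt = T / steps
    E = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        x, y = path(t)
        d = math.hypot(x - sensor_xy[0], y - sensor_xy[1])
        s = lam / (d + 1.0) ** k  # modified sensor model (3)
        E += (0.0 if s < f_min else min(s, f_max)) * dt  # clamp as in (4)
    return E
```

For a circular path of radius 2 around the sensor, this gives E = λT/9 whatever the angular speed, matching the discussion of figure 1.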
In this paper, each sensor makes its own decision separately, and detection occurs if E > E_t, where E_t is the exposure threshold.

3.3 Target Model
Sensor networks have various applications, and different applications have different target models. In this paper we only consider cases in which the sensor network is used to detect a moving target. The target can be a person, a soldier, or a vehicle. We
assume the target moves in a straight-line path with constant speed v for a time T. The time T also can be considered as the sensor detecting time.
4 Finding the Critical Parameters

First, we give two critical speeds associated with a given sensor network. Definition 1: the undetectable speed vc1. When the target speed is greater than vc1, the sensor network can't detect the target. Definition 2: the partial detectable speed vc2. When the target speed is greater than vc2, a target originating from a sensor can't be detected by that sensor.
Second, we give two critical radii associated with a given sensor. Definition 3: the critical radius rc1, also named the radius of complete influence by Adlakha et al. [7]. Targets originating within this radius are surely detected. Definition 4: the critical radius rc2, also named the radius of no influence by Adlakha et al. [7]. Targets originating beyond this radius can't be detected.
In the following subsections we derive analytical results for the critical radii and speeds. We use the sensor model (4) and the exposure model (7) to analyze the relationship between a target and a sensor. The target moves in a straight-line path with speed v and travels a distance vT during time T. The signal amplitude λ is a constant. The effective value of S(s, p) is between F_min and F_max.

4.1 The Biggest Exposure Direction (BED) and Least Exposure Direction (LED)
Let (xs, ys) denote the sensor position, (x, y) the target position, (xo, yo) the target original position, and (xe, ye) the target final position. From (3), we get the distance between the sensor and the target:

d(s, p) = (λ/S)^{1/k} − 1

Let S(x, y) = S_i, i = 1, …, m. Thus we get a set of contour lines which are circles centered on the sensor. It is well known that the gradient is perpendicular to those contour lines (see figure 2), and thus the line along which the gradient lies passes through the sensor. For the distance function d, the gradient direction ∇d is its fastest increasing direction and −∇d is its fastest decreasing direction. So for the function S, ∇d is its fastest decreasing direction and −∇d is its fastest increasing direction. Given the sensor position (xs, ys) and target original position (xo, yo), we define the least exposure direction (LED) as the direction from the sensor to the target, i.e. the direction of the vector (xo, yo) − (xs, ys), and we define the biggest exposure direction (BED) as the direction from the target to the sensor, i.e. the direction of the vector (xs, ys) − (xo, yo). Thus we can get the following theorem.
Fig. 2. The LED and BED
Theorem 1. Given the sensor position (xs, ys) and the target original position (xo, yo), we can calculate the LED and BED. When the target moves along the LED, the sensor gets the minimal exposure value; when the target moves along the BED, the sensor gets the maximal exposure value; and when the target moves along any other direction, the exposure value received by the sensor is between the minimal and maximal exposures. The benefit of Theorem 1 is that when we consider the coverage problem we only need to analyze the LED and BED cases. So for simplicity, in the remainder of this paper, we place the sensor at the origin and restrict the target original position (xo, yo) and its path to the x axis. We let k = 2, so we get S(s, p) = λ/[d(s, p) + 1]².
We let λ ≤ F_max, so that the reading S = λ at the sensor position is within the effective domain. When the target stays at the sensor for the whole detection time T, the exposure attains its largest possible value:

E = ∫_0^T S(s, p) dt = ∫_0^T λ dt = λT    (13)
When E_t > λT, the sensor network can't detect the target. To ensure the sensor can detect the target, we require

E_t ≤ λT    (14)
Lemma 1. Given the threshold E_t, the amplitude λ, the target speed v and the moving time T, when the target path passes through the sensor and is symmetric about the y axis (see figure 3), the sensor gets the maximal exposure. Intuitively the lemma holds; the proof is straightforward but tedious, so we omit it here.
Fig. 3. The maximal path
Given the threshold E_t, the amplitude λ, and the moving time T, using Lemma 1, we get that the critical speed v_c1 satisfies the following equations:

E_t = 2 ∫_0^{T/2} S(s, p) dt = 2 ∫_0^{T/2} λ/(v_c1 t + 1)² dt,   v_c1 T/2 ≤ r_max    (15)

or

E_t = 2 ∫_0^{T/2} S(s, p) dt = 2 ∫_0^{r_max/v_c1} λ/(v_c1 t + 1)² dt,   v_c1 T/2 > r_max    (16)

Using (15) and (16), we get (17) and (18) respectively:

v_c1 = 2λ/E_t − 2/T    (17)

v_c1 = 2λ r_max / (E_t (1 + r_max))    (18)
Note that when the target speed v>vc1, the sensor network can’t detect the target.
Based on the definition of the critical speed v_c2, we get that v_c2 satisfies the following equations:

E_t = ∫_0^T S(s, p) dt = ∫_0^T λ/(v_c2 t + 1)² dt,   v_c2 T ≤ r_max    (19)

or

E_t = ∫_0^T S(s, p) dt = ∫_0^{r_max/v_c2} λ/(v_c2 t + 1)² dt,   v_c2 T > r_max    (20)

Using (19) and (20), we get (21) and (22) respectively:

v_c2 = λ/E_t − 1/T    (21)

v_c2 = λ r_max / (E_t (1 + r_max))    (22)
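The case analysis behind the closed forms for the two critical speeds can be checked numerically. The sketch below (our naming, k = 2; r_max is the sensing range implied by the noise floor) picks the closed form whose side condition holds, and with the parameter values used later in Section 5 it reproduces v_c1 = 193.6754 and v_c2 = 96.8377:

```python
import math

def critical_speeds(lam=1.0, E_t=0.01, T=2.0, f_min=0.001):
    """Critical speeds from (17)/(18) and (21)/(22) with k = 2.

    r_max solves lam/(r_max + 1)^2 = f_min, i.e. the largest distance
    at which the modified sensor model (4) is above the noise floor.
    """
    r_max = math.sqrt(lam / f_min) - 1.0
    # Undetectable speed v_c1: try (17); if the target leaves the sensing
    # range before time T/2, fall back to (18).
    v_c1 = 2.0 * lam / E_t - 2.0 / T                      # (17)
    if v_c1 * T / 2.0 > r_max:
        v_c1 = 2.0 * lam * r_max / (E_t * (1.0 + r_max))  # (18)
    # Partial detectable speed v_c2: (21), else (22).
    v_c2 = lam / E_t - 1.0 / T                            # (21)
    if v_c2 * T > r_max:
        v_c2 = lam * r_max / (E_t * (1.0 + r_max))        # (22)
    return v_c1, v_c2
```

In the out-of-range case ((18) and (22)), v_c1 is exactly twice v_c2, which explains the pair of values reported in Section 5.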
Note that when the target speed v > v_c2, a target originating from the sensor position (the origin) can't be detected by that sensor.

4.3 The Critical Radii
In this subsection we assume that the target speed v ≤ v_c2. Given the threshold E_t, the amplitude λ, the target speed v and the moving time T, based on the definition of the critical radius r_c1, we get that r_c1 satisfies the following equations:

E_t = ∫_0^T S(s, p) dt = ∫_0^T λ/(r_c1 + vt + 1)² dt,   vT ≤ r_max    (23)

or

E_t = ∫_0^T S(s, p) dt = ∫_0^{r_max/v} λ/(r_c1 + vt + 1)² dt,   vT > r_max    (24)

Integrating (23) and (24), we get the quadratics (25) and (26) respectively:

E_t r_c1² + E_t (vT + 2) r_c1 + E_t (vT + 1) − λT = 0    (25)

E_t v r_c1² + E_t v (r_max + 2) r_c1 + E_t v (r_max + 1) − λ r_max = 0    (26)

The critical radius r_c1 is the positive root of (25) or (26). Note that targets originating within r_c1 are surely detected.
Given the threshold E_t, the amplitude λ, the target speed v and the moving time T, based on the definition of the critical radius r_c2, we get that r_c2 satisfies the following equations:

E_t = ∫_0^T S(s, p) dt = ∫_0^T λ/(r_c2 − vt + 1)² dt,   vT ≤ r_max    (27)

or

E_t = ∫_0^T S(s, p) dt = ∫_0^{r_max/v} λ/(r_c2 − vt + 1)² dt,   vT > r_max    (28)

Using (27) and (28), one can easily prove that

r_c2 = r_c1 + vT    (29)
Note that targets originating beyond r_c2 can't be detected. Based on the critical radii, we define two critical densities as

ρ_c1 = 1/(π r_c1²),   ρ_c2 = 1/(π r_c2²)    (30)

Given a square area L × L, where L is the side length, we define two critical numbers as

N_c1 = L² ρ_c1,   N_c2 = L² ρ_c2    (31)
5 Simulations

In order to verify our results, we developed two simulations in MATLAB. The first verifies the critical sensor numbers, and thereby the critical radii indirectly; the second verifies the critical speeds. In the simulations, we consider random uniform deployment of sensors over a square area, and the target moves in a straight-line path with constant speed v for a time T. In the first simulation, we let L=300, λ=1, k=2, F_min=0.001, F_max=1, E_t=0.01, T=2, v=5, so we can calculate v_c1=193.6754, v_c2=96.8377, r_c1=9, r_c2=19, N_c1=354, N_c2=79. The sensors are deployed randomly on the 300×300 square, and the target original position and direction are randomly selected. We let the number of sensors change from 1 to 2N_c1. For each sensor number, the target is presented 100 times, and the detection probability is the number of detections divided by the number of presentations. Figure 4 shows one of our simulation results. The simulations show that when deploying N_c1 sensor nodes randomly the detection probability is about 90%, and when deploying N_c2 sensor nodes randomly the detection probability is about 40%. When deploying more sensors than N_c1, the probability converges to 1. In the second simulation, we let L=100, λ=1, k=2, F_min=0.001, F_max=1, E_t=0.01, T=2, n=39, so we can calculate v_c1=193.6754, v_c2=96.8377. The sensors are deployed randomly on the 100×100 square, and the target original position and direction are randomly selected. We let the target speed change from 1 to [v_c1]. For each given
Fig. 4. Probability of detection vs. number of nodes (area 300×300, 100 runs per point; annotated points: (354, 0.92) and (79, 0.51))
speed, the target is presented 100 times, and the detection probability is the number of detections divided by the number of presentations. Figure 5 shows the simulation results: as the speed increases, the probability decreases, and when the speed is greater than v_c1, the sensor network can't detect the target. However, we cannot explain why the probability increases with speed before the speed reaches 20.
Fig. 5. Probability of detection Vs. speed of target
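The first experiment can be sketched as a rough Python analogue of the MATLAB simulation (parameter names are ours, and the exposure is approximated by a midpoint rule, so this is a sketch rather than a reproduction of the original code):

```python
import math
import random

def detect_prob(n_sensors, L=300.0, lam=1.0, f_min=0.001, f_max=1.0,
                E_t=0.01, T=2.0, v=5.0, trials=100, steps=200):
    """Estimate the detection probability: deploy n_sensors uniformly at
    random on an L x L square, present a random straight-line target
    `trials` times, and count the runs in which at least one sensor's
    exposure (7) exceeds the threshold E_t."""
    hits = 0
    for _ in range(trials):
        sensors = [(random.uniform(0, L), random.uniform(0, L))
                   for _ in range(n_sensors)]
        x0, y0 = random.uniform(0, L), random.uniform(0, L)
        heading = random.uniform(0.0, 2.0 * math.pi)
        dt = T / steps
        detected = False
        for sx, sy in sensors:
            E = 0.0
            for i in range(steps):
                t = (i + 0.5) * dt
                d = math.hypot(x0 + v * t * math.cos(heading) - sx,
                               y0 + v * t * math.sin(heading) - sy)
                s = lam / (d + 1.0) ** 2          # model (3), k = 2
                E += (0.0 if s < f_min else min(s, f_max)) * dt
            if E > E_t:                            # independent decision
                detected = True
                break
        hits += detected
    return hits / trials
```

Running `detect_prob` over a range of sensor counts produces a curve of the same shape as Figure 4; the exact probabilities vary with the random seed.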
6 Conclusions

In this paper we use a modified sensor model and a modified exposure model, which are more reasonable than the known models, to analyze the coverage problem in sensor networks mathematically. We find the biggest exposure direction and least exposure direction, which simplify the analysis. We also identify several critical parameters, such as speeds, radii, numbers and densities. These parameters are very important when
designing or analyzing a sensor network. The simulations verify our theoretical results. As part of our future work, we will analyze the critical parameters when the target speed exceeds vc2.
References
1. Estrin, D., Govindan, R., Heidemann, J., Kumar, S.: Next Century Challenges: Scalable Coordination in Sensor Networks. In: Proc. of MobiCom (1999) 263–270
2. Akyildiz, I.F., Su, W., Sankarasubramaniam, Y., Cayirci, E.: Wireless Sensor Networks: a Survey. Computer Networks, Vol. 38, No. 2 (2002) 393–422
3. Li, X., Wan, P., Wang, Y., Frieder, O.: Coverage in Wireless Ad-hoc Sensor Networks. IEEE Transactions on Computers, Vol. 52, No. 6 (2003) 753–763
4. Meguerdichian, S., Koushanfar, F., Potkonjak, M., Srivastava, M.B.: Coverage Problems in Wireless Ad-hoc Sensor Networks. In: IEEE INFOCOM (2001) 1380–1387
5. Meguerdichian, S., Koushanfar, F., Qu, G., Potkonjak, M.: Exposure in Wireless Ad-hoc Sensor Networks. In: Proceedings of the 7th International Conference on Mobile Computing and Networking (MobiCom'01), Rome, Italy (Best Student Paper Award) (2001) 139–150
6. Meguerdichian, S., Koushanfar, F., Qu, G., Veltri, G., Potkonjak, M.: Exposure in Wireless Ad-hoc Sensor Networks: Theory and Practical Solutions. Journal of Wireless Networks, Vol. 8, No. 5 (2002) 443–454
7. Adlakha, S., Srivastava, M.: Critical Density Thresholds for Coverage in Wireless Sensor Networks. In: IEEE Wireless Communications and Networking Conf. (WCNC) (2003) 1615–1620
8. Marengoni, M., Draper, B.A., Hanson, A., Sitaraman, R.: A System to Place Observers on a Polyhedral Terrain in Polynomial Time. Image and Vision Computing, Vol. 18 (1996) 773–780
9. Gregg, W.W., Esaias, W.E., Feldman, G.C., et al.: Coverage Opportunities for Global Ocean Color in a Multimission Era. IEEE Transactions on Geoscience and Remote Sensing, Vol. 36 (1998) 1620–1627
10. Zhao, F., Guibas, L.: Wireless Sensor Networks: an Information Processing Approach. Elsevier (2004)
A Distributed QoS Control Schema for Wireless Sensor Networks

Jin Wu 1,2

1 School of Computer Science and Engineering, Beihang University, 100083 Beijing, China
2 Sino-German Joint Software Institute, Beihang University, 100083 Beijing, China
[email protected]
Abstract. Wireless ad-hoc sensor networks have recently emerged as a premier research topic. They have great long-term economic potential and the ability to transform our lives, and they pose many new system-building challenges. One major challenge for real deployment is QoS. This is a rich area because sensor deaths and sensor replenishments make it difficult to specify the optimum number of sensors (the service quality that we address in this paper) that should be sending information at any given time. Through a literature survey, we find that current solutions to this problem rest on assumptions that are unreasonable in practice. We therefore propose a new control schema that makes the control method more feasible in real environments. A distributed control schema is introduced in this paper: every sensor node runs a control algorithm in a distributed fashion. This distributed QoS control schema overcomes limitations of existing QoS control methods.
researchers, a special QoS parameter meaning the resolution of the sensor network is typically used to measure the performance of a wireless sensor network in terms of its data-gathering ability. References [1, 3] define QoS to mean sensor network resolution: specifically, depending on the different stimuli present in the sensor network, it is defined as the optimum number of sensors sending information toward the information-collecting sinks. This is a very important issue, because in any sensor network we want to accomplish two things: 1) maximize the lifetime of the sensor network by having sensors periodically power down to conserve their battery energy, and 2) have enough sensors powered up and sending packets toward the information sinks so that enough data is collected when a stimulus presents itself. Note that the information sinks need a certain amount of information gathered from the different sensors, but sensors in close proximity to each other allow many of those sensors to be powered down. This is the optimization problem we address, and it is a rich research area because sensors are always placed in the sensing field randomly and with redundancy. Sensor deaths (e.g., as a result of damage or battery failure) and sensor replenishments make it difficult to control the optimum number of sensors that should be activated and sensing the field at any given time [2, 3]. This issue was first described in [3], and some improvements are given in [1, 4]. However, a significant weak point of those solutions is that they make two assumptions: 1) a broadcast channel exists from the collection point to all nodes, and 2) sensor nodes are able to receive and acknowledge information from the collection point even when powered off for energy saving. These two assumptions are not supported by mainstream sensor node equipment.
In this paper we present a distributed control algorithm for duty-cycle management to improve the QoS of wireless sensor networks. We consider a sensor network that operates under the following model. Sensor nodes are distributed across the sensing field and operate simultaneously. Each sensor node alternates between two states, active and sleep, and each node decides by itself when to be in the active (or sleep) state. A control algorithm runs on each sensor node to determine its operating state. We borrow the control concept from a well-known active queue management algorithm, Random Early Detection (RED). The remainder of this paper is organised as follows. Section 2 gives the related work for this research. Definitions and models related to this paper are presented in Section 3. Section 4 gives a detailed description of the duty-cycle control algorithm. Finally, conclusions are given in Section 5.
2 Background

The study of wireless sensor networks is still a burgeoning field; many aspects of sensor networks, such as routing, preservation of battery power, adaptive self-configuration, etc., have already been studied in previous papers. Ref. [6] might be the earliest work related to the present study, as it actively probes the question of the QoS that the base stations receive from the sensors. However, it defines QoS as total coverage in a static fashion; that is, it does not allow a data sink to dynamically alter the QoS it receives from the sensors depending on varying circumstances.
J. Wu
Reference [3] proposed a solution that allows the base station to communicate QoS information to each of the sensors using a broadcast channel and uses the mathematical paradigm of the Gur Game to dynamically adjust the number of active sensors toward the optimum. The result is a robust sensor network that allows the base station to dynamically adjust the number of sensors activated, thereby controlling the resolution of the QoS it receives from the sensors, depending on varying circumstances. This research attracted attention, and some newer papers [1, 4] can be found in the literature extending the idea.
3 Definitions and Models

In this section, the placement model, the sensing model, and the coverage measures are defined. In the rest of this paper, these definitions and models will be used to study the decision fusion policy and its effect on overall performance. In this paper, a commonly used sensor placement model is applied; this model has been used by many researchers, e.g. in [5]. A large number of sensors are randomly placed over a two-dimensional geographical region, and it is assumed that the locations of sensors are uniformly and independently distributed in the region. Such a random initial deployment is desirable in scenarios where a priori knowledge of the field is not available; random deployment can also be the direct result of certain deployment strategies. Under this assumption, the locations of sensors can be modelled by a stationary two-dimensional Poisson point process. Denote the density of the underlying Poisson point process by λ, measured in sensors per unit area. The number of sensors located in a region A, N(A), follows a Poisson distribution with parameter λ‖A‖, where ‖A‖ represents the area of the region [5].
P(N(A) = k) = e^{−λ‖A‖} (λ‖A‖)^k / k!    (1)
In this paper, for the sake of simplicity, the Boolean sensing model is used. The Boolean sensing model has been widely used in many studies. In the Boolean model, each sensor has a certain uniform sensing range r, and a sensor can only sense the environment and detect phenomena within its sensing range. A location is said to be "covered" by a sensor if it lies within the sensor's sensing range. The degree of coverage is defined by the coverage density f_c(p), the number of sensors covering a particular point p within the sensing field. The coverage density represents the redundancy level of the sensor deployment at a certain point in the detection area. Note that the definition of a location being covered depends on the specific sensing model under consideration; under the Boolean sensing model considered in this paper, a location is covered if it is within the sensing area of a sensor. If n sensors are able to detect an event that takes place at the point p, then the value of f_c(p), in terms of resolution, equals n. Therefore, if an event takes place at the point p, basically n sensors will be able to claim detection of this event. However, some sensors in the sensor network might face
some problems in terms of missed detections and false detections, so the number of detections reported by sensors might not exactly equal n. The detailed behavior of n will be discussed in Section 4.
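The placement and coverage definitions above can be sketched as follows (Python, with our own naming; the Poisson count is sampled via unit-rate exponential inter-arrival times, one of several standard methods):

```python
import math
import random

def poisson_count(mean):
    """Sample a Poisson(mean) count by counting how many unit-rate
    exponential inter-arrival times fit into an interval of length mean."""
    n, t = 0, 0.0
    while True:
        t += -math.log(1.0 - random.random())
        if t > mean:
            return n
        n += 1

def poisson_placement(density, width, height):
    """Stationary 2-D Poisson point process (1): the sensor count is
    Poisson(density * area) and, given the count, the positions are
    independent and uniform over the region."""
    n = poisson_count(density * width * height)
    return [(random.uniform(0.0, width), random.uniform(0.0, height))
            for _ in range(n)]

def coverage_density(p, sensors, r):
    """Boolean sensing model: f_c(p) = number of sensors whose sensing
    range r covers the point p."""
    return sum(1 for sx, sy in sensors
               if math.hypot(p[0] - sx, p[1] - sy) <= r)
```

Averaging `coverage_density` over many sampled placements approaches density·πr² for an interior point, the expected redundancy level.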
4 Duty-Cycle Management Algorithm for QoS Control

Suppose we have a collection of n sensor nodes, M1 through Mn, placed over a sensing field of area S. Sensor nodes are placed and operate following the definitions in Section 3. This section analyses the duty-cycle management problem to support the QoS of a wireless sensor network in terms of resolution, and a duty-cycle management algorithm is given. For the sake of simplicity, it is assumed that all sensor nodes have similar hardware and software specifications and configurations, and that they are placed over flat terrain in a quiet radio environment. As defined in Section 3, every node runs independently. In order to have every node periodically power down to conserve its battery energy, sensor nodes need to compute when to sleep and when to wake up. Any node i is awake when first deployed; suppose the time of the hth sleep is t_sleep^i|h and the time of the hth wake-up is t_wake^i|h. Then the hth sleep interval can be represented as
Δt_itvl^i|h = t_wake^i|h − t_sleep^i|h
When in the active state, a sensor node emits a beacon signal through the radio channel every Δt_beacon. Without loss of generality, we can assume

Δt_beacon >> Z_beacon / Rate

where Z_beacon and Rate represent the packet size of the beacon and the transmission rate of the radio channel, respectively. The beacon density n_i(t) measures the probability of receiving a beacon for node i at time t. Then it can easily be seen that the resolution of an event taking place at a point x, x ∈ S, is
f_c(x, t) = n_x(t) · Δt_beacon · (R/r)² + 1    (2)
where R and r are the ranges for sensing and transmission, respectively. Define the minimum threshold of f_c(x, t) as f'_c; then the minimum threshold of the beacon density n_x(t) can be represented as

n'(x) = (r/R)² · (f'_c − 1) / Δt_beacon    (3)
Based on the above analysis, when the incoming beacon probability is lower than n'(x), the number of nodes covering the point x is too small to provide the required degree of resolution, and vice versa. Therefore, the algorithm that controls the sleep interval is given as follows.
Twake_i|0 = 0;  Tsleep_i|0 = 0;  n = 1;
if ((t_now - last_update) > update_itvl) {
    m = ((t_now - t_wake_i|n-1) / t_beacon) * x;
    if (m > Max_th)
        { t_sleep_i|n = t_now;  t_wake_i|n = t_now + Max_sleep; };
    if ((m < Max_th) && (m > Min_th))
        { t_sleep_i|n = t_now;
          t_wake_i|n = t_now + Max_sleep * (m - Min_th) / (Max_th - Min_th); };
    if (m <= Min_th)
        { t_sleep_i|n = +∞;  t_wake_i|n = t_now; };
    if (t_sleep_i|n <= t_now)  Set_wake(t_wake_i|n);
    x = 0;  n++;  last_update = t_now;  Sleep;
};
Min_th = (r/R)^2 * (f_c' - 1);
Max_th = Min_th + α * (N - Min_th);
Fig. 1. Duty-cycle Control Algorithm for QoS Control
In the algorithm shown in Figure 1, t_now is the current time, and update_itvl is the time slot for the update interval. α ∈ (0, 1) is a parameter related to the stability of the system.
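Under our naming, and assuming that x accumulates a beacon measure between updates, one step of the Figure 1 control law might look as follows in Python (a sketch, not the author's implementation):

```python
def thresholds(r, R, fc_min, N, alpha):
    """Min_th and Max_th as defined below Fig. 1; r and R are the
    transmission and sensing ranges, N the node count, alpha in (0, 1)."""
    min_th = (r / R) ** 2 * (fc_min - 1)
    max_th = min_th + alpha * (N - min_th)
    return min_th, max_th

def sleep_interval(x, t_now, t_wake_prev, t_beacon, min_th, max_th, max_sleep):
    """One update of the duty-cycle rule.  m is the quantity compared
    against the thresholds in Fig. 1.  Returns the next sleep length,
    or None for 'stay awake' (coverage is at or below the minimum)."""
    m = ((t_now - t_wake_prev) / t_beacon) * x
    if m > max_th:                  # heavily over-covered: longest sleep
        return max_sleep
    if m > min_th:                  # RED-style linear interpolation
        return max_sleep * (m - min_th) / (max_th - min_th)
    return None                     # under-covered: keep sensing
```

The linear interpolation between Min_th and Max_th is the same mechanism RED uses to map queue occupancy onto a drop probability, here mapping observed redundancy onto sleep time.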
5 Conclusions and Future Work

Sensor networks are an exciting area with very real applications in the near future. Although many aspects of sensor networks have been studied before, quality of service (QoS) for sensor networks remains largely open. In this paper, we present the idea of using a duty-cycle control algorithm running on each sensor node to balance the trade-off between energy consumption and resolution. It is expected that even without the critical assumptions made in [3], we can still control a wireless sensor network to achieve some degree of QoS. We believe that our newly proposed control method is effective and has significant advantages over the methods in the literature. In the future, simulation experiments will be done to quantify the significant
improvement of the newly proposed algorithms. Also, the stability of the control system will be analysed, and parameter configuration methods will be provided to better tune the duty-cycle management system.
Acknowledgement The author acknowledges the support from the National Natural Science Foundation of China (NSFC) under the grant numbers 90104022, 90412011, and 90612004.
References
1. Frolik, J.: QoS Control for Random Access Wireless Sensor Networks. In: Proceedings of WCNC 2004 (2004) 1522–1527
2. Heinzelman, W., Kulik, J., Balakrishnan, H.: Adaptive Protocols for Information Dissemination in Wireless Sensor Networks. In: Proceedings of the 5th ACM/IEEE MobiCom Conference (1999) 174–185
3. Iyer, R., Kleinrock, L.: QoS Control for Sensor Networks. In: Proceedings of the 2003 IEEE International Conference on Communications, ICC'03, Vol. 1 (2003) 517–521
4. Kay, J., Frolik, J.: Quality of Service Analysis and Control for Wireless Sensor Networks. In: Proceedings of the 2004 IEEE International Conference on Mobile Ad-hoc and Sensor Systems (2004) 359–369
5. Liu, B., Towsley, D.: A Study of the Coverage of Large-scale Sensor Networks. In: Proceedings of the 2004 IEEE International Conference on Mobile Ad-hoc and Sensor Systems (2004) 475–483
6. Meguerdichian, S., Koushanfar, F., Potkonjak, M., Srivastava, M.: Coverage Problems in Wireless Ad Hoc Sensor Networks. In: Proceedings of the 20th Annual Joint Conference of the IEEE Computer and Communications Societies, Vol. 3 (2001) 1380–1387
7. Shakkottai, S., et al.: Unreliable Sensor Grids: Coverage, Connectivity, and Diameter. In: Proceedings of the 22nd Annual Joint Conference of the IEEE Computer and Communications Societies, Vol. 2 (2003) 1073–1083
8. Tian, D., Georganas, N.: A Coverage-preserving Node Scheduling Scheme for Large Wireless Sensor Networks. In: Proceedings of the 1st ACM International Workshop on Wireless Sensor Networks and Applications (2002) 32–41
9. Ye, F., et al.: PEAS: A Robust Energy Conserving Protocol for Long-lived Sensor Networks. In: Proceedings of the 23rd International Conference on Distributed Computing Systems (2003) 28–37
A Framework of In-Situ Sensor Data Processing System for Context Awareness
Young Jin Jung1, Yang Koo Lee1, Dong Gyu Lee1, Mi Park1, Keun Ho Ryu1,*, Hak Cheol Kim2, and Kyung Ok Kim2
1 Database/Bioinformatics Laboratory, Chungbuk National University, Korea {yjjeong, leeyangkoo, dglee, pmi386, khryu}@dblab.chungbuk.ac.kr
2 Electronics and Telecommunications Research Institute (ETRI), Korea {david90, kokim}@etri.re.kr
Abstract. We propose a framework for a context awareness system which processes a large amount of sensor data from various application areas. The proposed framework consists of a context acquisition, a knowledge base, a rule manager, a context information manager, etc. We implement, on the proposed framework, an in-situ sensor data processing system that manages the data transmitted from various sensors and notifies the manager with an alarm message under specific conditions. The proposed framework can be applied to the prevention of forest fires, warning systems for detecting environmental pollution, etc.
1 Introduction

As the global environment deteriorates at an alarming rate with the progress of civilization, it is necessary to detect the environmental conditions of remote places in real time in order to prevent natural disasters such as floods, typhoons, and earthquakes. Sensors deployed in forests, factory districts, and rivers in a ubiquitous sensor network environment transmit sensor data to a concentration node and a control center through routing among sensors. A ubiquitous sensor network environment can collect data through communications among various sensors and provide an intelligent environment of physical space. In order to provide a suitable service in the situation of users with minimum intervention, context aware techniques are the core of the service. A context aware service includes the ability to understand, analyze, and reason about the user's situation. When a user wants to get some service, the service provider should be aware of the context and provide the most suitable service at the user-requested time. It is necessary to manage the sensor data and to abstract the context information depending on the requirements predicted in applications, hiding the complex situation of the sensors used. Some techniques are required not only to manage the sensor data, but also to understand the context for providing optimal service, as the
context information has properties that change with time and space. In this paper, in order to handle real-time sensor data and understand a situation, we design and implement a prototype of the context awareness system for monitoring disasters or accidents, dealing with plentiful sensor data over a vast area, e.g., the prevention of forest fires and warning systems for detecting environmental pollution. The proposed system can manage the data transmitted from various sensors and notify the manager with an alarm message under specific conditions. The remainder of the paper is organized as follows. Section 2 briefly describes existing sensor data processing and context aware systems. Section 3 introduces the proposed system structure. Section 4 presents in-situ sensor data processing and the database schema for storing sensor and context information. Section 5 illustrates the implemented system. Section 6 concludes.
2 Related Work

Sensor Web Enablement (SWE) of the Open Geospatial Consortium, Inc. (OGC) is building a framework of open standards for exploiting web-connected sensors and sensor systems [1]. SWE comprises several specifications: SensorML [2], Observations & Measurements [3], Sensor Observation Service, Sensor Planning Service, and Web Notification Service. SensorML is an information model for discovering, querying, and controlling web-resident sensors. Observations & Measurements is the information model for observations and measurements. The Sensor Observation Service is the service to fetch observations from a sensor or constellation of sensors. The Sensor Planning Service assists in determining collection feasibility plans and processes collection requests for a sensor or sensor constellation. The Web Notification Service executes and manages the message dialogue between a client and web services for long-duration asynchronous processes. The goal of the SWE activity is to allow all types of web- and/or Internet-accessible sensors, instruments, and imaging devices to be accessible and, where applicable, controllable via the Internet. A context-aware application is one which adapts its behavior to a changing environment [4]. Typically, a context-aware application needs to know the location of users and equipment, and the capabilities of the equipment and networking infrastructure [5]. There are many projects to understand context information and provide suitable services, such as SOCAM (Service-Oriented Context-Aware Middleware) [6], CASS (Context-Awareness Sub-Structure) [7], CoBrA (Context Broker Architecture) [8], the Context Toolkit [9], the Gaia project [10], the Hydrogen project [11], CORTEX [12], etc. SOCAM is an architecture for the building and rapid prototyping of context-aware mobile services. CASS is a centralized middleware approach designed for context-aware mobile applications.
CoBrA has an agent-based architecture for supporting context-aware computing in so-called intelligent spaces, i.e., physical spaces such as living rooms, vehicles, corporate offices, and meeting rooms. The Context Toolkit takes a step towards a peer-to-peer architecture, but it still needs a centralized discoverer where distributed sensor units (called widgets), interpreters, and aggregators are registered in order to be found by client applications. The Hydrogen project's context acquisition approach specializes in mobile devices. The Gaia project extends typical operating system
concepts to include context-awareness. It aims at supporting the development and execution of portable applications for active spaces. The CORTEX system is an example of a context-aware middleware approach based on the Sentient Object Model.
3 Proposed Framework

Fig. 1 shows the structure of the context awareness system for modeling and processing large-scale context information from a variety of sensor data, which aims to prevent disasters and to provide intelligent public services and personalized services.
Fig. 1. The structure of context awareness system
In-situ and remote sensor data transmitted through the sensor network middleware interface are converted into knowledge by the sensor data collector, the sensor data abstractor, and the sensor data refiner, and then stored in the context information database in the knowledge base. The context information manager module can understand the situation of the sensed area in the real world by analyzing and reasoning about the situation with the rules defined in the rule manager. It can also provide the summarized context information to a service provider by utilizing a data provider and the knowledge base. In this paper, we focus on the sensor data processing in the context aware system with the sensor model language and rule management for a specific context.
4 In-Situ Sensor Data Processing in the Framework

We briefly explain how in-situ sensor data is processed to prevent accidents in the framework: the sensor data processing steps, the database schema to store the rules that check the sensor data, and the alarm messages that suggest a safety guideline.
Fig. 2. In-situ sensor data processing for context awareness
Fig. 2 shows the sensor data processing used to issue alarm messages, utilizing the rules that evaluate the conditions and the knowledge base that supplies the additional messages. The detailed data processing is summarized as follows:
1. Sensor data is transmitted into the system through a sensor network middleware interface with TransducerML (Transducer Markup Language).
2. In order to know the sensor types and get the detected data from the sensors after installing the in-situ sensors, registration of sensor information is required in the system. Summarized sensor metadata is stored in the knowledge base.
3. The abstracted sensor data is processed with context analysis and rule processing after combining the sensor metadata and the observed data.
4. The rule processing module in the rule manager searches the rule information database of the knowledge base for the rules satisfied by the abstracted sensor data.
5. The service information supply module finds, in the environment database, the additional message that helps the user understand the situation easily. The additional message, which changes depending on the place, is provided in the safety guidelines.
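As an illustration of the rule matching and message lookup steps above, a minimal sketch in Python follows. The rule fields and all names here are hypothetical, not the actual schema of the implemented system.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    sensor_type: str   # e.g. a temperature sensor
    low: float         # condition: low <= measured value
    high: float        # condition: measured value < high
    context: str       # abstracted context, e.g. "forest fire risk"
    guideline: str     # safety guideline attached to the alarm message

# Hypothetical rule base; the real system reads these from the rule tables.
RULES = [
    Rule("temperature", 60.0, float("inf"), "forest fire risk",
         "Notify the manager and dispatch a patrol."),
]

def process(sensor_type: str, node_id: str, value: float) -> Optional[str]:
    """Match an abstracted sensor reading against the rule base and return
    an alarm message, or None if no rule fires (steps 4 and 5)."""
    for rule in RULES:
        if rule.sensor_type == sensor_type and rule.low <= value < rule.high:
            return f"[{node_id}] {rule.context}: {rule.guideline}"
    return None
```

In the real system the additional message is further specialized by location via the environment database; here it is folded into the rule for brevity.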
Sensor Model Language (SensorML), an XML schema for defining the geometric, dynamic, and observational characteristics of a sensor, is designed to support a wide range of sensors, including both dynamic and stationary platforms and both in-situ and remote sensors. The sensor structure tables in the database store the properties of sensors extracted from the SensorML files. The tables include elements of SensorML such as identifiedAs, classifiedAs, and measures. The measures element describes the characteristics of measurement such as sample period, relative accuracy, etc. Other tables store sensor metadata such as sensor locations, document events, the manager's mail address, etc. The rule tables for context awareness store the context information, detailed conditions such as time, node id, and the measured sensor values, and the safety guidelines for users for the cases in which the sensor values satisfy the rules representing a specific situation.
5 Implementation and Running Examples

In this section, in order to show the usage of the sensor data processing in the context aware system, we describe the process of handling SensorML and managing the sensor data transmitted from the sensor network.
Fig. 3. The provided alarm message and the focused sensor
Fig. 3 shows the warning message and the alarm of the sensor served when the measured values of a specific sensor satisfy the detailed conditions that the user defined. In the view, the specific sensor is focused with an alarm, and the summarized message is also provided to notify users and managers. The context aware system includes three parts: a view, a sensor information bar, and a navigation panel. The view displays and handles the geometry and sensor information, utilizing the sensor information bar and the navigation panel. The sensor information bar shows the list, the structure, and the last values of the sensors. The navigation panel provides the functions to move to a specific sensor, to rotate the view, and to handle the tilt of the view.
6 Conclusion

Recently, interest in sensor data processing and context aware systems has increased rapidly on a large scale. In this paper, we proposed a framework for a context aware system based on a sensor network to provide warning messages and suitable safety guideline services in real time, depending on the transmitted values and the locations of sensors. The implemented system is useful for processing various sensor data to understand user-defined contexts in a variety of sensor network applications [13], such as intelligent transportation management, intelligent robot systems, disaster management systems, etc. Currently we are focusing on extending the rules for capturing and reasoning about situations under ontology concepts [14] to satisfy users' various requirements based on the sensor network.
Acknowledgements This work was supported by RRC program of MOCIE and ITEP, by Electronics and Telecommunications Research Institute, and by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment).
References
1. Botts, M.: Sensor Web Enablement. http://www.opengeospatial.org/press/ (2005)
2. Botts, M.: Sensor Model Language. http://vast.nsstc.uah.edu/SensorML/ (2004)
3. Cox, S.: Observations and Measurements. http://www.opengeospatial.org (2003)
4. Seo, S.B., Kang, J.W., Ryu, K.H.: Multivariate Stream Data Reduction in Sensor Network Applications. EUC Workshops (2005) 198-207
5. Harter, A., Hopper, A., Steggles, P., Ward, A., Webster, P.: The Anatomy of a Context-Aware Application. Mobile Computing and Networking (2002)
6. Gu, T., Wang, X.H., Pung, H.K., Zhang, D.Q.: A Middleware for Context-Aware Mobile Services. IEEE Vehicular Technology Conference, Milan, Italy (2004)
7. Fahy, P., Clarke, S.: CASS – Middleware for Mobile Context-Aware Applications. MobiSys (2004)
8. Chen, H., Finin, T., Joshi, A.: Using OWL in a Pervasive Computing Broker. Workshop on Ontologies in Agent Systems, AAMAS (2003)
9. Salber, D., Dey, A.K., Abowd, G.D.: The Context Toolkit: Aiding the Development of Context-Enabled Applications. In: Proceedings of ACM CHI 99, Pittsburgh, PA (1999)
10. Román, M., Hess, C., Cerqueira, R., Ranganathan, A., Campbell, R.H., Nahrstedt, K.: Gaia: A Middleware Infrastructure to Enable Active Spaces. IEEE Pervasive Computing (2002)
11. Hofer, T., Schwinger, W., Pichler, M., Leonhartsberger, G., Altmann, J.: Context-Awareness on Mobile Devices – the Hydrogen Approach (2002)
12. Biegel, G., Cahill, V.: A Framework for Developing Mobile, Context-aware Applications. In: Proceedings of the 2nd IEEE Conference on Pervasive Computing and Communications, PerCom (2004)
13. Seo, S.B., Kang, J.W., Lee, D.W., Ryu, K.H.: Multivariate Stream Data Classification Using Standard Text Classifiers. DEXA (2006), to be accepted
14. Hwang, J.H., Gu, M.S., Ryu, K.H.: Context-Based Recommendation Service in Ubiquitous Commerce. ICCSA, Vol. 2 (2005) 966-976
A Mathematical Model for Energy-Efficient Coverage and Detection in Wireless Sensor Networks Xiaodong Wang, Huaping Dai, Zhi Wang, and Youxian Sun National Laboratory of Industrial Control Technology, Institute of Industrial Process Control Zhejiang University, Hangzhou 310027, P.R. China {xdwang, hpdai, wangzhi, yxsun}@iipc.zju.edu.cn
Abstract. The tradeoff between system lifetime and system reliability is a paramount design consideration for wireless sensor networks. In order to prolong the system lifetime, a random sleep scheme can be adopted in which each node sleeps without coordinating with its neighboring nodes. Based on the random sleep scheme, an accurate mathematical model for the expected coverage ratio and the point event detection quality is put forward in this paper. Furthermore, the model takes the border effects into account and thus improves the accuracy of performance and quality analysis. Our model is flexible enough to capture the interaction among the essential system parameters. Therefore, this model can provide beneficial guidelines for optimal sensor network deployment satisfying both the lifetime and reliability requirements. Simulation results confirm the correctness and effectiveness of our analysis.
hardware devices usually consume too much energy and the cost is too high for tiny sensors. Furthermore, coordination among nodes also takes additional energy. It is expected that scheduling algorithms could work without geographic information. Moreover, none of the aforementioned literature considered the border effects, that is, that points near the border of the deployment area generally have less chance to be covered than points in the central area. When the proportion of the node's sensing range to the range of the deployment area is not small enough, the border effects should not be ignored ([6], [7]). In [6], a mathematical method was proposed to evaluate the number of nodes needed to reach the expected coverage ratio with consideration of the border effect. However, it can only be applied to determine the number of active nodes, and when dynamic management of node duty cycles is adopted, the total number of nodes cannot be derived from this model. In [7], a mathematical expression was formulated for expected k-coverage with consideration of both the border effects and an uncoordinated node scheduling scheme. Though the border effects were considered in [6] and [7], the network performance of the border area, such as coverage and detection, cannot be predicted accurately from these models. In this paper, we present a mathematical model for energy-efficient coverage and detection quality with consideration of border effects. We base our analysis on random deployment, since this deployment strategy is easy and inexpensive for sensor networks [8]. For individual nodes, we adopt a random sleep scheme, in which nodes sleep and wake up randomly and independently of each other. The obvious advantage of this scheme is its simplicity of implementation, without incurring control overhead. Since the deployment and the sleep scheme we choose are random, it is reasonable to study this problem from a probabilistic perspective.
The main contributions of this paper include:

(1) The model is flexible enough to capture the interaction among the system parameters such as sensor node numbers, random sleep ratio, etc. Hence, it can provide guidelines for optimal sensor network deployment.
(2) Our model can help to determine the sleep ratio for a desired coverage and acceptable detection quality.
(3) We pay more attention to the quality of service (QoS) that the sensor network provides for the border area. If the applications in the border area demand a high degree of accuracy, the QoS should be upgraded to a higher level. In this case, how many nodes should we deploy? This problem is also answered analytically in our model.
The rest of the paper is organized as follows. In section 2 we present our network models and assumptions. Section 3 formulates the network coverage problem and section 4 considers point event detection. In section 5 we present the simulation results and section 6 concludes the paper.
2 Network Models and Assumptions

In this section, we present the notations and assumptions for our derivations. First of all, we assume that n sensor nodes are uniformly and independently distributed in a two-dimensional circular area Z with radius R. For simplicity, we use the Boolean
sensing model and assume that a sensor's sensing range is a circular area centered at the sensor with radius r. In addition, all sensor nodes are supposed to have the same sensing radius, and no two sensors are deployed at exactly the same location. A point event E that occurs within Z can be detected if it lies within at least one active sensor node's sensing range. So we define a point's neighboring area N as the region such that any sensor node located within this region covers the point. When border effects are not taken into account, for all points in Z, N = πr². If border effects are considered, we divide the area Z into two parts, as shown in Figure 1, for the convenience of analyzing this problem.
Fig. 1. Illustration of central area Z', border area Z" (Z = Z' + Z") and neighboring area
The central area Z', which is concentric with Z, has a radius of R − r. Obviously, for any point in Z', its neighboring area is N_{Z'} = πr². However, for an arbitrary point d in the border area Z", only the shadowed part of its neighboring disc can contain sensor nodes. Hence, N_d < πr². In our analysis, we assume that all sensors have the same sensing period T and the same sleep ratio α (0 ≤ α ≤ 1), which defines the percentage of time a sensor is in the sleep state. Each sensor node determines independently, for each common time unit called a slot, whether to be inactive with probability α.
3 Network Coverage Analysis

First, according to our assumptions, since nodes are deployed with a uniform distribution, the probability that a sensor node falls in a point's neighboring area is φ = N/(πR²). Hence, the number of nodes within N conforms to a binomial distribution B(n, N/(πR²)), and the probability that an arbitrary point is covered by at least one node is

P = \sum_{k=1}^{n} C_n^k \phi^k (1-\phi)^{n-k} = 1 - (1-\phi)^n \qquad (1)

Considering the random sleep scheme, the probability that a point event E is covered by at least one active sensor is

P' = 1 - \sum_{k=0}^{n} \alpha^k C_n^k \phi^k (1-\phi)^{n-k} = 1 - \left[1-(1-\alpha)\phi\right]^n \qquad (2)
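Formulas (1) and (2) can be evaluated directly. A minimal Python sketch (the function names are ours) computes both; the closed form of Formula (2) follows from the binomial theorem applied to (αφ + 1 − φ)^n:

```python
def p_covered(n: int, phi: float) -> float:
    """Formula (1): probability that a point is covered by at least one
    of n uniformly deployed nodes, where phi = N / (pi * R^2)."""
    return 1.0 - (1.0 - phi) ** n

def p_covered_active(n: int, phi: float, alpha: float) -> float:
    """Formula (2): probability that a point is covered by at least one
    *active* node under the random sleep scheme with sleep ratio alpha."""
    return 1.0 - (1.0 - (1.0 - alpha) * phi) ** n

# Example: for a central-area point with r/R = 0.5, phi = (r/R)^2 = 0.25.
print(p_covered(20, 0.25))              # all nodes awake
print(p_covered_active(20, 0.25, 0.3))  # sleep ratio alpha = 0.3
```

Setting α = 0 recovers Formula (1), and increasing α monotonically lowers the coverage probability, as expected.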
For each point in Z', the probability that a sensor node falls in its neighboring area is the same: φ_{Z'} = πr²/(πR²) = r²/R². According to Formula 2, the probability of being covered is also the same: 1 − [1 − (1−α)(r/R)²]^n. Then the expected coverage ratio of area Z' is

P'_{Z'} = 1 - \left[1-(1-\alpha)\phi_{Z'}\right]^n = 1 - \left[1-(1-\alpha)(r/R)^2\right]^n \qquad (3)
In particular, the neighboring areas of points in Z" take various values, determined by the distance l between the point and the center of Z, as shown in Figure 1. In order to evaluate the average coverage ratio of area Z", we have to compute the average probability of being covered over all points in Z". For any point d in Z", its neighboring area N_{d∈Z"} can be calculated using the formula proposed in [6]:

N_{d\in Z''} = \frac{1}{2}\pi(R^2+r^2) + r^2\arcsin\frac{R^2-r^2-l^2}{2lr} + \frac{R^2-r^2-l^2}{2l}\sqrt{r^2-\left(\frac{R^2-r^2-l^2}{2l}\right)^2} - R^2\arcsin\frac{R^2-r^2+l^2}{2lR} - \frac{R^2-r^2+l^2}{2l}\sqrt{R^2-\left(\frac{R^2-r^2+l^2}{2l}\right)^2}

(the two square-root factors are both equal to the half-length of the common chord of the two circles).
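The formula from [6] can be implemented directly. The Python sketch below (the function name is ours) clamps the square-root and arcsine arguments to guard against floating-point round-off at l = R − r, where the neighboring area should equal πr² exactly:

```python
import math

def neighboring_area(l: float, r: float, R: float) -> float:
    """Area of the part of a disc of radius r, centered at distance l from
    the center of the deployment region Z (radius R), that lies inside Z.
    Valid for border points, i.e., R - r <= l <= R (assuming r < R)."""
    a = (R * R - r * r - l * l) / (2.0 * l)
    b = (R * R - r * r + l * l) / (2.0 * l)
    # half-length of the common chord; clamped against round-off
    c = math.sqrt(max(0.0, r * r - a * a))
    return (0.5 * math.pi * (R * R + r * r)
            + r * r * math.asin(max(-1.0, min(1.0, a / r))) + a * c
            - R * R * math.asin(max(-1.0, min(1.0, b / R))) - b * c)
```

At l = R − r the small disc lies entirely inside Z and the expression reduces to πr²; as l grows toward R the area shrinks, dropping below πr²/2 when the point sits on the border itself.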
Furthermore, we are also interested in the average coverage ratio of the whole area Z when all the deployed sensors are active. Based on this value, we can explore how much the QoS will be disrupted under the random sleep scheme, and the quantitative quality differences between the central area Z' and the border area Z". The average neighboring area of all points in Z can be obtained by:

\bar{N}_Z = \left\{ \pi r^2 \cdot \pi(R-r)^2 + \bar{N}_{Z''}\left[\pi R^2 - \pi(R-r)^2\right] \right\} / (\pi R^2)
          = \left\{ \pi r^2 (R-r)^2 + \bar{N}_{Z''}\left[R^2 - (R-r)^2\right] \right\} / R^2 \qquad (7)

Hence, according to Formulas 1 and 7, the average coverage ratio of Z without adopting the sleep scheme is

\bar{P}_Z = 1 - (1-\phi)^n = 1 - \left(1 - \bar{N}_Z/(\pi R^2)\right)^n \qquad (8)
In this paper, we are only concerned with 1-coverage. Our model can be easily extended to k-coverage.
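The average border-area value \bar{N}_{Z''} needed in Formula (7) can be obtained as an area-weighted average of the expression from [6] over the annulus Z". The Python sketch below (our own numerical integration; all function names are ours) chains this into Formulas (7) and (8):

```python
import math

def neighboring_area(l, r, R):
    # part of the disc of radius r centered at distance l that lies inside Z
    a = (R * R - r * r - l * l) / (2.0 * l)
    b = (R * R - r * r + l * l) / (2.0 * l)
    c = math.sqrt(max(0.0, r * r - a * a))  # half common chord
    return (0.5 * math.pi * (R * R + r * r)
            + r * r * math.asin(max(-1.0, min(1.0, a / r))) + a * c
            - R * R * math.asin(max(-1.0, min(1.0, b / R))) - b * c)

def avg_border_neighboring_area(r, R, steps=20000):
    # area-weighted average of N_d over the annulus Z'' (midpoint rule);
    # a ring of radius l carries weight proportional to 2*pi*l
    lo, hi = R - r, R
    total = weight = 0.0
    for i in range(steps):
        l = lo + (i + 0.5) * (hi - lo) / steps
        w = 2.0 * math.pi * l
        total += neighboring_area(l, r, R) * w
        weight += w
    return total / weight

def avg_coverage_all_awake(n, r, R):
    # Formula (7): average neighboring area over Z, then Formula (8)
    nz_border = avg_border_neighboring_area(r, R)
    nz = (math.pi * r ** 2 * (R - r) ** 2
          + nz_border * (R ** 2 - (R - r) ** 2)) / R ** 2
    return 1.0 - (1.0 - nz / (math.pi * R ** 2)) ** n
```

Because border points have smaller neighboring areas, the average coverage ratio from Formula (8) is always slightly below the central-area value 1 − (1 − (r/R)²)^n.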
4 Point Event Detection Analysis

In this section, we propose a mathematical model to analyze the probability of detection delay for a point event. Even if the points in area Z are all covered, it is not guaranteed that all point events are detected instantaneously when they occur, due to the random sleep scheme we adopt. Now consider an arbitrary point covered by at least one node in Z. As Figure 2 illustrates, we call the period from time t1 to t3 the Worst Case Sleep Time (WCST), during which all nodes within this point's neighboring area happen to be in the sleep state, but unfortunately an event occurs during this time period (as shown in Figure 2, the event occurs at time t2). Hence, a detection delay t_d (t_d = t3 − t2) unavoidably occurs, for the reason that the event can only be detected when at least one node wakes up at time t3. [9] also considered this scenario, but only targeted large-scale wireless sensor networks.
Fig. 2. Illustration of point event detection delay
Intuitively, increasing the number of nodes deployed in the area, or decreasing the sleep ratio of each node, can decrease the delay time. But how many nodes should we deploy? How should we choose the sleep ratio? Due to the border effects, we need not only to guarantee the detection quality of point events in the central area Z', but also to pay much attention to the detection delay probability of point events in the area Z", if the applications require a high degree of detection accuracy over the whole region Z. Based on Figure 2 and the above analysis, it is necessary to calculate the conditional probability (denoted by P_{S|C}) that a point is not covered by any active node even though it could be covered:

P_{S|C} = \frac{\sum_{k=1}^{n} \alpha^k C_n^k \phi^k (1-\phi)^{n-k}}{1-(1-\phi)^n} = \frac{\left[1-(1-\alpha)\phi\right]^n - (1-\phi)^n}{1-(1-\phi)^n} \qquad (9)

Then, the probability that a given point event is uncovered for at least τ slots is

P_{t_d}(t_d \ge \tau) = P_{S|C} - \left[1-(1-\phi)^n\right]^{-1} \sum_{i=1}^{\tau-1} \sum_{k=1}^{n} \alpha^{ik}(1-\alpha^k) C_n^k \phi^k (1-\phi)^{n-k}
                     = P_{S|C} - \left[1-(1-\phi)^n\right]^{-1} \sum_{i=1}^{\tau-1} \left\{ \left[1-(1-\alpha^i)\phi\right]^n - \left[1-(1-\alpha^{i+1})\phi\right]^n \right\} \qquad (10)
From the previous section we can calculate φ_{Z'} and φ_{Z"}. Hence, if the node number n and the sleep ratio α are known in advance, the detection delay of a point event can be evaluated analytically. Equivalently, given the other parameters, the number of sensors to be deployed can also be estimated.
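Formulas (9) and (10) are straightforward to evaluate numerically. A Python sketch follows (the function names are ours); note that for τ = 1 the sum in Formula (10) is empty, so the expression reduces to P_{S|C}, and the sum telescopes so that the probability tends to 0 as τ grows:

```python
def p_sleep_given_covered(n: int, phi: float, alpha: float) -> float:
    """Formula (9): probability that no node in the point's neighboring
    area is active during a slot, given that the point is covered."""
    den = 1.0 - (1.0 - phi) ** n
    return ((1.0 - (1.0 - alpha) * phi) ** n - (1.0 - phi) ** n) / den

def p_delay_at_least(tau: int, n: int, phi: float, alpha: float) -> float:
    """Formula (10): probability that a point event stays undetected
    for at least tau slots."""
    den = 1.0 - (1.0 - phi) ** n
    # telescoping sum over i = 1, ..., tau - 1
    s = sum((1.0 - (1.0 - alpha ** i) * phi) ** n
            - (1.0 - (1.0 - alpha ** (i + 1)) * phi) ** n
            for i in range(1, tau))
    return p_sleep_given_covered(n, phi, alpha) - s / den
```

Each summand is positive, so the probability decreases monotonically in τ, which matches the intuition that waiting longer makes a wake-up more likely.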
5 Simulation Results

In this section, we demonstrate that the analytical results are consistent with the simulation results. In our simulation, locations of nodes are generated conforming to a uniform random distribution over a circular area Z with radius R. The area Z is divided into many small grids of size 0.1 × 0.1. The node's sensing range r is set to 10. Then, R is determined by the value of r/R (1, 0.5, and 0.1, respectively). The period T is chosen as 10 s, and the time slot is 1 s. In the first set of experiments r/R is set to 1, and the number of nodes n is varied from 1 to 10 with an increment of 1. We first obtain the simulation results of the coverage ratio (denoted by SC−Z) of the whole area Z when the deployed nodes are all active. Then, the random sleep scheme is adopted. We measure the coverage ratio of both the central area Z' and the border area Z" under different combinations of n (1, 2, 3, …, 10) and α (0.3, 0.6), denoted by SC−Z'−0.3, SC−Z'−0.6, SC−Z"−0.3, and SC−Z"−0.6, respectively. The coverage ratio is obtained as follows. The simulation coverage ratio for a single time slot is the proportion of the number of covered grids to the total number of grids in the area. For each deployment, 1000 time slots are examined. Besides, we generated 100 deployments for every combination of parameters, and we report the average simulation results, as shown in Figure 3(a). Comparing with the analytical results (AC−Z, AC−Z'−0.3, AC−Z'−0.6, AC−Z"−0.3, AC−Z"−0.6), we observe that the simulation results match the analytical curves well. In the second set of experiments, for a given node number, we study the detection delay probability of the central area Z' and the border area Z" under different combinations of sleep ratio α (0.3, 0.6) and time slots τ (1, 2, …, 20), denoted by SD−Z'−0.3, SD−Z'−0.6, SD−Z"−0.3 and SD−Z"−0.6, respectively. The node number is selected intentionally.

As Figure 3(a) shows, when the 8 deployed sensor nodes are all awake, the whole area Z is almost fully covered. Then we can explore how much the detection quality is disrupted under the different α values, and the quantitative quality differences between the central area and the border area. For each time slot τ, when the random sleep scheme is applied, the area coverage ratio P'_τ can be obtained by calculating the proportion of the number of covered grids to the total number of grids in this area. Then, P_{S|C} can be estimated by the long-run average of 1 − P'_τ. Every grid is treated as a point event. For each grid, we record the number of experiments where the detection delay is larger than or equal to 1 s, 2 s, 3 s, …, 20 s, respectively. The simulation results shown in Figure 3(b) are averages over 100 runs.
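The per-slot coverage experiment can be reproduced in a few lines. The Monte Carlo sketch below (Python; all names ours) estimates the coverage probability of a single point under random deployment plus the random sleep scheme, which the paper's grid-based procedure averages over many points; for a central-area point (l < R − r) it should agree with Formula (2) with φ = (r/R)² up to Monte Carlo noise:

```python
import math
import random

def p_active(n, phi, alpha):
    # analytical prediction, Formula (2)
    return 1.0 - (1.0 - (1.0 - alpha) * phi) ** n

def simulate_point_coverage(l, r, R, n, alpha, trials=10000, seed=1):
    """Empirical probability that a point at distance l from the center of
    Z is covered by at least one awake node in a random slot."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(trials):
        for _ in range(n):
            # uniform deployment in the disc of radius R
            rho = R * math.sqrt(rng.random())
            theta = 2.0 * math.pi * rng.random()
            dx = rho * math.cos(theta) - l
            dy = rho * math.sin(theta)
            # node covers the point and is awake with probability 1 - alpha
            if dx * dx + dy * dy <= r * r and rng.random() >= alpha:
                covered += 1
                break
    return covered / trials
```

The square root in the radial draw is what makes the deployment uniform over the disc's area rather than over its radius.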
We also conducted additional experiments with r/R = 0.5 and r/R = 0.1 to examine the accuracy of our theoretical results; the simulation results are shown in Figure 4 and Figure 5. Our observations from the simulation are summarized as follows: 1) The simulation results are very close to the analytical results, which validates the correctness of our derivations. 2) The QoS of the central area Z' outperforms that of the border area Z" on both coverage and detection quality. 3) The coverage ratio increases with the increasing number of deployed nodes. For a given node number, the coverage ratio increases as α decreases. 4) For a given node number, the probability of detection delay increases with the increase of α.
Fig. 5. Comparing analytical results with simulation results (r/R=0.1)
6 Conclusions

In this paper, we presented an accurate mathematical model for energy-efficient coverage and detection with the consideration of border effects. The correctness and effectiveness of our analytical model are justified through extensive simulation experiments. This model enables us to analyze the tradeoff between network lifetime and system reliability of wireless sensor networks more effectively, and provides guidelines for optimal sensor network deployment.
Acknowledgment This research is supported by Chinese National Natural Science Foundation under the Grant 60304018, 60434030, Technology Fund of Ningbo City (No.2005C100067), the Key Technologies R&D Programs of Zhejiang Province (No.2005C21087), Academician Foundation of Zhejiang Province (No.2005A1001-13), and Specialized Research Fund for the Doctoral Program of Higher Education (No.20050335020).
References
1. Shih, E., Cho, S., Ickes, N., Min, R., Sinha, A., Wang, A., Chandrakasan, A.: Physical Layer Driven Protocol and Algorithm Design for Energy-efficient Wireless Sensor Networks. Proceedings of the Seventh Annual International Conference on Mobile Computing and Networking (MobiCom 01), Rome, Italy, July (2001) 272-287
2. Tian, D., Georganas, N.D.: A Coverage-preserved Node Scheduling Scheme for Large Wireless Sensor Networks. Proceedings of the First International Workshop on Wireless Sensor Networks and Applications (WSNA'02), Atlanta, USA, September (2002) 32-41
3. Ye, F., Zhong, G., Lu, S., Zhang, L.: Energy Efficient Robust Sensing Coverage in Large Sensor Networks. UCLA Technical Report (2002)
4. Xing, G., Wang, X., Zhang, Y., Lu, C., Pless, R., Gill, C.: Integrated Coverage and Connectivity Configuration for Energy Conservation in Sensor Networks. ACM Trans. Sensor Networks, in press
5. Lu, J., Suda, T.: Coverage-aware Self-scheduling in Sensor Networks. Proceedings of the IEEE 18th Annual Workshop on Computer Communications (2003) 117-123
6. Liu, M., Cao, J.N., Li, X., Lou, W.: Coverage Analysis for Wireless Sensor Networks. Proc. 1st International Conference on Mobile Ad-Hoc and Sensor Networks (MSN'05), (2005) 711-720
7. Yen, L.H., Yu, C.W., Cheng, Y.M.: Expected K-coverage in Wireless Sensor Networks. Ad Hoc Networks, in press
8. Tilak, S., Abu-Ghazaleh, N.B., Heinzelman, W.: Infrastructure Tradeoffs for Sensor Networks. Proceedings of the First International Workshop on Wireless Sensor Networks and Applications (WSNA'02), Atlanta, USA, (2002) 49-57
9. Hsin, C.F., Liu, M.: Network Coverage Using Low Duty-Cycled Sensors: Random & Coordinated Sleep Algorithms. International Symposium on Information Processing in Sensor Networks (2004) 433-442
A Method of Controlling Packet Transmission Rate with Fuzzy Logic for Ad Hoc Networks Kyung-Bae Chang, Tae-Hwan Son, and Gwi-Tae Park ISRL, College of Science, Korea University. Anam-dong 5-ga Seongbuk-gu, Seoul, Korea {lslove, chlilla, gtpark}@korea.ac.kr
Abstract. In this research, a packet transmission rate control scheme between nodes in a wireless Ad-hoc network is proposed, considering the characteristic of Wireless LAN that transmission efficiency differs with transmission distance. Much research on energy-efficient routing algorithms has been conducted only under the assumption of ideal experimental conditions. This paper considers how to find a suitable transmission rate for the transmission distances between nodes in a mobile Ad-hoc network, so that a more realizable method is presented. In this research, an algorithm controlling transmission data rates by the distances between mobile nodes is realized using Fuzzy logic, which can be applied to Ad-hoc network routing, and simulations are conducted to verify the enhancement in throughput.
the maximum [5]. The maximum transmission rate is only achieved when the host nodes are within the transmission range. For instance, transmission rates of 1 or 2 Mb/s, 5.5 Mb/s and 11 Mb/s are ideal at distances of 100 m, 60 m and 30 m, respectively [2]. An appropriate data transmission rate for the transmission distance between mobile nodes should therefore be considered for a more realistic ad hoc network routing method, alongside research on developing various mobile routing methods. In this research, fuzzy logic is suggested for the logical selection of the transmission rate between nodes in an ad hoc network, considering that IEEE 802.11 exhibits different transmission efficiencies at different transmission distances. A method of controlling the packet transmission rate by distance is realized, which can be applied to a generic ad hoc network routing scheme, and the validity of the proposed method is verified with a computer simulation.
2 Ad Hoc Routing Algorithm

There are two main streams of research in developing routing algorithms for ad hoc networks. First, routing methods for saving the energy of mobile nodes are being studied [1]. Nodes in an ad hoc network depend critically on batteries, since they are operated remotely on battery power alone. Therefore, optimizing the network structure for minimum power consumption while still providing satisfactory communication is very important. At present, research on periodic sleeping and node clustering is actively being conducted to extend battery lifetime by distributing the energy consumption over the whole network. Secondly, there is a considerable body of research on realizing the shortest path under dynamic node changes [3] [4]. In a mobile ad hoc network, the topology of the network varies continuously due to the mobility of nodes. Because the network is multi-hop, its topology can change rapidly and arbitrarily. Such link changes may have an unpredictable adverse influence on protocol performance and on the applications in the upper layer, so an enhanced routing method that resolves this problem is an important consideration. Fig. 1 depicts an example of an arbitrary routing decision over dynamic nodes. There are several significant research results on DSR, AODV, SPAN and GPSR. However, these studies use a fixed transmission rate in their simulations, regardless of the distances between nodes. In a real system, the distance between
Fig. 1. An Example of Temporary Topology
nodes certainly affects the network, for example through signal intensity, transmission delay and packet loss. As seen in Fig. 1, even in an optimized network there can be node distances differing by a factor of two according to the node density. Therefore, such distance differences are considered in this research for the realization of better routing.
3 Transmission Characteristics of 802.11b

IEEE 802.11b shows different transmission characteristics according to the distance between mobile nodes. Transmission rates of up to 11 Mb/s are possible, but the achievable rate varies with transmission distance and link performance. In general, received signal strength is described by the RSSI (Received Signal Strength Indication) value. Transmission rates of 1 or 2 Mb/s, 5.5 Mb/s and 11 Mb/s are ideal at distances of 100 m, 60 m and 30 m, respectively [2]. The relation between distance and transmission rate in IEEE 802.11b is depicted in Fig. 2, which indicates the different error rates of the packet transmission rates beyond a certain distance.
Fig. 2. Transmission Rate Evaluation at Different Distances
A high data transmission rate yields high throughput, while a low data transmission rate enables data transmission over a long distance. Hence, the most appropriate transmission rate should be selected for a given node distance, as shown in Fig. 3.
Fig. 3. Transmission Rate Changes According to Distance
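As a reference point, the hard distance-to-rate mapping implied by the 802.11b figures above can be sketched as follows. This is a simplified illustration, not the paper's controller; the thresholds are the ideal distances quoted from [2], and in practice the node does not know its distance, which is why the paper drives the decision from RSSI and packet delay instead.

```python
# Naive threshold-based rate selection from the quoted 802.11b figures:
# 11 Mb/s up to about 30 m, 5.5 Mb/s up to 60 m, 1-2 Mb/s up to 100 m.
def select_rate_mbps(distance_m):
    if distance_m <= 30:
        return 11.0
    if distance_m <= 60:
        return 5.5
    if distance_m <= 100:
        return 2.0
    return 0.0  # beyond nominal range: no reliable link
```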
4 Fuzzy Logic Control and Max-Min Algorithm

In fuzzy logic theory, a value is represented as a degree of truth, similar to a probability, in contrast to conventional logic, which represents a value with a binary choice (0 or 1, black or white, yes or no). Fuzzy logic permits intermediate values between 0 and 1. In this paper, a fuzzy logic controller for deciding the data transmission rate is constructed using two inputs, the RSSI value and the packet delay in the ad hoc network. The max-min composition is used as the method for combining the input values; it is given by

R₁∘R₂(x, z) = ∨_{y∈Y} (R₁(x, y) ∧ R₂(y, z)) = { ((x, z), max_y min{μ_R₁(x, y), μ_R₂(y, z)}) | x ∈ X, y ∈ Y, z ∈ Z }. (1)
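The max-min composition of Eq. (1) can be sketched for discrete membership matrices as follows; the matrices below are illustrative values, not taken from the paper.

```python
import numpy as np

# mu_R1(x, y) and mu_R2(y, z) on small discrete universes X, Y, Z
R1 = np.array([[0.2, 0.8],
               [1.0, 0.4]])
R2 = np.array([[0.6, 0.3],
               [0.5, 0.9]])

def max_min(r1, r2):
    # (R1 o R2)(x, z) = max over y of min(mu_R1(x, y), mu_R2(y, z))
    return np.minimum(r1[:, :, None], r2[None, :, :]).max(axis=1)

print(max_min(R1, R2))  # composed membership matrix over X x Z
```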
5 Simulation

The proposed fuzzy logic control is simulated in its simplest form in this research. The amount of packet transmission as the moving area of the mobile node increases is measured between one host node and one other mobile node. The result is compared to the cases using fixed packet transmission rates (2 Mb/s, 5.5 Mb/s and 11 Mb/s). The simulation is conducted with membership functions representing RSSI values and packet delays, under the assumption that every node has the same RSSI characteristics. Fig. 4 (a) illustrates the membership function for the RSSI value between mobile nodes. An RSSI value is derived from internal electrical signals of a device, so membership functions representing RSSI may take different values for different calculation formulas and devices. The membership function for the packet delay between mobile nodes is shown in Fig. 4 (b).
Fig. 4. Membership Function for RSSI Value (a) and Packet Delay (b)
The fuzzy control logic shown in Table 1 is constructed from these two membership functions. The constructed controller adjusts the transmission rate in one of three ways: increasing (Up), sustaining (Zero) or decreasing (Down).

Table 1. Logic Table

                  Packet delay
RSSI       High     Normal    Low
Strong     Zero     Up        Up
Fair       Down     Zero      Up
Weak       Down     Down      Zero
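The rule table can be encoded directly; the sketch below applies it as a crisp lookup over the three rates used in the simulation. This is a simplification: the actual controller fuzzifies RSSI and packet delay with the membership functions of Fig. 4 and combines them by max-min inference, rather than reading crisp labels.

```python
# Rule consequents from Table 1: Up / Zero / Down move the transmission
# rate one step within the 802.11b rate set used in the simulation.
RULES = {
    ("strong", "high"): "zero", ("strong", "normal"): "up",   ("strong", "low"): "up",
    ("fair",   "high"): "down", ("fair",   "normal"): "zero", ("fair",   "low"): "up",
    ("weak",   "high"): "down", ("weak",   "normal"): "down", ("weak",   "low"): "zero",
}
RATES = [2.0, 5.5, 11.0]  # Mb/s

def next_rate(current_mbps, rssi, delay):
    i = RATES.index(current_mbps)
    action = RULES[(rssi, delay)]
    if action == "up":
        i = min(i + 1, len(RATES) - 1)
    elif action == "down":
        i = max(i - 1, 0)
    return RATES[i]
```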
Fig. 5. Comparison of Transmission Rate According to Node Distance
Fig. 5 compares the amounts of packet transmission as the moving area of the mobile node increases. A transmission rate of 2 Mb/s is stable throughout the whole region, but it shows a relatively low transmission amount compared with the other transmission modes. In the case of 5.5 Mb/s, the transmission amount decreases over the whole region, but it does not show a large difference; this mode can be regarded as stable and not much worse in performance than the other modes. The 11 Mb/s mode shows an abrupt decrease in transmission amount as the distance approaches 30 m, so it can be said to be sensitive to distance. The mode using fuzzy logic control shows the highest transmission amount in all cases, and its transmission amount decreases similarly to the other modes. This shows that the fuzzy-controlled mode is the most efficient and stable mode in all cases.
6 Conclusion

In this paper, a fuzzy logic algorithm is used as a dynamic control method for the transmission rate according to the varying distance between two mobile nodes. The simulation
shows that the proposed fuzzy logic controller performs better than networks using fixed transmission rates. Extending this result to a multi-hop routing algorithm might enable a more stable and faster network. The result covers only a simple comparison of transmission amounts; considerations of transmission data loss and transmission delay remain as further research. In addition, a more specific realization is planned that combines the proposed method with currently researched multi-hop routing algorithms.
References

1. Chen, B., Jamieson, K., Balakrishnan, H., Morris, R.: SPAN: An Energy-Efficient Coordination Algorithm for Topology Maintenance in Ad Hoc Wireless Networks. Wireless Networks 8 (2002) 481-494
2. Andren, C., Webster, M.: CCK Modulation Delivers 11 Mbps for High Rate 802.11 Extension. Proc. Wireless Symposium/Portable by Design Conference (1999)
3. Broch, J., Johnson, D., Maltz, D.: The Dynamic Source Routing Protocol for Mobile Ad Hoc Networks. Internet Draft, IETF Mobile Ad Hoc Networking Working Group (1998)
4. Perkins, C.E., Royer, E.M.: Ad Hoc On-Demand Distance Vector Routing. Proc. 2nd IEEE Workshop on Mobile Computing Systems and Applications (WMCSA'99) (1999) 90-100
5. Williams, J., Hanzo, L., Steele, R.: Channel-Adaptive Modulation. Proc. 6th International Conference on Radio Receivers and Associated Systems (1995) 344-147
A Novel Algorithm for Doppler Frequency Rate Estimation of Spaceborne Synthetic Aperture Radar Shiqi Huang, Daizhi Liu, Liang Chen, and Yunfeng Liu Xi’an Research Inst. of Hi-Tech, Hongqing Town, 710025 Xi’an, P.R. China [email protected]
Abstract. Synthetic Aperture Radar (SAR) can obtain high-resolution radar images at long range, in all weather, day and night, and has been applied widely in military and civil fields. The Range-Doppler (RD) algorithm is a simple and typical imaging algorithm. Its key step is the estimation of the Doppler parameters, namely the Doppler centroid frequency and the Doppler frequency rate. The Doppler frequency rate varies with range; if it is estimated inaccurately, severe defocusing and blurring occur in the azimuth direction. Previous Doppler frequency rate estimators usually work in the image domain instead of the data domain, so the computational load is very large and the imaging speed is slow. To improve on them, this paper proposes a novel Doppler frequency rate estimation algorithm for spaceborne SAR imaging. Raw ERS data are used to test the effectiveness and feasibility of the method.
their calculation is complex, their computational load is great, and they are unusable for real-time imaging. Others then presented algorithms that estimate the Doppler frequency rate from raw data, such as the reflectivity displacement method [7] and the Shift-and-Correlation (SAC) method [8]. However, these two methods are only suited to airborne SAR Doppler frequency rate estimation and cannot be used for spaceborne SAR. Real-time imaging demands an algorithm that satisfies the accuracy requirement with a small computational load. So a novel Doppler frequency rate estimation algorithm, the Mean Frequency Shift Correlation (MFSC) method, is presented in this paper. The MFSC algorithm estimates the Doppler frequency rate directly from the echo data, without forming an image. Its computational efficiency is therefore improved, it is fit for real-time processing, and it is clearly better than the traditional MD algorithm. It thus has theoretical and practical value for the study of spaceborne SAR imaging. The algorithm is validated to be feasible and effective with ERS-2 raw data from ESA.
2 Doppler Frequency Rate

The key technique of azimuth compression, or azimuth focusing, is Doppler parameter estimation, namely the estimation of the Doppler centroid frequency and the Doppler frequency rate. The Doppler frequency rate is given by [9]

f_DR = -2V² / (λR₀). (1)

where V is the ground track velocity, λ is the wavelength, and R₀ is the range from the target to the spacecraft track, given by

R₀ = √(H² + R_g²) = R_near + n·(c / F_s). (2)

where H is the spacecraft height, R_g is the ground range, R_near is the distance to the first range bin, c is the velocity of light, F_s is the sampling frequency in the range direction, and n is the sampling point number, i.e., the number of the range gate. The variation of f_DR with range is shown in Fig. 1.
Fig. 1. The changes of Doppler frequency rate with range
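Using the ERS-2 parameters quoted later in Section 4 (V = 7040 m/s, λ = 5.7 cm, R_near = 838000 m, F_s = 18.96 MHz), Eqs. (1)-(2) give the coarse per-gate Doppler rate directly; a small sketch:

```python
# Coarse Doppler frequency rate per range gate, Eqs. (1)-(2).
V, lam, c = 7040.0, 0.057, 3e8    # velocity (m/s), wavelength (m), light speed (m/s)
R_near, Fs = 838000.0, 18.96e6    # first range bin (m), range sampling frequency (Hz)

def doppler_rate(n):
    # R0 = R_near + n * (c / Fs);  f_DR = -2 V^2 / (lambda * R0)
    r0 = R_near + n * (c / Fs)
    return -2 * V ** 2 / (lam * r0)

# f_DR drifts by roughly 200 Hz/s across the 5616 range gates, which is
# why a single fixed estimate defocuses the azimuth image.
print(doppler_rate(0), doppler_rate(5615))
```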
3 Mean Frequency Shift Correlation Algorithm

Dall presented the Shift-and-Correlation algorithm in 1991 [8]. It uses the correlation characteristics of the Doppler signal to estimate the Doppler frequency rate f_DR with very high efficiency. The echo of a point target is

s(t) = exp[iπf_DR(t - t₀)²]; t₀ - T/2 ≤ t ≤ t₀ + T/2. (3)
The signal s(t) is divided into two parts, SL(f) and SU(f), in the frequency domain; they are the lower half and the upper half of the Doppler spectrum, respectively. The SAC algorithm applies relative frequency shifts to SL and SU and then correlates them. A sketch of the principle of the SAC algorithm is shown in Fig. 2. Since s(t) has a large time-bandwidth product, the time-domain signals corresponding to SL(f) and SU(f) are sl(t) and su(t), respectively:

sl(t) = exp[iπf_DR(t - t₀)²], t₀ - T/2 ≤ t ≤ t₀;
su(t) = exp[iπf_DR(t - t₀)²], t₀ ≤ t ≤ t₀ + T/2. (4)
Then sl(t) and su(t) undergo frequency-shift processing: SL(f) is shifted by PRF/4 toward the upper half of the spectrum and SU(f) is shifted by PRF/4 toward the lower half. The results are

SL⁺(f) = SL(f + PRF/4);
SU⁺(f) = SU(f - PRF/4). (5)

The corresponding time-domain signals are sl⁺(t) and su⁺(t):
sl⁺(t) = sl(t)·exp(i2π·(PRF/4)·t) = exp[-iπf_DR(δ/2)² + iπf_DR t₀δ]·exp[iπf_DR(t - t₀ + δ/2)²], t₀ - T/2 ≤ t ≤ t₀; (6)

su⁺(t) = su(t)·exp(-i2π·(PRF/4)·t) = exp[-iπf_DR(δ/2)² - iπf_DR t₀δ]·exp[iπf_DR(t - t₀ - δ/2)²], t₀ ≤ t ≤ t₀ + T/2; (7)

where δ = PRF / (2f_DR).
When sl⁺(t) and su⁺(t) are correlated with each other, the correlation peak appears at position δ, as Fig. 2(e) shows. If the position δ of the correlation peak and the pulse repetition frequency PRF are known, f_DR may be obtained from δ = PRF/(2f_DR). This is called the shift-and-correlation method. Unfortunately, the SAC method is only suited to airborne SAR and high-contrast terrain; if it is applied directly to spaceborne SAR imaging, nothing can be obtained, as Fig. 4(a) shows. This article proposes using the geometry of spaceborne SAR, namely V, λ and R₀, to estimate a coarse Doppler frequency rate for every range gate; the value estimated with SAC then acts as an adjustable correction to the Doppler frequency rate estimate.
The basic value plus the adjustable correction gives an accurate Doppler frequency rate. This is the Mean Frequency Shift Correlation algorithm. Its flow chart is displayed in detail in Fig. 3.
Fig. 2. Sketch diagrams of Shift and Correlation
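The shift-and-correlation chain of Eqs. (3)-(7) can be demonstrated numerically. The sketch below uses synthetic values (f_DR, PRF, T and the sampling rate are illustrative, not the paper's data), and a positive chirp rate is chosen so that the shifted halves overlap at a positive lag; for the negative f_DR of a real spaceborne geometry the shift directions are reversed.

```python
import numpy as np

f_dr, prf = 2000.0, 1000.0     # Doppler rate (Hz/s), pulse repetition frequency (Hz)
T, fs, t0 = 1.2, 8000.0, 0.0   # aperture time (s), demo sampling rate (Hz), center time

t = np.arange(-T / 2, T / 2, 1 / fs) + t0
s = np.exp(1j * np.pi * f_dr * (t - t0) ** 2)   # point-target echo, Eq. (3)

sl = np.where(t < t0, s, 0)                     # lower half sl(t), Eq. (4)
su = np.where(t >= t0, s, 0)                    # upper half su(t), Eq. (4)

# Shift each half by PRF/4 toward the other, Eqs. (5)-(7)
sl_p = sl * np.exp(1j * 2 * np.pi * (prf / 4) * t)
su_p = su * np.exp(-1j * 2 * np.pi * (prf / 4) * t)

# Correlate: the peak lag equals delta = PRF / (2 f_DR)
corr = np.correlate(su_p, sl_p, mode="full")
lag = (np.argmax(np.abs(corr)) - (len(t) - 1)) / fs
f_dr_est = prf / (2 * lag)                      # recovered Doppler rate
```

With these values δ = 0.25 s, so the recovered rate lies close to the true 2000 Hz/s; in MFSC this SAC-style estimate serves as the correction applied to the coarse geometric value of each range gate.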
The MFSC method is an autofocus algorithm with high computational efficiency, and it shares some elements with the MD algorithm. To obtain the Doppler frequency rate error, the MD algorithm performs azimuth correlation on the images corresponding to the lower and upper halves of the azimuth spectrum. It is an image-domain autofocus algorithm: its computational load is quite large, and it needs iterative processing to form an image, so it is not fit for real-time imaging. MFSC, however, needs no iteration, which greatly reduces the computational load. It reaches good accuracy in Doppler frequency rate estimation, is fit for real-time imaging, and works for all kinds of terrain.
Fig. 3. Flow chart for MFSC algorithm
4 Experimental Results and Performance Comparisons
In order to prove the efficiency and correctness of the MFSC algorithm, we use real measured data for imaging with the MFSC algorithm. The experimental results are shown in Fig. 4(c)-4(f); note that these images are cropped. The data come from ERS-2, with parameters as follows: V is 7040 m/s, λ is 5.7 cm, n runs from 0 to 5615, c is 3E8 m/s, Fs is 18.96 MHz, and Rnear is 838000 m. Fig. 4(a) shows that the SAC algorithm cannot form an image for spaceborne SAR. Without focus processing, no legible image can be obtained either, as shown in Fig. 4(b); in other words, there the Doppler frequency rate is estimated with
f_DR = -2V² / (λ(R_near + n·(c / F_s))). (8)
In contrast, the MFSC algorithm can image all kinds of terrain for spaceborne SAR, as shown in Fig. 4(c)-(f). Table 1 compares the performance of several algorithms: the MFSC algorithm, the MD algorithm, the time-frequency analysis algorithm and the image contrast algorithm.
Fig. 4. Experimental results on real test data. (a) Direct use of the SAC algorithm for spaceborne SAR imaging; (b) no focusing; (c)-(f) images of countryside, mountain, ocean and urban areas obtained with the MFSC algorithm, respectively.
From the table we can see that the MD, time-frequency analysis and image contrast algorithms all work in the image domain, so their real-time capability is poor and their computational load is large. The terrain adaptability of the MD and image contrast algorithms is also poor, as they demand high-contrast terrain. The MFSC and time-frequency algorithms both have good terrain adaptability, but MFSC works in the data domain, so it additionally has good real-time capability and a small computational load.
A Novel Algorithm for Doppler Frequency Rate Estimation of Spaceborne SAR
149
Table 1. Performance comparison of several algorithms

Algorithm                  Image or data field   Computational load   Real-time capability   Terrain adaptability
MD                         image field           large                bad                    strong contrast
MFSC                       data field            little               good                   good
Time-frequency analysis    image field           large                bad                    good
Image contrast             image field           large                bad                    strong contrast
5 Conclusions

Previous Doppler frequency rate estimation methods processed images instead of data. In general they need repeated iterative operations, so their computational load is large, their computation is complex, and their real-time capability is very poor. The MFSC algorithm estimates the Doppler frequency rate directly from the echo data without iterative operations, which greatly reduces the computational load and the imaging time, and its adaptability to terrain is wide. Experiments prove that it is correct and effective. This provides theoretical and practical reference value for further study of spaceborne SAR imaging and its applications.
References

1. Li, F.K., Held, D.N., Curlander, J., Wu, C.: Doppler Parameter Estimation for Spaceborne Synthetic Aperture Radars. IEEE Transactions on Geoscience and Remote Sensing, 23(1) (1985) 47-56
2. Blacknell, D., White, R.G., Wood, J.W.: The Prediction of Geometric Distortions in Airborne SAR Imagery from Autofocus Measurements. IEEE Transactions on Geoscience and Remote Sensing, 25(6) (1987) 775-781
3. Terry, M.C.: Subaperture Autofocus for Synthetic Aperture Radar. IEEE Trans. AES, 30(2) (1994) 615-621
4. Liu, Y.T., et al.: Radar Imaging Technique. Ha'erbin Industry University Publishing House, Ha'erbin, China (2001)
5. Curlander, J.C., Wu, C., Pang, A.: Automatic Preprocessing of Spaceborne SAR Data. ICASSP'82 (1982) 31-36
6. Cheng, Y.P.: Study of Several Problems in SAR Imaging. Doctoral Dissertation, Xidian University, Xi'an, China (2000)
7. Moreira, J.: A New Method of Aircraft Motion Error Extraction from Radar Raw Data for Real Time Motion Compensation. IEEE Trans. GRS, 28(7) (1990) 620-626
8. Dall, J.: A New Frequency Domain Autofocus Algorithm for SAR. Proceedings of IGARSS'91, Helsinki, June (1991) 1069-1072
9. Curlander, J.C., McDonough, R.N.: Synthetic Aperture Radar: Systems and Signal Processing, Chapter 4. John Wiley & Sons, New York (1991)
A Novel Genetic Algorithm to Optimize QoS Multicast Routing Guangbin Bao, Zhanting Yuan, Qiuyu Zhang, and Xuhui Chen College of Computer and Communication, Lanzhou University of Technology, 730050 Lanzhou, P.R. China {baogb, yuanzt, zhangqy, xhchen}@lut.cn
Abstract. Multicast routing is becoming a key requirement of computer networks supporting multimedia applications, and the multicast routing problem has been shown to be NP-complete. This paper proposes a novel QoS-based multicast routing algorithm using genetic algorithms (GA), which has the following characteristics: a preprocessing mechanism, a tree-structure coding method, novel heuristic algorithms for the creation of random individuals and for crossover, and an instructional mutation process. Simulation results show that the proposed GA-based algorithm is more efficient than conventional algorithms.
The paper is arranged as follows. Section 2 formally describes the network model and definitions of the QoS multicast routing problem. Section 3 presents a genetic algorithm for the QoS multicast routing problem. Section 4 evaluates the performance of the proposed algorithm, and Section 5 concludes the paper.
2 Network Model and Problem Definition

2.1 Network Model

The problem considered here is how communication paths are generated through a packet-switched network for multicast traffic. As far as multicast routing is concerned, a network [3] is usually represented as a directed graph G=(V, E), consisting of a set of switches, V, and a set of directed links, E. Let the link from node i to node j be denoted by e(i, j). Each link e ∈ E is associated with a cost C(e) and several QoS parameters, such as delay, loss probability, and jitter. The cost function C(e) is a positive real function, i.e., C: E → R+. The cost function reflects the amount of resources required to support the quality of service provided by the link [4]. The QoS supported on a link is described by QoS functions. Each QoS function, Qi(e), is a positive real function which gives the quality of the parameter that can be guaranteed on the link e. For a multicast connection, packets originating at the source node s ∈ V have to be delivered to a set of destination nodes M ⊆ V - s. We refer to M as the destination group, and s ∪ M as the multicast group. Multicast packets are routed from the source to the destinations via the links of a multicast tree T=(VT, ET). A multicast tree is a subgraph of G spanning s and the nodes in M. In addition, VT may contain relay nodes, that is, nodes in the multicast tree but not members of the multicast group.

2.2 QoS Metrics

The QoS guarantee for a multicast connection is defined as follows. Let q1, …, qn be the n QoS functions and Q1, …, Qn be the corresponding QoS constraints that need to be satisfied. A multicast tree is said to provide the required QoS guarantee if the end-to-end QoS of each source-destination pair of the multicast connection is satisfied. In this paper, we only consider QoS parameters that are additive, i.e., the end-to-end QoS of a path is the sum of the individual QoS values of each link on the path.
QoS parameters such as delay and jitter are additive in nature. The end-to-end loss probability of a path can be approximated by the sum of the loss probabilities of all links of the path if the link loss probabilities are very small. Formally, for each v ∈ M, the end-to-end QoS is guaranteed by

Σ_{e∈P(s,v)} qi(e) ≤ Qi, ∀v ∈ M, i = 1, 2, …, n (1)

where P(s, v) is the path in T from s to v.
152
G. Bao et al.
The multiple-constraint multicast routing problem is defined as follows:

min Σ_{e∈T} C(e)  s.t.  Σ_{e∈P(s,v)} qi(e) ≤ Qi, ∀v ∈ M, i = 1, 2, …, n (2)
Two important QoS metrics are considered in this paper [5]:

1. Source-destination delay (∆): the parameter ∆ represents an upper bound on the acceptable end-to-end delay along any path from the source to the destination nodes. It reflects the fact that a packet delivered more than ∆ time units after its transmission at the source is of no value to the receivers.
2. Inter-destination delay variation (δ): the parameter δ represents the maximum difference between the end-to-end delays along the paths from the source to any two destination nodes that can be tolerated by the application. In essence, it defines a synchronization window for the various receivers.

By supplying values for the parameters ∆ and δ, the application in effect imposes a set of constraints on the paths of the multicast tree. Given the delay tolerance ∆ and delay variation tolerance δ, our objective is to determine a multicast tree such that the delays along all source-destination paths are within the two tolerances. Mathematically: given a network G=(V, A), a source node s ∈ V, a multicast group M ⊆ V - s, a link-delay function D: A → R+, a delay bound ∆ and a delay variation bound δ, is there a tree T=(VT, AT) spanning s and the nodes in M such that
Σ_{l∈PT(s,v)} D(l) ≤ ∆, ∀v ∈ M (3)

|Σ_{l∈PT(s,v)} D(l) - Σ_{l∈PT(s,u)} D(l)| ≤ δ, ∀v, u ∈ M (4)

where Eq. 3 is the source-destination delay constraint and Eq. 4 is the inter-destination delay variation constraint. A tree T is feasible if and only if it satisfies both Eq. 3 and Eq. 4.
3 Algorithm Description

3.1 Construction of the Routing Table

In the network graph G=(V, E) there are |V|(|V|-1) possible source-destination pairs, and there are usually many possible routes between any source-destination pair. Our algorithm assumes that a routing table consisting of R possible routes has been constructed for each source-destination pair using the k-shortest path algorithm [6]. The size of the routing table, R, is a parameter of our algorithm.

3.2 Generating the Initial Population

For a given source node s and a destination set M={m1, m2, …, mk}, a chromosome can be represented by a string of integers of length k. A gene gi, 1 ≤ i ≤ k, of the
A Novel Genetic Algorithm to Optimize QoS Multicast Routing
153
chromosome is an integer in {1, 2, …, R} which represents a possible route between s and mi, where mi ∈ M [7]. Obviously, a chromosome represents a candidate solution for the multicast routing problem, since it guarantees a path from the source node to each of the destination nodes. However, a chromosome does not necessarily represent a tree. Therefore, we trim the extra edges using a minimum directed spanning tree algorithm, modified from the optimum branching algorithm proposed in [7].

3.3 Generating the New Population

Select two chromosomes from the population of the current generation, one of which is the best chromosome (parent 1) and the other a randomly selected chromosome (parent 2). Perform a crossover operation between the two chromosomes, producing two new genomes (child 1 and child 2). Apply a mutation operation to both of the children, then perform selection to obtain the new generation.

3.4 Crossover Operation

The crossover operation generates two children from the parents; the children inherit genes randomly from the parents. Whether crossover is performed at all is determined [8] by the parameter pc (usually in the interval [0.5, 1.0]). If pc = 1.0, crossover is always performed; if pc < 1.0, crossover is performed with probability pc. If the crossover operation is not applied to the parents, the genes are copied to the children unchanged. If crossover is applied, this algorithm uses the two-point method, in which two randomly selected gene positions, a starting point and an ending point, are determined, and the genes between these points are exchanged between the parents.

3.5 Mutation Operation

When the children chromosomes have been created, both of them undergo mutation operations. The number of operations nop is determined by the parameter pm (usually in the interval [0.01, 0.2]) and is calculated as follows:
nop = p·pm (5)

where nop is the number of mutations, p is the length of the chromosome, and pm is a user-defined parameter in the interval [0.01, 0.2]. The mutation operations are performed by selecting nop random genes of the chromosome and replacing the value of each selected gene by a random integer from the feasible interval of the corresponding integer variable.

3.6 Steps of the Algorithm

The algorithm is outlined in the following steps:

Step 1: Initialize a population of chromosomes. The algorithm first generates P different chromosomes at random, which form the first generation. The set of chromosomes is called the chromosome pool (or population), and P is the size of the pool.
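The two-point crossover of Section 3.4 and the nop = p·pm mutation rule of Eq. (5) can be sketched as follows for the integer-string chromosomes of Section 3.2 (R, the routing-table size, and the parameter values below are illustrative):

```python
import random

def two_point_crossover(parent1, parent2):
    # Exchange the genes between two randomly chosen cut points.
    p = len(parent1)
    a, b = sorted(random.sample(range(p + 1), 2))
    child1 = parent1[:a] + parent2[a:b] + parent1[b:]
    child2 = parent2[:a] + parent1[a:b] + parent2[b:]
    return child1, child2

def mutate(chromosome, pm, R):
    # Replace nop = p * pm randomly chosen genes with random route indices.
    p = len(chromosome)
    nop = max(1, round(p * pm))
    mutated = chromosome[:]
    for g in random.sample(range(p), nop):
        mutated[g] = random.randint(1, R)
    return mutated
```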
Step 2: Evaluate each chromosome in the chromosome pool [9-10]. The fitness value of a chromosome is the value of the fitness function for the solution (e.g., a multicast tree) represented by the chromosome. Given a chromosome pool H={h1, h2, …, hP}, the fitness value of each chromosome is computed as follows. Let C(hi) be the overall cost of the tree represented by hi, C(E) the total cost of all links of the network, and Qi the QoS constraints (∆, δ). The fitness value of chromosome hi, F(hi), is given by

F(hi) = 1 - C(hi)/C(E), if Σ_{e∈P(s,v)} qi(e) ≤ Qi ∀v ∈ M, i = 1, 2, …, n; F(hi) = 0, otherwise. (6)
where P(s, v) is the path from source s to destination v derived from chromosome hi. After evaluating the fitness values of all chromosomes, the chromosomes are sorted according to their fitness values such that F(h1) ≥ F(h2) ≥ … ≥ F(hP); the first chromosome in the pool is then the best solution found so far.

Step 3: If the number of generations is larger than the predefined maximum number of generations, MaxGen, then stop and output the best chromosome (solution); otherwise, go to Step 4.

Step 4: Discard duplicated chromosomes. There might be duplicated chromosomes in the pool. Applying genetic operations [11] such as crossover to two duplicate chromosomes yields the same offspring, so too many redundant chromosomes reduce the search ability. When this situation occurs, the redundant chromosomes must be discarded and replaced by newly generated random chromosomes.

Step 5: Generate the next generation of chromosomes by applying the genetic operations: reproduction, crossover, and mutation.

Step 6: Stop when the number of generations reaches the maximum, MaxGen, or when no further improvement is observed in the fitness function.
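The fitness evaluation of Step 2 can be sketched as follows; the tree cost, per-path QoS values and bounds are illustrative inputs, not the paper's data structures:

```python
def fitness(tree_cost, network_cost, path_qos, qos_bounds):
    """Eq. (6): 1 - C(h)/C(E) if every source-destination path meets all
    QoS bounds (e.g. delay and delay variation), 0 otherwise.

    path_qos: {destination: [q_1, ..., q_n] accumulated along P(s, v)}
    qos_bounds: [Q_1, ..., Q_n]
    """
    for qs in path_qos.values():
        if any(q > Q for q, Q in zip(qs, qos_bounds)):
            return 0.0
    return 1.0 - tree_cost / network_cost

# A feasible tree scores by relative cost; an infeasible one scores 0.
print(fitness(300.0, 1000.0, {"m1": [40.0, 5.0]}, [90.0, 10.0]))   # 0.7
print(fitness(300.0, 1000.0, {"m1": [120.0, 5.0]}, [90.0, 10.0]))  # 0.0
```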
4 Performance Evaluation

To evaluate the proposed algorithm, we compared its performance with a conventional algorithm using computer simulations.

4.1 Simulation Conditions

For the simulations, we make the following assumptions:

• The number of nodes is 60.
• The delay of each link varies from 0 to 50 ms.
• The cost of each link varies from 0 to 200.
• When the number of nodes is 60, the number of destination nodes is 20. The destination cost constraint is 3000 and the destination delay constraint varies from 90 to 240 [12].
• The number of generations is limited to 30.
• We measure the computation time and the search success ratio.
4.2 Simulation Results and Considerations

The search success ratio versus the delay constraint for 60 nodes is shown in Fig. 1.
Fig. 1. Success ratio of algorithm
The proposed algorithm has a higher search success ratio than the conventional algorithm. In the figure, when the delay constraint is small, the search success ratio of both algorithms is almost 0, because no route satisfying the required delay exists. When the delay constraint is large, the search success ratio of both algorithms increases [13], because many routes satisfying the required delay exist. Even so, the proposed algorithm achieves a higher success ratio than the conventional algorithm.
Fig. 2. Computation time of algorithm
The computation time versus the delay constraint for 60 nodes is shown in Fig. 2. From the figure, it is clear that the computation time of the proposed algorithm is lower than that of the conventional algorithm [14]: the computation time for finding the minimum path is decreased, which is why the proposed algorithm is faster than the conventional one.
5 Conclusions

In this paper, we studied the QoS multicast routing problem, which is known to be NP-complete, and proposed a genetic algorithm for it. From the simulation results, we conclude that our proposed algorithm has a better search success ratio and computation time than the conventional algorithm. By modifying the fitness function, the proposed genetic algorithm can also be applied to other multicast problems. In the future, we plan to apply the proposed genetic algorithm to multilevel hierarchical routing and to study ways of implementing such a genetic algorithm efficiently for various network conditions.
Acknowledgments
This research was supported by the Natural Science Foundation of Gansu Province (grants ZS022-A25-027 and 3ZS042-B25-002).
References
1. Wang, Z., Crowcroft, J.: Quality of Service for Supporting Multimedia Applications. IEEE Journal on Selected Areas in Communications 14(7) (1996) 1228-1234
2. Zhengying, W., Bingxin, S., Erdun, Z.: Bandwidth-Delay-Constrained Least-Cost Multicast Routing Based on Heuristic Genetic Algorithm. Computer Communications 24 (2001) 685-692
3. Kawano, K., Masuda, T., Kinoshita, K., Murakami, K.: An Efficient Method to Search for the Location of Network Services with Multiple QoS Guarantees. Transactions of IEICE J84-B(3) (2001) 443-451
4. Rouskas, G.N., Baldine, I.: Multicast Routing with End-to-End Delay and Delay Variation Constraints. IEEE Journal on Selected Areas in Communications 15(3) (1997) 346-356
5. Sriram, R., Manimaran, G., Murthy, S.R.: Algorithms for Delay-Constrained Low-Cost Multicast Tree Construction. Computer Communications 21(18) (1998) 1693-1706
6. Koyama, A., Barolli, L., Matsumoto, K., Apduhan, B.O.: GA-Based Multi-Purpose Optimization Algorithm for QoS Routing. Proc. of AINA, Vol. 1 (2004) 23-28
7. Palmer, C.C., Kershenbaum, A.: Two Algorithms for Finding Optimal Communication Spanning Trees. Technical Report, IBM T.J. Watson Research Center, Yorktown Heights, NY (1993)
8. Handan, M., El-Hawary, M.: Genetic Algorithm for Multicast Routing with Delay and Delay Variation Constraints. Accepted for publication, CCECE, May (2004)
9. Barolli, L., Koyama, A., Motegi, S., Yokoyama, S.: Performance Evaluation of a Genetic Algorithm Based Routing Method for High-Speed Networks. Transactions of IEE 119-C(5) (1999) 624-631
10. Guerin, R.A., Orda, A.: QoS Routing in Networks with Inaccurate Information: Theory and Algorithms. IEEE/ACM Transactions on Networking 7(3) (1999) 350-364
11. Sherif, M.R., Habib, I.W., Naghshineh, M., Kermani, P.: A Generic Bandwidth Allocation Scheme for Multimedia Substreams in Adaptive Networks Using Genetic Algorithms. IEEE (1999) 1243-1247
12. Gelenbe, E., et al.: Cognitive Packet Networks: QoS and Performance. Keynote Paper, IEEE MASCOTS Conference, San Antonio, TX, October (2002) 14-16
13. Gelenbe, E., Liu, P., Lainé, J.: Genetic Algorithms for Route Discovery. SPECTS'03, Summer Simulation Multiconference, Society for Computer Simulation, Montreal, July (2003) 20-24
14. Goto, T., Hasegawa, H., Takagi, Y., Takahashi (Eds.): Performance and QoS of Next Generation Networking. Springer, London (2001) 3-17
A Probe for the Performance of Low-Rate Wireless Personal Area Networks Shuqin Ren, Khin Mi Mi Aung, and Jong Sou Park Computer Engineering Dept., Hankuk Aviation University, Koyang City, South Korea {sqren, maung, jspark}@hau.ac.kr
Abstract. Low-rate wireless personal area networks (LR-WPANs), based on the IEEE 802.15.4 and ZigBee standards, are characterized by low power consumption, low cost, low computation and low data rate. In this article, we first give an overview of the network structure, including its architecture, feasibility and functions. We then analyze several application scenarios with the NS-2 simulator and present the experiment results. To give a better view of this kind of network, a preliminary performance evaluation is derived from these results, focusing on the beacon-enabled mode for star-topology and tree-topology networks. Our performance evaluation reveals the factors that affect network performance, such as association, collision, packet delivery ratio and throughput, under different superframe structures and traffic types.
The paper is organized as follows. The architecture of LR-WPANs is discussed in Section 2; in Section 3, the performance evaluation process is described and the experiment results are given; Section 4 concludes the paper.
2 Architecture of LR-WPANs
2.1 Structure of LR-WPANs
The LR-WPAN architecture is built on the following model: physical layer (PHY), medium access control (MAC) sublayer, service-specific convergence sublayer (SSCS), IEEE 802.2 logical link control (LLC), routing, and upper layers. A LR-WPAN has the following features: 1) 27 channels with three data rates: 1 channel at 20 kb/s in the 868 MHz band, 10 channels at 40 kb/s in the 915 MHz band, and 11 channels at 250 kb/s in the 2.4 GHz band; 2) support for both star and peer-to-peer connections; 3) three data transmission types (direct, indirect, and guaranteed time slot (GTS)); 4) 16-bit short or 64-bit extended addresses; 5) carrier sense multiple access with collision avoidance (CSMA-CA) channel access.
2.2 Topologies of LR-WPANs
There are two types of devices in a LR-WPAN: full-function devices (FFDs) and reduced-function devices (RFDs). An FFD can communicate with RFDs or other FFDs, while an RFD can only talk to an FFD. An FFD can operate in three modes, serving as a network coordinator (PAN Coor), a coordinator, or a device; an RFD can only serve as a device. There are two types of topologies in LR-WPANs: star and peer-to-peer. In the former, a single central controller called the PAN Coor is in charge of routing communication around the network, and the other devices must establish connections with this PAN Coor. In the latter, any two FFDs within transmission range can communicate with each other, but an RFD may only communicate with one FFD at a time; there is still a PAN Coor, responsible both for routing and for managing the network. Combining these two topologies yields a third, called a cluster tree, in which the PAN Coor is also the first cluster head (CH0) and some FFDs serve as additional cluster heads to extend the scale of the network.
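The device roles and link rules above can be captured as a small checker. This is an illustrative sketch of the constraints stated in the text (FFDs can take any role; RFDs are devices only and cannot talk to each other), not standard-conformant code:

```python
# Device roles and link rules from the topology description above.
ROLES = {
    "FFD": {"pan_coordinator", "coordinator", "device"},
    "RFD": {"device"},
}

def can_communicate(type_a, type_b):
    """A link is valid unless both endpoints are RFDs, since an RFD
    can only talk to an FFD."""
    return not (type_a == "RFD" and type_b == "RFD")
```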
2.3 Data Transfer Models in LR-WPANs
There are two communication models in LR-WPANs: beacon-enabled mode and non-beacon-enabled mode. The superframe is optionally used in beacon-enabled mode. In non-beacon-enabled mode, CSMA/CA alone is used to access the channel before sending data, and a polling mechanism is used by a device to check whether the coordinator has data for it. The superframe format (Fig. 1) is defined by the coordinator; it comprises an active part and an optional inactive part, and is bounded by beacons. The active part is
composed of the beacon, the Contention Access Period (CAP) and the Contention Free Period (CFP). The CAP is used by devices to access the channel, to apply for GTSs, and to send/receive data through the CSMA/CA backoff algorithm. The CFP is optional, may accommodate up to seven guaranteed time slots (GTSs), and is used for transferring data via GTS.
Fig. 1. Examples of the superframe structure
3 Performance Evaluation of LR-WPANs
3.1 Experiment Environment
There are four frame types in LR-WPANs: beacon, command, acknowledgment, and data frames. In this section we analyze the performance of a LR-WPAN under different traffic loads and different superframes. We evaluated LR-WPAN performance by running simulation experiments in NS-2. In this simulator there are 14 PHY primitives and 35 MAC primitives for IEEE 802.15.4, and the service-specific convergence sublayer (SSCS) is the interface for accessing the MAC primitives; we used functions provided by the SSCS in our experiments. The process of building a sensor network in NS-2 is: set network parameters such as the traffic type and the maximum single-hop send/receive distance; set the topography and channel; configure the nodes by assigning the link layer, MAC protocol, antenna, radio propagation model, and so on; set up the traffic between nodes; and start the simulation, stopping it after some time.
3.2 Performance Metrics
A LR-WPAN can work in beacon-enabled or non-beacon-enabled mode. The beacon-enabled mode of IEEE 802.15.4 allows applications to save power and extend the life of the network, and its performance is affected by the beacon order (BO) and superframe order (SO). We discuss network performance using the following measurements. 1) Association efficiency: the average number of attempts per successful association. Successful association rate: the ratio of devices successfully associated with a coordinator to the total number of association requests.
2) Packet delivery ratio: the ratio of packets received to packets sent at the MAC sublayer. 3) Throughput: the ratio of the packets received to the time spent receiving them.
3.3 Experiment Results
We ran two experiments to evaluate the star and tree topology network performance, respectively. Fig. 2 gives the scenario of the tree example. The performance is evaluated from the trace file. In our experiments, BO and SO were set to the same value, from 0 to 7. The scenario consists of 13 nodes: 1 PAN Coor, 7 FFDs, and 5 RFDs; the number above each node is its parent node id. We compared the performance using FTP, CBR, and Poisson traffic, which are the normal data packet types. In future work we will study traffic that includes attack packets.
Fig. 2. Experiment scenarios
Successful Association Rate and Collision Rate
To associate with a coordinator, a device can scan the channel actively or passively. Here we used only the active channel scan, in which a beacon request is sent to locate a coordinator. If a coordinator is present, the device sends an association request; after receiving the ACK, it sends a data request; if the ACK and the association response from the coordinator are both received, the association is built successfully. We tested the association efficiency for different BO values (0-7, with SO set to the same value) under different traffic (Fig. 3a). For BO greater than 2, no association failed. However, the association efficiency alone is not enough to represent network performance, so we also examined the relationship between the collision rate and the BO (Fig. 3b). The smaller the BO, the more collisions happened; about 75% of collisions occurred at BO = 0. The collision rate at BO = 2 is also high; from BO = 3 onward, the collisions begin to decrease.
From the definitions of BI and SD (BI = 2^BO × aBaseSuperframeDuration, SD = 2^SO × aBaseSuperframeDuration), a larger BO means a larger beacon interval, so the coordinator reacts more slowly; a lower BO means a higher collision probability because of the higher frequency of beacons, and these collisions may bring down the association ratio. The CSMA-CA algorithm also requires that a transaction be finished before the end of the CAP; otherwise the transaction is delayed until the beginning of the next superframe. So at the beginning of a superframe, more collisions happen for smaller beacon orders because of the short CAP.
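The BI and SD definitions above can be evaluated directly, assuming the standard IEEE 802.15.4 constants for the 2.4 GHz band (aBaseSuperframeDuration = 960 symbols at 62.5 ksymbols/s, i.e., about 15.36 ms); these constants come from the standard, not from the paper:

```python
A_BASE_SUPERFRAME_SYMBOLS = 960   # aBaseSuperframeDuration, in symbols
SYMBOL_PERIOD_MS = 1 / 62.5       # 2.4 GHz band: 62.5 ksymbols/s

def beacon_interval_ms(bo):
    """BI = 2^BO * aBaseSuperframeDuration, in milliseconds."""
    return (2 ** bo) * A_BASE_SUPERFRAME_SYMBOLS * SYMBOL_PERIOD_MS

def superframe_duration_ms(so):
    """SD = 2^SO * aBaseSuperframeDuration, in milliseconds (SO <= BO)."""
    return (2 ** so) * A_BASE_SUPERFRAME_SYMBOLS * SYMBOL_PERIOD_MS
```

For BO = 0 the beacon interval is about 15.36 ms, and every unit increase in BO doubles it, which is why small BO values mean frequent beacons and more collisions at the start of each superframe.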
Fig. 3. (a) Successful association efficiency with different beacon orders and different traffic; (b) collision ratio under different beacon orders for the ftp traffic; (c) packet delivery ratio under different beacon orders for the ftp and cbr traffic; (d) throughput under different beacon orders for the ftp and cbr traffic
RpacketRtoS = ( Σ_{i=1}^{m} Rpackets_i ) / ( Σ_{k=1}^{n} Spackets_k )    (1)

Throughput = ( Σ_{i=1}^{m} Rpackets_i × packetsize_i ) / ( T_end − T_start )    (2)
Packet Delivery Ratio and Throughput
We used the packet delivery ratio and the throughput to measure data transmission performance. In our experiments, the effective packet delivery ratio is the number of data packets received at the devices divided by the number of data packets originally sent, as in Equation 1, where m is the number of devices that received data packets addressed to them and n is the number of devices that originally sent data packets. The throughput is the bit size of the data packets received per unit time (Equation 2). These metrics are also affected by BO and SO in beacon-enabled mode (Fig. 3c and Fig. 3d). The packet delivery ratio is highest at BO = 2, 3, and 4, reaching 100%, 90% and 100% respectively, and the highest throughput is at BO = 2, followed by 4 and 3. So selecting a suitable BO, neither too small nor too large, is very important for the trade-off between these performance metrics. Different traffic also leads to different packet delivery ratios: the experiments showed that ftp traffic has a higher packet delivery ratio at BO = 0 and 2, while cbr traffic has a higher packet delivery ratio at the other BO values.
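Equations (1) and (2) can be computed directly from per-device trace counts. The sketch below is a minimal illustration; the argument names are assumptions, not the authors' trace-processing code:

```python
def packet_delivery_ratio(received_counts, sent_counts):
    """Equation (1): total data packets received at the destination
    devices divided by the total data packets originally sent.
    Inputs are per-device packet counts."""
    return sum(received_counts) / sum(sent_counts)

def throughput_bps(received_sizes_bits, t_start, t_end):
    """Equation (2): total bits received divided by the receiving time,
    giving throughput in bits per second."""
    return sum(received_sizes_bits) / (t_end - t_start)
```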
4 Conclusion
Based on the description of the IEEE 802.15.4 standard and the performance evaluation experiments, we find that for better performance a suitable combination of SO and BO is necessary for a given traffic load, and that selecting an optimized configuration through simulation is an economical and useful way to design a sensor network application. Our ongoing work is to implement and analyze the orphaning-recovery performance of LR-WPANs with a simple and feasible recovery algorithm, and to analyze the behavior of a LR-WPAN under attack on a real test bed.
Acknowledgements This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment).
AG-NC: An Automatic Generation Technique of Network Components for Dynamic Network Management
Eun Hee Kim, Myung Jin Lee, and Keun Ho Ryu
Database and Bioinformatics Laboratory, Chungbuk National University, Korea {ehkim, khryu}@dblab.chungbuk.ac.kr
Research and Development Center, GALIM Information Technology, Korea [email protected]
Abstract. In this paper, we propose an automatic generation method of network components for active network management based on SNMP. Until now, components in the network have mostly been managed manually, which wastes time and cost in network management program development. We therefore propose an active program generator, called AG-NC, to overcome these disadvantages. AG-NC consists of a NE Basic Info Handler, a MIB Handler, a Template Handler, and an Operation Handler, and can generate a network management program automatically using information provided along with the SNMP library. Because the development time and cost of the network management program are reduced dramatically, AG-NC makes it practical to expand the network structure.
1 Introduction
With the development of the internet and the spread of the Web, most information systems are built on network environments connected by various network devices. Network management has become important because network structures are complex and growing fast. It requires information about network components such as nodes, interfaces, and services, rather than just the simple status of the network. In addition, a standard network management scheme is needed to manage different network devices in a common way. The IETF (Internet Engineering Task Force) standardized SNMP (Simple Network Management Protocol) [1] for easy internet management, and it has been broadly used ever since; its advantages are ease of implementation and interoperability. However, as high-speed telecommunication networks appeared, SNMP-based network management exposed many limitations in network management and operation.
In this paper, we propose an active program generator called AG-NC (Automatic Generation of Network Components) to automate the generation of information for network management. The proposed AG-NC can generate a network management program automatically using information provided along with the network equipment and the SNMP library, making network expansion practical by dramatically reducing the development time and cost of the network management program. This paper is organized as follows. In Section 2, we briefly review related work and describe its weaknesses. We introduce the AG-NC framework in Section 3 and describe the experimental analysis in Section 4, followed by the conclusion, which summarizes our contributions and discusses future work, in Section 5.
2 Related Work
SNMP has many advantages, such as easy implementation and a simple structure. However, high-speed telecommunication networks enlarge the volume of the network and make its structure complex, and network management application development becomes correspondingly more difficult: the application should orchestrate network components automatically. To complement the disadvantages of SNMP-based network management systems, many researchers have applied XML as a scheme to transfer and process the large amounts of data generated by a broad network [2], [3] effectively. These works express managed information in XML and transfer the XML documents over HTTP [3], [4]; moreover, when data is stored in a database or processed by a user application, XML standardization is used [5]. Web-based structures have accelerated many new application programs by easily providing various kinds of platform-independent data distributed across the internet. There is therefore a body of research on integrating different existing management protocols and tools by applying web techniques to network or system management [6], [7], [8], [9]. However, these works do not concentrate on management beyond the SNMP agent, and they depend on manual work to develop a network management program, which makes development expensive and time-consuming; a network manager also spends a lot of time correcting errors. Commercial network management systems such as WhatsUp Gold [10] and VisualRoute [11] likewise generate network management programs manually. This paper focuses on how to solve the problems that increase the cost and time of developing network management applications: whenever the network changes, the network manager must modify or create a new management application to manage the network components.
We need a tool that generates a management program automatically for newly added network components [12].
3 AG-NC Framework
The AG-NC framework is used as a supporting tool for a network management system. The network management program generated by the framework is used to operate and
manage the network by integrating with the SNMP manager. Network managers and network program developers input the managed objects newly added to the network through the user interface. The AG-NC framework generates the network management program and sends it to the network management system, which can then monitor the network elements that were newly added or changed.
Fig. 1. AG-NC framework
Fig. 1 shows the proposed AG-NC framework. To generate network management programs automatically, AG-NC consists of a NE Basic Info Handler, a MIB Handler, a Template Handler, and an Operation Handler. The functions of these components are described in detail in the next section.
3.1 The Functions of the Components
A. NE Basic Info Handler
This component takes charge of storing and creating basic information about the network management objects to be managed by the network application. To create this basic information, several pieces of information are needed: first, a class name for the new network management program and a program name for the network management objects; second, the file name of the MIB the network manager uses to specify the management objects; third, the object name for the specific management object; fourth, whether the method for the SNMP Set operation is accepted; and finally, whether the method converting specific numeric data in the MIB to character data is accepted. The network management program is generated based on this information.
B. Operation Handler
The Operation Handler manages the operations for SNMP execution. The SNMP protocol has four kinds of operation: Get, GetNext, Set, and Trap. Get reads management information, such as the status and run-time of a network management object. GetNext takes lower-layer information from the hierarchical tree structure. Set controls the handling of the MIB of the network management object. Trap reports a threshold crossing or event to the manager.
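The Get/GetNext semantics above can be illustrated with a toy in-memory MIB. This is a sketch of the protocol's lookup behavior only, not the Operation Handler's actual code; the OIDs and values are made up:

```python
# Toy MIB keyed by dotted OIDs. GetNext returns the lexicographically
# next OID in the tree, which is how SNMP walks a MIB subtree.
MIB = {
    "1.3.6.1.2.1.1.1.0": "router description",   # sysDescr-like entry
    "1.3.6.1.2.1.1.3.0": 123456,                 # sysUpTime-like entry
    "1.3.6.1.2.1.2.1.0": 4,                      # ifNumber-like entry
}

def oid_key(oid):
    """Compare OIDs numerically, component by component."""
    return tuple(int(part) for part in oid.split("."))

def snmp_get(oid):
    """Get: read the value bound to an exact OID, or None."""
    return MIB.get(oid)

def snmp_get_next(oid):
    """GetNext: return the (oid, value) pair that follows `oid` in
    MIB order, or None at the end of the tree."""
    later = sorted((o for o in MIB if oid_key(o) > oid_key(oid)), key=oid_key)
    return (later[0], MIB[later[0]]) if later else None
```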
C. MIB Handler
The MIB Handler takes charge of building the MIB information tree used to generate the network management application. The MIB information tree makes a hierarchy of MIB objects. The MIB Handler extracts the identification values of the MIB objects targeted by the network management application from the MIB information tree, and builds the tree after parsing the content of the MIB file. The MIB objects are managed and classified as single and entry objects; a single object means that a MIB object attribute corresponds to one attribute value in the MIB information tree. The MIB information tree is created in two steps. The first step is MIB file reading: the MIB Handler reads the MIB files corresponding to the MIB file names selected in the basic data selection step (NE Basic Info Handler), and then generates the MIB information tree from the files that were read. Both the default MIB file and user-added MIB files are used to generate the MIB information tree.
D. Template Handler
The Template Handler supplies formal information such as the template header and template tail, which are common to all generated network management applications. The template header defines the name of the network management application and the necessary application variables; the template tail defines the source code that configures the method for debugging. In the next section, we analyze the performance of the framework by applying a generated network management application to an actual network management system.
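The tree-building step can be sketched as nesting dotted object identifiers into dictionaries. This is an illustration of the idea behind the MIB information tree, not the handler's implementation:

```python
def build_mib_tree(oids):
    """Build a nested-dict hierarchy from dotted object identifiers,
    mirroring how the MIB Handler arranges parsed MIB objects into a
    tree before extracting identification values."""
    root = {}
    for oid in oids:
        node = root
        for part in oid.split("."):
            node = node.setdefault(part, {})
    return root

# Two sibling OIDs share every prefix node and fork at the last arc.
tree = build_mib_tree(["1.3.6.1.2.1.1", "1.3.6.1.2.1.2"])
```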
4 Experiments and Analysis
In this section, we evaluate the efficiency of the applications generated by our framework. Efficiency here means how exactly the information is obtained from various kinds of network components.

Table 1. Example of network components

Node type       Node IP           Node Name
Router          211.196.xxx.127   ROUTER
Windows Server  211.196.xxx.133   KT-9Z25FJCHPIZ8

We therefore compare a network management program created manually with one generated automatically, in the same network environment and for the same managed components, and analyze the reliability of the proposed AG-NC by testing how exactly it manages the network status of the management components. To verify the network management program generated by our framework, we collect information from the two kinds of network components shown in Table 1.
4.1 Analysis Result
The information collected from the Router and Windows Server components using the network management program generated through AG-NC contained no errors. In addition, the cost of maintaining and managing the network management system is reduced. Fig. 2 shows the monitored input/output traffic of the Router connected to the network, obtained with the generated program.
Fig. 2. Result of Router using generated network program through AG-NC
Fig. 3 shows the monitored input/output traffic of the Windows server connected to the network, obtained with the generated program.
Fig. 3. Result of Windows Server using generated network program through AG-NC
We observed that regular operations execute without errors when the generated network management program is applied to real-world components, including other kinds of network components such as Linux servers and switches. This confirms that the generated applications cause no errors and have high efficiency.
5 Conclusion
Existing SNMP-based network management systems create the network management program manually for managed objects when adding a new network
device or network components to the network. Generating the network management program manually degrades the maintainability and efficiency of a network management system as the network volume grows or the organization diversifies. In this paper, we therefore proposed a framework for the automatic generation of network management programs, called AG-NC (Automatic Generation of Network Components). The proposed AG-NC consists of a NE Basic Info Handler, a MIB Handler, a Template Handler, and an Operation Handler, and automatically creates a network management program that performs network management with the SNMP manager using information about the network components. Programs generated by our framework are easily extensible and immediately usable for network management, simplifying development and maintenance. Moreover, by measuring the time consumed and the error rate when generating network management programs in an actual network management system, we showed that the framework reduces the time and cost of maintaining the network.
References
1. Stallings, W.: SNMP, SNMPv2, SNMPv3, and RMON 1 and 2. 3rd edn., Addison-Wesley, Reading, MA, USA (1999)
2. Ju, H.T., Han, S.H., Oh, Y.J., Yoon, J.H., Lee, H.J., Hong, J.W.: An Embedded Web Server Architecture for XML-Based Network Management. IEEE/IFIP Network Operations and Management Symposium, Florence, Italy (2002) 5-18
3. Kim, Y.D., Cho, K.Y., Heo, J.H., Cheon, J.K., Cho, S.H.: Network Management System by Using Transfer SNMP. Proc. of KNOM Conference, Taejeon, May (2001) 102-106
4. Barillaud, F., Deri, L., Fedirum, M.: Network Management Using Internet Technologies. Proc. IEEE/IFIP International Symp. on Integrated Network Management, San Diego, CA (1997)
5. Deri, L.: HTTP-Based SNMP and CMIP Network Management. Internet Draft, IBM Zurich Research Laboratory (1996)
6. Pell, H.A., Mellquist, P.E.: Web-Based System and Network Management. Internet Draft, Hewlett-Packard (1996)
7. WBEM: http://wbem.freerange.com
8. Perkins, D., McGinnis, E.: Understanding SNMP MIBs. Prentice-Hall (1997)
9. Case, J., et al.: Management Information Base for Version 2 of the Simple Network Management Protocol (SNMPv2). IETF, RFC 1907 (1996)
10. WhatsUp Gold: http://www.ipswitch.com
11. VisualRoute: http://www.visualroute.com
12. Lee, M.J.: A Network Management System Based on Active Program Generation. Ph.D. Thesis, Chungbuk National University, Korea (2005)
Clustering Algorithm in Wireless Sensor Networks Using Transmit Power Control and Soft Computing Kyung-Bae Chang, Young-Bae Kong, and Gwi-Tae Park ISRL, College of Science, Korea University, Anam-dong 5-ga, Seongbuk-gu, Seoul, Korea {lslove, ybkong, gtpark}@korea.ac.kr
Abstract. Minimizing the power consumption of nodes is important in wireless sensor networks. Transmit power control and clustering can reduce energy consumption efficiently when nodes are non-homogeneously dispersed in space. This paper presents a clustering algorithm for wireless sensor networks based on optimizing the transmit power level using soft computing approaches. The solution determines the node transmit power level statistically and achieves energy savings efficiently.
Moreover, nodes in a wireless sensor network are mobile, so the density of the nodes also varies over time. Hence, the clustering should vary dynamically with the movement of the nodes, taking the node density into account. For this, a method that determines the transmission power level of each node with minimal computation in the shortest time is needed. In this paper, we propose a method of clustering in a non-homogeneously distributed network by determining optimized power levels using soft computing. With the proposed method, communications within a cluster are carried out at the optimized transmission power, and higher transmission power levels are used for communications with nodes in other clusters. In this way, the proposed method addresses the problems that occur when nodes are non-homogeneously distributed. This paper is organized as follows: Section 2 describes the clustering algorithm in more detail; Section 3 describes the simulation and the clustering characteristics for transmit power control. Finally, we conclude the paper and discuss future work.
2 Clustering Algorithm
The method proposed in this paper generates clusters by determining the transmission power level with a soft computing technique in a non-homogeneously distributed network.
2.1 Clustering Algorithm
The transmission power for clustering is determined by Bayesian classification based on prior probability. The method calculates the probability that an arbitrary node belongs to each transmission power level and selects the class with the highest probability. Letting x_i be an arbitrary node in the network and c_j a transmission power level, the optimized transmission power C_best is calculated by (1):

C_best = ArgMax_j [ p(x_i | c_j) p(c_j) / P(x_i) ]    (1)

P(x_i) is the probability that an arbitrarily extracted node from the node set is x_i, and P(c_j) is the probability that an arbitrarily extracted node belongs to the transmission power set c_j. P(x_i | c_j) is the probability that an arbitrarily extracted node from the subset belonging to transmission power set c_j is x_i. Equation (1) determines the transmission power set with the highest probability by calculating P(x_i | c_1), P(x_i | c_2), ..., P(x_i | c_k) for a given node x_i. Using (1), every node in the network obtains a transmission power level based on Bayesian classification; an arbitrarily extracted node x_i thus clusters at the optimized transmission power C_best. This procedure is repeated until all nodes in the network have been clustered.
2.2 Cluster-Based Routing
In general, cluster-based routing uses a modified routing table holding the cluster id and the transmit power level. If the source and destination nodes are in the same cluster, route discovery can use only the cluster's decided transmit power level, which can be realized as the optimized power level C_best that is just enough to connect the nodes. If the source and destination nodes are in different clusters, routing must use a higher transmit power level for route discovery. Fig. 3 presents an example of the routing algorithm in a clustered network. First, a transmit power level of 1 mW is used to transmit a packet from the source node
K.-B. Chang, Y.-B. Kong, and G.-T. Park
Fig. 3. Cluster-Based routing example
S to the destination node D1, because S and D1 are both in cluster 1. In the second case, the source node S is in cluster 1 and the destination node D2 is in cluster 2, so node S must use a higher transmit power level to forward the packet into the 10 mW cluster to which the destination node belongs.
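The Bayesian power-level selection of Eq. (1) in Sect. 2.1 can be sketched as follows. This is a minimal illustration that assumes one-dimensional node features (e.g. local node density) and Gaussian class-conditional likelihoods; neither assumption comes from the paper.

```python
import math
from collections import Counter

def best_power_level(node_feature, training_nodes, epsilon=1e-9):
    """Pick the power level maximizing p(x_i | c_j) p(c_j), as in Eq. (1).

    training_nodes: list of (feature, level) pairs.  Likelihoods are
    modelled as per-level Gaussians (an illustrative assumption).
    P(x_i) is a common factor across levels and can be dropped.
    """
    levels = Counter(level for _, level in training_nodes)
    total = sum(levels.values())
    best, best_score = None, -1.0
    for level, count in levels.items():
        feats = [f for f, lv in training_nodes if lv == level]
        mean = sum(feats) / len(feats)
        var = sum((f - mean) ** 2 for f in feats) / len(feats) + epsilon
        likelihood = math.exp(-(node_feature - mean) ** 2 / (2 * var)) \
                     / math.sqrt(2 * math.pi * var)
        score = likelihood * (count / total)  # p(x|c) * p(c)
        if score > best_score:
            best, best_score = level, score
    return best
```

A node whose feature resembles the low-power training nodes is assigned the low power level, and vice versa.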
3 Simulation and Clustering Characteristics

We simulated the clustering using MATLAB. Our simulation involves 100 nodes placed non-uniformly in a 100 m × 100 m area. We assume that each node can control its transmit power level and that only a few discrete power levels are available. The simulations show that clustering with transmission power control produces a well-structured clustering for wireless sensor networks.
Fig. 4. (a) Fixed transmit power (b) transmit power control using Bayes' rule
Clustering Algorithm in Wireless Sensor Networks
Our proposed method has the following characteristics.
• Since the proposed clustering algorithm is based on the transmit power level, it is not fixed and provides distributed clustering. It therefore has a simple structure and can cluster efficiently in wireless sensor networks.
• It can be applied to all kinds of proactive and reactive routing protocols. For proactive routing protocols such as DSDV [4], routing tables holding different power levels are maintained by HELLO packets to construct the clustering. Reactive routing protocols such as AODV [5] can transmit a Route Discovery Request at every applicable power level.
• The transmit power level may change dynamically with node mobility, so we adopt Bayesian classification based on prior probability. The proposed clustering algorithm can predict a node's transmit power without consuming extra network resources or excessive computing time. The clustering algorithm is therefore efficient for wireless sensor networks whose node density varies widely.
4 Conclusion

We have discussed a clustering algorithm using Bayesian classification based on prior probability, together with cluster-based routing. Minimizing power consumption is an important goal in sensor network research. This paper proposes a solution for setting transmission power and clustering in a non-homogeneously distributed network. The proposed method offers an efficient way of clustering through transmission power. Moreover, efficient traffic transmission can be realized, transmission routes that account for transmission power become possible, and the method can reduce collisions occurring at the MAC layer. Based on MATLAB simulation, our algorithm provides an efficient clustering mechanism for wireless sensor networks. To verify our method further, we plan to extend the network simulator ns [6] to simulate our algorithms.
References
1. Narayaswamy, S., Kawadia, V., Sreenivas, R.S., Kumar, P.R.: Power Control in Ad-Hoc Networks: Theory, Architecture, Algorithm, and Implementation of the COMPOW Protocol. European Wireless Conference (2002)
2. Kawadia, V., Kumar, P.R.: Power Control and Clustering in Ad Hoc Networks. IEEE INFOCOM (2003)
3. Kecman, V.: Learning and Soft Computing. 61-103
4. Perkins, C.E., Royer, E.M.: Highly Dynamic Destination-Sequenced Distance-Vector Routing (DSDV) for Mobile Computers. SIGCOMM '94: Computer Communication Review (1994) 234-244
5. Perkins, C.E., Royer, E.M.: Ad Hoc On-Demand Distance Vector Routing. Proceedings of the 2nd IEEE Workshop on Mobile Computing Systems and Applications (1999) 90-100
6. UCB/LBNL/VINT Network Simulator - ns2. http://www-mash.cs.berkeley.edu/ns/ (1998)
Discriminating Fire Detection Via Support Vector Machines

Heshou Wang1,2, Shuibo Zheng2, Chi Chen2, Wenbin Yang2, Lei Wu2, Xin Cheng2, Minrui Fei1, and Chuanping Hu2

1 School of Mechatronical Engineering and Automation, Shanghai University, Shanghai 200072, China
2 Shanghai Fire Research Institute of Ministry of Public Security, Shanghai 200032, China
[email protected]
Abstract. Many researchers are exploiting multi-sensor detection to discriminate between fire and nuisance sources. Multi-sensor detectors can monitor multiple aspects of the wide variety of signatures produced by flaming fires, smoldering fires and nuisance sources. A new method based on support vector machines (SVMs) is proposed to identify flaming fires, smoldering fires and nuisance sources by incorporating smoke, temperature and carbon monoxide (CO) sensors. The usefulness and acceptability of the fire discriminating method are demonstrated.
optimal and unique solution, and a sparse representation of the solution. A particular advantage of SVMs over other learning algorithms is that they can be analyzed theoretically using concepts from computational learning theory while achieving good performance on real problems. SVM learning is based on some beautifully simple ideas, provides a clear intuition of what learning from examples is about, and leads to high performance in practical applications. In this paper, an early fire detection method consisting of an array of smoke, temperature and carbon monoxide sensors is presented, with discrimination provided by SVM analysis of the sensor responses.
2 Support Vector Regression Algorithm

Consider two sets x_i ∈ X ⊆ R^n, y_i ∈ Y ⊆ R. The primal space is transformed into a high-dimensional feature space by a nonlinear map Φ(x) = (φ_1(x), φ_2(x), …, φ_n(x)). The data set is approximated with a nonlinear function

    f(x) = w^T Φ(x) + b.   (1)

The coefficients w and b can be obtained by solving the primal objective function:

    min  (1/2) ||w||² + C Σ_{i=1}^{l} (ξ_i + ξ_i*)

    s.t.  y_i − w^T Φ(x_i) − b ≤ ε + ξ_i
          w^T Φ(x_i) + b − y_i ≤ ε + ξ_i*
          ξ_i, ξ_i* ≥ 0,   (2)
where C is the regularization constant determining the trade-off between the flatness of f and the amount up to which deviations larger than ε are tolerated, and ξ_i, ξ_i* are positive slack variables. A Lagrange function is constructed from the primal objective function as follows:
    L = (1/2) ||w||² + C Σ_{i=1}^{l} (ξ_i + ξ_i*)
        − Σ_{i=1}^{l} α_i (ε + ξ_i − y_i + w^T Φ(x_i) + b)
        − Σ_{i=1}^{l} α_i* (ε + ξ_i* + y_i − w^T Φ(x_i) − b)
        − Σ_{i=1}^{l} (η_i ξ_i + η_i* ξ_i*).   (3)
This Lagrange function has a saddle point with respect to the primal and dual variables at the optimal solution. The dual variables in Eq. (3) satisfy positivity constraints, i.e. α_i, α_i*, η_i, η_i* ≥ 0.
By means of the Karush-Kuhn-Tucker (KKT) conditions, we obtain

    ∂L/∂w   = w − Σ_{i=1}^{l} (α_i − α_i*) Φ(x_i) = 0
    ∂L/∂b   = Σ_{i=1}^{l} (α_i − α_i*) = 0
    ∂L/∂ξ_i  = C − α_i − η_i = 0
    ∂L/∂ξ_i* = C − α_i* − η_i* = 0.   (4)
SVMs avoid computing the map Φ(x) explicitly and exploit a kernel function K(x_i, x_j) = Φ(x_i)^T Φ(x_j) instead. Any function that satisfies the Mercer condition can be used as a kernel function. Using Eq. (4) to eliminate the primal variables (w, b, ξ_i, ξ_i*) in (3), the Wolfe dual optimization problem is as follows:

    max  −(1/2) Σ_{i,j=1}^{l} (α_i − α_i*)(α_j − α_j*) K(x_i, x_j)
         − ε Σ_{i=1}^{l} (α_i + α_i*) + Σ_{i=1}^{l} y_i (α_i − α_i*)

    s.t.  Σ_{i=1}^{l} (α_i − α_i*) = 0
          0 ≤ α_i, α_i* ≤ C.   (5)
By solving quadratic program, regression function is rewritten as: l
f ( x ) = ¦ (α i − α i* ) K ( xi , x ) + b,
(6)
i =1
where α i , α i * satisfy α i × α i * = 0, α i ≥ 0, α i * ≥ 0 . Only a few coefficients (α i − α i * ) are nonzero values, and the corresponding training data points have approximation errors equal to or larger than ε . These data points are called support vectors. Different kernels can be used as follows: (1) Linear Kernel: K ( x , xi ) = x T xi (2) RBF Kernel: K ( x, xi ) = exp(−
|| x − xi ||2 ) 2σ 2
(3) Polynomial kernel: K(x, x_i) = (γ x^T x_i + r)^d, d = 1, 2, …, N,

where σ, γ, r and d are kernel parameters.
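The three kernels can be written down directly; a small sketch (NumPy assumed):

```python
import numpy as np

def linear_kernel(x, xi):
    # K(x, xi) = x^T xi
    return float(np.dot(x, xi))

def rbf_kernel(x, xi, sigma=1.0):
    # K(x, xi) = exp(-||x - xi||^2 / (2 sigma^2))
    diff = np.asarray(x, float) - np.asarray(xi, float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

def poly_kernel(x, xi, gamma=1.0, r=1.0, d=2):
    # K(x, xi) = (gamma * x^T xi + r)^d
    return float((gamma * np.dot(x, xi) + r) ** d)
```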
3 SVMs Model Selection

When applying SVMs to modelling, the first question is which kernel to use. The RBF kernel is used here because it has fewer hyperparameters influencing the complexity of model selection than the polynomial kernel. Our goal is to find the SVM regression model with the best generalization performance. To achieve this, the best set of hyperparameters, such as C and σ (the RBF kernel parameter), has to be selected. As the size of the data is severely limited, a cross-validation [11] procedure via parallel grid search is employed to prevent overfitting. Pairs of (C, σ) are tried and the one with the best cross-validation performance is picked. Trying exponentially growing sequences of C and σ is a practical method to identify good parameters (for example C = e^{−4}, e^{−2}, …, e^{10}; σ = e^{−10}, e^{−8}, …, e^{−2}). The S-fold cross-validation procedure divides the given data D at random into S subsets {G_1, G_2, …, G_S}, uses S − 1 subsets for training, and uses the remaining one for validation. This process is repeated S times, changing the held-out subset, and the generalization performance is evaluated by the following MSE (mean squared error) over all validation results:

    MSE_CV = (1/N) Σ_{i=1}^{S} Σ_{v∈G_i} (y_v − y(x_v; θ̂_i))²   (7)
Here G_i denotes the i-th subset used for validation, and θ̂_i denotes the optimal parameter vector obtained by training on D − G_i. Using S-fold cross-validation with grid search, the hyperparameters are optimized so that the cross-validation error MSE_CV is minimized.
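The S-fold cross-validation of Eq. (7) combined with the grid search can be sketched as follows. Here `train_fn_factory` is a hypothetical callable (not from the paper) that builds a regression trainer for a given (C, σ) pair.

```python
import numpy as np

def cross_val_mse(train_fn, X, y, S=5, seed=0):
    """S-fold cross-validation MSE as in Eq. (7).
    train_fn(Xtr, ytr) must return a predictor f(Xval) -> yhat."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), S)
    sq_err = 0.0
    for i in range(S):
        val = folds[i]
        tr = np.concatenate([folds[j] for j in range(S) if j != i])
        f = train_fn(X[tr], y[tr])                 # train on D - G_i
        sq_err += float(np.sum((y[val] - f(X[val])) ** 2))
    return sq_err / len(X)

def grid_search(train_fn_factory, X, y, Cs, sigmas):
    """Exhaustive grid over (C, sigma); returns the pair minimizing MSE_CV."""
    return min(((C, s) for C in Cs for s in sigmas),
               key=lambda p: cross_val_mse(train_fn_factory(*p), X, y))
```

In the paper each grid point is evaluated in parallel; the sequential loop above shows only the selection logic.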
4 Discriminating Fire with SVMs

Support vector machines were applied to process the signals from smoke, temperature and CO sensors. The system inputs are the smoke signal, the temperature rising trend and the CO signal; the output is the probability of fire or non-fire. We used data collected in a standard laboratory to simulate flaming fires, smoldering fires and nuisance sources. Scaling the training data is very important: the signals are measured in different physical units, and attributes with greater numeric ranges would dominate those with smaller ranges. Each attribute is therefore linearly scaled to the range [-1, 1] or [0, 1].
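The per-attribute linear scaling can be sketched as below; the exact preprocessing used in the paper is not given, so min-max scaling to [-1, 1] is shown as one standard choice.

```python
import numpy as np

def scale_minmax(X, lo=-1.0, hi=1.0):
    """Linearly scale each column (attribute) of X into [lo, hi]."""
    X = np.asarray(X, dtype=float)
    xmin, xmax = X.min(axis=0), X.max(axis=0)
    # avoid division by zero on constant columns
    span = np.where(xmax > xmin, xmax - xmin, 1.0)
    return lo + (hi - lo) * (X - xmin) / span
```

The same minima and maxima computed on the training set should also be applied to the test samples.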
The training set contains 80 samples and 50 samples are used as the test set. The RBF kernel is chosen with width σ = 9, the loss function with ε = 0.001, and regularization constant C = 110. Some training samples and training results are given in Table 1.

Table 1. Training samples from sensors and training probability results

Notation: SΔ - amplitude of smoke; MΔ - amplitude of CO; Tτ - trend of temperature; PF - probability of flaming fire; PS - probability of smoldering fire; PN - probability of nuisance source; P̂F, P̂S, P̂N - estimated probabilities of flaming fire, smoldering fire and nuisance source.
Some test samples and test results are shown in Table 2. The results indicate that SVMs are capable of discriminating fire by means of multi-sensor signals. SVMs have a strong ability to learn from a small number of samples, and good generalization performance can be achieved while preventing overfitting.

Table 2. Test samples from sensors and test probability results

SΔ (V)  MΔ (V)  Tτ    PF    PS    PN    P̂F     P̂S     P̂N
0.051   2.70    2.0   0.15  0.25  0.75  0.134  0.258  0.742
0.048   2.50    3.5   0.10  0.30  0.70  0.108  0.315  0.734
0.053   2.90    3.5   0.20  0.30  0.80  0.225  0.284  0.788
0.055   2.83    1.0   0.23  0.68  0.30  0.225  0.671  0.318
0.058   2.92    0.5   0.10  0.70  0.35  0.108  0.726  0.365
0.058   2.65    0.5   0.90  0.10  0.05  0.882  0.078  0.043
5 Conclusions

This paper presented a new method based on support vector machines to identify flaming fires, smoldering fires and nuisance sources by incorporating multiple sensors. The results indicate that applying SVMs to discriminating fire detection is effective and feasible.
Acknowledgement This work was supported by Doctoral Program Foundation of Science & Technology Special Project in University (20040280017), Key Project of Science & Technology Commission of Shanghai Municipality under grant 04JC14038, and Shanghai Leading Academic Disciplines (T0103).
References
1. Grosshandler, W.L.: A Review of Measurements and Candidate Signatures for Early Fire Detection. NISTIR 5555, Gaithersburg, MD, National Institute of Standards and Technology (1995)
2. Thuillard, M.: New Methods for Reducing the Number of False Alarms in Fire Detection Systems. Fire Technology, 30(2) (1994) 250-268
3. Luck, H.: Remarks on the State of the Art in Automatic Fire Detection. Proceedings of the 10th International Conference on Fire Detection (AUBE '95), Duisburg, Germany, April 4 (1995)
4. Pfister, G.: Multisensor/Multicriteria Fire Detection: A New Trend Rapidly Becomes State of the Art. Fire Technology, 33(2) (1997) 99-114
5. Okayama, Y.: A Primitive Study of a Fire Detection Method Controlled by Artificial Neural Net. Fire Safety Journal, 17(6) (1991) 535-553
6. Hall, J.R.: The Latest Statistics on U.S. Home Smoke Detectors. Fire Journal, 83(1) (1989) 39-41
7. Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)
8. Van Gestel, T., et al.: Financial Time Series Prediction Using Least Squares Support Vector Machines within the Evidence Framework. IEEE Transactions on Neural Networks, 12(4) (2001) 809-821
9. Vapnik, V.N.: An Overview of Statistical Learning Theory. IEEE Trans. Neural Networks, 10(5) (1999) 988-999
10. Smola, A.J., Schölkopf, B.: A Tutorial on Support Vector Regression. NeuroCOLT2 Technical Report Series, Royal Holloway College, University of London, UK (1998)
11. Djurić, P.M.: Model Selection by Cross-Validation. IEEE International Symposium on Circuits and Systems (1990) 2760-2763
Dynamic Deployment Optimization in Wireless Sensor Networks

Xue Wang, Sheng Wang, and Junjie Ma

State Key Laboratory of Precision Measurement Technology and Instruments, Department of Precision Instruments, Tsinghua University, Beijing 100084, P.R. China
[email protected], [email protected], [email protected]
Abstract. Sensor deployment is one of the key topics in wireless sensor network (WSN) research. This paper proposes a self-organizing technique for enhancing the coverage of WSNs consisting of mobile and stationary nodes. The mobile nodes relocate themselves to find the best deployment under various situations, covering the largest possible area. The new locations of the mobile nodes are determined by parallel particle swarm optimization (PPSO), which is suitable for multi-dimensional function optimization in continuous space. In particular, mobile node deployment with PPSO is useful when some areas require cooperative measurement by multiple nodes, and the deployment can be adjusted dynamically according to the requirements of the environment. The experimental results verify that mobile node deployment with PPSO performs well in terms of speed, coverage and connectivity.
1 Introduction

In WSNs, dynamic deployment optimization has become one of the key topics. T. Wong et al. [1] and S. Zhou et al. [2] proposed the "Virtual Forces" algorithm, which can effectively enhance the coverage and connectivity of WSNs for single measurement, but little attention has been paid to the dependability and precision of sensor nodes. In fact, because of the high robustness and precision requirements, cooperative measurement is needed in most applications. The proposed PPSO-based dynamic deployment optimization algorithm is useful for deployment in cooperative measurement, taking effective coverage performance as the criterion while satisfying the precision and speed requirements of the optimization.
Fig. 1. (a) Random deployment. (b) Effective coverage performance evaluation.
2.2 Performance Evaluation for WSNs
If an area is in the detection range of n nodes at time t, the area's synthesis detection dependability can be calculated directly as:

    R(t) = 1 − ∏_{i=1}^{n} (1 − r_i(t))   (1)
where r_i(t) is the detection dependability of the i-th sensor node. Effective coverage performance can be represented by the proportion of the effective area in which the synthesis detection dependability satisfies the detection requirement. As shown in Fig. 1(b), a gridding algorithm divides the area into grids and calculates the proportion of effectively detected grids. The simulation results verify that the error is between 0.5% and 0.1% when the granularity is between 4% and 0.25%. Unfortunately, the execution time increases quickly as the granularity decreases. To reduce the execution time, we can first analyze the effective detection area formed by the stationary nodes and then solve only the remaining area during dynamic adjustment. As illustrated in Fig. 1, the sensors are divided into two connected groups. In a WSN, if a node cannot be connected to the sink node, its information will not be received. For grouping, each node detects its neighbors, and all connected nodes are labelled as one group. We define the group connected to the sink node as the activated group. We focus on the number of connected nodes and the coverage. The coverage ratio is as follows:
    C_m = (Σ_{i∈S} c_i) / A   (2)
where c_i is the coverage of sensor i, S is the set of nodes, and A is the total size of the area to be monitored. Let N_m denote the number of sink-connected nodes after placing the mobile nodes. To represent the improvement, we define:
    N_im = (N_m − N_0) / N_0 × 100%   (3)
where N_0 is the number of connected nodes before placing any mobile nodes. We also define C_im as the improvement of coverage with m mobile nodes:

    C_im = (C_m − C_0) / C_0 × 100%   (4)
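The metrics of Eqs. (1), (3) and (4) can be sketched directly:

```python
import numpy as np

def synthesis_dependability(r):
    # Eq. (1): R(t) = 1 - prod_i (1 - r_i(t)) over the nodes covering the area
    return 1.0 - float(np.prod(1.0 - np.asarray(r, float)))

def connectivity_improvement(N_m, N_0):
    # Eq. (3): relative gain (%) in sink-connected nodes
    return (N_m - N_0) / N_0 * 100.0

def coverage_improvement(C_m, C_0):
    # Eq. (4): relative gain (%) in the coverage ratio C_m of Eq. (2)
    return (C_m - C_0) / C_0 * 100.0
```

For instance, an area covered by two nodes with individual dependability 0.5 has synthesis dependability 0.75.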
3 PPSO Based Dynamic Deployment Optimization 3.1 Principle of Particle Swarm Optimization
PSO is a swarm-intelligence-based evolutionary algorithm [3]. In PSO, the potential solutions, called particles, "fly" through the search space to find the optimal solution [4]. Each particle keeps its best location pbest and the global optimal solution gbest; the current location is called pnow. During optimization, each particle changes its velocity toward the pbest and gbest positions with a bounded random acceleration. pbest and gbest are updated according to (5) and (6), respectively:

    pbest = pbest,  if f(pnow) ≥ f(pbest)
    pbest = pnow,   if f(pnow) < f(pbest)   (5)

    gbest = min{pbest_1, pbest_2, …, pbest_n}   (6)

The velocity and position of each particle are updated according to (7) and (8):

    v_ij(t+1) = ω(t) v_ij(t) + c_1 r_1j(t) (p_ij(t) − x_ij(t)) + c_2 r_2j(t) (p_gj(t) − x_ij(t))   (7)

    x_ij(t+1) = x_ij(t) + v_ij(t+1)   (8)

where c_1 and c_2 are acceleration constants, r_1j(t) and r_2j(t) are two separate random functions in the range [0, 1] for the i-th particle in the j-th dimension, x_ij(t) and v_ij(t) represent position and velocity at time t, p_ij(t) is the pbest, and p_gj(t) is the gbest. The variable ω(t) is the inertia weight used to balance global and local search. Simulation results show that an inertia weight starting at 0.9 and linearly decreasing to 0.4 greatly improves the performance of PSO [5]:

    ω(t) = 0.9 − t / MaxNumber × 0.5   (9)
where MaxNumber is the maximum number of iterations.

3.2 PSO Based Dynamic Deployment Optimization
The elements of the position vector X_i = (x_i1, x_i2, …, x_in) represent the coordinates of all mobile nodes, and the corresponding fitness is the proportion of effectively detected area. The granularity should decrease gradually as a trade-off between speed and precision. After adjusting the granularity, we renew the particle velocities randomly and re-evaluate the fitness of gbest and pbest with the new granularity to keep them valid. The optimization process is as follows:
1. Initialize a population of particles with random positions and velocities, and set the granularity. Analyze the effective detection area formed by the stationary nodes.
2. Evaluate the effective coverage performance. Compare and update the optimal pbest value of each particle and the global optimal gbest of the whole population.
3. Change the velocity and position of each particle according to (7) and (8), respectively.
4. Halve the granularity when gbest has not improved in the last 10 iterations, renew the velocities randomly, and re-evaluate the fitness.
5. Loop to step 2 until a criterion is met: usually a sufficiently small granularity, a sufficiently good fitness, or a maximum number of iterations.

3.3 PPSO Based Dynamic Deployment Optimization
The large amount of computation and the limited computing ability of each node constrain the utility of PSO-based dynamic deployment optimization. We therefore use a PPSO algorithm that divides the whole detection area into n groups containing the same number of nodes, where n equals the number of intelligent nodes, as illustrated in Fig. 2. Because of the random deployment, the uncovered area in each part is not equal. The mobile nodes are then divided into n parts:
    n_i = (s_i / Σ s) × N   (10)

where s_i is the uncovered area of the i-th part and N is the total number of mobile nodes.
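Eq. (10) gives a real-valued quota per part; a sketch that also rounds the allocation to whole nodes (the rounding rule is our assumption, not specified in the paper):

```python
def allocate_mobile_nodes(uncovered, N):
    """Eq. (10): split N mobile nodes across parts in proportion to each
    part's uncovered area, then round to integers by largest remainder."""
    total = sum(uncovered)
    quota = [s / total * N for s in uncovered]
    alloc = [int(q) for q in quota]
    # hand out the leftover nodes to the parts with the largest fractions
    leftovers = sorted(range(len(quota)),
                       key=lambda i: quota[i] - alloc[i],
                       reverse=True)[: N - sum(alloc)]
    for i in leftovers:
        alloc[i] += 1
    return alloc
```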
Fig. 2. Sensor node division map; each group contains the same number of nodes
Significantly, nodes at the edge of a group affect the nearby area, which must be considered during optimization. Because the furthest area that a node can affect is determined by the detection radius r_d, each region should be enlarged by r_d. As illustrated in Fig. 2, the dash-dot lines form the boundary of the actual optimized area. Furthermore, because each intelligent node performs optimization independently, some mobile nodes may overlap with others. So, if the distance between two mobile nodes is less than the detection radius r_d, their positions are re-optimized over the whole area, with the other optimized mobile nodes treated as stationary ones.
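The update rules (5)-(9) of Sect. 3.1 can be sketched as a minimal PSO for function minimization. The acceleration constants c1 = c2 = 2 and the velocity clamp are common defaults assumed here, not values from the paper.

```python
import numpy as np

def pso_minimize(f, dim, n_particles=20, iters=200, vmax=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, (n_particles, dim))   # particle positions
    v = np.zeros((n_particles, dim))                 # particle velocities
    pbest = x.copy()
    pbest_f = np.array([f(p) for p in x])
    gbest = pbest[np.argmin(pbest_f)].copy()
    for t in range(iters):
        w = 0.9 - t / iters * 0.5                    # Eq. (9): decaying inertia
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + 2.0 * r1 * (pbest - x) + 2.0 * r2 * (gbest - x)  # Eq. (7)
        v = np.clip(v, -vmax, vmax)                  # bounded acceleration
        x = x + v                                    # Eq. (8)
        fx = np.array([f(p) for p in x])
        improved = fx < pbest_f                      # Eq. (5): personal bests
        pbest[improved] = x[improved]
        pbest_f[improved] = fx[improved]
        gbest = pbest[np.argmin(pbest_f)].copy()     # Eq. (6): global best
    return gbest
```

In the deployment setting, f would be the negative effective coverage of a candidate mobile-node placement; a simple quadratic is used below only as a check.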
4 Simulation Results

We simulate a WSN including n_s = 80 stationary nodes and n_m = 20 mobile nodes, with detection radius r_d = 7 m and communication radius r_c = 2 r_d = 14 m. The mobile nodes are randomly deployed in a square region with area A = 100 × 100 = 10000 m². As illustrated in Fig. 3(b), with only 20 mobile nodes, connectivity and coverage are greatly improved, but two mobile nodes at about x = 550 m, y = 700 m overlap. Fig. 3(c) shows the adjusted result.
Fig. 3. Demonstration of optimization. (a) No mobile nodes. (b) 20 mobile nodes before adjustment. (c) 20 mobile nodes after adjustment.
Fig. 4. (a) Coverage improvement with different iterations in one node. (b) Coverage and (c) Connectivity improvement with different number of mobile nodes.
Fig. 4(a) shows the coverage improvement within one intelligent node during the execution of PPSO. Fig. 4(b) and (c) show how coverage and connectivity increase with the number of mobile nodes. As illustrated, coverage and connectivity are doubled with only 2 mobile nodes, which verifies that our algorithm can greatly improve the connectivity and coverage of WSNs. The experimental results also show that the execution time drops significantly as the number of intelligent nodes increases, so the PPSO algorithm offers good speedup and efficiency. Furthermore, the execution time of the optimization increases with the number of sensor nodes, but the increase is almost linear. Moreover, we assumed that a target should be tracked by at least 4 nodes for detection dependability. As illustrated
Fig. 5. Dynamic position change of mobile nodes: (a) before adjustment; (b) after adjustment; (c) another adjustment. Circles denote the available range for nodes to track the target.
in Fig. 5, the proposed algorithm can dynamically change the positions of the nearest and fewest mobile nodes according to the current situation. After the target moves away, the nodes return to their former positions to enlarge the coverage.
5 Conclusions

PPSO-based dynamic deployment optimization, which takes gridded effective coverage and connectivity performance evaluation as its criterion, can optimize the deployment dynamically according to the detection demand and the states of the nodes. The simulation results verify that the proposed algorithm is useful for both cooperative and single measurement. Furthermore, the parallel mechanism reduces the execution time, and the time increases only slowly with the number of nodes. We conclude that PPSO is suitable for the dynamic deployment optimization of WSNs.

Acknowledgement. This paper is sponsored by the National Natural Science Foundation of China (No. 60373014; No. 50175056).
References
1. Wong, T., Tsuchiya, T., Kikuno, T.: A Self-organizing Technique for Sensor Placement in Wireless Micro-Sensor Networks. Proc. of the 18th Int. Conf. on Advanced Information Networking and Applications, IEEE, Piscataway, NJ (2004) 78-83
2. Zhou, S., Wu, M.Y., Shu, W.: Finding Optimal Placements for Mobile Sensors: Wireless Sensor Network Topology Adjustment. Proc. of the IEEE 6th Circuits and Systems Symposium on Emerging Technologies, IEEE, Piscataway, NJ (2004) 529-532
3. Ciuprina, G., Ioan, D., Munteanu, I.: Use of Intelligent-Particle Swarm Optimization in Electromagnetics. IEEE Trans. on Magnetics, 38(2) (2002) 1037-1040
4. Eberhart, R.C., Shi, Y.: Particle Swarm Optimization: Developments, Applications and Resources. Proc. Congress on Evolutionary Computation, IEEE, Piscataway, NJ (2001) 81-86
5. Shi, Y., Eberhart, R.C.: Fuzzy Adaptive Particle Swarm Optimization. Proc. Congress on Evolutionary Computation, IEEE, Piscataway, NJ (2001) 101-106
Energy-Efficient Aggregation Control for Mobile Sensor Networks

Liang Yuan, Weidong Chen, and Yugeng Xi

Department of Automation, Shanghai Jiao Tong University, Shanghai, China
[email protected]
Abstract. A primary purpose of sensing in a sensor network is to collect and aggregate information about a phenomenon of interest, which is often an event. In this paper, we develop control algorithms that aggregate around the interesting event based on an energy-efficient communication topology. In this topology, each node keeps its k closest neighbors by adjusting its transmit power during the aggregation process, which not only maintains communication connectivity but also decreases energy consumption. Moreover, we design a local feedback controller to achieve the global target. We simulate the aggregation motion; the sensor nodes converge to different formations, validating the utility of the proposed method.
mobility models integrated in the simulator. The mobility models used for simulation include the random waypoint model, the random direction model and Brownian-like motion [5, 6]. These models cannot represent the real-life movements of sensor nodes. Our aim is to complete energy-efficient aggregation motion in an area of interest. We present the k-closest-neighbors method, which decreases energy consumption by adjusting the radio power of each node while ensuring that the communication topology remains connected. Moreover, each node can rely on local sensing information to converge to different stable formations. The outline of this paper is as follows. In Section 2, we model the sensor network. In Section 3, we propose energy-efficient aggregation control algorithms to acquire different formations. In Section 4, we simulate the aggregation motion and validate the utility of the proposed method. Finally, in Section 5, we summarize the paper.
2 Sensor Networks Modeling

Consider a sensor network consisting of N identical coupled linear oscillators (nodes). The state equations of the sensor network can be written as [7]:

    ẋ_i = f(x_i) + c Σ_{j=1}^{N} a_ij Γ x_j,   i = 1, 2, …, N   (1)
where x_i = (x_i1, x_i2, …, x_in) ∈ R^N are the state variables of node i and the constant c > 0 represents the coupling strength of the network. f(x_i) is the random motion equation of the sensor nodes initially. For simplicity, we take Γ = diag(r_1, r_2, …, r_n) ∈ R^{N×N} with r_i = 1 for a particular i and r_j = 0 for j ≠ i [7]. If there is a connection between node i and node j (i ≠ j), then a_ij = a_ji = 1; otherwise, a_ij = a_ji = 0 (i ≠ j). We take

    a_ii = − Σ_{j=1, j≠i}^{N} a_ij = −k_i,   i = 1, 2, …, N   (2)
where the degree k_i of node i is defined as the number of connections incident on node i. The sensor network (1) is defined to be (asymptotically) stable if

    x_1(t) = x_2(t) = … = x_N(t) = s(t)   as   t → ∞.   (3)
3 Control Algorithms

3.1 Aggregation Control

We assume that each node knows the exact position of the target point P and of the other nodes, and that the nodes move simultaneously to cover the point P. s(t) can be an equilibrium point, i.e. a stable formation. To achieve the goal (3), we apply the artificial potential field method to the nodes of the sensor network. We assume that the dangerous radius of the target point is R_d, S_R is the dangerous area, and A_R is the attractive potential field of the target point P.
Definition: we suppose that all sensor nodes can find the target point initially. Due to the attractive potential field of the target point P, each node can move toward P. Thus, the attractive potential field force on node i is defined as:

    f_a(x_i) = d (x_p − x_i),  if ||x_p − x_i|| ≥ R_d
    f_a(x_i) = 0,              if ||x_p − x_i|| < R_d   (4)

where x_i ∈ R^N is the position of node i, x_p represents the target position, and d (d > 0) is a coefficient expressing the strength of the attractive potential field force. The controlled network can then be described as:

    ẋ_i = f(x_i) + c Σ_{j=1}^{N} a_ij Γ x_j + u_i,   i = 1, 2, …, N
    u_i = f_a(x_i)   (5)
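A sketch of the attractive force (4), which acts as the control input u_i in (5); the gain d and radius R_d values below are illustrative only:

```python
import numpy as np

def attractive_force(x_i, x_p, d=1.0, R_d=1.0):
    # Eq. (4): pull node i toward the target x_p,
    # switched off inside the dangerous radius R_d
    x_i = np.asarray(x_i, float)
    x_p = np.asarray(x_p, float)
    if np.linalg.norm(x_p - x_i) >= R_d:
        return d * (x_p - x_i)
    return np.zeros_like(x_i)
```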
Actually, equation (5) is a local feedback controller, and d is a feedback control gain. The constants c and d influence the convergence speed of the whole sensor network. In equation (4) we assumed only that the interesting event area is a circle; we can also define different attractive potential fields of the target point P for different shapes of the interesting event area, for example an ellipse, a line or a column.

3.2 Energy-Efficient Topology

For mobile sensor networks, decreasing the communication energy consumption is very important, and it involves network topology control. Topology control means setting the radio range of each node so as to keep communication connected while still minimizing energy usage [8]. For the purpose of keeping the communication connected and decreasing the communication energy consumption, we adopt the k-neighbors method to realize topology control of the MSN [9, 10]. The algorithm based on k-neighbors uses the idea of changing the radio power depending on the number of neighbors. Initially, a node finds all neighboring nodes using an initial power P_o. If the number of neighbors is less than the required number of neighboring nodes k, the transmission power is increased (see Fig. 1). If the number of neighbors is greater than k, the closest k neighbors are kept and the rest are deleted from the neighbor list of node i (see Fig. 1). Hence each node is forced to maintain k
Fig. 1. Adjusting the radio power of node i: (a) increasing the radio power of node i; (b) decreasing the radio power of node i.
neighbors. Thus, it helps maintain high connectivity even when the distribution of the sensor nodes is sparse. In particular, in the process of aggregation the distance between the sensor nodes gradually decreases, so this topology control method is genuinely effective, and we benefit from assigning the nodes different transmission ranges to decrease energy usage. According to the sensor network communication model [4], the communication energy cost is determined entirely by the transmission ranges of the nodes. Thus the goal of the energy-efficient topology becomes minimizing the transmission ranges of all nodes over the course of the aggregation motion:
min
n
¦ ¦ rα i
.
(6)
t = 0 i =1
where r_i denotes the range assigned to node i, α is a coefficient that depends on the environmental conditions, and T is the time to complete the aggregation motion. Meanwhile, the value of the number k of closest neighbors must be chosen so as to keep the network connected. In [10], Xue and Kumar present the number of neighbors needed for connectivity of wireless networks: if each node is connected to fewer than 0.074 log n nearest neighbors, then the network is asymptotically disconnected with probability one as n increases, while if each node is connected to more than 5.1774 log n nearest neighbors, then the network is asymptotically connected with probability approaching one as n increases. Thus, the number of neighbors is always kept at k. From equation (2), we know that the coupling matrix A is a symmetric matrix and its diagonal elements are:
a_ii = −k,   i = 1, 2, ..., n .    (7)
where k satisfies [5.1774 log n] + 1 ≤ k ≤ n − 1, and [5.1774 log n] stands for the largest integer not exceeding the real number 5.1774 log n. In the process of aggregation, the distance between the nodes is gradually reduced. If we fix the value of k, the transmit radius r_i of each node, chosen so as to retain the k closest neighbors, shrinks accordingly, so the communication energy consumption is greatly decreased. If instead the transmit radius r_i of each node were kept constant, communication energy would be wasted during the aggregation.
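As a hedged illustration of the k-neighbors rule and the Xue-Kumar bound above (the function names and the simple 2-D Euclidean distance model are our own assumptions, not from the paper; natural logarithm is assumed), the following Python sketch picks k from the 5.1774 log n bound and sets each node's transmit radius to the distance of its k-th nearest neighbor:

```python
import math

def k_for_connectivity(n):
    # Xue-Kumar bound [10]: keeping more than 5.1774*log(n) nearest
    # neighbors gives asymptotic connectivity; take floor(...) + 1.
    return math.floor(5.1774 * math.log(n)) + 1

def assign_ranges(positions, k):
    # Set each node's transmit radius to the distance of its k-th
    # nearest neighbor, so exactly the k closest neighbors are kept.
    ranges = {}
    for i, p in positions.items():
        dists = sorted(math.dist(p, q) for j, q in positions.items() if j != i)
        ranges[i] = dists[min(k, len(dists)) - 1]
    return ranges

nodes = {0: (0, 0), 1: (1, 0), 2: (0, 1), 3: (5, 5), 4: (1, 1)}
radii = assign_ranges(nodes, k=2)  # node 0 keeps nodes 1 and 2 (distance 1)
```

As the aggregation proceeds and nodes draw closer together, re-running `assign_ranges` naturally shrinks each radius, which is exactly the energy saving argued for above.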
4 Experiments and Results

Consider a mobile sensor network in which the sensor nodes initially move randomly. Assume that the number of sensor nodes is 10 and that the system has an unstable equilibrium point x_p = 25. We can stabilize the sensor network onto this originally unstable equilibrium point x_p by applying the local linear feedback controller of equation (5). Fig. 2 shows the process of controlling the 10-node coupled network with k = 7.
L. Yuan, W. Chen, and Y. Xi

[Figure 2: three panels plotting the node states X against time t.]

Fig. 2. Stabilizing processes with different coupling strengths: (a) c = 0, d = 0; (b) c = 1, d = 1; (c) c = 10, d = 20.
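The stabilization experiment can be reproduced in spirit with a small Euler simulation. The paper does not give the node dynamics or the exact form of controller (5) here, so this Python sketch makes illustrative assumptions: purely linear coupling with diagonal entries a_ii = −k as in Eq. (7), a ring-style choice of k neighbors, and a local feedback term −d(x_i − x_p). It is a sketch of the idea, not the paper's exact model:

```python
import random

def simulate(n=10, k=7, c=10.0, d=20.0, xp=25.0, dt=0.001, steps=3000):
    # Each node couples to k neighbors (ring-style for simplicity);
    # coupling row sums are zero, with a_ii = -k as in Eq. (7).
    random.seed(1)
    x = [random.uniform(14.0, 35.0) for _ in range(n)]
    nbrs = [[(i + m) % n for m in range(1, k + 1)] for i in range(n)]
    for _ in range(steps):
        dx = [c * (sum(x[j] for j in nbrs[i]) - k * x[i]) - d * (x[i] - xp)
              for i in range(n)]
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x

final = simulate()  # all states settle near the target xp = 25
```

With c = 0 and d = 0 the states simply drift, while increasing c and d speeds convergence toward x_p, matching the qualitative trend across the three panels of Fig. 2.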
Based on equation (6), the distance between the nodes is steadily reduced in the process of aggregation. Simultaneously, we can diminish the transmit radius of each node, so communication energy usage is greatly decreased through topology control. Without topology control, the transmit radius of each node keeps the same value and more communication energy is wasted. Fig. 3 shows the average energy consumption of 10 nodes during the aggregation motion with and without topology control, with k = 7, c = 10, d = 30, α = 3.

[Figure 3: transmit power of each node versus t, with and without topology control.]

Fig. 3. Average energy consumption for 10 nodes in the process of aggregation motion
A common characteristic in the simulation study is that the sensor nodes converge to the stable state under the local feedback controller. Fig. 4 shows the different stable formations obtained by changing x_p, with c = 10, d = 30.

[Figure 4: three panels plotting y against X for the three formations.]

Fig. 4. Circle formation, ellipse formation and line formation
5 Summary

In this paper, we develop a control algorithm for aggregating around an interesting event based on the synchronization theory of complex dynamical networks. Assuming that the event point has a strong attractive potential field, all nodes move toward the event point under the field force. Each node keeps its k closest neighbors by adjusting its radio power during the aggregation process. This decreases the communication energy consumption and guarantees that the sensor network remains connected. We achieve the global target by means of local feedback controllers. As the results show, the sensor nodes can converge to circle, ellipse and line formations using only local sensing information.
Acknowledgements This work is partly supported by the National Hi-Tech Research and Development Program under grant 2005AA420010, the Natural Science Foundation of China under grant 60475032 and the “Shu Guang” Project of Shanghai.
References

1. Howard, A., Mataric, M. J., Sukhatme, G. S.: Mobile Sensor Network Deployment Using Potential Fields: A Distributed, Scalable Solution to the Area Coverage Problem. The 6th International Symposium on Distributed Autonomous Robotic Systems (2002) 299-308
2. Poduri, S., Sukhatme, G. S.: Constrained Coverage for Mobile Sensor Networks. IEEE International Conference on Robotics and Automation (2004) 165-171
3. Cortes, J., Martinez, S., Karatas, T., Bullo, F.: Coverage Control for Mobile Sensing Networks. IEEE Transactions on Robotics and Automation. 20 (2004) 243-255
4. Santi, P.: Topology Control in Wireless Ad Hoc and Sensor Networks. John Wiley & Sons, Ltd (2005)
5. Heo, N., Varshney, P. K.: Energy-Efficient Deployment of Intelligent Mobile Sensor Networks. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans. 35 (2005) 78-92
6. Bettstetter, C., Resta, G., Santi, P.: The Node Distribution of the Random Waypoint Mobility Model for Wireless Ad Hoc Networks. IEEE Transactions on Mobile Computing. 2 (2003) 257-269
7. Wang, X. F., Chen, G.: Pinning Control of Scale-Free Dynamical Networks. Physica A. 310 (2002) 521-531
8. Yuan, L., Chen, W. D., Xi, Y. G.: A Review of Control and Localization for Mobile Sensor Networks. The 6th World Congress on Intelligent Control and Automation (2006). To appear
9. Gurumohan, P. C., Taylor, T. J., Syrotiuk, V. R.: Topology Control for MANETs. WCNC (2004) 599-603
10. Xue, F., Kumar, P. R.: The Number of Neighbors Needed for Connectivity of Wireless Networks. Wireless Networks. 10 (2004) 169-181
Intelligent MAC Protocol for Efficient Support of Multiple SOPs in UWB-Based Sensor Networks Peng Gong, Peng Xue, and Duk Kyung Kim Dept. of Information and Communication Engineering, INHA University, Incheon, 402-751, South Korea [email protected], [email protected]
Abstract. The ultra-wideband (UWB) technique has been considered a possible candidate for high-rate Wireless Sensor Networks (WSNs) because of its higher capacity with low power operation. The IEEE 802.15.3 Medium Access Control (MAC) protocol works on a Time Division Multiple Access (TDMA) basis within a piconet. With multiple overlapped piconets, the current protocol uses the Parent/Child (P/C) or Parent/Neighbor (P/N) configuration to avoid inter-piconet interference, but the throughput of P/N or P/C cannot exceed that of a single piconet. In this paper we propose an intelligent MAC protocol to cooperate with a UWB system based on Multi-Carrier Code Division Multiple Access (MC-CDMA). The proposed protocol uses an Intermediate Sensor Device (ISDEV) to connect Piconet Coordinators and adaptively arranges 2 simultaneous data transmission links during each Channel Time Allocation (CTA). Our simulation results demonstrate that the proposed scheme can achieve a higher throughput with an acceptable compromise in link success probability in multiple overlapped sensor networks.
computer. Due to the movement of objects in sensor networks, multiple piconets may be geographically overlapped, and Co-Channel Interference (CCI) is then generated between the piconets. Dynamic channel selection and the parent/child (P/C) or parent/neighbor (P/N) piconet configuration are used to avoid the CCI in the IEEE 802.15.3 MAC [3]. In fact, the channels defined for the MB-OFDM system are not orthogonal, so there is serious CCI even with dynamic channel selection [4]. On the other hand, the child or neighbor piconet works in a private CTA within the parent superframe. This provides interference mitigation but limits the throughput. The MC-CDMA system uses a spreading matrix to provide an additional degree of diversity that improves the performance of the MB-OFDM system [5], but it did not take the CCI into account. There is still serious CCI within Simultaneously Operating Piconets (SOPs), so the problem of overlapped SOPs remains unsolved. Although MC-CDMA is not a perfect solution for overlapped SOPs, it has the potential to be used in UWB sensor networks owing to its additional diversity and the flexible data rates enabled by frequency-domain spreading. A combined MC-CDMA and MAC approach [6] motivates us to propose an intelligent MAC protocol that cooperates with a modified MC-CDMA system for SOPs in UWB-based sensor networks. The intelligent MAC protocol uses a senior PNC to allocate the CTAs to all of the SDEVs in overlapped SOPs through the ISDEV. It limits the number of simultaneous pairs of communicating SDEVs to 2 during each CTA. Spreading code matrices are used to provide the additional degree of diversity to the system. Additionally, based on the twice-repeated transmission of the same OFDM symbol, a joint Minimum Mean Square Error (MMSE) estimator [7] has been adopted in the receiver. We evaluate the system performance in terms of throughput and link success probability, as measures of quality and quantity, for the modified MC-CDMA system with the intelligent MAC protocol in overlapped sensor networks.

The paper is organized as follows. Section 2 gives an overview of the IEEE 802.15.3 MAC and its problem in SOPs. Section 3 describes the proposed MC-CDMA system. Section 4 introduces the intelligent MAC protocol for overlapped SOPs. In Section 5, the proposed scheme is evaluated by intensive link- and system-level simulations. Finally, Section 6 draws conclusions.
2 Backgrounds

In this section, we explain the IEEE 802.15.3 MAC protocol and its problem in overlapped SOPs based on the MB-OFDM technique. The IEEE 802.15.3 MAC timing within a piconet is based on the superframe. The superframe consists of three parts: the beacon, the Contention Access Period (CAP) and the Channel Time Allocation Period (CTAP). During CAPs, the SDEVs access the channel with CSMA/CA. Channel access in the CTAP is based on TDMA. The CTAP is divided into CTAs, which are used for PNC-SDEV and SDEV-SDEV communications. The child or neighbor piconet works within the parent superframe. The PNC of the parent piconet allocates a private CTA for the child or neighbor piconet(s). The PNC of the child or neighbor piconet broadcasts its beacon and allocates CTAs inside the private CTA. Because every link has guaranteed time slots, there is no inter-piconet interference in the P/C or P/N configuration.
Support of the data rates 55, 110 and 200 Mb/s is mandatory for the MB-OFDM UWB system [4]. A Time-Frequency Code (TFC) is used to interleave coded data over 3 frequency bands and to define separate logical channels, i.e., independent piconets. It is mandatory for all devices to support Mode 1 operation (operating in the 3 lowest bands). In order to mitigate the symbol collision probability, symbol repetition has been adopted for the 3 mandatory data rates. Four TFC channel patterns for Mode 1 devices are listed in [4]. According to the IEEE 802.15.3 MAC, when the PNC detects overlapped SOPs, it can use dynamic channel selection to change the channel. In fact, the 4 logical channels for Mode 1 devices in [4] are not orthogonal, so any two piconets experience collisions in the transmitted symbols even when the PNC changes the channel. When the number of SOPs increases, the collisions become more serious.
3 The Modified MC-CDMA System

Fig. 1 (a) shows the structure of the proposed transmitter of the MC-CDMA system, where tone interleaving, pilot tones, guard tones, and the cyclic prefix are omitted for simplicity. After mapping the coded bits onto QPSK constellation points, the signal is spread in the frequency domain. The chip duration is equal to the QPSK symbol duration. A Walsh-Hadamard code with a length of 8 has been chosen here as an example. Among the 8 possible code sequences, the proposed transmitter utilizes n code sequences to enable n simultaneous transmissions (n = 1, ..., 4). Then the signals are sent into the IFFT (Inverse Fast Fourier Transform). The OFDM symbol is transmitted twice according to the TFC pattern as in [4].
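To make the frequency-domain spreading step concrete, here is a small Python/NumPy sketch (our own illustration; the helper names are not from the paper) that spreads n QPSK symbols with length-8 Walsh-Hadamard codes and recovers them by despreading, relying on the orthogonality of the codes:

```python
import numpy as np

def hadamard(m):
    # Sylvester construction of a 2^m x 2^m Walsh-Hadamard matrix.
    H = np.array([[1.0]])
    for _ in range(m):
        H = np.block([[H, H], [H, -H]])
    return H

def spread(symbols, codes):
    # Each symbol rides its own length-8 code; chips of all
    # simultaneous streams are summed per subcarrier.
    return codes.T @ symbols

def despread(chips, codes):
    # Correlate with each code and normalize by the code length.
    return (codes @ chips) / codes.shape[1]

H8 = hadamard(3)                      # 8 orthogonal length-8 codes
n = 4                                 # simultaneous code transmissions
syms = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
chips = spread(syms, H8[:n])          # 8 chips, one per subcarrier of a group
recovered = despread(chips, H8[:n])   # equals syms by orthogonality
```

In the noiseless case despreading recovers the symbols exactly; over the fading channel of Section 3 the joint MMSE combiner takes the place of this simple correlation.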
Fig. 1. Proposed MC-CDMA structure
The transmitted signal experiences a fading channel [8] and AWGN (Additive White Gaussian Noise). Fig. 1 (b) shows the structure of the proposed receiver of the MC-CDMA system. After FFT processing, the receiver combines the received OFDM symbols with the joint MMSE estimator [7], which minimizes the effect of noise. The reference signals are detected by despreading with the same spreading matrices used in the
transmitter. Eight symbols are grouped and sent into the despreader. Finally, the despreader recovers the n symbols that were simultaneously transmitted in each group.
4 Intelligent MAC Protocol for Overlapped SOPs

The average collision probability reaches 38%, 55.7% and 70.7% for 2, 3 and 4 SOPs, respectively, with the TFC patterns in [4]. In particular, it is difficult to mitigate the CCI in the case of 3 and 4 SOPs without coordination. We therefore propose to limit the number of interference sources to one: no matter how many piconets are overlapped, there are only 2 simultaneous data transmission links during each CTA. In order to separate the 2 links, we assign two different spreading matrices to them. Each spreading matrix has a size of 4, and the spreading codes belonging to different spreading matrices are designed to have a good cross-correlation property, as in Table 1. Each link can adjust its n code sequences (n = 1, ..., 4) flexibly based on the link quality.

Table 1. Spreading matrix
We now give an example of 2 overlapped SOPs working under the proposed MAC protocol. Two piconets may approach each other so that their coverage areas are partially overlapped, as in Fig. 2. When the intelligent MAC protocol is applied, SDEV-A, belonging to P-1 (piconet 1), first hears the beacon from PNC-2 (the PNC of P-2). SDEV-A can then act as an ISDEV and send a heartbeat signal, which carries a copy of the beacon signal of P-2, to PNC-1. We consider PNC-1 the senior PNC. PNC-1 then adjusts the superframe duration, performs beacon alignment, and assigns the spreading matrices. When the senior PNC finishes the coordination, it broadcasts the beacon with the information of the coordinated superframe. After receiving the beacon from PNC-1, ISDEV-A sends the beacon of PNC-1 to PNC-2 by the heartbeat signal. The beacon signal is then broadcast in the piconets. Each SDEV listens to the beacon from its PNC and keeps synchronized with the coordinated superframe. Fig. 3 (a) shows the coordinated superframe for 2 overlapped SOPs, where beacon-1 and beacon-2 are assigned to different time slots [9]. During CTA-1, there are 2 simultaneous data transmission links (link-1 in P-1 and link-1 in P-2) with different spreading matrices. Fig. 3 (b) illustrates the coordinated superframe for 3
overlapped SOPs. The senior PNC adaptively chooses 2 links for each CTA in the coordinated superframe, and those 2 links must belong to different piconets; in CTA-1, L-1 in P-1 works with L-1 in P-2, and in CTA-2, L-2 in P-1 works with L-1 in P-3. When the SOPs move far from each other, each piconet can work based on its original superframe and no coordination is needed.

[Figure 2: the two piconets, their PNCs and SDEVs, and the intermediate device ISDEV-A.]

Fig. 2. Two simultaneously operating piconets
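The senior PNC's pairing rule, at most two links per CTA and the two links from different piconets, can be sketched as a greedy scheduler. This is our own hypothetical illustration of the rule (names and tie-breaking are assumptions), not the paper's or the standard's exact procedure:

```python
def coordinate_ctas(links_per_piconet):
    # Greedy rule: each CTA carries at most two simultaneous links,
    # and the two links must belong to different piconets (each link
    # is then assigned one of the two spreading matrices of Table 1).
    queues = {p: list(ls) for p, ls in links_per_piconet.items()}
    ctas = []
    while any(queues.values()):
        # Serve the (up to) two piconets with the most pending links.
        busy = sorted((p for p in queues if queues[p]),
                      key=lambda p: len(queues[p]), reverse=True)
        ctas.append([(p, queues[p].pop(0)) for p in busy[:2]])
    return ctas

# Example of Fig. 3(b): three overlapped SOPs.
schedule = coordinate_ctas({"P-1": ["L-1", "L-2"], "P-2": ["L-1"], "P-3": ["L-1"]})
```

For this input the sketch reproduces the pairing described for Fig. 3 (b): CTA-1 carries L-1 of P-1 with L-1 of P-2, and CTA-2 carries L-2 of P-1 with L-1 of P-3.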
Fig. 3. Coordinated superframes: (a) coordinated superframe for 2 SOPs; (b) coordinated superframe for 3 SOPs
5 Simulation Results

5.1 Link Level Simulations

In the link-level simulation, a Packet Error Ratio (PER) of less than 8% is required with a frame body length of 1024 octets of pseudo-random data [4]. All of the simulation assumptions are the same as in [10]. Owing to the additional diversity from frequency-domain spreading, lower code rates are adopted: 1/2 instead of 11/32 for 55 and 110 Mb/s, and 3/4 instead of 5/8 for 200 Mb/s. With 4 simultaneous transmissions at a spreading factor of 8 and the reduced coding rates, we obtain the three corresponding data rates of 40, 80, and 120 Mb/s for the MC-CDMA system. With a smaller number of codes, the data rate can be scaled down to 10 Mb/s. Formally, the SINR is defined as the desired received power divided by the total interference-plus-noise power. Since the CCI is created by frequency-hopping collisions, the Gaussian approximation is no longer suitable for the interference power: the interference affects only the collided symbols, and the collision probability depends on the number of overlapped SOPs. Tables 2 and 3 give the required SINR (SNR) to achieve 8% PER from our intensive link-level simulations of the MB-OFDM and MC-CDMA systems with different numbers of SOPs.
Table 2. Link level simulation results for MB-OFDM system

Data Rate (Mbps)            55       110      200
Coding rate                 11/32    11/32    5/8
SNR for 8% PER (1 SOP)      6.6 dB   7.4 dB   9.8 dB
SINR for 8% PER (2 SOPs)    5.5 dB   6.3 dB   8.5 dB
SINR for 8% PER (3 SOPs)    6.1 dB   6.9 dB   9.1 dB
Table 3. Link level simulation results for MC-CDMA system

Data Rate (Mbps)              10       40       80       120
Coding rate                   1/2      1/2      1/2      3/4
Spreading matrix size         1x8      4x8      4x8      4x8
SINR for 8% PER (2&3 SOPs)    3.4 dB   4.9 dB   5.6 dB   7.2 dB
5.2 System Level Simulations

There are 3 SOPs with an equal separation distance between the PNCs, as configured in Fig. 4. The separation between PNCs is defined as the PNC distance (D). The piconet range is set to 10 m. Each piconet has 20 homogeneous SDEVs. Links among the SDEVs in each piconet are randomly created. The system chooses the highest data rate that satisfies the 8% PER requirement based on the required SINR (SNR) in Tables 2 and 3. If none of the data rates can satisfy the required error-rate criterion, the link is considered an unsuccessful link. Reference and interference signals are not time-aligned due to different propagation delays. We consider three different scenarios in the simulation: SOPs with the P/C or P/N configuration (scenario 1), SOPs without coordination (scenario 2), and SOPs with the proposed scheme (scenario 3).
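The rate-selection rule of the system-level simulation, pick the highest data rate whose SINR requirement from Tables 2 and 3 is met and otherwise declare the link unsuccessful, can be sketched as follows (the thresholds are the MC-CDMA 2/3-SOP values from Table 3; the function name is our own):

```python
def select_rate(sinr_db, required_sinr):
    # Highest data rate whose SINR requirement is met; None marks an
    # unsuccessful link (no rate satisfies the 8% PER criterion).
    feasible = [rate for rate, req in required_sinr.items() if sinr_db >= req]
    return max(feasible) if feasible else None

# Required SINR (dB) for 8% PER, MC-CDMA with 2 or 3 SOPs (Table 3).
mc_cdma = {10: 3.4, 40: 4.9, 80: 5.6, 120: 7.2}
rate = select_rate(6.0, mc_cdma)   # 6.0 dB clears 3.4/4.9/5.6 -> 80 Mbps
```

The same lookup with the Table 2 thresholds gives the MB-OFDM rates for scenarios 1 and 2.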
Fig. 4. Configuration of three overlapped SOPs
Table 4 summarizes the parameters used in the system-level simulations. The throughput is defined as the total number of information bits in the packets that are correctly received in a given time duration. The link success probability is the percentage of successful links among the total links that have been allocated CTAs in the superframe.
Table 4. Simulation parameters

Channel model         Channel model 1 in [8]              FFT size      128
Modulation            QPSK                                Superframe    65 ms
Spreading matrix      Table 1                             Beacon time   0.5 ms
Packets size          1024 bytes                          CAP time      4.5 ms
Transmit power        -10.3 dBm                           CTAP time     60 ms
Path loss model       44.2 + 20 log10(d) dB               CTA size      1 ms
Noise power per bit   -174 + 10 log10(data rate) dBm      Piconet No.   2, 3
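Combining the transmit power, path-loss model and noise model of Table 4 gives the received SNR at a given distance and data rate. A hedged Python sketch (the function names are ours, and interference is ignored here):

```python
import math

def path_loss_db(d_m):
    # Path loss model from Table 4: 44.2 + 20*log10(d) dB.
    return 44.2 + 20 * math.log10(d_m)

def noise_dbm(data_rate_bps):
    # Noise power per bit from Table 4: -174 + 10*log10(rate) dBm.
    return -174 + 10 * math.log10(data_rate_bps)

def rx_snr_db(tx_dbm, d_m, data_rate_bps):
    # Received SNR = Tx power - path loss - noise floor (all in dB).
    return tx_dbm - path_loss_db(d_m) - noise_dbm(data_rate_bps)

# -10.3 dBm transmit power at the 10 m piconet range, 110 Mb/s.
snr = rx_snr_db(-10.3, 10.0, 110e6)   # about 19.1 dB
```

Comparing such an SNR against the thresholds in Tables 2 and 3 is what drives the rate selection in the system-level simulation.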
Fig. 5 compares the throughput and link success probability (LSP) in the three scenarios. Scenario 1 has a constant average throughput because of the TDMA structure within a superframe. Due to the overhead of beacons and CAP durations, the case of 3 SOPs has a lower throughput than the case of 2 SOPs. Without coordination, the throughput increases as the distance between PNCs increases, and it is higher than that of the P/C configuration when the two PNCs are farther apart than 12 meters (so-called partial overlap). However, as Fig. 5 (b) shows, the LSP is too low even with a slight overlap, e.g., when the PNC distance is longer than 18 meters. The proposed MAC protocol combined with the MC-CDMA technique is a compromise between throughput and LSP. When each link is assigned 4 codes for transmission, the throughput increases by approximately 150% while the LSP is maintained above 77%. When a single code is allowed flexibly based on the link quality, the LSP can be increased up to 93% even with a perfect overlap (D = 0). When two PNCs are within the piconet range of 10 meters, it is easy for them to communicate directly and form a P/C configuration. However, when they are farther apart than 10 meters, their coverage areas are only partially overlapped and the ISDEV is required to achieve coordination. As shown in Fig. 5 (a), the P/C configuration is inefficient in terms of throughput under partial overlap. The proposed scheme mitigates this problem and is beneficial in terms of throughput at the cost of a slight degradation in LSP.
Fig. 5. Comparisons of (a) throughputs and (b) link success probabilities in three scenarios
6 Conclusions

In this paper we proposed an intelligent MAC protocol to cooperate with a modified MC-CDMA technique in UWB-based sensor networks. With the MC-CDMA technique, the data rate can be changed flexibly by varying the number of simultaneous code transmissions depending on the link quality. The number of simultaneous links within a CTA is limited to at most two, which greatly reduces the collision probability. Additionally, the proposed MC-CDMA system improves the SINR requirements owing to the frequency-domain diversity and the joint MMSE estimator. The proposed scheme was found to be efficient in multiple overlapped sensor networks, achieving an increase in throughput of approximately 50% with an acceptable compromise in LSP.

Acknowledgments. This research was supported by the University IT Research Center (INHA UWB-ITRC), Korea.
References

1. Giuliano, R., Mazzenga, F.: Performance Evaluation of UWB Sensor Network with Aloha Multiple Access Scheme. IWWAN (2005)
2. Oppermann, I.: UWB Wireless Sensor Networks: UWEN - A Practical Example. IEEE Communications Magazine, vol. 38 (2004) 393-422
3. Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications for High Rate Wireless Personal Area Networks (WPANs). IEEE Standard 802.15.3. Institute of Electrical and Electronics Engineers (2003)
4. Batra, A.: Multi-band OFDM Physical Layer Proposal for IEEE 802.15 TG3a. IEEE P802.15-03/268r3 (2003)
5. Ramachandran, I.: Symbol Spreading for Ultrawideband System Based on Multiband OFDM. Proc. PIMRC, vol. 2 (2004) 1204-1209
6. Schoo, K., Choi, H.: MC-CDMA in Personal Area Networks - A Combined PHY and MAC Approach. ITS Project (2005)
7. Taketa, K., Adachi, F.: Inter-chip Interference Cancellation for DS-CDMA with Frequency-domain Equalization. VTC2004, vol. 4 (2004) 2316-2320
8. Foerster, J.: Channel Modeling Sub-committee Report Final. IEEE P802.15-02/368r5 (2002)
9. Mesh Dynamics and Advanced Cybernetics Group - Dynamic Beacon Alignment: http://www.meshdynamics.com/Publications/MDPBEACONALIGNMENT.pdf
10. Ghassemzadeh, S.S.: Parameter Assumptions for the Simulation of the Proposed 802.15.3a PHYs. DCN# 15-04-0488-00-003a (2004)
Topology Control in Wireless Sensor Networks with Interference Consideration Yanxiang He and Yuanyuan Zeng School of Computer, Wuhan University, 430072, Hubei, P.R. China [email protected], [email protected]
Abstract. Topology control that incurs large interference increases communication signal collisions, induces long delays in data delivery and consumes more energy. In this paper, we design a distributed topology control algorithm that considers interference, based on an existing connected-dominating-set backbone. The simulation shows that our interference-aware algorithm performs well and is more suitable for realistic application environments.
1 Introduction

In the last few years, researchers have actively explored topology control approaches for wireless sensor networks. Interference plays a very important role in many applications of wireless sensor networks. Several papers have pointed out that a node can interfere with another node even beyond its communication range. If a topology has large interference, many signals sent by nodes will collide, the network may experience serious delays in delivering data for some nodes, and more energy is consumed. To improve network performance, designing topology control algorithms that take interference into account is both necessary and urgent. In this paper, we design a distributed interference-aware connected-dominating-set-based backbone algorithm suitable for realistic application environments.
3 Model and Assumptions

We use G = (V, E) to represent such networks. We assume that all nodes have an equal transmission range r and are stationary. All nodes in the network use the same maximal interference range R. Simultaneous transmissions on nearby edges may interfere with each other, resulting in collisions. Many interference models have been put forward in the literature [5]. Here we present a static interference model based on existing models. Let I(e1) denote the set of edges that interfere with edge e1; the interference model defines the set I(e1) for each edge e1 in the network. Let the two nodes incident to e1 be u and v, and the nodes incident to e2 be x and y. We define the interference disk of a node u as the disk centered at u with radius R. If node u or node v is covered by one of the interference disks of nodes x and y, then we say that edge e1 interferes with edge e2. A hierarchical clustering constructed in the network graph G yields a network topology G'. Let e be any link in G'. The link interference number of edge e is denoted |I(e)|. The node interference number, denoted |I(u)|, is defined as the maximum interference of all links incident on node u. The topology interference of the subgraph G', denoted |I(G')|, is max |I(e)|. The interference-aware topology control problem tries to construct a subgraph G' that is connected and cuts down the topology interference effectively.
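The static interference model above can be written down directly. This Python sketch (the helper names are our own) tests whether edge e1 = (u, v) interferes with e2 = (x, y) by checking whether u or v lies inside the interference disk of x or y, and then counts |I(e)|:

```python
import math

def interferes(e1, e2, pos, R):
    # e1 interferes with e2 if an endpoint of e1 is covered by the
    # interference disk (radius R) of an endpoint of e2.
    (u, v), (x, y) = e1, e2
    return any(math.dist(pos[a], pos[b]) <= R
               for a in (u, v) for b in (x, y))

def link_interference(e, edges, pos, R):
    # |I(e)|: how many other edges interfere with edge e.
    return sum(1 for e2 in edges if e2 != e and interferes(e, e2, pos, R))

# Two parallel links on a line: they interfere only when R is large
# enough for one link's endpoint to reach the other's.
pos = {0: (0, 0), 1: (1, 0), 2: (10, 0), 3: (11, 0)}
edges = [(0, 1), (2, 3)]
```

The node interference number |I(u)| is then the maximum of `link_interference` over the links incident on u.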
4 Distributed Interference-Aware Topology Control Algorithm

Our method is based on a local minimum connected dominating set (MCDS). We present a distributed interference-aware MCDS backbone construction algorithm, which we call I-CDS. The I-CDS algorithm is based on [4]. It first constructs a maximal independent set (an MIS, i.e., a subset of V such that no two nodes in the subset share an edge; it is also a dominating set of the graph); the nodes in the MIS are dominators, and nodes not in the MIS are dominatees. In the second phase, each dominatee identifies the dominators that are at most two hops away from it, and the dominators are then connected together by choosing some connectors. We incorporate interference into the node ranking as a function: rank(u) = {|I(u)|, id(u)}. Each node is in one of four states: candidate, dominatee, dominator and connector. After the algorithm finishes, nodes in the dominator and connector states become cluster-heads; the other nodes are cluster members belonging to local clusters.

Interference-aware minimal CDS backbone (I-CDS) Algorithm
Step 1: Every node u sends out a TEST message with higher power reaching R to collect local interference information, computes its node interference number, and broadcasts its |I(u)| to its one-hop neighbors.
Step 2: A node u with the minimal |I(u)| among all its neighbors in the candidate state becomes a dominator. If the node interference numbers are equal, the node with the lower ID wins. A node that declares itself a dominator broadcasts a DOMINATOR message.
Step 3: Whenever a neighboring node receives the DOMINATOR message, it declares itself a dominatee and broadcasts a DOMINATEE message.
204
Y. He and Y. Zeng
Step 4: Upon receiving DOMINATOR and DOMINATEE messages from all its neighbors, a node maintains list1 with the IDs of the neighboring dominators. When a node finishes the list, it broadcasts the corresponding LIST1 message.
Step 5: Upon receiving the LIST1 message from a neighbor, a dominatee maintains list2 (the IDs of the dominators two hops away) and broadcasts a LIST2 message.
Step 6: Upon receiving both LIST1 and LIST2 messages from a neighbor, a dominator adds the neighbor into a maintained list3 (the IDs of the neighbors connecting to a two-hop-away dominator) in increasing order of node interference number. It then selects the dominatee neighbor in list3 with the minimal node interference number (if the interference numbers are equal, the lower-ID node wins) as a connector by sending a LIST3 message.
Step 7: Upon receiving a LIST3 message from its neighbors, a dominatee declares itself a connector and sends out a CONNECTOR1 message. Upon receiving the CONNECTOR1 message, a dominatee is selected as a connector if it can reach the node's two-hop-away dominators, and sends out a CONNECTOR2 message.

Theorem 1: I-CDS contains a CDS and thus is interference-aware.

Proof: As we can see, step 1 to step 3 determine the states of the nodes (dominator and dominatee) in the network graph in a non-overlapping way. The dominator nodes and the dominatee nodes interleave with each other, and no two dominator nodes share an edge. It is not difficult to see that the nodes sending out DOMINATOR messages form a maximal independent set, and the nodes in this MIS have the minimal interference numbers among their neighbors, so the constructed set has less interference than those of many existing algorithms. Step 4 to step 7 collect one-hop and two-hop dominator and dominatee information. The senders of CONNECTOR1 messages connect dominators one hop away, and the senders of CONNECTOR2 messages connect dominators two hops away.
All dominators are connected together by those dominatees that are invited as connectors. In the connector selection, we account for interference by always selecting the node with the minimal interference number as a connector. Thus the constructed minimal CDS backbone efficiently cuts down link interference in realistic communication.

Theorem 2: The I-CDS algorithm has O(n) message complexity and O(m+n) time complexity.

Proof: Each node sends out a constant number of messages, so the total number of messages is O(n). Our algorithm is based on [4], which has time complexity O(n); step 2 to step 7 have the same time complexity as [4]. Step 1 computes the local interference numbers with time complexity O(m).
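The dominator election of steps 1-3, driven by the rank (|I(u)|, id(u)), can be illustrated with a centralized Python sketch. In the real protocol this runs distributedly via the TEST/DOMINATOR/DOMINATEE messages; the following is only a sequential emulation with hypothetical names:

```python
def elect_dominators(neighbors, interference):
    # A candidate whose rank (|I(u)|, id) is minimal among its
    # candidate neighbors becomes a dominator; its neighbors become
    # dominatees. Ranks are unique (ties broken by id), so the
    # globally smallest candidate always wins and the loop terminates,
    # yielding a maximal independent set.
    state = {u: "candidate" for u in neighbors}
    rank = lambda u: (interference[u], u)
    while any(s == "candidate" for s in state.values()):
        for u in sorted((x for x in state if state[x] == "candidate"), key=rank):
            if state[u] != "candidate":
                continue
            if all(state[w] != "candidate" or rank(u) < rank(w)
                   for w in neighbors[u]):
                state[u] = "dominator"
                for w in neighbors[u]:
                    if state[w] == "candidate":
                        state[w] = "dominatee"
    return state

graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # a 4-node path
interf = {0: 3, 1: 1, 2: 2, 3: 1}                 # node interference numbers
states = elect_dominators(graph, interf)          # dominators: nodes 1 and 3
```

Note how node 1, with the lowest interference number among its neighbors, is elected before node 0 would be, which is the bias toward low-interference dominators that Theorem 1 appeals to.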
5 Simulations

The simulated networks contain 100 to 300 nodes, in increments of 25 nodes, randomly placed in a 160 x 160 square area so as to generate connected graphs. The radio transmission range is 30 m, and the interference range is
[Figures 1-3: line plots of interference, average residual energy (J) and average delay against network size, each comparing I-CDS and CDS.]

Fig. 1. Topology interference of the network, comparing the I-CDS and CDS algorithms

Fig. 2. Average residual energy of the network when one backbone node died out, comparing the I-CDS and CDS algorithms

Fig. 3. Average delay of the network, comparing the I-CDS and CDS algorithms
twice the transmission range: 60 m. Each node is assigned an initial energy level of 1 Joule (J). The transceiver energy model mimics a "sensor radio" with Eelec = 50 nJ/bit, εfs = 10 pJ/bit/m2 and εmp = 0.0013 pJ/bit/m4. We study the performance of the constructed topology in terms of link interference, comparing the CDS construction algorithm (CDS) [4] with our I-CDS algorithm. Fig. 1 shows that the interference of I-CDS is lower than that of the original CDS algorithm. Less interference in the network will cut
down signal collisions and save energy; less interference will also reduce the network delay in delivering data. Fig. 2 shows that I-CDS also improves the network's energy performance. Fig. 3 shows the comparison of average network delay: the topology built with interference consideration effectively reduces signal collisions and yields lower delay.
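The first-order radio model quoted above can be sketched as follows. The crossover distance d0 separating the free-space (d²) and multipath (d⁴) regimes is a standard convention for this model and is an assumption here, since the paper does not state it explicitly:

```python
# First-order radio model with the simulation's stated parameters:
# E_elec = 50 nJ/bit, eps_fs = 10 pJ/bit/m^2, eps_mp = 0.0013 pJ/bit/m^4.
import math

E_ELEC = 50e-9       # J/bit, electronics energy
EPS_FS = 10e-12      # J/bit/m^2, free-space amplifier
EPS_MP = 0.0013e-12  # J/bit/m^4, multipath amplifier
D0 = math.sqrt(EPS_FS / EPS_MP)  # crossover distance, about 87.7 m

def tx_energy(k_bits, d):
    """Energy to transmit k_bits over distance d (metres)."""
    if d < D0:
        return E_ELEC * k_bits + EPS_FS * k_bits * d ** 2
    return E_ELEC * k_bits + EPS_MP * k_bits * d ** 4

def rx_energy(k_bits):
    """Energy to receive k_bits."""
    return E_ELEC * k_bits
```

At the simulated transmission range of 30 m the free-space branch applies, so a one-bit transmission costs 50 nJ + 9 nJ = 59 nJ.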
6 Conclusion

In this paper, we study topology control problems in wireless sensor networks with interference consideration. We propose a distributed interference-aware CDS-based backbone construction algorithm for hierarchical network construction. Simulations show that our algorithm substantially outperforms the existing CDS backbone algorithm that does not consider interference.
References
1. Xu, Y., Heidemann, J., Estrin, D.: Geography-Informed Energy Conservation for Ad Hoc Routing. In: Proc. ACM/IEEE International Conference on Mobile Computing and Networking (MOBICOM), Rome, Italy (2000) 70-84
2. Wu, J., Li, H.: On Calculating Connected Dominating Set for Efficient Routing in Ad Hoc Wireless Networks. In: Proc. 3rd ACM Int'l Workshop on Discrete Algorithms and Methods for Mobile Computing and Communications (1999) 7-14
3. Wan, P.J., Alzoubi, K., Frieder, O.: Distributed Well Connected Dominating Set in Wireless Ad Hoc Networks. In: Proc. IEEE INFOCOM (2002)
4. Alzoubi, K., Wan, P.J., Frieder, O.: New Distributed Algorithm for Connected Dominating Set in Wireless Ad Hoc Networks. In: Proc. 35th Hawaii Int'l Conf. on System Sciences (HICSS'02) (2002) 3881-3887
5. Gupta, P., Kumar, P.: The Capacity of Wireless Networks. IEEE Transactions on Information Theory 46(2) (2000) 388-404
6. Burkhart, M., Rickenbach, P.V., Wattenhofer, R., Zollinger, A.: Does Topology Control Reduce Interference? In: Proc. ACM MOBIHOC (2004)
7. Li, X.-Y., Neijad, K.M., Song, W.-Z., Wang, W.-Z.: Interference-Aware Topology Control for Wireless Sensor Networks. In: Proc. IEEE SECON (2005)
Adaptive Depth Control for Autonomous Underwater Vehicles Based on Feedforward Neural Networks

Yang Shi¹, Weiqi Qian², Weisheng Yan³, and Jun Li³

¹ Department of Mechanical Engineering, University of Saskatchewan, 57 Campus Drive, Saskatoon, Saskatchewan, S7N 5A9, Canada
² Institute of Computational Aerodynamics, China Aerodynamics Research and Development Center, Mianyang, Sichuan, 621000, P.R. China
³ Institute of Underwater Robotics, Northwestern Polytechnical University, Xi'an 710072, P.R. China
Abstract. This paper studies the design and application of a neural-network-based adaptive control scheme for the autonomous underwater vehicle's (AUV's) depth control system, an uncertain nonlinear dynamical system with unknown nonlinearities. The unknown nonlinearity is approximated by a feedforward neural network whose parameters are adaptively adjusted on-line according to a set of parameter estimation laws, for the purpose of driving the AUV to cruise at the preset depth. The Lyapunov synthesis approach is used to develop the adaptive control scheme. The overall control system guarantees that the tracking error converges to a small neighborhood of zero and that all adjustable parameters involved remain uniformly bounded. Simulation examples are given to illustrate the design procedure and the applicability of the proposed method. The results indicate that the proposed method is suitable for practical applications.
1 Introduction
Autonomous underwater vehicles (AUVs) have various potential applications and great advantages in terms of operational cost and safety. When performing manipulation or inspection tasks, AUVs can help us better understand marine and other environmental issues, protect ocean resources, and utilize them efficiently for further development. So far there are more than 46 AUV models worldwide, and numerous research and development activities have occurred in the area of AUVs [19]. However, a number of complex issues arising from the unstructured, hazardous underwater environment make it difficult to travel in the ocean, even though today's technologies have allowed humans to land on the moon and robots to travel to Mars. Major factors that make it difficult to control AUVs include: (1) the highly nonlinear, time-varying dynamic behavior of the AUV; (2) uncertainties in

D.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNCIS 344, pp. 207-218, 2006.
© Springer-Verlag Berlin Heidelberg 2006
hydrodynamic coefficients; and (3) disturbances from ocean currents. It is difficult to fine-tune the control gains during an underwater cruise. Therefore, it is highly desirable to have an AUV control system with a self-adaptive ability for when the control performance degrades during operation due to changes in the dynamics of the AUV and its environment.
In recent years, several advanced control techniques have been developed for AUVs, aimed at improving the capability of tracking given reference position and attitude trajectories [3]. AUVs performing manipulation or inspection tasks need to be controlled in six degrees of freedom. Even though the control problem is kinematically similar to the control of a rigid body in six-dimensional space, which has been studied extensively in the literature, the presence of hydrodynamic effects makes the problem of controlling an AUV much more challenging. Reference [13] presents the state of the art of several existing AUVs and their control architectures. Typical results include sliding control [6,15], nonlinear control [10], adaptive control [11], neural network based control [16,17,18,8], and fuzzy control [2,9]. Since neural networks (NNs) have an inherent capability of approximating nonlinear functions, it is attractive to apply them in motion control systems such as AUVs. In [16,17], a neural network control system was proposed using a recursive adaptation algorithm with a critic function (reinforcement learning approach), so that the system adjusts itself directly and on-line without an explicit model of the vehicle dynamics. In [8], a self-organizing neural-net-controller system (SONCS) featuring a fast adaptation method was developed for the heading-keeping control of AUVs.
In this paper, inspired by the successful application of feedforward neural networks to missile control systems in [4], a feedforward neural network based adaptive control for the AUV's depth control system is developed. By employing a feedforward NN to on-line approximate the uncertain nonlinear dynamics of the AUV without explicit knowledge of its dynamic structure, the depth tracking performance is further investigated. The on-line parameter estimation laws of the NN are developed in the context of the Lyapunov stability concept. Boundedness of all parameters involved, as well as convergence of the tracking error to a neighborhood of zero, is guaranteed.
The rest of the paper is organized as follows. In Section 2 we discuss the uncertain nonlinear model of the AUV's depth control system. In Section 3 the feedforward NN is briefly introduced and its universal approximation property is reviewed. In Section 4, using the feedforward NN as an on-line approximator, we propose the adaptive control law and the associated parameter estimation laws, and analyze the tracking performance and the stability of the whole AUV depth control system. In Section 5 we present an illustrative example to demonstrate the effectiveness of the proposed method. Finally, we offer some concluding remarks in Section 6.
2 AUV Model
Dynamics of AUVs, including hydrodynamic parameter uncertainties, are highly nonlinear, coupled, and time varying. Several modeling and system identification
techniques for underwater robotic vehicles have been proposed by researchers [14,3]. The motion of an AUV is discussed in 6 degrees of freedom (DOF) since 6 independent coordinates are necessary to determine the position and orientation of a rigid body AUV, and 6 different motion components are conveniently defined as: surge, sway, heave, roll, pitch, and yaw. When analyzing the motion of AUVs in 6 DOF it is convenient to define two coordinate frames as illustrated in Figure 1.
Fig. 1. AUV coordinate frames: the AUV-fixed frame and the Earth-fixed frame
In this work, we focus on the depth control system of the AUV. Suppose the AUV is a rigid body, and assume that the forward speed v is constant and that the sway and yaw modes can be neglected. The equations of motion of the AUV's depth system then involve the angular velocity in pitch ω_z1, the pitch angle θ, the attack angle α, the depth y_e, and the stern plane deflection δ_e; in particular,

θ̇ = ω_z1,
v α̇ = k21 v² α + k22 v ω_z1 + k231 α cos α + k232 sin α + k24 α sin(θ − α) + k25 cos(θ − α) + k26 cos θ + k27 sin θ + k28 + k29 v² δ_e,   (1)
ẏ_e = v sin(θ − α),
ẋ_e = v cos(θ − α),

together with corresponding expressions for v̇ and ω̇_z1 given in [14],
where Θ = θ − α, x_e is the distance traveled in the X_e direction, δ_e is the controller to be designed, and the k_i (for the subscripts i appearing in the above equations) are coefficients appropriately defined in [14].
From (1), we have

ÿ_e = v (θ̇ − α̇) cos(θ − α) = v cos(θ − α)(ω_z1 − α̇)
  = v cos(θ − α) ω_z1 − cos(θ − α)[k21 v² α + k22 v ω_z1 + k231 α cos α + k232 sin α + k24 α sin(θ − α) + k25 cos(θ − α) + k26 cos θ + k27 sin θ + k28 + k29 v² δ_e]
  := f(x) + g(x) δ_e,

where x = [y_e θ α ω_z1]^T, and

f(x) := v cos(θ − α) ω_z1 − cos(θ − α)[k21 v² α + k22 v ω_z1 + k231 α cos α + k232 sin α + k24 α sin(θ − α) + k25 cos(θ − α) + k26 cos θ + k27 sin θ + k28],
g(x) := k29 [−v² cos(θ − α)].

Therefore, the model of the AUV can be represented in the following compact form:

ÿ_e = f(x) + g(x) δ_e.   (2)
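For concreteness, f(x) and g(x) from (2) can be evaluated numerically as below. The coefficient dictionary is a placeholder for the k_ij of [14]; any values a caller supplies are hypothetical:

```python
import math

# Numerical sketch of f(x) and g(x) in (2); the k_ij values are placeholders.
def f_g(x, k, v):
    """x = (y_e, theta, alpha, omega_z1); k: dict of coefficients; v: speed."""
    _, theta, alpha, omega = x
    c = math.cos(theta - alpha)
    bracket = (k['k21'] * v**2 * alpha + k['k22'] * v * omega
               + k['k231'] * alpha * math.cos(alpha) + k['k232'] * math.sin(alpha)
               + k['k24'] * alpha * math.sin(theta - alpha)
               + k['k25'] * math.cos(theta - alpha)
               + k['k26'] * math.cos(theta) + k['k27'] * math.sin(theta)
               + k['k28'])
    f = v * c * omega - c * bracket          # drift term
    g = -k['k29'] * v**2 * c                 # control gain on delta_e
    return f, g
```

With all coefficients zero the drift reduces to v cos(θ − α) ω_z1, which is a quick sanity check on the implementation.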
3 Feedforward Neural Networks
NNs are promising tools for identification and control applications because of their universal approximation property [7,5]. A three-layer feedforward NN (shown in Figure 2) can perform as an on-line approximator [4]. The NN's vector output can be represented in matrix form as

Ŷ_n(x_a, Ŵ_ih, Ŵ_ho) = Ŵ_ho σ(Ŵ_ih x_a),   (3)

Fig. 2. Neural network structure
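A minimal NumPy sketch of the forward pass in (3), with the bias handled by augmenting the input with −1 as in the text:

```python
import numpy as np

# Forward pass of the three-layer approximator in (3):
# Y_n = W_ho * sigma(W_ih * x_a), with x_a = (x, -1) the augmented input.
def nn_output(W_ih, W_ho, x):
    """W_ih: (p, n+1) input-hidden weights; W_ho: (m, p) hidden-output
    weights; x: (n,) input vector."""
    x_a = np.append(x, -1.0)              # augment with the bias input
    z = W_ih @ x_a
    sigma = 1.0 / (1.0 + np.exp(-z))      # elementwise sigmoid
    return W_ho @ sigma
```

With zero input-hidden weights every hidden unit outputs σ(0) = 0.5, so the network output is simply half the row sums of W_ho, a convenient sanity check.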
where Ŵ_ih ∈ ℝ^{p×(n+1)} and Ŵ_ho ∈ ℝ^{m×p} are the input-hidden weight matrix and hidden-output weight matrix, respectively; x ∈ ℝ^{n×1} is the input vector; x_a = (x^T, −1)^T ∈ ℝ^{(n+1)×1} is the augmented neural input vector (the −1 term denotes the input bias);

σ_i(Ŵ_ih(i) x_a) := 1 / [1 + exp(−Ŵ_ih(i) x_a)] ∈ ℝ,   i = 1, 2, ..., p,

is a sigmoid function; and

σ(Ŵ_ih x_a) := [σ_1(Ŵ_ih(1) x_a), ..., σ_p(Ŵ_ih(p) x_a)]^T,   Ŵ_ih := [Ŵ_ih(1)^T, ..., Ŵ_ih(p)^T]^T,

where Ŵ_ih includes the threshold.
The NN's universal approximation property is stated formally in the following theorem.

Theorem 1 [7,5]. Let x ∈ D (a compact subset of ℝ^n) and let Y(x) : D → ℝ^m be a continuous function vector. For an arbitrary constant ε̄ > 0, there exist an integer p (the number of hidden neurons) and real constant optimal weight matrices W*_ih ∈ ℝ^{p×(n+1)} and W*_ho ∈ ℝ^{m×p} such that

Y(x) = Y*_n(x_a, W*_ih, W*_ho) + ε_n(x),   (4)

where ε_n(x) is the approximation error vector satisfying ‖ε_n(x)‖ ≤ ε̄, ∀x ∈ D. The optimal approximator can be described as

Y*_n(x_a, W*_ih, W*_ho) = W*_ho σ(W*_ih x_a).
(5)

4 Adaptive Control Design for AUV Depth System
For the AUV's depth control system modeled by (2), the control objective is to drive the AUV to track an expected depth trajectory y_em. The tracking performance can be characterized by the tracking error e := y_em − y_e. In order to increase the robustness of the controller to be designed, a sliding surface is introduced as S = ė + λe, where λ is a small positive constant. Define

S_Δ = S − ε · sat(S/ε),   sat(η) = η if |η| ≤ 1, sgn(η) otherwise.

If |S| ≤ ε, then S_Δ = Ṡ_Δ = 0; and if |S| > ε, then S_Δ = S − ε sgn(S) and Ṡ_Δ = Ṡ. The derivative of S can then be written as

Ṡ = −ΛS + Y(x) + g(x) δ_e,   (6)

where Λ is a positive design constant and Y(x) collects the reference acceleration ÿ_em, the terms λė and ΛS, and the unknown dynamics f(x). Y(x) is uncertain and can be on-line approximated by a feedforward NN as described in Section 3.

4.1 Using NN as an Online Approximator
When the AUV cruises underwater, additional force and moment coefficients must be added to account for the effective mass of the fluid that surrounds the vehicle and must be accelerated with it. These coefficients are referred to as added (virtual) mass and include added moments of inertia and cross-coupling terms such as force coefficients due to linear and angular accelerations. It would be a difficult task to obtain the exact values of the hydrodynamic coefficients, let alone the disturbances from currents and waves.
The main idea of NN-based control schemes is to apply NNs to on-line approximate the unknown nonlinear functions involved in the nonlinear systems to be controlled. On the basis of Theorem 1, there exists an optimal neural network approximator Y*_n(x_a, W*_ih, W*_ho) over a properly defined compact set, and we design a NN approximator Ŷ_n(x_a, Ŵ_ih, Ŵ_ho) to model the unknown function Y(x), given the estimates Ŵ_ih and Ŵ_ho. The NN approximation error Ỹ_n and the weight matrix estimation errors are defined, respectively, as

Ỹ_n := Y(x) − Ŷ_n(x_a, Ŵ_ih, Ŵ_ho),
W̃_ih := W*_ih − Ŵ_ih,
W̃_ho := W*_ho − Ŵ_ho.

According to Theorem 1, we can rewrite the NN approximation error as

Ỹ_n = W*_ho σ(W*_ih x_a) + ε_n(x) − Ŵ_ho σ(Ŵ_ih x_a)
  = W̃_ho σ(W*_ih x_a) + Ŵ_ho σ(W*_ih x_a) + ε_n(x) − Ŵ_ho σ(Ŵ_ih x_a).   (8)

Taking the Taylor-series expansion of σ(W*_ih x_a) about Ŵ_ih x_a, we have

σ(W*_ih x_a) = σ(Ŵ_ih x_a) + σ′(Ŵ_ih x_a)(W*_ih x_a − Ŵ_ih x_a) + O(W̃_ih x_a),   (9)

where σ′(Ŵ_ih x_a) = diag(dσ_1(z)/dz |_{z=Ŵ_ih(1)x_a}, ..., dσ_p(z)/dz |_{z=Ŵ_ih(p)x_a}) ∈ ℝ^{p×p}, and O(·) is the sum of the high-order terms of the argument in the Taylor-series expansion. Substituting (9) into (8), we get

Ỹ_n = W̃_ho [σ(Ŵ_ih x_a) − σ′(Ŵ_ih x_a) Ŵ_ih x_a] + Ŵ_ho σ′(Ŵ_ih x_a) W̃_ih x_a + Ψ,   (10)

where Ψ = W̃_ho σ′(Ŵ_ih x_a) W*_ih x_a + W*_ho O(W̃_ih x_a) + ε_n(x).
The adaptive control and estimation laws to be designed will suppress the NN approximation error and thus achieve satisfactory tracking performance. In order to facilitate the design, we analyze the lumped term Ψ in the NN approximation error and derive its upper bound, following the approach used in [4]. The sigmoid function and its derivative are always bounded by certain constants; hence we assume c1 and c2 are constants such that

‖σ(Ŵ_ih x_a) − σ(W*_ih x_a)‖ ≤ c1,   ‖σ′(Ŵ_ih x_a)‖ ≤ c2.

Therefore,

‖O(W̃_ih x_a)‖ = ‖σ(W*_ih x_a) − σ(Ŵ_ih x_a) − σ′(Ŵ_ih x_a) W̃_ih x_a‖ ≤ c1 + c2 ‖W̃_ih‖_F ‖x_a‖.   (11)

According to Theorem 1, the norms of the optimal weight matrices of the trained NN are bounded by certain constants, assumed to be W̄_ih and W̄_ho:

‖W*_ih‖_F ≤ W̄_ih,   ‖W*_ho‖_F ≤ W̄_ho,

where ‖·‖_F := sqrt(tr((·)^T (·))), with tr indicating the trace of a matrix, is the Frobenius norm of a matrix (for a vector it is equivalent to the 2-norm). Then the norm of the residual term Ψ of the NN approximation error satisfies

‖Ψ‖ = ‖W̃_ho σ′(Ŵ_ih x_a) W*_ih x_a + W*_ho O(W̃_ih x_a) + ε_n(x)‖
  ≤ ‖W̃_ho‖_F · c2 · W̄_ih · ‖x_a‖ + W̄_ho (c1 + c2 ‖W̃_ih‖_F ‖x_a‖) + ε̄
  ≤ (c1 W̄_ho + ε̄) + 2 c2 W̄_ih W̄_ho ‖x_a‖ + c2 W̄_ih ‖Ŵ_ho‖_F ‖x_a‖ + c2 W̄_ho ‖Ŵ_ih‖_F ‖x_a‖
  := b^T w,

where

b = [c1 W̄_ho + ε̄, 2 c2 W̄_ih W̄_ho, c2 W̄_ih, c2 W̄_ho]^T ∈ ℝ^{4×1},
w = [1, ‖x_a‖, ‖Ŵ_ho‖_F ‖x_a‖, ‖Ŵ_ih‖_F ‖x_a‖]^T ∈ ℝ^{4×1}.

Then we have

‖Ψ‖ ≤ b^T w.   (12)
It is also noticed that g(x) is uncertain in that the involved coefficient k29 is unknown; therefore, we need to estimate k29 adaptively. For convenience of expression, define k := k29, let k̂ be its estimate, and let k̃ := k − k̂ be the parameter estimation error; then the estimated g(x) can be expressed as

ĝ(x) = k̂ [−v² cos(θ − α)].   (13)
4.2 Control and Parameter Estimation Laws
Once Ŷ_n and ĝ are employed as on-line approximators, we can design an adaptive AUV depth control system based on NNs:

δ_e = ĝ^{-1} (−Ŷ_n + u_c),   (14)

where u_c is the compensation control term and has the following form:

u_c = −sat(S/ε) b̂^T w,   (15)

where b̂ ∈ ℝ^{4×1} is an unknown vector to be estimated. The parameter estimation laws for the NN and the associated unknown coefficients are designed as follows:

Ŵ̇_ho = S_Δ [σ(Ŵ_ih x_a) − σ′(Ŵ_ih x_a) Ŵ_ih x_a]^T Γ_ho,   (16)
Ŵ̇_ih = [Ŵ_ho σ′(Ŵ_ih x_a)]^T S_Δ x_a^T Γ_ih,   (17)
k̂̇ = Γ_k S_Δ [−v² cos(θ − α)] δ_e,   (18)
b̂̇ = Γ_w |S_Δ| w.   (19)
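A hypothetical Euler-discretized implementation of one update step of (16)-(19) might look as follows. Scalar gains stand in for the matrices Γ_ho, Γ_ih for simplicity, and a single NN output (m = 1) is assumed, as in the depth system:

```python
import numpy as np

# One Euler step of the estimation laws (16)-(19); scalar gains are a
# simplification of the gain matrices, and all shapes assume m = 1 output.
def estimation_step(W_ho, W_ih, k_hat, b_hat, x_a, S_delta, v, theta, alpha,
                    delta_e, w, gains, dt):
    """W_ho: (1, p), W_ih: (p, n+1), x_a: (n+1,), w: (4,), gains: 4 scalars."""
    Gho, Gih, Gk, Gw = gains
    z = W_ih @ x_a
    sigma = 1.0 / (1.0 + np.exp(-z))
    sigma_p = np.diag(sigma * (1.0 - sigma))   # sigmoid derivative, diagonal
    W_ho = W_ho + dt * Gho * S_delta * (sigma - sigma_p @ (W_ih @ x_a)).reshape(1, -1)
    W_ih = W_ih + dt * Gih * S_delta * np.outer((W_ho @ sigma_p).ravel(), x_a)
    k_hat = k_hat + dt * Gk * S_delta * (-v**2 * np.cos(theta - alpha)) * delta_e
    b_hat = b_hat + dt * Gw * abs(S_delta) * w
    return W_ho, W_ih, k_hat, b_hat
```

Note that every update is proportional to S_Δ (or |S_Δ|), so inside the dead zone |S| ≤ ε all estimates freeze, which is what keeps the parameters bounded.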
Figure 3 depicts the structure of the depth control system developed herein. In the implementation of the controller, the depth y_e can be measured by a pressure meter and the pitch angle θ by an inclinometer, while the pitch rate ω_z1 requires a rate gyro or rate sensor.

Fig. 3. Block diagram for the control scheme
4.3 Stability Analysis
Theorem 2 (Stability). Consider the AUV depth control system described by (1) or (2) with the control given by (14) and the parameter estimation laws provided by (16)-(19). Then the AUV depth tracking error will asymptotically converge to a neighborhood of zero, and all adjustable parameters will remain bounded.

Proof. Choose a Lyapunov function V = V1 + V2, where

V1 = (1/2) S_Δ²,   (20)

and

V2 = (1/2) tr(W̃_ho Γ_ho^{-1} W̃_ho^T) + (1/2) tr(W̃_ih Γ_ih^{-1} W̃_ih^T) + (1/2) Γ_k^{-1} k̃² + (1/2) b̃^T Γ_w^{-1} b̃,   (21)

with b̃ := b − b̂. In the following, the time derivative V̇ is evaluated for two cases: (1) |S| > ε, and (2) |S| ≤ ε.

Case 1: If |S| > ε, then S_Δ = S − ε sgn(S) and V̇1 = S_Δ Ṡ_Δ. Substituting (6), (14), and (15) into this expression yields

V̇1 = S_Δ [−ΛS + Y(x) + g(x) δ_e]
  = S_Δ {−ΛS + Ỹ_n(x) + Ŷ_n(x) + [g̃(x) + ĝ(x)] ĝ^{-1}(x) (−Ŷ_n(x) + u_c)}
  = S_Δ [−ΛS + Ỹ_n(x) + g̃(x) δ_e + u_c].   (22)

Taking the NN approximation error Ỹ_n (10) and the control law δ_e (14) into (22), we have

V̇1 = −Λ S_Δ² − Λ ε |S_Δ| + S_Δ u_c + S_Δ k̃ [−v² cos(θ − α)] δ_e
   + S_Δ {W̃_ho [σ(Ŵ_ih x_a) − σ′(Ŵ_ih x_a) Ŵ_ih x_a] + Ŵ_ho σ′(Ŵ_ih x_a) W̃_ih x_a + Ψ}.

According to (12) and (15), we can further obtain

V̇1 ≤ −Λ S_Δ² + tr{S_Δ W̃_ho [σ(Ŵ_ih x_a) − σ′(Ŵ_ih x_a) Ŵ_ih x_a]} + tr{S_Δ Ŵ_ho σ′(Ŵ_ih x_a) W̃_ih x_a} + k̃ [−v² cos(θ − α)] δ_e S_Δ + |S_Δ| b̃^T w.   (23)

On the other hand, the time derivative of V2 is

V̇2 = −tr(W̃_ho Γ_ho^{-1} Ŵ̇_ho^T) − tr(W̃_ih Γ_ih^{-1} Ŵ̇_ih^T) − Γ_k^{-1} k̃ k̂̇ − b̃^T Γ_w^{-1} b̂̇.
Substituting the parameter estimation laws (16)-(19) and the control law (14) into the above equation yields

V̇2 = −tr{S_Δ W̃_ho [σ(Ŵ_ih x_a) − σ′(Ŵ_ih x_a) Ŵ_ih x_a]} − tr{S_Δ Ŵ_ho σ′(Ŵ_ih x_a) W̃_ih x_a} − k̃ [−v² cos(θ − α)] δ_e S_Δ − |S_Δ| b̃^T w.   (24)

Combining (23) and (24) leads to

V̇ ≤ −Λ S_Δ².   (25)

Case 2: If |S| ≤ ε, then S_Δ = 0. Hence,

V̇ = 0.   (26)

Considering the above two cases, (25) and (26) imply that: (1) S_Δ, W̃_ho, W̃_ih, and w are all bounded; (2) S_Δ ∈ L2. From the boundedness of all the adjustable parameters, we can straightforwardly see that δ_e, u_c, and Ṡ_Δ are also bounded. Furthermore, lim_{t→∞} ∫₀^t S_Δ² dτ is bounded, and S_Δ is uniformly continuous. Applying the Barbalat Lemma [12] yields

lim_{t→∞} S_Δ = 0,   (27)

which implies that the depth tracking error will asymptotically converge to a neighborhood of zero.
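The sliding variable, the saturation function, and the dead-zone modification S_Δ used throughout this section can be sketched as follows (λ and ε are the design constants of the text):

```python
# Sliding variable S = e_dot + lambda*e and its dead-zone version S_Delta,
# which is exactly zero whenever |S| <= eps.
def sat(eta):
    """Saturation: identity inside [-1, 1], sign outside."""
    return eta if abs(eta) <= 1.0 else (1.0 if eta > 0 else -1.0)

def sliding_vars(e, e_dot, lambda_, eps):
    S = e_dot + lambda_ * e
    S_delta = S - eps * sat(S / eps)
    return S, S_delta
```

Inside the boundary layer (|S| ≤ ε) the dead zone gives S_Δ = 0, which is what switches off both the adaptation and the V̇ analysis of Case 2.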
5 AUV Case Study
The simulation study is based on the model structure of a certain AUV developed in [14]. The expected cruising depth is preset to y_em = 50 m. Assume the initial conditions v = 30 m/s, y_e = 0, and ω_z1(0) = 0. We then employ a feedforward NN with 8 inputs, 10 hidden neurons, and 1 output to approximate the uncertain nonlinearity. The adaptive update gain matrices are set to Γ_ho = diag(5, ..., 5) ∈ ℝ^{10×10}, Γ_ih = diag(0.2, ..., 0.2) ∈ ℝ^{8×8}, and Γ_k = 0.05, and all initial weights are set to 0. For the sliding surface, we choose S = ė + 4e and ε = 0.3. Figure 4 illustrates the depth response of the AUV (y_e), and Figure 5 shows the control input (δ_e), the stern plane deflection. A better performance may be obtained by further tuning the update gains and increasing the number of neurons in the hidden layer: a higher update gain gave a better tracking performance, but when the gain was too high, oscillatory behavior could occur.
Fig. 4. Depth response of the AUV (ye )
Fig. 5. Control input - the stern plane deflection of the AUV (δe )
6 Conclusion
An adaptive NN controller for an AUV's depth control system has been developed, with guaranteed tracking performance. A feedforward NN has been used to on-line approximate the uncertain nonlinear dynamics of the AUV. Without explicit prior knowledge of the vehicle dynamics, the proposed control technique achieves satisfactory tracking performance, and all the adjustable parameters involved remain bounded throughout. Case studies show the effectiveness of the proposed method for the AUV system. Whereas this work addresses only the AUV's depth channel, the next stage of the study is to apply the proposed NN-based adaptive control scheme to the AUV's three-channel control system design.
References
1. Curtin, T.B., Bellingham, J.G., Catipovic, J., Webb, D.: Autonomous Oceanographic Sampling Networks. Oceanography 6 (1989) 86-94
2. DeBitetto, P.A.: Fuzzy Logic for Depth Control of Unmanned Undersea Vehicles. In: Proc. Symposium on Autonomous Underwater Vehicle Technology (1994) 233-241
3. Fossen, T.: Guidance and Control of Ocean Vehicles. Wiley, Chichester (1994)
4. Fu, L.C., Chang, W.D., Yang, J.H., Kuo, T.S.: Adaptive Robust Bank-to-Turn Missile Autopilot Design Using Neural Networks. Journal of Guidance, Control, and Dynamics 20 (1997) 346-354
5. Funahashi, K.I.: On the Approximate Realization of Continuous Mappings by Neural Networks. Neural Networks 2 (1989) 183-192
6. Healey, A.J., Lienard, D.: Multivariable Sliding Mode Control for Autonomous Diving and Steering of Unmanned Underwater Vehicles. IEEE Journal of Oceanic Engineering 18 (1993) 327-339
7. Hornik, K., Stinchcombe, M., White, H.: Multilayer Feedforward Networks are Universal Approximators. Neural Networks 2 (1989) 359-366
8. Ishii, K., Fujii, T., Ura, T.: A Neural Network System for Online Controller Adaptation and its Application to Underwater Robot. In: Proc. IEEE International Conference on Robotics and Automation (1998) 756-761
9. Kato, N.: Applications of Fuzzy Algorithm to Guidance and Control of Underwater Vehicles. In: Yuh, J. (ed.): Underwater Robotic Vehicles: Design and Control. TSI, Albuquerque (1995)
10. Nakamura, Y., Savant, S.: Nonlinear Tracking Control of Autonomous Underwater Vehicles. In: Proc. IEEE Int. Conf. on Robotics and Automation 3 (1992) A4-A9
11. Nie, J., Yuh, J., Kardash, E., Fossen, T.I.: Onboard Sensor-Based Adaptive Control of Small UUVs in Very Shallow Water. In: Proc. IFAC Control Applications in Marine Systems, Fukuoka, Japan (1998) 201-206
12. Slotine, J.J.E., Li, W.: Applied Nonlinear Control. Prentice-Hall, Englewood Cliffs (1991)
13. Valavanis, K.P., Gracanin, D., Matijasevic, M., Kolluru, R., Demetriou: Control Architecture for Autonomous Underwater Vehicles. IEEE Control Systems (1997) 48-64
14. Xu, D., Ren, Yan, W.: Control Systems for Autonomous Underwater Vehicles. NPUP, Xi'an (1990)
15. Xu, D., Yan, W., Shi, Y.: Nonlinear Variable Structure Double Mode Control of Autonomous Underwater Vehicles. In: Proc.
IEEE International Symposium on Underwater Technology, Tokyo (1990) 425-430
16. Yuh, J.: A Neural Net Controller for Underwater Robotic Vehicles. IEEE Journal of Oceanic Engineering 15 (1990) 161-166
17. Yuh, J.: Learning Control for Underwater Robotic Vehicles. IEEE Control Systems Magazine 14 (1994) 39-46
18. Yuh, J.: An Adaptive and Learning Control System for Underwater Robots. In: Proc. 13th World Congress of the International Federation of Automatic Control, San Francisco, CA, vol. A (1996) 145-150
19. Yuh, J.: Design and Control of Autonomous Underwater Robots: A Survey. Autonomous Robots (2000) 7-24
Adaptive Fuzzy Sliding-Mode Control for Non-minimum Phase Overload System of Missile

Yongping Bao¹, Wenchao Du²,³, Daquan Tang⁴, Xiuzhen Yang⁵, and Jinyong Yu⁵

¹ School of Mathematics and Information, Lu Dong University, Yantai 264001, P.R. China
[email protected]
² Graduate Students' Brigade, Naval Aeronautical Engineering Institute, Yantai 264001, P.R. China
³ Special Missiles Representatives Office in Beijing of Military Representatives Bureau of NED in Tianjin, Beijing 100076, P.R. China
⁴ School of Automation Science and Electrical Engineering, Beijing University of Aeronautics and Astronautics, Beijing 100083, P.R. China
⁵ Department of Automatic Control Engineering, Naval Aeronautical Engineering Institute, Yantai 264001, P.R. China
Abstract. An adaptive fuzzy logic system is incorporated with a Variable Structure Control (VSC) system for the purpose of improving the performance of the control system. A sliding surface with an additional tunable parameter is defined as a new output, based on the idea of output redefinition; as a result, the overload system of a missile with non-minimum phase characteristics can be transformed into a minimum-phase system by tuning the parameters of the sliding surface, and a sliding-mode controller can be designed. A fuzzy logic system is used to approximate the parameter uncertainty, so the chattering effects can be alleviated. Finally, simulation results are given to show the effectiveness of the proposed control scheme.
phase system is not stable, the feedforward controller cannot be used unless the desired input or its upper bound is known. Conventional sliding-mode control cannot be applied to non-minimum phase systems because its equivalent control term tends to infinity; for this reason, Shkolnikov and Shtessel [4] designed dynamic sliding-mode control for non-minimum phase systems. The acceleration system of tail-controlled missiles is a non-minimum phase system: the tail fin deflection first generates a small force on the fin opposed to the desired acceleration. In [5], approximate linearization and feedback linearization are adopted, and the dynamic model of the missile is transformed into a minimum-phase parametric affine model. In [6], the model of the missile with the acceleration as the output is simplified to a minimum-phase system via partial linearization and a singular-perturbation-like technique, and the input-output map is exactly linearized. In [7], output redefinition and inversion are used to deal with the non-minimum phase characteristic of the missile.
In this paper, a new sliding surface with an additional tunable parameter is defined as a new output; thus output redefinition can be combined naturally with Sliding-Mode Control (SMC), and the non-minimum phase overload system of the missile can be controlled. Besides, a fuzzy logic system is used to approximate the uncertain part of the system, so that the control gain of the SMC can be chosen more tightly and the chattering effect can be alleviated.
This paper is organized as follows. In Section 2, the original overload system of the missile is transformed into a new one by defining a sliding surface as a new output. In Section 3, a fuzzy sliding-mode controller is designed and its stability is proved via the Lyapunov stability theorem. In Section 4, a simulation example is provided to illustrate the performance of the proposed control scheme. Concluding remarks are finally made in Section 5.
2 Description of Missile Overload System

The conventional acceleration model of the pitch channel of a missile [7] comprises three equations, but researchers often also treat the motor as a first-order system; the control to be designed is then not the fin deflection but the control voltage, and the total set of equations is

ω̇_z = a24 α + a22 ω_z + a25 δ_z   (1a)
α̇ = ω_z − a34 α − a35 δ_z   (1b)
δ̇_z = −w δ_z + w u_δ   (1c)
n_y = (v/g) a34 α + (v/g) a35 δ_z   (1d)

where ω̇_z in (1a) is the angular acceleration, α in (1b) is the attack angle, and n_y in (1d) is the overload. Equations (1a), (1b), and (1d) formulate the acceleration model of the pitch channel of the missile; (1c) is the model of the motor. In the Appendix a conclusion is given that the derivative of the overload is approximately proportional to the angular acceleration,
which can be expressed as
ṅ_y ≈ (V/g) ω̇_z.

In the control scheme proposed in this paper, the derivative of the overload only plays the role of damping, so whether its value is precise is not a matter of cardinal significance. When the angular acceleration is used to replace the derivative of the overload, the sliding surface chosen as (2) becomes meaningful:

S = k1 (n_y − n_yd) + k2 (ω̇_z − ω̇_zd)   (2)
where k1 is an additional tunable parameter which does not appear in the traditional form of the sliding surface. Taking the sliding surface as the new output, after some mathematical manipulations it can be shown that

V̇ = −A1 S² − A2 |S| ≤ 0,

and thus the asymptotic stability of the system can be guaranteed. When there exists uncertainty, (3) can be changed into the following form:
f = (m31 + Δm31) α + (m32 + Δm32) ω_z + (m33 + Δm33) S + D3 + ΔD3   (17)
g = b + Δb   (18)

and then (16) can be rewritten as

Ṡ = f + g u_δ.   (19)
Because of the existence of uncertainty, a fuzzy logic system is introduced to approximate it; what is tuned is not the weights but the centers of the membership functions of the output. The bell membership functions are expressed in the form

μ_ij(u_j) = 1 / (1 + |(u_j − c_ij)/a_ij|^{2 b_ij})   (20)

where c_ij is the center of the membership function, a_ij determines the width of the bell function, and b_ij characterizes the slope. The following rule set is adopted:

Rule i:  IF u_1 is U_1 AND ... AND u_m is U_m THEN F = ξ_i,   i = 1, 2, ..., R,   (21)

where u and F are the input and output of the FLC, and U_i and ξ_i are input and output linguistic variables. For the FLS we choose the product-operation rule of fuzzy implication and the center-average defuzzifier, as in (22).
F_TOTAL = Σ_{i=1}^{R} ξ_i ∏_{j=1}^{m} μ_ij(u_j) / Σ_{i=1}^{R} ∏_{j=1}^{m} μ_ij(u_j) = Σ_{i=1}^{R} ξ_i ω_ni = ξ^T ω_n   (22)

where

ω_ni = ∏_{j=1}^{m} μ_ij(u_j) / Σ_{i=1}^{R} ∏_{j=1}^{m} μ_ij(u_j),   (23)

R and m are the numbers of rules and inputs, respectively, and ω_ni is the ith element of the vector ω_n = (ω_n1, ω_n2, ..., ω_nR)^T; for a SISO FLC, m = 1.
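The bell membership function (20) and the center-average defuzzified output (22)-(23) can be sketched for a single-input (m = 1) rule base as follows; the rule parameters in the example are made up for illustration:

```python
# Bell membership (20) and center-average defuzzifier (22)-(23), m = 1.
def bell(u, a, b, c):
    """Generalized bell membership: center c, width a, slope b."""
    return 1.0 / (1.0 + abs((u - c) / a) ** (2 * b))

def fls_output(u, params, xi):
    """params: list of (a, b, c) per rule; xi: list of output centers xi_i."""
    mu = [bell(u, a, b, c) for (a, b, c) in params]
    total = sum(mu)
    omega = [m / total for m in mu]                   # (23), normalized firing
    return sum(x_i * w for x_i, w in zip(xi, omega))  # (22), weighted average
```

For two symmetric rules with centers at u = ±1 and output centers ξ = ∓1, the output is zero at u = 0 and moves toward the dominant rule's center as u moves, which matches the interpolating behavior expected of a center-average FLS.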
Let

f = f̄ + Δf   (24)
g = ḡ + Δg   (25)

where f̄ = m31 α + m32 ω_z + m33 S + D3, Δf = Δm31 α + Δm32 ω_z + Δm33 S + ΔD3, ḡ = b, and Δg = Δb; then (16) can be rewritten as

Ṡ = f̄ + Δf + (ḡ + Δg) u_δ.   (26)

A fuzzy logic system is adopted to approximate Δf and Δg, that is,

Δf̂ = ξ̂_f^T ω_nf   (27)
Δĝ = ξ̂_g^T ω_ng   (28)

Firstly, define the optimal parameters ξ*_f, ξ*_g and the minimal approximation error

m_e = (Δf − Δf*) − [(Δg − Δg*)/(ḡ + Δĝ)] u_δ.   (29)

Let ξ̃_f = ξ*_f − ξ̂_f and ξ̃_g = ξ*_g − ξ̂_g.
Then the following theorem can be obtained.

Theorem 1: If the control law u_δ = u_c + u_a and the adaptive laws are adopted as shown in (30)-(33),

ξ̂̇_f = l1 ω_nf S   (30)
ξ̂̇_g = −[1/(ḡ + Δĝ)] l2 ω_ng u_δ S   (31)
u_c = −[1/(ḡ + Δĝ)] (f̄ + Δf̂ + A1 S)   (32)
u_a = [1/(ḡ + Δĝ)] (−A2 sign(S))   (33)

where l1, l2 and A1, A2 are positive real numbers and A2 > |m_e|, then the system (16) is asymptotically stable.

Proof: Choose the candidate Lyapunov function

V1 = (1/2) S² + (1/(2 l1)) ξ̃_f^T ξ̃_f + (1/(2 l2)) ξ̃_g^T ξ̃_g.   (34)
derivate it, we will have
1 ~ T ~ 1 ~ T ~ V1 = SS + ξ f ξ f + ξ g ξ g l1 l2 = S{ f + ∆f + ( g + ∆g)[− −
1 ( f + ∆fˆ + A1S ) g + ∆gˆ
1 T~ T~ A2 sign(S )]} − Sωnf ξ f − Sωng ξ g uδ ˆ g + ∆g g + ∆g )uc − A1S − A2 sign(S ) = S{ f + ∆f − f − ∆fˆ + (1 − g + ∆gˆ
g + ∆g T~ T~ ) A2 sign(S )]} − Sωnf ξ f + Sωng ξ g uδ g + ∆gˆ ∆g~ + (∆g − ∆g * ) ~ uδ − A1S = S (∆f + (∆f − ∆f * ) − g + ∆gˆ T~ T~ − A2 sign(S )) − Sωnf ξ f + Sωng ξ g uδ
+ (1 −
~T = S (ξ f ωnf −
1 ~T 1 ~ ~ ξ g ωnguδ − ωnf Tξ f + ωngTξ g uδ g + ∆gˆ g + ∆gˆ
− A1S − A2 sign(S ) + (∆f − ∆f * ) −
(∆g − ∆g * ) uδ ) g + ∆gˆ
= S (− A1S − A2 sign(S ) + me ) ≤ − A1S 2 − ( A2 − me ) S ≤0 thus the asymptotical stability of the system can be guaranteed.
(35)
4 Simulation
Take the pitch channel overload model of a missile as an example, and suppose the model of the motor is a first-order system -17/(s + 17). To verify the correctness and effectiveness of the proposed control scheme, simulations were made for the nominal system and for the system with parameter perturbation, with a square-wave reference input. The simulation results are shown in Figures 1-6, where Figures 1-5 are for the nominal system: the curve of overload is given in Fig. 1, the curve of α in Fig. 2, the curve of ω_z in Fig. 3, the curve of control voltage in Fig. 4, and the curve of the sliding surface S in Fig. 5. Curves of the overload with ±20% parameter perturbation are shown in Fig. 6 (solid line for +20%, dashed line for -20%).

Fig. 1. Response curve of overload
Fig. 2. Response curve of α
Fig. 3. Response curve of ω_z
Fig. 4. Curve of control voltage
Fig. 5. Response curve of S
Fig. 6. Response curves of overload with ±20% parameter perturbation
5 Conclusion
In this paper, an adaptive fuzzy logic system is incorporated into the VSC system to improve the performance of the control system. A sliding surface with an additional tunable parameter is defined as a new output based on the idea of output redefinition; as a result, the missile overload system with non-minimum-phase characteristics can be transformed into a minimum-phase system by tuning the parameters of the sliding surface, and a sliding-mode controller can be designed. Because the parameters are uncertain, a fuzzy logic system is used to approximate the uncertainty, so that the chattering effects can be alleviated. Finally, simulation results have been given to show the effectiveness of the proposed control scheme.
Appendix
To make the sliding surface S = k_1(n_y - n_{yd}) + k_2(\omega_z - \omega_{zd}) converge, \dot{\omega}_z should be proportional to \dot{n}_y, which is proved in the following conclusion.
Conclusion 1: The angular acceleration is approximately proportional to the derivative of the overload of the missile.
Proof: Take the pitch channel model as an example; the following relations hold:

\dot{\alpha} = \omega_z - a_{34}\alpha - a_{35}\delta_z   (A1)
n_y = \frac{V}{g}(a_{34}\alpha + a_{35}\delta_z)   (A2)

Substituting (A1) into (A2), we obtain

n_y = \frac{V}{g}(\omega_z - \dot{\alpha})   (A3)

Differentiating (A3), we have

\dot{n}_y = \frac{V}{g}(\dot{\omega}_z - \ddot{\alpha})   (A4)

Because \ddot{\alpha} is not easy to obtain and its value is small compared with \dot{\omega}_z, relation (A5) can be obtained:

\dot{n}_y \approx \frac{V}{g}\dot{\omega}_z   (A5)
An Improved Genetic & Ant Colony Optimization Algorithm and Its Applications
Tiaoping Fu 1,2, Yushu Liu 1, Jiguo Zeng 1, and Jianhua Chen 2
1 School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
[email protected]
2 Naval Arms Command Academy, Guangzhou 510430, China
Abstract. Improving the efficiency of weapon-target assignment (WTA) for a warship formation is a crucial research problem. The WTA problem is NP-hard, and heuristic intelligent algorithms often get stuck in local optima. A novel genetic & ant colony optimization (GACO) algorithm is proposed, based on the combination of a genetic algorithm and an ant colony algorithm. The genetic algorithm phase adopts crowding replacement and a changeable mutation operator to create multiple populations. Thanks to the good initial pheromone distribution this produces, the ant colony optimization phase can avoid getting into local optima. A further study of how to apply the algorithm to WTA is then made, and experiments are carried out. The results demonstrate that GACO is more efficient than other classical algorithms, and its advantage grows with the size of the WTA problem. The proposed algorithm is also viable for other NP-hard problems.
For the WTA problem, many methods have been brought forward, such as neural networks, genetic algorithms with greedy eugenics, and expert systems. However, these algorithms all have their advantages and shortcomings: they cannot attend to both speed and solution quality, so they are hard to fit to the real-time and precise firepower-distribution demands of warship-formation anti-air missiles attacking numerous targets. Facing this urgent requirement, and to avoid the shortcomings of present algorithms, an improved genetic & ant colony optimization (GACO) algorithm based on the combination of a genetic algorithm and an ant colony algorithm is proposed.
2 Air Defense WTA Problem for Warship Formation
2.1 Analysis of the WTA Problem
The weapon-target assignment problem is an important research topic that makes all weapons in a region cooperate to protect the own-force assets. Its mission is to exert the combined advantage of multiple weapons as a whole, finding an optimal assignment of weapons to targets for a scenario with the objective of minimizing the expected threat to the own warship formation. We know the characteristics of the assaulting targets and the anti-air missile units, and n ≥ m (when there are more assaulting targets than anti-air missile units, i.e. m > n, we can choose the n most dangerous assaulting targets based on their threat parameters and deal with the other targets in other groups). The aerial threat toward a warship formation is fearsome. In actual operation, the survival of the formation is much more important than the cost of the operation; in other words, we will not consider using cheaper weapon resources unless the safety of the warship formation can be confirmed. So we propose stressing the protection of the warship formation's operational capability when assigning weapons to targets. The WTA problem considered is then to minimize the following colligation threat-parameter function:

\min C = \sum_{j=1}^{m} v_j \Big[ \prod_{i=1}^{n} (1 - k_{ij})^{x_{ij}} \Big]   (1)
This pays less attention to the cost of operation, but takes the threat parameter of every target into account during the WTA course. The direct results are reducing the penetration probability of the most dangerous targets and achieving the basic intention of the anti-air operation. On the other hand, this design greatly improves the algorithm's efficiency and meets the real-time requirement of decision-making. x_{ij} is the decision variable: when weapon i attacks target j, x_{ij} = 1; otherwise x_{ij} = 0. v_j is the threat parameter of target j; it is related to the distance, bearing, and speed of the coming target and to the moving speed and direction of the warship formation, etc. k_{ij} is the damage probability of weapon i against target j; it can likewise be obtained from the distance, bearing, and speed of the coming target and the weapon capability of the warship formation, and can be provided by the C3I system of the warship formation platform.
s.t.

\sum_{i=1}^{n} x_{ij} \le g_j, \quad (j = 1, 2, \dots, m)   (2)
\sum_{j=1}^{m} x_{ij} \le 1, \quad (i = 1, 2, \dots, n)   (3)
x_{ij} \in \{0, 1\}, \quad (i = 1, 2, \dots, n;\ j = 1, 2, \dots, m)   (4)

Formulation (2) represents that the number of weapons assigned to target j cannot exceed g_j at any time. Formulation (3) represents that weapon i can only attack one target at a time.
2.2 Computation of the Target Threat Parameter
The threat-judgment problem is related to many factors, which are themselves uncertain; fuzzy theory is a good tool for this kind of uncertain problem. The threat parameter is related to the distance, bearing, and speed of the coming target and to the moving speed and direction of the warship formation, etc., expressed by the variable γ. As shown in Fig. 1, we suppose the warship formation lies at point W and the target lies at point T.

Fig. 1. Sketch of the situation between the target and the ship formation

We suppose there are targets T_i with attribute parameters d_i, θ_i, V_{Ri} (i = 1, 2, ..., n). For the n targets, the attribute parameters become d_i', θ_i', V_i' after non-dimensional treatment. Thus we can get the threat variables of each parameter, γ_{id}, γ_{iθ}, γ_{iV} [2], and then the colligation threat value of target i toward the warship formation. The threat value v_i obtained from (F · γ)^{1/2} is then put into the goal function of the warship formation anti-air WTA optimization.
3 The Improved Genetic & Ant Colony Optimization Algorithm for the WTA Problem
3.1 Design of the Improved Genetic Algorithm in GACO
Some research has been done on the combination of genetic algorithms and ant colony optimization, but it mainly focuses on combining a simple genetic algorithm with ant colony optimization, which has some shortcomings. For example, the population diversity of a simple genetic algorithm is poor; it easily falls into local optima while the number of evolution generations is still small. When ant colony optimization then continues the later-stage search from these locally optimal solutions, the global convergence of the ant colony optimization is hard to ensure, and simply increasing the number of generations of the later-stage algorithm greatly increases the runtime, so the advantage of the combined algorithm disappears. Crowding replacement is a good way to maintain population diversity: it prevents individuals of high fitness from overpopulating by restricting each offspring to replace only its nearest parent. Moreover, individuals that are farther apart differ more in their properties, so with crowding replacement the algorithm keeps individuals that are far apart and more diverse. The implementation flow of the genetic algorithm with crowding replacement in GACO is as follows:
Step 1: Initialization. First, the initial population and the fitness function are set reasonably based on the character of the WTA problem. Real-number encoding is closer to the problem and has the following strong points: it reduces computing complexity and improves the efficiency of the genetic algorithm; it is convenient for hybridization with other classical optimization algorithms; it is good for designing problem-specific genetic operators; and it is convenient for handling complex constraint conditions.
So we adopted real-number encoding for the chromosome.
The chromosome string of the ith individual x_t^i of the tth generation is a_{i1}^{k_1} a_{i2}^{k_2} \cdots a_{in}^{k_n}, where n is the length of the chromosome string, corresponding to the number of weapon units; the gene bit a_{ij}^{k_j} holds the serial number of the target allocated to the jth weapon:

a_{ij}^{k_j} = \begin{cases} 0, & \text{no target unit is allocated to the } j\text{th weapon} \\ k_j, & \text{the } k_j\text{th target unit is allocated to the } j\text{th weapon}, \ k_j \in \{0, 1, 2, \dots, m\} \end{cases}   (5)
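A sketch of the encoding in (5): each gene holds the target (0 = none) assigned to one weapon. The sampling order and the capacity bookkeeping are illustrative assumptions, not the paper's procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_chromosome(n_weapons, m_targets, g):
    """Random integer-valued chromosome per (5).

    Gene i holds the target (1..m, 0 = none) assigned to weapon i,
    respecting the per-target capacity g_j of constraint (2).
    """
    chrom = np.zeros(n_weapons, dtype=int)
    load = np.zeros(m_targets + 1, dtype=int)      # load[j] = weapons on target j
    for i in rng.permutation(n_weapons):
        # candidates: "unassigned" plus every target with spare capacity
        options = [0] + [j for j in range(1, m_targets + 1) if load[j] < g[j - 1]]
        chrom[i] = rng.choice(options)
        load[chrom[i]] += 1
    return chrom
```

With unit capacities, each target appears at most once in the chromosome, so constraint (2) holds by construction.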
Choose s individuals in the feasible region of the solution space (s is the scale of the colony, representing s allocation plans) to build up the initial colony X_t = \{x_t^i \mid i = 1, 2, \dots, s\}.
Step 2: Tournament selection operators. During the later stage of the genetic algorithm, the fitness values of the individuals in the colony are approximately equal, so choosing operators simply in proportion to fitness gives poor selection pressure. Tournament selection automatically adjusts the fitness-value proportion, its selection intensity has nothing to do with the relative fitness values among individuals, and it fits especially well with crowding replacement. So we adopt tournament selection [3] to choose the individuals that multiply. The tournament scale Size has the numeric range [2, N] and is related to the selection intensity and the loss of diversity as

SelIntTour(Size) = \sqrt{2\big(\ln(Size) - \ln\sqrt{4.14\,\ln(Size)}\big)}   (6)
LossDivTour(Size) = Size^{-\frac{1}{Size-1}} - Size^{-\frac{Size}{Size-1}}   (7)

Size individuals are chosen from x_t^1, x_t^2, \dots, x_t^s based on their fitness \delta(X), and the individuals of highest fitness are saved to form the multiply set x_t'^1, x_t'^2, \dots, x_t'^s.
Step 3: Crossover operators. Choose two individuals x_t'^i, x_t'^j \in \{x_t'^1, x_t'^2, \dots, x_t'^s\} and delete them from the set. Take x_t'^i, x_t'^j as parents for discrete recombination; their offspring are x_t''^k, x_t''^l. Choose either x_t''^k, x_t''^l or x_t'^i, x_t'^j to add into x_t''^1, x_t''^2, \dots, x_t''^s. Repeat the process s/2 times.
Step 4: Time-varying mutation operators. If n bits of all chromosomes in the whole colony take the same value, the space reachable purely through crossover is only (1/2)^n of the whole search space, which decreases the search space greatly. Thus mutation operators must be adopted to counter this premature phenomenon. Many experimental comparisons have affirmed that mutation is sometimes more important than crossover. Essentially, GA is a dynamic and adaptive process, and fixing the parameters departs from this evolutionary spirit, so we modify the strategy parameters during the GA computing course [4], changing them with the genetic generation according to a fixed rule. In the initial stage of the algorithm a larger mutation value is adopted, avoiding prematurity and maintaining colony diversity; as the generations increase, the mutation value drops continuously, making the computation converge to the global optimum. The schedule is given by
p_m = 0.3 - 0.2 \times t / G   (8)

where t is the genetic generation and G is the total number of generations. Based on the mutation probability, mutation disturbs x_t''^1, x_t''^2, \dots, x_t''^s to form the next-generation colony X_{t+1} = \{x_{t+1}^i \mid i = 1, 2, \dots, s\}.
Step 5: Individual crowding replacement. For x_{t+1}^i, 1 \le i \le s, suppose x_t^j, 1 \le j \le s, is the nearest parent individual, i.e. the Euclidean distance d(x_{t+1}^i, x_t^j) is shortest, where

d(x_{t+1}^i, x_t^j) = \sqrt{(a_{i1}^{k_1} - a_{j1}^{k_1})^2 + (a_{i2}^{k_2} - a_{j2}^{k_2})^2 + \cdots + (a_{in}^{k_n} - a_{jn}^{k_n})^2}   (9)

If \delta(x_{t+1}^i) > \delta(x_t^j), replace x_t^j by x_{t+1}^i; otherwise, reserve x_t^j.
Step 6: If the current generation t reaches the total iteration count t_max, break and save the computing result; otherwise, t++ and turn to Step 2.
Step 7: Put the final colony into the objective function to get the r Pareto-optimal solutions; decode these r chromosomes to get r optimal assignments of weapon units to targets, and keep the r assignments as the input of the later-stage algorithm.
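The GA operators of Steps 2, 4, and 5 — tournament selection, the time-varying mutation probability (8), and crowding replacement with the distance (9) — can be sketched as follows; the population representation and parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def tournament(pop, fitness, size=2):
    """Step 2: best of `size` individuals drawn at random."""
    idx = rng.choice(len(pop), size, replace=False)
    return pop[idx[np.argmax(fitness[idx])]].copy()

def mutation_prob(t, G):
    """Eq. (8): p_m = 0.3 - 0.2 * t / G, decaying over the generations."""
    return 0.3 - 0.2 * t / G

def crowding_replace(child, child_fit, pop, fit):
    """Step 5: the child replaces its nearest parent only if it is fitter."""
    d = np.linalg.norm(pop - child, axis=1)        # Euclidean distance (9)
    j = int(np.argmin(d))
    if child_fit > fit[j]:
        pop[j], fit[j] = child, child_fit
```

The mutation probability falls from 0.3 at generation 0 to 0.1 at the final generation, matching the schedule in (8).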
3.2 Design and Linking of the Ant Colony Optimization Algorithm in GACO
Ant colony optimization was originally used to solve the TSP [5], searching for the shortest route among all cities through the ants' random search under the inspiration of pheromone. So that ACO can be applied to the WTA problem, we express the WTA problem as a bipartite graph G = (V, U, E). V is the set of n points representing the n weapon units, corresponding to the nodes on one side of the bipartite graph. U is the set of m points representing the m targets, corresponding to the nodes on the other side. E = \{e_{ij} \mid i = 1, 2, \dots, n;\ j = 1, 2, \dots, m\} is the set of borders joining target nodes with weapon-unit nodes. If weapon unit i is assigned to target j, there is a border e_{ij} linking weapon unit i with target j; otherwise there is no border. \tau_{ij} is the trace (pheromone) on border e_{ij}; if there is no border, \tau_{ij} = 0. A feasible route composed of borders in the bipartite graph corresponds to an assignment plan between the target set and the weapon-unit set, so seeking the optimal solution of the WTA problem amounts to searching for the optimal route in the bipartite graph. The ant colony optimization in GACO is described as:
Step 1: Initialization. (1) Encode the r optimal assignments of the former (GA) phase to form the initial r routes of the ant colony optimization. (2) The initial pheromone distribution between the target set and the weapon-unit set is given by
\tau_{ij}(t_0) = \tau_0 + \Delta\tau_{ij}; \quad i = 1, 2, \dots, n;\ j = 1, 2, \dots, m   (10)

where \tau_{ij}(t_0) represents the trace of border e_{ij} at the initial time (t_0 = 0) and \tau_0 is a pheromone constant, a small positive real number. \Delta\tau_{ij} is given by

\Delta\tau_{ij} = \sum_{k=1}^{r} \Delta\tau_{ij}^k   (11)

where \Delta\tau_{ij}^k represents the trace contributed to border e_{ij} by route k, and r is the number of optimal assignments from the GA, corresponding to the initial r routes:

\Delta\tau_{ij}^k = \begin{cases} Q / S_k, & \text{there is a border between target } j \text{ and weapon } i \text{ in the } k\text{th assignment plan} \\ 0, & \text{otherwise} \end{cases}   (12)
where Q is an adjustment parameter and S_k is the objective-function value of the kth assignment plan. (3) Make every ant correspond to exactly one weapon node or target node and put that node into the weapon Tabu Table or target Tabu Table.
Step 2: Node choosing. Any ant i (corresponding to weapon node i) chooses target node j based on

j = \begin{cases} \arg\max_{j \in allow_i} [\tau_{ij}(t)(\eta_{ij})^{\beta}], & q \le q_0 \\ J, & \text{otherwise} \end{cases}   (13)

where q_0 is a threshold enacted in advance (q_0 = 0.9), q is a random number uniformly distributed in (0, 1), allow_i is the set of all targets not yet assigned to ant i, and \tau_{ij}(t) is the trace between weapon i and target j at time t. For the WTA problem, the mathematical model of \eta_{ij} is built according to the chosen optimization rule. For example, to decrease the threat value toward the warship formation to the maximum extent, it should be the product of the damage probability k_{ij} and the threat value v_j of target j:

\eta_{ij} = k_{ij} \times v_j   (14)

J is the serial number of some target in the allow_i set, decided by roulette based on the probability P_{ij}(t):

P_{ij}(t) = \begin{cases} \tau_{ij}(t)(\eta_{ij})^{\beta} \Big/ \sum_{j \in allow_i} \tau_{ij}(t)(\eta_{ij})^{\beta}, & j \in allow_i \\ 0, & \text{otherwise} \end{cases}   (15)
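The pseudo-random-proportional rule (13)-(15) can be sketched as below; the value of β and the tie-breaking details are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def choose_target(i, tau, eta, allow, beta=2.0, q0=0.9):
    """State-transition rule (13)-(15) for ant/weapon i.

    tau, eta: (n, m) pheromone and heuristic matrices; allow: free targets.
    """
    allow = list(allow)
    weights = tau[i, allow] * eta[i, allow] ** beta
    if rng.random() <= q0:                    # exploitation: argmax branch of (13)
        return allow[int(np.argmax(weights))]
    p = weights / weights.sum()               # exploration: roulette of (15)
    return int(rng.choice(allow, p=p))
```

Setting q0 = 1 forces pure exploitation, so the target with the largest τ·η^β product is always picked.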
Step 3: Local pheromone updating. After every ant chooses its target node, "local pheromone update" is used to update the trace of border e_{ij}:

\tau_{ij}(t+1) = (1 - \psi)\tau_{ij}(t) + \psi\,\Delta\tau_{ij}   (16)

where 0 < \psi \le 1 is a constant representing the volatilization probability, and

\Delta\tau_{ij} = Q / cb_j^k   (17)

where cb_j^k is the total benefit of the current ant k from the first node up to now. One effect of the local pheromone update is that the ants do not all converge to a single route. Experiments show this helps find more latent optimal solutions and improves search quality; otherwise, all the ants would probably be trapped in an extraordinarily small search subspace.
Step 4: Check of node-assignment completion. (1) After all ants have chosen their target nodes and the pheromone has been locally updated, set the Recorder Stack of the ants: if the number of weapons assigned to a target node has reached the maximum limitation, the target node is put into the ant's Recorder Stack. The ant then moves to the next free weapon node that has not been assigned any target; turn to Step 2. (2) If all weapon nodes have been traversed, turn to Step 5.
Step 5: Whole pheromone update. After all ants have traversed all target nodes, m solutions have been built up. These m solutions are put into the objective function to get the local optimal solutions. The best solution is preserved, and "whole pheromone update" is used to update the traces of the borders of the best solution. The update rule is given by
\tau_{ij}(t+1) = (1 - \rho)\tau_{ij}(t) + \rho\,\Delta\tau_{ij}(t)   (18)

where 0 < \rho \le 1 is the parameter controlling the attenuation of pheromone, and

\Delta\tau_{ij}(t) = \begin{cases} 1 / C_{elitist}, & \text{if } ij \text{ is a border of the best assignment} \\ 0, & \text{otherwise} \end{cases}   (19)
Step 6: Check of evolution completion. If the current generation t reaches the total iteration count T_max, the loop terminates and the optimization result is obtained; otherwise, turn to Step 2.
3.3 Flow of the Warship Formation WTA Problem Based on the GACO Algorithm
The first phase of the computation adopts the genetic algorithm, making full use of GA's rapidity, randomness and global convergence. Its function is to produce the initial pheromone distribution of the given problem for the next phase. In the second phase
of the algorithm, ant colony optimization is adopted. With an initial pheromone distribution available, the ant colony optimization converges on the optimal path through pheromone accumulation and renewal and through ACO's parallel processing and global searching ability. The overall frame is given in Fig. 2.

Fig. 2. Flow of warship formation WTA based on GACO
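The two-phase flow of Fig. 2 reduces to a short driver once the GA and ACO phases of Sections 3.1-3.2 are available as routines. The callable interface here is a hypothetical packaging, not the paper's API:

```python
def gaco(ga_phase, aco_phase, generations_ga=20, iters_aco=30):
    """Two-phase GACO flow of Fig. 2: the GA produces r elite assignments
    that seed the ACO pheromone; the ACO then refines them to the final
    solution. `ga_phase` and `aco_phase` stand for the routines sketched
    in Sections 3.1 and 3.2."""
    elite = ga_phase(generations_ga)          # Steps 1-7 of Section 3.1
    return aco_phase(elite, iters_aco)        # Steps 1-6 of Section 3.2
```

The point of the split is that neither phase needs to know the other's internals; they communicate only through the elite assignments that seed the pheromone field.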
4 Experimental Results and Analysis
We make experiments on the air-defense missile-target assignment problem of a warship formation to test the performance of GACO. We suppose the formation has eight missile weapon units facing eight targets at the same time. The threat values of these targets to the formation and the damage probability of every missile against these targets are different. The damage probability k_{ij} can be calculated from the distance, bearing and speed of the coming target and from the missile performance of the warship formation, and is provided by the C3I system of the warship formation. The genetic generations of the GA in GACO are 20; the crossover probability is p_cross = 0.6; the initial mutation probability is p_mutation = 0.3. The initial pheromone of every route of the ACO is 60; the trace-update parameters are ρ = 0.2 and ψ = 0.2, and the iteration count is 30. For the problem of assigning 8 missiles to 8 targets, we adopt the GACO and GAGE [1] algorithms, which can meet the real-time demand in the air-defense missile-target assignment problem of a warship
Fig. 3. The fitness curves of GACO and GAGE
formation. All experiments are performed on a 2.8 GHz machine with 512 MB of main memory; programs are written in Visual C++ 6.0 under Windows. The comparison of the two algorithms is shown in Fig. 3. On the whole, the fitness values of GACO and GAGE drop continuously with the generations. In the early stage, the fitness curve of the GA in GACO descends less steeply than GAGE's; the cause may be that GAGE incorporates greedy eugenics. In the later GA stage of GACO, the descent slows continuously; the cause may be that the GA makes no use of feedback information in the system, so with more generations it performs redundant iterations and searches for precise solutions inefficiently. However, because GACO adopts crowding replacement and time-varying mutation operators and makes full use of GA's randomness and global convergence, it maintains population diversity well and produces a good initial pheromone distribution for the ACO. During the ACO phase of GACO, the fitness value drops greatly, and the optimization value finally stabilizes at a number lower than GAGE's. The cause is that the ACO starts from a good initial pheromone distribution and exploits parallel processing and positive feedback, finding more precise solutions and avoiding local optima. So GACO has better optimization and speed performance than GAGE on the air-defense missile-target assignment problem. To test the performance of GACO on large-scale assignment problems, we compare GACO with other intelligent optimization algorithms: GA, GAGE, the Simple Genetic & Ant Algorithm [6] (GAAA), and the Niching Genetic and Ant Algorithm [7] (NGAA). The results are shown in Table 1. The strategy parameters of the GA and the ant colony algorithm are the same as in GACO to ensure fairness. The number outside the brackets is the optimization value of the objective function; the number in brackets is the running time of each algorithm. We can see from Table 1 that GACO clearly outperforms the other four algorithms in both effectiveness and efficiency, and its advantage grows with the size of the assignment problem.
Table 1. Comparison of optimization performance and speed performance among algorithms
5 Conclusions
The improved genetic & ant colony optimization combines the advantages of genetic and ant colony optimization, overcomes their shortcomings, and achieves good results on the WTA problem of warship formations. We ran experiments with the algorithm and compared the results with other algorithms. The results demonstrate that GACO has good search efficiency and speed, is a preferable optimization algorithm, and can meet the real-time and precision demands of the WTA problem. GACO is also viable for other NP-hard problems, and its improvement grows with the problem scale.
Acknowledgments. This work was partially supported by the National Defense Science Foundation of China (Grant No. 10504033). We would like to thank Dr. Yunfei Chen for his helpful and constructive comments.
References
1. Lee, Z. J.: Efficiently Solving General Weapon-Target Assignment Problem by Genetic Algorithms with Greedy Eugenics. IEEE Transactions on Systems, Man, and Cybernetics, 33 (1) (2003) 113-121
2. Hu, S., Z., Y.: Determining the Threatening Degrees of Objects Using Fuzzy Operations. Acta Armamentarii, 20 (1) (1999) 43-46 (in Chinese)
3. Harik, G. R.: Finding Multimodal Solutions Using Restricted Tournament Selection. In: Eshelman, L. J. (ed.) Proceedings of the Sixth International Conference on Genetic Algorithms, Morgan Kaufmann (1995) 24-31
4. Holland, J.: Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI; MIT Press, Cambridge, MA (1992)
5. Dorigo, M., Bonabeau, E., Theraulaz, G.: Ant Algorithms and Stigmergy. Future Generation Computer Systems, 16 (8) (2000) 851-871
6. Kumar, G. M., Haq, A. N.: Hybrid Genetic and Ant Colony Algorithms for Solving Aggregate Production Plan. Journal of Advanced Manufacturing Systems, 4 (1) (2005) 103-111
7. Chen, Y., Liu, Y., Fan, J., Zhao, J.: A Niching Genetic and Ant Algorithm for Generalized Assignment Problem. Transactions of Beijing Institute of Technology, 25 (6) (2005) 490-494 (in Chinese)
Application of Adaptive Disturbance Canceling to Attitude Control of Flexible Satellite
Ya-qiu Liu, Jun Cao, and Wen-long Song
Northeast Forestry University, Harbin 150040, China
[email protected], [email protected], [email protected]

Abstract. An adaptive inverse disturbance-canceling method is proposed for the "modal vibration disturbance" of an orbiting flexible satellite during normal pointing, which is difficult to cancel by the PID method since its modal frequencies are low and dense and its damping is small. Compared with the conventional feedback disturbance-rejection method, adaptive inverse disturbance canceling is performed in an inner loop and is independent of the dynamic-response control loop. Since the adaptive inverse disturbance canceling here is based on PID control of the dynamic response, the control structure is designed as follows. First, a conventional PID controller is designed for the dynamical control system of the rigid satellite. Second, the modal vibration disturbance is controlled by the adaptive inverse disturbance-canceling method. The key of this approach is the estimation of the modal vibration disturbance, the difference between the disturbed output of the plant and the disturbance-free output of the copy model, which is then input to the disturbance-canceling filter, a least-squares inverse of the rigid satellite model. Simulation results demonstrate the effectiveness of the controller design strategy for attitude control and modal-vibration-disturbance suppression.
extending the disturbance canceling method used for linear plants to encompass nonlinear plants as well. Adaptive inverse control has advantages in disturbance canceling [4]. Based on the inverse idea, adaptive inverse disturbance canceling is performed in an inner loop through a separate adaptive inverse filter, independent of the dynamic-response control. By handling the problem in this way, performance can be improved as much as possible: no compromise is required in the design process between good dynamic response and good disturbance control [1,5]. Thus the adaptive inverse canceling method for normal-pointing attitude control and vibration suppression of an orbiting spacecraft with a flexible appendage is proposed by defining the correlative vibration as a "modal vibration disturbance". The key to this method is to regard the effect produced by the modal vibration as a kind of correlated disturbance on top of the rigid-body control, with the disturbance canceling performed separately in an inner loop. In this paper, the rigid spacecraft model and the adaptive inverse disturbance canceller are modeled using NARX (Nonlinear AutoRegressive with eXogenous inputs) models [5,6,7], and an improved RTRL-LMBP algorithm is designed to improve the convergence speed and obtain a better disturbance-canceling control effect.
2 Dynamics Description
The slewing motion of a rigid hub with a flexible appendage attached is presented graphically in Fig. 1. Only the rotational motion, without any translation of the center of mass of the whole structure, is considered in this paper. Define OXY and oxy as the inertial frame and the frame fixed on the hub, respectively. The attitude angle θ denotes the relative motion between these two frames, and w(x, t) denotes the flexible deformation at point x with respect to the oxy frame. It is assumed that the control torque is applied to the rigid hub only. Using the Lagrangian method, the governing equations of motion for the spacecraft model are given by [8]

I \ddot{\theta} - \sum_n F_n \ddot{q}_n = T, \qquad \ddot{q}_n + 2\varsigma_n p_n \dot{q}_n + p_n^2 q_n = -F_n \ddot{\theta}   (1)

where I is the moment of inertia of the center body; T is the control torque; F_n are the coupling coefficients; and q_n, ς_n and p_n are the modal variable, damping ratio and constrained modal frequency of the nth mode of the flexible appendage, respectively.
Fig. 1. Spacecraft model with single-axis rotation
For the later analysis, model (1) can be transformed into the following form:

$$ I s^2 \theta(s) - \sum_n F_n s^2 q_n(s) = T(s), \qquad q_n(s) = \frac{-F_n s^2}{s^2 + 2\varsigma_n p_n s + p_n^2}\,\theta(s) \quad (2) $$
where s is the Laplace variable. The transfer function from the control torque T to the attitude angle θ can be written as:

$$ \theta(s) = \frac{1}{I s^2}\left(1 + \sum_n \frac{K_n s^2}{s^2 + 2\rho_n \Lambda_n s + \Lambda_n^2}\right) T(s) \quad (3) $$
where $k_n = F_n^2 / I$, $K_n = k_n/(1-k_n)$, $\rho_n = \varsigma_n (1-k_n)^{-1/2}$, and $\Lambda_n = p_n (1-k_n)^{-1/2}$. The block diagram of the transfer function is shown in Fig. 2.
Fig. 2. Block diagram for flexible spacecraft with single-axis rotation
3 Dynamic Neural Networks

A layered network is a feedforward structure that computes a static nonlinear function. Dynamics is introduced via tapped delay lines at the input to the network, resulting in a dynamic neural network called an NARX filter. It is general enough to approximate any nonlinear dynamical system; both its structure and its adaptive algorithm are more complicated than those of a static network, but its ability to describe nonlinear dynamic systems is much stronger.

3.1 Adaptive Dynamic System Identification
NARX models have implicit feedback of delayed versions of their output to the input of the model. This feedback is assumed in all block diagrams in Fig. 3. The purpose of Fig. 3 is to show that this feedback, when training an adaptive plant model, may be connected to either the model output or the plant output. The first method is called a parallel connection for system identification, and the second a series-parallel connection. Networks configured in series-parallel may be trained using the standard backpropagation algorithm. Networks configured in parallel must be trained with either real-time recurrent learning (RTRL) or backpropagation through time (BPTT). The series-parallel configuration is simple to train, but is biased by disturbance. The parallel configuration is more complex to train, but is unbiased by disturbance. Therefore, in this work, nonlinear system identification is
first performed using the series-parallel configuration to initialize the weight values of the plant model. When the weight values converge, the plant model is reconfigured in the parallel configuration and training is allowed to continue. This procedure allows speedy training of the network without being compromised by disturbance.
Fig. 3. Adaptive plant modeling
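The two identification configurations can be sketched as follows (Python/NumPy). This is a toy stand-in, not the paper's implementation: the linear "network", the toy plant, and the fixed weights are all hypothetical; only the feedback wiring (plant output vs. model output) illustrates the series-parallel/parallel distinction.

```python
import numpy as np

def narx_step(w, u_taps, y_taps):
    # A minimal NARX "network": a linear map of the tapped delay lines.
    # A real plant model would be a neural network here.
    return float(w @ np.concatenate([u_taps, y_taps]))

def identify(w, u_seq, y_seq, mode):
    """Run the model over the data in either identification configuration.

    'series-parallel': delayed *plant* outputs are fed back (trainable by
    ordinary backpropagation, but biased by disturbance).
    'parallel': delayed *model* outputs are fed back (requires RTRL or
    BPTT, but unbiased by disturbance).
    """
    y_hat = [0.0, 0.0]                      # model output history
    for k in range(2, len(u_seq)):
        fb = y_seq[k-2:k] if mode == "series-parallel" else np.array(y_hat[-2:])
        y_hat.append(narx_step(w, u_seq[k-2:k], fb))
    return np.array(y_hat[2:])

# Data from a toy nonlinear plant.
rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, 50)
y = np.zeros(50)
for k in range(1, 50):
    y[k] = 0.5 * y[k-1] + np.tanh(u[k-1])

w = np.array([0.1, 0.8, 0.05, 0.4])         # illustrative fixed weights
sp = identify(w, u, y, "series-parallel")
pp = identify(w, u, y, "parallel")
```

In the two-stage procedure above, one would train `w` first in the series-parallel mode and then continue training in the parallel mode from the converged weights.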
3.2 Adapting Dynamic Neural Networks
The LM (Levenberg-Marquardt) backpropagation algorithm combines the steepest-descent algorithm with the Gauss-Newton algorithm. Compared with the conjugate gradient algorithm and the variable-learning-rate algorithm, the Levenberg-Marquardt algorithm is much more efficient in both training steps and accuracy. With the aid of an approximate second derivative, the LM algorithm is more efficient than gradient methods, so it can be applied to online control. Because the matrix in the update is positive definite, the solution always exists, which makes the LM method preferable to the Gauss-Newton method. To improve training speed, an improved LMBP-RTRL algorithm based on the LM method is proposed. An NARX filter computes a function of the following form:

$$ y_k = f(x_k, x_{k-1}, \ldots, x_{k-n}, y_{k-1}, y_{k-2}, \ldots, y_{k-m}, W). \quad (4) $$
The familiar sum-of-squared-error cost function is

$$ V_k = \tfrac{1}{2} E\!\left(e_k^{T} e_k\right). \quad (5) $$

As a stochastic approximation to $V_k$, we construct

$$ V_k \approx \tfrac{1}{2} e_k^{T} e_k. \quad (6) $$
For adapting NARX filters, this was first done with the RTRL algorithm, using the stochastic approximation (6), where $e_k = d_k - y_k$. The Jacobian is then

$$ J(W) \triangleq \frac{de_k}{dW} = -\frac{dy_k}{dW} \quad (7) $$

$$ \frac{dy_k}{dW} = \frac{\partial y_k}{\partial W} + \sum_{i=0}^{n} \frac{\partial y_k}{\partial x_{k-i}} \frac{dx_{k-i}}{dW} + \sum_{i=1}^{m} \frac{\partial y_k}{\partial y_{k-i}} \frac{dy_{k-i}}{dW}. \quad (8) $$
The first term ∂y_k/∂W in (8) is the direct effect of a change in the weights on y_k, denoted the Jacobian J_0(W). The second term is zero, since the inputs x_{k-i} do not depend on the weights. The final term can be broken into two parts: the first, ∂y_k/∂y_{k-i}, can be obtained by the backpropagation algorithm; the second, dy_{k-i}/dW, is simply a previously calculated and stored value of dy_k/dW. When the system is "turned on," dy_i/dW is set to zero for i = 0, -1, -2, …, and the rest of the terms are calculated recursively from that point on. For the hidden layers, a similar presentation follows,
where $N = S^M$ and $n = S^1(R+1) + S^2(S^1+1) + \cdots + S^M(S^{M-1}+1)$. The elements of $J_0(W)$ can be computed by a modified backpropagation algorithm. Defining a new sensitivity $s^m_{i,h} = \partial y_h / \partial n^m_i$, then
$$ [J_0]_{h,l} = \frac{\partial y_h}{\partial w_l} = \frac{\partial y_h}{\partial w^m_{ij}} = \frac{\partial y_h}{\partial n^m_i}\,\frac{\partial n^m_i}{\partial w^m_{ij}} = s^m_{i,h}\, a^{m-1}_j \quad (12) $$

$$ [J_0]_{h,l} = \frac{\partial y_h}{\partial w_l} = \frac{\partial y_h}{\partial b^m_i} = \frac{\partial y_h}{\partial n^m_i}\,\frac{\partial n^m_i}{\partial b^m_i} = s^m_{i,h} \quad (13) $$
It is initialized at the final layer:

$$ s^M_{i,h} = \frac{\partial y_h}{\partial n^M_i} = \begin{cases} \dot{f}^M(n^M_i), & i = h \\ 0, & i \neq h \end{cases} \quad (14) $$
It can also be shown that the sensitivities satisfy the following recurrence relation:

$$ S^m = \dot{F}^m(n^m)\,(W^{m+1})^T S^{m+1}. \quad (15) $$
Continuing, $J_0(W)$ may be calculated via (12) and (13). Let

$$ (d_W y)_k \triangleq \left[ \left(\frac{dy_{k-1}}{dW}\right)^{T} \left(\frac{dy_{k-2}}{dW}\right)^{T} \cdots \left(\frac{dy_{k-m}}{dW}\right)^{T} \right]^{T} $$

$$ (d_x y)_k \triangleq \left[ \frac{\partial y_k}{\partial y_{k-1}} \ \ \frac{\partial y_k}{\partial y_{k-2}} \ \cdots \ \frac{\partial y_k}{\partial y_{k-m}} \right] $$

$$ J(W) = -\left[ J_0(W) + (d_x y)_k (d_W y)_k \right]. \quad (16) $$
Having obtained the Jacobian matrix, the weights and biases may be adjusted by the LM method. The update becomes

$$ \Delta W = -\left[ J^{T}(W) J(W) + \mu I \right]^{-1} J^{T}(W)\, e(W). \quad (17) $$
The parameter μ is multiplied by some factor β whenever a step would result in an increased V(W); when a step reduces V(W), μ is divided by β. Notice that when μ is large the algorithm becomes steepest descent, while for small μ it becomes Gauss-Newton. The LM algorithm can be considered a trust-region modification of Gauss-Newton. The algorithm is summarized as follows:
1) Present all inputs to the network, compute the corresponding network outputs and errors, and then compute the value of the cost function.
2) Compute the Jacobian matrix. Networks configured in series-parallel may be trained using the standard backpropagation algorithm; networks configured in parallel must be trained with RTRL based on the LMBP algorithm as in (16).
3) Solve (17) to obtain ∆W_k.
4) Recompute the cost function using W_k + ∆W_k. If this new value is smaller than that computed in step 1, then reduce μ by β, let W_{k+1} = W_k + ∆W_k, and go back to step 1. If the value is not reduced, then increase μ by β and go back to step 3.
5) The algorithm is assumed to have converged when the norm of the gradient is less than a predetermined value, or when the value of the cost function has been reduced to some error goal.
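The μ-adaptation in steps 3 and 4 can be sketched as follows (Python/NumPy). The residual and Jacobian callbacks, β = 10, and the toy linear fitting problem are illustrative assumptions, not the paper's spacecraft model; only the update rule (17) and the accept/reject logic are shown.

```python
import numpy as np

def lm_step(residual, jacobian, W, mu, beta=10.0):
    """One Levenberg-Marquardt iteration (steps 1-4 of the algorithm).

    residual(W) returns the error vector e(W); jacobian(W) returns J(W).
    Returns updated weights and the adapted damping parameter mu.
    """
    e = residual(W)
    J = jacobian(W)
    V = 0.5 * (e @ e)                        # stochastic cost, eq. (6)
    while True:
        # Update (17): dW = -(J^T J + mu*I)^(-1) J^T e
        dW = -np.linalg.solve(J.T @ J + mu * np.eye(len(W)), J.T @ e)
        e_new = residual(W + dW)
        if 0.5 * (e_new @ e_new) <= V:       # step accepted: shrink mu
            return W + dW, mu / beta
        mu *= beta                           # step rejected: move toward steepest descent

# Toy least-squares problem: fit y = a*x + b.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0
residual = lambda W: (W[0] * x + W[1]) - y
jacobian = lambda W: np.stack([x, np.ones_like(x)], axis=1)

W, mu = np.array([0.0, 0.0]), 0.01
for _ in range(20):
    W, mu = lm_step(residual, jacobian, W, mu)
```

Large μ makes the bracketed matrix diagonally dominant (a small steepest-descent step), while small μ recovers the Gauss-Newton step, matching the trust-region interpretation above.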
4 Control Strategy

During normal-pointing attitude control of an orbiting flexible spacecraft, the modal vibration of the flexible appendages is regarded as a kind of correlated disturbance, defined as "modal vibration disturbance", which is difficult to cancel with the PID method because its modal frequencies are low and closely spaced and its damping is small. Adaptive inverse control has an advantage in disturbance canceling, and is implemented only in the inner loop. Since in this paper the adaptive inverse disturbance canceling is built on PID control of the dynamic response, the control structure is designed as follows. First, a conventional PID controller is designed for the dynamic control system of the rigid spacecraft. Second, the modal vibration disturbance is controlled by the adaptive inverse disturbance canceling method. The PID controller design is not provided here; in the following, only the disturbance canceller design is described. According to the disturbance canceling technology of adaptive inverse control [1,5], the structure of the adaptive inverse "modal vibration disturbance" canceling for a flexible satellite during the normal pointing control mode is illustrated in Fig. 4. First, the dynamical system of the rigid spacecraft we wish to control is modeled using an NARX neural network P̂. Second, a very close copy P̂_COPY of P̂, a disturbance-free match to the plant, is fed the same input as the plant NP, which is the dynamics module, based on reaction wheels, of the constrained mode of the flexible spacecraft with single-axis rotation as in Fig. 2. The difference between the disturbed output of the plant and the disturbance-free output of the copy model is an estimate of the modal
vibration disturbance η̂_k, which is then input to a copy of the disturbance canceller z^{-1}Q̂_k^{COPY}(z), where z^{-1}Q̂_k(z) is a best least-squares inverse of the rigid-spacecraft model P̂_COPY. At the same time, the output of z^{-1}Q̂_k^{COPY}(z) is subtracted from the plant input to effect cancellation of the plant disturbance, so that the excitation of the modal vibration is cancelled in principle and the vibration can be effectively reduced. The unit delay z^{-1} in Q̂_k^{COPY}(z) is in recognition of the fact that digital feedback links must have at least one unit of delay around each loop [1]. Thus, the current value of the plant disturbance η̂_k can be used only for cancellation of future values of the plant disturbance and cannot be used for instantaneous self-cancellation. The effects of these unit delays are small when the system is operated at a high sampling rate, however.
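A minimal discrete-time sketch of this inner loop follows (Python/NumPy). The unity plant, identity model, and identity canceller below are hypothetical stand-ins chosen so the best inverse is trivial; the paper's P̂ and Q̂_k(z) are NARX networks. Only the signal flow (disturbance estimate, one-sample delay, subtraction at the plant input) matches the scheme.

```python
import numpy as np

def cancel_loop(plant, model, canceller, u_cmd, n_steps):
    """Adaptive-inverse disturbance-canceling inner loop.

    plant(u):     disturbed plant, one sample per call
    model(u):     disturbance-free plant model (P_hat_COPY)
    canceller(d): disturbance canceller (Q_hat_k(z))
    The canceller output is applied with one unit of delay (z^-1).
    """
    cancel_prev = 0.0                       # z^-1: previous canceller output
    y_hist, eta_hist = [], []
    for k in range(n_steps):
        u = u_cmd[k] - cancel_prev          # subtract delayed canceller output
        y = plant(u)                        # disturbed plant output
        eta_hat = y - model(u)              # estimated modal vibration disturbance
        cancel_prev = canceller(eta_hat)
        y_hist.append(y)
        eta_hist.append(eta_hat)
    return np.array(y_hist), np.array(eta_hist)

# Toy demonstration: unity "rigid" plant with an additive slow disturbance;
# the best inverse of a unity plant is unity, so the canceller is identity.
k_clock = [0]
def plant(u):
    d = np.sin(0.1 * k_clock[0])            # slow "modal vibration" disturbance
    k_clock[0] += 1
    return u + d

y, eta = cancel_loop(plant, lambda u: u, lambda e: e, np.zeros(200), 200)
```

Because of the unit delay, the residual output is d_k − d_{k-1} rather than zero, which is small exactly when the sampling rate is high relative to the disturbance, as the text notes.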
Fig. 4. Adaptive inverse disturbance canceling for flexible satellite during normal pointing control mode
However, considering the uncertainty of the spacecraft parameters in modeling or on orbit, the model P̂ must be adapted. At the same time, considering convergence speed, the model P̂ is adapted offline, which is performed as follows. First, the modeling input is obtained by low-pass filtering the saved controller-output signal, and the desired response is the "quasi-rigid" attitude angle, obtained by filtering the "modal vibration disturbance" out of the attitude-angle signal of the flexible spacecraft, where the filtering is based on the 1-D DWT (Discrete Wavelet Transform). Second, the weights of P̂ are adjusted in real time according to the adaptive reference signal, which is the difference between the saved and filtered attitude-angle signal of the orbiting spacecraft and the output of the model P̂. Finally, in order to improve convergence speed, the adaptive disturbance canceller Q̂_k(z) is generated by an offline process, which requires a synthetic noise source with the same statistical characteristics as the plant disturbance. In this application, the synthetic noise is a superposition of sine signals at each modal
vibration frequency (here only the first five modes are considered). Since the offline computation is much faster than real time, for a specific P̂_k(z) the optimal Q̂_k(z) can be generated by the offline process; in practice, within one sampling cycle of the real system, the offline computation of Q̂_k(z) can be iterated hundreds or thousands of times. In addition, considering the dynamic modeling performance required of the canceller Q̂_k(z), it is implemented with an NARX neural network. The whole process of the scheme shown in Fig. 4 is performed as follows:
1) Data storage: The adaptive modeling signals for the current P̂ are provided by a queue of samples saved over a certain time, with two components: the modeling input (the controller output) and the attitude-angle signal of the flexible spacecraft. The queue is updated at a certain interval (for example, every 10 seconds); each time, P̂ is adapted once and P̂_COPY is updated, and Q̂_k(z) is trained further and z^{-1}Q̂_k^{COPY}(z) is updated.
2) Adaptation of the model: P̂ is adapted using the input-output pairs composed of both components of the queue processed through the low-pass filter. Here the low-pass filtering is based on the 1-D DWT; for example, to obtain the quasi-rigid attitude-angle desired response a5, we decompose the disturbed output signal of the plant into 5 levels using the db10 wavelet.
3) Training of the canceller: Q̂_k(z) is trained as soon as P̂ has been updated. Since the scheme in Fig. 4 aims at rejection of the "modal vibration disturbance", the synthetic noise is a superposition of sine waves at the first five modal frequencies. Once z^{-1}Q̂_k^{COPY}(z) is updated, the disturbance canceling loop works with the new parameters, so the adaptive process is performed in "real time".
These three processes run continually, so the disturbance canceling control performs well because P̂ and z^{-1}Q̂_k^{COPY}(z) remain adaptive.
5 Simulation Results

In order to test the proposed control scheme, numerical simulations have been performed. The numerical model of the spacecraft is taken from [9]. The low-frequency modes are generally dominant in a flexible system; in this paper, the first five modal frequencies and coupling coefficients are shown in Table 1 (concerning modal truncation, see [10]). First, the PID controller for the rigid spacecraft with single-axis rotation in Fig. 1 is designed by the conventional method. The PID parameters are selected with the Matlab toolbox rltool as KP = 6, KI = 0.05, so the phase and amplitude margins of the closed-loop system are 80° and 18 dB, respectively. In this simulation, the adaptive modeling signals of P̂_k(z) are provided by a queue of samples saved over 500 s, and the modeling input-output data are obtained through a low-pass
Table 1. Some coefficients in the simulation model solved by constrained modes

Order                                    1        2        3        4        5
Coupling coefficient Fn (kg^(1/2) m)     2.6917   0.4301   0.1537   0.0785   0.0475
filter, which processes the queue by 1-D wavelet decomposition using db10 into 5 levels; the input for training Q̂_k(z) is synthetic noise of 10000 samples. Both the plant model and the canceller are structured as NARX neural networks N_{(2,2),10,1} and N_{(5,5),30,1} [4,5]. The parameters for training the neural networks are not given here. For comparative purposes, seven different cases of disturbance canceling control for normal pointing control are conducted: 1) using only PI control, as shown in Fig. 5; 2) PI control with the adaptive inverse disturbance canceller, as shown in Fig. 6; 3) and 4) the case of 2) with ±20% variation of the modal
Fig. 5. Response in the PI case: (a) attitude angle; (b) attitude rate

Fig. 6. Response to PI with adaptive inverse disturbance canceller: (a) attitude angle; (b) attitude rate
frequency, with P̂_COPY and z^{-1}Q̂_k^{COPY}(z) not adaptively updated, plotted in Fig. 7 and Fig. 8; 5) and 6) the case of 2) with -20% and +20% variation of the inertia, P̂_COPY and z^{-1}Q̂_k^{COPY}(z) not adaptively updated, shown in Fig. 9 and Fig. 10; 7) the case of 2) with -50% variation of the inertia, P̂_COPY and z^{-1}Q̂_k^{COPY}(z) (a) without and (b) with adaptive updating, with results shown in Fig. 11.
Fig. 7. Response to PI with adaptive inverse disturbance canceller considering -20% variation of the modal frequency, P̂_COPY and z^{-1}Q̂_k^{COPY}(z) not adaptively updated: (a) attitude angle; (b) attitude rate
Fig. 8. Response to PI with adaptive inverse disturbance canceller considering +20% variation of the modal frequency, P̂_COPY and z^{-1}Q̂_k^{COPY}(z) not adaptively updated: (a) attitude angle; (b) attitude rate
The simulation results of PI control with the disturbance canceller, considering ±20% variation of the inertia with P̂_COPY and z^{-1}Q̂_k^{COPY}(z) not adaptively updated, are shown in Fig. 9 and Fig. 10. Analysis of Fig. 5-Fig. 10 shows that: (1) adaptive inverse disturbance canceling both effectively rejects the attitude dither (modal vibration) and greatly improves the steady-state precision (Fig. 5-Fig. 6); (2) the adaptive inverse disturbance canceller remains stable over a limited range of parameter uncertainty and variation (Fig. 7-Fig. 8).
Fig. 9. Response to PI with adaptive inverse disturbance canceller considering -20% variation of the inertia, P̂_COPY and z^{-1}Q̂_k^{COPY}(z) not adaptively updated: (a) attitude angle; (b) attitude rate
Fig. 10. Response to PI with adaptive inverse disturbance canceller considering +20% variation of the inertia, P̂_COPY and z^{-1}Q̂_k^{COPY}(z) not adaptively updated: (a) attitude angle; (b) attitude rate
Fig. 11. Attitude angle considering -50% variation of the inertia: (a) P̂_COPY and z^{-1}Q̂_k^{COPY}(z) without adaptive updating; (b) P̂_COPY and z^{-1}Q̂_k^{COPY}(z) with adaptive updating
The above simulation results demonstrate only the effectiveness and stability of the adaptive inverse disturbance canceller; they do not show that the canceller necessarily requires the disturbance control loop to adapt in order to work well. Therefore, demonstrating the adaptivity of P̂_COPY and z^{-1}Q̂_k^{COPY}(z) is necessary. Consider case 2) with -50% variation of the inertia: (a) P̂_COPY and z^{-1}Q̂_k^{COPY}(z) without adaptive updating, shown in Fig. 11(a), and (b) P̂_COPY and z^{-1}Q̂_k^{COPY}(z) with adaptive updating, shown in Fig. 11(b).
6 Conclusions

An adaptive inverse disturbance canceling method is proposed for the "modal vibration disturbance", which is difficult to cancel with the PID method because its modal frequencies are low and closely spaced and its damping is small. From the control results we can conclude that the modal vibration disturbance is rejected effectively and the precision of the normal-pointing attitude control is greatly improved; at the same time, the design of the adaptive inverse disturbance canceller ensures parameter robustness. Moreover, from the design scheme we can see that the disturbance-rejection performance is improved without affecting the system's dynamic response, since adaptive inverse disturbance canceling, in contrast to the conventional feedback disturbance rejection method, is performed in the inner loop and is independent of the dynamic-response control loop. Simulation results demonstrate that all of the above problems are solved by the work in this paper. Further work is to apply this method in an experimental study.
References
1. Widrow, B., Walach, E.: Adaptive Inverse Control. Prentice Hall PTR, Upper Saddle River, NJ (1996)
2. Carbonell Oliver, D.: Neural Networks Based Nonlinear Adaptive Inverse Control Algorithms. Engineer degree thesis, Stanford University, Stanford, CA (1996)
3. Bilello, M.: Nonlinear Adaptive Inverse Control. Ph.D. thesis, Stanford University, Stanford, CA (1996)
4. Plett, G.L.: Adaptive Inverse Control of Plants with Disturbances. Ph.D. dissertation, Stanford University, Stanford, CA (1998) 87-91
5. Plett, G.L.: Adaptive Inverse Control of Linear and Nonlinear Systems Using Dynamic Neural Networks. IEEE Transactions on Neural Networks 14 (2003) 360-376
6. Siegelmann, H.T., Horne, B.G.: Computational Capabilities of Recurrent NARX Neural Networks. IEEE Transactions on Systems, Man and Cybernetics - Part B: Cybernetics 27 (1997) 208-215
7. Haykin, S.: Neural Networks: A Comprehensive Foundation. 2nd edn. Prentice Hall International (1999)
8. Junkins, J.L., Kim, Y.: Introduction to Dynamics and Control of Flexible Structures. AIAA (1993) 82-100
9. Jin Jun, S.: Study on CSVS Method for the Flexible Spacecraft. Ph.D. thesis, Harbin Institute of Technology (2002)
10. Liu, D., Yang, D.M.: Modeling and Truncation of Satellite with Flexible Appendages. Journal of Astronautics 4 (1989) 87-95
Application of Resource Allocating Network and Particle Swarm Optimization to ALS

Jih-Gau Juang, Bo-Shian Lin, and Feng-Chu Lin
Department of Communications and Guidance Engineering, National Taiwan Ocean University, Keelung 20224, Taiwan, ROC
[email protected]

Abstract. This paper presents two intelligent aircraft automatic landing control schemes, a neural network controller and a neural controller with particle swarm optimization, that improve the performance of conventional automatic landing systems. Control signals of the aircraft are obtained by resource allocating neural networks, and control gains are selected by particle swarm optimization. Simulation results show that the proposed automatic landing controllers can successfully expand the safety envelope of an aircraft to include severe wind disturbance environments without using the conventional gain scheduling technique.
population-based optimization methods that does not use evolutionary operators such as crossover and mutation. Members of the entire population are maintained throughout the search procedure. The method was developed through simulation of a social system, and it has been found to be robust in solving continuous nonlinear optimization problems [5]-[7]. PSO is best suited for function optimization tasks. Its structure yields something of a real-time solution while tuning only a few parameters, such as the initial area, swarm size, and neighborhoods. PSO has also been shown to solve the Traveling Salesman Problem and to perform multi-objective optimization tasks [8]. Moreover, the ability to optimize functions makes PSO effective for adjusting neural network weights or parameters of other evolutionary algorithms and techniques. Therefore, PSO is suitable for determining the control parameters that give aircraft better adaptive capability in severe environments. Recently, some researchers have applied intelligent concepts such as neural networks and fuzzy systems to landing control to increase the flight controller's adaptivity to different environments [9]-[14]. Most of them do not consider the robustness of the controller under wind disturbances [9]-[12]. In [13], a PD-type fuzzy control system is developed for automatic landing control of both a linear and a nonlinear aircraft model, and adaptive control over a wide range of initial conditions is demonstrated successfully; the drawback is that the authors set up the wind disturbance only in the initial condition, and persistent wind disturbance is not considered. In [14], wind disturbances are included but the neural controller is trained for a specific wind speed; robustness over a wide range of wind speeds has not been considered.
Juang [15]-[16] presented a sequential learning technique that uses a conventional neural network with the backpropagation-through-time algorithm in successful landing control, but the number of hidden units was determined by trial and error and the speed of convergence was slow. For sequential learning of radial basis networks, Platt [17] developed an algorithm known as the resource allocating network (RAN). It starts with no hidden units and grows by allocating new hidden units based on the novelty of the observations, which arrive sequentially. If an observation has no novelty, the existing parameters of the network are adjusted by an LMS algorithm to fit that observation. RAN has been used in several applications, from function approximation to nonlinear system identification, and its powerful approximation ability and fast convergence have been demonstrated. Here, we present two control schemes, a RAN controller and a RAN controller with the PSO algorithm, to guide the aircraft to a safe landing and make the controller more robust and adaptive to the ever-changing environment.
2 System Description

The pilot descends from cruising altitude to an altitude of approximately 1200 ft above the ground, and then positions the airplane on a heading towards the runway centerline. When the aircraft approaches the outer airport marker, which is about 4 nautical miles from the runway, the glide-path signal is intercepted (as shown in Fig. 1). As the airplane descends along the glide path, its pitch attitude and speed must be controlled. The aircraft maintains a constant speed along the flight path; the descent rate is about 10 ft/sec and the pitch angle is between -5 and +5 degrees.
Finally, as the airplane descends to 20 to 70 feet above the ground, the glide-path control system is disengaged and a flare maneuver is executed. The vertical descent rate is decreased to 2 ft/sec so that the landing gear may dissipate the energy of the impact at landing. The pitch angle of the airplane is then adjusted, between 0 and 5 degrees for most aircraft, which allows a soft touchdown on the runway surface.
Fig. 1. Glide path and flare path
A simplified model of a commercial aircraft that moves only in a longitudinal and vertical plane is used in the simulations for ease of implementation [14]. To make the ALS more intelligent, reliable wind profiles are necessary. Two spectral turbulence forms, modeled by von Karman and Dryden, are mostly used for aircraft response studies. In this study the Dryden form [14] was used for its ease of demonstration. Fig. 2 shows a turbulence profile with a wind speed of 30 ft/sec at 510 ft altitude.

Fig. 2. Turbulence profile: longitudinal (solid) and vertical (dashed) wind-gust velocity components (ft/sec) over 50 sec
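A common discrete-time realization of the longitudinal Dryden component passes white noise through a first-order shaping filter. The sketch below (Python/NumPy) is a hedged illustration, not the paper's simulation code: it assumes the standard form H_u(s) = σ_u·sqrt(2L_u/(πV)) / (1 + (L_u/V)s), and the airspeed, scale length, intensity, and step size are placeholder values.

```python
import numpy as np

def dryden_longitudinal(n, dt, V=235.0, L_u=1000.0, sigma_u=10.0, seed=0):
    """Longitudinal Dryden gust via a first-order shaping filter.

    Transfer function H_u(s) = sigma_u*sqrt(2*L_u/(pi*V)) / (1 + (L_u/V)*s),
    driven by unit-PSD white noise and discretized with forward Euler.
    V: airspeed (ft/s); L_u: turbulence scale length (ft).
    """
    rng = np.random.default_rng(seed)
    tau = L_u / V                                  # filter time constant (s)
    gain = sigma_u * np.sqrt(2.0 * L_u / (np.pi * V))
    u = np.zeros(n)
    for k in range(1, n):
        w = rng.standard_normal() / np.sqrt(dt)    # white noise sample
        u[k] = u[k-1] + dt * (-u[k-1] / tau + gain * w / tau)
    return u

gust = dryden_longitudinal(n=5000, dt=0.01)
```

The vertical component uses an analogous (second-order) shaping filter; a production simulation would discretize both exactly rather than with forward Euler.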
3 Landing Control

In this study, the aircraft maintains a constant speed along the flight path, and we assume that the change in throttle command is zero. The aircraft is thus controlled solely by the
pitch command. In this section, we present an intelligent neural network controller that uses the resource allocating network to guide the aircraft to a safe landing in a wind disturbance environment. Then, particle swarm optimization is used in the automatic landing system to improve the performance of this intelligent landing controller and make it more robust and adaptive to the ever-changing environment.

3.1 Resource Allocating Network Controller

RAN is a neural network modified from the radial basis network. The output of the RAN algorithm has the following form:

$$ F(\mathbf{x}) = \sum_{j=1}^{J} w_j \varphi_j(\mathbf{x}) + \theta = \sum_{j=0}^{J} w_j \varphi_j(\mathbf{x}) \quad (1) $$
where φ_j(x) is the response of the jth hidden neuron to the input x and w_j is the weight connecting the jth hidden unit to the output unit; θ = w_0 φ_0 is the bias term. Here, J represents the number of hidden neurons in the network. φ_j(x) is a Gaussian function given by

$$ \varphi_j(\mathbf{x}) = \exp\!\left(-\frac{\|\mathbf{x} - \mathbf{m}_j\|^2}{\sigma_j^2}\right) \quad (2) $$

where m_j = (m_{j1}, …, m_{jp}) is the center and σ_j is the width of the Gaussian function.
The learning process of RAN involves allocation of new hidden units as well as adjustment of the network parameters. The network begins with no hidden units. As observations are received, the network grows by using some of them as new hidden units. The following two criteria must both be met for an observation (x_n, y_n) to be used to add a new hidden unit to the network:

$$ \|\mathbf{x}_n - \mathbf{m}_j\| > \varepsilon_n \quad (3) $$

$$ \|e_n\| = \|y_n - F(\mathbf{x}_n)\| > e_{min} \quad (4) $$

where m_j is the center (of the hidden unit) closest to x_n, and ε_n and e_min are thresholds to be selected appropriately. When a new hidden unit is added to the network, the parameters associated with the unit are

$$ w_{j+1} = e_n, \qquad \mathbf{m}_{j+1} = \mathbf{x}_n, \qquad \sigma_{j+1} = \kappa \|\mathbf{x}_n - \mathbf{m}_j\| \quad (5) $$

where κ is an overlap factor, which determines the overlap of the responses of the hidden units in the input space.
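The growth criteria (3)-(4), the allocation rule (5), and an LMS fallback can be sketched as a toy scalar-output class (Python/NumPy). All threshold and rate values are illustrative assumptions, not the paper's, and the first unit's width falls back to ε when no center exists yet.

```python
import numpy as np

class RAN:
    """Minimal resource-allocating network sketch (scalar output).

    Grows a Gaussian unit when an observation is both far from every
    existing center and badly predicted; otherwise performs an LMS
    adjustment of the existing parameters.
    """
    def __init__(self, eps=0.5, e_min=0.02, kappa=0.87, eta=0.05):
        self.eps, self.e_min, self.kappa, self.eta = eps, e_min, kappa, eta
        self.w0 = 0.0                           # bias term theta
        self.w, self.m, self.s = [], [], []     # weights, centers, widths

    def predict(self, x):
        y = self.w0
        for w, m, s in zip(self.w, self.m, self.s):
            y += w * np.exp(-np.sum((x - m) ** 2) / s ** 2)
        return y

    def observe(self, x, y):
        e = y - self.predict(x)
        d_min = min((np.linalg.norm(x - m) for m in self.m), default=np.inf)
        if d_min > self.eps and abs(e) > self.e_min:
            # Novelty: allocate a new hidden unit.
            self.w.append(e)
            self.m.append(x.astype(float).copy())
            self.s.append(self.kappa * (d_min if np.isfinite(d_min) else self.eps))
        else:
            # No novelty: LMS update of bias, weights, and centers.
            self.w0 += self.eta * e
            for j in range(len(self.w)):
                phi = np.exp(-np.sum((x - self.m[j]) ** 2) / self.s[j] ** 2)
                self.m[j] += self.eta * e * self.w[j] * phi * 2.0 / self.s[j] ** 2 * (x - self.m[j])
                self.w[j] += self.eta * e * phi

net = RAN()
for x in np.linspace(0.0, 1.0, 200):
    net.observe(np.array([x]), np.sin(2.0 * np.pi * x))
```

The network starts empty and only spends hidden units on observations that are both novel in input space and poorly predicted, which is what gives RAN its compact structure and fast convergence.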
When the observation (x_n, y_n) does not meet the criteria for adding a new hidden unit, the network parameters $W = [w_0, w_1, \ldots, w_j, \mathbf{m}_1^T, \ldots, \mathbf{m}_j^T]^T$ are updated using the LMS algorithm as follows:

$$ W(n) = W(n-1) + \eta\, e_n\, \mathbf{a}_n \quad (6) $$

where η is the learning rate and a_n is the gradient vector, which has the following form:

$$ \mathbf{a}_n = \left[1,\ \varphi_1(\mathbf{x}_n),\ \ldots,\ \varphi_j(\mathbf{x}_n),\ \varphi_1(\mathbf{x}_n)\frac{2w_1}{\sigma_1^2}(\mathbf{x}_n-\mathbf{m}_1)^T,\ \ldots,\ \varphi_j(\mathbf{x}_n)\frac{2w_j}{\sigma_j^2}(\mathbf{x}_n-\mathbf{m}_j)^T\right]^T \quad (7) $$
Therefore, the learning process is defined as:

If ‖x_n − m_j‖ > ε_n and ‖e_n‖ = ‖y_n − F(x_n)‖ > e_min:
    add a new hidden unit, with parameters chosen as
    w_{j+1} = e_n, m_{j+1} = x_n, σ_{j+1} = κ‖x_n − m_j‖
Else:
    W(n) = W(n−1) + η e_n a_n
End

In this scheme, the RAN algorithm is used to tune the neural controller and guide the aircraft to a safe landing in a wind disturbance environment. The RAN structure is shown in Fig. 3. Fig. 4 describes the control scheme of the intelligent automatic landing system, which consists of a PI controller, the RAN controller, the aircraft model, the command, and a wind model.

3.2 Resource Allocating Network Controller with Particle Swarm Optimization
In the PSO algorithm, each member is called a "particle," and each particle flies around the multi-dimensional search space with a velocity that is constantly updated by the particle's own experience, the experience of its neighbors, or the experience of the whole swarm. PSO can be used to solve many of the same kinds of problems as the genetic algorithm (GA), but this optimization technique does not suffer from some of the GA's difficulties: interaction within the group enhances, rather than detracts from, progress toward the solution. Further, a particle swarm system has memory, which the genetic algorithm does not have. Each particle keeps track of the coordinates in the problem space associated with the best solution (fitness) it has achieved so far; this value is called pbest. Another value tracked by the global version of the particle swarm optimizer is the overall best value, and its location, obtained so far by any particle in the population; this location is called gbest. At each
time step, the particle swarm optimization concept consists of velocity changes of each particle toward its pbest and gbest locations. Acceleration is weighted by random terms, with separate random numbers generated for acceleration toward the pbest and gbest locations. This is illustrated in Fig. 5, where x_k is the current position of a particle, x_{k+1} is its modified position, v_k is its initial velocity, v_{k+1} is its modified velocity, v_pbest is the velocity considering its pbest location, and v_gbest is the velocity considering its gbest location.
Fig. 4. Aircraft automatic landing system with RAN controller
The operation of particle swarm optimization is shown in Fig. 6. The definition of the parameters is
vid(k ) : velocity of individual i at iteration k, Vdmin w : inertia weight factor,
≤ vid( k ) ≤ Vdmax
258
J.-G. Juang, B.-S. Lin, and F.-C. Lin
c1 , c2 : acceleration constant, rand1, rand2 : uniform random number between 0 and 1,
xid(k ) : current position of individual i at iteration k , pbesti : pbest of individual i, gbest : gbest of the group.
k (v )
(v
k +1
x k +1 ) ( v gbest )
xk
( v pbest )
x k −1 Fig. 5. Movement of a PSO particle
Here, the initial conditions are: the number of particles is 20, V^min = -0.5, V^max = 0.5, and c1 = c2 = 1.5. The fitness function is defined as:

For Turbulence strength = min : α : max
    Do { the process of landing }
    If -3 ≤ ḣ(T) ft/sec ≤ 0, 200 ≤ ẋ(T) ft/sec ≤ 270, -300 ≤ x(T) ft ≤ 1000, -1 ≤ θ(T) degree ≤ 5
        Fitness = Turbulence strength
    Else
        Fitness = Turbulence strength - α
    End
End
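The global-best PSO loop can be sketched as follows (Python/NumPy). The swarm size of 20, c1 = c2 = 1.5, and the velocity limits ±0.5 follow the text; the inertia weight, iteration count, search range, and the quadratic stand-in fitness are illustrative assumptions — in the paper, evaluating a particle means running landing simulations over increasing turbulence strengths.

```python
import numpy as np

def pso(fitness, dim, n_particles=20, iters=100, w=0.7,
        c1=1.5, c2=1.5, v_min=-0.5, v_max=0.5, seed=0):
    """Global-best PSO sketch (maximization).

    Velocity update: v = w*v + c1*rand*(pbest - x) + c2*rand*(gbest - x),
    clipped to [v_min, v_max]; then x = x + v.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, (n_particles, dim))
    v = rng.uniform(v_min, v_max, (n_particles, dim))
    pbest = x.copy()
    pbest_f = np.array([fitness(p) for p in x])
    gbest = pbest[np.argmax(pbest_f)].copy()
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = np.clip(w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x),
                    v_min, v_max)
        x = x + v
        f = np.array([fitness(p) for p in x])
        better = f > pbest_f                 # update personal bests
        pbest[better], pbest_f[better] = x[better], f[better]
        gbest = pbest[np.argmax(pbest_f)].copy()
    return gbest, np.max(pbest_f)

# Toy stand-in fitness with a known optimum at 0.3 in each dimension.
best, best_f = pso(lambda p: -np.sum((p - 0.3) ** 2), dim=4)
```

For gain selection, `dim` would be the number of autopilot gains and the fitness would be the maximum turbulence strength survived, exactly as the fitness function above defines it.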
4 Simulation Results

In the simulations, successful touchdown landing conditions are defined as follows: -3 ≤ ḣ(T) ft/sec ≤ 0, 200 ≤ ẋ(T) ft/sec ≤ 270, -300 ≤ x(T) ft ≤ 1000, -1 ≤ θ(T) degree ≤ 5,
where T is the time at touchdown. The initial flight conditions are: h(0) = 500 ft, ẋ(0) = 235 ft/sec, x(0) = 9240 ft, and γ₀ = -3 degrees. With RAN, the controller can successfully guide the aircraft through wind speeds of 0 ft/sec to 70 ft/sec. Table 1 shows the results obtained with different wind turbulence speeds using the original control gains of [14], shown in Fig. 7. Fig. 8 to Fig. 11 show the results of using RAN. The results indicate that the RAN controller achieves fast online adjustment and yields a more robust network structure than [14]-[16], which can only overcome turbulence up to 30 ft/sec, 50 ft/sec, and 65 ft/sec, respectively.
Fig. 6. Operation of PSO. A population of particles is initialized with random positions and velocities, the fitness of each particle is evaluated, and the initial pbest and gbest are generated. At each iteration k the velocity and position of each particle are updated by

v_id^new = w × v_id + c1 × rand() × (pbest_i − x_id) + c2 × rand() × (gbest − x_id)
x_id^new = x_id^old + v_id^new

If the new fitness exceeds the particle's previous best, pbest_i is updated; if it also exceeds the group's best, gbest is updated. The loop repeats until the termination condition is met, yielding the optimal solution.
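The loop summarized in Fig. 6 can be sketched as a generic maximization routine. This is an illustration, not the authors' flight-simulation code: the toy fitness function is a stand-in, and the swarm settings (20 particles, velocity clamped to [−0.5, 0.5], c1 = c2 = 1.5, an assumed inertia weight w = 0.7) only echo the values quoted above.

```python
import random

def pso(fitness, dim, n_particles=20, iters=200,
        x_lo=-1.0, x_hi=1.0, v_min=-0.5, v_max=0.5,
        w=0.7, c1=1.5, c2=1.5):
    """Generic PSO loop (maximization) following the flowchart of Fig. 6."""
    x = [[random.uniform(x_lo, x_hi) for _ in range(dim)]
         for _ in range(n_particles)]
    v = [[random.uniform(v_min, v_max) for _ in range(dim)]
         for _ in range(n_particles)]
    pbest = [xi[:] for xi in x]                # each particle's best position
    pbest_f = [fitness(xi) for xi in x]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]   # group's best so far
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                v[i][d] = (w * v[i][d]
                           + c1 * random.random() * (pbest[i][d] - x[i][d])
                           + c2 * random.random() * (gbest[d] - x[i][d]))
                v[i][d] = max(v_min, min(v_max, v[i][d]))  # clamp to [Vmin, Vmax]
                x[i][d] += v[i][d]
            f = fitness(x[i])
            if f > pbest_f[i]:                 # update the particle's pbest
                pbest[i], pbest_f[i] = x[i][:], f
                if f > gbest_f:                # and, if better, the gbest
                    gbest, gbest_f = x[i][:], f
    return gbest, gbest_f

# Toy run: maximize -(x1^2 + x2^2); the optimum is at the origin
random.seed(0)
best, best_f = pso(lambda p: -sum(t * t for t in p), dim=2)
```

In the paper the fitness would instead be the turbulence-strength measure defined above, evaluated by running the landing simulation with the candidate control gains.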
Table 1. The Results From Using RAN Controller (k1=2.8; k2=2.8; k3=11.5; k4=6.0)
Fig. 10. Aircraft altitude and command
Fig. 11. Growth number of RAN hidden unit
Fig. 12. Aircraft pitch and pitch command

Fig. 13. Vertical velocity and velocity command
Fig. 14. Aircraft altitude and command
Fig. 15. Growth number of RAN hidden units
In the previous section, the control gains of the pitch autopilot in the glide-slope phase and the flare phase are fixed (as shown in Fig. 7). After applying PSO, optimal control gains can be obtained, and the controller can successfully overcome turbulence up to 95 ft/sec. Table 2 shows the results obtained with different wind turbulence speeds. Fig. 12 to Fig. 15 show the results of using RAN with PSO. In comparison, the RAN controller with the PSO algorithm is more adaptive to ever-changing environments.

Table 2. The Results From Using RAN Controller with PSO (K1=2.3003; K2=2.3003; K3=11.6411; K4=20.913)
5 Conclusion

The purpose of this paper is to investigate the use of hybrid neural networks and evolutionary computation in aircraft automatic landing control, and to make the automatic landing system more intelligent. The current flight control law is adopted in the intelligent controller design. Tracking performance and adaptive capability are demonstrated through software simulations. For the safe landing of an aircraft with a conventional controller, the wind speed limit of turbulence is 30 ft/sec. In this study, the RAN controller with the original control gains can overcome turbulence up to 70 ft/sec, and the RAN controller with the PSO algorithm can reach 95 ft/sec. These results are better than those obtained without the PSO algorithm. In particular, the PSO algorithm adopted in RAN has the advantage of using fewer hidden neurons, because the PSO method can generate high-quality solutions in complex parameter searches. These simulations show that the proposed intelligent controllers can successfully expand the controllable envelope under severe wind disturbances.

Acknowledgement. This work was supported by the National Science Council, Taiwan, ROC, under Grant NSC 92-2213-E-019-005.
References

1. Buschek, H., Calise, A.J.: "Uncertainty Modeling and Fixed-Order Controller Design for a Hypersonic Vehicle Model," Journal of Guidance, Control, and Dynamics, vol. 20, no. 1, 42-48, (1997)
2. Federal Aviation Administration: "Automatic Landing Systems," AC 20-57A, Jan. (1971)
3. Boeing Publication: "Statistical Summary of Commercial Jet Airplane Accidents," Worldwide Operations (1959-1999)
4. Kennedy, J., Eberhart, R.C.: "Particle Swarm Optimization," Proceedings of IEEE International Conference on Neural Networks, vol. IV, 1942-1948, Perth, Australia, (1995)
5. Shi, Y., Eberhart, R.C.: "Empirical Study of Particle Swarm Optimization," Proceedings of the 1999 Congress on Evolutionary Computation, 1945-1950, Piscataway, (1999)
6. Peter, J.A.: "Using Selection to Improve Particle Swarm Optimization," Proceedings of IEEE International Conference on Evolutionary Computation, 84-89, Anchorage, May (1998)
7. Zheng, Y.L., Ma, L., Zhang, L., Qian, J.: "On the Convergence Analysis and Parameter Selection in Particle Swarm Optimization," Proceedings of the Second IEEE International Conference on Machine Learning and Cybernetics, November 2-5, (2003) 1802-1807
8. Kennedy, J., Eberhart, R.C.: Swarm Intelligence, Morgan Kaufmann Publishers, San Francisco, CA, (2001)
9. Izadi, H., Pakmehr, M., Sadati, N.: "Optimal Neuro-Controller in Longitudinal Autolanding of a Commercial Jet Transport," Proc. IEEE International Conference on Control Applications, CD-000202, 1-6, Istanbul, Turkey, June (2003)
10. Chaturvedi, D.K., Chauhan, R., Kalra, P.K.: "Application of Generalized Neural Network for Aircraft Landing Control System," Soft Computing, vol. 6, 441-118, (2002)
11. Iiguni, Y., Akiyoshi, H., Adachi, N.: "An Intelligent Landing System Based on Human Skill Model," IEEE Transactions on Aerospace and Electronic Systems, vol. 34, no. 3, 877-882, (1998)
12. Ionita, S., Sofron, E.: "The Fuzzy Model for Aircraft Landing Control," Proc. AFSS International Conference on Fuzzy Systems, 47-54, Calcutta, India, February (2002)
13. Nho, K., Agarwal, R.K.: "Automatic Landing System Design Using Fuzzy Logic," Journal of Guidance, Control, and Dynamics, vol. 23, no. 2, 298-304, (2000)
14. Jorgensen, C.C., Schley, C.: "A Neural Network Baseline Problem for Control of Aircraft Flare and Touchdown," Neural Networks for Control, 403-425, (1991)
15. Juang, J.G., Chang, H.H., Cheng, K.C.: "Intelligent Landing Control Using Linearized Inverse Aircraft Model," Proceedings of American Control Conference, vol. 4, 3269-3274, (2002)
16. Juang, J.G., Chang, H.H., Chang, W.B.: "Intelligent Automatic Landing System Using Time Delay Neural Network Controller," Applied Artificial Intelligence, vol. 17, no. 7, 563-581, (2003)
17. Platt, J.: "A Resource Allocating Network for Function Interpolation," Neural Computation, vol. 3, 213-225, (1991)
Case-Based Modeling of the Laminar Cooling Process in a Hot Rolling Mill

Minghao Tan¹, Shujiang Li¹, Jinxiang Pian², and Tianyou Chai²

¹ School of Information Science and Engineering, Shenyang University of Technology, 110023 Shenyang, China, [email protected]
² Research Center of Automation, Northeastern University, 110004 Shenyang, China, [email protected]
Abstract. Accurate mathematical modeling of the laminar cooling process is difficult due to its complex nature (e.g., highly nonlinear, time varying, and spatially varying). A case-based temperature prediction model is developed for the laminar cooling process using case-based reasoning (CBR) and the dynamical process model. The model parameters for the current operating condition are found by retrieving the most similar cases from the case base according to the current operating condition and reusing the solutions of the retrieved cases. The resulting model can predict the through-thickness temperature evolutions of the moving strip during the cooling process. Experimental studies based on industrial data from a steel company show the effectiveness of the proposed modeling approach.
laminar cooling process was developed in [8] which calculates the through-thickness temperatures at the strip center line during the cooling process. The heat transfer coefficients were determined by specific onsite experiments on the runout table cooling hardware. However, heat transfer coefficients determined in this way cannot reflect changes in operating conditions, and the model can only be used for offline development purposes. An accurate description of the key process parameters during laminar cooling is essential to modeling the laminar cooling process. This paper takes a knowledge-based approach to modeling the laminar cooling process in which case-based reasoning (CBR) [9], [10] is integrated with the first-principles dynamical model. The key process parameters are obtained using case-based reasoning [11] and physical analysis according to the operating conditions of the cooling process. Experimental studies with industrial data show the superior accuracy of the proposed modeling approach.
2 Typical Laminar Cooling Process

The schematic of a typical laminar cooling process is shown in Fig. 1. After leaving the last finishing stand F7, the strip is cooled on the runout table by top and bottom water headers. At the entry to the cooling area the temperature and thickness of the strip are measured by the infrared pyrometer P1 and the X-ray gauge D1. At the end of the runout table the final cooling temperature of the strip is measured by P2 before the strip is wound at the downcoiler. The strip speed during the cooling process is tracked by speed tachometers. Nineteen banks of four headers are installed on the runout table, with each header having a constant flow rate. There are four spray patterns for the four headers in each water bank [2].

Fig. 1. Schematic of the laminar cooling process (finishing stand F7, banks 1-19 of top and bottom headers over the 100.8 m water cooling area, pyrometers P1 and P2, X-ray gauge D1, and the downcoilers)
The strip temperature is related to the operating conditions of the moving strip, such as the strip material, strip gauge, entry temperature, and the control signals such as the activated headers and the flow rate of cooling headers. The output of the laminar cooling process is the strip temperature. If the strip is divided into M through-thickness layers, the inputs of the laminar cooling process include the strip gauge d, the strip length L, the steel grade Gr,
the entry temperature Te, the water temperature Tw, the environment temperature Ten, the strip speed v, the strip acceleration ac, the first activated top header Ht, the first activated bottom header Hb, the number of activated headers H, the header flow rate q, and the spray pattern π. The process outputs include the strip temperature on the top surface T0, the strip temperature on the bottom surface TM, and the temperatures of the through-thickness layers inside the strip T1…TM-1.
3 Physical Model of the Laminar Cooling Process [2]

The temperature of the ith lengthwise strip segment is described by the following equation [2]

∂Ti(y, t(i))/∂t(i) = a ∂²Ti(y, t(i))/∂y²   (1)

with the initial condition

Ti(y, ti0) = Ti0(y)   (2)

and the boundary conditions

λ ∂Ti(y, t(i))/∂y |_{y = di/2} = α0 [Tw0(xi0 + ∫_{ti0}^{t(i)} v(t) dt, t) − T(di/2, t(i))]   (3)

λ ∂Ti(y, t(i))/∂y |_{y = −di/2} = αM [TwM(xi0 + ∫_{ti0}^{t(i)} v(t) dt, t) − T(−di/2, t(i))]   (4)

∂Ti(y, t(i))/∂y |_{y = 0} = 0   (5)
where a is the thermal diffusivity of the strip, Ti(y, t(i)) is the temperature of the ith strip segment at location y and instant t(i), λ is the thermal conductivity of the strip, Tw0 and TwM are the temperatures of the cooling water on the top and bottom surfaces of the strip, α0 and αM are the heat transfer coefficients on the top and bottom surfaces of the strip, di is the thickness of the ith strip segment, and xi0 is the position of the ith strip segment at the initial time instant ti0.
4 Case-Based Modeling Strategy of the Laminar Cooling Process

The proposed case-based modeling strategy for the laminar cooling process is shown in Fig. 2. The dynamical model is established from physical analysis of the heat transfer process. The features of the current operating condition are extracted from the operating data and used to retrieve matching cases in the case base. The solution parameters of the current operating condition are determined by reusing the solutions
Fig. 2. Case-based modeling strategy for the laminar cooling process: the features of the operating condition drive the CBR cycle (retrieve, reuse, revise, retain) over the stored cases; the resulting solution parameters feed the models of heat transfer coefficients and thermal conductivities, which the dynamical model uses to predict the layer temperatures T0, T1, …, TM
of the retrieved cases. The obtained solutions are then tested by calculating the heat transfer coefficients, thermal conductivities, and thermal diffusivities and performing statistical analysis of the temperature predictions.

4.1 Dynamical Model of the Laminar Cooling Process

We can discretize (1)-(5) using finite differences as
(1 + a0∆Γ/(∆y)² + a0∆Γα0/(∆yλ0)) T0(n+1) − (a0∆Γ/(∆y)²) T1(n+1) = (a0∆Γ/(∆y)²) T1(n) + (1 − a0∆Γ/(∆y)² − a0∆Γα0/(∆yλ0)) T0(n) + (2a0∆Γα0/(∆yλ0)) TW   (6)

(2 + 2aj∆Γ/(∆y)²) Tj(n+1) − (aj∆Γ/(∆y)²) Tj−1(n+1) − (aj∆Γ/(∆y)²) Tj+1(n+1) = (aj∆Γ/(∆y)²) Tj−1(n) + (2 − 2aj∆Γ/(∆y)²) Tj(n) + (aj∆Γ/(∆y)²) Tj+1(n)   (j = 1, 2, …, M−1)   (7)

(1 + aM∆Γ/(∆y)² + aM∆ΓαM/(∆yλM)) TM(n+1) − (aM∆Γ/(∆y)²) TM−1(n+1) = (aM∆Γ/(∆y)²) TM−1(n) + (1 − aM∆Γ/(∆y)² − aM∆ΓαM/(∆yλM)) TM(n) + (2aM∆ΓαM/(∆yλM)) TW   (8)
where j denotes the jth through-thickness layer (j = 0, 1, …, M), T is the strip temperature, n is the nth time step, ∆Γ is the time step size, aj is the thermal diffusivity at layer j, λ0 and λM are the thermal conductivities at the top and bottom surface, and α0 and αM are the heat transfer coefficients at the top and bottom surface. Equations (6) and (8) describe the heat transfer on the surfaces of the moving strip, and (7) describes the heat conduction between the layers within the strip. The determination of the heat transfer coefficients α0, αM, the thermal conductivities λ0, λM, and the thermal diffusivities aj (j = 0, …, M) is key to improving the model accuracy. When a header is activated, the heat transfer coefficients during water cooling are related to the spray intensity, the strip surface temperature, the strip speed, etc. Because the header flow rate is constant, the heat transfer coefficients at the top and bottom surface are modeled as follows [2]
α0 = (2 − ((Hc − Ht)/10.0 + 1)^0.12) β1 (v/(1.1vh))^β2 (d/dh)^β3 (T0/Th)^β4   (9)

αM = (2 − ((Hc − Hb)/10.0 + 1)^0.12) β1 (v/(1.1vh))^β2 (d/dh)^β3 (TM/Th)^β4   (10)
where Hc is the specified header, v is the strip speed at the specified header, d is the strip gauge at the specified header, Tj (j = 0, M) is the strip temperature at the specified header, q is the cooling water flow rate at the specified header, vh is the speed of the strip head at the entry to the cooling section, dh is the thickness of the strip head measured at D1, and Th is the temperature of the strip head measured at P1. β1, β2, β3, β4 are parameters to be determined. When the header is deactivated, the heat transfer coefficients at the top and bottom surface are calculated by [2]
α0 = σ × ε × (T0⁴ − Ten⁴)/(T0 − Ten) + 6.5 + 5.5 × v^0.8   (11)

αM = 0.8 × σ × ε × (T0⁴ − Ten⁴)/(T0 − Ten)   (12)
where σ is the Stefan-Boltzmann constant and ε = 0.82 is the emissivity. The thermal conductivities at the top and bottom surface, λj (j = 0, M), are found by [2]
λj = 56.43 − (0.0363 − c(v − 1.1vh)) × Tj   (j = 0, M)   (13)
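As a sketch, the activated-header model (9) and the deactivated-header models (11)-(12) translate directly into code. This is an illustration, not the authors' implementation: the temperatures in the radiation terms are assumed to be absolute (kelvin), and the example values (header indices, β parameters) are placeholders in the spirit of the paper.

```python
SIGMA = 5.67e-8   # Stefan-Boltzmann constant, W/(m^2 K^4)
EPS = 0.82        # emissivity used in (11) and (12)

def alpha_top_water(Hc, Ht, v, d, T0, vh, dh, Th, b1, b2, b3, b4):
    """Top-surface heat transfer coefficient (9) under an activated header."""
    return ((2 - ((Hc - Ht) / 10.0 + 1) ** 0.12) * b1
            * (v / (1.1 * vh)) ** b2 * (d / dh) ** b3 * (T0 / Th) ** b4)

def alpha_top_air(T0, Ten, v):
    """Top-surface coefficient (11) with the header deactivated:
    radiation plus a forced-convection term in the strip speed v."""
    return SIGMA * EPS * (T0**4 - Ten**4) / (T0 - Ten) + 6.5 + 5.5 * v**0.8

def alpha_bottom_air(T0, Ten):
    """Bottom-surface coefficient (12): the radiation term scaled by 0.8."""
    return 0.8 * SIGMA * EPS * (T0**4 - Ten**4) / (T0 - Ten)

# Illustrative values only: specified header 5, first activated top header 1
a_water = alpha_top_water(Hc=5, Ht=1, v=3.0, d=12.0, T0=850.0,
                          vh=2.9, dh=12.0, Th=835.0,
                          b1=4306, b2=0.85, b3=1.12, b4=1.2)
a_air = alpha_top_air(T0=1100.0, Ten=300.0, v=2.9)
a_rad = alpha_bottom_air(T0=1100.0, Ten=300.0)
```

The split between the water-cooling and air-cooling branches mirrors the header activation logic described in the text.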
The thermal diffusivity at layer j is calculated by [2]

aj = f(Tj) =
  8.65 − 0.0146 (Tj − 400),   Tj ∈ [400, 650)
  5.0 − 0.045 (Tj − 650),    Tj ∈ [650, 700)
  2.75 + 0.025 (Tj − 700),   Tj ∈ [700, 800)
  5.25 + 0.00225 (Tj − 800),  Tj ∈ [800, 1000]
  (j = 0, …, M)   (14)
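Equations (6)-(8), with coefficients from (9)-(14), define one implicit time step per strip segment. The sketch below assembles and solves that tridiagonal system with numpy; the layer count, step sizes, and coefficient values are illustrative placeholders, not mill data, while the diffusivity function is a direct transcription of (14) (whose four branches join continuously at 650, 700, and 800).

```python
import numpy as np

def thermal_diffusivity(T):
    """Piecewise-linear diffusivity f(T_j) of (14)."""
    if 400 <= T < 650:
        return 8.65 - 0.0146 * (T - 400)
    if 650 <= T < 700:
        return 5.0 - 0.045 * (T - 650)
    if 700 <= T < 800:
        return 2.75 + 0.025 * (T - 700)
    if 800 <= T <= 1000:
        return 5.25 + 0.00225 * (T - 800)
    raise ValueError("temperature outside the fitted range [400, 1000]")

def step(T, lam0, lamM, alpha0, alphaM, Tw, dG, dy):
    """One time step of (6)-(8) for the layer temperatures T[0..M]."""
    M = len(T) - 1
    a = np.array([thermal_diffusivity(t) for t in T])
    r = a * dG / dy**2                         # per-layer mesh ratio
    A = np.zeros((M + 1, M + 1))
    b = np.zeros(M + 1)
    s0 = a[0] * dG * alpha0 / (dy * lam0)      # top-surface term of (6)
    A[0, :2] = 1 + r[0] + s0, -r[0]
    b[0] = r[0] * T[1] + (1 - r[0] - s0) * T[0] + 2 * s0 * Tw
    for j in range(1, M):                      # interior conduction (7)
        A[j, j - 1:j + 2] = -r[j], 2 + 2 * r[j], -r[j]
        b[j] = r[j] * T[j - 1] + (2 - 2 * r[j]) * T[j] + r[j] * T[j + 1]
    sM = a[M] * dG * alphaM / (dy * lamM)      # bottom-surface term of (8)
    A[M, M - 1:] = -r[M], 1 + r[M] + sM
    b[M] = r[M] * T[M - 1] + (1 - r[M] - sM) * T[M] + 2 * sM * Tw
    return np.linalg.solve(A, b)

# One illustrative step for a uniform 800-degree strip cooled by 25-degree water
T_next = step(np.full(6, 800.0), lam0=30.0, lamM=30.0,
              alpha0=2.0, alphaM=2.0, Tw=25.0, dG=0.05, dy=2.0)
```

Because the matrix couples only adjacent layers, the surface rows pull the temperature down toward the water temperature while the interior rows conduct heat between layers, exactly as (6)-(8) describe.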
Because the parameters β1, β2, β3, β4 and c vary with operating conditions, case-based reasoning is used to determine these parameters according to the changing operating conditions.

4.2 Case Representation and Retrieval
The case base stores knowledge of the laminar cooling process in the form of organized cases. Each case, consisting of two parts, case descriptors and case solutions, is a specific experience in modeling the laminar cooling process for a given operating condition. The case solutions comprise the parameters β1, β2, β3, β4 and c in (9), (10), and (13). They are mainly related to the key features of the process operating conditions, namely the steel grade, the strip gauge, the strip speed, and the strip temperature, which are chosen as the case descriptors. The case structure is shown in Table 1.

Table 1. Case Structure
Case descriptors F: f1 = Gr, f2 = vh, f3 = dh, f4 = Th
Case solutions S: s1 = β1, s2 = β2, s3 = β3, s4 = β4, s5 = c
The current operating condition of the strip is defined as Cin, with descriptors F = (f1, f2, f3, f4) and solutions S = (s1, s2, s3, s4, s5). Assume there are m cases in the case base, C1, C2, …, Cm. The descriptor vector of Ck (k = 1, …, m) is F^P_k = (f^P_{k,1}, f^P_{k,2}, f^P_{k,3}, f^P_{k,4}), and the solution vector of Ck is

S^P_k = (s^P_{k,1}, s^P_{k,2}, s^P_{k,3}, s^P_{k,4}, s^P_{k,5})   (15)
Due to limited space, the similarity functions between the various descriptors are omitted in this paper; the reader is referred to Chapter 2 of [2] for details. The similarity between the current operating condition Cin and the stored case Ck (k = 1, …, m) is

SIM(Cin, Ck) = (Σ_{l=1}^{4} ωl × sim(fl, f^P_{k,l})) / (Σ_{l=1}^{4} ωl)   (16)

SIMmax = max_{k∈{1,…,m}} SIM(Cin, Ck)   (17)
All cases with similarities greater than the threshold SIMth are retrieved from the case base.

4.3 Case Reuse
If no exact match for the current operating condition is found in the case base, the solutions of the retrieved cases have to be adapted before they can be reused for the current operating condition. Suppose r cases {C1^R, …, Cr^R} have been retrieved from the case base, where the similarity between Ck^R (k = 1, …, r) and the current operating condition is SIMk. Assume SIM1 ≤ SIM2 ≤ ··· ≤ SIMr ≤ 1; then the solutions of the retrieved cases are

S^R_k = (s^R_{k,1}, s^R_{k,2}, s^R_{k,3}, s^R_{k,4}, s^R_{k,5})   (18)
The solution of the current operating condition is S = (s1, s2, s3, s4, s5), where

sl = (Σ_{k=1}^{r} wk × s^R_{k,l}) / (Σ_{k=1}^{r} wk)   (l = 1, …, 5)   (19)

and wk (k = 1, …, r) is found as follows:

If SIMr = 1 Then
  wk = 1 for k = r, and wk = 0 for k ≠ r
Else
  wk = SIMk  (k = 1, …, r)
End If   (20)
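The retrieval step (16)-(17) and the reuse step (19)-(20) can be sketched together as follows. Because the per-descriptor similarity functions are omitted in the paper (they are deferred to [2]), a simple normalized-difference similarity is assumed here, and the case base in the example is hypothetical.

```python
def retrieve_and_reuse(query, cases, weights, sim_th=0.5):
    """CBR retrieval (16)-(17) and reuse (19)-(20).
    `query` is a descriptor tuple (Gr, vh, dh, Th); each case is a pair
    (descriptors, solutions).  The per-descriptor similarity below is an
    assumed normalized difference, not the paper's actual sim() functions."""
    def sim(f, fk):
        return 1.0 - abs(f - fk) / max(abs(f), abs(fk), 1e-9)

    def SIM(Fk):                               # weighted aggregate (16)
        num = sum(w * sim(f, fk) for w, f, fk in zip(weights, query, Fk))
        return num / sum(weights)

    scored = [(SIM(F), S) for F, S in cases]
    retrieved = [(s, sol) for s, sol in scored if s > sim_th]
    if not retrieved:
        return None
    retrieved.sort()                           # SIM_1 <= ... <= SIM_r
    if retrieved[-1][0] == 1.0:                # exact match: copy it (20)
        return list(retrieved[-1][1])
    # (19): similarity-weighted average of the retrieved solutions
    wsum = sum(s for s, _ in retrieved)
    return [sum(s * sol[l] for s, sol in retrieved) / wsum
            for l in range(len(retrieved[0][1]))]

# Hypothetical case base: ((Gr, vh, dh, Th), (b1, b2, b3, b4, c))
cases = [((316, 2.9, 12.0, 835.0), (4306, 0.85, 1.12, 1.20, 0.0052)),
         ((316, 3.1, 10.0, 850.0), (4100, 0.90, 1.05, 1.15, 0.0048))]
sol = retrieve_and_reuse((316, 3.0, 11.0, 840.0), cases, weights=[1, 1, 1, 1])
```

For a query between the two stored conditions, the reused parameters land between the stored solutions, weighted toward the more similar case, which is exactly the behavior (19)-(20) prescribe.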
4.4 Case Revision and Case Retention
Case revision evaluates the validity of the reused solutions that result from case reuse. The flowchart of case revision is shown in Fig. 3. The heat transfer coefficients, thermal conductivities, and thermal diffusivities are calculated from the solutions of case reuse. Then the final cooling temperatures of the strip segments are calculated according to (6)-(8). The statistical evaluation signal ∆T is calculated by

∆T = (Σ_{i=1}^{N} |T0(i) − Tcm(i)|) / N   (21)
where Tcm(i) is the final cooling temperature measurement (i = 1, …, N), N is the number of cooling temperature measurements, and T0(i) is the final cooling temperature predicted by this model. If ∆T < 10 °C, the case is retained in the case base. If ∆T > 10 °C, case revision is performed to improve the accuracy of the reused solution. The revised case is tested for redundancy before it is retained in the case base.
Fig. 3. Flow chart of case revision. Given s1, …, s5 from case reuse, ∆T is calculated; if ∆T > 10 °C, each β1,i (and, in the same way, each ci, i = 1, …, N) is adjusted so that the model prediction equals the real measurement, β1 and c are set to the medians of the adjusted values, β1, β2, β3, β4, c are adjusted, and ∆T is recalculated; once ∆T ≤ 10 °C, the case is retained
5 Experimental Study

In this experiment we use 61 data samples collected from a rolling mill and compare the model predictions with the results of [12]. The case descriptors for the experiment are shown in Table 2.

Table 2. Case Descriptors
Gr = 316, dh = 12, vh = 2.9, Th = 835
According to the descriptors in Table 2, one case was retrieved from the case base with SIMmax = 0.62. Table 3 lists the case solutions calculated by the case-based reasoner for the specified operating condition. The model predictions of the proposed modeling method and of Ref. [12] are plotted against the real cooling temperature measurements in Fig. 4.

Table 3. Reasoning Results
β1 = 4306, β2 = 0.85, β3 = 1.12, β4 = 1.2, c = 0.0052

Table 4. Model Accuracy Comparison
SIM = 0.62, N = 61; predictions within ±10 °C of the measurements: this paper 61, Ref. [12] 39

Fig. 4. Comparison of final cooling temperature predictions (coiling temperature vs. segment number for the real measurements, this paper, and Ref. [12])
Thirty-nine temperature predictions of the model in [12] are within 10 °C of the temperature measurements, as can be seen from Table 4. Fig. 4 shows that the model in [12] lost track of many of the cooling temperature measurements, especially toward the final segments. In sharp contrast, 100% (61/61) of the predictions of this paper are within 10 °C of the measurements. It is evident that the proposed approach tracks the evolution of the strip temperature very well and is capable of much better accuracy than the model in [12].
6 Conclusions

The development of an accurate model is essential to better understanding and successful control of the laminar cooling process. This paper has introduced a novel hybrid approach to modeling the laminar cooling process that combines first-principles modeling and case-based reasoning. Experiments based on data collected from the laminar cooling process of a hot mill have demonstrated the superior model accuracy of the hybrid modeling approach. The results in this paper can be generalized to a wide range of similar processes.
Acknowledgements This work was partly supported by the Ph.D. Funding Program of Shenyang University of Technology, the Program of Liaoning Department of Education under Grant No.2004D309, Shenyang Science and Technology Program under Grant No.10530842-05, the China National Key Basic Research and Development Program under Grant No.2002CB312201, and the Funds for Creative Research Groups of China under Grant No.60521003.
References

1. Chai, T.Y., Tan, M.H., et al.: Intelligent Optimization Control for Laminar Cooling. In: Camacho, B., Puente, D. (eds.): Proc. of the 15th IFAC World Congress. Elsevier, Amsterdam (2003) 691-696
2. Tan, M.H.: Intelligent Modeling of the Laminar Cooling Process. Tech. Rep. 18. Research Center of Automation, Northeastern University, Shenyang (2004)
3. Groch, A.G., Gubemat, R., Birstein, E.R.: Automatic Control of Laminar Flow Cooling in Continuous and Reversing Hot Strip Mills. Iron and Steel Engineer. 67(9) (1990) 16-20
4. Ditzhuijzen, V.G.: The Controlled Cooling of Hot Rolled Strip: A Combination of Physical Modeling, Control Problems and Practical Adaptation. IEEE Trans. Aut. Cont. 38(7) (1993) 1060-1065
5. Moffat, R.W.: Computer Control of Hot Strip Coiling Temperature with Variable Flow Laminar Spray. Iron and Steel Engineer. 62(11) (1985) 21-28
6. Leitholf, M.D., Dahm, J.R.: Model Reference Control of Runout Table Cooling at LTV. Iron and Steel Engineer. 66(8) (1989) 31-35
7. Yahiro, K.J.: Development of Coiling Temperature Control System on Hot Strip Mill. Kawasaki Steel Mizushima Works Tech. Rep. 24. (1991)
8. Evans, J.F., Roebuck, I.D., Howard, R.W.: Numerical Modeling of Hot Strip Mill Runout Table Cooling. Iron and Steel Engineer. 70(1) (1993) 50-55
9. Kolodner, J.L.: Case-Based Reasoning. 1st edn. Morgan Kaufmann, New York (1993)
10. Watson, I., Marir, F.: Case-Based Reasoning: A Review. Knowledge Engineering Review. 9(2) (1994) 355-381
11. Aamodt, A., Plaza, E.: Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches. AI Communications. 7 (1994) 39-59
12. Shan, X.Y.: Transformation and Development of the Cooling Control System of the 2050mm Baosteel Hot Strip Mill. In: Ren, D. (eds.): Development of Science and Technology in Metallurgy. Metallurgical Industry Press, Hangzhou China (1999) 19-22
Fast Mesh Simplification Algorithm Based on Edge Collapse

Shixiang Jia, Xinting Tang, and Hui Pan

Department of Computer Science and Technology, Ludong University, 264025 Yantai, P.R. China
[email protected], [email protected], [email protected]
Abstract. Firstly, we present a new mesh simplification algorithm. The algorithm is based on iterative half-edge contracting, and exploits a new method to measure the cost of collapse which takes the length of contracting edge and the dihedral angles between related triangles into account. The simplification does not introduce new vertex in original mesh, and enables the construction of nested hierarchies on unstructured mesh. In addition, the proposed algorithm adopts the Multiple-Choice approach to find the simplification sequence, which leads to a significant speedup with reduced memory overhead. Then we implement a mesh simplification system based on this algorithm, and demonstrate the effectiveness of our algorithm on various models.
system based on the proposed algorithm, and applied our algorithm to many models of various sizes. The rest of the paper is organized as follows. We first review the related work in Section 2. Section 3 describes our algorithm in detail. The implementation is discussed in Section 4. Section 5 presents a discussion of results and performance analysis. Section 6 concludes the paper.
2 Related Work

The problem of surface simplification has been studied in both the computational geometry and computer graphics literature for several years. Some of the earlier work by Turk [4] and Schroeder [5] employed heuristics based on curvature to determine which parts of the surface to simplify in order to achieve a model with the desired polygon count. The vertex clustering algorithm described by Rossignac and Borrel [6] is capable of processing arbitrary polygonal input. A bounding box is placed around the original model and divided into a grid; within each cell, the cell's vertices are clustered together into a single new representative vertex. The method is very fast and effective, but the quality of the approximation is often not satisfactory: this approach usually leads to a vertex distribution that does not adapt to the local curvature of the surface, and it cannot guarantee a proper manifold topology for the resulting approximation. Hoppe [7,8] cast the model simplification problem as a global optimization, minimizing the least-squares error from a set of point samples on the original surface. Later Hoppe extended this framework to handle other scalar attributes, explicitly recognizing the distinction between smooth gradients and sharp discontinuities. He also introduced the progressive mesh [8], which is essentially a stored sequence of simplification operations, allowing quick construction of any desired level of detail along the continuum of simplifications. However, the algorithm provides no guaranteed error bounds. There is considerable literature on surface simplification using error bounds. Cohen and Varshney [9] have used envelopes to preserve the model topology and obtain tight error bounds for a simple simplification.
An elegant solution to the polygon simplification problem has been presented in [10,11], where arbitrary polygonal meshes are first subdivided into patches with subdivision connectivity and multiresolution wavelet analysis is then applied over each patch. These methods preserve global topology, give error bounds on the simplified object, and provide a mapping between levels of detail. Garland [12] used iterative contractions of vertex pairs to simplify models and maintained a surface error approximation of the polygonal models. This algorithm is efficient and can rapidly produce high-quality approximations. Incremental decimation algorithms typically lead to superior model quality. These algorithms simplify models by iteratively executing an atomic decimation step such as the edge collapse (see Fig. 1). An edge collapse takes the two endpoints of the target edge, moves them to the same position, links all the incident edges to one of the vertices, deletes the other vertex, and removes the faces that have degenerated into lines or points. Typically, this removes two triangular faces per edge contraction. To minimize the approximation error, a cost function measuring the quality of the approximation is used to guide the process of simplification [7,12].
Fig. 1. Half-edge collapse. The (u, v) edge is contracted into point v. The t1 and t5 triangles become degenerate and are removed.
The particular sequence of edge collapse transformations must be chosen carefully, since it determines the quality of the approximating models. For such algorithms [7], the priority queue is a natural data structure for storing the order of the edges to be simplified, as it allows a variety of operations (insertion, access, and removal of the largest element, etc.) to be performed efficiently. But it takes a long time to build the queue before the simplification process can start. Furthermore, each decimation step also consumes a significant amount of time recomputing the collapse cost of changed edges and updating their positions in the priority queue. In order to accelerate this process, Wu and Kobbelt [3] presented a technique called Multiple-Choice, based on probabilistic optimization. It makes no use of a priority queue, but chooses the edge to be contracted from a small number of randomly selected edges. We provide a new algorithm which can preserve the visually important parts of the model by using a new cost function to measure the approximation error. In order to speed up the algorithm, we also use a probabilistic optimization strategy based on the Multiple-Choice approach to find the decimation sequence. Our system allows faster simplification than some quality methods.
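The Multiple-Choice idea can be sketched independently of any particular mesh representation: instead of maintaining a global priority queue, draw a small random sample of candidate edges and contract the cheapest one. The edge identifiers and the stand-in cost function below are illustrative only; in the full algorithm the cost would be the dihedral-angle measure of Section 3 and the contraction would actually rewire the mesh.

```python
import random

def multiple_choice_decimation(edges, cost, target, k=8):
    """Repeatedly draw k random candidates and remove the cheapest one,
    recording the chosen collapse sequence, until `target` edges remain."""
    sequence = []
    while len(edges) > target:
        candidates = random.sample(sorted(edges), min(k, len(edges)))
        best = min(candidates, key=cost)
        edges.discard(best)       # a real implementation would also update
        sequence.append(best)     # the affected neighborhood here
    return sequence

# Toy run: 100 edge ids, cost equal to the id, stop when 90 edges remain
edge_ids = set(range(100))
seq = multiple_choice_decimation(edge_ids, cost=lambda e: e, target=90)
```

The appeal of this scheme is that no global ordering is ever built or maintained: each step costs O(k) cost evaluations, which is why it trades a small loss in per-step optimality for a large speedup with reduced memory overhead.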
3 Simplification Algorithm

3.1 Atomic Decimation Operator

Our algorithm is based on the half-edge collapse operation. A half-edge collapse chooses an edge (u,v) and contracts it to one of its endpoints, v. After collapsing, all triangles adjacent to u are connected to v, and triangles adjacent to both u and v are removed (see Fig. 1). We prefer the half-edge collapse because of its simplicity. The methodology of half-edge collapse is in fact closely related to the vertex decimation approach. In each step of the vertex decimation approach, a vertex is selected for removal; all the facets adjacent to that vertex are removed from the model and the resulting hole is triangulated. Instead of vertex elimination and the ensuing hole triangulation, half-edge contraction simply merges one endpoint of the selected edge into the other endpoint. Half-edge contraction avoids hole triangulation and is generally more robust than vertex decimation: we do not need to worry about finding a plane onto which the neighborhood can be projected without overlap. In addition, half-edge contraction makes progressive transmission more efficient (no intermediate vertex coordinates) and enables the construction of nested hierarchies that can facilitate further applications.
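On an indexed triangle list, the half-edge collapse described above reduces to remapping every reference to u onto v and discarding the triangles that become degenerate. This minimal sketch ignores the adjacency structures a real implementation would maintain; the tiny vertex fan below is a made-up example.

```python
def half_edge_collapse(faces, u, v):
    """Contract edge (u, v) into v: replace u by v everywhere and drop
    triangles with repeated corners (those adjacent to both u and v)."""
    out = []
    for f in faces:
        g = tuple(v if idx == u else idx for idx in f)
        if len(set(g)) == 3:      # keep only non-degenerate triangles
            out.append(g)
    return out

# Four triangles around vertex 0; two of them share the edge (0, 1)
faces = [(0, 1, 2), (0, 3, 1), (0, 2, 4), (0, 4, 3)]
simplified = half_edge_collapse(faces, u=0, v=1)
```

As the text notes, exactly the triangles incident to both endpoints (two, in the typical manifold case) degenerate and are removed.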
The change caused by an edge collapse is quantified by the collapse cost. An algorithm based on edge collapse has to solve two problems: how to calculate the collapse cost for every candidate, and how to find the simplification sequence. The algorithm can then collapse edges iteratively until the given criterion is satisfied.

3.2 Collapse Cost

According to the characteristics of the human vision system, observers are mainly sensitive to three attributes of a model: size, orientation, and contrast [1,2]. From the first attribute, the length of an edge should be considered when calculating its collapse cost. From the last two attributes, the dihedral angles between the related triangles are also important guidance. Our cost function therefore focuses on the edge length and the sum of the dihedral angles.
Fig. 2. The candidate for contracting
The principle of our algorithm is that the contracted edge should lie in smooth areas (such as edge(a,b) in Fig. 2), so the dihedral angle between any two related triangles should be small. To calculate the collapse cost for edge(u,v) in Fig. 1, we proceed as follows:
1) Find all the triangles adjacent to vertex u (t1, t2, t3, t4, and t5) and those adjacent to both vertices u and v (t1 and t5).
2) Calculate the dihedral angles between t1 and t2, t3, t4, t5, and then those between t5 and t1, t2, t3, t4.
3) Set the largest dihedral angle found in step 2 as the final angle between the related triangles adjacent to edge(u,v).
The final angle of edge(a,b) in Fig. 2 is very small (zero), so we can contract it. In fact, we can relax this condition: when the edge is an exclusive edge, we can also collapse it, such as edge(c,d) in Fig. 2. The collapse of edge(c,d) has little influence on the appearance of the model. We can observe that the dihedral
Fig. 3. The calculation of the triangles’ weight
Fast Mesh Simplification Algorithm Based on Edge Collapse
279
angle between t1 and t8 is very large while the one between t1 and t2 is very small. If we used the algorithm above directly, the collapse cost of edge(c,d) would be large, which is contrary to the fact, so we need to improve it. We therefore give every triangle a weight when calculating the dihedral angle. For edge(u,v) in Fig. 3, when calculating the dihedral angle between t1 and the other triangles, the angle between t1 and t2 is the most important, so the weight of t2 with respect to t1 should be the largest, and the weights of t2, t3, t4 and t5 should decrease counterclockwise. Likewise, when we calculate the dihedral angle between t5 and the other triangles, the weights of t4, t3, t2 and t1 should decrease clockwise. Let S be the set of triangles adjacent to vertex u, let n be the number of triangles in S, and let s_i (i = 1, 2, \ldots, n) denote the ith triangle. Let B be the set of triangles adjacent to both u and v, containing m triangles. We define the weight of s_i with respect to b_j as

    W(s_i, b_j) = \frac{n}{n + D(s_i, b_j)},    (1)
where D(s_i, b_j) in (1) denotes the number of triangles between s_i and b_j. In Fig. 3, if b_j is t1, then D(s_i, t1) denotes the number of triangles visited when traversing counterclockwise from t1 to s_i; for example, D(t2, t1) = 1 and D(t4, t1) = 3. If b_j is t5, then D(s_i, t5) denotes the number of triangles visited when traversing clockwise from t5 to s_i. Let f_i (i = 1, 2, \ldots, n) be the unit normal vector of the ith triangle of S, and e_j (j = 1, 2, \ldots, m) the unit normal vector of the jth triangle of B. We define the collapse cost of edge (u,v) as

    \mathrm{Cost}(u, v) = \|u - v\| \times \sum_{j=1}^{m} \sum_{i=1}^{n} \bigl[ (1 - e_j \cdot f_i)\, W(s_i, b_j) \bigr],    (2)
where ||u-v|| in (2) indicates the length of edge(u,v).
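Equations (1) and (2) can be sketched in code as follows (the names are illustrative; the unit normals and the traversal distances D(s_i, b_j) are assumed to be precomputed from the adjacency information):

```cpp
#include <vector>
#include <cmath>
#include <cstddef>

struct Vec3 { double x, y, z; };

inline double dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
inline double length(const Vec3& a) { return std::sqrt(dot(a, a)); }

// Weight of triangle s_i relative to base triangle b_j, Eq. (1):
// W(s_i, b_j) = n / (n + D(s_i, b_j)).
inline double weight(int n, int D) { return double(n) / (n + D); }

// Collapse cost of edge (u, v), Eq. (2).  f[i] are the unit normals of the n
// triangles of S (around u); e[j] are the unit normals of the m triangles of B
// (adjacent to both u and v); D[j][i] is the traversal distance D(s_i, b_j).
inline double collapseCost(const Vec3& u, const Vec3& v,
                           const std::vector<Vec3>& f,
                           const std::vector<Vec3>& e,
                           const std::vector<std::vector<int>>& D) {
    const int n = int(f.size());
    double sum = 0.0;
    for (std::size_t j = 0; j < e.size(); ++j)
        for (int i = 0; i < n; ++i)
            sum += (1.0 - dot(e[j], f[i])) * weight(n, D[j][i]);  // e_j . f_i = cos(theta)
    Vec3 d{u.x - v.x, u.y - v.y, u.z - v.z};
    return length(d) * sum;
}
```

For a perfectly flat neighborhood all normals agree, every (1 - e_j . f_i) term vanishes, and the cost is zero, matching the discussion of edge(a,b) above.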
    e_j \cdot f_i = |e_j|\,|f_i| \cos\theta = \cos\theta    (3)
We use e_j \cdot f_i to compare the values of the dihedral angles θ, so we can avoid computing the arccosine.

3.3 Multiple-Choice Algorithm
Since the cost function has been defined, each possible atomic decimation operation (candidate) can be rated according to it. The remaining issue is to choose a candidate for each decimation step, i.e. to find the optimal decimation sequence. Finding the optimal decimation sequence is a very complex problem [13], so one has to settle for approximately optimal solutions. Most algorithms adopt a greedy strategy to find a decimation sequence that is close to optimal: for every decimation step, the algorithm goes through all possible candidates to find the one with the lowest cost. An implementation of the greedy strategy usually requires a priority queue of candidates that has to be initialized and updated during the decimation. Our algorithm instead uses a probabilistic optimization strategy based on the Multiple-Choice algorithm (MCA) to find the decimation sequence. The fundamental idea behind
MCA is quite simple and intuitive and can best be explained by means of the well-established bins-and-balls model [14,15]. To apply MCA to the model simplification problem, we have to map balls, bins, and maximum load to the corresponding mesh entities [3]. Since the balls are enumerated in the outer loop (for each ball make a MC decision), they correspond to the decimation steps. The bins represent the possible choices in each step, hence they correspond to the candidates. The maximum load, finally, is the value to be optimized, so we associate it with the quality criterion used to rate the candidates. In this setup, the MCA approach to model simplification consists of testing a small set of d randomly selected candidates (edge collapses) in each step and performing the decimation operation with the best quality value within this small set. Experiments show that with the MCA approach our algorithm produces approximations of almost the same quality as greedy algorithms when d = 6. Compared to greedy optimization, the major benefit of Multiple-Choice optimization is that the algorithmic structure is much simpler: we do not need a priority queue, which reduces memory consumption and makes the algorithm much easier to implement.

3.4 Algorithm Summary
First, the importance of each vertex in the mesh is evaluated. The most suitable edge for contraction is searched in its neighborhood, and the lowest cost found is recorded as the vertex's importance. As the most suitable edge for contraction we take one that does not cause the mesh to fold over itself and that preserves the original surface according to the criterion. We can then decimate vertices one by one according to their importance. Using the Multiple-Choice technique described above, the overall framework of our algorithm can be summarized as follows:
1. Determine the topology of the original mesh and calculate the unit normal of every triangle.
2. For every vertex of the original model, calculate the cost of contracting the vertex into its neighborhood, i.e. calculate the cost of every edge adjacent to the vertex, and pick the edge with the lowest cost as the vertex's collapse edge.
3. Randomly choose d vertices from all candidates, and update those of the d vertices whose costs need to be recomputed.
4. Select, among the d vertices, the vertex whose candidate edge has the lowest cost, and contract that edge. After contracting, mark the related vertices that need to be updated.
5. Repeat steps 3 and 4 until the given criterion is satisfied.
Our algorithm does not need a global priority queue, so it is much easier to implement. In Table 1, we compare our algorithm with others.
Table 1. Compared with other algorithms based on the greedy strategy, our algorithm does not need a global priority queue
| Algorithm step   | Our algorithm                                                                  | Others                                                                              |
|------------------|--------------------------------------------------------------------------------|-------------------------------------------------------------------------------------|
| Initialize       | Initialize, compute collapse cost for all candidates                           | Initialize, compute collapse cost for all candidates, perform global queue sorting  |
| Select candidate | Select d vertices randomly, update a vertex's cost if necessary, pick the best out of d | Top of the queue                                                            |
| Decimate         | Perform operator                                                               | Perform operator, locally recompute costs, update global queue                      |
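The Multiple-Choice selection of steps 3 and 4 in the algorithm summary can be sketched as follows (names illustrative; costOf stands for any routine that evaluates Eq. (2) for a candidate, refreshing it if stale):

```cpp
#include <vector>
#include <random>
#include <limits>

// One Multiple-Choice decimation step: sample d random candidates and return
// the one with the lowest (freshly evaluated) cost.  Assumes a non-empty
// candidate set; no priority queue is involved.
inline int multipleChoiceSelect(const std::vector<int>& candidates, int d,
                                double (*costOf)(int), std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, candidates.size() - 1);
    int best = -1;
    double bestCost = std::numeric_limits<double>::infinity();
    for (int k = 0; k < d; ++k) {
        int c = candidates[pick(rng)];      // random candidate (a vertex id)
        double cost = costOf(c);            // recompute its cost if necessary
        if (cost < bestCost) { bestCost = cost; best = c; }
    }
    return best;   // this vertex's minimum-cost edge is collapsed next
}
```

The whole decimation loop then just calls this routine and performs the collapse, which is the entire right-hand column of Table 1 for our method.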
4 Implementation

Based on the above algorithm, we have developed a framework providing efficient simplification of models of various sizes.
Fig. 4. The structure of our simplification system. The system can read model file into internal data structure, simplify it interactively, and save the internal data back to model file.
The processing stage of the framework consists of the following steps (see Fig. 4):
1. Read the input model file, and create the internal data structures.
2. Render the current model.
3. Simplify the model according to the user's aim.
4. Render the simplified model.
5. Repeat steps 2-4 until the user is satisfied with the resulting model, then save it back to a model file.
Step 1 consists of reading the model file, triangulating the model, and storing all the points and triangles in the internal data structures. As our simplification algorithm only accepts triangles as input, models consisting of n-sided polygons need to be triangulated in the preprocessing phase.
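A minimal preprocessing sketch for convex n-gons is fan triangulation (illustrative only; the paper does not specify which triangulation method it uses, and concave polygons would need a more careful method such as ear clipping):

```cpp
#include <vector>
#include <array>
#include <cstddef>

// Fan-triangulate one convex n-sided polygon (a list of vertex indices) into
// n - 2 triangles, all sharing the first vertex.
inline std::vector<std::array<int, 3>> fanTriangulate(const std::vector<int>& poly) {
    std::vector<std::array<int, 3>> tris;
    for (std::size_t i = 1; i + 1 < poly.size(); ++i)
        tris.push_back({poly[0], poly[i], poly[i + 1]});   // (v0, vi, vi+1)
    return tris;
}
```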
class Vertex {
    Vec3 location;                 // Vec3 is a 3-component vector class
    int index;
    std::set<int> vertNeighbors;   // vertices connected to this vertex
    std::set<int> triNeighbors;    // triangles of which this vertex is a part
    bool bActive;                  // false if the vertex has been removed
    double cost;                   // cost of contracting the min-cost edge
    int minCostNeighbor;           // index of the vertex at the other end of the min-cost edge
};

class Face {
    float weight;
    Vec3 direct;
    Vec3 normal;                   // normal of this triangle
    Vec3 point;                    // a vertex of this triangle
    bool bActive;                  // active flag
    void getVerts(int& v1, int& v2, int& v3);
};

class CShape {
    std::vector<Vertex> vertices;
    std::vector<Face> faces;
    Vertex& getvertex(int i) { return vertices[i]; }
    Face& getface(int i) { return faces[i]; }
    unsigned int vert_count() const;
    unsigned int face_count() const;
    unsigned int active_vert_count();
    unsigned int active_face_count();
    bool initialize();             // find the min-cost edge for every vertex
};

class CModel {
    std::vector<CShape> shapes;
    void getshape(CShape& shape);
    bool initialize();             // initialize every shape
    bool decimate(int percent);    // control the process of simplification
    bool EdgeCollapse(int vertid); // contract one edge
};
As shown above, we define the basic internal data structure for model and simplification with the aid of Visual C++.
5 Results and Discussion

In this section the efficiency of our algorithm is demonstrated. We have tried our implementation on various models of different sizes and shapes, and have
achieved encouraging results. Table 2 summarizes the running time of our current implementation and compares it with Garland's QEM algorithm [12] and Melax's algorithm, which is simple and fast [16]. All experiments were done on a commodity PC with an Intel 2.4 GHz CPU and 1024 MB of RAM.

Table 2. Running time of different algorithms. All data reflect the time needed to simplify each model to 0 triangles
We also plot the absolute maximum geometric error for the bunny and lamp models when decimating them to various levels of detail (see Fig. 5 and Fig. 6). The approximation error is measured by the Hausdorff distance between the original model and the simplified result. The Hausdorff distance (sometimes called the L∞ norm difference) between two input meshes M^1 and M^2 is [17]:
    K_{\mathrm{haus}}(M^1, M^2) = \max\bigl(\mathrm{dev}(M^1, M^2),\ \mathrm{dev}(M^2, M^1)\bigr),    (4)

where dev(M^1, M^2) in (4) measures the deviation of mesh M^1 from mesh M^2. The Hausdorff distance provides the maximal geometric deviation between the two meshes.
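On point samples of the two meshes, Eq. (4) can be sketched with a brute-force nearest-point search (illustrative only; practical measurement tools accelerate the inner search with spatial data structures and sample the surfaces densely):

```cpp
#include <vector>
#include <cmath>
#include <algorithm>
#include <limits>

struct Pt { double x, y, z; };

inline double dist(const Pt& a, const Pt& b) {
    return std::sqrt((a.x-b.x)*(a.x-b.x) + (a.y-b.y)*(a.y-b.y) + (a.z-b.z)*(a.z-b.z));
}

// One-sided deviation dev(A, B): for every sample of A take the distance to
// the closest sample of B, and keep the largest such distance.
inline double dev(const std::vector<Pt>& A, const std::vector<Pt>& B) {
    double worst = 0.0;
    for (const Pt& a : A) {
        double nearest = std::numeric_limits<double>::infinity();
        for (const Pt& b : B) nearest = std::min(nearest, dist(a, b));
        worst = std::max(worst, nearest);
    }
    return worst;
}

// Symmetric Hausdorff distance, Eq. (4).
inline double hausdorff(const std::vector<Pt>& A, const std::vector<Pt>& B) {
    return std::max(dev(A, B), dev(B, A));
}
```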
Fig. 5. Absolute maximum geometric error for the bunny model. The size of the bounding box is 15.6 × 15.4 × 12.1.
Fig. 7 demonstrates the visual quality of the approximations generated by our algorithm. In Fig. 7(i), the bunny model is drastically simplified (by 99%), but the major details of the original still remain.
Fig. 6. Absolute maximum geometric error for the lamp model. The size of the bounding box is 15.6 × 15.6 × 22.6.
(a) telephone, 68575 triangles (b) 34287 triangles (c) 1370 triangles
(d) lamp, 11672 triangles (e) 5836 triangles (f) 232 triangles
Fig. 7. The visual quality of the approximations generated using our algorithm. The bunny model (i) is drastically simplified (99%), but the major details still remain.
(g) bunny, 69451 triangles
(h) 6945 triangles
(i) 694 triangles
Fig. 7. (continued)
6 Conclusion

We have presented a surface simplification algorithm capable of rapidly producing high-fidelity approximations of 3D meshes. Our algorithm preserves the visually important features of the model. We also applied the generic probabilistic optimization principle of the Multiple-Choice algorithm to the problem of finding a simplification sequence. Experiments show that the MCA approach reduces the memory overhead and leads to a simpler algorithmic structure. Based on the proposed algorithm, we have implemented a simplification system. We have processed many 3D meshes of different sizes with this system and achieved encouraging results, which demonstrates the effectiveness of our algorithm.
References

1. Campbell, F. W., Robson, J. G.: Application of Fourier Analysis to the Visibility of Gratings. Journal of Physiology 197 (1968) 551-566
2. Blakemore, C., Campbell, F. W.: On the Existence of Neurons in the Human Visual System Selectively Sensitive to the Orientation and Size of Retinal Images. Journal of Physiology 203 (1969) 237-260
3. Wu, J., Kobbelt, L.: Fast Mesh Decimation by Multiple-Choice Techniques. In: Vision, Modeling and Visualization. IOS Press (2002) 241-248
4. Turk, G.: Re-tiling Polygonal Surfaces. In: Proceedings of ACM SIGGRAPH (1992) 55-64
5. Schroeder, W. J., Zarge, J. A., Lorensen, W. E.: Decimation of Triangle Meshes. In: Proceedings of ACM SIGGRAPH (1992) 65-70
6. Rossignac, J., Borrel, P.: Multi-resolution 3D Approximation for Rendering Complex Scenes. In: Geometric Modeling in Computer Graphics. Springer-Verlag (1993) 455-465
7. Hoppe, H., DeRose, T., Duchamp, T., McDonald, J. A., Stuetzle, W.: Mesh Optimization. Computer Graphics (SIGGRAPH '93 Proceedings) (1993) 19-26
8. Hoppe, H.: Progressive Meshes. In: SIGGRAPH 96 Conference Proceedings. ACM SIGGRAPH, Addison Wesley (1996) 99-108
9. Cohen, J., Varshney, A., Manocha, D., Turk, G.: Simplification Envelopes. In: Proceedings of ACM SIGGRAPH '96 (1996) 119-128
10. DeRose, T., Lounsbery, M., Warren, J.: Multiresolution Analysis for Surfaces of Arbitrary Topological Type. Technical Report TR 93-10-05, Department of Computer Science, University of Washington (1993)
11. Eck, M., DeRose, T., Duchamp, T., Hoppe, H., Lounsbery, M., Stuetzle, W.: Multiresolution Analysis of Arbitrary Meshes. In: Proceedings of ACM SIGGRAPH (1995) 173-182
12. Garland, M., Heckbert, P. S.: Surface Simplification Using Quadric Error Metrics. In: Proceedings of SIGGRAPH '97 (1997) 209-216
13. Agarwal, P., Suri, S.: Surface Approximation and Geometric Partitions. In: Proceedings of the 5th ACM-SIAM Symposium on Discrete Algorithms (1994) 24-33
14. Azar, Y., Broder, A., Karlin, A., Upfal, E.: Balanced Allocations. SIAM Journal on Computing 29(1) (1999) 180-200
15. Kolchin, V., Sevastyanov, B., Chistyakov, V.: Random Allocations. John Wiley & Sons (1978)
16. Melax, S.: A Simple, Fast, and Effective Polygon Reduction Algorithm. Game Developer, November (1998) 44-49
17. Southern, R., Blake, E., Marais, P.: Evaluation of Memoryless Simplification. Technical Report CS01-18-00, University of Cape Town (2001)
Hierarchical Multiple Models Adaptive Feedforward Decoupling Controller Applied to Wind Tunnel System∗

Xin Wang¹,² and Hui Yang²

¹ Center of Electrical & Electronic Technology, Shanghai Jiao Tong University, Shanghai, P.R. China, 200240
² School of Electrical & Electronic Engineering, East China Jiaotong University, Jiangxi, P.R. China, 330013
[email protected]
Abstract. For the biggest wind tunnel in Asia, it is difficult during aerodynamic research on scale models to keep the Mach number in the test section and the stagnation pressure strictly constant, because the interaction is strong, the operating conditions change abruptly and the transient response requirements are high. To cope with these problems, a Hierarchical Multiple Models Adaptive Feedforward Decoupling Controller (HMMAFDC) is presented in this paper. The controller is composed of multiple fixed controller models and two adaptive controller models. Multiple models are used to improve the transient response of the wind tunnel, and the hierarchical structure greatly reduces the number of fixed models required. For the optimal model selected by the switching index, the interactions of the system are viewed as measurable disturbances and eliminated using a feedforward strategy. This not only decouples the system dynamically but also places the poles of the closed-loop system arbitrarily. The significance of the proposed method is that it is applicable to a MIMO system with a much smaller number of models. Global convergence is obtained. Finally, several simulation examples of a wind tunnel experiment are given to show both its effectiveness and practicality.
1 Introduction

A 2.4m x 2.4m injector-driven transonic wind tunnel at the China Aerodynamics Research and Development Center (CARDC) is the biggest wind tunnel in Asia [1]. It is used for aerodynamic research on scale models, which is very important for national defense and civil aviation. Aerodynamic data of scale models are measured at a given Mach number with a constant stagnation pressure. It is required that in the initial stage the response time be no longer than 7.0 seconds, and that in the experiment stage the steady-state tracking errors be within 0.2% in 0.8 second, while overshoot should be avoided [2]. Recently, several controllers have been designed to satisfy these transient response requirements. For a 1.5m wind tunnel (FFA-T1500) in Sweden, several separate SISO models are used for control [3]. For a 1.6m x 2m wind tunnel in
∗ This work is supported by the National Natural Science Foundation (No. 60504010, 50474020) and the Shanghai Jiao Tong University Research Foundation.
the Netherlands, the tunnel is regarded as a second-order system and a PID controller is given [4]. Later, a predictive controller was designed to control the Mach number in this wind tunnel as the angle of attack changes [5]. In the USA, a system of self-organizing neural networks was developed and tested to cluster, predict and control the Mach number of a 16-foot wind tunnel at NASA [6]. However, since the aerodynamic description of a wind tunnel varies with its size, the controller must vary as well. For the 2.4m x 2.4m transonic wind tunnel at CARDC, two SISO stable linear reduced-order models were established and two PID controllers were designed to control the Mach number and the stagnation total pressure respectively [2]. But when the Mach number in the test section varies from 0.3 to 1.2, the interaction becomes stronger and a multivariable decoupling controller is needed [7]. In [1], two feedforward static decouplers with four fixed PI controllers are designed to solve this problem. But when the Mach number steps from 0.3 to 0.4, 0.5, …, 1.2, the parameters of the wind tunnel jump accordingly. The resulting poor transient response cannot satisfy the high requirements of the wind tunnel stated above, so a special controller structure and algorithm are needed. To solve this problem, multiple models adaptive controllers (MMAC) have been designed to improve the transient response [8, 9]. One adaptive model, one reinitialized adaptive model and many fixed models are used to cover the region where the parameters change. For example, about 300 models are needed to cover the region when only one parameter changes [10]. The number of models is so large that it increases the calculation time, which affects the selection of the sampling period. To reduce the huge number of models needed in MMAC, Localization, Moving Bank and other methods have been presented [11, 12].
However, these methods remove only a small number of models, which does not solve the problem fundamentally. In our former work, a Hierarchical Multiple Models Adaptive Controller (HMMAC) was proposed to reduce the number of fixed models [13, 14]. In [13], a decoupling controller using the pole-zero cancellation method is proposed for minimum-phase systems, while the non-minimum-phase case is treated in [14]. Unfortunately, their structures are not suitable for distributed control systems (DCS). In this paper, a novel Hierarchical Multiple Models Adaptive Feedforward Decoupling Controller (HMMAFDC) is presented. Multiple models are used to improve the transient response of the wind tunnel, and the hierarchical structure greatly reduces the number of fixed models. For the optimal model selected by the switching index, the interactions of the system are viewed as measurable disturbances and eliminated using a feedforward strategy. This not only decouples the system dynamically but also places the poles of the closed-loop system arbitrarily. Several simulation examples of the wind tunnel experiment illustrate the HMMAFDC.
2 Description of the System

The 2.4m x 2.4m wind tunnel is an intermittent wind tunnel constructed for aerodynamic research by CARDC. It is a closed-circuit injector-driven transonic tunnel used for testing scale models, mostly of airplanes, in the Mach number range of 0.3 to 1.2 (see Fig. 1). The injector is used to reach high Mach numbers with a limited amount of stored air, while the Reynolds number can be increased in order to decrease
HMMAFDC Applied to Wind Tunnel System
289
the influence of model factors on the measurements. At the initial stage of an aerodynamic experiment, the main control hydraulic servo valve is opened and air is allowed to flow from the storage bottle into the tunnel. Part of the air is let out through the main exhaust hydraulic servo valve; the rest is injected into the tunnel by the injector. After a stable flow field is established, the experiment proceeds. The tunnel has more than 40 operation cases; one of them is as follows [1]. At the initial stage of the experiment, the main control hydraulic servo valve is tuned to give the initial value of the Mach number in the test section, with the main exhaust hydraulic servo valve and the choke finger at preset positions. After the stable flow field is established, the exhaust hydraulic servo valve is tuned to keep the stagnation total pressure at 1.5, and the choke finger makes the Mach number in the test section vary with ΔM = 0.1 from 0.3 to 1.2, while the main control hydraulic servo valve is controlled to keep the injector total pressure constant and to compensate for the loss of air storage pressure. When the Mach number in the test section is larger than 0.8, the choke finger is opened to its maximal position and the plenum exhaust valve is used to tune the Mach number in the test section.
Fig. 1. The structure of the transonic wind tunnel
From these two particular models[7], the linear reduced-order model of the wind tunnel can be established according to each Mach number as follows
    \begin{bmatrix} y_1(s) \\ y_2(s) \end{bmatrix} =
    \begin{bmatrix}
        \dfrac{\beta_1}{\alpha_1 s + 1} e^{-0.4s} & -\dfrac{\beta_2}{(\alpha_2 s + 1)^2} e^{-0.4s} \\
        -\dfrac{\beta_3 s + 1}{(\alpha_3 s + 1)^2} e^{-0.4s} & \dfrac{\beta_4}{\alpha_4 s + 1} e^{-0.4s}
    \end{bmatrix}
    \cdot \begin{bmatrix} u_1(s) \\ u_2(s) \end{bmatrix},    (1)
290
X. Wang and H. Yang
where y_1(s), y_2(s), u_1(s), u_2(s) are the Mach number in the test section, the stagnation total pressure, the choke finger opening and the main exhaust hydraulic servo valve opening, respectively. α_i, β_i are parameters satisfying α_i ∈ [α_{i,min}, α_{i,max}] and β_i ∈ [β_{i,min}, β_{i,max}].
Select the sampling period as 0.1 second. Then the linear discrete-time multivariable minimum-phase system is described as

    (I + A_1 z^{-1} + A_2 z^{-2})\, y(t) = (B_0 + B_1 z^{-1})\, u(t-4) + d.    (2)
When the Mach number varies, the parameters of the system change accordingly. So the system can be viewed as a linear MIMO discrete-time system, which admits a DARMA representation of the form

    A(t, z^{-1})\, y(t) = B(t, z^{-1})\, u(t-k) + d(t),    (3)
where u(t), y(t) are the n × 1 input and output vectors respectively, and d(t) is an n × 1 vector denoting the steady-state disturbance. A(t, z^{-1}), B(t, z^{-1}) are polynomial matrices in the unit delay operator z^{-1}, and B_0(t) is nonsingular for any t. Here A(t, z^{-1}) is assumed to be a diagonal polynomial matrix. The system satisfies the following assumptions:
(1) The system parameters are time-variant with infrequent large jumps, and the period between two adjacent jumps is large enough that the jumped parameters can be treated as constant.
(2) Φ(t) = [−A_1(t), …; B_0(t), …; d(t)], the system model, changes within a compact set Σ.
(3) The upper bounds of the orders of A(t, z^{-1}), B(t, z^{-1}) and the time delay k are known a priori.
(4) The system is minimum-phase.
To decouple the system, the interaction caused by the input u_j(t) on the output y_i(t), (j ≠ i), is viewed as a measurable disturbance. So the system (3) can be rewritten as
    A(t, z^{-1})\, y(t) = \bar{B}(t, z^{-1})\, u(t-k) + \tilde{B}(t, z^{-1})\, u(t-k) + d(t),    (4)

where B(t, z^{-1}) = \bar{B}(t, z^{-1}) + \tilde{B}(t, z^{-1}); \bar{B}(t, z^{-1}) = \mathrm{diag}[\bar{B}_{ii}(t, z^{-1})] is a diagonal polynomial matrix with \bar{B}_0(t) nonsingular for all t, and \tilde{B}(t, z^{-1}) = [\tilde{B}_{ij}(t, z^{-1})] with \tilde{B}_{ii}(t, z^{-1}) = 0. From assumption (1), A_i(t), \bar{B}_j(t), \tilde{B}_j(t), d(t) are piecewise constant (a time-variant system with infrequent large parameter jumps). During a period in which no jump happens, (4) can be rewritten as

    A(z^{-1})\, y(t) = \bar{B}(z^{-1})\, u(t-k) + \tilde{B}(z^{-1})\, u(t-k) + d.    (5)
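One simulation step of the discrete-time model (2), with the delay k = 4 made explicit, can be sketched as follows (the matrix values in the usage are illustrative, not identified wind tunnel parameters):

```cpp
#include <array>

using Vec2 = std::array<double, 2>;
using Mat2 = std::array<std::array<double, 2>, 2>;

inline Vec2 mul(const Mat2& M, const Vec2& v) {
    return {M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1]};
}

// One step of the DARMA model (2):
// y(t) = -A1 y(t-1) - A2 y(t-2) + B0 u(t-4) + B1 u(t-5) + d
inline Vec2 step(const Mat2& A1, const Mat2& A2, const Mat2& B0, const Mat2& B1,
                 const Vec2& y1, const Vec2& y2,
                 const Vec2& u4, const Vec2& u5, const Vec2& d) {
    Vec2 a = mul(A1, y1), b = mul(A2, y2), c = mul(B0, u4), e = mul(B1, u5);
    return {-a[0] - b[0] + c[0] + e[0] + d[0],
            -a[1] - b[1] + c[1] + e[1] + d[1]};
}
```

A parameter jump, as described above, corresponds to swapping the matrices A1, A2, B0, B1 and the vector d between steps.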
3 HMMAFDC

To reduce the number of fixed models, a hierarchical structure with l levels is adopted.
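The intended saving, trading one dense model grid for a few coarse-to-fine levels, can be illustrated with a one-dimensional stand-in (everything here is illustrative: the parameter set is a scalar interval and the absolute error plays the role of the switching index):

```cpp
#include <cmath>

// Hierarchical search sketch: instead of covering [lo, hi] with one dense
// grid, each level places m centers inside the neighborhood of the previous
// winner, shrinking the covered radius by a factor m per level.  With l
// levels of m models each, l*m models reach the resolution that a flat
// design would need m^l models for.
inline double hierarchicalSearch(double lo, double hi, double target,
                                 int m, int levels) {
    double best = 0.5 * (lo + hi);
    double radius = 0.5 * (hi - lo);
    for (int lev = 0; lev < levels; ++lev) {
        double bestErr = std::abs(best - target);   // switching-index stand-in
        double chosen = best;
        for (int s = 0; s < m; ++s) {
            // m centers spread across the current neighborhood
            double c = best - radius + (2.0 * radius) * (s + 0.5) / m;
            double err = std::abs(c - target);
            if (err < bestErr) { bestErr = err; chosen = c; }
        }
        best = chosen;
        radius /= m;   // refine around the level winner
    }
    return best;
}
```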
Fig. 2. Hierarchical principle of the HMMAFDC
(1) Utilizing the prior information, the set Σ over which the system parameters vary is partitioned into m_1 subsets Σ_{1,s} (s = 1, \ldots, m_1). In each subset, the center Φ_{1,s} and its radius r_{1,s} are designed so that for any Φ ∈ Σ_{1,s}, \|Φ − Φ_{1,s}\| ≤ r_{1,s}. The centers Φ_{1,s}, s = 1, \ldots, m_1, compose the level-1 fixed model set, which together with their neighborhoods covers the system parameter set entirely.
(2) According to the switching index, the best model in level 1 is selected as j_1.
(3) Around the best model j_1 in level 1, using the same partition method, m_2 centers are set up online to compose the level-2 fixed model set, which covers the neighborhood of model j_1 entirely.
(4) According to the switching index, the best model in level 2 is selected as j_2.
(5) Similarly, the best model in the last level, level l, is selected as j_l, which is also the best model among all the fixed models.
(6) Finally, in level l + 1, a free-running adaptive model and a reinitialized adaptive model are added. According to the switching index, the best model is selected among these three models. The free-running adaptive model is used to guarantee the stability of the wind tunnel, while the reinitialized adaptive model's initial value can be set to that of the selected best model to improve the transient response.

For the system (5), the cost function to be considered is of the form [15]
    J_c = \bigl\| P(z^{-1})\, y(t+k) - R(z^{-1})\, w(t) + Q(z^{-1})\, u(t) + S(z^{-1})\, u(t) + r \bigr\|^2,    (6)
where w(t) is the known reference signal, P(z^{-1}), Q(z^{-1}), R(z^{-1}) are diagonal weighting polynomial matrices, S(z^{-1}) is a weighting polynomial matrix and r is a weighting vector. Q(z^{-1}) weights the control u(t), and S(z^{-1}) weights the interaction, which is viewed as a measurable disturbance. Introduce the identity

    P(z^{-1}) = F(z^{-1}) A(z^{-1}) + z^{-k} G(z^{-1}).    (7)
To obtain unique polynomial matrices F(z^{-1}) and G(z^{-1}), their orders are chosen as

    n_f = k - 1, \quad n_g = n_a - 1.    (8)
Multiplying (5) by F(z^{-1}) and using (7), the optimal control law is

    G(z^{-1})\, y(t) + H_1(z^{-1})\, u(t) + H_2(z^{-1})\, u(t) + \bar{r} = R(z^{-1})\, w(t),    (9)
where H_1(z^{-1}) = F(z^{-1}) \bar{B}(z^{-1}) + Q(z^{-1}), H_2(z^{-1}) = F(z^{-1}) \tilde{B}(z^{-1}) + S(z^{-1}), and \bar{r} = F(1)\, d + r. From (9) and (5), the system equation can be derived as

    \bigl[ P(z^{-1}) \bar{B}(z^{-1}) + Q(z^{-1}) A(z^{-1}) \bigr] y(t+k) = \bar{B}(z^{-1}) R(z^{-1})\, w(t) + \bigl[ Q(z^{-1})\, d - \bar{B}(z^{-1}) \bar{r} \bigr] + \bigl[ Q(z^{-1}) \tilde{B}(z^{-1}) - \bar{B}(z^{-1}) S(z^{-1}) \bigr] u(t).    (10)
Note that (10) is not the closed-loop system equation, because the input u(t) still appears, although it is viewed as a measurable disturbance. Equations (9) and (10) are only used to choose the polynomial matrices that decouple the system. Let Q(z^{-1}) = R_1 \bar{B}(z^{-1}) and S(z^{-1}) = R_1 \tilde{B}(z^{-1}), where R_1 is a diagonal matrix. The system equation (10) can then be rewritten as

    \bigl[ P(z^{-1}) + R_1 A(z^{-1}) \bigr] y(t+k) = R(z^{-1})\, w(t) + R_1 d - \bar{r}.    (11)
From (11), considering that P(z^{-1}), R(z^{-1}), A(z^{-1}) are diagonal matrices, it follows that by this choice of the weighting polynomial matrices the closed-loop system is decoupled dynamically. To eliminate the steady-state error, the polynomial matrices can be chosen so that P(z^{-1}) + R_1 A(z^{-1}) = T(z^{-1}) and \bar{r} = R_1 d.
In level l + 1, the HMMAFDC is composed of three models. One is the fixed controller model Θ_{l+1,1}, i.e. the best model j_l in level l; the others are a free-running adaptive controller model Θ_{l+1,2} and a re-initialized adaptive controller model Θ_{l+1,3}. For the adaptive controller models Θ_{l+1,2}, Θ_{l+1,3}, multiplying (5) by F(z^{-1}) from the left and using (7) gives

    P(z^{-1})\, y(t+k) = G(z^{-1})\, y(t) + F(z^{-1}) B(z^{-1})\, u(t) + F(1)\, d.    (12)
Multiplying (5) by R_1 from the left and using the polynomial matrices chosen above, it follows that

    T(z^{-1})\, y(t+k) = P(z^{-1})\, y(t+k) + R_1 A(z^{-1})\, y(t+k) = G(z^{-1})\, y(t) + F(z^{-1}) B(z^{-1})\, u(t) + R_1 B(z^{-1})\, u(t) + F(1)\, d + R_1 d.    (13)
Using (7), (9) and the definitions of H(z^{-1}) and \bar{r}, the recursive estimation algorithm of Θ_{l+1,2} and Θ_{l+1,3} is described as follows:
    T(z^{-1})\, y(t+k) = G(z^{-1})\, y(t) + H_1(z^{-1})\, u(t) + H_2(z^{-1})\, u(t) + \bar{r},    (14)

    \hat{\theta}_i(t) = \hat{\theta}_i(t-1) + \frac{a(t)\, X(t-k)}{1 + X(t-k)^T X(t-k)} \bigl[ y_{f_i}(t) - X(t-k)^T \hat{\theta}_i(t-1) \bigr],    (15)

    \theta_i = \bigl[ g^0_{i1}, \ldots, g^0_{in};\ g^1_{i1}, \ldots, g^1_{in}, \ldots;\ h^0_{i1}, \ldots, h^0_{in}; \ldots \bigr], \quad i = 1, 2, \ldots, n.

The scalar a(t) is set to avoid the singularity problem of the estimate \hat{H}_0(0) [16]. For the HMMAFDC, the switching index is as follows:
    J_{i,s} = \frac{\bigl\| e_{f_{i,s}}(t) \bigr\|^2}{1 + X(t-k)^T X(t-k)} = \frac{\bigl\| y_f(t) - y_{f_{i,s}}(t) \bigr\|^2}{1 + X(t-k)^T X(t-k)},    (16)
where y_f(t) = T(z^{-1})\, y(t) is the auxiliary output of the system and e_{f_{i,s}}(t) is the auxiliary output error between the real system and model s in level i. For levels 1 to l, let j_i = \arg\min_s J_{i,s}, s = 1, \ldots, m_i, i = 1, 2, \ldots, l, correspond to the model whose auxiliary output error is minimum; then Θ_{i,j_i} is chosen as the best controller in level i. For level l + 1, only three models are left, so let j_{l+1} = \arg\min_s J_{l+1,s}, s = 1, 2, 3; then Θ_{l+1,j_{l+1}} is chosen as the HMMAFDC and used to control the system.
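For one level, the switching index (16) and its arg-min selection can be sketched as follows (names illustrative; the auxiliary output y_f and the per-model predictions are assumed to be given):

```cpp
#include <vector>
#include <cstddef>

// Switching index of Eq. (16) for one model: the squared auxiliary output
// error, normalized by 1 + X^T X.  yf is the system auxiliary output
// T(z^-1) y(t); yfModel is the model's prediction; X is the regressor.
inline double switchingIndex(const std::vector<double>& yf,
                             const std::vector<double>& yfModel,
                             const std::vector<double>& X) {
    double err2 = 0.0;
    for (std::size_t i = 0; i < yf.size(); ++i) {
        double e = yf[i] - yfModel[i];
        err2 += e * e;
    }
    double xtx = 0.0;
    for (double x : X) xtx += x * x;
    return err2 / (1.0 + xtx);
}

// The best model in a level is simply the index minimizing J.
inline std::size_t argminIndex(const std::vector<double>& J) {
    std::size_t best = 0;
    for (std::size_t s = 1; s < J.size(); ++s)
        if (J[s] < J[best]) best = s;
    return best;
}
```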
(1) If j_{l+1} ≠ 3, which means \hat{Θ}_{l+1,3}(t) is not the minimum-output-error controller, then \hat{Θ}_{l+1,3}(t) is re-initialized with the optimal controller parameters to improve the transient response, i.e. \hat{Θ}_{l+1,3}(t) = Θ_{l+1,j_{l+1}}. \hat{Θ}_{l+1,2}(t) and \hat{Θ}_{l+1,3}(t) are estimated using (15), and the controller is set as Θ(t) = Θ_{l+1,j_{l+1}}.
(2) If j_{l+1} = 3, \hat{Θ}_{l+1,2}(t) and \hat{Θ}_{l+1,3}(t) are estimated using (15), and the controller is set as Θ(t) = \hat{Θ}_{l+1,3}(t).
The optimal control law can be obtained from

$\hat{G}(z^{-1})y(t) + \left[\hat{H}_1(z^{-1}) + \hat{H}_2(z^{-1})\right]u(t) + \hat{r} = R(z^{-1})w(t)$. (17)
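The parameter update (15) is a normalized projection algorithm and is easy to prototype. The Python sketch below is not from the paper: it simplifies to a scalar output and synthetic data, and the function name is our own.

```python
import numpy as np

def projection_update(theta_hat, X, y_f, a=1.0):
    """One step of the normalized projection update in (15).

    theta_hat : current parameter estimate (n-vector)
    X         : regressor vector X(t-k) (n-vector)
    y_f       : filtered output y_f(t) (scalar)
    a         : gain a(t); set to 0 to freeze the update
    """
    err = y_f - X @ theta_hat           # prediction error
    gain = a / (1.0 + X @ X)            # normalized step size
    return theta_hat + gain * err * X

# Identify a known linear regression y = X @ theta by repeated updates.
rng = np.random.default_rng(0)
theta_true = np.array([1.5, -0.7])
theta_hat = np.zeros(2)
for _ in range(2000):
    X = rng.normal(size=2)
    theta_hat = projection_update(theta_hat, X, X @ theta_true)
print(theta_hat)  # close to theta_true
```

With noiseless, persistently exciting data the estimate converges to the true parameters, mirroring the role (15) plays for the adaptive controller models.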
4 Applications to the Wind Tunnel System

The wind tunnel system (2) is of second order and its time delay equals 4. Every 60 steps, the Mach number in the test section varies from 0.3 to 1.2 with $\Delta M = 0.1$, which causes the parameters of the system to jump simultaneously. Because the sampling period is 0.1 second, 1 second of experiment corresponds to 10 steps of simulation. The stagnation total pressure is required to remain at 1.5 throughout.

Case 1: A conventional adaptive decoupling controller (ADC) is designed to control the wind tunnel. Its initial value is chosen close to the real controller parameter model. The responses of the system are shown in Figs. 3 and 4. In the initial stage, after 7 seconds of operation, the overshoots of the system are all less than 0.2%, which satisfies the requirement. But in the experiment stage, after 0.8 second of operation, the overshoots are much larger than 0.2%. The largest overshoot is 68.74%, over 340 times the requirement. In fact, throughout the experiment period, i.e. after the initial stage, the overshoots are all much larger than 0.2%. So the adaptive controller cannot satisfy the requirement and cannot be used to control the wind tunnel.

Case 2: A multiple models adaptive decoupling controller (MMADC) is designed to control the wind tunnel. In this case, 30 fixed models are used to cover the region over which the jumping parameters vary. Note that the real system model is not among these fixed system models. Then 30 corresponding fixed controller models are set up using the transformation proposed above, and two adaptive controller models are added to compose the multiple controller models. These two adaptive controller models' initial values are the same as those of the adaptive model in Case 1. The responses of the system are shown in Figs. 5 and 6. Compared with Case 1, the transient response of the wind tunnel is improved greatly when only 30 fixed models are added. In the initial stage, the overshoots of the system are all less than 0.2%, which satisfies the requirement. However, in the experiment stage, the overshoots are all larger than 0.2%, especially for the stagnation total pressure (see Fig. 6).
HMMAFDC Applied to Wind Tunnel System
Case 3: A multiple models adaptive decoupling controller with 1000 fixed models is designed to control the wind tunnel. It uses the same algorithm as in Case 2 except for the number of fixed models. As the number of fixed models increases, the transient response improves. Both in the initial stage and in the experiment stage, the overshoots of the system are all less than 0.2%, which satisfies the requirement (see Figs. 7 and 8).

Case 4: A HMMADC is designed to control the wind tunnel. The same algorithm is used as in Cases 2 and 3 except that a hierarchical structure with 3 levels and 10 models at each level is adopted. In total 30 fixed models are added, the same number as in Case 2, but the overshoots of the system are much smaller than those in Case 2. They are similar to those in Case 3, all less than 0.2%, which satisfies the requirement both in the initial stage and in the experiment stage, while the number of models is about 33 times smaller than in Case 3 (see Figs. 9 and 10). The results show that although the same algorithm is adopted in Cases 2, 3 and 4, the HMMADC achieves a better transient response with fewer models.
Fig. 3. The test-section Mach number ($y_1$ vs. $t$/step) using ADC

Fig. 4. The stagnation total pressure ($y_2$ vs. $t$/step) using ADC

Fig. 5. The test-section Mach number of MMADC using 30 models

Fig. 6. The stagnation total pressure of MMADC using 30 models

Fig. 7. The test-section Mach number of MMADC using 1000 models

Fig. 8. The stagnation total pressure of MMADC using 1000 models

Fig. 9. The test-section Mach number of HMMADC using 10, 10, 10 models

Fig. 10. The stagnation total pressure of HMMADC using 10, 10, 10 models
5 Conclusions

This paper presents a hierarchical multiple models adaptive decoupling controller (HMMADC). Compared with the MMADC, it obtains a better transient response with much fewer models, greatly reducing the number of fixed models required.
References

1. Zhang, G.J., Chai, T.Y., Shao, C.: A Synthetic Approach for Control of Intermittent Wind Tunnel. Proceedings of the American Control Conference (1997) 203-207
2. Yu, W., Zhang, G.J.: Modelling and Controller Design for 2.4 M Injector Powered Transonic Wind Tunnel. Proceedings of the American Control Conference (1997) 1544-1545
3. Nelson, D.M.: Wind Tunnel Computer Control System and Instrumentation. Instrument Society of America (1989) 87-101
4. Pels, A.F.: Closed-Loop Mach Number Control in a Transonic Wind Tunnel. Journal A, 30 (1989) 25-32
5. Soeterboek, R.A.M., Pels, A.F., et al.: A Predictive Controller for the Mach Number in a Transonic Wind Tunnel. IEEE Control Systems Magazine, 11 (1991) 63-72
6. Motter, M.A., Principe, J.C.: Neural Control of the NASA Langley 16-Foot Transonic Tunnel. Proceedings of the American Control Conference (1997) 662-663
7. CARDC: Measurement and Control System Design in High and Low Speed Wind Tunnel. National Defence Industry Press, Beijing (2002) (in Chinese)
8. Narendra, K.S., Xiang, C.: Adaptive Control of Discrete-Time Systems Using Multiple Models. IEEE Trans. on Automatic Control, 45 (2000) 1669-1686
9. Wang, X., Li, S.Y., et al.: Multiple Models Adaptive Decoupling Controller for a Nonminimum Phase System. 5th Asian Control Conference (2002) 166-171
10. Narendra, K.S., Balakrishnan, J., Ciliz, M.K.: Adaptation and Learning Using Multiple Models, Switching, and Tuning. IEEE Control Systems Magazine, 15 (1995) 37-51
11. Zhivoglyadov, P.V., Middleton, R.H., Fu, M.Y.: Localization Based Switching Adaptive Control for Time-Varying Discrete-Time Systems. IEEE Trans. on Automatic Control, 45 (2000) 752-755
12. Maybeck, P.S., Hentz, K.P.: Investigation of Moving Bank Multiple Model Adaptive Algorithms. Journal of Guidance, Control, and Dynamics, 10 (1987) 90-96
13. Wang, X., Li, S.Y., Yue, H.: Multivariable Adaptive Decoupling Controller Using Hierarchical Multiple Models. ACTA Automatica Sinica, 31 (2005) 223-230
14. Wang, X., Li, S.Y., Yue, H.: Hierarchical Multiple Models Decoupling Controller for Nonminimum Phase Systems. Control Theory and Application, 22 (2005) 201-206
15. Wang, X., Li, S.Y., et al.: Multiple Models Direct Adaptive Controller Applied to the Wind Tunnel System. ISA Transactions, 44 (2005) 131-143
16. Goodwin, G.C., Ramadge, P.J., Caines, P.E.: Discrete Time Multivariable Adaptive Control. IEEE Trans. on Automatic Control, 25 (1980) 449-456
17. Landau, I.D., Lozano, R.: Unification of Discrete Time Explicit Model Reference Adaptive Control Designs. Automatica, 17 (1981) 593-611

Intelligent Backstepping Control for Chaotic Systems Using Self-Growing Fuzzy Neural Network

Chih-Min Lin¹, Chun-Fei Hsu², and I-Fang Chung³

¹ Department of Electrical Engineering, Yuan-Ze University, Chung-Li, Tao-Yuan 320, Taiwan, Republic of China, [email protected]
² Department of Electrical and Control Engineering, National Chiao-Tung University, Hsinchu 300, Taiwan, Republic of China, [email protected]
³ Institute of Bioinformatics, National Yang-Ming University, Taipei 115, Taiwan, Republic of China, [email protected]
Abstract. This paper proposes an intelligent backstepping control (IBC) for chaotic systems. The IBC system comprises a neural backstepping controller and a robust compensation controller. The neural backstepping controller, containing a self-growing fuzzy neural network (SGFNN) identifier, is the principal controller, and the robust compensation controller is designed to dispel the effect of the minimum approximation error introduced by the SGFNN identifier. Finally, simulation results verify that the IBC system can achieve favorable tracking performance.
2 Description of Chaotic Systems

Chaotic systems are known to exhibit complex dynamical behavior. The interest in chaotic systems lies mostly in their complex, unpredictable behavior and extreme sensitivity to initial conditions as well as parameter variations. Consider a second-order chaotic system such as the well-known Duffing equation, describing a special nonlinear circuit or a pendulum moving in a viscous medium under control [5-8]:

$\ddot{x} = -p\dot{x} - p_1 x - p_2 x^3 + q\cos(wt) + u = f + u$ (1)

where $p$, $p_1$, $p_2$ and $q$ are real constants; $t$ is the time variable; $w$ is the frequency; $f = -p\dot{x} - p_1 x - p_2 x^3 + q\cos(wt)$ is the chaotic dynamic function; and $u$ is the control effort. Depending on the choice of these constants, the solutions of system (1) may exhibit periodic, almost periodic, and chaotic behavior. To observe the chaotic, unpredictable behavior, the open-loop system with $u = 0$ was simulated with $p = 0.4$, $p_1 = -1.1$, $p_2 = 1.0$ and $w = 1.8$. The phase plane plots from the initial condition $(1, 1)$ are shown in Figs. 1(a) and 1(b) for $q = 1.8$ and $q = 7.0$, respectively. The uncontrolled chaotic system has different trajectories for different values of $q$.
Fig. 1. Phase plane ($\dot{x}$ vs. $x$) of the uncontrolled chaotic system: (a) $q = 1.8$; (b) $q = 7.0$
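The open-loop behavior in Fig. 1 can be reproduced numerically. The Python sketch below (not part of the paper) uses a simple explicit-Euler integration of (1) with u = 0 and the stated constants; the step size and horizon are our own choices.

```python
import numpy as np

# Uncontrolled Duffing system (1) with the constants used for Fig. 1:
# p = 0.4, p1 = -1.1, p2 = 1.0, w = 1.8, u = 0.
def duffing_step(x, v, t, q, dt=0.001, p=0.4, p1=-1.1, p2=1.0, w=1.8):
    """One explicit-Euler step of x'' = -p x' - p1 x - p2 x^3 + q cos(w t)."""
    a = -p * v - p1 * x - p2 * x ** 3 + q * np.cos(w * t)
    return x + dt * v, v + dt * a

x, v = 1.0, 1.0          # initial condition (1, 1) as in Fig. 1
trajectory = []
for k in range(100000):  # 100 s of simulated time
    x, v = duffing_step(x, v, k * 0.001, q=1.8)
    trajectory.append((x, v))

traj = np.array(trajectory)
print(traj[:, 0].min(), traj[:, 0].max())  # orbit stays bounded
```

Plotting `traj[:, 0]` against `traj[:, 1]` reproduces the phase-plane portrait qualitatively; for quantitative work a higher-order integrator (e.g. Runge-Kutta) would be preferable.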
3 Design of Ideal Backstepping Controller

The control objective is to find a control law so that the state trajectory $x$ tracks a reference $x_c$ closely. Assuming that the parameters of system (1) are well known, the design of the ideal backstepping controller proceeds step by step as follows:

Step 1. Define the tracking error

$e_1 = x - x_c$ (2)

and its derivative

$\dot{e}_1 = \dot{x} - \dot{x}_c$. (3)
The derivative $\dot{x}$ can be viewed as a virtual control in the equation. Define the stabilizing function

$\alpha = \dot{x}_c - c_1 e_1$ (4)

where $c_1$ is a positive constant.

Step 2. Define

$e_2 = \dot{x} - \alpha$ (5)

then the derivative of $e_2$ is

$\dot{e}_2 = \ddot{x} - \dot{\alpha} = \ddot{x} - \ddot{x}_c + c_1\dot{e}_1$. (6)

Step 3. The ideal backstepping controller can be designed as [9]

$u^* = \ddot{x}_c - f - c_1\dot{e}_1 - c_2 e_2 - e_1$ (7)

where $c_2$ is a positive constant. Substituting (7) into (6) gives

$\dot{e}_2 = -c_2 e_2 - e_1$. (8)

Step 4. Define a Lyapunov function as

$V_1 = \dfrac{e_1^2}{2} + \dfrac{e_2^2}{2}$. (9)

Differentiating (9) with respect to time and using (3)-(5) and (8), it is obtained that

$\dot{V}_1 = e_1\dot{e}_1 + e_2\dot{e}_2 = e_1(e_2 - c_1 e_1) + e_2(-c_2 e_2 - e_1) = -c_1 e_1^2 - c_2 e_2^2 \le 0$. (10)

Therefore, the ideal backstepping controller in (7) asymptotically stabilizes the system.
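Assuming f is known exactly, the ideal control law (7) can be checked in simulation. The following Python sketch is ours, not the paper's: the reference x_c(t) = sin t, the step size, and the gains c1 = c2 = 1 are illustrative choices.

```python
import numpy as np

# Ideal backstepping controller (7) on the Duffing system, f exactly known.
p, p1, p2, q, w = 0.4, -1.1, 1.0, 1.8, 1.8
c1, c2, dt = 1.0, 1.0, 0.001

x, v = 1.0, 1.0
for k in range(100000):                       # 100 s of simulated time
    t = k * dt
    xc, dxc, ddxc = np.sin(t), np.cos(t), -np.sin(t)
    f = -p * v - p1 * x - p2 * x ** 3 + q * np.cos(w * t)
    e1, de1 = x - xc, v - dxc                 # tracking error (2)-(3)
    alpha = dxc - c1 * e1                     # stabilizing function (4)
    e2 = v - alpha                            # (5)
    u = ddxc - f - c1 * de1 - c2 * e2 - e1    # ideal control law (7)
    a = f + u                                 # closed-loop acceleration
    x, v = x + dt * v, v + dt * a             # explicit-Euler step

print(abs(x - np.sin(100.0)))  # residual tracking error
```

Per (10), the error dynamics are exponentially stable, so after the transient the residual error is dominated only by the integration step size.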
4 Design of Intelligent Backstepping Controller

Since the chaotic dynamic function $f$ may be unknown in practical applications, the ideal backstepping controller (7) cannot be obtained precisely. To solve this problem, the SGFNN identifier and the design steps of the IBC system are described as follows.

4.1 SGFNN Identifier

A four-layer fuzzy neural network, which comprises the input (the $i$ layer), membership (the $j$ layer), rule (the $k$ layer), and output (the $o$ layer) layers, is adopted to implement the proposed SGFNN. The signal propagation and the basic function in each layer are as follows:

Layer 1 - Input layer: For every node $i$ in this layer, the net input and net output are

$net_i^1 = x_i^1$ (11)
$y_i^1 = f_i^1(net_i^1) = net_i^1$, $i = 1, 2$ (12)

where $x_i^1$ represents the $i$-th input to the node of layer 1.

Layer 2 - Membership layer: In this layer, each node performs a membership function and acts as an element for membership degree calculation, where the Gaussian function is adopted as the membership function. For the $j$-th node, the reception and activation functions are

$net_j^2 = -\dfrac{(x_i^2 - m_{ij})^2}{(\sigma_{ij})^2}$ (13)

$y_j^2 = f_j^2(net_j^2) = \exp(net_j^2)$, $j = 1, 2, \ldots, m$ (14)
where $m_{ij}$ and $\sigma_{ij}$ are the mean and standard deviation of the Gaussian function in the $j$-th term of the $i$-th input linguistic variable $x_i^2$, respectively; and $m$ is the total number of linguistic variables with respect to the input nodes.

Layer 3 - Rule layer: Each node $k$ in this layer is denoted by $\prod$, which multiplies the incoming signals and outputs the product. For the $k$-th rule node

$net_k^3 = \prod_j x_j^3$ (15)

$y_k^3 = f_k^3(net_k^3) = net_k^3$, $k = 1, 2, \ldots, n$ (16)

where $x_j^3$ represents the $j$-th input to the node of layer 3.

Layer 4 - Output layer: The single node $o$ in this layer is labeled $\sum$, which computes the overall output as the summation of all incoming signals

$net_o^4 = \sum_k w_k^4 x_k^4$ (17)

$y_o^4 = f_o^4(net_o^4) = net_o^4$, $o = 1$ (18)

where the link weight $w_k^4$ is the output action strength associated with the $k$-th rule; $x_k^4$ represents the $k$-th input to the node of layer 4; and $y_o^4$ is the output of the SGFNN. For ease of notation, define the vectors $\mathbf{m}$ and $\boldsymbol{\sigma}$ collecting all parameters of the SGFNN as

$\mathbf{m} = [m_{11}\ m_{21}\ m_{12}\ \cdots\ m_{2m}]^T$ (19)

$\boldsymbol{\sigma} = [\sigma_{11}\ \sigma_{21}\ \sigma_{12}\ \cdots\ \sigma_{2m}]^T$ (20)

Then, the output of the SGFNN can be represented in vector form as

$\hat{f} = \mathbf{w}^T \boldsymbol{\Phi}(\mathbf{m}, \boldsymbol{\sigma})$ (21)

where $\mathbf{w} = [w_1^4\ w_2^4\ \cdots\ w_n^4]^T$ and $\boldsymbol{\Phi} = [x_1^4\ x_2^4\ \cdots\ x_n^4]^T = [\Phi_1\ \Phi_2\ \cdots\ \Phi_n]^T$. According to the
universal approximation theorem, an optimal SGFNN approximator can be designed to approximate the chaotic system dynamics, such that [10]

$f = f^* + \Delta = \mathbf{w}^{*T} \boldsymbol{\Phi}^*(\mathbf{m}^*, \boldsymbol{\sigma}^*) + \Delta$ (22)
where $\Delta$ is the approximation error, $\mathbf{w}^*$ and $\boldsymbol{\Phi}^*$ are the optimal parameter vectors of $\mathbf{w}$ and $\boldsymbol{\Phi}$, respectively, and $\mathbf{m}^*$ and $\boldsymbol{\sigma}^*$ are the optimal parameters of $\mathbf{m}$ and $\boldsymbol{\sigma}$, respectively. Let the number of fuzzy rules be $n^*$ and divide the fuzzy rules into two parts. The first part contains $n$ neurons, which are the activated part, and the second part contains $n^* - n$ neurons, which do not exist yet. Thus, the optimal parameters $\mathbf{w}^*$, $\boldsymbol{\Phi}^*$, $\mathbf{m}^*$ and $\boldsymbol{\sigma}^*$ are partitioned as

$\mathbf{w}^* = \begin{bmatrix} \mathbf{w}^*_a \\ \mathbf{w}^*_i \end{bmatrix}$, $\boldsymbol{\Phi}^* = \begin{bmatrix} \boldsymbol{\Phi}^*_a \\ \boldsymbol{\Phi}^*_i \end{bmatrix}$, $\mathbf{m}^* = \begin{bmatrix} \mathbf{m}^*_a \\ \mathbf{m}^*_i \end{bmatrix}$ and $\boldsymbol{\sigma}^* = \begin{bmatrix} \boldsymbol{\sigma}^*_a \\ \boldsymbol{\sigma}^*_i \end{bmatrix}$ (23)

where the subscript $a$ denotes the activated parts and the subscript $i$ denotes the inactivated parts. Since these optimal parameters are unobtainable, the SGFNN identifier is defined as

$\hat{f} = \hat{\mathbf{w}}_a^T \hat{\boldsymbol{\Phi}}_a(\hat{\mathbf{m}}_a, \hat{\boldsymbol{\sigma}}_a)$ (24)

where $\hat{\mathbf{w}}_a$, $\hat{\boldsymbol{\Phi}}_a$, $\hat{\mathbf{m}}_a$ and $\hat{\boldsymbol{\sigma}}_a$ are the estimated values of $\mathbf{w}^*_a$, $\boldsymbol{\Phi}^*_a$, $\mathbf{m}^*_a$ and $\boldsymbol{\sigma}^*_a$, respectively. Define the estimation error $\tilde{f}$ as

$\tilde{f} = f - \hat{f} = \mathbf{w}_a^{*T}\boldsymbol{\Phi}^*_a + \mathbf{w}_i^{*T}\boldsymbol{\Phi}^*_i - \hat{\mathbf{w}}_a^T\hat{\boldsymbol{\Phi}}_a + \Delta = \tilde{\mathbf{w}}_a^T\hat{\boldsymbol{\Phi}}_a + \hat{\mathbf{w}}_a^T\tilde{\boldsymbol{\Phi}}_a + \tilde{\mathbf{w}}_a^T\tilde{\boldsymbol{\Phi}}_a + \mathbf{w}_i^{*T}\boldsymbol{\Phi}^*_i + \Delta$ (25)

where $\tilde{\mathbf{w}}_a = \mathbf{w}^*_a - \hat{\mathbf{w}}_a$ and $\tilde{\boldsymbol{\Phi}}_a = \boldsymbol{\Phi}^*_a - \hat{\boldsymbol{\Phi}}_a$. In the following, adaptive laws are proposed to tune on-line the means and standard deviations of the Gaussian functions of the SGFNN approximator so as to achieve favorable estimation of the dynamic function. To achieve this goal, the Taylor expansion linearization technique is employed to transform the nonlinear radial basis function into a partially linear form, i.e.

$\tilde{\boldsymbol{\Phi}}_a = \mathbf{A}^T\tilde{\mathbf{m}}_a + \mathbf{B}^T\tilde{\boldsymbol{\sigma}}_a + \mathbf{h}$ (26)

where $\mathbf{A} = \left[\dfrac{\partial\Phi_1}{\partial\mathbf{m}_a}\ \cdots\ \dfrac{\partial\Phi_n}{\partial\mathbf{m}_a}\right]\Big|_{\mathbf{m}_a=\hat{\mathbf{m}}_a}$, $\mathbf{B} = \left[\dfrac{\partial\Phi_1}{\partial\boldsymbol{\sigma}_a}\ \cdots\ \dfrac{\partial\Phi_n}{\partial\boldsymbol{\sigma}_a}\right]\Big|_{\boldsymbol{\sigma}_a=\hat{\boldsymbol{\sigma}}_a}$, $\mathbf{h}$ is a vector of higher-order terms, $\tilde{\mathbf{m}}_a = \mathbf{m}^*_a - \hat{\mathbf{m}}_a$, $\tilde{\boldsymbol{\sigma}}_a = \boldsymbol{\sigma}^*_a - \hat{\boldsymbol{\sigma}}_a$, and $\dfrac{\partial\Phi_k}{\partial\mathbf{m}_a}$ and $\dfrac{\partial\Phi_k}{\partial\boldsymbol{\sigma}_a}$ are defined as

$\dfrac{\partial\Phi_k}{\partial\mathbf{m}_a} = \left[\mathbf{0}_{(k-1)\times 2}\ \cdots\ \dfrac{\partial\Phi_k}{\partial m_{1k}}\ \dfrac{\partial\Phi_k}{\partial m_{2k}}\ \cdots\ \mathbf{0}_{(m-k)\times 2}\right]^T$ (27)

$\dfrac{\partial\Phi_k}{\partial\boldsymbol{\sigma}_a} = \left[\mathbf{0}_{(k-1)\times 2}\ \cdots\ \dfrac{\partial\Phi_k}{\partial\sigma_{1k}}\ \dfrac{\partial\Phi_k}{\partial\sigma_{2k}}\ \cdots\ \mathbf{0}_{(m-k)\times 2}\right]^T$ (28)

Substituting (26) into (25), it is obtained that

$\tilde{f} = \tilde{\mathbf{w}}_a^T\hat{\boldsymbol{\Phi}}_a + \tilde{\mathbf{m}}_a^T\mathbf{A}\hat{\mathbf{w}}_a + \tilde{\boldsymbol{\sigma}}_a^T\mathbf{B}\hat{\mathbf{w}}_a + \varepsilon$ (29)
where $\hat{\mathbf{w}}_a^T\mathbf{A}^T\tilde{\mathbf{m}}_a = \tilde{\mathbf{m}}_a^T\mathbf{A}\hat{\mathbf{w}}_a$ and $\hat{\mathbf{w}}_a^T\mathbf{B}^T\tilde{\boldsymbol{\sigma}}_a = \tilde{\boldsymbol{\sigma}}_a^T\mathbf{B}\hat{\mathbf{w}}_a$ are used since they are scalars; the uncertain term is $\varepsilon \equiv \hat{\mathbf{w}}_a^T\mathbf{h} + \tilde{\mathbf{w}}_a^T\tilde{\boldsymbol{\Phi}}_a + \mathbf{w}_i^{*T}\boldsymbol{\Phi}^*_i + \Delta$, which is assumed to be bounded by $0 \le |\varepsilon| \le E$, where $E$ is a positive constant representing the approximation error bound. However, it is difficult to measure this bound in practical applications. Thus, a bound estimation mechanism is developed to observe the bound of the approximation error. Define the estimation error of the bound

$\tilde{E} = E - \hat{E}$ (30)

where $\hat{E}$ is the estimated error bound.

4.2 Fuzzy Rule Generation

In general, the selection of the number of fuzzy rules is a trade-off between desired performance and computational load. If the number of fuzzy rules is too large, the computational load is heavy, which is unsuitable for practical applications. If the number of fuzzy rules is too small, the learning performance may not be good enough to achieve the desired performance. To tackle this problem, the proposed SGFNN identifier consists of structure and parameter learning phases. The first step of the structure learning phase is to determine whether to add a new node (membership function) in layer 2 and the associated fuzzy rule in layer 3. In the rule generating process, the mathematical description of the existing rules can be represented as the membership degree of the incoming data to the cluster. Since one cluster formed in the input space corresponds to one potential fuzzy logic rule, the firing strength of a rule for incoming data $x_i^1$ can be represented as the degree to which the incoming data belong to the cluster. The firing strength obtained from (16) is used as the degree measure
$\beta_k = y_k^3$, $k = 1, 2, \ldots, n(N)$ (31)

where $n(N)$ is the number of existing rules at time $N$. According to the degree measure, the criterion for generating a new fuzzy rule for new incoming data is as follows. Find the maximum degree

$\beta_{\max} = \max_{1 \le k \le n(N)} \beta_k$. (32)

The maximum degree $\beta_{\max}$ becomes smaller as the incoming data move farther from the existing fuzzy rules. If $\beta_{\max} \le \beta_{th}$, where $\beta_{th} \in (0, 1)$ is a pre-given threshold, then a new membership function is generated. The mean and standard deviation of the new membership function and the output action strength are selected as

$m_i^{new} = x_i^1$, $\sigma_i^{new} = \sigma_i$, $w^{new} = 0$ (33)

where $x_i^1$ is the new incoming data and $\sigma_i$ is a pre-specified constant. The number of rules is incremented as

$n(N+1) = n(N) + 1$. (34)
4.3 IBC Design

The proposed intelligent backstepping control (IBC) system is shown in Fig. 2; it comprises a neural backstepping controller $u_{nb}$ and a robust compensation controller $u_{rc}$. The design of the IBC for the chaotic dynamic system proceeds step by step as follows:

Step 1. Define the tracking error $e_1$ as in (2), the stabilizing function $\alpha$ as in (4), and $e_2$ as in (5).

Step 2. The control law of the IBC is

$u_{ic} = u_{nb} + u_{rc}$ (35)

where

$u_{nb} = \ddot{x}_c - \hat{f} - c_1\dot{e}_1 - c_2 e_2 - e_1$ (36)

$u_{rc} = -\hat{E}\,\mathrm{sgn}(e_2)$ (37)

$\mathrm{sgn}(\cdot)$ is the sign function and $\hat{f}$ is the output of the SGFNN. Substituting (35) into (6), it is obtained that

$\dot{e}_2 = f - \hat{f} - c_2 e_2 - e_1 + u_{rc}$. (38)

Substituting (29) into (38), equation (38) becomes

$\dot{e}_2 = \tilde{\mathbf{w}}_a^T\hat{\boldsymbol{\Phi}}_a + \tilde{\mathbf{m}}_a^T\mathbf{A}\hat{\mathbf{w}}_a + \tilde{\boldsymbol{\sigma}}_a^T\mathbf{B}\hat{\mathbf{w}}_a + \varepsilon - c_2 e_2 - e_1 + u_{rc}$ (39)
Step 3. Define the Lyapunov function as

$V_2 = \dfrac{e_1^2}{2} + \dfrac{e_2^2}{2} + \dfrac{\tilde{\mathbf{w}}_a^T\tilde{\mathbf{w}}_a}{2\eta_1} + \dfrac{\tilde{\mathbf{m}}_a^T\tilde{\mathbf{m}}_a}{2\eta_2} + \dfrac{\tilde{\boldsymbol{\sigma}}_a^T\tilde{\boldsymbol{\sigma}}_a}{2\eta_3} + \dfrac{\tilde{E}^2}{2\eta_4}$ (40)

where $\tilde{E} = E - \hat{E}$, and $\eta_1$, $\eta_2$, $\eta_3$ and $\eta_4$ are positive constants. Differentiating (40) with respect to time and using (39), it is obtained that

$\dot{V}_2 = e_1\dot{e}_1 + e_2\dot{e}_2 + \dfrac{\tilde{\mathbf{w}}_a^T\dot{\tilde{\mathbf{w}}}_a}{\eta_1} + \dfrac{\tilde{\mathbf{m}}_a^T\dot{\tilde{\mathbf{m}}}_a}{\eta_2} + \dfrac{\tilde{\boldsymbol{\sigma}}_a^T\dot{\tilde{\boldsymbol{\sigma}}}_a}{\eta_3} + \dfrac{\tilde{E}\dot{\tilde{E}}}{\eta_4}$
$= e_1(e_2 - c_1 e_1) + e_2\left(\tilde{\mathbf{w}}_a^T\hat{\boldsymbol{\Phi}}_a + \tilde{\mathbf{m}}_a^T\mathbf{A}\hat{\mathbf{w}}_a + \tilde{\boldsymbol{\sigma}}_a^T\mathbf{B}\hat{\mathbf{w}}_a + \varepsilon - c_2 e_2 - e_1 + u_{rc}\right) + \dfrac{\tilde{\mathbf{w}}_a^T\dot{\tilde{\mathbf{w}}}_a}{\eta_1} + \dfrac{\tilde{\mathbf{m}}_a^T\dot{\tilde{\mathbf{m}}}_a}{\eta_2} + \dfrac{\tilde{\boldsymbol{\sigma}}_a^T\dot{\tilde{\boldsymbol{\sigma}}}_a}{\eta_3} + \dfrac{\tilde{E}\dot{\tilde{E}}}{\eta_4}$
$= -c_1 e_1^2 - c_2 e_2^2 + \tilde{\mathbf{w}}_a^T\left(e_2\hat{\boldsymbol{\Phi}}_a + \dfrac{\dot{\tilde{\mathbf{w}}}_a}{\eta_1}\right) + \tilde{\mathbf{m}}_a^T\left(e_2\mathbf{A}\hat{\mathbf{w}}_a + \dfrac{\dot{\tilde{\mathbf{m}}}_a}{\eta_2}\right) + \tilde{\boldsymbol{\sigma}}_a^T\left(e_2\mathbf{B}\hat{\mathbf{w}}_a + \dfrac{\dot{\tilde{\boldsymbol{\sigma}}}_a}{\eta_3}\right) + e_2(\varepsilon + u_{rc}) + \dfrac{\tilde{E}\dot{\tilde{E}}}{\eta_4}$ (41)
If the adaptive laws of the SGFNN identifier and the approximation error bound are chosen as
$\dot{\hat{\mathbf{w}}}_a = -\dot{\tilde{\mathbf{w}}}_a = \eta_1 e_2 \hat{\boldsymbol{\Phi}}_a$ (42)

$\dot{\hat{\mathbf{m}}}_a = -\dot{\tilde{\mathbf{m}}}_a = \eta_2 e_2 \mathbf{A}\hat{\mathbf{w}}_a$ (43)

$\dot{\hat{\boldsymbol{\sigma}}}_a = -\dot{\tilde{\boldsymbol{\sigma}}}_a = \eta_3 e_2 \mathbf{B}\hat{\mathbf{w}}_a$ (44)

$\dot{\hat{E}} = -\dot{\tilde{E}} = \eta_4 |e_2|$ (45)
then (41) can be rewritten as

$\dot{V}_2 = -c_1 e_1^2 - c_2 e_2^2 + \varepsilon e_2 - \hat{E}|e_2| + \dfrac{\tilde{E}\dot{\tilde{E}}}{\eta_4} = -c_1 e_1^2 - c_2 e_2^2 + \varepsilon e_2 - E|e_2| \le -c_1 e_1^2 - c_2 e_2^2 \le 0.$ (46)

Similar to the discussion of (10), it can be concluded that $\tilde{\mathbf{w}}_a$, $\tilde{\mathbf{m}}_a$, $\tilde{\boldsymbol{\sigma}}_a$ and $\tilde{E}$ are bounded and that $e_1$ and $e_2$ converge to zero as $t \to \infty$.
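The adaptive laws (42)-(45) are continuous-time differential equations; in a digital implementation they would typically be Euler-integrated at each sampling step. The Python sketch below is our own hedged reading (the function name, sampling period, and explicit-Euler discretization are assumptions, not the paper's):

```python
import numpy as np

def adapt_step(w_hat, m_hat, s_hat, E_hat, Phi_hat, A, B, e2,
               eta=(20.0, 20.0, 20.0, 0.1), dt=0.001):
    """Euler-integrate the adaptive laws (42)-(45) over one sampling step.

    Phi_hat : current rule firing vector (n,)
    A, B    : Jacobians of Phi w.r.t. means / standard deviations
    """
    eta1, eta2, eta3, eta4 = eta
    w_hat = w_hat + dt * eta1 * e2 * Phi_hat        # (42)
    m_hat = m_hat + dt * eta2 * e2 * (A @ w_hat)    # (43), using updated weights
    s_hat = s_hat + dt * eta3 * e2 * (B @ w_hat)    # (44)
    E_hat = E_hat + dt * eta4 * abs(e2)             # (45)
    return w_hat, m_hat, s_hat, E_hat

w, m, s, E = np.zeros(3), np.zeros(6), np.ones(6), 0.0
w, m, s, E = adapt_step(w, m, s, E, np.array([0.2, 0.5, 0.3]),
                        np.zeros((6, 3)), np.zeros((6, 3)), e2=0.1)
print(w, E)  # weights move along Phi_hat; the error bound grows with |e2|
```

Note that, per (45), the bound estimate $\hat E$ is monotonically non-decreasing; practical implementations often add a leakage or dead-zone term to keep it from drifting under noise.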
Fig. 2. IBC for the chaotic system: the SGFNN identifier (24) with rule generation (31)-(33) and adaptive laws (42)-(44), the neural backstepping controller (36), the robust compensation controller (37), and the bound estimation algorithm (45)
5 Simulation Results

The IBC system has been tested on the above chaotic system to track a desired periodic orbit. The control parameters are selected as $c_1 = c_2 = 1$, $\eta_1 = \eta_2 = \eta_3 = 20$, $\eta_4 = 0.1$, $\sigma_i = 1.0$, and $\beta_{th} = 0.5$. These parameters were chosen to achieve favorable transient control performance, considering the requirement of asymptotic stability and the possible operating conditions. The simulation results of the IBC for $q = 1.8$ and $q = 7.0$ are shown in Figs. 3 and 4, respectively. They show that, thanks to the proposed self-organizing mechanism and online learning algorithms, the IBC achieves accurate tracking responses while keeping the fuzzy rule base concise.
Fig. 3. Simulation results of IBC for q = 1.8: (a), (b) states $x$ and commands $x_c$; (c) control efforts $u$; (d) rule numbers

Fig. 4. Simulation results of IBC for q = 7.0: (a), (b) states $x$ and commands $x_c$; (c) control efforts $u$; (d) rule numbers
6 Conclusions

In this paper, an intelligent backstepping control (IBC) system has been proposed for chaotic systems. The developed IBC system utilizes a self-growing fuzzy neural network identifier to estimate the chaotic dynamic function online. The control law of the IBC system is synthesized using a Lyapunov function, so that the asymptotic stability of the control system can be guaranteed. Finally, simulation results verified that the proposed IBC system achieves favorable tracking performance for nonlinear chaotic systems.
Acknowledgment The authors appreciate the partial financial support from the National Science Council of Republic of China under grant NSC-90-2213-E-155-016.
References

1. Lin, C.T., Lee, C.S.G.: Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems. Prentice-Hall, Englewood Cliffs, NJ (1996)
2. Juang, C.F., Lin, C.T.: An On-line Self-constructing Neural Fuzzy Inference Network and Its Applications. IEEE Trans. Fuzzy Syst. (1998) 12-32
3. Li, C., Lee, C.Y., Cheng, K.H.: Pseudo-error-based Self-organizing Neuro-fuzzy System. IEEE Trans. Fuzzy Syst. (2004) 812-819
4. Lin, C.T., Cheng, W.C., Liang, S.F.: An On-line ICA-mixture-model-based Self-constructing Fuzzy Neural Network. IEEE Trans. Circuits Syst. (2005) 207-221
5. Jiang, Z.P.: Advanced Feedback Control of the Chaotic Duffing Equation. IEEE Trans. Circuits Syst. (2002) 244-249
6. Yassen, M.T.: Chaos Control of Chen Chaotic Dynamical System. Chaos, Solitons & Fractals (2003) 271-283
7. Wang, J., Qiao, G.D., Deng, B.: H∞ Variable Universe Adaptive Fuzzy Control for Chaotic System. Chaos, Solitons & Fractals (2005) 1075-1086
8. Ji, J.C., Hansen, C.H.: Stability and Dynamics of a Controlled Van der Pol-Duffing Oscillator. Chaos, Solitons & Fractals (2006) 555-570
9. Slotine, J.E., Li, W.: Applied Nonlinear Control. Prentice-Hall, Englewood Cliffs, NJ (1991)
10. Wang, L.X.: Adaptive Fuzzy Systems and Control: Design and Stability Analysis. Prentice-Hall, Englewood Cliffs, NJ (1994)
Modeling of Rainfall-Runoff Relationship at the Semi-arid Small Catchments Using Artificial Neural Networks

Mustafa Tombul¹ and Ersin Oğul²

¹ Anadolu University, Engineering Faculty, Civil Engineering Department, Eskişehir, Turkey
² III. Regional Directorate of State Hydraulic Works, Eskişehir, Turkey
[email protected], [email protected]

Abstract. Artificial neural networks (ANNs) have recently been applied to various hydrologic problems. In this paper, an ANN model is employed to model the rainfall-runoff process on a semi-arid catchment, namely the Kurukavak catchment. The Kurukavak catchment, a sub-basin of the Sakarya basin in NW Turkey, has a drainage area of 4.25 km². The performance of the developed neural-network-based model was compared with a multiple-linear-regression-based model using the same observed data. It was found that the neural network model consistently gives good predictions. The conclusion is drawn that the ANN model can be used to predict flow for small semi-arid catchments.
alternative for rainfall-runoff modeling [5, 2, 15, 16, 17, 18, 19, 20, 21, 10, 22, 23, 24, 25, 26]. ANN models are powerful prediction tools for the relation between rainfall and runoff parameters, and their results support decision making in water resources planning and management. In these hydrological applications, a feedforward back-propagation algorithm is used [27]. The aim of this paper is to model the rainfall-runoff relationship in a semi-arid small catchment (Kurukavak) located in Turkey using a black-box model based on ANN methodology.
2 The Study Catchment

The Kurukavak catchment, a sub-basin of the Sakarya basin in north-west Turkey, has a drainage area of 4.25 km² and ranges in altitude from 830 m to 1070 m. The basin is equipped with three rain gauges (R1, R2 and R3) and one runoff recording station (H1) (Fig. 1). Daily rainfall (averaged over stations R1, R2 and R3) and runoff data were used for the model investigation. The data cover a period of four years (1988 to 1991), so the entire database comprises 1460 daily rainfall-runoff pairs. The ANN model was trained using these daily rainfall and runoff data. The database was collected by the Services of the Rural Investigation Institute.
Fig. 1. Location of Kurukavak catchment in Turkey
3 The Structure of the ANN

Artificial neural networks employ mathematical simulation of biological nervous systems to process acquired information and derive predictive outputs after the network has been properly trained for pattern recognition. ANN research focuses on modeling the brain as a parallel computational device for computational tasks at which traditional serial computers perform poorly.
The neural network structure in this study was a three-layer learning network consisting of an input layer, a hidden layer and an output layer containing the output variable(s) (Fig. 2). The input nodes pass the input signal values to the nodes in the hidden layer unprocessed. The values are distributed to all the nodes in the hidden layer according to the connection weights $W_{ij}$ and $W_{jk}$ [28-29] between the input node and the hidden nodes. Connection weights are the interconnecting links between the neurons in successive layers: each neuron in a given layer is connected to every neuron in the next layer by a link with an appropriate, adjustable connection weight.
Fig. 2. Architecture of the neural network model used in this study
In this study, the FFBP networks were trained using the Levenberg-Marquardt optimization technique, which is more powerful than conventional gradient descent techniques [30]. The study [31] showed that the Marquardt algorithm is very efficient when training networks with up to a few hundred weights. Although the computational requirements are much higher for each iteration of the Marquardt algorithm, this is more than made up for by the increased efficiency, especially when high precision is required. The feedforward back propagation (FFBP) network distinguishes itself by the presence of one or more hidden layers, whose computation nodes are correspondingly called hidden neurons or hidden units. The function of hidden neurons is to intervene between the external input and the network output in some useful manner.
4 Application of ANN in Rainfall-Runoff Modeling

The runoff at the watershed outlet is related not only to the current rainfall rate, but also to past rainfall and runoff, because the watershed has a certain storage capacity. For a discrete lumped hydrological system, the rainfall-runoff relationship can be generally expressed as [32, 5]

$Q(t) = F\left[R(t), R(t-\Delta t), \ldots, R(t - n_x\Delta t), Q(t-\Delta t), \ldots, Q(t - n_y\Delta t)\right]$ (1)
where $R$ represents rainfall, $Q$ represents runoff at the outlet of the watershed, $F$ is any kind of model structure (linear or nonlinear), $\Delta t$ is the data sampling interval, and $n_x$ and $n_y$ are positive integers reflecting the memory length of the watershed. In this study the simplex search method is used to find a set of optimum values for the weights used in the ANN, denoted by $w_{ij}^{opt}$, $0 \le i \le n$, $1 \le j \le l$, and $w_{jk}^{opt}$, $0 \le j \le l$, $1 \le k \le m$. The estimated runoffs, denoted by $\hat{Q}(t)$, are determined as a function of those optimum weights of the ANN:

$\hat{Q}(t) = F\left[R(t), R(t-\Delta t), \ldots, R(t - n_x\Delta t), Q(t-\Delta t), \ldots, Q(t - n_y\Delta t) \mid w_{ij}^{opt}, w_{jk}^{opt}\right]$ (2)

When the ANN is implemented to approximate the above relationship between the watershed average rainfall and runoff, there are $n = n_x + n_y + 1$ nodes in the input layer and only one node in the output layer, i.e. $m = 1$. The database collected represents four years of daily rainfall-runoff values for the Kurukavak basin. In this paper, we used the data for the last year (1991) for model testing, while the remaining data (1988 to 1990) were used for model training/calibration. The training phase of the ANN model was terminated when the root mean squared error (RMSE) on the testing database was minimal. The flow estimation simulations were carried out in two steps. First, only rainfall data were employed in the input layer. Then the previous daily flow value was also incorporated into the input data group. As indicated in [17], a noticeable improvement in estimation performance is obtained by incorporating the flow value into the input layer. In the present study, the flow of the preceding day ($Q_{t-1}$) was therefore also added to the input layer to increase estimation performance.
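Building the input layer of (1)-(2) amounts to assembling, for each day, the current and lagged rainfall values plus lagged runoff values. A Python sketch (the helper name and the toy series are illustrative, not the Kurukavak data):

```python
import numpy as np

def make_lagged_inputs(R, Q, nx=1, ny=1):
    """Build the input matrix of (1): current/past rainfall plus past runoff.

    Each row is [R(t), R(t-1), ..., R(t-nx), Q(t-1), ..., Q(t-ny)];
    the target is Q(t).  R and Q are daily series of equal length.
    """
    start = max(nx, ny)
    X, y = [], []
    for t in range(start, len(R)):
        row = [R[t - i] for i in range(nx + 1)] \
            + [Q[t - j] for j in range(1, ny + 1)]
        X.append(row)
        y.append(Q[t])
    return np.array(X), np.array(y)

R = np.arange(10.0)          # toy daily rainfall series
Q = 0.5 * np.arange(10.0)    # toy daily runoff series
X, y = make_lagged_inputs(R, Q, nx=2, ny=1)
print(X.shape, y.shape)      # (8, 4) inputs and 8 targets
```

The resulting matrix has $n_x + n_y + 1$ columns, matching the number of input nodes stated in the text, and can be fed to any feedforward network trainer.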
5 Evaluation Measures of Model Performance

The results of the network model (FFBP) applied in this study were evaluated by estimating the following standard global statistical measures: the root mean squared error (RMSE), the coefficient of determination (R2), and the index of volumetric fit (IVF). The RMSE and R2 are well-known criteria; the index of volumetric fit (IVF) is defined as

IVF = Σ(i=1..N) Qsim,i / Σ(i=1..N) Qobs,i    (3)
Modeling of Rainfall-Runoff Relationship at the Semi-arid Small Catchments
where Qobs,i and Qsim,i are, respectively, the actual and predicted flow values (normalized between 0 and 1). The coefficient of determination (R2) measures the linear correlation between the actual and predicted flow values and is often used to measure the performance of a hydrological model. Its value lies in the range (−∞, 1]. A value of zero means the model performs no better than a naive prediction, that is, a prediction using the average observed value; a value below zero means the model performs worse than the average observed value. A value between 0.6 and 0.8 is moderate to good, a value above 0.8 is a good fit, and a value of one is a perfect fit. The RMSE was used to measure the agreement between the observed and simulated water balance; the closer the RMSE is to zero, the better the performance of the model. The other index employed to assess model performance is the simple index of volumetric fit (IVF), expressed as the ratio of the simulated runoff volume to the corresponding observed one. An IVF value of one is a perfect fit.
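The three criteria follow directly from their definitions; the minimal sketch below (our own helper names, not the authors' code) computes them, with R2 in the mean-benchmark form described in the text.

```python
import math

# Hedged sketch of the three evaluation measures used in this section.

def rmse(obs, sim):
    """Root mean squared error between observed and simulated flows."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def r2(obs, sim):
    """Coefficient of determination relative to the mean of the observations:
    1 for a perfect fit, 0 for the naive mean prediction, negative if worse."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def ivf(obs, sim):
    """Index of volumetric fit: simulated over observed runoff volume (1 = perfect)."""
    return sum(sim) / sum(obs)
```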
6 Results and Discussions

The goal of the training process is to reach an optimal solution based on performance measures such as the RMSE, the coefficient of determination known as the R-square value (R2), and the IVF. The required ANN model was therefore developed in two phases: a training (calibration) phase and a testing (generalization or validation) phase. In the training phase, the larger part of the database (three years) was used to train the network, and the remaining part (one year) was used in the testing phase. Testing sets are usually used to select the best-performing network model. In this research, the ANN was optimal at 50 iterations with 4 hidden nodes. The corresponding accuracy measures of this network model on the testing and training data are given in Table 1. Generally, accuracy measures on training data are better than those on testing data.

Table 1. Statistical parameters and accuracy measures of the network model in the training and testing phases
        Training phase    Testing phase
RMSE    0.021             0.072
R2      0.75              0.726
IVF     1.02              1.03
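A minimal sketch of the derivative-free weight fitting described above follows. Note the hedge: a simple random-perturbation hill climb stands in for the Nelder-Mead simplex search actually used in the paper, and the network size, names, and data are all illustrative.

```python
import math
import random

def forward(w, x, hidden):
    """Feedforward net: len(x) inputs -> `hidden` sigmoid nodes -> 1 linear output.
    Weight layout: per hidden node, len(x) input weights then a bias;
    then `hidden` output weights; the last entry is the output bias."""
    n = len(x)
    out = 0.0
    for h in range(hidden):
        base = h * (n + 1)
        s = w[base + n] + sum(w[base + i] * x[i] for i in range(n))
        act = 1.0 / (1.0 + math.exp(-s))
        out += w[hidden * (n + 1) + h] * act
    return out + w[-1]

def fit(X, y, hidden=4, iters=2000, seed=0):
    """Derivative-free search minimizing RMSE (stand-in for the simplex method)."""
    rng = random.Random(seed)
    n = len(X[0])
    w = [rng.uniform(-0.5, 0.5) for _ in range(hidden * (n + 1) + hidden + 1)]

    def cost(weights):
        return math.sqrt(sum((forward(weights, x, hidden) - t) ** 2
                             for x, t in zip(X, y)) / len(y))

    best = cost(w)
    for _ in range(iters):
        cand = [wi + rng.gauss(0.0, 0.05) for wi in w]  # perturb all weights
        c = cost(cand)
        if c < best:          # keep only improving moves
            w, best = cand, c
    return w, best
```

In the paper's setting, X would hold the lagged rainfall/flow vectors, hidden = 4 as reported, and the search would stop when the testing-set RMSE reaches its minimum.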
The comparison between the predicted and actual flow values in the training and testing phases shows good agreement, with R2 values of 0.75 and 0.726, respectively (Figures 3a, 4a). Regarding the volumetric fit, the IVF is 1.02 in the calibration period and 1.03 in the verification period. From these results there is no doubt that the ANN model is very successful in simulating the non-linear rainfall-runoff relationship of the Kurukavak catchment. The root mean square error (RMSE) for the training and testing periods was considered for performance evaluation, and all testing-stage estimates were plotted in the form of hydrographs (Figures 3b, 4b).

M. Tombul and E. Oğul
Fig. 3a. Comparison between the actual and ANN-predicted flow values for the training phase (scatter of predicted vs. actual flow in m3/s; R2 = 0.75)

Fig. 3b. Comparison between the actual and ANN-predicted flow values for the training phase (observed and FFBP-simulated flow in m3/s against time in days)
The statistical parameters of the predicted and actual flow values for the entire database are also practically identical (Table 2). In order to evaluate the performance of the ANN, the multiple linear regression (MLR) technique was applied to the same data sets used in the ANN model. Figure 5 shows the comparative results obtained by the MLR technique. The R2 values for the MLR and ANN models are presented in Table 3. Apparently, the ANN approach gives much better predictions than the traditional method (MLR).
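The MLR benchmark amounts to ordinary least squares. A self-contained sketch (our own implementation via the normal equations, not the authors' code) is:

```python
# Hypothetical sketch of the multiple linear regression (MLR) benchmark:
# ordinary least squares solved via the normal equations (A^T A) b = A^T y,
# using Gaussian elimination with partial pivoting.

def mlr_fit(X, y):
    """Return coefficients b for y ~ b0 + b1*x1 + ... (intercept first)."""
    A = [[1.0] + list(row) for row in X]      # design matrix with intercept column
    k = len(A[0])
    M = [[sum(A[r][i] * A[r][j] for r in range(len(A))) for j in range(k)]
         for i in range(k)]                   # A^T A
    v = [sum(A[r][i] * y[r] for r in range(len(A))) for i in range(k)]  # A^T y
    for col in range(k):                      # forward elimination
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k):
                M[r][c] -= f * M[col][c]
            v[r] -= f * v[col]
    b = [0.0] * k                             # back substitution
    for i in range(k - 1, -1, -1):
        b[i] = (v[i] - sum(M[i][j] * b[j] for j in range(i + 1, k))) / M[i][i]
    return b

def mlr_predict(b, x):
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
```

Fitted on the same lagged rainfall/flow inputs as the ANN, such a model provides the linear baseline against which the ANN's R2 values are compared in Table 3.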
Fig. 4a. Comparison between the actual and ANN-predicted flow values for the testing phase (scatter of predicted vs. actual flow in m3/s; R2 = 0.7263)
Fig. 4b. Comparison between the actual and ANN-predicted flow values for the testing phase (observed and FFBP-simulated daily mean flow in m3/s against time in days)

Table 2. Statistical parameters of the predicted and actual flow in the training and testing phases

Statistical parameter       Training phase           Testing phase
                            Actual     Predicted     Actual     Predicted
                            (m3/s)     (m3/s)        (m3/s)     (m3/s)
Minimum                     0          0             1E-5       0
Maximum                     0.42157    0.4099        1.4817     1.3880
Mean                        0.0152     0.0157        0.0267     0.0273
Standard deviation          0.0419     0.0371        0.1379     0.1200
Coefficient of variation    2.75       2.36          5.07       4.44
Table 3. Comparison of determination coefficients (R2)

                  FFBP (ANN)    MLR
Training phase    0.75          0.66
Testing phase     0.72          0.60
Fig. 5. Comparison between the actual and MLR-predicted flow values: (a) training phase (R2 = 0.6598), (b) testing phase (R2 = 0.6028)
7 Conclusion

The results obtained in this study clearly show that artificial neural networks are capable of modeling the rainfall-runoff relationship in small semi-arid catchments in which rainfall and runoff are very irregular, thus confirming the general enhancement achieved by using neural networks in many other hydrological fields. The results and the comparative study indicate that the artificial neural network method is more suitable for predicting runoff in small semi-arid catchments than the classical regression model. The ANN approach could provide a very useful and accurate tool for solving problems in water resources studies and management.
References

1. Bertoni, J. C., Tucci, C. E., Clarke, R. T.: Rainfall-based Real-time Flood Forecasting. J. Hydrol., 131 (1992) 313–339
2. Shamseldin, A. Y.: Application of Neural Network Technique to Rainfall-runoff Modeling. J. Hydrol., 199 (1997) 272–294
3. Rajurkar, M. P., Chaube, U. C.: Artificial Neural Networks for Daily Rainfall-runoff Modeling. Hydrol. Sci. J., 47 (6) (2002) 865–877
5. Riad, S., Mania, J., Bouchaou, L., Najjar, Y.: Predicting Catchment Flow in a Semi-arid Region via Artificial Neural Network Technique. Hydrological Processes, 18 (2004) 2387–2393
6. Hsu, K. L., Gupta, H. V., Sorooshian, S.: Artificial Neural Network Modeling of the Rainfall-runoff Process. Water Resour. Res., 31 (10) (1995) 2517–2530
7. Chang, F. J., Suen, J. P.: A Study of the Artificial Neural Network for Rainfall-runoff Process. Journal of Chinese Agricultural Engineering (in Chinese), 43 (1) (1997) 9–25
8. Smith, J., Eli, R. N.: Neural-network Models of Rainfall-runoff Process. J. Water Resour. Plan. Manage., 121 (6) (1995) 499–508
9. Thirumalaiah, K., Deo, M. C.: River Stage Forecasting Using Artificial Neural Networks. J. Hydrologic Eng., 3 (1) (1998) 26–31
10. Thirumalaiah, K., Deo, M. C.: Hydrological Forecasting Using Artificial Neural Networks. J. Hydrologic Eng., 5 (2) (2000) 180–189
11. Campolo, M., Andreussi, P., Soldati, A.: River Flood Forecasting with a Neural Network Model. Water Resour. Res., 35 (4) (1999) 1191–1197
12. Imrie, C. E., Durucan, S., Korre, A.: River Flow Prediction Using Artificial Neural Networks: Generalization Beyond the Calibration Range. J. Hydrol., 233 (2000) 138–153
13. Liong, S. Y., Lim, W., Paudyal, G. N.: River Stage Forecasting in Bangladesh: Neural Network Approach. J. Comput. Civ. Eng., 14 (1) (2000) 1–18
14. Tokar, A. S., Markus, M.: Precipitation-runoff Modeling Using Artificial Neural Networks and Conceptual Models. J. Hydrologic Eng., 5 (2) (2000) 156–161
15. Kim, G. S., Barros, A. P.: Quantitative Flood Forecasting Using Multisensor Data and Neural Networks. J. Hydrol., 246 (2001) 45–62
16. Sajikumar, N., Thandaveswara, B. S.: A Non-linear Rainfall-runoff Model Using an Artificial Neural Network. J. Hydrol., 216 (1999) 32–55
17. Tokar, A. S., Johnson, P. A.: Rainfall-runoff Modeling Using Artificial Neural Networks. J. Hydrol. Eng., ASCE, 4 (3) (1999) 232–239
18. Cigizoglu, H. K., Alp, M.: Rainfall-Runoff Modeling Using Three Neural Network Methods. Artificial Intelligence and Soft Computing – ICAISC 2004, Lecture Notes in Artificial Intelligence, 3070 (2004) 166–171
19. Anctil, F., Perrin, C., Andreassian, V.: Impact of the Length of Observed Records on the Performance of ANN and of Conceptual Parsimonious Rainfall-runoff Forecasting Models. Environ. Modell. Software, 19 (2004) 357–368
20. Freiwan, M., Cigizoglu, H. K.: Prediction of Total Monthly Rainfall in Jordan Using Feed Forward Backpropagation Method. Fresenius Environmental Bulletin, 14 (2) (2005) 142–151
21. Thirumalaiah, K., Deo, M. C.: Real-time Flood Forecasting Using Neural Networks. Computer-Aided Civil Infrastruct. Engng., 13 (2) (1998) 101–111
22. Zealand, C. M., Burn, D. H., Simonovic, S. P.: Short Term Streamflow Forecasting Using Artificial Neural Networks. J. Hydrol., 214 (1999) 32–48
23. Salas, J. D., Markus, M., Tokar, A. S.: Streamflow Forecasting Based on Artificial Neural Networks. In: Govindaraju, R. S., Rao, A. R. (eds.): Artificial Neural Networks in Hydrology. Kluwer Academic Publishers (2000)
24. Sivakumar, B., Jayawardena, A. W., Fernando, T. M. K. G.: River Flow Forecasting: Use of Phase Space Reconstruction and Artificial Neural Networks Approaches. J. of Hydrology, 265 (2002) 225–245
25. Cigizoglu, H. K.: Estimation, Forecasting and Extrapolation of Flow Data by Artificial Neural Networks. Hydrological Sciences Journal, 48 (3) (2003) 349–361
26. Cigizoglu, H. K.: Incorporation of ARMA Models into Flow Forecasting by Artificial Neural Networks. Environmetrics, 14 (4) (2003) 417–427
27. Kisi, O.: River Flow Modeling Using Artificial Neural Networks. ASCE J. of Hydrologic Engineering, 9 (1) (2004) 60–63
28. Lippmann, R. P.: An Introduction to Computing with Neural Nets. IEEE ASSP Magazine (1987) 4–22
29. Najjar, Y., Ali, H.: On the Use of BPNN in Liquefaction Potential Assessment Tasks. In: Attoh-Okine (ed.): Artificial Intelligence and Mathematical Methods in Pavement and Geomechanical Systems (1998) 55–63
30. Najjar, Y., Zhang, X.: Characterizing the 3D Stress-strain Behavior of Sandy Soils: A Neuro-mechanistic Approach. In: Filz, G., Griffiths, D. (eds.): ASCE Geotechnical Special Publication No. 96 (2000) 43–57
31. Cigizoglu, H. K., Kisi, O.: Flow Prediction by Two Back Propagation Techniques Using k-fold Partitioning of Neural Network Training Data. Nordic Hydrology (in press) (2005)
32. Hagan, M. T., Menhaj, M. B.: Training Feedforward Networks with the Marquardt Algorithm. IEEE Transactions on Neural Networks, 5 (6) (1994) 989–993
33. Chow, V. T., Maidment, D. R., Mays, L. W.: Applied Hydrology. McGraw-Hill, Inc., NY (1988)
A Novel Multi-agent Based Complex Process Control System and Its Application

Yi-Nan Guo, Jian Cheng, Dun-wei Gong, and Jian-hua Zhang

College of Information and Electronic Engineering, China University of Mining and Technology, Xuzhou 221008, Jiangsu, China
[email protected]
Abstract. Complex process control systems need a hybrid control mode, which combines a hierarchical structure with decentralized control units. The autonomy of agents and the cooperation capability between agents in a multi-agent system provide the basis for realizing this hybrid control mode. A novel multi-agent based complex process control system is proposed. A semantic representation of a control-agent is presented utilizing agent-oriented programming. A novel temporal logic analysis of a control-agent is proposed using Petri nets. Collaboration relationships among control-agents are analyzed based on an extended contract net protocol, addressing a gap in reference [1]. Taking the pressure control of recycled gas with complicated disturbances as an application, five kinds of control-agents are derived from the base control-agent. The reachable marking tree and the transitions of each derived control-agent are analyzed in detail. Actual running results indicate that the multi-agent based hybrid control mode is rational and flexible. Temporal logic analysis based on Petri nets ensures the reachability of the system. The extended contract net protocol provides a reasonable realization of the collaboration relationships.
analyze complex processes. But horizontal division of bottom-level control functions was not included. Although decentralized control is horizontally distributed, it lacks cooperation between control units, which often leads to sub-optimal behavior [4]. It is obvious that an architecture combining a hierarchical structure with decentralized control units is reasonable. But there is a lack of appropriate control theories for analyzing this control mode. A multi-agent system (MAS) consists of multiple autonomous agents linked by some cooperation mechanism. Each agent can implement tasks independently; through cooperation and interaction among agents, a MAS can accomplish complex tasks and their optimization. It is obvious that MAS provide the foundation for realizing the above control mode. Up to now, MAS have been applied to the analysis of complex process control systems by many researchers. Hybrid control systems based on multiple agents were proposed [5]-[6]. Breemen utilized the decompose-conquer strategy of MAS to decompose a complex problem into many sub-problems. Each sub-problem was solved by an agent, and the whole task was accomplished by cooperation among agents [7]-[10]. This provides a design foundation and a common framework for complex process control systems. But there is a lack of semantic representation of agents, of temporal logic analysis, and of an implementation of their collaboration relationships based on agent-oriented programming. Thereby, a novel multi-agent based complex process control system is put forward in this paper. It makes the best of the autonomy of agents and the cooperation capability among agents to realize complex process control, which makes systems flexible and open.
In the rest of the paper, the kernel of control-agents and the collaboration strategies between them are proposed in Section 2. To validate the rationality of the system, it is applied to the pressure control system of recycled gas in Section 3. At last, future work planned to extend the cooperation strategies is outlined.
2 Multi-agent Based Complex Process Control Systems

Multi-agent based complex process control systems adopt a hybrid control mode, which combines a hierarchical structure with decentralized control units. How to decompose control functions, how to realize each control unit, and how to cooperate among control units are the key problems. To address these problems, the decompose-conquer strategy, agent-oriented programming, Petri nets (PNs), and the contract net protocol are introduced. The decompose-conquer strategy is adopted to simplify the design of complex process control systems. In this strategy, division and integration are the two key problems. Division is how to separate a complex process control problem into a group of control sub-problems according to the requirements of control. Each control sub-problem is solved by an agent, called a control-agent. Integration is how to synthesize the solutions of the sub-problems effectively. A division method has been presented by Breemen [1]. In this paper, we emphasize particularly the semantic representation of control-agents adopting agent-oriented programming and the temporal logic analysis of their cooperation relationships utilizing the contract net protocol.
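The announce-bid-award cycle at the core of the contract net protocol mentioned above can be sketched as follows; all class names, the cost model, and the task format are hypothetical illustrations, not the paper's implementation.

```python
# Hypothetical sketch of the contract net protocol: a manager announces a task,
# capable control-agents bid, and the task is awarded to the cheapest bidder.

class ControlAgent:
    def __init__(self, name, capability):
        self.name = name
        self.capability = capability      # task type this agent can handle

    def bid(self, task):
        """Return a cost bid for the announced task, or None to decline."""
        if task["type"] != self.capability:
            return None
        return task["load"] / 2.0         # toy cost model for illustration

class Manager:
    def announce(self, task, agents):
        """Broadcast the task, collect bids, and award to the lowest bidder."""
        bids = [(a.bid(task), a) for a in agents]
        bids = [(cost, a) for cost, a in bids if cost is not None]
        if not bids:
            return None                   # no capable contractor found
        cost, winner = min(bids, key=lambda pair: pair[0])
        return winner.name

agents = [ControlAgent("pressure-1", "pressure"),
          ControlAgent("pressure-2", "pressure"),
          ControlAgent("flow-1", "flow")]
winner = Manager().announce({"type": "pressure", "load": 4.0}, agents)
```

An extended protocol, as in the paper, would add further states (e.g. renegotiation or cancellation) on top of this basic cycle.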
2.1 Structure of Control-Agent

All control-agents have the same structure and basic functions. So the common kernel of control-agents, called the base class of control-agents, is abstracted and described using agent-oriented programming as follows [11]: ::=<function FD> ::=<structure description SD> <environment description ED> ::=<symbol> ::=<waiting> <ED>::=