Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board David Hutchison Lancaster University, UK Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA Josef Kittler University of Surrey, Guildford, UK Jon M. Kleinberg Cornell University, Ithaca, NY, USA Alfred Kobsa University of California, Irvine, CA, USA Friedemann Mattern ETH Zurich, Switzerland John C. Mitchell Stanford University, CA, USA Moni Naor Weizmann Institute of Science, Rehovot, Israel Oscar Nierstrasz University of Bern, Switzerland C. Pandu Rangan Indian Institute of Technology, Madras, India Bernhard Steffen TU Dortmund University, Germany Madhu Sudan Microsoft Research, Cambridge, MA, USA Demetri Terzopoulos University of California, Los Angeles, CA, USA Doug Tygar University of California, Berkeley, CA, USA Gerhard Weikum Max Planck Institute for Informatics, Saarbruecken, Germany
6677
Derong Liu Huaguang Zhang Marios Polycarpou Cesare Alippi Haibo He (Eds.)
Advances in Neural Networks – ISNN 2011 8th International Symposium on Neural Networks, ISNN 2011 Guilin, China, May 29 – June 1, 2011 Proceedings, Part III
13
Volume Editors Derong Liu Chinese Academy of Sciences, Institute of Automation Key Laboratory of Complex Systems and Intelligence Science Beijing 100190, China E-mail:
[email protected] Huaguang Zhang Northeastern University, College of Information Science and Engineering Shenyang, Liaoing 110004, China E-mail:
[email protected] Marios Polycarpou University of Cyprus, Dept. of Electrical and Computer Engineering 75 Kallipoleos Avenue, 1678 Nicosia, Cyprus E-mail:
[email protected] Cesare Alippi Politecnico di Milano, Dip. di Elettronica e Informazione Piazza L. da Vinci 32, 20133 Milano, Italy E-mail:
[email protected] Haibo He University of Rhode Island Dept. of Electrical, Computer and Biomedical Engineering Kingston, RI 02881, USA E-mail:
[email protected] ISSN 0302-9743 e-ISSN 1611-3349 ISBN 978-3-642-21110-2 e-ISBN 978-3-642-21111-9 DOI 10.1007/978-3-642-21111-9 Springer Heidelberg Dordrecht London New York Library of Congress Control Number: 2011926887 CR Subject Classification (1998): F.1, F.2, D.1, G.2, I.2, C.2, I.4-5, J.1-4 LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues © Springer-Verlag Berlin Heidelberg 2011 This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
Preface
ISNN 2011 – the 8th International Symposium on Neural Networks – was held in Guilin, China, as a sequel of ISNN 2004 (Dalian), ISNN 2005 (Chongqing), ISNN 2006 (Chengdu), ISNN 2007 (Nanjing), ISNN 2008 (Beijing), ISNN 2009 (Wuhan), and ISNN 2010 (Shanghai). ISNN has now become a well-established conference series on neural networks in the region and around the world, with growing popularity and increasing quality. Guilin is regarded as the most picturesque city in China. All participants of ISNN 2011 had a technically rewarding experience as well as memorable experiences in this great city. ISNN 2011 aimed to provide a high-level international forum for scientists, engineers, and educators to present the state of the art of neural network research and applications in diverse fields. The symposium featured plenary lectures given by worldwide renowned scholars, regular sessions with broad coverage, and some special sessions focusing on popular topics. The symposium received a total of 651 submissions from 1,181 authors in 30 countries and regions across all six continents. Based on rigorous reviews by the Program Committee members and reviewers, 215 high-quality papers were selected for publication in the symposium proceedings. We would like to express our sincere gratitude to all reviewers of ISNN 2011 for the time and effort they generously gave to the symposium. We are very grateful to the National Natural Science Foundation of China, the Institute of Automation of the Chinese Academy of Sciences, the Chinese University of Hong Kong, and the University of Illinois at Chicago for their financial support. We would also like to thank the publisher, Springer, for cooperation in publishing the proceedings in the prestigious series of Lecture Notes in Computer Science. Guilin, May 2011
Derong Liu Huaguang Zhang Marios Polycarpou Cesare Alippi Haibo He
ISNN 2011 Organization
General Chairs Marios M. Polycarpou Paul J. Werbos
University of Cyprus, Cyprus National Science Foundation, USA
Advisory Committee Chairs Ruwei Dai Bo Zhang
Chinese Academy of Sciences, China Tsinghua University, China
Advisory Committee Members Hojjat Adeli Shun-ichi Amari Dimitri P. Bertsekas Amit Bhaya Tianyou Chai Guanrong Chen Andrzej Cichocki Jay Farrell Russell Eberhart David B. Fogel Walter J. Freeman Kunihiko Fukushima Marco Gilli Aike Guo Zhenya He Tom Heskes Janusz Kacprzyk Nikola Kasabov Okyay Kaynak Anthony Kuh Deyi Li Yanda Li Chin-Teng Lin Robert J. Marks II Erkki Oja Nikhil R. Pal Jose C. Pr´ıncipe
Ohio State University, USA RIKEN Brain Science Institute, Japan Massachusetts Institute of Technology, USA Federal University of Rio de Janeiro, Brazil Northeastern University, China City University of Hong Kong, Hong Kong RIKEN Brain Science Institute, Japan University of California, Riverside, USA Indiana University-Purdue University, USA Natural Selection, Inc., USA University of California-Berkeley, USA Kansai University, Japan Politecnico di Torino, Italy Chinese Academy of Sciences, China Southeast University, China Radboud University Nijmegen, The Netherlands Polish Academy of Sciences, Poland Auckland University of Technology, New Zealand Bogazici University, Turkey University of Hawaii, USA National Natural Science Foundation of China, China Tsinghua University, China National Chiao Tung University, Taiwan Baylor University, USA Helsinki University of Technology, Finland Indian Statistical Institute, India University of Florida, USA
VIII
ISNN 2011 Organization
Leszek Rutkowski Jennie Si Youxian Sun DeLiang Wang Fei-Yue Wang Shoujue Wang Zidong Wang Cheng Wu Donald Wunsch II Lei Xu Shuzi Yang Xin Yao Gary G. Yen Nanning Zheng Jacek M. Zurada
Czestochowa University of Technology, Poland Arizona State University, USA Zhejiang University, China Ohio State University, USA Chinese Academy of Sciences, China Chinese Academy of Sciences, China Brunel University, UK Tsinghua University, Beijing, China Missouri University of Science and Technology, USA The Chinese University of Hong Kong, Hong Kong Huazhong University of Science and Technology, China University of Birmingham, UK Oklahoma State University, USA Xi’An Jiaotong University, China University of Louisville, USA
Steering Committee Chair Jun Wang
Chinese University of Hong Kong, Hong Kong
Steering Committee Members Jinde Cao Shumin Fei Min Han Xiaofeng Liao Bao-Liang Lu Yi Shen Fuchun Sun Hongwei Wang Zongben Xu Zhang Yi Wen Yu
Southeast University, China Southeast University, China Dalian University of Technology, China Chongqing University, China Shanghai Jiao Tong University, China Huazhong University of Science and Technology, China Tsinghua University, China Huazhong University of Science and Technology, China Xi’An Jiaotong University, China Sichuan University, China National Polytechnic Institute, Mexico
Organizing Committee Chairs Derong Liu Huaguang Zhang
Chinese Academy of Sciences, China Northeastern University, China
ISNN 2011 Organization
IX
Program Chairs Cesare Alippi Bhaskhar DasGupta Sanqing Hu
Politecnico di Milano, Italy University of Illinois at Chicago, USA Hangzhou Dianzi University, China
Plenary Sessions Chairs Frank L. Lewis Changyin Sun
University of Texas at Arlington, USA Southeast University, China
Special Sessions Chairs Amir Hussain Jinhu Lu Stefano Squartini Liang Zhao
University of Stirling, UK Chinese Academy of Sciences, China Universit`a Politecnica delle Marche, Italy University of Sao Paulo, Brazil
Finance Chairs Hairong Dong Cong Wang Zhigang Zeng Dongbin Zhao
Beijing Jiaotong University, China South China University of Technology, China Huazhong University of Science and Technology, China Chinese Academy of Sciences, China
Publicity Chairs Zeng-Guang Hou Manuel Roveri Songyun Xie Nian Zhang
Chinese Academy of Sciences, China Politecnico di Milano, Italy Northwestern Polytechnical University, China University of the District of Columbia, USA
European Liaisons Danilo P. Mandic Alessandro Sperduti
Imperial College London, UK University of Padova, Italy
Publications Chairs Haibo He Wenlian Lu Yunong Zhang
Stevens Institute of Technology, USA Fudan University, China Sun Yat-sen University, China
X
ISNN 2011 Organization
Registration Chairs Xiaolin Hu Zhigang Liu Qinglai Wei
Tsinghua University, China Southwest Jiaotong University, China Chinese Academy of Sciences, China
Local Arrangements Chairs Xuanju Dang Xiaofeng Lin Yong Xu
Guilin University of Electronic Technology, China Guangxi University, China Guilin University of Electronic Technology, China
Electronic Review Chair Tao Xiang
Chongqing University, China
Symposium Secretariat Ding Wang
Chinese Academy of Sciences, China
ISNN 2011 International Program Committee Jose Aguilar Haydar Akca Angelo Alessandri Lu´ıs Alexandre Bruno Apolloni Marco Antonio Moreno Armend´ariz K. Vijayan Asari Amir Atiya Monica Bianchini Salim Bouzerdoum Ivo Bukovsky Xindi Cai Jianting Cao M. Emre Celebi Jonathan Hoyin Chan Ke Chen Songcan Chen YangQuan Chen Yen-Wei Chen Zengqiang Chen
Universidad de los Andes, Venezuela United Arab Emirates University, UAE University of Genoa, Italy Universidade da Beira Interior, Portugal University of Milan, Italy Instituto Politecnico Nacional, Mexico Old Dominion University, USA Cairo University, Egypt Universit`a degli Studi di Siena, Italy University of Wollongong, Australia Czech Technical University, Czech Republic APC St. Louis, USA Saitama Institute of Technology, Japan Louisiana State University, USA King Mongkut’s University of Technology, Thailand University of Manchester, UK Nanjing University of Aeronautics and Astronautics, China Utah State University, USA Ritsumeikan University, Japan Nankai University, China
ISNN 2011 Organization
Jianlin Cheng Li Cheng Long Cheng Xiaochun Cheng Sung-Bae Cho Pau-Choo Chung Jose Alfredo Ferreira Costa Sergio Cruces-Alvarez Lili Cui Chuangyin Dang Xuanju Dang Mingcong Deng Ming Dong Gerard Dreyfus Haibin Duan Wlodzislaw Duch El-Sayed El-Alfy Pablo Estevez Jay Farrell Wai-Keung Fung John Gan Junbin Gao Xiao-Zhi Gao Anya Getman Xinping Guan Chengan Guo Lejiang Guo Ping Guo Qing-Long Han Haibo He Zhaoshui He Tom Heskes Zeng-Guang Hou Zhongsheng Hou Chun-Fei Hsu Huosheng Hu Jinglu Hu Guang-Bin Huang Ting Huang Tingwen Huang Marc Van Hulle Amir Hussain Giacomo Indiveri
XI
University of Missouri Columbia, USA NICTA Australian National University, Australia Chinese Academy of Sciences, China University of Reading, UK Yonsei University, Korea National Cheng Kung University, Taiwan Federal University, UFRN, Brazil University of Seville, Spain Northeastern University, China City University of Hong Kong, Hong Kong Guilin University of Electronic Technology, China Okayama University, Japan Wayne State University, USA ESPCI-ParisTech, France Beihang University, China Nicolaus Copernicus University, Poland King Fahd University of Petroleum and Minerals, Saudi Arabia Universidad de Chile, Chile University of California Riverside, USA University of Manitoba, Canada University of Essex, UK Charles Sturt University, Australia Helsinki University of Technology, Finland University of Nevada Reno, USA Shanghai Jiao Tong University, China Dalian University of Technology, China Huazhong University of Science and Technology, China Beijing Normal University, China Central Queensland University, Australia Stevens Institute of Technology, USA RIKEN Brain Science Institute, Japan Radboud University Nijmegen, The Netherlands Chinese Academy of Sciences, China Beijing Jiaotong University, China Chung Hua University, Taiwan University of Essex, UK Waseda University, Japan Nanyang Technological University, Singapore University of Illinois at Chicago, USA Texas A&M University at Qatar Katholieke Universiteit Leuven, Belgium University of Stirling, UK ETH Zurich, Switzerland
XII
ISNN 2011 Organization
Danchi Jiang Haijun Jiang Ning Jin Yaochu Jin Joarder Kamruzzaman Qi Kang Nikola Kasabov Yunquan Ke Rhee Man Kil Kwang-Baek Kim Sungshin Kim Arto Klami Leo Li-Wei Ko Mario Koeppen Stefanos Kollias Sibel Senan Kucur H.K. Kwan James Kwok Edmund M.K. Lai Chuandong Li Kang Li Li Li Michael Li Shaoyuan Li Shutao Li Xiaoou Li Yangmin Li Yuanqing Li Hualou Liang Jinling Liang Yanchun Liang Lizhi Liao Alan Wee-Chung Liew Aristidis Likas Chih-Jen Lin Ju Liu Meiqin Liu Yan Liu Zhenwei Liu Bao-Liang Lu Hongtao Lu Jinhu Lu Wenlian Lu
University of Tasmania, Australia Xinjiang University, China University of Illinois at Chicago, USA Honda Research Institute Europe, Germany Monash University, Australia Tongji University, China Auckland University, New Zealand Shaoxing University, China Korea Advanced Institute of Science and Technology, Korea Silla University, Korea Pusan National University, Korea Helsinki University of Technology, Finland National Chiao Tung University, Taiwan Kyuhsu Institute of Technology, Japan National Technical University of Athens, Greece Istanbul University, Turkey University of Windsor, Canada Hong Kong University of Science and Technology, Hong Kong Massey University, New Zealand Chongqing University, China Queen’s University Belfast, UK Tsinghua University, China Central Queensland University, Australia Shanghai Jiao Tong University, China Hunan University, China CINVESTAV-IPN, Mexico University of Macao, Macao South China University of Technology, China University of Texas at Houston, USA Southeast University, China Jilin University, China Hong Kong Baptist University Griffith University, Australia University of Ioannina, Greece National Taiwan University, Taiwan Shandong University, China Zhejiang University, China Motorola Labs, Motorola, Inc., USA Northeastern University, China Shanghai Jiao Tong University, China Shanghai Jiao Tong University, China Chinese Academy of Sciences, China Fudan University, China
ISNN 2011 Organization
Yanhong Luo Jinwen Ma Malik Magdon-Ismail Danilo Mandic Francesco Marcelloni Francesco Masulli Matteo Matteucci Patricia Melin Dan Meng Yan Meng Valeri Mladenov Roman Neruda Ikuko Nishikawa Erkki Oja Seiichi Ozawa Guenther Palm Christos Panayiotou Shaoning Pang Thomas Parisini Constantinos Pattichis Jaakko Peltonen Vincenzo Piuri Junfei Qiao Manuel Roveri George Rovithakis Leszek Rutkowski Tomasz Rutkowski Sattar B. Sadkhan Toshimichi Saito Karl Sammut Edgar Sanchez Marcello Sanguineti Gerald Schaefer Furao Shen Daming Shi Hideaki Shimazaki Qiankun Song Ruizhuo Song Alessandro Sperduti Stefano Squartini Dipti Srinivasan John Sum Changyin Sun
XIII
Northeastern University, China Peking University, China Rensselaer Polytechnic Institute, USA Imperial College London, UK University of Pisa, Italy Universit`a di Genova, Italy Politecnico di Milano, Italy Tijuana Institute of Technology, Mexico Southwest University of Finance and Economics, China Stevens Institute of Technology, USA Technical University of Sofia, Bulgaria Academy of Sciences of the Czech Republic, Czech Republic Ritsumei University, Japan Aalto University, Finland Kobe University, Japan Universit¨at Ulm, Germany University of Cyprus, Cyprus Auckland University of Technology, New Zealand University of Trieste, Italy University of Cyprus, Cyprus Helsinki University of Technology, Finland University of Milan, Italy Beijing University of Technology, China Politecnico di Milano, Italy Aristole University of Thessaloniki, Greece Technical University of Czestochowa, Poland RIKEN Brain Science Institute, Japan University of Babylon, Iraq Hosei University, Japan Flinders University, Australia CINVESTAV, Mexico University of Genoa, Italy Aston University, UK Nanjing University, China Nanyang Technological University, Singapore RIKEN Brain Science Institute, Japan Chongqing Jiaotong University, China Northeastern University, China University of Padua, Italy Universit`a Politecnica delle Marche, Italy National University of Singapore, Singapore National Chung Hsing University, Taiwan Southeast University, China
XIV
ISNN 2011 Organization
Johan Suykens Roberto Tagliaferri Norikazu Takahashi Ah-Hwee Tan Ying Tan Toshihisa Tanaka Hao Tang Qing Tao Ruck Thawonmas Sergios Theodoridis Peter Tino Christos Tjortjis Ivor Tsang Masao Utiyama Marley Vellasco Alessandro E.P. Villa Draguna Vrabie Bing Wang Dan Wang Dianhui Wang Ding Wang Lei Wang Lei Wang Wenjia Wang Wenwu Wang Yingchun Wang Yiwen Wang Zhanshan Wang Zhuo Wang Zidong Wang Qinglai Wei Yimin Wen Wei Wu Cheng Xiang Degui Xiao Songyun Xie Rui Xu Xin Xu Yong Xu Jianqiang Yi Zhang Yi
Katholieke Universiteit Leuven, Belgium University of Salerno, Italy Kyushu University, Japan Nanyang Technological University, Singapore Peking University, China Tokyo University of Agriculture and Technology, Japan Hefei University of Technology, China Chinese Academy of Sciences, China Ritsumeikan University, Japan University of Athens, Greece Birmingham University, UK University of Manchester, UK Nanyang Technological University, Singapore National Institute of Information and Communications Technology, Japan PUC-Rio, Brazil Universit´e de Lausanne, Switzerland University of Texas at Arlington, USA University of Hull, UK Dalian Maritime University, China La Trobe University, Australia Chinese Academy of Sciences, China Australian National University, Australia Tongji University, China University of East Anglia, UK University of Surrey, USA Northeastern University, China Hong Kong University of Science and Technology, Hong Kong Northeastern University, China University of Illinois at Chicago, USA Brunel University, UK Chinese Academy of Sciences, China Hunan Institute of Technology, China Dalian University of Technology, China National University of Singapore, Singapore Hunan University, China Northwestern Polytechnical University, China Missouri University of Science and Technology, USA National University of Defense Technology, China Guilin University of Electronic Technology, China Chinese Academy of Sciences, China Sichuan University, China
ISNN 2011 Organization
Dingli Yu Xiao-Hua Yu Xiaoqin Zeng Zhigang Zeng Changshui Zhang Huaguang Zhang Jianghai Zhang Jie Zhang Kai Zhang Lei Zhang Nian Zhang Xiaodong Zhang Xin Zhang Yunong Zhang Dongbin Zhao Hai Zhao Liang Zhao Mingjun Zhong Weihang Zhu Rodolfo Zunino
XV
Liverpool John Moores University, UK California Polytechnic State University, USA Hohai University, China Huazhong University of Science and Technology, China Tsinghua University, China Northeastern University, China Hangzhou Dianzi University, China University of New Castle, UK Lawrence Berkeley National Lab, USA Sichuan University, China University of the District of Columbia, USA Wright State University, USA Northeastern University, China Sun Yat-sen University, China Chinese Academy of Sciences, China Shanghai Jiao Tong University, China University of S˜ ao Paulo, Brazil University of Glasgow, UK Lamar University, USA University of Genoa, Italy
Table of Contents – Part III
Reinforcement Learning and Decision Making An Adaptive Dynamic Programming Approach for Closely-Coupled MIMO System Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jian Fu, Haibo He, Qing Liu, and Zhen Ni
1
Adaptive Dual Heuristic Programming Based on Delta-Bar-Delta Learning Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jun Wu, Xin Xu, Chuanqiang Lian, and Yan Huang
11
A Design Decision-Making Support Model for Prioritizing Affective Qualities of Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chun-Chih Chen and Ming-Chuen Chuan
21
Local Search Heuristics for Robotic Routing Planner . . . . . . . . . . . . . . . . . Stanislav Sluˇsn´y and Roman Neruda
31
Action and Motor Control Fuzzy Disturbance-Observer Based Control of Electrically Driven Free-Floating Space Manipulator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Zhongyi Chu and Jing Cui
41
Dynamic Neural Network Control for Voice Coil Motor with Hysteresis Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xuanju Dang, Fengjin Cao, and Zhanjun Wang
50
A New Model Reference Adaptive Control of PMSM Using Neural Network Generalized Inverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Guohai Liu, Beibei Dong, Lingling Chen, and Wenxiang Zhao
58
RBF Neural Network Application in Internal Model Control of Permanent Magnet Synchronous Motor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Guohai Liu, Lingling Chen, Beibei Dong, and Wenxiang Zhao
68
Transport Control of Underactuated Cranes . . . . . . . . . . . . . . . . . . . . . . . . . Dianwei Qian, Boya Zhang, and Xiangjie Liu Sliding Mode Prediction Based Tracking Control for Discrete-time Nonlinear Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lingfei Xiao and Yue Zhu
77
84
XVIII
Table of Contents – Part III
Adaptive and Hybrid Intelligent Systems A Single Shot Associated Memory Based Classification Scheme for WSN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nomica Imran and Asad Khan
94
Dynamic Structure Neural Network for Stable Adaptive Control of Nonlinear Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jingye Lei
104
A Position-Velocity Cooperative Intelligent Controller Based on the Biological Neuroendocrine System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chongbin Guo, Kuangrong Hao, Yongsheng Ding, Xiao Liang, and Yiwen Dou
112
A Stable Online Self-Constructing Recurrent Neural Network . . . . . . . . . . Qili Chen, Wei Chai, and Junfei Qiao
122
Evaluation of SVM Classification of Avatar Facial Recognition . . . . . . . . . Sonia Ajina, Roman V. Yampolskiy, and Najoua Essoukri Ben Amara
132
Optimization Control of Rectifier in HVDC System with ADHDP . . . . . . Chunning Song, Xiaohua Zhou, Xiaofeng Lin, and Shaojian Song
143
Network Traffic Prediction Based on Wavelet Transform and Season ARIMA Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yongtao Wei, Jinkuan Wang, and Cuirong Wang
152
Dynamic Bandwidth Allocation for Preventing Congestion in Data Center Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cong Wang, Cui-rong Wang, and Ying Yuan
160
Software Comparison Dealing with Bayesian Networks . . . . . . . . . . . . . . . . Mohamed Ali Mahjoub and Karim Kalti
168
Adaptive Synchronization on Edges of Complex Networks . . . . . . . . . . . . . Wenwu Yu
178
Application of Dual Heuristic Programming in Excitation System of Synchronous Generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yuzhang Lin and Chao Lu
188
A Neural Network Method for Image Resolution Enhancement from a Multiple of Image Frames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shuangteng Zhang and Ezzatollah Salari
200
Multiparty Simultaneous Quantum Secure Direct Communication Based on GHZ States and Mutual Authentication . . . . . . . . . . . . . . . . . . . . Wenjie Liu, Jingfa Liu, Hao Xiao, Tinghuai Ma, and Yu Zheng
209
Table of Contents – Part III
A PSO-Based Bacterial Chemotaxis Algorithm and Its Application . . . . . Rui Zhang, Jianzhong Zhou, Youlin Lu, Hui Qin, and Huifeng Zhang Predicting Stock Index Using an Integrated Model of NLICA, SVR and PSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chi-Jie Lu, Jui-Yu Wu, Chih-Chou Chiu, and Yi-Jun Tsai Affective Classification in Video Based on Semi-supervised Learning . . . . Shangfei Wang, Huan Lin, and Yongjie Hu
XIX
219
228
238
Incorporating Feature Selection Method into Neural Network Techniques in Sales Forecasting of Computer Products . . . . . . . . . . . . . . . . Chi-Jie Lu, Jui-Yu Wu, Tian-Shyug Lee, and Chia-Mei Lian
246
Design of Information Granulation-Based Fuzzy Models with the Aid of Multi-objective Optimization and Successive Tuning Method . . . . . . . . Wei Huang, Sung-Kwun Oh, and Jeong-Tae Kim
256
Design of Fuzzy Radial Basis Function Neural Networks with the Aid of Multi-objective Optimization Based on Simultaneous Tuning . . . . . . . . Wei Huang, Lixin Ding, and Sung-Kwun Oh
264
The Research of Power Battery with Ultra-capacitor for Hybrid Vehicle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jia Wang, Yingchun Wang, and Guotao Hui
274
Neuroinformatics and Bioinformatics Stability Analysis of Genetic Regulatory Networks with Mixed Time-Delays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Zhanheng Chen and Haijun Jiang
280
Representing Boolean Functions Using Polynomials: More Can Offer Less . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yi Ming Zou
290
The Evaluation of Land Utilization Intensity Based on Artificial Neural Network: A Case of Zhejiang Province . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jianchun Xu, Huan Li, and Zhiyuan Xu
297
Dimension Reduction of RCE Signal by PCA and LPP for Estimation of the Sleeping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yohei Tomita, Yasue Mitsukura, Toshihisa Tanaka, and Jianting Cao
306
Prediction of Oxygen Decarburization Efficiency Based on Mutual Information Case-Based Reasoning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Min Han and Liwen Jiang
313
XX
Table of Contents – Part III
Off-line Signature Verification Based on Multitask Learning . . . . . . . . . . . You Ji, Shiliang Sun, and Jian Jin
323
Modeling and Classification of sEMG Based on Instrumental Variable Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xiaojing Shang, Yantao Tian, and Yang Li
331
Modeling and Classification of sEMG Based on Blind Identification Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yang Li, Yantao Tian, Xiaojing Shang, and Wanzhong Chen
340
Micro-Blood Vessel Detection Using K-Means Clustering and Morphological Thinning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Zhongming Luo, Zhuofu Liu, and Junfu Li
348
PCA Based Regional Mutual Information for Robust Medical Image Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yen-Wei Chen and Chen-Lun Lin
355
Discrimination of Thermophilic and Mesophilic Proteins via Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jingru Xu and Yuehui Chen
363
Information Retrieval Using Grey Relation Analysis and TOPSIS to Measure PCB Manufacturing Firms Efficiency in Taiwan . . . . . . . . . . . . . . . . . . . . . . . . . . Rong-Tsu Wang
370
Application of Neural Network for the Prediction of Eco-efficiency . . . . . . Slawomir Golak, Dorota Burchart-Korol, Krystyna Czaplicka-Kolarz, and Tadeusz Wieczorek
380
Using Associative Memories for Image Segmentation . . . . . . . . . . . . . . . . . Enrique Guzm´ an, Ofelia M.C. Jim´enez, Alejandro D. P´erez, and Oleksiy Pogrebnyak
388
Web Recommendation Based on Back Propagation Neural Networks . . . . Jiang Zhong, Shitao Deng, and Yifeng Cheng
397
A Multi-criteria Target Monitoring Strategy Using MinMax Operator in Formed Virtual Sensor Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xin Song, Cuirong Wang, and Juan Wang
407
Data Mining and Knowledge Discovery Application of a Novel Data Mining Method Based on Wavelet Analysis and Neural Network Satellite Clock Bias Prediction . . . . . . . . . . . . . . . . . . Chengjun Guo and Yunlong Teng
416
Table of Contents – Part III
Particle Competition and Cooperation for Uncovering Network Overlap Community Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fabricio Breve, Liang Zhao, Marcos Quiles, Witold Pedrycz, and Jiming Liu Part-Based Transfer Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Zhijie Xu and Shiliang Sun A Parallel Wavelet Algorithm Based on Multi-core System and Its Application in the Massive Data Compression . . . . . . . . . . . . . . . . . . . . . . . Xiaofan Lu, Zhigang Liu, Zhiwei Han, and Feng Wu
XXI
426
434
442
Predicting Carbon Emission in an Environment Management System . . . Manas Pathak and Xiaozhe Wang
450
Classification of Pulmonary Nodules Using Neural Network Ensemble . . . Hui Chen, Wenfang Wu, Hong Xia, Jing Du, Miao Yang, and Binrong Ma
460
Combined Three Feature Selection Mechanisms with LVQ Neural Network for Colon Cancer Diagnosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tianlei Zang, Dayun Zou, Fei Huang, and Ning Shen
467
Estimation of Groutability of Permeation Grouting with Microfine Cement Grouts Using RBFNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kuo-Wei Liao and Chien-Lin Huang
475
Improving Text Classification with Concept Index Terms and Expansion Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XiangHua Fu, LianDong Liu, TianXue Gong, and Lan Tao
485
Automated Personal Course Scheduling Adaptive Spreading Activation Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yukio Hori, Takashi Nakayama, and Yoshiro Imai
493
Transfer Learning through Domain Adaptation . . . . . . . . . . . . . . . . . . . . . . Huaxiang Zhang The Design of Evolutionary Multiple Classifier System for the Classification of Microarray Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kun-Hong Liu, Qing-Qiang Wu, and Mei-Hong Wang Semantic Oriented Clustering of Documents . . . . . . . . . . . . . . . . . . . . . . . . . Alessio Leoncini, Fabio Sangiacomo, Sergio Decherchi, Paolo Gastaldo, and Rodolfo Zunino Support Vector Machines versus Back Propagation Algorithm for Oil Price Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Adnan Khashman and Nnamdi I. Nwulu
505
513 523
530
XXII
Table of Contents – Part III
Ultra-Short Term Prediction of Wind Power Based on Multiples Model Extreme Leaning Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ting Huang, Xin Wang, Lixue Li, Lidan Zhou, and Gang Yao
539
BursT: A Dynamic Term Weighting Scheme for Mining Microblogging Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chung-Hong Lee, Chih-Hong Wu, and Tzan-Feng Chien
548
Towards an RDF Encoding of ConceptNet . . . . . . . . . . . . . . . . . . . . . . . . . . Marco Grassi and Francesco Piazza Modeling of Potential Customers Identification Based on Correlation Analysis and Decision Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kai Peng and Daoyun Xu An Innovative Feature Selection Using Fuzzy Entropy . . . . . . . . . . . . . . . . Hamid Parvin, Behrouz Minaei-Bidgoli, and Hossein Ghaffarian Study on the Law of Short Fatigue Crack Using Genetic Algorithm-BP Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Zheng Wang, Zihao Zhao, Lu Wang, and Kui Wang
558
566 576
586
Natural Language Processing Large Vocabulary Continuous Speech Recognition of Uyghur: Basic Research of Decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Muhetaer Shadike, Xiao Li, and Buheliqiguli Wasili
594
Sentic Medoids: Organizing Affective Common Sense Knowledge in a Multi-dimensional Vector Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Erik Cambria, Thomas Mazzocco, Amir Hussain, and Chris Eckl
601
Detecting Emotions in Social Affective Situations Using the EmotiNet Knowledge Base . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Alexandra Balahur, Jes´ us M. Hermida, and Andr´es Montoyo
611
3-Layer Ontology Based Query Expansion for Searching . . . . . . . . . . . . . . Li Liu and Fangfang Li
621
Chinese Speech Recognition Based on a Hybrid SVM and HMM Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xingxian Luo
629
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
637
An Adaptive Dynamic Programming Approach for Closely-Coupled MIMO System Control Jian Fu1, , Haibo He2, , Qing Liu1 , and Zhen Ni2 1
2
School of Automation,Wuhan University of Technology, Wuhan, Hubei 430070, China
[email protected],
[email protected] Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island, Kingston, RI 02881, USA {he,ni}@ele.uri.edu
Abstract. In this paper, we investigate the application of adaptive dynamic programming (ADP) for a real industrial-based control problem. Our focus includes two aspects. First, we consider the multiple-input and multiple-output (MIMO) ADP design for online learning and control. Specifically, we consider the action network with multiple outputs as control signals to be sent to the system under control, which provides the capability of this approach to be more applicable to real engineering problems with multiple control variables. Second, we apply this approach to a real industrial application problem to control the tension and height of the looper system in a hot strip mill system. Our intention is to demonstrate the adaptive learning and control performance of the ADP with such a real system. Our results demonstrate the effectiveness of this approach. Keywords: adaptive dynamic programming (ADP), multiple-input and multiple-output (MIMO), online learning and control, looper system.
1
Introduction
Adaptive dynamic programming (ADP) has been a key methodology for adaptive control and optimization for a wide range of domains [1],[2],[3],[4]. Briefly speaking, the fundamental idea of ADP is the “principle of optimality” in dynamic programming. Recently, important research results have also suggested that ADP is one of the core technologies to potentially achieve the brain-like general-purpose intelligence and bridge the gap between theoretic study and challenging real-world engineering applications [2].
This work was done when Jian Fu was a visiting scholar at the Department of Electrical, Computer, and Biomedical Engineering at The University of Rhode Island, Kingston, RI 02881, USA. This work was supported in part by the National Science Foundation (NSF) under Grant ECCS 1053717.
D. Liu et al. (Eds.): ISNN 2011, Part III, LNCS 6677, pp. 1–10, 2011. Springer-Verlag Berlin Heidelberg 2011
2
J. Fu et al.
Over the past decades, extensive efforts have devoted to the ADP research and tremendous progresses have been achieved across different application domains, ranging from power system control, communication networks, to helicopter control. For instance, a synchronous generator neuro-controller in a power system with a HDP structure was implemented by multilayer perceptron neural network and radial basis function neural network[5]. In [6], ADP has been successfully applied to the complex helicopter control based on a structured cascade action network and an associated trim network. Tracking applications in engine torque and air fuel ratio control was presented in [7] by ADP design with time lagged recurrent neural network. In [8], a hierarchical ADP architecture is proposed to adaptively develop multiple levels of internal goal representations to facilitate the optimization over time. In this work, our main focus is to demonstrate the ADP application to a practical industrial problem. The action network in our ADP architecture includes multiple outputs, which will be suitable for applications which require multiple control variables. The application case we tested in this research is a looper system in a hot strip mill, a typical nonlinear closely-coupled MIMO system. The rest of this paper is organized as follows. Section 2 presents the MIMO ADP design and its implementation details, such as the adaptation and tuning of parameters in both critic network and action network. In section 3, we present in detail the application to the height and tension control of the looper in a hot strip mill. In addition, experimental settings and results are presented and analyzed. Finally, a conclusion is given in section 4.
2
ADP for System with Multiple Control Variables
In this work, we built our research based on the ADP architecture as presented in [4] (see Fig.1). In this structure, the temporal difference is obtained by calculating the current cost-to-go J (k) and previous value J (k − 1) in the diagram. The binary reinforcement signal r is provided from the external environment which we mark as reinforcement signal explicitly. In our current study, we adopt a common strategy to use “0” and “-1” representing “success” and “failure”, respectively.
&ULWLF 1HWZRUN
-ˆ ; N -1
B
;N
$FWLRQ 1HWZRUN
XN
6\VWHP
;N
$FWLRQ 1HWZRUN
XN
&ULWLF 1HWZRUN
-ˆ ; N
˞
UN
5HLQIRUFHPHQWVLJQDO
B
8F
Fig. 1. Schematic diagram for implementation of online learning based on ADP
An ADP Approach for Closely-Coupled MIMO System Control
3
We realize in many practical situations, the system under control normally require multiple control variables. To do so, the action network needs to include multiple output neurons, which will also in turn augment the inputs to the critic network. Therefore, we first derive the weight adaptation procedure under this situation. All discussions in this article assume neural network with multi-layer perceptron (MLP) architecture is used in both critic network and action network design. 2.1
Critic Network
In the critic network, the output can be defined as follows qi (k) =
N cin
wc(1) (k) · xj (k), ij
i = 1, · · · , Nch
(1)
i = 1, · · · , Nch
(2)
j=1
pi (k) =
1 − exp−qi (k) , 1 + exp−qi (k) Jˆ (k) =
Nch
wc(2) (k) · pi (k) , i
(3)
i=1
where qi is the ith hidden node input of the critic network, pi is the corresponding output of the hidden node qi , Ncin is the total number of the input nodes in the critic network including Nx inputs from system states and Naout inputs from the outputs nodes in the action network, and Nch is the total number of the hidden nodes in the critic network. As discussed in [4], in this structure J can be estimated by minimizing the following error over time Eh =
2 1 ˆ J (k) − r (k) − αJˆ (k + 1) . 2
(4)
k
So the objective function to be minimized in the critic network is Ec (k) =
2 1 2 ec (k) = αJˆ (k) − Jˆ (k − 1) − r (k − 1) , 2
(5)
We can reach the adaptation of the critic network as the following by applying the chain backpropagation rule [9]. This process is summarized as follows. (2) 1) Δwc (hidden to output layer) ∂Ec (k) (2) Δwci (k) = lc (k) − (2) , (6) ∂wci (k) ∂Ec (k) (2) ∂wci
(k)
=
∂Ec (k) ∂ec (k) ∂ Jˆ (k) · · = αec (k) pi (k) ∂ec (k) ∂ Jˆ (k) ∂wc(2) i (k)
(7)
4
J. Fu et al. (2)
where lc (k) > 0 is the learning rate of the critic network at time k and wci is the weight between ith hidden node with output node in the critic network. (1) 2) Δwc (input to hidden layer) ∂E (k) c Δwc(1) (k) = lc (k) − (1) , (8) ij ∂wcij (k) ∂Ec (k) ∂ec (k) ∂ Jˆ (k) ∂pi (k) ∂qi (k) · · · · (1) ∂ec (k) ∂ Jˆ (k) ∂pi (k) ∂qi (k) ∂wc(1) ∂wcij (k) ij (k)
1 = αec (k) wc(2) (k) · 1 − p2i (k) · xj (k) i 2 ∂Ec (k)
=
(9) (10)
(1)
where wcij is the weight between jth input node with ith hidden node in the critic network. 2.2
Action Network
We now proceed to investigate the weight tuning process in the action network. Note in this case, we consider the action network has multiple control outputs Naout . The associated equations for the action network are: hi (k) =
N ain
wa(1) (k) · xj (k), i = 1, · · · , Nah ij
(11)
j=1
gi (k) =
vm (k) =
N ah
1 − exp−hi (k) , i = 1, · · · , Nah 1 + exp−hi (k)
(12)
wa(2) (k) · gi (k) , m = 1, · · · , Naout mi
(13)
i=1
um (k) =
1 − exp−vm (k) , m = 1, · · · , Naout 1 + exp−vm (k)
(14)
where hi is the ith hidden node input of the action network, gi is the ith hidden node output of the action network, vm is the input of the mth output node of the action network, um is the output of the mth output node of the action network, Nain is the total number of the input nodes in the action network, Nah is the total number of the hidden nodes in the action network, and Naout is the total number of the output nodes in the action network. The purpose in adapting the action network is to implicitly back propagate the error between the desired ultimate object Uc and approximate J function from critic network. In our current study, we set Uc (k) equal to zero. The error of the action network is defined as Ea (k) =
1 2 2 e (k) = [J (k) − Uc ] . 2 a
(15)
An ADP Approach for Closely-Coupled MIMO System Control
5
When the classic gradient descent search is used, the generalized approach for MIMO system can be summarized as follows. (2) 1) Δwa (hidden to output layer) ∂Ea (k) (2) Δwami = la (k) − (2) , (16) ∂wami (k) ∂Ea (k) (2) ∂wami
(k)
=
∂Ea (k) ∂ea (k) ∂ Jˆ (k) ∂um (k) ∂vm (k) · · · · ∂ea (k) ∂ Jˆ (k) ∂um (k) ∂vm (k) ∂wa(2) mi (k)
(17)
N ch
1 1 = ea (k)gi (k) 1 − u(2) wc(2) (k)· 1 − p2l (k) ·wc(1) (k) (18) m (k) l l(Nx+m) 2 2 l=1
Here the um (k) is the mth output of the action network, which also serves the (Nx + m)th (again, Nx is the number of inputs from the system) inputs to the ∂J (k) critic network as well. One should note that under MIMO scenario, the ∂u m (k) need to be calculated by summarizing the respective partial derivatives in all hidden nodes in the critic network. (1) 2) Δwa (hidden to output layer) ∂Ea (k) (1) Δwaij = la (k) − (1) , (19) ∂waij (k) N aout ∂Ea (k) ∂ea (k) ∂ Jˆ (k) ∂um (k) ∂vm (k) ∂gi (k) ∂hi (k) = (1) ∂ea (k) ∂ Jˆ (k) m=1 ∂um (k) ∂vm (k) ∂gi (k) ∂hi (k) ∂wa(1) ∂waij (k) ij (20) ∂Ea (k)
1
2 = ea (k)· 1 − pl (k) wa(2) (k) mi 2 m=1 l=1 1
1
2 2 ·xj (k) 1 − um (k) 1 − gi (k) (21) 2 2 N aout N ah
wc(2) l
(k) wc(1) (k) l(Nx+m)
We would like to note that for the MIMO system, the information provided (1) by waij , the hidden node pl of the critic network and output node um are all intermediate variables. They all contribute as a part to ∂J(k) in terms of the (1) ∂waij (k)
nested pattern.
3
Case Study with Looper Tension/Height System
In this section, we investigate the application of the ADP architecture with multiple control variables to an industrial looper system currently deployed at
6
J. Fu et al.
the Wuhan Iron and Steel (Group) Corporation. In such a system, the control of looper tension and angle are important because they affect both the dimensional quality and the mass flow of a strip [10]. Therefore, both of them should be kept to a desired value simultaneously, which is called as a nominal operation point. Intuitively, we intend to adjust roller motor speed ratio to keep looper height constant or to regulate looper motor current to supply a predefined tension. However, in such a closely-coupled system, any single manipulation can affect two target values (angle of the looper and stress of the strip) at the same time. Therefore, it is very difficult to achieve the optimal control performance (angle and stress) simultaneously. Fig. 2 shows the looper system diagram we considered in this article. For space consideration, we only give a summary of the system model here. For detailed model parameters, interested readers can refer to [11] for further details. 1 wM load GR wT
'iref _ i
1 'iact _ i 1 Ti S
Cm
ˉ
'Tref _ i ˉ
1 JS
360 2S
'Zmotori
'T motor _ i
1 S
1 GR
'T i
Looper motor
1 wM load GR wW f _ i 1 GR
dL1 dL2 dT dT 'vref _ i
1 1 TV S
'v0 _ i
1 fi
+ ˉ
'W f _ i
Elp LS
ˉ ˉ
v0i
dfi dW f _ i
v0 _ i1
d Ei dW f _ i
Fig. 2. System model of looper tension and height
The main symbols in Fig. 2 are described as follows. Δiref i : current reference increment of motor of looper attached to stand i; Δiact i : actual current increment of motor of looper attached to stand i ; Δωmotor i : angle speed incremental of looper motor attached to stand i ; Δθi : angle incremental of looper attached to stand i; Δvref i : line speed reference incremental of rolling i; Δv0 i : actual speed incremental of rolling i; Δτi : actual stress incremental of strip between stand i and stand i + 1; v0 i : actual line speed of the roller in the stand i; We select state variables as T
x = [Δτi , Δθi , Δωmotor i , Δiacti , Δvo i ] .
(22)
An ADP Approach for Closely-Coupled MIMO System Control
7
manipulated variables T
u = [Δiref i , Δvref i ] .
(23)
and output variables as T
y = [Δθi , Δτi , ] .
(24)
Thus a state space model is obtained as follows [11] x˙ = Ap x + Bp x , y = Cp x
(25)
where ⎡ a11 ⎢ 0 ⎢ ⎢ Ap = ⎢a31 ⎢ ⎣ 0
a32 0
a13 1 GR
−
KD 360 J 2π
0
0 0
−
Elp L
(1+fi )
0
Cm 360 J 2π
− T1
l
0 0
⎤ ⎥ ⎥ ⎥ ⎥, ⎥ ⎦
− T1v 0 0 T 0 0 0 T1l 0 Bp = 0 0 0 0 T1v 01000 Cp = 10000
E Elp dβi+1 dfi dL1 dL2 1 = − Llp v0i dτ +v ; a = + ; 0 13 i+1 dτ L dθ dθ GR i i i i
0
a11
0 0
(26)
0
(27) (28)
∂Mload 1 360 1 360 a31 = − G1R ∂M∂τload J 2π ; a32 = − ∂θi JGR 2π . i
3.1
Experiment 1: Nominal Model with External Noise
In our experiments, we conduct 100 runs for each experiment. A run consists of a maximum of 1000 consecutive trials. It is considered successful for a run provided that a trial has lasted 6000 time steps and the index of that trial is equal or less than 1000. Otherwise, the run is considered unsuccessful. The time for each time step is 0.02s, and a trial is a complete process from start to finishe which is regarded when either tension is outside the range [−1.029M pa 1.029M pa] or angle is outside the range of [−3.3◦ 3.3◦ ]. Several experiments were conducted to evaluate the effectiveness of our proposed leaning and control design. The critic neural network is chosen as a 7 − 10 − 1 MLP structure and the action neural network is chosen as 5 − 6 − 2 MLP structure. The parameters used in simulations are summarized in Table 1 with the same notations defined in [4]. Besides, we have added both sensor and actuator noise to the state measurements and action network outputs. In our current study, we consider both Uniform and Gaussian noise on actuators and sensors. We present the experimental results in the left part of Table 2. There are three sub-columns under
8
J. Fu et al. Table 1. Summary of the parameters used in obtaining the results given in table 2 parameter lc (0) la (0) lc (f ) la (f ) value 0.28 0.28 0.005 0.005 parameter Nc value 100
Na Tc Ta 200 0.04 0.004
Table 2. Performance evaluation of MIMO method
Noise type
nominal model Success rate trial Noise type
Noise free U.5% a.∗ U. 10% a. U. 5% s.† U. 10% s. G.‡σ 2 (0.1) s. G. σ 2 (0.2) s. U. 5% a.s. U. 10% a.s. G. σ 2 (0.1) a.s. Gaussian σ 2 (0.2) a.s.
96% 96% 94% 92% 93% 88% 79% 95% 94% 84% 74%
224.9 236.3 190 208.2 196.4 273.8 301.3 208.9 280.6 324.8 372.6
generalized model Perturbation Success rate trial
Noise free U. 5% a.s. U. 10% a.s. G. σ 2 (0.1) a.s. G. σ 2 (0.2) a.s. Noise free U. 5% a.s. U. 10% a.s. G. σ 2 (0.1) a.s. G. σ 2 (0.2) a.s.
U. U. U. U. U. G. G. G. G. G.
5% 5% 5% 5% 5% σ 2 (0.001) σ 2 (0.001) σ 2 (0.001) σ 2 (0.001) σ 2 (0.001)
96% 95% 94% 81% 71% 91% 95% 93% 76% 76%
212.4 226.9 249.9 381.9 402.9 235.9 228.9 233.4 269.1 357.3
∗
a. : actuators are subject to the noise s. : sensors are subject to the noise a.s. : both actuators and sensors are subject to the noise Uniform noise ‡ Gaussian noise †
this caption. Specifically, the column ’Noise type’ indicates which kind of noise is added into the looper system (actuators, sensors, or both), the column ’Success Rate’ denotes the number of successful runs over total number of the runs, and the column ’trial’ demonstrates the average number of needed trials among those successful runs. 3.2
Experiment 2: Generalized Model with Internal Perturbation and External Noise
As a matter of fact, variation exists in the inner looper system due to the fluctuation of temperature and reduction ratio. The latter plays an important role in the looper dynamics. Therefore, in this case, we imitate the practical looper object via the generalized model with the perturbation (forward slip) in the elements of matrix Ap equation (26). In his way, the generalized plant model with the matrix A˜p can be presented as follows
An ADP Approach for Closely-Coupled MIMO System Control
⎡ a11 (1 + δ) a12 ⎢ a21 a22 ⎢ A˜p = Ap + ΔAp = ⎢ .. .. ⎣ . . a52 a51
a13 a23 .. .
⎤ a14 a15 (1 + δ) ⎥ a24 a25 ⎥ ⎥. .. .. ⎦ . .
a53 a54
9
(29)
a55
Since the deviation in the strip thickness is mostly below the 100μm or so, a Gaussian distribution with a mean zero and variance 0.001 is chosen to simulate the perturbation. The right part of Table 2 demonstrates the simulation results. We would like to note that the column of ’perturbation’ implies the kind of imitation for inner perturbation imposed on the looper system. From these results, we can see the proposed approach demonstrates effective performance when the system is subject to not only the external noise in sensors and actuators, but also internal parameter perturbations. Further observations show that the success rate and trial numbers in this case are almost at the same level compared to the system under general distributions (the left part of Table 2), which demonstrates the robustness of this approach.
4
Conclusions
In this paper, we investigate the application of ADP for an industrial application problem. We first derive the learning procedure of both action network and critic network with multiple control variables, and then study its control performance of the tension and height of the looper system in a hot strip mill. Various simulation results under different types of internal perturbations and external noises demonstrated the effectiveness of this approach.
References 1. Werbos, P.J.: Adp: The key direction for future research in intelligent control and understanding brain intelligence. IEEE Transactions on Systems Man and Cybernetics Part B-Cybernetics 38(4), 898–900 (2008) 2. Werbos, P.J.: Intelligence in the brain: A theory of how it works and how to build it. Neural Networks 22(3), 200–212 (2009) 3. Wang, F.Y., Zhang, H., Liu, D.: Adaptive dynamic programming: An introduction. IEEE Computational Intelligence Magazine 4(2), 39–47 (2009) 4. Si, J., Yu-Tsung, W.: Online learning control by association and reinforcement. IEEE Transactions on Neural Networks 12(2), 264–276 (2001) 5. Park, J.W., Harley, R.G., Venayagamoorthy, G.K.: Adaptive-critic-based optimal neurocontrol for synchronous generators in a power system using mlp/rbf neural networks. IEEE Transactions on Industry Applications 39(5), 1529–1540 (2003) 6. Enns, R., Si, J.: Helicopter trimming and tracking control using direct neural dynamic programming. IEEE Transactions on Neural Networks 14(4), 929–939 (2003)
10
J. Fu et al.
7. Liu, D., Javaherian, H., Kovalenko, O., Huang, T.: Adaptive critic learning techniques for engine torque and air-fuel ratio control. IEEE Transactions on Systems Man and Cybernetics Part B-Cybernetics 38(4), 988–993 (2008) 8. He, H., Liu, B.: A hierarchical learning architecture with multiple-goal representations based on adaptive dynamic programming. In: Proc. IEEE International Conference on Networking, Sensing, and Control, ICNSC 2010 (2010) 9. Werbos, P.J.: Backpropagation through time: What it does and how to do it. Proceedings of the IEEE 78(10), 1550–1560 (2002) 10. Cheng, C.C., Hsu, L.Y., Hsu, Y.L., Chen, Z.S.: Precise looper simulation for hot strip mills using an auto-tuning approach. The International Journal of Advanced Manufacturing Technology 27(5), 481–487 (2006) 11. Fu, J., Yang, W.D., Li, B.Q.: Lmis based h decoupling method and the looper height and tension control. Kongzhi yu Juece/Control and Decision 20, 883–886+891 (2005)
Adaptive Dual Heuristic Programming Based on Delta-Bar-Delta Learning Rule Jun Wu, Xin Xu, Chuanqiang Lian, and Yan Huang Institute of Automation, College of Mechatronics and Automation, National University of Defense Technology, Changsha, 410073, China
[email protected]
Abstract. Dual Heuristic Programming (DHP) is a class of approximate dynamic programming methods using neural networks. Although there have been some successful applications of DHP, its performance and convergence are greatly influenced by the design of the step sizes in the critic module as well as the actor module. In this paper, a Delta-Bar-Delta learning rule is proposed for the DHP algorithm, which helps the two modules adjust learning rate individually and adaptively. Finally, the feasibility and effectiveness of the proposed method are illustrated in the learning control task of an inverted pendulum. Keywords: reinforcement learning, adaptive critic design, dual heuristic programming, Delta-Bar-Delta, neural networks.
1 Introduction Dynamic Programming (DP) [1] is a general approach for sequential decision making under the framework of Markov decision processes (MDPs). However, classical dynamic programming algorithms, such as value iteration and policy iteration, are computationally expensive for MDPs with large or continuous state/action spaces. In recent years, approximate dynamic programming (ADP) and reinforcement learning (RL) have been widely studied to solve the difficulties in DP. Particularly, Adaptive Critic Designs (ACDs) were developed [2][3][4] as a class of learning control methods for MDPs with continuous state and action spaces. ACDs combine the concepts of reinforcement learning and approximate dynamic programming, and neural networks are usually used to approximate the value functions and policies. In various applications, ACDs have been shown to be capable of optimization over time under conditions of noises and uncertainties [5][6]. As depicted in Fig.1, the ACD-based learning controller approximates a nearoptimal control law for a dynamic system by successively adapting two Artificial Neural Networks (ANNs), namely, an actor neural network (which dispenses the control signals) and a critic neural network (which learns to approximate the cost-togo or utility function). These two neural networks approximate the Hamilton-JacobiBellman equation associated with the optimal control theory [6]. According to the critic's inputs and outputs, there are three types of ACD-based learning algorithms, i.e. Heuristic Dynamic Programming (HDP), Dual Heuristic Programming (DHP), and the action dependent versions of HDP and DHP, namely, ADHDP and ADDHP D. Liu et al. (Eds.): ISNN 2011, Part III, LNCS 6677, pp. 11–20, 2011. © Springer-Verlag Berlin Heidelberg 2011
12
J. Wu et al.
Fig. 1. The schematic diagram of ACDs
[5]. As an important class of ACD-based learning control method, DHP estimates the derivatives of utility functions and has been successfully applied in some real-world problems, such as generator control [6][7], motion controller design [8] , collective robotic search [9] and flight control [10]. As the DHP technique relies on the online learning performance of ANNs, selecting an appropriate learning rate is crucial for its successful implementation. A low learning rate may be good for learning stably in ANNs. However, it may become a bad choice for DHP since DHP requires the ANNs converge quickly enough so as to learn a near-optimal policy successfully. In DHP, the critic module and the actor module depend on each other. The ‘bad’ learning performance in one module will influence the other module, and the ‘bad’ result will be strengthened iteratively. Finally, it may result in failures for learning a good control policy. Therefore, it is desirable to study new methods in the design of learning rates to improve DHP. However, according to the authors’ knowledge, there are very few works on the optimization of learning rates in DHP. In this paper, a new method, which adopts the Delta-Bar-Delta rule [11], is presented to accelerate DHP’s learning performance heuristically. The simulations on the Inverted Pendulum problem show that the new method can regulate the learning rates adaptively and, finally, increase DHP’s success rate.
2 Dual Heuristic Programming

In ACDs, the critic network is used to estimate the utility function J or its derivative with respect to the state vector x. When a deterministic MDP with continuous state and action spaces is considered, the utility function J embodies the desired control objective through time and is expressed by the Bellman equation as follows:

$$J(t) = U(t) + \gamma J(t+1) = \sum_{k=0}^{\infty} \gamma^k U(t+k), \qquad (1)$$
where γ is a discount factor (0 < γ ≤ 1) and U(·) is a local cost. The aim of DHP is to learn a sequence of actions that maximizes or minimizes J. The entire process of DHP can be characterized as a simultaneous optimization problem: approximated gradient-based optimization for the critic module together with gradient-based optimization for the actor module. Since the gradient is an important aspect of controller training, DHP uses critics to estimate the derivatives of J directly, i.e., λ = ∂J/∂x, instead of the value function itself. The corresponding Bellman recursion is expressed as:
$$\frac{\partial J(t)}{\partial x(t)} = \frac{\partial U(t)}{\partial x(t)} + \gamma \frac{\partial J(t+1)}{\partial x(t)}. \qquad (2)$$
Then the error in critic training can be expressed as
$$E_c = \lambda(t) - \gamma \frac{\partial J(t+1)}{\partial x(t)} - \frac{\partial U(t)}{\partial x(t)} = \frac{\partial J(t)}{\partial x(t)} - \gamma \frac{\partial J(t+1)}{\partial x(t+1)} \frac{\partial x(t+1)}{\partial x(t)} - \frac{\partial U(t)}{\partial x(t)}. \qquad (3)$$
In DHP, applying the chain rule for derivatives yields,
$$\frac{\partial J(t+1)}{\partial x_i(t)} = \sum_{j=1}^{n} \lambda_j(t+1) \frac{\partial x_j(t+1)}{\partial x_i(t)} + \sum_{k=1}^{m} \sum_{j=1}^{n} \lambda_j(t+1) \frac{\partial x_j(t+1)}{\partial u_k(t)} \frac{\partial u_k(t)}{\partial x_i(t)}, \qquad (4)$$
where $\lambda_j(t+1) = \partial J(t+1)/\partial x_j(t+1)$, and n, m are the numbers of outputs of the model network and the actor network, respectively. By exploiting (4), each component of the vector $E_c$ is determined by:
$$E_{cj}(t) = \frac{\partial J(t)}{\partial x_j(t)} - \gamma \frac{\partial J(t+1)}{\partial x_j(t)} - \frac{\partial U(t)}{\partial x_j(t)} - \sum_{k=1}^{m} \frac{\partial U(t)}{\partial u_k(t)} \frac{\partial u_k(t)}{\partial x_j(t)}. \qquad (5)$$
Fig. 2. The ANN structure used in DHP
To train both the actor and critic networks, a model of the system dynamics that includes all the terms of the Jacobian matrix of the coupled system, i.e., $\partial x_j(t+1)/\partial x_i(t)$ and $\partial x_j(t+1)/\partial u_k(t)$, is needed. These derivatives can be obtained directly from an analytic model or indirectly from a neural network model (or fuzzy model) [12]. The actor network is adapted by propagating λ(t+1) back through the model down to the actor network. According to the Bellman optimality principle, to obtain the best control signal u(t), the following equation should hold:
$$\frac{\partial J(t)}{\partial u(t)} = 0 \;\Rightarrow\; \frac{\partial U(t)}{\partial u(t)} + \gamma \frac{\partial J(t+1)}{\partial u(t)} = \frac{\partial U(t)}{\partial u(t)} + \gamma \frac{\partial J(t+1)}{\partial x(t+1)} \frac{\partial x(t+1)}{\partial u(t)} = 0. \qquad (6)$$
As shown in Fig. 2, to ensure non-linear approximation capacity and fault-tolerance capacity, ANNs based on multi-layer perceptrons are usually adopted to build the critic module and the actor module in DHP. The activation functions usually include the linear function Linear(x) = kx and the non-linear sigmoid function Sigmoid(x) = (1 + e^{-x})^{-1}.
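For concreteness, a minimal sketch of such a perceptron is given below. The layer sizes (6 inputs, 5 sigmoid hidden nodes, and 6 or 1 linear outputs, matching the experimental setup described later in Section 4.2) and all names are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def sigmoid(x):
    # Non-linear activation used in the hidden layer
    return 1.0 / (1.0 + np.exp(-x))

class MLP:
    """Three-layer perceptron: sigmoid hidden layer, linear output layer."""
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))   # input -> hidden
        self.W2 = rng.normal(scale=0.1, size=(n_out, n_hidden))  # hidden -> output

    def forward(self, x):
        h = sigmoid(self.W1 @ x)   # sigmoid hidden units
        return self.W2 @ h         # linear output units

# Critic maps the 6-dimensional state to lambda = dJ/dx; the actor maps it to one action.
critic, actor = MLP(6, 5, 6), MLP(6, 5, 1)
```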
3 Adaptive Learning Rates Based on the Delta-Bar-Delta Rule

In the original DHP algorithm, the learning rate adopted in the weight updates is fixed and uniform. However, a fixed learning rate is not an optimal choice for ANN learning; obviously, DHP with a fixed learning rate lacks adaptability. The learning rate should be regulated according to the problem at hand and the current learning phase. Furthermore, since an ANN has many weights, it is questionable whether all the weights should be learned at the same rate. There is another difficulty in adopting a fixed learning rate for DHP: since the critic network and the actor network strongly depend on each other, the learning results of one network are passed to the other network and affect its learning procedure. Since the two learning procedures operate iteratively, a 'bad' learning behavior in one module may result in an unsuccessful learning procedure overall. Therefore, although a conservative learning rate is usually selected in ANNs to achieve good convergence, an adaptive learning rate can be a better choice for DHP. In the following, a Delta-Bar-Delta rule [11] meeting these demands is adopted for DHP.

3.1 The Delta-Bar-Delta Rule

As mentioned above, the selection of learning rates greatly affects the learning procedure. Since the cost surface of multi-layer networks can be complex, choosing a fixed learning rate can be improper: what works in one location may not work well in another. The Delta-Bar-Delta rule regulates the learning rate locally and heuristically as training progresses [11]:

- Each weight has its own learning rate.
- For each weight, the gradient at the current time step is compared with the gradient at the previous step (actually, previous gradients are averaged).
- If the gradients point in the same direction, the learning rate is increased.
- If the gradients point in opposite directions, the learning rate is decreased.
- The rule should be used with batch mode only.
The Delta-Bar-Delta rule [11] regulates the learning rates according to the gradients of objective function E(.) as follows:
$$\frac{\partial E(t)}{\partial \alpha_{ij}(t)} = -\frac{\partial E(t)}{\partial w_{ij}(t)} \frac{\partial E(t-1)}{\partial w_{ij}(t-1)}, \qquad (7)$$
where $\alpha_{ij}(t)$ is the learning rate with respect to weight $w_{ij}(t)$. The learning rate can be regulated as
$$\Delta\alpha_{ij}(t+1) = -\gamma \frac{\partial E(t)}{\partial \alpha_{ij}(t)} = \gamma \frac{\partial E(t)}{\partial w_{ij}(t)} \frac{\partial E(t-1)}{\partial w_{ij}(t-1)}, \qquad (8)$$
where γ is a positive number. Due to the difficulty of choosing an appropriate γ, a revised learning-rate regulating rule is proposed as follows:
$$\Delta\alpha_{ij}(t+1) = \begin{cases} a & \text{if } S_{ij}(t-1)\, D_{ij}(t) > 0 \\ -b\,\alpha_{ij}(t) & \text{if } S_{ij}(t-1)\, D_{ij}(t) < 0 \\ 0 & \text{else,} \end{cases} \qquad (9)$$
where $D_{ij}(t) = \partial E(t)/\partial w_{ij}(t)$, $S_{ij}(t) = (1-\xi) D_{ij}(t-1) + \xi S_{ij}(t-1)$, and a, b and ξ are positive numbers usually set as [11]: $10^{-4} \le a \le 0.1$, $0.1 \le b \le 0.5$, $0.1 \le \xi \le 0.7$. Apparently, the learning rate $\alpha_{ij}(t)$ can only increase linearly but can decrease exponentially, which protects the learning rate from becoming too large.

3.2 Adopting Delta-Bar-Delta for DHP
According to Fig.2, the expected output of the critic network is:
$$\lambda_s^D(t) = \frac{\partial U(t)}{\partial x_s(t)} + \sum_{k=1}^{K} \frac{\partial U(t)}{\partial u_k(t)} \frac{\partial u_k(t)}{\partial x_s(t)} + \gamma \sum_{s'=1}^{S} \left\{ \frac{\partial J(t+1)}{\partial x_{s'}(t+1)} \times \sum_{k=1}^{K} \left( \frac{\partial x_{s'}(t+1)}{\partial u_k(t)} \frac{\partial u_k(t)}{\partial x_s(t)} \right) \right\}. \qquad (10)$$
The critic network learns to minimize the following error measure over time:

$$E_c(t) = \frac{1}{2} \sum_s \big( \lambda_s(t) - \lambda_s^D(t) \big)^2. \qquad (11)$$
When a fixed learning rate is used, the update rule for the critic weights is
$$\Delta w_{sm}(t) = \alpha \frac{\partial E_c(t)}{\partial w_{sm}(t)} = \alpha \big( \lambda_s(t) - \lambda_s^D(t) \big) \frac{\partial \lambda_s(t)}{\partial w_{sm}(t)}, \qquad (12)$$

where
$w_{sm}$ denotes the weight of the m-th connection between the hidden layer and the s-th output of the critic network, α is a positive learning rate, and $\lambda_s(t)$ is the value of the s-th output of the critic network. The actor network tries to minimize the following error measure over time:
$$E_a = \sum_t \left[ \frac{\partial U(t)}{\partial u(t)} + \gamma \frac{\partial J(t+1)}{\partial u(t)} \right]^2. \qquad (13)$$

If a fixed learning rate is used, the update rule for the actor weights is
$$\Delta v_{km}(n) = \alpha \left( \frac{\partial U(n)}{\partial u_k(n)} + \gamma \frac{\partial J(n+1)}{\partial u_k(n)} \right) \frac{\partial u_k(n)}{\partial v_{km}(n)} = \alpha \left( \frac{\partial U(n)}{\partial u_k(n)} + \gamma \sum_s \lambda_s^D(n+1) \frac{\partial x_s(n+1)}{\partial u_k(n)} \right) \frac{\partial u_k(n)}{\partial v_{km}(n)}, \qquad (14)$$
where α is a positive learning rate and $v_{km}$ denotes the weight of the m-th connection between the hidden layer and the k-th output of the actor network. Generally speaking, if the weights of the actor network finally converge, it can be considered that a near-optimal control policy has been obtained. With the help of the Delta-Bar-Delta rule, the weights of the critic network in DHP are updated as follows:
$$\Delta w_{sm}(t) = \alpha_{sm} \frac{\partial E_c(t)}{\partial w_{sm}(t)} = \alpha_{sm} \big( \lambda_s(t) - \lambda_s^D(t) \big) \frac{\partial \lambda_s(t)}{\partial w_{sm}(t)}, \qquad (15)$$
where $\alpha_{sm}$ is the adaptive learning rate for weight $w_{sm}$. Correspondingly, the weights of the actor network are updated as follows:

$$\Delta v_{km}(t) = \alpha_{km} \left( \frac{\partial U(t)}{\partial u_k(t)} + \gamma \sum_s \lambda_s^D(t+1) \frac{\partial x_s(t+1)}{\partial u_k(t)} \right) \frac{\partial u_k(t)}{\partial v_{km}(t)}, \qquad (16)$$

where $\alpha_{km}$ is the adaptive learning rate for weight $v_{km}$.
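The following is a minimal sketch of how the Delta-Bar-Delta rule of Eq. (9) could drive the per-weight rates used in Eqs. (15)-(16); the class name, the vectorized form, and the plain gradient-descent sign convention are illustrative assumptions.

```python
import numpy as np

class DeltaBarDelta:
    """Per-weight learning rates adapted by the Delta-Bar-Delta rule (Eq. 9)."""
    def __init__(self, shape, alpha0=0.04, a=0.01, b=0.2, xi=0.3):
        self.alpha = np.full(shape, alpha0)  # one learning rate per weight
        self.S = np.zeros(shape)             # exponential trace of past gradients
        self.a, self.b, self.xi = a, b, xi

    def step(self, W, D):
        """Update weights W given the current gradient D = dE/dW."""
        same = self.S * D > 0                   # trace and gradient agree in sign
        opposite = self.S * D < 0               # trace and gradient disagree
        self.alpha[same] += self.a              # increase linearly
        self.alpha[opposite] *= (1.0 - self.b)  # decrease exponentially
        self.S = (1.0 - self.xi) * D + self.xi * self.S  # update the trace
        return W - self.alpha * D               # gradient-descent weight update

# e.g., adapting the critic's output-layer weights w_sm (Eq. 15)
dbd = DeltaBarDelta(shape=(6, 5))
W = np.zeros((6, 5))
grad = 0.1 * np.ones((6, 5))   # placeholder for dEc/dw_sm
W = dbd.step(W, grad)
```

One such adapter can be instantiated for the critic's weights and another for the actor's weights, so the two modules adjust their rates independently, as the method intends.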
4 Experiments and Analysis

4.1 The Inverted Pendulum Problem
In this paper, the Inverted Pendulum problem [1] is used to evaluate the improved DHP algorithm. The task requires balancing a pendulum of unknown length and mass in the upright position by applying forces to the cart. The control system of an inverted pendulum is depicted in Fig. 3.
Fig. 3. The control system of an inverted pendulum
The inverted pendulum system consists of a cart moving horizontally and a pole with one end in contact with the cart. Let x denote the horizontal distance between the center of the cart and the center of the track, where x is negative when the cart is in the left part of the track. θ denotes the angle of the pole from its upright position (in degrees), and F is the amount of force (N) applied to move the cart to its left or right. The state variables $\dot{x}$, $\dot{\theta}$ are the derivatives of x and θ, respectively. The state transitions are governed by the nonlinear dynamics of the system:
$$\ddot{x} = \alpha \left\{ F + ml \big[ \dot{\theta}^2 \sin\theta - \ddot{\theta} \cos\theta \big] - \mu_c \, \mathrm{sgn}(\dot{x}) \right\}, \qquad (17)$$

$$\ddot{\theta} = \frac{g \sin\theta + \cos\theta \big[ \alpha \big( -F - ml\dot{\theta}^2 \sin\theta + \mu_c \, \mathrm{sgn}(\dot{x}) \big) \big] - \mu_p \dot{\theta} / (ml)}{l \left( 4/3 - \alpha m \cos^2\theta \right)}, \qquad (18)$$
where g is the gravity constant (g = 9.8 m/s²), m is the mass of the pendulum (m = 0.1 kg), l is the length of the pendulum (l = 1 m), and α = 1.0/(m + M_cart), where M_cart is the mass of the cart (M_cart = 1.0 kg). The friction coefficient between the cart and the track is μ_c = 0.0005, and the friction coefficient between the cart and the pole is μ_p = 0.000002. The simulation step is set to 0.02 seconds. A reward of 0 is given as long as the angle of the pendulum does not exceed π/2 and the horizontal distance does not exceed the boundary of 2.4 m in absolute value; otherwise, a penalty of -1 is given. The discount factor of the process is set to 0.90. For controlling inverted pendulums, the task is to construct a control policy that can balance the pendulum using only state feedback and reward signals.

4.2 Implementation and Evaluation
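As a concrete reference for the implementation, a minimal sketch of the simulated plant, integrating Eqs. (17)-(18) with a forward step of 0.02 s, is given below; the function name and the simple Euler integration scheme are illustrative assumptions.

```python
import numpy as np

G, M_CART, M_POLE, L_POLE = 9.8, 1.0, 0.1, 1.0
MU_C, MU_P = 0.0005, 0.000002
ALPHA = 1.0 / (M_POLE + M_CART)
DT = 0.02  # simulation step (s)

def pendulum_step(state, force):
    """One Euler step of the cart-pole dynamics (Eqs. 17-18)."""
    x, x_dot, theta, theta_dot = state
    sin_t, cos_t = np.sin(theta), np.cos(theta)
    # Eq. (18): angular acceleration of the pole
    num = (G * sin_t
           + cos_t * ALPHA * (-force - M_POLE * L_POLE * theta_dot**2 * sin_t
                              + MU_C * np.sign(x_dot))
           - MU_P * theta_dot / (M_POLE * L_POLE))
    theta_acc = num / (L_POLE * (4.0 / 3.0 - ALPHA * M_POLE * cos_t**2))
    # Eq. (17): linear acceleration of the cart
    x_acc = ALPHA * (force
                     + M_POLE * L_POLE * (theta_dot**2 * sin_t - theta_acc * cos_t)
                     - MU_C * np.sign(x_dot))
    return np.array([x + DT * x_dot, x_dot + DT * x_acc,
                     theta + DT * theta_dot, theta_dot + DT * theta_acc])
```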
Based on the above model, the state vector can be defined as

$$S(t) = \big( x_1(t), x_2(t), x_3(t), x_4(t), x_5(t), x_6(t) \big), \qquad (19)$$

where $x_1(t) = \theta(t)$, $x_2(t) = \dot{\theta}(t)$, $x_3(t) = \ddot{\theta}(t)$, $x_4(t) = x(t)$, $x_5(t) = \dot{x}(t)$, and $x_6(t) = \ddot{x}(t)$. The action is defined as $a \in [-10\,\mathrm{N}, 10\,\mathrm{N}]$. The local utility function is defined as:
$$U(t) = 0.25 \big( x(t) - x_d(t) \big)^2 + 0.25 \big( \theta(t) - \theta_d(t) \big)^2, \qquad (20)$$
where $x_d(t) = 0$ and $\theta_d(t) = 0$ denote the desired position and desired angle, respectively. The critic network and the actor network are both implemented as three-layer perceptrons, where the hidden layer has 5 nodes with sigmoid activation functions. The number of input nodes equals the dimension of the state, which is 6. For the critic network, the output layer has 6 linear nodes, while in the actor network only 1 linear node is needed, since the action dimension is 1. In the experiments, the maximum number of episodes is set to 500, and the maximum number of control steps per episode is 4000. An episode is unsuccessful if, at the
end of the episode, either the pole has fallen or the cart has exceeded the boundary. The initial position and angle are randomized uniformly within the intervals x ∈ [-0.5 m, 0.5 m] and θ ∈ [-1°, 1°], respectively. To evaluate the proposed method comprehensively, different initial learning rates, i.e., α_m = 0.01, 0.04, 0.07, 0.10, 0.15 and 0.20, are tested. In Fig. 4, the convergence rates of the original DHP and the improved DHP are compared in terms of the quadratic norm of the difference between the actor weights at two successive time steps, i.e., $e_w = \| \vec{w}_{t+1} - \vec{w}_t \|$. The initial learning rate α_m is set to 0.04. The necessary precondition for comparison is that both learning processes succeed, i.e., the final learned policy can balance the pole for the maximum number of steps. Generally speaking, the convergence of the actor weights indicates that the DHP learning process converges, so the errors of the actor network's weights are used for comparison.
Fig. 4. A comparison of the convergence rate with/without the Delta-Bar-Delta rule
Fig. 5. A comparison of the success rates between the original DHP and the improved DHP
Fig. 4 shows that the learning process of the original DHP algorithm converged after 43 s, whereas the improved DHP algorithm converged after only 21 s. It is evident that the adoption of the Delta-Bar-Delta rule helps the DHP algorithm learn more effectively and, finally, results in better convergence performance. Fig. 5 shows the success rates of the original DHP and the improved DHP. The success rate denotes the ratio of the number of successful learning processes to the total number of learning episodes, which is 500 in the simulation. According to the results, a distinct improvement is achieved by the improved DHP adopting the Delta-Bar-Delta rule. The above experimental results show that the improved DHP gains advantages over the original one: the Delta-Bar-Delta rule helps the networks learn with adaptive learning rates and thus achieve better convergence performance and success rates.
5 Conclusion

DHP has been shown to be a promising approach to online learning control in MDPs with continuous state and action spaces. However, the learning rates greatly influence the performance of DHP. In this paper, a Delta-Bar-Delta rule is adopted to regulate the learning rates of the critic and the actor adaptively. The feasibility and effectiveness of the proposed method are illustrated in experiments on an Inverted Pendulum problem. Future work will include more theoretical and empirical analysis of adaptive learning rates in online reinforcement learning.

Acknowledgments. This work is supported in part by the National Natural Science Foundation of China under Grants 60774076, 61075072 and 90820302, the Fok Ying Tung Education Foundation under Grant 114005, and the Natural Science Foundation of Hunan Province under Grant 2007JJ3122.
References

1. Suykens, J.A.K., de Moor, B., Vandewalle, J.: Stabilizing Neural Controllers: A Case Study for Swing up a Double Inverted Pendulum. In: NOLTA 1993 International Symposium on Nonlinear Theory and Its Application, Hawaii (1993)
2. Werbos, P.J.: A Menu of Designs for Reinforcement Learning Over Time. In: Miller, W.T., Sutton, R.S., Werbos, P.J. (eds.) Neural Networks for Control, ch. 3. MIT Press, Cambridge (1990)
3. Werbos, P.J.: Approximate Dynamic Programming for Real-Time Control and Neural Modeling. In: White, D.A., Sofge, D.A. (eds.) Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, ch. 13. Van Nostrand Reinhold, New York (1992)
4. Liu, D.R.: Approximate Dynamic Programming for Self-Learning Control. Acta Automatica Sinica 31(1), 13–18 (2005)
5. Prokhorov, D., Wunsch, D.: Adaptive Critic Designs. IEEE Trans. Neural Networks 8, 997–1007 (1997)
6. Venayagamoorthy, G.K., Harley, R.G., Wunsch, D.C.: Comparison of Heuristic Dynamic Programming and Dual Heuristic Programming Adaptive Critics for Neurocontrol of a Turbogenerator. IEEE Transactions on Neural Networks 13(3), 763–764 (2002)
7. Park, J.W., Harley, R.G., Venayagamoorthy, G.K., et al.: Dual Heuristic Programming Based Nonlinear Optimal Control for a Synchronous Generator. Engineering Applications of Artificial Intelligence 21, 97–105 (2008)
8. Lin, W.S., Yang, P.C.: Adaptive Critic Motion Control Design of Autonomous Wheeled Mobile Robot by Dual Heuristic Programming. Automatica 44, 2716–2723 (2008)
9. Zhang, N., Wunsch II, D.C.: Application of Collective Robotic Search Using Neural Network Based Dual Heuristic Programming (DHP). In: Wang, J., Yi, Z., Żurada, J.M., Lu, B.-L., Yin, H. (eds.) ISNN 2006. LNCS, vol. 3972, pp. 1140–1145. Springer, Heidelberg (2006)
10. Ferrari, S., Stengel, R.F.: Online Adaptive Critic Flight Control. Journal of Guidance, Control, and Dynamics 27(5), 777–786 (2004)
11. Jacobs, R.A.: Increased Rates of Convergence Through Learning Rate Adaptation. Neural Networks 1, 295–307 (1988)
12. Lendaris, G.G., Shannon, T.T., Schultz, L.J., et al.: Dual Heuristic Programming for Fuzzy Control. In: Proceedings of IFSA/NAFIPS, Vancouver, B.C., vol. 1, pp. 551–556 (2001)
A Design Decision-Making Support Model for Prioritizing Affective Qualities of Product

Chun-Chih Chen¹ and Ming-Chuen Chuan²

¹ Department of Industrial Design, National Kaohsiung Normal University, 62 Shenjhong Rd., Yanchao Township, Kaohsiung, Taiwan 824, R.O.C.
[email protected]
² Institute of Applied Arts, National Chiao Tung University, 1001 University Road, Hsinchu, Taiwan 300, R.O.C.
[email protected]
Abstract. Nowadays, the affective qualities elicited by a product's design play a crucial role in determining the likelihood of its success in today's increasingly competitive marketplace. To resolve the trade-off dilemma in the multiple-attribute optimization of affective user satisfaction, a decision-making support model based on a two-dimensional quality model, the extended Kano model, is presented in this study. The proposed model can assist designers in prioritizing actions and resource allocation to guide ideal product conception in different competitive situations. The implementation of this proposed model is demonstrated in detail via a case study of mobile phone design. Keywords: User satisfaction, Kano model, Multiple-attribute decision making, affective design.
1 Introduction

In order to improve competitiveness, a well-designed product should be able not only to meet basic functionality requirements, but also to satisfy users' psychological needs (or feelings, affects). The affective qualities elicited by the product form design have become a critical determinant of user satisfaction. Considering the multiple-attribute nature of user satisfaction, however, there are difficult trade-offs to be made when deciding which affective qualities should be emphasized in the product design process. Resolving this problem requires understanding which qualities create more satisfaction than others. Kano et al. [6] developed a two-dimensional model describing the linear and non-linear relationships between quality performance and user satisfaction. The Kano model divides qualities into must-be, one-dimensional, indifferent and attractive qualities based on how they satisfy user needs, as Figure 1 illustrates. So far, the application of the Kano model has mostly focused on the effects of physical or technical product qualities on user satisfaction. Considering the different abilities of physical and subjective (affective) qualities to satisfy user needs, this study hypothesizes that, in addition to the existing Kano quality categories, more quality types may be identified for affective qualities. Thus, an extended Kano model is proposed to explore the different effects of affective qualities on user satisfaction.
Fig. 1. Kano model of user satisfaction
A mobile phone design experiment was conducted to demonstrate the benefits of using the proposed design decision-making support model to prioritize affective qualities for enhancing user satisfaction.
2 Theoretical Background

2.1 The Extended Kano Model

User satisfaction has mostly been viewed as one-dimensional; that is, product attributes result in user satisfaction when the attributes are fulfilled and in dissatisfaction when they are not fulfilled. However, fulfilling user expectations to a great extent does not necessarily imply a high level of user satisfaction. Kano et al. [6] have proposed a 'two-dimensional' quality model characterizing the following categories of product attributes, which influence user satisfaction in different ways (Figure 1). It can help designers ascertain the extent to which a given set of product attributes satisfies users' wants and needs [6]. The Kano model was originally developed to probe how objective (physical or technical) product qualities affect user satisfaction in different ways. For a physical quality, increasing quality performance generally causes a monotonic enhancement of satisfaction. But this may not be true for some affective qualities. For example, Berlyne [1] suggested a Wundt (inverted-U) curve function to explain the effect of sense feelings on user preferences. He indicated that the hedonic value (pleasure) associated with the perception of a stimulus, such as complexity, peaks at an optimal level of psychological arousal; too little arousal (too simple) causes indifference, while too much arousal (too complex) causes displeasure. To explain the relationships between quality performance and satisfaction in a systematic and more comprehensive manner, an extended Kano model using a linear regression model for classifying Kano categories is proposed in this study. The extended Kano model classifies product qualities into four distinct main categories. Each quality category affects user satisfaction in a different way, as Figure 2 shows. The different types of qualities are explained as follows:

1. The attractive quality: Here, user satisfaction increases super-linearly with increasing attribute performance (quality). There is not, however, a corresponding decrease in user satisfaction with a decrease in attribute performance (Figure 2 (A)).
2. The one-dimensional quality: Here, user satisfaction is a linear function of product attribute performance (quality). Increasing attribute performance leads to enhanced user satisfaction (Figure 2 (B)).
3. The must-be quality: Here, users become dissatisfied when the performance of this product attribute is low or absent. However, user satisfaction does not rise above the neutral level with increased performance of this product attribute (Figure 2 (C)).
4. The nominal-the-better quality: A specified level of attribute performance will cause the highest user satisfaction (optimization). In other words, the closer a quality comes to the target performance, the higher the user satisfaction will be (Figure 2 (D)).

Besides these four qualities, two more quality types can be identified: the indifferent and reversal qualities (to be precise, they should be called characteristics, because they are not really user needs or qualities). For the indifferent quality, user satisfaction is not affected by the performance of the product quality at all, as Figure 2 (E) illustrates. For the reversal quality, users become more dissatisfied as the attribute performance increases. It can be classified into four sub-types: the reversed attractive quality (-A), the reversed one-dimensional quality (-O), the reversed must-be quality (-M), and the reversed nominal-the-better quality (-NB). The characteristics of the -NB quality (nominal-the-worse quality), the -A quality, the -O quality, and the -M quality are simply the reversed situations of the NB, A, O and M qualities, respectively, as shown in Figure 2 (F), (G), (H), (I).
Fig. 2. The extended Kano model (linear model)
2.2 Identification of the Extended Kano Qualities

Ting and Chen [8] proposed a regression model to classify qualities into Kano quality types. They performed a regression analysis for each quality, using user satisfaction as the dependent variable and the positive/negative performance of the quality as the independent variables. Positive performance of a quality means that the quality is present (sufficient), and negative performance means the quality is absent (insufficient). The impact of positive and negative quality performance on user satisfaction can be estimated by the following linear regression model:
$$US = C + \beta_1 \times (-K_n) + \beta_2 \times K_p, \qquad (1)$$
Table 1. The Kano decision table for the extended Kano model

Regression coefficient   β2 Sig.(+)   β2 n.s.   β2 Sig.(-)
β1 Sig.(+)               -NB          -A        -O
β1 n.s.                  A            I         -M
β1 Sig.(-)               O            M         NB

(A: attractive; O: one-dimensional; M: must-be; I: indifferent; R: reversal; NB: nominal-the-better)
where US is the degree of user satisfaction; the negative and positive performance of a quality are represented as $K_n$ and $K_p$, respectively; and $\beta_1$ and $\beta_2$ are their corresponding regression coefficients. When the performance of a quality is negative, the value of the negative performance is equal to $(-K_n)$ and $K_p$ is set to 0. On the other hand, if the performance of a quality is positive, the value of the positive performance is equal to $K_p$ and $K_n$ is set to 0. Comparing the two regression coefficients ($\beta_1$ and $\beta_2$) can reveal the varied relationship between the quality and user satisfaction, according to the sign and significance of the coefficients. The greater the absolute value of a coefficient, the greater its effect on user satisfaction. According to the extended Kano model, qualities can be classified into different categories according to whether the regression coefficients reach the significance level or not. The classification guidelines are described in Table 1. In this table, Sig.(+) means the regression coefficient is significantly positive; Sig.(-) means the regression coefficient is significantly negative; n.s. means the regression coefficient is non-significant in the regression equation.
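A minimal sketch of this classification procedure is given below, assuming statsmodels for the OLS fit, a 0.05 significance level, and the decision table of Table 1; all names are illustrative, not the authors' implementation.

```python
import numpy as np
import statsmodels.api as sm

def classify_kano(performance, satisfaction, alpha=0.05):
    """Classify one affective quality with the extended Kano model (Eq. 1)."""
    perf = np.asarray(performance, dtype=float)
    neg = np.where(perf < 0, -perf, 0.0)  # magnitude of negative performance (Kn)
    pos = np.where(perf > 0, perf, 0.0)   # positive performance (Kp)
    X = sm.add_constant(np.column_stack([neg, pos]))  # US = C + b1*Kn' + b2*Kp
    fit = sm.OLS(np.asarray(satisfaction, dtype=float), X).fit()
    b1, b2 = fit.params[1], fit.params[2]
    p1, p2 = fit.pvalues[1], fit.pvalues[2]

    def sig(b, p):  # '+', '-', or 'n.s.' for one coefficient
        return 'n.s.' if p >= alpha else ('+' if b > 0 else '-')

    # Decision table (Table 1), keyed by the significance pattern of (b1, b2)
    table = {('+', '+'): '-NB', ('+', 'n.s.'): '-A', ('+', '-'): '-O',
             ('n.s.', '+'): 'A', ('n.s.', 'n.s.'): 'I', ('n.s.', '-'): '-M',
             ('-', '+'): 'O', ('-', 'n.s.'): 'M', ('-', '-'): 'NB'}
    return table[(sig(b1, p1), sig(b2, p2))]
```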
3 The Verified Experiment Design for the Extended Kano Model

To illustrate how the extended Kano model can be applied to identify the different relationships between affective qualities and user satisfaction, we conducted an experimental study of mobile phones. The design of the experiment is described in the following. In this study, photos of mobile phone designs were used as the stimuli for eliciting users' affective responses. A total of 35 representative samples (Figure 3) were selected.
Fig. 3. Representative mobile phone designs
Each mobile phone design was represented by a scaled 4"×6" grayscale photographic image. A literature review [2, 4] provided a comprehensive set of affective qualities for selecting the representative semantic differential (SD) scales (adjectives). A group of experts and users was then recruited to identify the suitable and important qualities from the collected pool. As a result, 20 bipolar adjective pairs were selected for the SD survey, as shown in Table 2. Then, 32 male and 28 female subjects between 18 and 24 years old were recruited for the SD evaluation experiment. All 60 subjects were asked to evaluate the 35 experimental samples of mobile phones, on a 7-point Likert scale (-3 to +3) of the SD method, on each of the 20 affective qualities and overall satisfaction.
4 Result and Discussion

4.1 The Classification of Affective Qualities Based on the Extended Kano Model

To identify the different effects of qualities on user satisfaction, the extended Kano model using the regression method [8], as described in Section 2.2, was used to classify these qualities into different Kano categories. The means of the 20 qualities for each phone design were transformed into negative and positive performance, respectively. Setting overall satisfaction as the dependent variable and positive and negative quality performance as the independent variables, linear regression analyses were performed in SPSS for each of the 20 affective qualities according to Eq. (1). The significance of the regression coefficients was then used to determine the proper category for each quality, according to Table 1. The result of the Kano quality classification of each attribute is summarized in Table 2. Note that the qualities related to users' aesthetic perception (No. 1-No. 7) were categorized as one-dimensional qualities; higher performance in these qualities may lead to higher user satisfaction. The three qualities related to the emotion factor, 'lively', 'fun' interface style, and 'fascinating', were categorized as must-be qualities. If the performance of these qualities is low, young users become dissatisfied; however, high performance of these attributes does not raise the level of their satisfaction. Pleasant emotions might presumably be regarded as an attractive quality or an excitement need for users, but the results indicate that cell phone designs with these qualities are a basic need for young users. The 'anticipated' attribute is classified as an attractive quality; that is, an interface design with the characteristic of easy understanding can make users satisfied, while, on the contrary, an amazing interface design will not lead to user dissatisfaction. The 'simple' attribute is classified as a reversed must-be (-M) quality. In other words, a 'complex' (not too simple) feeling in phone design is a must-be quality for young users in this case. The other qualities are all categorized as indifferent qualities; therefore, these qualities do not affect user satisfaction. Furthermore, Table 2 also shows that 'simple', with the greatest absolute value of the significantly negative coefficient β2 (-0.759), should be provided at the minimum acceptable level to avoid user dissatisfaction. The 'anticipated' attribute, with the greatest absolute value of the significantly positive coefficient β2 (0.552), can be used as a means of differentiating the attribute offering from competitors. The values of the coefficients can be used to identify the relative importance of qualities affecting user dissatisfaction or satisfaction.
Table 2. Results of the Kano classification

No.  Quality attribute                 β1       Sig.   β2       Sig.   R²      Kano classification
1    Conventional-fashionable         -0.472    *      0.494    *      0.734   O
2    Vulgar-elegant                   -0.464    *      0.494    *      0.673   O
3    Ugly-beautiful                   -0.544    *      0.459    *      0.726   O
4    Common-unique                    -0.488    *      0.401    *      0.597   O
5    Unpleasant-pleasant              -0.561    *      0.346    *      0.610   O
6    Old-young                        -0.590    *      0.288    *      0.601   O
7    Intense-relaxed                  -0.367    *      0.343    *      0.393   O
8    Dull-lively                      -0.627    *      0.197    n.s.   0.557   M
9    Not fun-fun                      -0.464    *      0.267    n.s.   0.384   M
10   Bored-fascinating                -0.478    *      0.241    n.s.   0.367   M
11   Rugged-streamlined                0.130    n.s.   0.284    n.s.   0.057   I
12   Confused-intelligible interface  -0.128    n.s.  -0.083    n.s.   0.014   I
13   Difficult to use-easy to use     -0.170    n.s.   0.143    n.s.   0.024   I
14   Amazing-anticipated              -0.008    n.s.   0.552    *      0.308   A
15   Novel-familiar                   -0.099    n.s.  -0.202    n.s.   0.031   I
16   Decorative-functional            -0.215    n.s.   0.172    n.s.   0.110   I
17   Complex-simple                   -0.132    n.s.  -0.759    *      0.494   -M
18   Masculine-feminine               -0.180    n.s.   0.057    n.s.   0.046   I
19   Heavy-light                      -0.039    n.s.   0.156    n.s.   0.033   I
20   Fragile-durable                  -0.156    n.s.  -0.216    n.s.   0.045   I

*: Sig. < 0.05; n.s.: non-significant; A: attractive; O: one-dimensional; M: must-be; I: indifferent.
Both Berlyne's study [1] and the Taguchi method [7] suggest that the nominal-the-better (NB) performance characteristic should be considered when optimizing satisfaction. Therefore, in this study we hypothesized that the characteristics of the NB quality may exist in affective qualities. Unfortunately, none of the qualities was identified as an NB or -NB quality. A close examination of Table 2 reveals that the regression coefficients β1 and β2 of the qualities 'Confused-intelligible', 'Novel-familiar', 'Complex-simple' and 'Fragile-durable' are all negative, while the values of β1 and β2 of the 'Rugged-streamlined' attribute are both positive. This seems to suggest that these qualities may more or less show the characteristics of NB or -NB. The possible nominal-the-better relationship between the performance of the attribute with the highest R² ('Complex-simple') and user satisfaction is worth further investigation using curve estimation. This study therefore conducted curve fits for linear and quadratic models, setting overall satisfaction as the dependent variable and the quality performance of the affective attribute as the independent variable. Figure 4 and Table 3 show that the quadratic model (R² = 0.48) has a higher R², providing a better explanation than the linear model (R² = 0.28). Thus, the 'simple' feeling exhibits a characteristic related to the nominal-the-better quality. This finding is in line with a previous study [1], and suggests that the hedonic value associated with the perception of simplicity peaks at an optimum level of arousal; too little arousal (too simple) or too much arousal (too complex) will produce less satisfaction.

Fig. 4. Effect of the quality attribute 'simple' on consumer satisfaction

Table 3. Quadratic and linear models for the quality attribute 'simple'

Quadratic model
  Variable    b        β        t        Sig.
  Simple²    -0.299   -0.563   -4.406    0.000**
  Simple     -0.195   -0.446   -3.488    0.001**
  Constant    4.004             46.512   0.000*
  R = 0.69; R² = 0.48; Sig. = 0.000*; Std. error of the estimate = 0.395

Linear model
  Variable    b        β        t        Sig.
  Simple     -0.282   -0.531   -3.596    0.001*
  Constant    3.814             49.412   0.000*
  R = 0.53; R² = 0.28; Sig. = 0.001*; Std. error of the estimate = 0.475

As mentioned above, the attribute 'Complex-simple' seems to have the characteristic of NB; however, its classification as NB is not statistically significant at the 0.05 level using the regression method in this study. This may be explained by individual differences among users: each user has his/her own specific needs and pattern of preference. Griffin and Hauser [3] pointed out that designers should consider user heterogeneity when describing the importance of product qualities on acceptance. To consider the differences between individuals when applying the Kano model, the 60 subjects were further classified based on the similarity of their attribute performance evaluations. A systematic cluster analysis was conducted in SPSS to group the subjects into homogeneous clusters of users. By adopting a proper cut in the resulting hierarchical dendrogram, three consumer groups were finally determined, containing 6, 23 and 31 subjects, respectively.
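The curve estimation of Table 3 can be reproduced in outline as follows; numpy's polynomial fitting and a simple R² computation stand in for the SPSS procedure, so the sketch is illustrative rather than the authors' exact analysis.

```python
import numpy as np

def r_squared(y, y_hat):
    # Coefficient of determination for a fitted model
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def compare_fits(simple, satisfaction):
    """Fit linear and quadratic models of satisfaction vs. the 'simple' rating."""
    simple, satisfaction = np.asarray(simple), np.asarray(satisfaction)
    lin = np.polyfit(simple, satisfaction, 1)    # linear model
    quad = np.polyfit(simple, satisfaction, 2)   # quadratic (inverted-U) model
    r2_lin = r_squared(satisfaction, np.polyval(lin, simple))
    r2_quad = r_squared(satisfaction, np.polyval(quad, simple))
    return r2_lin, r2_quad  # Table 3 reports roughly 0.28 vs. 0.48
```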
To further identify the Kano quality classification of 'Complex-simple' for each user group, Ting and Chen's [8] regression method, as described above, was again used. The significance of the regression coefficients was then used to determine the proper category. Table 4 shows the regression analysis results of the Kano quality classification of 'Complex-simple' for the three consumer groups.

Table 4. Regression analysis results of 'Complex-simple' for three consumer groups

Group (number of consumers/%)   β1       Sig.   β2       Sig.   R²      Kano classification
1st group (6/10%)              -0.093   n.s.   -0.685   ***    0.412   -M
2nd group (23/38.3%)           -0.315   *      -0.702   ***    0.398   NB
3rd group (31/51.7%)           -0.031   n.s.   -0.621   ***    0.367   -M

*: Sig. < 0.05; ***: Sig. < 0.001; n.s.: non-significant.
C.–C. Chen and M.-C. Chuan
As Table 4 indicates, the results of Kano classification of the 1st and 3rd groups are the same, the ‘simple’ attribute is classified as a reversed ‘must-be’ (-M) quality. For the 2nd group, the ‘simple’ attribute is categorized as a ‘nominal-the-better’ (NB) quality. Thus, the proposed quality type, nominal-the-better (NB) performance characteristic, in the extended Kano model exists. Therefore, using the proposed extended Kano model to appropriately classify user needs into quality categories with more types may be justified. The result also explains the individual difference of users on Kano quality classification. Thus, it was verified that the advantages of applying Kano model can better understand user requirements, helps to identify the critical qualities of satisfaction. 4.2 A Design Decision-Making Support Model for Prioritizing Affective Qualities To resolve the difficult trade-offs of optimizing user satisfaction with multipleattribute characteristic, researchers have typically adopted conventional weight determination methods, such as the eigenvector method, entropy method, and analytic hierarchy process (AHP) etc., for determining attribute priorities[5]. However, these methods may not be able to accurately reflect the relationship between qualities and user satisfaction levels. Using only the weight determination methods may lead to misinterpretation regarding real user requirements, and hence, inappropriate resource allocation. Thus, this study proposes a design decision-making support model showing how to prioritize the qualities during the product development stage according to the result of Kano classification. The critical (affective) qualities of a product must be identified first, and an SD survey on existing products should be conducted to determine user evaluations of these qualities and overall satisfaction. Qualities must then be classified into Kano categories based on attribute performance and user satisfaction relationship. An example of this classification is the regression approach and the decision table in Table 1. Once the qualities are categorized, the product can be designed to meet the different requirements of each quality attribute, according to its category. Comparing the performance of the current design with that of the average design can clarify the strengths and weaknesses of the current design. Referring to the definition of the Kano categories, the priority for attribute improvement can be determined. Figure 5 depicts the decision matrix. According to the decision matrix, one-dimensional qualities, which can create user satisfaction and have an actual competitive advantage, should be improved in all situations. Attractive qualities with better performance than average (target) can be the key to beating competitors because they delight users. Therefore, these qualities should be further developed. On the contrary, weak attractive qualities, which do not result in user dissatisfaction, do not critically endanger the competition position, and can be disregarded. Must-be qualities with a minimum acceptable level of attribute performance are taken for granted. Thus, weak must-be qualities should be improved to a certain level (average) to avoid major user dissatisfaction. If must-be qualities are on the same level as the average (target), they should be maintained. For better resource allocation, the performance of strong must-be qualities should be maintained or even adjusted downwards to avoid over-fulfillment in exchange for cost savings. 
For nominal-the-better qualities, the performance should be as close to the optimal (average) level as possible to achieve optimal user satisfaction. Indifferent qualities, which do not affect user satisfaction, should be disregarded or even reduced in exchange for cost savings.

Fig. 5. Framework to prioritize the qualities for enhancing user satisfaction

This study presents a case example to illustrate how the extended Kano model can be integrated into a design decision-making support model. In this case, the No. 7 sample in the SD survey represents the current design of Company A, as shown in Figure 3, and the other 34 samples represent the currently existing competitors in the market. The representative affective qualities were selected with the criterion of covering the different types of quality categories in Table 2. However, the -M quality attribute 'simple' is reversed into the 'complex' attribute, which is then classified as an M quality. The SD survey was conducted to gather the users' evaluations of the performance of these qualities and the overall satisfaction with each design. According to the results of the SD survey, the quality classification of each representative attribute was identified by applying the extended Kano model, as shown in Table 1. Table 5 presents the quality classification of each representative attribute, the average and maximum performance of the currently existing products for each quality attribute, and the performance of sample No. 7.

Table 5. Consumer evaluation and Kano classification results
Lively
Weak Strong Fun Complex Fashionable Anticipated Elegant Feminine Intelligible interface interface
Overall satisfaction
M
M
M
O
A
O
I
I
-0.56
-0.54
-0.70
1.64
1.82
1.58
0.88
1.12
4.00
0.16
0.10
-0.25
1.22
1.11
0.94
0.58
0.48
3.81
1.60
1.88
1.12
2.66
1.98
2.12
1.66
1.28
4.72
0.16
0.10
-0.25
2.66
1.98
2.12
-
-
-
30
C.–C. Chen and M.-C. Chuan
Comparing the performance of the sample No. 7 with the average performance identifies the weak and strong qualities. Because ‘feminine’ and ‘intelligible’ are categorized as indifferent qualities which do not affect user satisfaction, they should be disregarded. For the one-dimensional qualities, ‘fashionable’ and ‘elegant,’ the increase of attribute performance will lead to higher user satisfaction. They are critical as competitive means with the targeted product and should be further improved to gain a competitive advantage. Referring to our model, enhancing the high performance of an attractive attribute is desirable. Thus, the strong attractive attribute ‘anticipated’ interface style could be further developed to effectively enhance overall satisfaction. The weak must-be qualities ‘lively’, ‘fun’ interface style and ‘complex’ should be improved to a certain level. In row 5, the average performance is assigned as the level to be reached for the must-be qualities.
5 Conclusion

Considering the difference in user perception between subjective and physical quality attributes, an extended Kano model was proposed and verified in this study. The empirical results support the claim that the extended Kano model can help designers better understand the user requirements behind user satisfaction in a systematic and comprehensive manner. However, to effectively achieve higher user satisfaction, designers should not only identify the different effects of qualities, but also be aware of accurate quantitative information for properly allocating design resources. The proposed design decision-making support model, by considering both the Kano classification of qualities and the resource allocation method, provides a better prioritization plan for improving quality attribute performance. It can effectively indicate where and how much resource should or should not be allocated to resolve the trade-off problem in multiple-attribute decision-making situations in product design. Although the proposed approach uses mobile phone designs as examples, with some adjustment it may be applied to other fields or products with multiple-attribute characteristics for resolving trade-off decisions and resource allocation.
References

1. Berlyne, D.E.: Studies in the New Experimental Aesthetics. Hemisphere Publishing Corporation, Washington, DC (1974)
2. Chang, H.C., Lai, H.H., Chang, Y.M.: Expression Modes Used by Consumers in Conveying Desire for Product Form: A Case Study of a Car. International Journal of Industrial Ergonomics 36, 3–10 (2006)
3. Griffin, A., Hauser, J.R.: The Voice of the Customer. Marketing Science 12(1), 1–27 (1993)
4. Hsiao, K.A., Chen, L.L.: Fundamental Dimensions of Affective Responses to Product Shapes. International Journal of Industrial Ergonomics 36, 553–564 (2006)
5. Hwang, C.L., Yoon, K.: Multiple Attribute Decision Making. Springer, Berlin (1989)
6. Kano, N., Seraku, N., Takahashi, F., Tsuji, S.: Attractive Quality and Must-Be Quality. Hinshitsu (Quality, the Journal of the Japanese Society for Quality Control) 14, 39–48 (1984)
7. Taguchi, G., Clausing, D.: Robust Quality. Harvard Business Review, Boston (1990)
8. Ting, S.C., Chen, C.N.: The Asymmetrical and Non-linear Effects of Store Quality Attributes on Customer Satisfaction. Total Quality Management 13(4), 547–569 (2002)
Local Search Heuristics for Robotic Routing Planner

Stanislav Slušný and Roman Neruda

Institute of Computer Science, Academy of Sciences of the Czech Republic, Pod vodárenskou věží 2, Praha 8, Czech Republic
[email protected]

Abstract. Several approaches to one particular variation of the robot path planning problem are introduced in this paper. A detailed description of local neighborhood search heuristics, as well as a comparison with more sophisticated algorithms based on the constraint programming technique, is presented. Enhancements and further adaptation to real-world environments are discussed as well.
1 Problem Area and Related Work

Recent advances in robotics have allowed robots to operate in cluttered and complex spaces. The technology has evolved, and nowadays robots can execute fundamental tasks in real-world environments. However, to efficiently handle the full complexity of real-world tasks, new deliberative planning strategies are required. Several tasks from robotics share the same peculiarities and restrictions given by the robot's construction and the nature of the environment they deal with. The capacity of the robot is mostly limited, and it can usually be replenished at one of the available depots placed in room corners or at designated standpoints in buildings. The task we are dealing with is to develop a robot solving a routing problem, an often overlooked variant of the standard Vehicle Routing Problem (VRP). In our setting, the robot has to clean up a collection of wastes spread in a building, under the condition of never exceeding its internal storage capacity. The storage tank can be emptied at one of the available collectors. The goal is to come up with a routing plan minimizing the covered trajectory (Fig. 1). In this work, we examine the integration of a high-level planner based on Constraint Programming (CP) with a reactive robot path planner. The amount of literature addressing the VRP subject and its derived variants is enormous. However, to the best of our knowledge, the variant of the VRP called VRP with satellite facilities (VRPSF) has not been studied extensively. The latest published results can be found in [1], and are mentioned also in [2]. The authors presented an exact procedure for solving the VRPSF that combines heuristics with methods from polyhedral theory in a branch-and-cut framework. They focused on obtaining an optimal solution, which took almost an hour on 15-customer instances. Nowadays, there exist numerous optimization approaches, ranging from the "classic" operations research field to those inspired by nature [3,4]. In the published results, a majority of authors show that their method is applicable or even outperforms some other methods for the studied problem variants. Therefore, properly deciding which way to choose is a hard row to hoe.
Fig. 1. Example of a 6+3 task. The biggest circle represents the robot; the six smaller circles show the distribution of wastes. The three squares are collectors. The goal of the robot is to collect the wastes while minimizing the covered distance. In this example, the robot can hold at most two wastes at a time.
There have been several ways of tackling the VRP using CP presented so far. Among the first pioneering works in this field is that of Shaw [5], who used LDS methods with Large Neighborhood Search for the capacitated VRP and the VRP with time windows (VRPTW). Another work [6] describes a method for combining Constraint Programming with Tabu Search and Guided Local Search techniques to solve the standard VRP; the combined method was able to outperform the other methods on most problems. To mention one more approach, the authors of [7] studied the benefits of several constraint-based operators on 56 Solomon instances of the VRPTW; the method showed good results and allowed easy further modification of the model.
2 Problem Formalization

This part formally explains the problem and its underlying mathematical formulation. We start by defining the model inspired by network flows (Kirchhoff's constraints) as proposed by Simonis [8]. The real robot's environment is represented as a mixed weighted graph (V, E), where V = {I} ∪ W ∪ C ∪ {D} and E contains both directed and undirected edges. Since the distances between any two objects are symmetric, we try to eliminate as many pairs of directed edges as possible and replace them by single undirected edges. The vertex representing the initial position of the robot (I) is connected by directed edges to all vertices representing a waste (∈ W). The subgraph over the waste vertices forms a complete undirected graph. Each pair of waste and collector vertices (∈ C) forms a directed digraph. To simplify the following model, we introduce a dummy vertex D connected only by incoming directed edges from the collectors. These edges have weight 0, while all other weights correspond to the nonnegative distances taken from the real world. This elimination of edges implicitly breaks a lot of symmetries, so the constraint-based model does not have to reason about them any more. The task is to find a minimal-cost path starting at I and leading to D, such that every waste from W is visited exactly once and the number of consecutive vertices from W is bounded by the given capacity of the robot.
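A minimal sketch of this graph construction, under the assumption of Euclidean point coordinates and illustrative vertex labels, could look as follows.

```python
from itertools import combinations
from math import dist

def build_graph(robot, wastes, collectors):
    """Build the mixed routing graph; 'D' is the zero-cost dummy sink."""
    edges = {}  # (u, v) -> weight; undirected edges are stored in both directions
    for i, w in enumerate(wastes):                    # I -> wastes (directed)
        edges[('I', f'w{i}')] = dist(robot, w)
    for (i, a), (j, b) in combinations(enumerate(wastes), 2):
        d = dist(a, b)                                # complete undirected W subgraph
        edges[(f'w{i}', f'w{j}')] = edges[(f'w{j}', f'w{i}')] = d
    for i, w in enumerate(wastes):                    # waste <-> collector digraph
        for j, c in enumerate(collectors):
            d = dist(w, c)
            edges[(f'w{i}', f'c{j}')] = edges[(f'c{j}', f'w{i}')] = d
    for j in range(len(collectors)):                  # collectors -> dummy sink
        edges[(f'c{j}', 'D')] = 0.0
    return edges
```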
Fig. 2. Schematic drawing of the input graph
3 CP Model Based on Network Flows

We start by defining the core component of the constraint model: the decision variables and their constraints. For every edge e we introduce a binary decision variable $X_e$ stating whether the edge is used in the solution or not. Let IN(v) and OUT(v) denote the sets of incoming and outgoing (directed) edges of the vertex v, respectively, and ICD(v) the set of incident (undirected) edges of the vertex v. Then the following arithmetic constraints

$$\sum_{e \in \mathrm{OUT}(I)} X_e = 1, \qquad \sum_{e \in \mathrm{IN}(D)} X_e = 1, \qquad \forall c \in C: \; \sum_{e \in \mathrm{OUT}(c)} X_e = \sum_{e \in \mathrm{IN}(c)} X_e \qquad (1)$$
ensure that the robot initiates the path from the starting position via a single edge and that the route is finished by entering the destination from some sole collector; the numbers of arrivals at and departures from any collector must be equal. The requirement that the robot has to collect every waste w ∈ W exactly once is provided by the following constraints:

$$\sum_{e \in \mathrm{OUT}(w)} X_e \le 1, \qquad \sum_{e \in \mathrm{IN}(w)} X_e \le 1, \qquad \sum_{e \in \mathrm{OUT}(w) \cup \mathrm{IN}(w) \cup \mathrm{ICD}(w)} X_e = 2. \qquad (2)$$
These relations still allow isolated loops along with the valid path (Fig. 3, left). To suppress this, VRP models usually use an enumeration technique, where for every pair of subsets of the vertices a new constraint is posted stating that the subsets have to be connected. In our problem, we need to consider every pair of subsets S1, S2 ⊂ (W ∪ C) such that neither S1 nor S2 is composed only of collectors (since the robot may not necessarily visit some of them), and at the same time it holds:

$$S_2 = (W \cup C) \setminus S_1 \;\wedge\; S_1 \cap W \neq \emptyset \;\wedge\; S_2 \cap W \neq \emptyset. \qquad (3)$$
Fig. 3. Example of a forbidden loop allowed by pure Kirchhoff's constraints
Next, we need to ensure that at least one edge of the final path serves as a direct connection between S1 and S2. This requirement can easily be realized by considering all edges e connecting S1 with S2 in the following constraint:

$$\sum_{e:\; e \cap S_1 \neq \emptyset \,\wedge\, e \cap S_2 \neq \emptyset} X_e \ge 1. \qquad (4)$$
Finally, the last set of constraints guarantees that the robot's capacity is never exceeded. We introduce non-decision integer variables $S_v$ for every vertex v, indicating the amount of waste in the robot's stack after visiting the vertex. The corresponding constraints for every waste w ∈ W:

$$\sum_{e \in \mathrm{IN}(w)} X_e = 1 \;\Longrightarrow\; S_w = 1, \qquad (5)$$
$$\forall e, f \in \mathrm{ICD}(w),\; e = \{u, w\},\; f = \{w, v\}: \quad X_e + X_f = 2 \;\Longrightarrow\; |S_u - S_v| = 2, \qquad (6)$$
$$\forall e = \{u, w\} \in \mathrm{ICD}(w): \quad X_e = 1 \;\Longrightarrow\; |S_u - S_w| = 1 \;\wedge\; S_u \ge 1 \;\wedge\; S_w \ge 1 \qquad (7)$$
ensure that the robot's internal storage is purged at the beginning and also after leaving any collector (Eq. 5), and that the stack gradually fills up as the robot collects waste (Eqs. 6-7). The solver reasons only about the decision variables $X_e$, while the domains of the $S_v$ variables are continuously filtered, so the validity of the real-world conditions is preserved. The objective function minimizing the total cost of the traveled edges is defined as $\sum_{e \in E} X_e \cdot \mathrm{weight}(e)$.

Scalability. The major limitation of the above formulation is the constraint represented by Equation 4. Due to the exponential growth of the number of possible subsets forming a constraint, full enumeration is applicable only to smaller instances. Some authors [9] propose to use single- or multi-commodity flow principles to reduce the number of constraints by
introducing auxiliary variables for TSP variants. Because of the combination of both directed and undirected edges in our model, the definition of flows and "flow-based" constraints would be unnecessarily complex. To overcome the exponential growth of the model, the method we use in our routing problem is based on lazy insertion. The solver starts without subtour elimination constraints (alternatively, it can be initiated with a reasonable number of them) and searches for a feasible solution. If the retrieved solution is a valid plan, a new bound is inserted into the model. Otherwise, the subtours are diagnosed and the relevant elimination constraints are posted to the model. In both cases, the search continues from the current node of the search tree with the additional constraints until the time limit is reached or the last solution has been proved optimal.

Search heuristics. In constraint programming, the variable and value selection heuristics determine the shape of the search tree, which is usually traversed in a depth-first order. A clever branching strategy is a key ingredient of any constraint approach. The solver uses a variable selection heuristic similar to the greedy approach for the TSP [10]. A new $X_e$ variable is selected for assignment in the following way: if the path is empty, start from the initial position of the robot using the cheapest edge; if possible, extend the existing path to the nearest waste; otherwise, visit the closest collector.
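A minimal sketch of this greedy construction, with assumed names and a precomputed distance table, is given below; it mirrors the value ordering described above rather than the solver's actual implementation.

```python
def greedy_plan(dists, wastes, collectors, capacity):
    """Nearest-neighbour construction mirroring the solver's value ordering."""
    pos, load, route = 'I', 0, []
    remaining = set(wastes)
    while remaining:
        if load < capacity:  # extend the path to the nearest remaining waste
            nxt = min(remaining, key=lambda w: dists[pos, w])
            remaining.discard(nxt)
            load += 1
        else:                # tank full: visit the closest collector first
            nxt = min(collectors, key=lambda c: dists[pos, c])
            load = 0
        route.append(nxt)
        pos = nxt
    # the route must finish at a collector
    route.append(min(collectors, key=lambda c: dists[pos, c]))
    return route
```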
4 CP Model Based on Finite State Automaton

The model is based on the idea that a valid path is formed by a sequence of nodes satisfying certain constraints. We define four types of constraints. First, there is a restriction on the existence of a connection between consecutive nodes (a routing constraint). Second, there is a capacity constraint stating that no continuous subsequence of waste nodes is longer than the given capacity. Third, each waste must be visited exactly once, while the collectors can be visited arbitrarily many times (an occurrence constraint). Finally, a cost constraint guards that the sum of the weights of the used arcs does not exceed a given limit (used for finding the path with the minimal weight). Naturally, the first three types of constraints can be described using a finite state automaton, as they use local information about state transitions, so the path could be modeled using a regular constraint. Clearly, the size of the resulting automaton depends on the number of nodes and on the robot's capacity; therefore, we decided on a combination of constraints that captures the above restrictions. These constraints resemble the idea of the slide constraint described in the literature. Before going into the description of the constraint model, it is important to realize that the exact path length is unknown in advance (the collector nodes can be visited several times). Let us assume that the path length is measured as the number of visited nodes, the robot starts at its initial position, and it finishes at some collector node (in contrast to the previous model, we will not need a dummy destination). Let K = |W| be the number of waste nodes, L = |C| the number of collector nodes, and C ≥ 1 the robot's capacity. Then the maximal path length is 2K + 1 (a collector node is visited immediately after every waste node). Computing the minimal path length is a bit tricky; we can assume the robot fully exploits its capacity, i.e., it visits exactly C waste nodes before going to a collector node (except in the last loop). If K mod C = 0, one can easily see that the minimal path length is 1 + (C + 1) · (K ÷ C). Otherwise, the minimal path length is 2 + (C + 1) · (K ÷ C) + (K mod C).
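These bounds are easy to compute directly; the sketch below (with an illustrative function name) also shows a small worked example for K = 6 wastes and capacity C = 2.

```python
def path_length_bounds(K, C):
    """Minimal and maximal path lengths, measured in visited nodes."""
    max_len = 2 * K + 1                             # a collector after every waste
    if K % C == 0:
        min_len = 1 + (C + 1) * (K // C)            # full loads only
    else:
        min_len = 2 + (C + 1) * (K // C) + (K % C)  # plus one partial last loop
    return min_len, max_len

# K = 6, C = 2: three full loops of (2 wastes + 1 collector) after the start node,
# so the minimal length is 1 + 3*3 = 10, and the maximal length is 2*6 + 1 = 13.
assert path_length_bounds(6, 2) == (10, 13)
```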
In our constraint model we use three types of variables. Let N = 2K + 1 be the maximal path length. Then we have N variables Node_i, Cap_i, and Cost_i (i = 1, ..., N), so we assume the path of maximal length. The meaning of the variables is as follows. The Node variables describe the path; hence their domain is the set of positive integers 1, ..., K identifying the waste nodes and K + 1, ..., K + L identifying the collector nodes. Cap_i is the used capacity of the robot after leaving node Node_i (Cap_1 = 0, as the robot starts empty), with domain 0, ..., C, and Cost_i is the cost of the arc used to leave node Node_i (Cost_N = 0), with a domain of non-negative numbers. In order to allow paths shorter than the maximal length, we introduce a dummy node (with identification 0) such that there is a zero-cost arc from any collector to this dummy node and a zero-cost arc (loop) from the dummy node to itself. In other words, once we reach the dummy node, it is not possible to leave it. To model the problem restrictions we used the following constraints. A global cardinality constraint over the set Node_1, ..., Node_N ensures that each waste node appears exactly once in this set, each collector node appears at most (K + 1) times, and the dummy node appears any number of times. If D is a variable describing the number of appearances of the dummy node (this variable participates in the global cardinality constraint), then we can use the constraint Node_{N-D} > 0 to state where the dummy nodes cannot be located in the path. We can also set the initial upper bound for D using the information about the minimal path length (D ≤ N - MinPathLength). In summary, the global cardinality constraint realizes the occurrence constraint. The cost constraint can be easily described as $\sum_{i=1}^{N} Cost_i < CostLimit$. We connect the Cost variables with the Node variables to describe the routing constraint using a set of ternary constraints over the variables Node_i, Cost_i, Node_{i+1}, i = 1, ..., N - 1. This set of constraints corresponds to the idea of the slide constraint. We implement the constraint between the variables Node_i, Cost_i, Node_{i+1} as a ternary tabular (extensionally defined) constraint, where a triple (p, q, r) satisfies the constraint if there is an arc from p to r with cost q. In other words, this table describes the original routing network with costs, extended by the dummy node. Finally, the capacity constraint is realized using a set of ternary constraints over the variables Cap_i, Node_{i+1}, Cap_{i+1}, i = 1, ..., N - 1, again exploiting the idea of the slide constraint. The constraint is again implemented as a tabular constraint with the following semantics; a triple (p, q, r) satisfies the constraint if:

- q is the identification of a collector node (q > K) or the dummy node (q = 0), and r = 0, or
- q is the identification of a waste node (0 < q ≤ K), and r = p + 1.

The constraint model describes the problem, and any instantiation of the variables describes a feasible solution to the problem. We do not show a formal proof, but one can easily verify that the implemented constraints correspond to the constraints in the problem specification. Note that the search space is defined by the Node variables only, because once these variables are instantiated, the Cap and Cost variables are instantiated by propagation through the above tabular constraints.

Search strategy and selection heuristics. When searching for the solution, we first use a greedy approach to find an initial solution (the initial cost).
Search strategy and selection heuristics. When searching for the solution we first use a greedy approach to find the initial solution (the initial cost). This greedy algorithm instantiates the variables $Node_i$ in order of increasing $i$ in such a way that the current path is extended with the arc having the smallest cost (naturally, the capacity constraint is taken into account); a sketch of this initialization follows.
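A minimal Python sketch of this initialization. To guarantee termination it prefers waste pickups over collector moves, a slight simplification of the pure smallest-cost rule, and it assumes every waste and a collector are reachable from any node:

```python
def greedy_initial_path(arcs, wastes, collectors, C, start):
    """Greedy initialization: extend with the cheapest feasible arc.

    Picks up wastes while capacity allows; unloads at the cheapest
    reachable collector when full or stuck. Illustrative sketch only.
    """
    path, load, remaining, node = [start], 0, set(wastes), start
    while remaining:
        out = {r: c for (p, r), c in arcs.items() if p == node}
        waste_moves = [(c, r) for r, c in out.items()
                       if r in remaining and load < C]
        if waste_moves:                       # cheapest reachable waste
            cost, node = min(waste_moves)
            load += 1
            remaining.discard(node)
        else:                                 # full (or stuck): unload
            cost, node = min((c, r) for r, c in out.items()
                             if r in collectors)
            load = 0
        path.append(node)
    if node not in collectors:                # finish at a collector
        out = {r: c for (p, r), c in arcs.items() if p == node}
        node = min((c, r) for r, c in out.items() if r in collectors)[1]
        path.append(node)
    return path
```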
To find the optimal solution we use a standard branch-and-bound approach with restarts. To instantiate the $Node$ variables we use the min-dom heuristic for variable selection, and we select the values in the order defined in the problem (waste nodes are tried before collector nodes). When a feasible solution is found, a new bound is set and the search procedure is restarted to find a better solution; this repeats until no better solution can be found. The last feasible solution found is then the optimal solution.
5 CP Model Embedded in Local Search
To ensure good plan quality even for larger instances, we embed the CP solver as an operator within large neighborhood search (LNS), as proposed by Shaw [5]. In our case, a state in LNS corresponds to a plan: a valid path for the robot, a potential solution to the problem. As the local search repeatedly chooses a move that reduces the objective function, it can get "trapped" in the first local minimum it encounters. In order to escape a local minimum, a controlled method of accepting a non-improving move is required. In this paper, we examined a simplified simulated annealing metaheuristic (a sketch appears at the end of this section). The initial solution for the local search method is obtained from the CP solver described in Sections 3 and 4; however, this time we do not search for the optimum but only for the first valid solution found by the inner heuristic. The crucial part of LNS is the neighborhood operator, which defines how to move in order to explore and possibly find a better state.
Neighborhood generator. Our operator removes an induced path of a predefined length from the current solution and fixes the remaining edges of the currently best path. This input with reduced freedom (the robot has to follow the fixed edges) is passed to the CP solver. The number of fixed edges defines the complexity of the relaxed problem for the embedded CP solver; in other words, this is the parameter that controls the freedom the CP solver has during the search.
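A minimal sketch of the LNS loop with a simplified simulated-annealing acceptance rule, assuming a hypothetical `solve_relaxed` wrapper around the embedded CP solver (both helper names are ours):

```python
import math
import random

def lns_search(initial_path, cost, solve_relaxed, free_len,
               iterations=1000, temp=1.0, cooling=0.995):
    """LNS with a simplified simulated-annealing acceptance rule.

    solve_relaxed(path, free_len) -> candidate path in which a random
    induced sub-path of length free_len was re-optimized by the
    embedded CP solver while the remaining edges stayed fixed.
    """
    best = current = initial_path
    for _ in range(iterations):
        candidate = solve_relaxed(current, free_len)
        delta = cost(candidate) - cost(current)
        # accept improving moves; accept worsening ones with a
        # probability that decays with the temperature
        if delta < 0 or random.random() < math.exp(-delta / temp):
            current = candidate
        if cost(current) < cost(best):
            best = current
        temp *= cooling
    return best
```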
6 Experiments
All of the following measurements were performed on an Intel Xeon machine with 4 GB of RAM, running a Debian GNU/Linux operating system. The first model was implemented in Java using Choco v.2 (http://choco.emn.fr), an open-source constraint programming library. The second model, based on the finite state automaton, was implemented in SICStus Prolog (http://www.sics.se/sicstus). All tested instances were generated using a square robot arena. The positions of the wastes and the initial pose of the robot were uniformly distributed inside the field. The collectors were uniformly distributed along the boundaries of the arena, and the weights were set as point-to-point distances using the Euclidean metric.
Fig. 4. Top: time to compute and prove the optimal solution using the CP models over an increasing number of wastes; the automaton-based model (blue) outperforms the flow-based one (red). Bottom: physically realistic robot simulator.
6.1 Performance of Pure CP Models
The graph in Fig. 4 (top) shows the performance of the CP models as a function of instance size. As we can see, the solver based on the first model scales well up to instances of size 6+3 (wastes + collectors). The second model solves instances of size 8+3 in a reasonable amount of time. Note that this is the time needed to find the solution and prove its optimality. Since in robotics finding a good plan fast is more important than finding the optimal one late, we then investigated the quality of the plans found by the CP solvers within a limited time.
6.2 Performance of Local Search
Our goal was to implement an anytime planner able to solve instances of much bigger sizes (40+3). The planner was based on local search, as described in the previous sections. Although this approach outperformed the pure CP method based on the former (flow-based) model, it showed worse performance than the planner based on the automaton. In other words, local search was not able to improve the solutions found by the automaton within a short time limit (2 minutes); hence, the plan found by the automaton-based solver within a few seconds is already of very good quality.
7 From High-Level Planning to the Real World
To evaluate the overall performance of the system, we use the professional commercial development environment Webots [11], which enables us to test the robot in a physically realistic world. The robot works in an indoor environment (Fig. 4, bottom). The very first step is the map-building process. The robot autonomously scans and identifies the yet unknown environment and creates the input graph for the high-level path planner. The CP solver plans the optimal path, taking all reality-driven constraints into account. The robot immediately starts to execute the plan, taking care to avoid both static and dynamic obstacles. Meanwhile, localization and motion planning are carried out continuously.
8 Conclusions
In this work, we developed a robotic architecture incorporating both purely reactive execution and sophisticated reasoning that works in complex and dynamic environments. The constraint-based approach allowed us to naturally define a novel underlying automaton-based model for the solver, which was able to find a first solution in hundreds of microseconds on problems of reasonable size. Further, we studied local search techniques to ensure the good quality of the plan produced by the CP solver. While there have been numerous other works tackling the generic VRP, we concentrated on achieving reasonable solver performance under short time constraints, since in robotics the ability to plan fast is often more important than the ability to plan optimally. The realistic simulations using the Webots system and a deep, statistically stable comparison of the planning methods indicate that the approach is promising. In the future, we would like to focus on uncertainty handling.
Acknowledgments
This research has been fully supported by the project KJB100300804 of GA AV ČR.
References 1. Bard, J.F., Huang, L., Dror, M., Jaillet, P.: A branch and cut algorithm for the VRP with satellite facilities. IIE Transactions 30(9), 821–834 (1998) 2. Cordeau, J.-F., Laporte, G., Savelsbergh, M.W.P., Vigo, D.: Vehicle routing. In: Barnhart, C., Laporte, G. (eds.) Transportation, Handbooks in Operations Research and Management Science, vol. 14, pp. 367–428. Elsevier, Amsterdam (2007) 3. Bell, J.E., McMullen, P.R.: Ant colony optimization techniques for the vehicle routing problem. Advanced Engineering Informatics 18(1), 41–48 (2004), http://www.sciencedirect.com/science/article/B6X1X-4DR0TJ0-5/ 2/1bef0eafe5930a7bfeafaf51f7daa183, doi:10.1016/j.aei.2004.07.001 4. Baker, B.M., Ayechew, M.A.: A genetic algorithm for the vehicle routing problem. Comput. Oper. Res. 30(5), 787–800 (2003) 5. Shaw, P.: Using constraint programming and local search methods to solve vehicle routing problems. In: Maher, M.J., Puget, J.-F. (eds.) CP 1998. LNCS, vol. 1520, pp. 417–431. Springer, Heidelberg (1998) 6. De Backer, B., Furnon, V., Shaw, P., Kilby, P., Prosser, P.: Solving Vehicle Routing Problems Using Constraint Programming and Metaheuristics. Journal of Heuristics 6(4), 501–523 (2000) 7. Rousseau, L.-M., Gendreau, M., Pesant, G.: Using Constraint-Based Operators to Solve the Vehicle Routing Problem with Time Windows. Journal of Heuristics 8(1), 43–58 (2002) 8. Simonis, H.: Constraint applications in networks. In: Rossi, F., van Beek, P., Walsh, T. (eds.) Handbook of Constraint Programming, pp. 875–903. Elsevier, Amsterdam (2006) 9. Pop, P.C.: New Integer Programming Formulations of the Generalized Travelling Salesman Problem. American Journal of Applied Sciences 11, 932–937 (2007) 10. Ausiello, G., Protasi, M., Marchetti-Spaccamela, A., Gambosi, G., Crescenzi, P., Kann, V.: Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties. Springer-Verlag New York, Inc., Secaucus (1999) 11. Cyberbotics: Webots simulator, http://www.cyberbotics.com/
Fuzzy Disturbance-Observer Based Control of Electrically Driven Free-Floating Space Manipulator
Zhongyi Chu and Jing Cui
School of Instrument and Opto-Electronics, Beihang University, Beijing 100191
Abstract. This paper is concerned with the robust tracking control of an electrically driven free-floating space manipulator with internal parameter uncertainty and external disturbance. A fuzzy disturbance-observer based output feedback control is employed to solve the control problem, wherein the parameters are tuned online using fuzzy logic systems to compensate the equivalent disturbance, and the stability condition of the closed-loop system is also derived. The suggested method does not require exact prior knowledge of the system and has fewer parameters to tune. Simulation results for a two-link electrically driven space manipulator show the effectiveness and applicability of the proposed control scheme.
Keywords: fuzzy disturbance observer, free-floating, space manipulator.
1 Introduction
Space robots will play an important role in future space projects, and many researchers have focused on their kinematics, dynamics, and control problems. Among these, the free-floating manipulator has become a hotspot: neither the position nor the orientation of the base (satellite) is controlled, for the purpose of energy saving, and the controller design problem is extremely challenging because of the system's nonlinearity and complexity. To deal with internal parameter uncertainty and external disturbance, adaptive control methods have to be used, which take inertia and forces into account. The dynamical model of a free-floating space manipulator system cannot be linearly parameterized; therefore, adaptive control using a linearly parameterized model cannot be applied directly. To overcome this problem, Xu [1] proposed a nonlinear parameterization adaptive control scheme with an extended robot
Z.Y. Chu is with the School of Instrument Science and Opto-Electronics, [email protected].
model. Parlaktuna [2] proposed a joint-space adaptive tracking control scheme with an on-line parameter identification process, different from the off-line adaptive identification scheme of Shin [3]. However, these adaptive schemes usually handle only parameter uncertainties, not external disturbances and other unmodeled nonlinearities. Wang [4] proposed a fuzzy neural network scheme to control the joint position of a planar two-link space robot. However, most of these methods require measurements of velocity or even acceleration, and their implementation always requires precise knowledge of the structure of the entire dynamic model. On the other hand, the aforementioned controllers were designed with respect to the second-order dynamics, and the actuator dynamics were neglected. Hwang pointed out the effects of the motor dynamics and presented a robust tracking controller for electrically driven robots with model uncertainties in both the robot dynamics and the motor dynamics, based on adaptive fuzzy logic and the backstepping method [5]. Zhu reported virtual-decomposition based adaptive control for electrically driven space robots [6]. Becedas developed a GPI (Generalized Proportional Integral) controller combined with a nonasymptotic algebraic identification method for the unknown parameters [7]. The common feature of most previous algorithms is that the dynamics of the mechanical subsystem must be incorporated into the derivation of the current tracking error dynamics, so the resulting electrical dynamics are quite complicated. Moreover, in many practical control systems these mechanical and electronic elements may undergo uncertain variations because of the space environment and other unpredictable phenomena; this problem persists nowadays. In our earlier work [8], disturbance-observer based control of an electrically driven space manipulator was investigated to deal with internal parameter uncertainty and external disturbance, and some preliminary results were presented. The intention of this paper is to present a more effective adaptive parameter estimation algorithm and stability conditions for this method. This paper proposes a new fuzzy adaptive disturbance-observer based robust tracking control for an electrically driven space manipulator. Two adaptive fuzzy logic systems are employed to estimate the nonlinear and uncertain terms occurring in the space manipulator and the joint motor dynamics.
2 Modeling and Equations of Motion
The nominal model of a rigid manipulator mounted on a free-floating base and driven by DC (Direct Current) brushless motors is expressed in joint space as [8]

$\check M(q)\ddot q + \check C(q,\dot q)\dot q + \check F(q,\dot q) = \check K_e I$   (1)

$\check L\frac{dI}{dt} + \check R\,I + \check K_t\frac{dq}{dt} = u$   (2)

where $q$, $\dot q$, and $\ddot q$ are the joint angle, velocity, and acceleration of the manipulator, respectively; $\check M(q)$ is the inertia matrix and $\check C(q,\dot q)$ is the vector of the Coriolis and centrifugal forces; $\check F(q,\dot q)$ is the friction term; $\check K_e$ is the torque constant of the motors; $I$ is the armature current vector of the $n$ joint motors; $\check L$, $\check R$, and
$\check K_t$ are the inductance of the armature circuit, the resistance, and the back electromotive force (back EMF) constant of the motors, respectively; $u$ is the control voltage output of the motors. The nominal values are assumed to differ from the actual values due to payload variations and internal model uncertainties. In particular, if the manipulator is further perturbed by the external disturbance $d$, the actual dynamics corresponding to the nominal model (1), (2) may be expressed as

$\hat M(q)\ddot q + \hat C(q,\dot q)\dot q + \hat F(q,\dot q) + d = I$   (3)

$\frac{dI}{dt} + \hat R\,I + \hat K_t\frac{dq}{dt} = \hat L^{-1}u$   (4)

where the characters with a hat denote the actual values. Using the nominal model, the actual model (3), (4) can be rewritten as

$M(q)\ddot q + C(q,\dot q)\dot q + F(q,\dot q) + d_1(q,\dot q,I) = I$   (5)

$\frac{dI}{dt} + R\,I + K_t\frac{dq}{dt} + d_2(\dot q,I,u) = \check L^{-1}u$   (6)

By combining the uncertainties and nonlinearities together, the dynamics of the manipulator (5) driven by the DC motors (6) can be expressed as

$\ddot q = M^{-1}(q)\bigl(-C(q,\dot q)\dot q - F(q,\dot q) - d_1(q,\dot q,I) + I\bigr) = -M^{-1}(q)C(q,\dot q)\dot q + M^{-1}(q)\Omega(q,\dot q,I) + M^{-1}(q)I$   (7)

and

$\frac{dI}{dt} = -R\,I + \check L^{-1}u - K_t\frac{dq}{dt} - d_2(\dot q,I,u) = \check L^{-1}u + \Omega_m(\dot q,I,u)$   (8)

If the equivalent disturbances $\Omega(\cdot)$ and $\Omega_m(\cdot)$ can be estimated properly, then we can design a simplified controller by adding the estimated disturbance signal to the control input. In the subsequent development, an observer-controller structure is presented; two adaptive fuzzy systems are employed to approximate the uncertain and nonlinear terms of the electrically driven space manipulator to a sufficient degree of accuracy.
3 Controller Design
3.1 Fuzzy Disturbance Observer
In this subsection, the fuzzy adaptive disturbance observer (FADO) is presented. To construct the fuzzy system $\hat\Omega(\cdot\mid\Theta)$ that provides a good approximation to $\Omega(\cdot)$, the FADO system is designed as
$\dot\mu = \hat\Omega(\cdot\mid\Theta) + M^{-1}(q)\bigl(I - C(q,\dot q)\dot q\bigr) + \sigma_1(\dot q - \mu)$   (9)

where $\sigma_1$ is a positive scalar, and a new variable

$\zeta = \dot q - \mu$   (10)
is named the disturbance observation error [9], and

$\Omega(x_1,x_2,x_3) = \hat\Omega(x_1,x_2,x_3\mid\Theta^{*}) + \Delta\Omega(\cdot),\qquad \Theta^{*} = \arg\min_{\Theta\in U_z}\Bigl(\sup\bigl\lVert\Omega(z) - \hat\Omega(z\mid\Theta)\bigr\rVert\Bigr)$   (11)
Unfortunately, the fuzzy system $\hat\Omega(x_1,x_2,x_3\mid\Theta)$ may not be used directly because some of the state variables are not measurable. As the next step in the design of the fuzzy adaptive disturbance observer, a nonlinear state observer is designed:

$\dot{\hat z}_1 = \hat z_2 + K_v(x_1 - \hat z_1)$
$\dot{\hat z}_2 = \hat\Omega(\hat z_1,\hat z_2,\hat z_3\mid\Theta) + M^{-1}(x_1)\bigl(x_3 - C(x_1,x_2)x_2\bigr) + K_p(x_1 - \hat z_1)$   (12)

where $K_v$ and $K_p$ are symmetric positive-definite matrices, and $\hat z_3$ is defined as

$\hat z_3 = G_L(s)\,x_3$   (13)
where $G_L(s)$ is a Butterworth low-pass filter. Defining the observation errors

$\tilde x_1 = x_1 - \hat x_1 = q - \hat x_1 = \tilde z_1,\qquad \tilde x_2 = x_2 - \hat x_2 = \dot q - \hat x_2$   (14)

where the variables $\hat x_1$ and $\hat x_2$ are defined as

$\hat x_1 = \hat z_1,\qquad \hat x_2 = \hat z_2 + K_v\tilde z_1$   (15)

one obtains

$\dot{\tilde x} = A_o\tilde x + B_o\bigl(\Theta^{*}y(x_1,x_2,x_3) - \Theta y(\hat z_1,\hat z_2,\hat z_3)\bigr) + B_o\Delta\Omega(z) = A_o\tilde x + B_o(\Theta^{*}-\Theta)\,y(\hat z_1,\hat z_2,\hat z_3) + B_o\Theta^{*}\bigl(y(x_1,x_2,x_3) - y(\hat z_1,\hat z_2,\hat z_3)\bigr) + B_o\Delta\Omega(z) = A_o\tilde x + B_o\tilde\Theta\,y(\hat z_1,\hat z_2,\hat z_3) + B_o\Delta(z,\hat z)$   (16)

where $y(\cdot)$ is an FBF (Fuzzy Basis Function), i.e.
$y^{i} = \prod_{j=1}^{n}\mu_{A_j^{i}}(\cdot)\Big/\sum_{i=1}^{r}\Bigl(\prod_{j=1}^{n}\mu_{A_j^{i}}(\cdot)\Bigr)$, where the $\mu_{A_j^{i}}(\cdot)$ are membership functions, and the error of the adaptively tuned parameter is defined as

$\tilde\Theta = \Theta^{*} - \Theta$   (17)
and $\hat z = (\hat z_1^{T},\hat z_2^{T},\hat z_3^{T})^{T}$, $\Delta(z,\hat z) = \Delta\Omega(z) + \Delta y(z,\hat z)$, $\Delta y(z,\hat z) = \Theta^{*}\{y(x_1,x_2,x_3) - y(\hat z_1,\hat z_2,\hat z_3)\}$. Following the same development, the dynamics of the disturbance observation error $\zeta$ become

$\dot\zeta = -\sigma_1\zeta + \tilde\Theta\,y(\hat z_1,\hat z_2,\hat z_3) + \Delta(z,\hat z)$   (18)

3.2 Controller Design
In this subsection, the direct output feedback method is employed to design the controller, and the corresponding tuning methods for the FADO are developed. Define the control error $e = q - q_d$ and the filtered tracking error $r = \dot e + K_D e$; then, differentiating $r(t)$, substituting into Eqn. (7), and designing a stabilizing control $\alpha$ for the current control input $x_3 = I$:

$\alpha = M(q)\ddot q_d + C(q,\dot q)\dot q_d - \hat\Omega(\hat z_1,\hat z_2,\hat z_3\mid\Theta) + K_P r$   (19)
one finds the dynamics in terms of $r(t)$ as

$M(q)\dot r = -C(q,\dot q)r + \Omega(q,\dot q,I) - \hat\Omega(q,\dot q,I) - K_P r + \tilde x_3$   (20)

Replacing the unknown system $\Omega(\cdot)$ with the FADO yields

$M(q)\dot r = -C(q,\dot q)r + \tilde\Theta\,y(z) - K_P r + \tilde x_3 + \Delta$   (21)
On the other hand, to design a stabilizing control $u$ for the motor dynamics, the error dynamics of $\tilde x_3$ between $x_3$ and $\alpha$ are given as follows:

$\dot{\tilde x}_3 = \dot x_3 - \dot\alpha = -R\,I - K_t\frac{dq}{dt} + \check L^{-1}u - d_2(x_2,x_3,u) - \dot\alpha$   (22)

From (19), we can see that the controller $\alpha$ is a function of $\hat z_1$, $\hat z_2$, $\hat z_3$, $x_1$, and $q_d$, $\dot q_d$, $\ddot q_d$, and Eqn. (22) can be rewritten as

$\dot{\tilde x}_3 = \dot x_3 - \dot\alpha(\hat z_1,\hat z_2,\hat z_3,x_1,q_d,\dot q_d,\ddot q_d,\dot{\hat z}_1,\dot{\hat z}_2,\dot{\hat z}_3,x_2) = \check L^{-1}u + \Omega_m(\hat z_1,\hat z_2,\hat z_3,x_1,q_d,\dot q_d,\ddot q_d,\dot{\hat z}_3,x_2,x_3,u)$   (23)

where we employ another fuzzy adaptive disturbance observer $\hat\Omega_m(\cdot\mid\Theta_m)$ to estimate the unknown term $\Omega_m(\cdot)$, and choose the controller

$u = -K_{P2}\check L\,\tilde x_3 - \check L\,\hat\Omega_m(\hat z_1,\hat z_2,\hat z_3,x_1,q_d,\dot q_d,\ddot q_d,\dot{\hat z}_3,\hat z_2,\hat z_3,u_f)$   (24)
and we employ the same idea as for $\Omega(\cdot)$ in Eqn. (11). The stability proof relies on choosing the Lyapunov function [10]

$V = \frac12\,[\,r^{T}\ \ \tilde x_3^{T}\,]\begin{bmatrix} M & 0\\ 0 & L\end{bmatrix}\begin{bmatrix} r\\ \tilde x_3\end{bmatrix} + \frac12\,\mathrm{tr}\bigl(\tilde\Theta^{T}\Lambda_1\tilde\Theta + \tilde\Theta_m^{T}\Lambda_2\tilde\Theta_m\bigr)$   (25)

and the fuzzy adaptive tuning algorithms are

$\dot\Theta = \Lambda_1\,y\,r^{T} - K_w\Lambda_1\Theta\left\lVert\begin{bmatrix} r\\ \tilde x_3\end{bmatrix}\right\rVert,\qquad \dot\Theta_m = \Lambda_2\,y_m\,\tilde x_3^{T} - K_w\Lambda_2\Theta_m\left\lVert\begin{bmatrix} r\\ \tilde x_3\end{bmatrix}\right\rVert$   (26)

4 Simulation Study
To validate the proposed scheme, simulations of motion along desired joint trajectories were performed with a planar two-link free-floating space manipulator driven by two DC motors (Fig. 1). Here $m_0$ is the base mass (kg), $m_1$, $m_2$ are the link masses (kg), and $\ell_0$, $\ell_1$, $\ell_2$ are the link lengths (m). The nominal values of the parameters are $\ell_0 = 1$, $\ell_1 = 2$, $\ell_2 = 2$, $m_0 = 400$, $m_1 = 4.0$, $m_2 = 8.0$; the nominal parameters of the joint motors of the space manipulator are $L = \mathrm{diag}(0.04, 0.04)$, $R = \mathrm{diag}(70, 70)$, $K_B = \mathrm{diag}(0.145, 0.145)$, $K_T = \mathrm{diag}(1.38, 1.38)$. However, the actual parameters of the manipulator and its driving joint motors are assumed to be $m_0 = 300$, $m_1 = 6.0$, $m_2 = 6.0$, and $L = \mathrm{diag}(0.05, 0.05)$; the other parameters are assumed to be the same as the nominal ones. The parameters of the suggested robust tracking controller are as follows: $K_p = \mathrm{diag}(5.0, 5.0)$, $K_v = \mathrm{diag}(0.5, 0.5)$, $K_P = \mathrm{diag}(300, 300)$, $K_D = \mathrm{diag}(3.35, 3.35)$, and

$G_L(s) = \dfrac{8000000}{s^{3} + 400s^{2} + 80000s + 8000000}$
For the fuzzy systems in the FADO, the membership functions for each variable are selected as follows:

$\mu_{A^1} = \dfrac{1}{1 + e^{5(x + 0.8a)}},\quad \mu_{A^2} = e^{-\left(\frac{x + 0.6a}{w}\right)^2},\quad \mu_{A^3} = e^{-\left(\frac{x + 0.4a}{w}\right)^2},\quad \mu_{A^4} = e^{-\left(\frac{x + 0.2a}{w}\right)^2},$
$\mu_{A^5} = e^{-\left(\frac{x}{w}\right)^2},\quad \mu_{A^6} = e^{-\left(\frac{x - 0.2a}{w}\right)^2},\quad \mu_{A^7} = e^{-\left(\frac{x - 0.4a}{w}\right)^2},\quad \mu_{A^8} = e^{-\left(\frac{x - 0.6a}{w}\right)^2},$
$\mu_{A^9} = \dfrac{1}{1 + e^{-5(x - 0.8a)}}$   (27)
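A minimal Python sketch of the nine membership functions of (27) and the resulting normalized fuzzy-basis-function weights (the helper names are ours, not from the paper):

```python
import math

def memberships(x, a, w):
    """The nine membership functions of Eqn. (27) for one variable."""
    mu = [1.0 / (1.0 + math.exp(5.0 * (x + 0.8 * a)))]       # mu_A1
    for k in (0.6, 0.4, 0.2, 0.0, -0.2, -0.4, -0.6):          # mu_A2..A8
        mu.append(math.exp(-((x + k * a) / w) ** 2))
    mu.append(1.0 / (1.0 + math.exp(-5.0 * (x - 0.8 * a))))   # mu_A9
    return mu

def fuzzy_basis(x, a, w):
    """Normalized fuzzy basis functions y_i (single-premise rules)."""
    mu = memberships(x, a, w)
    s = sum(mu)
    return [m / s for m in mu]

# Example: the premise variable q1 with a = 1.5, w = 1.5 (Table 1)
print(fuzzy_basis(0.3, a=1.5, w=1.5))
```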
Fig. 1. Free-floating space manipulator system
The values of $a$ and $w$ used for each variable are listed in Table 1. In this simulation, the adaptive fuzzy systems $\hat\Omega(\cdot)$ and $\hat\Omega_m(\cdot)$ use a rule base with fuzzy rules of the form:
if $x$ is $A^i$, then $\hat\Omega(\cdot)$ is $\omega^i$ ($i = 1, \ldots, 9$)
where the premise variable $x$ can be any one of the variables in Table 1. That is, we use only one variable per rule to reduce the number of fuzzy rules in the fuzzy adaptive disturbance observer. The adaptive fuzzy system $\hat\Omega(\cdot)$ uses $(\hat z_1^T, \hat z_2^T, \hat z_3^T)^T$ as input arguments, and its rule base is composed of fifty-four ($9 \times 3 \times 2$) fuzzy rules, whereas the adaptive fuzzy system $\hat\Omega_m(\cdot)$ employs $(\hat z_1^T, \hat z_2^T, \hat z_3^T, x_1^T, \dot{\hat z}_3^T, u_f^T)^T$ as input arguments, and its rule base is composed of one hundred and eight ($9 \times 6 \times 2$) fuzzy rules. The parameters of the tuning laws for the adaptive fuzzy systems are selected as $\Lambda_1 = \mathrm{diag}(130, 130)$, $\Lambda_2 = \mathrm{diag}(130, 130)$, $k_w = 0.5$, $\sigma_1 = 20$. To make the problem more difficult, the external disturbance $d = (d_1, d_2)^T$ is considered, with
$d_1 = 10.0\sin(\pi t/4),\qquad d_2 = 10.0\cos(\pi t/4)$
Figs. 2 and 3 show the observation error $\zeta$ of the FADO and the output of the nonlinear observer, respectively. Fig. 4 shows the good trajectory-following performance of the suggested method. Together they demonstrate the capability of overcoming not only the model uncertainties but also the external disturbances.

Table 1. Premise parameters $a$ and $w$ for each variable used in the FADO

Var $x$              $a$      $w$        Var $x$              $a$      $w$
$\hat z_1^1$         1.5      1.5        $\hat z_1^2$         1.5      1.5
$\hat z_2^1$         1        1          $\hat z_2^2$         1        1
$\hat z_3^1$         15.5     15.5       $\hat z_3^2$         6.5      6.5
$\hat q_1$           1.5      1.5        $\hat q_2$           1.5      1.5
$\dot{\hat z}_3^1$   10000    10000      $\dot{\hat z}_3^2$   10000    10000
$u_1$                15.5     15.5       $u_2$                6.5      6.5
Fig. 2. FADO estimation error ($d = (d_1, d_2)$)

Fig. 3. Output of the nonlinear observer ($d = (d_1, d_2)$): (a) $\dot q_1$, (b) $\dot q_2$

Fig. 4. Comparison of the actual joint trajectories with the desired trajectories ($d = (d_1, d_2)$): (a) $q_1$, (b) $q_2$
5 Conclusion
In this paper, a new robust tracking control method for an electrically driven space manipulator has been suggested. The developed system is composed of a nonlinear observer, a fuzzy adaptive disturbance observer, a manipulator
controller, and a motor controller. The fuzzy adaptive disturbance observer and the direct output feedback method were employed to handle the interaction between the manipulator dynamics and the motor dynamics, such that the joints of the manipulator track the given trajectory in the presence of the uncertainties and the disturbances. Finally, a computer simulation was provided to demonstrate the validity of the suggested method.
Acknowledgment
This work was jointly supported by the National Natural Science Foundation of China (Grant Nos. 50905006 and 90716021) and the Research Fund for the Doctoral Program of Higher Education (Grant No. 20091102120027).
References 1. Xu, Y., Shum, H., Wade, T., Lee, J.: Parameterization control of space and adaptive robot systems. IEEE Trans. on Aerospace and Electronics Systems 30(2), 435–451 (1994) 2. Parlaktuna, O., Ozkan, M.: Adaptive control of free-floating space manipulators using dynamically equivalent manipulator model. Robotics and Autonomous Systems (46), 185–193 (2004) 3. Shin, J.H., Lee, J.J.: Dynamical control with adaptive identification for free-flying space robots in joint space. Robotica (12), 541–551 (1994) 4. Wang, C.H., Feng, B.M., Ma, G.C., Ma, C.: Robust tracking control of space robots using fuzzy neural network. In: Proc. of IEEE Inter. Symposium on Computational Intelligence in Robot. and Auto., pp. 181–185 (2005) 5. Hwang, J.P., Kim, E.: Robust tracking control of an electrically driven robot:adaptive fuzzy logic approach. IEEE Trans. on Fuzzy Systems (2006) 6. Zhu, W.H., De Schutter, J.: Adaptive control of electrically driven space robots based on virtual decomposition. Journal of Guidance, Control, and Dynamics 22(2), 329–339 (1999) 7. Becedas, J., Trapero, J.R., Feliu, V., et al.: Adaptive controller for single-link flexible manipulators based on algebraic identification and generalized proportional integral control. IEEE Trans. on Systems, Man, and Cybernetics Part B. 39(3), 735–775 (2009) 8. Chu, Z.Y., Sun, F.C., Wu, H.: Motion control of electrical driven free-floating space manipulator. International Journal of Information and Systems Sciences 3(3), 526– 540 (2007) 9. Kim, E., Park, C.: Fuzzy disturbance observer approach to robust tracking control of nonlinear sampled systems with the guaranteed suboptimal H∞ performance. IEEE Trans. on Systems, Man, and Cybernetics-Part.B. 34(3), 1574–1581 (2004) 10. Kwan, C., Lewis, F.L.: Robust backstepping control of nonlinear systems using neural networks. IEEE Trans. on Systems, Man and Cybernetics-Part A 30(6), 753–766 (2000)
Dynamic Neural Network Control for Voice Coil Motor with Hysteresis Behavior Xuanju Dang, Fengjin Cao, and Zhanjun Wang School of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin 541004, PR China
Abstract. To deal with the hysteresis behavior of the VCM (Voice Coil Motor), a neural network adaptive inverse control is proposed. In this control, a dynamic neural network is improved by adding a low-pass filter to the BP (Back Propagation) weight update to increase the weights' dynamic information, and a neural network controller with dynamic internal and external feedback is used to improve the dynamic behavior of the control system. The simulation results show that the new control approach for the VCM is effective.
Keywords: Voice coil motor; Hysteresis model; Neural network; Adaptive inverse control.
1 Introduction
The VCM [1] is a special direct-drive motor with many features such as simple structure, small volume, high speed, high acceleration, and fast response. Recently, VCMs have been widely used in semiconductor packaging, hard disk drives, laser recording, and precise positioning systems. However, experiments [2,3] show that the large repeatability error caused by the hysteresis behavior of the VCM does not meet the requirements for high speed and high precision, so the control of the hysteresis behavior of the VCM is the topic of this paper. Controlling a system with hysteresis behavior is challenging [4] for two reasons: 1) its multi-mapping characteristic, namely, the same input value leads to multiple possible output values, which differs from the single-valued relationship of a general dynamic system; 2) its memory characteristic, namely, the output of a hysteresis system is a function not only of the current input value but also of past input values. Because of the hysteresis characteristic of the VCM, general neural networks, for example the BP neural network and RBF (Radial Basis Function) neural networks, are not well suited to controlling a VCM with hysteresis behavior. An improved nonlinear dynamic neural network is therefore applied to control the VCM in order to enhance the precision of the control system. The control method is improved in two respects: 1) in the weight update of the neural network, the filtered weight value is utilized in the update process to add dynamic information from the previous weight values; 2) in addition to the inner feedback, an external feedback consisting of the previous output values of the VCM is added to the inverse controller in order to enhance its dynamic behavior.
2 Dynamic Neural Network Based on a Dynamic Activation Function
2.1 Dynamic Neuron
A general neuron is related only to the activation at the current time; that is, the output of the neuron is the weighted sum of the input signals at the current time. A dynamic neuron [5] is related not only to the input signals at the current time but also to the input signals at previous times. The structure is shown in Fig. 1.
Fig. 1. Dynamic neuron
In Fig. 1, x is the input vector of the neuron, w is the weight vector, net is the input to the neuron, z is the output of the neuron, and Δ is a first-order (one-step) delay. The mathematical model of the dynamic neuron is given as follows:

$v(k) = x(k)\cdot w(k)$
$net(k) = k_1\,v(k) + k_2\,net(k-1) + k_3\,v(k-1)$
$z(k) = f\bigl(net(k)\bigr)$   (1)
Because $z(k)$ is related to $net(k)$ and $net(k-1)$, which in turn depend on $x(k)$ and $x(k-1)$, $z(k)$ exhibits dynamic behavior.
2.2 Improved Learning Method for the Dynamic Neural Network
If the neuron realizes a dynamic nonlinear mapping, then a neural network consisting of dynamic neurons is a dynamic neural network. The single-output DAFNN [5] (Dynamic Activation Function Neural Network) is shown in Fig. 2, in which the hidden layer consists of dynamic neurons; the multi-output case is obtained by extending the single-output layer. In Fig. 2, v is the output of the hidden neuron, and w¹ and w² are the weights of the input layer and the output layer, respectively. The cost function is

$J(k) = \tfrac12\bigl(d(k) - y(k)\bigr)^2$   (2)
Fig. 2. Dynamic neural network
where $d$ is the target output, $y$ denotes the output of the neural network, and $u$ represents the input to the neural network. The output of the dynamic neural network is written as

$v(k) = u(k)\cdot w^1(k)$
$net(k) = k_1\,v(k) + k_2\,net(k-1) + k_3\,v(k-1)$
$z(k) = \dfrac{e^{net(k)} - e^{-net(k)}}{e^{net(k)} + e^{-net(k)}}$
$y(k) = z(k)\cdot w^2(k)$   (3)
The improved weight update for the neural network, with the filtering phase, is

$\Delta w_i^2(k) = \beta\,\Delta w_i^2(k-1) - \eta_1(1-\beta)\,e(k)\,z_i(k),\qquad w_i^2(k+1) = w_i^2(k) + \Delta w_i^2(k)$   (4)

$\dfrac{d\,net_j(k)}{d\,v_j(k)} = k_{1j}(k) + k_{2j}(k)\,\dfrac{d\,net_j(k-1)}{d\,v_j(k-1)}$
$\dfrac{dJ(k)}{dw_{i,j}^1(k)} = -e(k)\,w_j^2(k)\,\bigl(1 - z_j^2(k)\bigr)\,\dfrac{d\,net_j(k)}{d\,v_j(k)}\,u_i(k)$
$\Delta w_{i,j}^1(k) = \beta\,\Delta w_{i,j}^1(k-1) - \eta_2(1-\beta)\,\dfrac{dJ(k)}{dw_{i,j}^1(k)},\qquad w_{i,j}^1(k+1) = w_{i,j}^1(k) + \Delta w_{i,j}^1(k)$   (5)
The weight updates for $k_1$, $k_2$, and $k_3$ are obtained as

$k_{1i}(k+1) = \alpha\,k_{1i}(k) + (1-\alpha)\,\eta_1\,e(k)\,w_i^2(k)\,\bigl(1 - z_i^2(k)\bigr)\,v_i(k)$
$k_{2i}(k+1) = \alpha\,k_{2i}(k) + (1-\alpha)\,\eta_2\,e(k)\,w_i^2(k)\,\bigl(1 - z_i^2(k)\bigr)\,net_i(k-1)$
$k_{3i}(k+1) = \alpha\,k_{3i}(k) + (1-\alpha)\,\eta_3\,e(k)\,w_i^2(k)\,\bigl(1 - z_i^2(k)\bigr)\,v_i(k-1)$   (6)
where $\eta_1$, $\eta_2$, and $\eta_3$ are learning rates, and $\alpha$ and $\beta$ are momentum coefficients. In the improved DAFNN, the weight updates are filtered to add dynamic information, which also speeds up the learning of the dynamic neural network. A sketch of the forward pass and the filtered update is given below.
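The following Python sketch illustrates the dynamic-neuron forward pass of Eqn. (3) and the filtered (momentum-smoothed) output-weight update of Eqn. (4) for a single-input, single-output DAFNN; it is our illustrative reading of the equations, not the authors' implementation:

```python
import math

class DynamicNeuron:
    """One hidden dynamic neuron: Eqns (1)/(3) with a tanh activation."""
    def __init__(self, w1=0.1, w2=0.1, k1=1.0, k2=0.1, k3=0.1):
        self.w1, self.w2 = w1, w2          # input / output weights
        self.k1, self.k2, self.k3 = k1, k2, k3
        self.net_prev = 0.0                # net(k-1)
        self.v_prev = 0.0                  # v(k-1)
        self.dw2_prev = 0.0                # Delta w2(k-1), for filtering

    def forward(self, u):
        v = u * self.w1
        net = self.k1 * v + self.k2 * self.net_prev + self.k3 * self.v_prev
        z = math.tanh(net)  # (e^net - e^-net)/(e^net + e^-net)
        self.net_prev, self.v_prev = net, v
        self.z = z
        return z * self.w2

    def update_w2(self, e, eta1=0.3, beta=0.1):
        # Eqn (4): low-pass filtered step on the output weight;
        # the sign follows the paper's error convention in (4)
        dw2 = beta * self.dw2_prev - eta1 * (1.0 - beta) * e * self.z
        self.w2 += dw2
        self.dw2_prev = dw2
```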
3 Hysteresis Model for the VCM
The Duhem model, with its dynamic hysteresis behavior, is a representative hysteresis model, so it is used here to describe the hysteresis characteristic of the VCM. The Duhem model was studied by Coleman and Hodgdon [6]; the mathematical model of the hysteresis behavior, based on a differential equation, is

$\dfrac{dw}{dt} = \alpha\left|\dfrac{dv}{dt}\right|\bigl[f(v) - w\bigr] + g(v)\,\dfrac{dv}{dt}$   (7)
where $\alpha$ is a constant, $v$ is the input to the model, and $w$ is the output of the model. The functions $f$ and $g$ are required to satisfy:
1) $f(\cdot)$ is piecewise smooth, monotonically increasing, odd, and has a bounded derivative;
2) $g(\cdot)$ is piecewise continuous and even, with $\lim_{v\to\infty} g(v) = \lim_{v\to\infty} f'(v)$, and for $v > 0$,
$f'(v) \ge g(v) \ge \alpha e^{\alpha v}\displaystyle\int_v^{\infty}\bigl[f'(\zeta) - g(\zeta)\bigr]e^{-\alpha\zeta}\,d\zeta$
Following reference [6], the Duhem model is used here for the hysteresis of the VCM, with $\alpha = 1$, $f(v) = 3.1635\,v$, and $g(v) = 0.345$; a simulation sketch is given below.
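A minimal explicit-Euler sketch of the Duhem model with these parameters (illustrative only; the time step and input signal are our choices):

```python
import math

def duhem_step(w, v, v_next, dt, alpha=1.0,
               f=lambda v: 3.1635 * v, g=lambda v: 0.345):
    """One Euler step of dw/dt = alpha*|dv/dt|*(f(v)-w) + g(v)*dv/dt."""
    dv = (v_next - v) / dt
    dw = alpha * abs(dv) * (f(v) - w) + g(v) * dv
    return w + dw * dt

# Drive the model with a decaying sine input and record the loop
dt, w = 1e-3, 0.0
trace = []
for n in range(40000):
    t = n * dt
    v = math.exp(-0.05 * t) * math.sin(2 * math.pi * t)
    v_next = math.exp(-0.05 * (t + dt)) * math.sin(2 * math.pi * (t + dt))
    w = duhem_step(w, v, v_next, dt)
    trace.append((v, w))  # (input, output) pairs trace the hysteresis loop
```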
4 Design of the Controller
4.1 Improved Dynamic Neural Network Adaptive Inverse Control
To meet the demands for high-speed and high-precision control of the VCM, a direct reference-model adaptive inverse control for the VCM is proposed. Fig. 3 shows the structure of the adaptive inverse control, in which external feedback is added to enhance the dynamic behavior of the control system. The Jacobian information of the controlled object is the dynamic gain of the VCM, which is used in the learning procedure of the adaptive inverse control. The Jacobian $\partial y/\partial u$ is replaced
Fig. 3. Direct reference model adaptive inverse control
by $\mathrm{sgn}\!\left[\dfrac{y(k) - y(k-1)}{u(k) - u(k-1)}\right]$, because the gain value is adjusted indirectly through the weight values in the learning procedure [7]; a small sketch follows.
4.2 Simulation Results for the Adaptive Inverse Control
To validate the proposed control method, the improved learning technique is applied to the neural network adaptive control of the VCM. In the VCM (Duhem) model, the desired signal is a sine wave with decaying amplitude. In the neural network adaptive inverse control, the number of hidden-layer neurons is thirty, and the learning rates are $\eta_1 = 0.3$, $\eta_2 = 0.3$, and $\eta_3 = 0.8$, respectively; the momentum coefficients are $\beta = 0.1$ and $\alpha = 0.7$.
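A one-line illustration of the sign-based Jacobian estimate used in place of $\partial y/\partial u$ (the helper name is ours):

```python
def jacobian_sign(y_k, y_km1, u_k, u_km1, eps=1e-9):
    """sgn[(y(k)-y(k-1)) / (u(k)-u(k-1))], guarded against du = 0."""
    du = u_k - u_km1
    if abs(du) < eps:
        return 1.0          # fall back to a positive-gain assumption
    ratio = (y_k - y_km1) / du
    return 1.0 if ratio >= 0 else -1.0
```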
Fig. 4. Response of output trace
Fig. 5. Output error
Fig. 4 and Fig. 5 show the tracking of the output signal and the tracking error, respectively. The results show that the proposed control method is able to track the desired signal effectively; the MSE (Mean Square Error) is 0.0019. For comparison with the DAFNN [5], the DAFNN is also applied to the VCM control with the same learning rates and momentum coefficients as in the proposed method. For the DAFNN control, the input and output signals are shown in Fig. 6, and the tracking error in Fig. 7.
Fig. 6. Response of output trace
Fig. 7. Output error
For the DAFNN control, the MSE is 0.0232, more than ten times that of the proposed control method.
5 Conclusion
To meet the demands of VCM control, a nonlinear dynamic neural network adaptive inverse control is presented, in which the weight values of the neural network are filtered and external feedback is added to improve the dynamic behavior of the control system. Simulation results show that the proposed method can control the VCM efficiently and with high precision.
Acknowledgments. This work has been supported by the National Nature Science Foundation Project (60964001), the Science Foundation Project of Guangxi Province (0991019Z), and the Information and Communication Technology Key Laboratory Foundation Project of Guangxi Province (10902).
References
1. Chen, S.X., Li, Q., Man, Y., et al.: A high-bandwidth moving-magnet actuator for hard disk drives. IEEE Transactions on Magnetics 33(5), 2632–2634 (1997)
2. XuanZe, W., Hong, Y., Bo, T.: Characteristic Test of a Voice Coil Motor Based on Line Cutting. Three Gorges University 26(4), 263–265 (2004)
3. Xiaomei, F., Dawei, Z., Xingyu, Z., Weitao, D.: Design Method of High Speed Precision Positioning System Based on Voice Coil Actuator. China Mechanical Engineering 16(16), 1414–1418 (2005)
4. Wang, Y., Su, C.-Y., Hong, H.: Model Reference Control including Adaptive Inverse Hysteresis for Systems with Unknown Input Hysteresis. In: Proceedings of the 2007 IEEE International Conference on Networking, Sensing and Control, London, UK, pp. 70–75 (2007) 5. Ming, L., Hui-ying, L., Han-sheng, Y., Cheng-wu, Y.: Nonlinear Adaptive Inverse Control Using New Dynamic Neural Networks. Journal System Simulation 19(17), 4021–4024 (2007) 6. Du, J., Feng, Y., Su, C.-Y., Hu, Y.-M.: On the Robust Control of Systems Preceded by Coleman-Hodgdon Hysteresis. In: 2009 IEEE International Conference on Control and Automation, Christchurch, New Zealand, pp. 9–11 (2009) 7. Yan-Yang, L., Shuang, C., Hong-Wei, L.: PID-like neural network nonlinear adaptive control. In: 29th Chinese Control Conference, CCC 2010, Beijing, China, pp. 2144–2148 (2010)
A New Model Reference Adaptive Control of PMSM Using Neural Network Generalized Inverse Guohai Liu, Beibei Dong, Lingling Chen, and Wenxiang Zhao School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, 212013, China
[email protected],
[email protected]
Abstract. A new model reference adaptive control (MRAC) strategy based on a neural network generalized inverse (NNGI), termed the MRAC-NNGI system, is proposed for the current and speed regulation of permanent magnet synchronous motor (PMSM) drives. Since the PMSM is a multivariable nonlinear system with strong coupling, this paper analyzes its generalized invertibility in combination with a neural network. In the developed scheme, connecting the NNGI with the motor plant transforms the original system into a pseudo-linear system, achieving decoupling and linearization; the network is trained off-line with the Levenberg-Marquardt algorithm. An adjustable-gain closed-loop adaptive controller is then developed by applying MRAC to this pseudo-linear system, and the self-adaptive law for the gain regulation of the linear system is given. Comparisons with simulation results of other widely used algorithms confirm that the scheme combines the merits of model-free learning, high-precision tracking, and strong anti-interference capability.
Keywords: Current control, speed control, neural network, generalized inverse, MRAC, PMSM.
1 Introduction
With the development of modern permanent magnet materials and microprocessor technologies, the permanent magnet synchronous motor (PMSM) offers many excellent features such as small volume, low cost, simple mechanism, high flux density, and high efficiency [1]. Due to these advantages, the PMSM has gained widespread acceptance in research and application fields such as modern aviation, the military, industrial automation, intelligent robotics, and the chemical industry [2]. However, it can be inferred from the dynamic model that the PMSM is a nonlinear, high-order system with inter-axis couplings, which may cause poor static and dynamic performance. Moreover, its operation is strongly affected by rotor magnetic saliency, saturation, and armature-reaction effects [3]. It is therefore essential to linearize the system and decouple the current and speed dynamics. With the advancement of power electronics and computer technology, several schemes have been reported to enhance the control performance of PMSM drives against some of the aforementioned problems. For stable control, a design method for a robust speed servo system with a PMSM was presented by applying robust control principles, but there is a bottleneck in
achieving dynamic decoupling and linearization based on the id = 0 strategy [4]. In particular, the occurrence of uncertainties and perturbations in the motor parameters complicates the design of a practical high-performance control system, which is always model-based; the same conditions hold in [5]. Neural networks (NN), a multidisciplinary technology, can approximate a nonlinear complex function or process with arbitrary precision using two layers trained with a back-propagation (BP) algorithm [6]. However, since the NN is used only as an identifier in many papers [7], [8], [9], its application to motor control has been considerably restrained in various fields. Moreover, a common BP algorithm easily falls into local minima and has a relatively low convergence speed during training. To overcome these problems, another kind of BP algorithm, the Levenberg-Marquardt algorithm, is applied here; it combines gradient descent with the Newton algorithm and improves the convergence rate. The neural network generalized inverse (NNGI) system is a development of the NN inverse method and is becoming popular for the linearization and decoupling control of general nonlinear dynamic continuous systems [10], [11]. This paper proposes a model reference adaptive control (MRAC) strategy based on the NNGI method for the current and speed regulation of PMSM drives. The static NN is responsible for approximating the nonlinear mapping described by the analytical expression of the motor, and the linear components represent the dynamics of the GI. Combining the NNGI with the motor plant, the original system can be transformed into a pseudo-linear composite system that is completely equivalent to two single-input single-output (SISO) linear subsystems. Compared to traditional linear controller designs, this paper applies the MRAC method to the pseudo-linear regulators based on Lyapunov stability theory, according to the characteristics of these linear subsystems. All of the aforementioned characteristics make the proposed scheme an ideal choice for implementation in real-time PMSM drives.
2 Generalized Inverse System of PMSM
Taking the rotor coordinates (d-q axes) of the motor as reference coordinates, the current-model plant of a PMSM can be described in a synchronous state frame as

$\dot x = f(x,u) = \begin{bmatrix}\dot x_1\\ \dot x_2\\ \dot x_3\end{bmatrix} = \begin{bmatrix} -\dfrac{R_s}{L_d}x_1 + \dfrac{L_q}{L_d}x_2x_3 + \dfrac{1}{L_d}u_1\\[2mm] -\dfrac{L_d}{L_q}x_1x_3 - \dfrac{R_s}{L_q}x_2 - \dfrac{\psi_f}{L_q}x_3 + \dfrac{1}{L_q}u_2\\[2mm] \sigma x_1x_2 + \eta x_2 - \dfrac{n_p}{J}T_L \end{bmatrix}$   (1)
where $u = [u_1, u_2]^T = [u_d, u_q]^T$ is the control input vector ($u_d$ and $u_q$ are the stator voltages in the d- and q-axis, respectively); $x = [x_1, x_2, x_3]^T = [i_d, i_q, \omega_r]^T$ is the state vector ($i_d$ and $i_q$ are the stator currents in the d- and q-axis, and $\omega_r$ is the angular velocity); $L_d$ and $L_q$ are the stator inductances in the d- and q-axis; $n_p$ is the number of pole
pairs; $R_s$ is the stator resistance; $\psi_f$ is the rotor flux linkage; $J$ is the moment of inertia; $T_L$ is the load torque; $y = [y_1, y_2]^T = [i_d, \omega_r]^T = [x_1, x_3]^T$ are the system outputs; and $\sigma = 3n_p^2(L_d - L_q)/(2J)$, $\eta = 3n_p^2\psi_f/(2J)$. To obtain the GI model, the outputs $y$ are differentiated until the input $u_1$ or $u_2$ appears. First, the derivative of the output $y_1$ is

$\dot y_1 = \dot x_1 = \dot i_d = -\dfrac{R_s}{L_d}x_1 + \dfrac{L_q}{L_d}x_2x_3 + \dfrac{1}{L_d}u_1$   (2)

Obviously, $u_1$ appears in $\dot y_1$, so the matrix rank $t_1$ is

$t_1 = \mathrm{rank}\bigl(\partial \dot Y_1/\partial u^T\bigr) = 1$   (3)
Second, the derivatives of the output $y_2$ are

$\dot y_2 = \dot x_3 = \dot w_r = \sigma x_1x_2 + \eta x_2 - \dfrac{n_p}{J}T_L$   (4)

$\ddot y_2 = \ddot x_3 = \ddot w_r = -\sigma R_s x_1x_2\Bigl(\dfrac{1}{L_d} + \dfrac{1}{L_q}\Bigr) + \sigma x_3\Bigl(\dfrac{L_q}{L_d}x_2^2 - \dfrac{L_d}{L_q}x_1^2\Bigr) - \dfrac{\eta R_s}{L_q}x_2 - \dfrac{\sigma\psi_f + \eta L_d}{L_q}x_1x_3 - \dfrac{\eta\psi_f}{L_q}x_3 + \dfrac{\sigma x_2}{L_d}u_1 + \dfrac{\sigma x_1 + \eta}{L_q}u_2$   (5)

Obviously, $u$ appears in $\ddot y_2$. The Jacobian matrix is then

$J_2(x,u) = \dfrac{\partial \dot Y_2}{\partial u^T} = \begin{bmatrix}\partial\dot y_1/\partial u_1 & \partial\dot y_1/\partial u_2\\ \partial\ddot y_2/\partial u_1 & \partial\ddot y_2/\partial u_2\end{bmatrix} = \begin{bmatrix}\dfrac{1}{L_d} & 0\\[2mm] \dfrac{\sigma x_2}{L_d} & \dfrac{\sigma x_1 + \eta}{L_q}\end{bmatrix}$   (6)
where $Y_2 = [Y_1, \ddot y_2]^T$, $Y_1 = \dot y_1$. Since $\det\bigl(J_2(x,u)\bigr) = (\sigma x_1 + \eta)/(L_dL_q) \neq 0$, the matrix rank $t_2$ is

$t_2 = \mathrm{rank}\bigl(\partial Y_2/\partial u^T\bigr) = 2$   (7)
When $x_1 \neq -\eta/\sigma$, $J_2(x,u)$ is nonsingular. The relative order of the system is $\alpha = (\alpha_1, \alpha_2)^T = (1, 2)^T$, with $\alpha_1 + \alpha_2 = 3 = n$ (the system order), so the GI of the system exists. The original order of the system is $n_e = n_{e1} + n_{e2} = 1 + 2 = 3 = n = \alpha$. Therefore, the GI system is described as

$u = \hat\varphi\bigl[(\ddot y_2, \dot y_2, \dot y_1),\ \hat v\bigr]$   (8)

where $a_{10}$, $a_{11}$, $a_{20}$, $a_{21}$, $a_{22}$ are the pole-placement coefficients of the GI system, and $\hat v = [\hat v_1, \hat v_2]^T$ are the inputs of the GI system, with $\hat v_1 = a_{11}\dot y_1 + a_{10}y_1$ and $\hat v_2 = a_{22}\ddot y_2 + a_{21}\dot y_2 + a_{20}y_2$.
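For a quick numerical check of the invertibility condition $\det(J_2) = (\sigma x_1 + \eta)/(L_dL_q) \neq 0$, using the nominal motor parameters given later in Section 4 (this snippet is ours, for illustration; the flux-linkage value is an assumption, since the paper does not list it):

```python
def det_J2(x1, Ld=16.45e-3, Lq=16.60e-3, psi_f=0.3, J=0.0021, np_=3):
    """det(J2) = (sigma*x1 + eta)/(Ld*Lq); zero only at x1 = -eta/sigma.

    psi_f is an assumed flux-linkage value (not listed in the paper).
    """
    sigma = 3 * np_**2 * (Ld - Lq) / (2 * J)
    eta = 3 * np_**2 * psi_f / (2 * J)
    return (sigma * x1 + eta) / (Ld * Lq), -eta / sigma

det, x1_singular = det_J2(x1=5.0)
print(det, x1_singular)  # J2 is singular only at x1 = -eta/sigma
```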
3 Design of MRAC-NNGI Controller
3.1 Design of NNGI
Following the analysis in Section 2, the pseudo-linear composite system is shown in Fig. 1. Its transfer function can be represented as
$G(s) = \mathrm{diag}(G_{11}, G_{22}) = \mathrm{diag}\Bigl(\dfrac{1}{a_{11}s + a_{10}},\ \dfrac{1}{a_{22}s^2 + a_{21}s + a_{20}}\Bigr)$   (9)

where $a_{10} = 1$, $a_{11} = 0.05$, $a_{20} = 1$, $a_{21} = 0.4$, $a_{22} = 0.04$.
Fig. 1. Diagram of pseudo-linear system
In Fig. 1, the pseudo-linear system is completely equivalent to two SISO linear subsystems, whose poles can be placed in the left half of the complex plane to obtain an ideal open-loop frequency characteristic. One SISO subsystem is a first-order current unit and the other is a second-order speed unit. The second-order unit can be described as $G_{22} = 1/(T^2s^2 + 2\varsigma Ts + 1) = 1/(a_{22}s^2 + a_{21}s + 1)$. To obtain excellent performance, the damping coefficient of the subsystem is selected as $\varsigma = 0.707$, termed optimal damping, which gives less than 4.3% overshoot and the shortest settling time. With $a_{21} = 0.4$, $a_{22}$ should then be 0.08; however, $a_{22} = 0.04$ yields a lower overshoot (a short check appears below). In this scheme, the Levenberg-Marquardt algorithm is applied to approximate the nonlinear mapping $\hat\varphi(\cdot)$. The NNGI system, which captures the dynamic behavior of the plant, requires fewer training samples, converges quickly, and has a good ability to reach a global optimum; the controller design is simplified owing to the linearized model [9], [10].
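The relation behind the choice of $a_{22}$ can be made explicit (this short derivation is ours): matching $a_{22}s^2 + a_{21}s + 1 = T^2s^2 + 2\varsigma Ts + 1$ gives $T = \sqrt{a_{22}}$ and $\varsigma = a_{21}/(2\sqrt{a_{22}})$, so

$\varsigma = 0.707,\ a_{21} = 0.4 \;\Rightarrow\; a_{22} = \Bigl(\dfrac{a_{21}}{2\varsigma}\Bigr)^2 = \Bigl(\dfrac{0.4}{1.414}\Bigr)^2 \approx 0.08,$

while the adopted $a_{22} = 0.04$ corresponds to $\varsigma = 0.4/(2\sqrt{0.04}) = 1$, i.e., critical damping, which is consistent with the lower overshoot observed.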
3.2 Design of the MRAC Controller
The MRAC method offers a simple implementation and requires little computational effort, so it is a popular strategy for motor control [12]. Based on Lyapunov stability theory, it makes it possible to design a completely decoupled control structure in the synchronous frame in the presence of unknown disturbances [13]. Two adjustable-gain closed-loop adaptive controllers are formed according to the characteristics of the aforementioned linear subsystems. The control diagram is shown in Fig. 2, where $G_{m1}(s)$ and $G_{m2}(s)$ are the expected reference models, $G_{m1}(s) = G_{11}(s)$, $G_{m2}(s) = G_{22}(s)$; $C_1$ and $C_2$ are the current and speed controllers, respectively; $D(s)$ represents the internal and external disturbances of the PMSM; $e_1$ and $e_2$ are the errors used by the adaptive control laws; $K_{p1}$ and $K_{p2}$ are the original gains of the two linear subsystems; and $K_{c1}$ and $K_{c2}$ are the adaptive gains. The desired system performance is given by the reference models with reference inputs $[i_d^*, \omega_r^*]^T$ and reference outputs $[y_{m1}, y_{m2}]^T$. In other words, the pseudo-linear composite system is completely equivalent to two SISO linear subsystems and can be controlled separately. The two adaptive control laws are derived via Lyapunov stability theory and force the system to follow the reference-model performance whenever the controlled variable deviates from the response of the reference model.
62
G. Liu et al.
id*
ν1
id
ωr*
ν2
ωr
Kc1 = e1i*d (λ1a11 K p1 ) Kc2 = X T PBωr* (λ2 K p 2 )
Fig. 2. Control diagram of the MRAC-NNGI proposed in paper
3.2.1 Adaptive Law of Current Subsystem In application of the MRAC method to first-order current subsystem, the reference model connects the pseudo-linear subsystem in parallel. So the closed-loop transfer function is
Φ1 ( s ) = Gm1 ( s ) − G11 ( s ) = K1 (a11s + a10 ) = K1 (0.5s + 1)
(10)
Where, K1=Km1-Kc1*Kp1. The time-varying model is a11e1 + e1 = K1id , so e1 = (−e1 + K1id ) a11 . The Lyapunov function with coefficient λ1 can be described as V1 (e1 ) = e12 + λ1 K12, λ1 > 0
(11)
According to the Lyapunov stability theory, the following function can be derived. e1 (t ) = 0 ⎧lim ⎪⎪ t →∞ ⎨V1 (e1 ) positive definite ⎪ ⎪⎩V1 (e1 ) negative definite
(12)
For above reasons, the propose adaptive law of current regulation is K c1 =
e1id* e1id* = λ1a11 K p1 0.5λ1 K p1
3.2.1 Adaptive Law of Speed Subsystem For the second-order speed subsystem, the closed-loop transfer function is
(13)
A New Model Reference Adaptive Control of PMSM Using NNGI
63
Φ 2 ( s ) = Gm 2 ( s ) − G22 ( s ) = K 2 / (a22 s 2 + a21 s + a20 ) = K 2 / (0.04s 2 + 0.4s + 1)
(14)
Where, K2=Km2-Kc2*Kp2. The Lyapunov function with coefficient λ2 can be described as V2 ( X ) = X T PX + λ2 K 2, λ2 > 0
(15)
In order to obtain the symmetric matrix P, Lyapunov method is applied to define the coefficient matrixes A and B as follows ⎡ 0 A=⎢ ⎣ − 1 a22
1 − a21
⎤ ⎡0⎤ ,B = ⎢ ⎥ ⎥ a22 ⎦ ⎣1 ⎦
Where, P can be calculated by the function PA+ATP=-I (I is an identity matrix). So the propose adaptive Law of speed regulation is Kc 2 =
ωr* ω* ⎡ 1.5 0.02 ⎤ ⎡0⎤ ωr* X T PB = r [ e2 e2 ] ⎢ = (0.02e2 + 0.052e2 ) ⎥⎢ ⎥ λ2 K p 2 λ2 K p 2 ⎣0.02 0.052⎦ ⎣1⎦ λ2 K p 2
(16)
It is clear from the above discussion that both of the two adaptive laws of MRAC controller design are much simpler and requires less effort to synthesize, in which the reference models selection and stability play the very impotent roles.
4 Verifications To evaluate the performance of the proposed control scheme, the simulation figures compared with other methods are shown as follows. In the simulation, λ1=0.5, λ2=1, Kp1=Kp2=1, so Km1=Km2=1, and the initial value of adjustable gains Kc1=Kc2=1, J=0.0021kg·m2, Ld=16.45mh, Lq=16.60mh, np=3. Fig.3 shows the square wave response of rotor speed when the current in d-axes (id) is set to 5A. The maximum absolute value of speed is set to 80 rad/s and the minimum is 20 rad/s. It can be seen from the (b) and (c) that the actual speed response has the smaller tracking error and low overshoot compared to other methods. (a) and (b) show the excellent decoupling performance. Fig.4 shows the same nice operation performance of current tracking feature compared to the Fig.6 and 9 when the speed is step to 80 rad/s. The maximum value is set to 6A and the minimum is 0A. It can be inferred from Fig.5 that the anti-interference capability of the proposed algorithm is obviously stronger than those of the other schemes. In this simulation, toque load is set to from 3N step to 1N and then step to 3N, what about 30% to 10% to 30% of rated load. It almost has the complete anti-interference capability of speed response. In the same operation conditions, the current interference response is less than 3.7% and the others over 12%. (The current interference responses of Fig.6, 7, 8, 9, 10 are 50%, 18%, 12%, 18% and 25% respectively.)
64
G. Liu et al.
6
4
Id (A)
Id (A)
6
2
4 2 0
20 (a)
30
40
0
80 60
Id (A)
20 0
20 (a)
30
40
10
20 (b)
30
40
10
20 (c) Time (s)
30
40
4 2 0
10
20 (b)
30
40
80 60 40 20 0 0
10
6
40
0
Wr (rad/s)
10
0
Wr (rad/s)
Wr (rad/s)
0 0
10
20 (c) Time (s)
30
80 60 40 20 0
40
Fig. 3. Decoupling and tracking response of MRAC-NNGI (wr steps). (a) Given id* & Response id. (b) Given wr* & Response wr. (c) Reference response ym2.
0
Fig. 4. Decoupling and tracking response of MRAC-NNGI (id steps). (a) Given id* & Response id. (b) Reference response ym1. (c) Given wr* & Response wr.
8 10 Id (A)
Id (A)
6 4
5
2 0 0
10
20 (a)
30
0 0
40
Wr (rad/s)
Wr (rad/s)
20 (a)
30
40
30
40
100
80 60 40 20 0 0
10
80 60 40 20
10
20 (b) Time (s)
30
40
Fig. 5. Anti-interference response of MRACNNGI
0 0
10
20 (b) Time (s)
Fig. 6. Response of IMC-NNGI
A New Model Reference Adaptive Control of PMSM Using NNGI
65
8
8 Id (A)
Id (A)
6
6 4 2
2 0
0 0
2
4
(a)
6
8
-2 0
10
2
4
2
4
(a)
6
8
10
6
8
10
100
100
80 Wr (rad/s)
80
Wr (rad/s)
4
60 40
60 40 20
20 0 0
0
2
4
(b) Time (s)
6
8
10
0
Fig. 7. Response of PID-NNGI
(b) Time (s)
Fig. 8. Response of PID-NNI 20
10
15 Id (A)
Id (A)
8 6 4
5
2 0 0
10
2
4
6
8
0 0
10
2
4
2
4
(a) 120 80
Wr (rad/s)
Wr (rad/s)
8
10
6 (b) Time (s)
8
10
100
100 60 40 20
80 60 40 20
0 0
6
(a)
2
4
(b) Time (s)
6
8
Fig. 9. Response of PID-NN(RBF)GI
10
0 0
Fig. 10. Response of PID
Where, in Figs.5-10, (a) represents given id* & response id, (b) represents given wr* & response wr. Fig.3, 4 and 5 are the current and speed responses in MRAC-NNGI method; Figs.6,7,8,9,10 are the responses in the methods of Internal Model Control GGNI (IMC-NNGI), PID-NNGI, PID-NNI (NN inverse), PID-NNGI with RBF algorithm and PID respectively.
5 Conclusion This paper has proposed an MRAC-NNGI scheme of a PMSM which includes a current regulation and a speed regulation. From the analysis and simulation results, we can conclude that the NNGI with Levenberg-Marquardt algorithm is simple and suitable for implementation. In addition, we can properly place the poles of the pseudo-linear composite system so as to linearize, decouple and reduce the order of the original
66
G. Liu et al.
system. In other words, the original nonlinear system has been transformed into a pseudo-linear composite system, which includes two SISO pseudo-linear subsystems. Last but not least, based on the aforesaid hypothesis of linearization, it considerably simplifies the design of an additional closed-loop linear controller. Therefore, with simpler design, less computational effort and advance stability, MRAC method is introduced into the algorithm. Simulation results have shown that the implementation is independent of the accurate mathematical model of original system. Also, the proposed control strategy provides an effective and achievable approach to realizing the linearization and decoupling control of nonlinear MIMO systems. Further study will be shown in the next paper. Acknowledgments. This work was supported in part by grants (Project No. 51077066, 608074014 and 50907031) from the National Natural Science Foundation of China and a grant (Project No. BK2010327) from the Natural Science Foundation of Jiangsu Province.
References 1. Mohamed, Y.A.-R.I., El-Saadany, E.F.: A Current Control Scheme with an Adaptive Internal Model for Torque Ripple Minimization and Robust Current Regulation in PMSM Drive Systems. IEEE Transactions on Energy Conversion 23(1), 92–100 (2009) 2. Zhao, W., Chau, K., Cheng, M., Hi, J., Zhu, X.: Remedial brushless AC operation of faulttolerant doubly-salient permanent-magnet motor drives. IEEE Transactions on Industrial Electronics 57(6), 2134–2141 (2010) 3. Rahman, M.A., Zhou, P.: Field Circuit Analysis of Brushless Permanent Magnet Synchronous Motors. IEEE Transactions on Industrial Electronics 43(2), 256–267 (1996) 4. Li, S.H., Liu, Z.: Adaptive Speed Control for Permanent-Magnet Synchronous Motor System with Variations of Load Inertia. IEEE Transactions on Industrial Electronics 56(8), 3050–3059 (2009) 5. Mohamed, Y.A.-R.I.: Design and Implementation of a Robust Current-Control Scheme for a PMSM Vector Drive with a Simple Adaptive Disturbance Observer. IEEE Transactions on Industrial Electronics 54(4), 1981–1988 (2007) 6. Bose, B.K.: Neural Network Applications in Power Electronics and Motor Drives-An Introduction and Perspective. IEEE Transactions on Industrial Electronics 54(1), 14–33 (2007) 7. Gadoue, S.M., Giaouris, D., Finch, J.W.: A Neural Network Based Stator Current MRAS Observer for Speed Sensorless Induction Motor Drives. In: IEEE International Symposium on Industrial Electronics (ISIE), Cambridge, pp. 650–655 (June 2008) 8. Yalcin, B., Ohnishi, K.: Infinite-Mode Neural Networks for Motion Control. IEEE Transactions on Industrial Electronics 56(8), 2933–2944 (2009) 9. Gadoue, S.M., Giaouris, D., Finch, J.W.: Sensorless Control of Induction Motor Drives at Very Low and Zero Speeds Using Neural Network Flux Observers. IEEE Transactions on Industrial Electronics 56(8), 3029–3039 (2009) 10. Dai, X., He, D., Zhang, T., Zhang, K.: ANN Generalized Inversion for the Linearization and Decoupling Control of Nonlinear Systems. IEEE Proceedings: Control Theory and Applications 150(3), 267–277 (2003)
A New Model Reference Adaptive Control of PMSM Using NNGI
67
11. Liu, G.H., Liu, P.Y., Shen, Y., Wang, F.L., Kang, M.: Neural Network Generalized Inverse Decoupling Control of Two-motor Variable Frequency Speed-regulating System. J. Proceeding of the CSEE 28(36), 98–102 (2008) 12. Jin, H., Lee, J.: An RMRAC current regulator for permanent-magnet synchronous motor based on statistical model interpretation. IEEE Transactions on Industrial Electronics 56(1), 169–177 (2009) 13. Rashed, M., Stronach, A.F.: A Stable Back-EMF MRAS-based Sensorless Low Speed Induction Motor Drive Insensitive to Stator Resistance Variation. IEEE Proceedings Electric Power Applications 151, 685–693 (2004)
RBF Neural Network Application in Internal Model Control of Permanent Magnet Synchronous Motor
Guohai Liu, Lingling Chen, Beibei Dong, and Wenxiang Zhao
School of Electrical and Information Engineering, Jiangsu University, 212013, Zhenjiang, China
[email protected],
[email protected]
Abstract. As a significant part of artificial intelligence (AI) techniques, neural networks have recently had a great impact on motor control; in particular, they have created a new perspective on decoupling and linearization. Considering the nonlinearity and strong coupling of the multivariable permanent magnet synchronous motor (PMSM), this paper presents internal model control (IMC) of a PMSM using an RBF neural network inverse (RBF-NNI) system. In the proposed control scheme, the RBF-NNI system is introduced to construct a pseudo-linear system with the original system, and an internal model controller is utilized as a robust controller; the new system therefore combines the advantages of both methods. The efficiency of the proposed control scheme is evaluated through computer simulation. Using the proposed scheme, the original system is successfully decoupled, exhibits strong robustness to load-torque disturbance, and provides good static and dynamic performance.
Index Terms: RBF neural network (RBF-NN); internal model control (IMC); permanent magnet synchronous motor (PMSM); decoupling control.
1 Introduction

With the progress of power electronics technology, permanent magnet synchronous motor (PMSM) drives have overcome the disadvantages of wide-speed operation and have been applied in many industrial areas [1],[2]. Thus, PMSM drives are likely to be strong competitors of induction motors for a wide range of future applications in advanced motion control and drive systems [3]. Despite the many advantageous features of a PMSM drive, precise control of the PMSM speed regulation system remains a great challenge. The PMSM is a multivariable, strongly coupled nonlinear system, so the key to its control is decoupling and linearization. It is well known that vector control, differential geometry and the inverse system method are all commonly used in decoupling applications. However, differential geometry and the inverse system method both need an accurate mathematical model of the controlled object. Furthermore, the physical concepts of differential geometry are expressed unclearly and are hard to master. In addition, though the theory of the inverse system is simple to analyze, its disturbance rejection and robustness often cannot meet the requirements [4]. Hence, these two methods are hard to apply in practice.
As neural network technology advances rapidly, its applications in different areas are expanding as well [5]. By introducing the neural network method into the inverse system, a method called the neural network inverse (NNI) system was proposed; it incorporates the advantages of both approaches. So far, most reported works have utilized the back-propagation (BP) neural network to build the inverse system [6],[7]. The disadvantages of the BP neural network (BP-NN) are well known: slow convergence, existence of local minima, and poor real-time performance [8]. Compared with the BP-NN, the radial basis function (RBF) neural network can greatly accelerate learning and successfully avoid local minima. To improve the accuracy, robustness and self-adaptability of the original system, this paper proposes an RBF neural network (RBF-NN) to build the inverse system. Usually, high-performance motor control requires fast and accurate speed response, quick recovery of speed from load torque disturbances, and insensitivity of the speed response to parameter variations. However, the pseudo-linear system which combines the RBF-NNI system with the original system is not a simple, ideal linear system, and a traditional PI controller often cannot meet the requirements of accurate control [9],[10]. In the present work, internal model control (IMC) theory is utilized to design an extra controller for the pseudo-linear system; it is superior to a common controller in that it can meet the requirements of both control performance and robustness. The performance of the proposed IMC-based RBF-NNI system for the PMSM is investigated by simulations at different operating conditions.
2 PMSM Model and Its Inverse The mathematical model of a PMSM in the d-q synchronously rotating reference frame is given as the third-order nonlinear model.
\begin{cases}
\dfrac{di_{sd}}{dt} = \dfrac{u_{sd}}{L_d} - \dfrac{R_s}{L_d}\,i_{sd} + \dfrac{L_q}{L_d}\,i_{sq}\,w_1 \\[4pt]
\dfrac{di_{sq}}{dt} = \dfrac{u_{sq}}{L_q} - \dfrac{R_s}{L_q}\,i_{sq} - \dfrac{L_d}{L_q}\,i_{sd}\,w_1 - \dfrac{\psi_f}{L_q}\,w_1 \\[4pt]
\dfrac{dw_1}{dt} = \dfrac{3n_p^2\psi_f}{2J}\,i_{sq} + \dfrac{3n_p^2(L_d - L_q)}{2J}\,i_{sd}\,i_{sq} - \dfrac{n_p}{J}\,T_L
\end{cases} \qquad (1)
where isd and isq are the stator currents of the d-axis and q-axis, respectively; w1 is the rotor electrical angular velocity; usd and usq are the voltages of the d-axis and q-axis, respectively; Ld and Lq are the inductances of the d-axis and q-axis, respectively (note that Ld and Lq are taken to be equal in this paper); Rs is the stator resistance; Ψf is the permanent magnet flux linkage of the rotor; np is the number of pole pairs; J is the moment of inertia; and TL is the load torque. Choose w1 and isd as the outputs of the system, so y = [y1, y2]T = [isd, w1]T; choose usd and usq as the control variables, so u = [u1, u2]T = [usd, usq]T; choose isd, isq and w1 as the state variables, so x = [x1, x2, x3]T = [isd, isq, w1]T. Note also that isd influences the flux linkage, so the key to PMSM decoupling control is the decoupling of isd and w1. Consequently, equation (1) can be written as:
\dot{x} = f(x,u) = \begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \end{bmatrix} = \begin{bmatrix} \dfrac{u_1}{L_d} - \dfrac{R_s}{L_d}x_1 + x_2 x_3 \\[4pt] \dfrac{u_2}{L_q} - \dfrac{R_s}{L_q}x_2 - x_1 x_3 - \dfrac{\psi_f}{L_q}x_3 \\[4pt] \dfrac{3n_p^2\psi_f}{2J}x_2 - \dfrac{n_p}{J}T_L \end{bmatrix} \qquad (2)
According to inverse system theory, differentiate the outputs of the system:

\dot{y}_1 = \frac{u_1}{L_d} - \frac{R_s}{L_d}x_1 + x_2 x_3 \qquad (3)

\dot{y}_2 = \frac{3n_p^2\psi_f}{2J}x_2 - \frac{n_p}{J}T_L \qquad (4)

\ddot{y}_2 = \frac{3n_p^2\psi_f}{2J}\left(\frac{u_2}{L_q} - \frac{R_s}{L_q}x_2 - x_1 x_3 - \frac{\psi_f}{L_q}x_3\right) \qquad (5)
Then the Jacobian matrix is:

A(x,u) = \begin{bmatrix} \dfrac{\partial \dot{y}_1}{\partial u_1} & \dfrac{\partial \dot{y}_1}{\partial u_2} \\[4pt] \dfrac{\partial \ddot{y}_2}{\partial u_1} & \dfrac{\partial \ddot{y}_2}{\partial u_2} \end{bmatrix} = \begin{bmatrix} \dfrac{1}{L_1} & 0 \\[4pt] 0 & \dfrac{3n_p^2\psi_f}{2JL_1} \end{bmatrix} \qquad (6)

\det\big(A(x,u)\big) = \frac{3n_p^2\psi_f}{2JL_1^2} \qquad (7)

where L_1 = L_d = L_q.
When the rotor flux linkage Ψf ≠ 0, A(x,u) is nonsingular. The relative order of the system is α = (α1, α2) = (1, 2); obviously α1 + α2 = 1 + 2 = 3 = n (the order of the system), so the inverse of the original system is:

u = \varphi(y_1, \dot{y}_1, y_2, \dot{y}_2, \ddot{y}_2) \qquad (8)
3 RBF-NNI System Application in Internal Model Control

3.1 RBF Neural Network
As is well known, the RBF-NN is a kind of feed-forward neural network which performs function mapping in localized receptive fields. It has three layers: an input layer, a hidden layer and an output layer, as shown in Fig. 1. Radial basis functions are used as the activation functions of the hidden-layer neurons.
Fig. 1. The structure of the RBF-NN
The main advantages of the radial basis function are that its form is simple, its curve is smooth, and it is radially symmetric. The Gaussian function is often selected as the radial basis function. The network can approximate any continuous function and, in particular, does not need to depend on a model of the system. The output of the j-th hidden-layer neuron is:

h_j = \exp\left(-\frac{\|X - C_j\|^2}{2b_j^2}\right), \quad j = 1, 2, \ldots, m \qquad (9)
where ‖·‖ is the Euclidean norm, X is the input vector, C_j is the center vector of the j-th hidden-layer neuron, and b_j is the width of the j-th hidden-layer neuron. The output of the RBF-NN is:

y_m(k) = \sum_{j=1}^{m} w_j h_j \qquad (10)
where w_j is the weight between the j-th hidden neuron and the output layer, and m is the number of hidden-layer neurons. The performance function approximated by the RBF-NN is given as:

E(k) = \frac{1}{2}\big(y(k) - y_m(k)\big)^2 \qquad (11)
where y(k) is the output of the controlled object. The sensitivity of the output to the input (the Jacobian) is given as:

\frac{\partial y(k)}{\partial u(k)} \approx \frac{\partial y_m(k)}{\partial u(k)} = \frac{\partial y_m(k)}{\partial x_1} = \sum_{j=1}^{m} w_j h_j \frac{c_{j1} - x_1}{b_j^2} \qquad (12)
In the above formulation, u(k) is regarded as the first input of RBF-NN, so u(k)=x1.
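To make the mapping of Eqs. (9)-(12) concrete, the following minimal NumPy sketch evaluates the RBF forward pass and the input Jacobian. This is illustrative only; the function and variable names are ours, not the paper's.

```python
import numpy as np

def rbf_forward(x, centers, widths, weights):
    """RBF network output, Eqs. (9)-(10).
    x: (n,) input vector X; centers: (m, n) center vectors C_j;
    widths: (m,) widths b_j; weights: (m,) output weights w_j."""
    # h_j = exp(-||X - C_j||^2 / (2 b_j^2))   -- Eq. (9)
    h = np.exp(-np.sum((x - centers) ** 2, axis=1) / (2.0 * widths ** 2))
    # y_m = sum_j w_j h_j                     -- Eq. (10)
    return weights @ h, h

def rbf_jacobian(x, centers, widths, weights):
    """Sensitivity dy_m/du with u = x[0], per Eq. (12)."""
    _, h = rbf_forward(x, centers, widths, weights)
    return np.sum(weights * h * (centers[:, 0] - x[0]) / widths ** 2)

# Toy usage: 6 hidden neurons, 3 inputs.
rng = np.random.default_rng(0)
C, b, w = rng.random((6, 3)), np.full(6, 0.5), rng.random(6)
y, _ = rbf_forward(np.array([0.1, 0.2, 0.3]), C, b, w)
```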
3.2 RBF Neural Network Inverse System
In the previous section, the existence of the inverse system was proved. Due to the nonlinearity and strong coupling of the multivariable motor system, it is still hard to obtain an analytical inverse. Neural networks offer a good solution to this problem. By introducing the neural network method into inverse theory, an RBF-NNI system is built to construct a pseudo-linear system with the original system, as shown in Fig. 2.

Fig. 2. Construction of the pseudo-linear system
3.3 Design of the Internal Model Controllers

In practical implementation, due to parametric perturbations, unpredictable disturbances and un-modeled dynamics of the pseudo-linear system, the control effect will deviate from the expected target. IMC offers strong robustness to disturbances. In this paper, IMC theory is adopted to design the extra controllers. Compared with the common PID controller, IMC is more stable and can often meet the requirements of both control performance and robustness. The structure of IMC is shown in Fig. 3.
Fig. 3. Block diagram of IMC
In the above figure, G(s) is the pseudo-linear system, Gm1(s) and Gm2(s) are the internal models of the system, d1 and d2 are the disturbance signals. Gc1(s) and Gc2(s) are the internal model controllers. F1(s) and F2(s) are the filters. According to inverse system theory, the internal models of the system are given by:
G_{m1}(s) = \frac{1}{s}, \qquad G_{m2}(s) = \frac{1}{s^2} \qquad (13)
In order to obtain good static and dynamic performance, based on simulation, the filters used in the control of Gm1(s) and Gm2(s) are designed as:

F_1(s) = \frac{1}{2s+1}, \qquad F_2(s) = \frac{1}{(3s+1)^2} \qquad (14)
The corresponding internal model controllers are then:

G_{c1}(s) = \frac{s}{2s+1}, \qquad G_{c2}(s) = \frac{s^2}{(3s+1)^2} \qquad (15)
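As a hedged illustration of Eqs. (13)-(15) (not the authors' implementation), the internal models, filters and IMC controllers can be written down directly with scipy.signal:

```python
from scipy import signal

# Internal models, Eq. (13): Gm1 = 1/s, Gm2 = 1/s^2
Gm1 = signal.TransferFunction([1], [1, 0])
Gm2 = signal.TransferFunction([1], [1, 0, 0])

# Filters, Eq. (14): F1 = 1/(2s+1), F2 = 1/(3s+1)^2 = 1/(9s^2+6s+1)
F1 = signal.TransferFunction([1], [2, 1])
F2 = signal.TransferFunction([1], [9, 6, 1])

# IMC controllers, Eq. (15): Gc = F * Gm^{-1}
Gc1 = signal.TransferFunction([1, 0], [2, 1])        # s/(2s+1)
Gc2 = signal.TransferFunction([1, 0, 0], [9, 6, 1])  # s^2/(3s+1)^2

# Sanity check: the reference channel F1 should step toward 1.
t, y = signal.step(F1)
print(y[-1])  # -> approximately 1.0
```

Note the design choice implicit in Eq. (14): F2(s) is second order so that Gc2(s) = F2(s)/Gm2(s) remains proper, while F1(s) only needs to be first order for the single-integrator channel.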
4 Simulation Results
In order to verify the effectiveness of the proposed IMC-based RBF-NNI system for the PMSM, a computer simulation model is developed in Matlab/Simulink according to Fig. 3. After applying appropriate excitation signals to the original system, the responses of the d-axis current and rotational speed are obtained. The sampled sets {id, id(1), wr, wr(1), wr(2)} and {usq, usd} are then normalized to train the RBF-NNI (a least-squares variant of this fitting step is sketched below). Finally, by introducing the designed internal model controllers into the pseudo-linear system, the robust control of the whole system is completed. The performance of the proposed IMC-based RBF-NNI system for the PMSM has been investigated extensively through simulations at different dynamic operating conditions; the results are presented below. Fig. 4 shows the simulated response of the rotational speed for a step input of 100 rad/s. The speed follows the reference without any overshoot or undershoot and with zero steady-state error. A further simulated result for step changes in the reference speed is shown in Fig. 5, which shows that the proposed scheme is capable of handling step changes in the speed commands.
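The paper does not spell out the training algorithm for the RBF-NNI; the sketch below is one plausible, hedged realization that fits the output weights by least squares to the normalized sample sets described above (all names are illustrative assumptions).

```python
import numpy as np

def train_rbf_inverse(X, U, centers, widths):
    """Fit RBF output weights so that the network maps sampled responses
    X = [i_d, i_d', w_r, w_r', w_r''] (one row per sample) to the recorded
    excitations U = [u_sd, u_sq]; a stand-in for the paper's training step."""
    # Normalize the features column-wise, as the text prescribes.
    Xn = (X - X.mean(axis=0)) / X.std(axis=0)
    # Hidden-layer activations, Eq. (9), for every sample: shape (N, m).
    H = np.exp(-((Xn[:, None, :] - centers) ** 2).sum(-1) / (2 * widths ** 2))
    # Least-squares output weights, one column per output channel.
    W, *_ = np.linalg.lstsq(H, U, rcond=None)
    return W
```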
Fig. 4. Response of rotational speed for a step input of 100 rad/s

Fig. 5. Response of rotational speed for step changes
Fig. 6 shows the simulated responses of both the rotational speed and the d-axis current at step inputs, for both the PID controller system and the proposed IMC-based RBF-NNI system. With the proposed scheme, the rotational speed and the d-axis current are almost completely decoupled, and both responses show good dynamic performance. With the PID controller, by contrast, there is still coupling between the rotational speed and the d-axis current, and both responses show overshoots and oscillations, especially when the references are changed.
Fig. 6. Responses of both isd and w1 for step inputs: (a) PID control scheme; (b) proposed control scheme
Fig. 7. Speed response for sudden change of load
The robustness of the internal model controller is shown in Fig. 7. It is evident from Fig. 7 that the PMSM with the proposed scheme is almost insensitive to load torque disturbance. These results indicate that the proposed scheme, which combines IMC with the RBF-NNI system, is robust and suitable for the decoupling control of the PMSM drive.
5 Conclusion

The application of the RBF-NNI system in IMC of the PMSM has been presented in this paper. It has been verified that the proposed control scheme can realize accurate tracking of the rotational speed. A performance comparison of the proposed control scheme with the common control method has also been demonstrated. The results show that the proposed control scheme not only decouples the nonlinear system successfully, but also has strong robustness to load torque disturbance and un-modeled dynamics. Moreover, the whole system provides good static and dynamic performance. Consequently, it offers a new way to achieve high-performance control of the PMSM.

Acknowledgments. This work was supported in part by grants (Project No. 60874014, 50907031 and 51077066) from the National Natural Science Foundation of China, and a grant (Project No. BK2010327) from the Natural Science Foundation of Jiangsu Province.
References 1. Morel, F., Retif, J.M., Lin-Shi, X.F., Valentin, C.: Permanent Magnet Synchronous Machine Hybrid Torque Control. IEEE Transactions on Industrial Electronics 55(2), 501–511 (2008) 2. Zhao, W.X., Chau, K.T., Cheng, M., Hi, J., Zhu, X.: Remedial Brushless AC Operation of Fault-tolerant Doubly-salient Permanent-magnet Motor Drives. IEEE Transactions on Industrial Electronics 57(6), 2134–2141 (2010) 3. Zhu, Z.Q., Howe, D.: Electrical Machines and Drives for Electric, Hybrid, and Fuel Cell Vehicles. Proceedings of the IEEE 95(4), 746–765 (2007) 4. Su, W.T., Liaw, C.M.: Adaptive Positioning Control for a LPMSM Drive Based on Adapted Inverse Model and Robust Disturbance Observer. IEEE Transactions on Power Electronic 21(2), 505–517 (2006) 5. Bose, B.K.: Neural Network Applications in Power Electronics and Motor Drives- An Introduction and Perspective. IEEE Transactions on Industrial Electronics 54(1), 14–33 (2007) 6. Pham, D.T., Yildirim, S.: Design of a Neural Internal Model Control System for a Robot. Robotica 5(5), 505–512 (2000) 7. Li, Q.R., Wang, P.F., Wang, L.Z.: Nonlinear Inverse System Self-learning Control Based on Variable Step Size BP Neural Network. In: International Conference on Electronic Computer Technology (February 2009) 8. Zhang, Y.N., Li, Z., Chen, K., Cai, B.H.: Common Nature of Learning Exemplified by BP and Hopfield Neural Networks for Solving Online a System of Linear Equations. In: Proceedings of 2008 IEEE International Conference on Networking, Sensing and Control, vol. 1, pp. 832–836 (2008)
9. Uddin, M.N., Rahman, M.A.: High-Speed Control of IPMSM Drives Using Improved Fuzzy Logic Algorithms. IEEE Transactions on Industrial Electronics 54(1), 190–199 (2007) 10. Rubaai, A., Castro-Sitiriche, M.J., Ofoli, A.R.: DSP-based Laboratory Implementation of Hybrid Fuzzy-PID Controller Using Genetic Optimization For High-performance Motor Drives. IEEE Transactions on Industry Applications 44(6), 1977–1986 (2008)
Transport Control of Underactuated Cranes

Dianwei Qian, Boya Zhang, and Xiangjie Liu

School of Control and Computer Engineering, North China Electric Power University, Beijing, 102206, P.R. China
{dianwei.qian,liuxj}@ncepu.edu.cn
Abstract. Overhead cranes are important equipment used in many industries. They belong to the class of underactuated mechanical systems. Since it is hard to obtain an accurate model for control design, this paper presents a design scheme for the transport control problem of overhead cranes with uncertainties. In this scheme, a variable structure control law based on sliding mode is designed for the nominal model, and neural networks are utilized to learn the upper bound of the system uncertainties. Update formulas for the network weights are deduced in the sense of the Lyapunov theorem to approximate the system uncertainties. From the design process and comparisons, it can be seen that: 1) the neural approximator is able to compensate the system uncertainties; 2) the control system possesses asymptotic stability; 3) better performance can be achieved. Keywords: Variable structure control, Neural networks, Underactuated systems, Approximator, Crane.
1 Introduction

Overhead cranes are important equipment used in many industries, but their performance may be constrained because loads undergo a pendulum-type motion, which is harmful for industrial safety. It is desired that overhead cranes be able to transport loads to the required position as fast and as accurately as possible, without free swings [1]. Concerning the control problem of overhead cranes, a variety of control approaches have been proposed in the last two decades, e.g. fuzzy control [1], adaptive coupling control [2], passivity-based control [3], wave-based control [4], adaptive fuzzy control [5], 3-dimensional path planning [6], etc. However, most of these methods rely on an accurate model of the crane system to formulate the control input, and their performance may deteriorate when uncertainties exist in the crane system. With the development of nonlinear control theory, applications of sliding mode control (SMC) have received more attention. SMC [7], which belongs to variable structure control, responds quickly and is invariant to systemic parameter variations and external disturbances. It is a good
This work was supported by the NSFC Projects under grants No. 60904008, 60974051, the Fundamental Research Funds for the Central Universities under grant No. 09MG19.
choice for dealing with the control problem of overhead crane systems. Several SMC-based control methods have been proposed for overhead crane systems, e.g. coupling SMC [8], second-order SMC [9], high-order SMC [10], adaptive SMC [11], hierarchical SMC [14], etc. Although robust SMC is invariant to matched disturbances [7], a known bound on the uncertainties is needed to establish system stability, which may be too strict a requirement in practical applications. Since neural networks possess the ability to identify a broad category of complex nonlinear systems, this paper addresses a neural-network-based SMC method for the control problem of overhead crane systems with unknown bounded uncertainties. The update formulas of the network weights are deduced from the Lyapunov direct method to ensure the asymptotic stability of the control system. Simulation results demonstrate the application of the presented method.
2 System Model

Fig. 1 shows the coordinate system of an overhead crane system and its load. This system consists of the trolley subsystem and the load subsystem: the trolley is driven by a force f, and the load is suspended from the trolley by a rope. For simplicity, we assume that the load can be regarded as a material particle, that the rope is inflexible with negligible mass, that there is no friction, etc. In Fig. 1, the symbols are the trolley mass M, the load mass m, the rope length L, the swing angle θ of the load with respect to the vertical line, the trolley position x with respect to the origin, and the force f applied to the trolley.

Fig. 1. Structure of an overhead crane system

The state-space expression of the crane system with uncertain dynamics [3] is written as
\begin{cases}
\dot{x}_1 = x_2 \\
\dot{x}_2 = f_1(x) + b_1(x)\,u + d_1(t) \\
\dot{x}_3 = x_4 \\
\dot{x}_4 = f_2(x) + b_2(x)\,u + d_2(t)
\end{cases} \qquad (1)
Here g is the gravitational acceleration; x = [x_1, x_2, x_3, x_4]^T; x_1 = x; x_3 = θ; x_2 is the trolley velocity; x_4 is the angular velocity of the load; u = f is the control input; d_1(t) and d_2(t) are bounded lumped disturbances, involving the parameter variations and external disturbances, but with unknown bounds; f_i and b_i (i = 1, 2) can be derived as

f_1 = \frac{m L x_4^2 \sin x_3 + m g \sin x_3 \cos x_3}{M + m \sin^2 x_3}, \qquad b_1 = \frac{1}{M + m \sin^2 x_3}

f_2 = -\frac{(m + M) g \sin x_3 + m L x_4^2 \sin x_3 \cos x_3}{(M + m \sin^2 x_3)\,L}, \qquad b_2 = -\frac{\cos x_3}{(M + m \sin^2 x_3)\,L}
Let d1 (t) = 0 and d2 (t) = 0 in (1). We can obtain the nominal system of this uncertain system.
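A minimal Python sketch of the model (1) follows; the parameter values are those used in Section 4, and d1, d2 default to zero, which yields the nominal system. The function name and interface are illustrative, not from the paper.

```python
import numpy as np

def crane_dynamics(x, u, M=37.32, m=5.0, L=1.05, g=9.81, d1=0.0, d2=0.0):
    """Right-hand side of the uncertain crane model, Eq. (1)."""
    x1, x2, x3, x4 = x            # cart pos/vel, load angle/angular vel
    D = M + m * np.sin(x3) ** 2   # common denominator
    f1 = (m * L * x4**2 * np.sin(x3) + m * g * np.sin(x3) * np.cos(x3)) / D
    b1 = 1.0 / D
    f2 = -((m + M) * g * np.sin(x3)
           + m * L * x4**2 * np.sin(x3) * np.cos(x3)) / (D * L)
    b2 = -np.cos(x3) / (D * L)
    return np.array([x2,
                     f1 + b1 * u + d1,
                     x4,
                     f2 + b2 * u + d2])
```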
3 Design of RBF Network-Based Sliding Mode Controller

For designing the sliding mode control law of the uncertain system (1) in the sense of Lyapunov, its sliding surface is defined as

s = c_1 x_1 + c_2 x_2 + c_3 x_3 + x_4 \qquad (2)
here c_j (j = 1, 2, 3) are constants and c_2 is positive. Employing the equivalent control method [7], we define the control law as

u = u_{eq} + u_{sw} \qquad (3)

here u_{eq} is the equivalent control and u_{sw} is the switching control. After the sliding mode takes place, we have

\dot{s} = 0 \qquad (4)

Substituting the equation of the nominal system into (4) yields the equivalent control law u_{eq} as

u_{eq} = -\frac{c_1 x_2 + c_3 x_4 + c_2 f_1 + f_2}{c_2 b_1 + b_2} \qquad (5)
Define a Lyapunov function as V(t) = s^2/2. Differentiating V with respect to time t and substituting (1)-(5) into it yields

\dot{V} = s\dot{s} = s\big[c_1 x_2 + c_2(f_1 + b_1 u + d_1) + c_3 x_4 + f_2 + b_2 u + d_2\big] = s\big[(c_2 b_1 + b_2)u_{sw} + c_2 d_1 + d_2\big] \qquad (6)

Let

u_{sw} = -\frac{k s + \eta\,\mathrm{sgn}(s)}{c_2 b_1 + b_2} \qquad (7)
where both k and η are positive constants. Substituting (7) into (6) yields

\dot{V} = -k s^2 - \eta|s| + s(c_2 d_1 + d_2) \le -k s^2 - (\eta - c_2|d_1| - |d_2|)\,|s| \qquad (8)
From (8), we can choose a positive constant η0 and define η = η0 + c2·d̄1 + d̄2 to guarantee system stability, provided the upper bounds of the uncertainties, d̄1 = sup d1(t) and d̄2 = sup d2(t), are known. In practice, it is hard to obtain these upper bounds; thus, it is necessary to approximate them to guarantee the system stability. Since RBF networks have the ability to approximate complex nonlinear mappings directly from input-output data with a simple topological structure [12], we propose an approach that approximates the upper bounds of d1(t) and d2(t) in (1) by RBF networks. Define the vector x as the network input, and d̂i (i = 1, 2), the estimated value of di(t), as the network outputs. Then the i-th RBF network output is determined as

\hat{d}_i(x, w^i) = {w^i}^T \Phi_i(x) \qquad (9)
here w^i ∈ R^{n*×1} is the weight vector of the i-th RBF neural network, where n* is the number of hidden neurons, and Φ_i(x) = [φ_{i1}(x), φ_{i2}(x), ..., φ_{in*}(x)]^T is a radial basis function vector, where the k-th RBF function of the i-th network is determined as

\phi_{ik}(x) = \exp\!\left(-\frac{\|x - \gamma_{ik}\|^2}{\delta_{ik}^2}\right) \qquad (10)

here γ_{ik} and δ_{ik} denote the center and width of the k-th hidden neuron of the i-th RBF network. To deduce the update formulas, we make the following assumptions [13].

– A1: There exists an optimal weight w^{i*} such that the output of the i-th optimal network satisfies |{w^{i*}}^T \Phi_i(x) - \bar{d}_i| < \epsilon_{i0}, where ε_{i0} is a positive constant.
– A2: The system uncertainties and their upper bounds satisfy |\bar{d}_i - d_i(t)| > \epsilon_{i1} > \epsilon_{i0}.

To derive the update formulas, we redefine a Lyapunov function as

V_n(t) = \frac{s^2}{2} + \sum_{i=1}^{2} \frac{\alpha_i^{-1}\,\tilde{w}^{iT}\tilde{w}^i}{2} \qquad (11)
Here \tilde{w}^i = w^{i*} - w^i and \dot{\tilde{w}}^i = -\dot{w}^i, and α_1 and α_2 are positive constants. Differentiating V_n with respect to time t yields

\dot{V}_n(t) = s\dot{s} + \sum_{i=1}^{2}\alpha_i^{-1}\tilde{w}^{iT}\dot{\tilde{w}}^i \le -k s^2 - (\eta - c_2|d_1| - |d_2|)|s| - \sum_{i=1}^{2}\alpha_i^{-1}(w^{i*} - w^i)^T\dot{w}^i \qquad (12)

Let

\dot{w}^1 = \alpha_1 c_2 \Phi_1(x)\,|s| \qquad (13a)

\dot{w}^2 = \alpha_2 \Phi_2(x)\,|s| \qquad (13b)
1∗ T
Φ1 (x) − |d1 |] − |s|[w
= −ks − η0 |s| − 2
c2 (11
−
10 )|s|
−
2∗ T
(21
(14)
Φ2 (x) − |d2 |]
− 20 )|s|
From the assumptions A1 and A2, V˙n ≤ 0 in (14) so that both the update formulas (13a) and (13b) are able to ensure the asymptotic stability of the system (1) with unknown bounded uncertainties under the control law (3) in the sense of Lyapunov.
4
Simulation Results
The presented method in this section will be applied on transport control of an overhead crane system, whose physical parameters are determined as M = 37.32 kg, m = 5 kg, l = 1.05 m, g = 9.81m · s−2 . The initial and desired states, x0 and xd , are [2 0 0 0]T and [0 0 0 0]T , respectively. The parameters of the sliding surface s and the switching control are picked up as c1 = 1.6, c2 = 2.3, c3 = −8.7, and k = 10, η0 = 0.2. The center γik and width δik of the k-th hidden neuron of the i-th RBF network are designed as random numbers in the interval (0, 1). α1 , α2 , and n∗ are selected as 104 , 104 and 6 after trial and error. Here, d1 (t) and d2 (t) in (1) are assumed as random uncertainties. The simulation results in Fig. 2 illustrate the comparison with and without NNs, where the blue solid depicts the results with NNs approximating the unknown bounds and the black dash illustrates the results with no NNs. Under the conditions of no NNs to approximate the unknown bounds, we select a larger value of η, η = 2, to ensure the stability of this system. As we have pointed out, the presented RBF-network-based SMC method with the update formulas (12) makes the crane system with unknown bounded uncertainties asymptotically stable. Further, although the curves of cart position x1 and load angle x3 in Fig. 2 almost make no difference, the plot of the
D. Qian, B.Zhang, and X. Liu 0.3
0.5 0 3
6
0.1 0 −0.1 0
9
Time (s)
5
estimated d1
4
estimated d
Control input u [N] with NNs
Output of NNs
6
2
3 2 1 0 0
3
6 Time (s)
9
4 with NN with no NN desired
Sliding surface s
1
−0.5 0
0.2
2
1.5
Load Angle x [rad]
with NN with no NN desired
1
Cart Position x [m]
2
3 6 Time (s)
500
0
−500
−1000 0
with NN
3 6 Time (s)
9
3 2 1 0 −1 0
9
Control input u [N] with no NNs
82
3 6 Time (s)
9
500
0 with no NN −500
−1000 0
3 6 Time (s)
9
Fig. 2. Comparison with and without NNs to estimate the upper bounds. (a) cart position, (b) load angle, (c) sliding surface, (d) outputs of RBF NNs dˆ1 & dˆ2 , (e) control input with NNs, (f) control input with no NNs.
The plot of the control input u with NNs, however, is superior: a larger fixed η with no NNs leads to chattering of the control input. On the contrary, η is variable and adapts during the dynamic process when the upper bounds of the uncertainties are estimated by the outputs of the NNs, which effectively reduces the chattering of the control input.
5 Conclusion

This paper has presented a robust sliding mode controller based on RBF neural networks for overhead crane systems with uncertain dynamics. The update formulas of the RBF networks are deduced from the Lyapunov method to ensure the stability of the control system. Compared with the results without the NN approximator, the simulation results illustrate the feasibility and robustness of the presented method. The main contribution of the presented approach is its ability to solve the control problem of overhead crane systems with unknown bounded uncertainties.
References 1. Yi, J.Q., Yubazaki, N., Hirota, K.: Anti-swing and positioning control of overhead traveling crane. Information Sciences 115(1-2), 19–42 (2003) 2. Yang, J.H., Yang, K.S.: Adaptive coupling control for overhead crane systems. Mechatronics 17(2-3), 143–152 (2007)
3. Fang, Y., Dixon, W.E., Dawson, D.M., Zergeroglu, E.: Nonlinear coupling control laws for an underactuated overhead crane system. IEEE/ASME Transactions on Mechatronics 8(3), 418–423 (2003) 4. Yang, T.W., O'Connor, W.J.: Wave based robust control of a crane system. In: Proc. of IEEE/RSJ International Conference on Intelligent Robots & Systems, Beijing, pp. 2724–2729 (2006) 5. Chang, C.Y.: Adaptive fuzzy controller of the overhead cranes with nonlinear disturbance. IEEE Transactions on Industrial Informatics 3(2), 164–172 (2007) 6. Kroumov, V., Yu, J., Shibayama, K.: 3D path planning for mobile robots using simulated annealing neural network. International Journal of Innovative Computing, Information and Control 6(7), 2885–2899 (2010) 7. Utkin, V.I.: Sliding modes in control and optimization. Springer, New York (1992) 8. Shyu, K.K., Jen, C.L., Shang, L.J.: Design of sliding-mode controller for antiswing control of overhead cranes. In: Proc. of the 31st Annual Conference of IEEE Industrial Electronics Society, pp. 147–152 (2005) 9. Bartolini, G., Pisano, A., Usai, E.: Second-order sliding-mode control of container cranes. Automatica 38(10), 1783–1790 (2002) 10. Chen, W., Saif, M.: Output feedback controller design for a class of MIMO nonlinear systems using high-order sliding-mode differentiators with application to a laboratory 3-D crane. IEEE Transactions on Industrial Electronics 55(11), 3985–3997 (2008) 11. Park, M.S., Chwa, D., Hong, S.K.: Antisway tracking control of overhead cranes with system uncertainty and actuator nonlinearity using an adaptive fuzzy sliding-mode control. IEEE Transactions on Industrial Electronics 55(11), 3972–3984 (2008) 12. Huang, G.B., Saratchandran, P., Sundararajan, N.: A generalized growing and pruning RBF (GGAP-RBF) neural network for function approximation. IEEE Transactions on Neural Networks 16, 57–67 (2005) 13. Man, Z., Wu, H.R., Palaniswami, M.: An adaptive tracking controller using neural networks for a class of nonlinear systems. IEEE Transactions on Neural Networks 9(5), 947–955 (1998) 14. Qian, D.W., Yi, J.Q., Zhao, D.B.: Hierarchical sliding mode control for a class of SIMO under-actuated systems. Control & Cybernetics 37(1), 159–175 (2008)
Sliding Mode Prediction Based Tracking Control for Discrete-time Nonlinear Systems

Lingfei Xiao and Yue Zhu

College of Energy and Power Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, 210016, China
Abstract. A novel tracking control algorithm based on sliding mode prediction is proposed for a class of discrete-time nonlinear uncertain systems. By first constructing a sliding mode prediction model, the future, present and past information of the system can be used to improve the performance of sliding mode control. Because of the sliding mode reference trajectory, the reachability of the sliding mode is achieved, and the reaching mode can be determined by the designer in advance. Due to feedback correction and receding horizon optimization, the influence of uncertainty can be compensated in time, and the control law is obtained subsequently. Theoretical analysis proves that the closed-loop system is robust to matched or unmatched uncertainty without requiring a known bound on the uncertainty. Simulation results illustrate the validity of the presented algorithm. Keywords: Sliding mode prediction, sliding mode control, discrete-time nonlinear systems, tracking.
1 Introduction

As a powerful robust control method, sliding mode control (SMC) has been researched since the early 1950s [1]. One of the reasons for the popularity of SMC is its remarkably good invariance to matched uncertainty on the sliding surface. With the wide use of micro-controllers, the study of discrete-time SMC (DSMC) is required, because when continuous-time SMC schemes are implemented on digital devices, some advantages of SMC are lost. In [2], Gao et al. specified desired properties of the controlled systems and proposed a reaching law-based approach to DSMC. For a single-input single-output (SISO) linear uncertain plant, [3] addressed a discrete-time variable-structure repetitive control algorithm on the basis of a delay reaching law. In [4], Chen presented a robust adaptive sliding mode tracking controller for discrete-time multi-input multi-output systems; bounded motion of the system around the sliding surface and stability of the global system were guaranteed when all signals were bounded. Using a recursive switching function, [5] presented a DSMC
This paper is supported by Natural Science Foundation of China (NO. 61004079) and NUAA Research Funding (NO. NS2010050).
algorithm based on [2]; the algorithm produced a low-chattering control signal. In [6], a robust controller was developed by employing the reaching law of [2], provided that the upper bound of the uncertainty was known. In [7], a Takagi-Sugeno (T-S) fuzzy model was introduced to transform the considered nonlinear discrete-time systems into a linear uncertain system model, and then a modified reaching law was used; however, no guideline for constructing the T-S model was given. [8] discussed the problems involved in the discretization of continuous-time terminal sliding mode (TSM) and presented a discrete-time TSM algorithm. None of the above works took optimization into consideration. The purpose of this paper is to design a suitable robust tracking controller for a class of discrete-time nonlinear uncertain systems. Because SMC is an effective nonlinear robust control method, SMC is used to achieve the control objective. Nevertheless, SMC only possesses strong robustness to matched uncertainty; the corresponding control signal must be large in order to force arbitrary states to reach the sliding surface, which can result in controller saturation; and the bound of the uncertainty is often required to be known, which may not be available in practice and makes closed-loop systems highly conservative. Considering these shortcomings, a predictive control (PC) strategy is utilized to improve the performance of SMC. Following PC, a prediction model, feedback correction (FC) and receding horizon optimization (RHO) are employed during the design process [9]. Therefore, a sliding mode prediction model (SMPM) is created first, so that the future, present and past information of the system can be used. Considering model mismatch, the error between the past output of the SMPM and the present sliding mode value is used to apply feedback correction to the SMPM. Because of the sliding mode reference trajectory (SMRT), the reachability of the sliding mode is obtained; since the SMRT is a design choice, the reaching mode can be determined by the designer in advance. By RHO, the control signal is obtained and can be optimized online. Due to FC and RHO, the influence of uncertainty can be compensated in time. Theoretical analysis proves that the closed-loop system is robust to matched or unmatched uncertainty, without requiring a known bound on the uncertainty. This paper is organized as follows. Section 2 describes the considered systems and the control objective. Section 3 contains the main results, i.e., the creation of the SMPM, the feedback correction for the SMPM, the selection of the SMRT, the construction of the control law, and the analysis of robustness. In Section 4, the advantages of the presented algorithm are verified by a numerical example. Finally, Section 5 draws conclusions.
2 Problem Formulation

In the following, we consider the uncertain system

x_i(k+1) = x_{i+1}(k) + d_i(k), \quad i = 1, \ldots, n-1
x_n(k+1) = f\big(x(k)\big) + g\big(x(k)\big)u(k) + d_n(k) \qquad (1)
where x_i(k) and u(k) stand for x_i(kT) and u(kT), respectively, where T is the sampling period. x(k) = [x_1(k), ..., x_n(k)]^T ∈ R^n is the measurable state vector,
u(k) ∈ R is the control input, and f(x(k)) and g(x(k)) are smooth functions. d(k) = [d_1(k), ..., d_n(k)]^T ∈ R^n is the matched or unmatched uncertainty vector, which includes all parameter perturbations, modeling errors and external disturbances in the system. For brevity, we will use the notations f(k) = f(x(k)) and g(k) = g(x(k)) in the sequel.
Assumption: g(k) is a positive function, and g_L > 0 is a known lower bound of g(k), i.e., g(k) ≥ g_L > 0.
The objective of this paper is to construct a control law u(k) such that the state vector x(k) tracks a desired state vector x_d(k) in a stable manner, where x_d(k) = [x_{d1}(k), ..., x_{dn}(k)]^T ∈ R^n is a known vector satisfying x_{di}(k+1) = x_{d(i+1)}(k) (i = 1, ..., n−1) and x_{dn}(k+1) = a(k), where a(k) is a known function. Defining the tracking error as e(k) = x_d(k) − x(k), the corresponding error system of (1) is

e_i(k+1) = e_{i+1}(k) - d_i(k), \quad i = 1, \ldots, n-1
e_n(k+1) = a(k) - f(k) - g(k)u(k) - d_n(k) \qquad (2)

3 Main Results
In terms of SMC theory, there are two steps in the design: 1) determine a sliding surface which possesses the desired dynamics; 2) design a control law such that the tracking error is driven to the sliding surface and remains on it thereafter. In the following, after specifying a suitable sliding surface, a novel SMPM is constructed. Based on the SMPM, a satisfactory sliding mode control law is then obtained via feedback correction and receding horizon optimization.

3.1 The Creation of the Sliding Mode Prediction Model (SMPM)
Define the switching function as

s(k) = c\,e(k) = \sum_{i=1}^{n-1} c_i e_i(k) + e_n(k) \qquad (3)
where c = [c_1, ..., c_n], with c_n = 1. The sliding surface is therefore S = {e(k) | s(e(k)) = 0}. In (3), the c_i should be chosen such that the roots of \sum_{i=1}^{n} c_i z^{i-1} = 0 lie in the open unit disk, where z is the shift operator; the stability and dynamic performance of the ideal sliding mode can thus be guaranteed. In order to obtain the desired control law, we would like to use the future, present and past information of the system. Therefore, the following sliding mode prediction model (SMPM) is created to predict the one-step-ahead value of the switching function:

s_m(k+1) = \sum_{i=1}^{n-1} c_i e_{i+1}(k) + a(k) - f(k) - g(k)u(k) + \varphi(k)u(k) \qquad (4)
where φ(k) > 0 satisfies

\varphi(k) = \frac{g(k) + \sqrt{g^2(k) - 4\lambda}}{2} \qquad (5)
with λ a designable positive parameter. A guideline for adjusting λ is given in subsection 3.4. To ensure the existence of φ(k) and φ(k) > 0, g²(k) − 4λ > 0 is required at every time instant k, so

0 < \lambda < \frac{g_L^2}{4} \qquad (6)

For brevity, the notation \Sigma(k) = \sum_{i=1}^{n-1} c_i e_{i+1}(k) will be used in the sequel.
ci ei (k + 1) + en (k + 1)
(7)
i=1
The substitution of (2) in (7) yields s(k + 1) =
n−1
ci ei+1 (k) − di (k) + a(k) − f (k) − g(k)u(k) − dn (k)
i=1
= Σ(k) + a(k) − f (k) − g(k)u(k) − cd(k)
(8)
Compared (4) with (8), one can see the term φ(k)u(k) in (4) acts as a compensation term for uncertainty in the system. 3.2
The Feedback Correction for SMPM
In practice, due to time-variance, nonlinearity or disturbances, the model predictive value will not be the same as actual output. In predictive control strategy, a common way to handle such a problem is feedback correction [9].Here, the error between practical sliding mode value s(k) and SMPM predictive value sm (k), is used to make feedback correction for sm (k + 1), namely, s˜m (k + 1) = sm (k + 1) + s(k) − sm (k) (9) where s˜m (k + 1) is the closed-loop output of SMPM. Remark 2. According to (4), when time instant k − 1 is viewed as start point, the predictive value of SMPM sm (k) is sm (k) = Σ(k − 1) + a(k − 1)− f (k − 1) − g(k − 1)u(k − 1) + φ(k − 1)u(k − 1) (10) Define s¯(k) = s(k) − sm (k)
(11)
therefore, (9) reduces to s˜m (k + 1) = sm (k + 1) + s¯(k). Since state x(k) is measurable and xd (k) is known, s(k) can be obtained. Besides, sm (k) is available according to (10). Thereby, s¯(k) is known variable at time instant k.
88
3.3
L. Xiao and Y. Zhu
The Selection of Sliding Mode Reference Trajectory (SMRT)
According to predictive control strategy, the value of switching function s(k) should tend to desired value sd along a reference trajectory with respect to k [9]. Because the control objective is to drive error states to the sliding surface, namely, s(k) = 0, thus the desired value of switching function should be zeros, i.e., sd = 0. In this wise, when the sliding mode reference trajectory (SMRT) sr (k) is tracked, the sliding surface can be reached, therefore, the reachability of sliding mode is achieved. The SMRT is optional, which is determined by designers according to practical performance requirement. Generally speaking, arbitrary trajectory which finally converges to sd or a vicinity of sd can be chosen as SMRT, however, the following three kinds are typical, 1. sr (k + 1) = sd = 0
(12)
When SMRT (12) is selected, it will result in any error state from its initial position to sliding surface in one time instant, thus control saturation may appear. However, if the initial position is not far from sliding surface, SMRT (12) can provide simple control algorithm, fast dynamic and short reaching time. 2. sr (k + 1) = αsr (k) + (1 − α)sd , sr (0) = s(0)
(13)
where 0 < α < 1 is smoothing factor. Such form of reference trajectory is common in predictive control strategy. Due to sd = 0, (13) can be reduced to sr (k + 1) = αsr (k) = αk+1 s(0). Obviously, the smaller α is, the faster SMRT will reach desired value sd . 3.
3.4
sr (k + 1) = (1 − β)sr (k) − γsat srΩ(k) , sr (k) = s(k) (14) s (k) sgn sr (k) , |sr (k)| > Ω where 0 < β < 1, γ > 0, sat rΩ = , sgn(·) is sr (k) , |sr (k)| ≤ Ω Ω γ sign function, Ω = 1−β is the width of boundary layer. Clearly, SMRT (14) is in the form of reaching law [2] and sgn(·) is replaced by sat(·) in order to reduce chattering [10]. When SMRT (14) is selected, the reaching mode of the system under the algorithm in this paper is similar to that of under reaching law method. The Construction of Control Law
Generally, predictive control strategy determines control signal by optimizing a performance index, therefore, it is a kind of optimization algorithm. However, predictive control is different from traditional optimal control, it solves a finite horizon optimal control problem, implements receding horizon optimization[9]. Supposing the predictive horizon is M sampling instant, the approach of receding horizon optimization is: For present sampling instant k, calculate control
Sliding Mode Prediction Based Tracking Control
89
input signals from sampling instant k to sampling instant k + M − 1, namely to obtain [u(k), u(k + 1), . . . , u(k + M − 1)]T . However, only present control input signal u(k) is implemented, other elements u(k + 1), . . . , u(k + M − 1)] are not used for control, but served as initial values for the next optimization. For the next sampling instant, control input signal u(k +1) will be calculated recursively. In this paper, for simplicity and to make the design principle be prominent, one-step preceding horizon optimization is used in the following. Now, performance index is given as 2 2 φ(k − 1) J = sr (k + 1) − s˜m (k + 1) + λ u(k) − u(k − 1) (15) φ(k) g2
where 0 < λ < 4L is a weight coefficient, which adjusts the relation between the closed-loop SMPM tracking error and control effort. When control signal changes extremely, increasing λ will bring on better control performance [9]. Define δ(k) =
φ(k−1) φ(k)
q(k) = g(k) − φ(k) p(k) = sr (k + 1) − Σ(k) − a(k) + f (k) − s¯(k)
(16a) (16b) (16c)
According to (4), (9), (16a), (16b) and (16c), (15) can be re-written to 2 2 J = sr (k + 1) − sm (k + 1) − s¯(k) + λ u(k) − δ(k)u(k − 1) 2 2 = p(k) + q(k)u(k) + λ u(k) − δ(k)u(k − 1) (17) The solution of minimizing (17) gives control signal u(k). Obviously, the minimum of J can be computed analytically. By setting the partial derivative of J ∂J to zero, i.e., ∂u(k) = 0, solving the resulting equation, the optimal solution of u(k) is p(k)q(k) − λδ(k)u(k − 1) u(k) = − (18) q 2 (k) + λ Obviously, because of (5), (6), (16b) and Assumption 1, q(k) > 0 and q 2 (k)+λ > 0 are held.Thus, the control law u(k) for the closed-loop system is obtained. Remark 3. When g(k) = g(k − 1) = g, according to (5), gives φ(k) = φ(k − 1) = 2 φ, therefore, performance index (15) reduces to J = sr (k + 1) − s˜m (k + 1) + 2 λ u(k)−u(k−1) . As a results, control law (18) turns to u(k) = − p(k)q−λu(k−1) , q 2 +λ where g, φ, q are positive constants, and q = g − φ. 3.5
The Analysis of Robustness
Consider the uncertain system (2), according to (8) and (18), yields p(k)q(k) − λδ(k)u(k − 1) s(k + 1) = Σ(k) + a(k) − f (k) + g(k) − cd(k) q 2 (k) + λ
90
L. Xiao and Y. Zhu
Because of (5) and (16b), gives φ2 (k)−g(k)φ(k)+λ = 0 ⇒ g 2 (k)+φ2 (k)−2g(k)φ(k)+λ = g 2 (k)−g(k)φ(k) 2 ⇒ g(k) − φ(k) + λ = g(k) g(k) − φ(k) ⇒ q 2 (k) + λ = g(k)q(k) therefore,
g(k)q(k) q2 (k)+λ
= 1 and
g(k) q2 (k)+λ
= q −1 (k). Thus
s(k + 1) = Σ(k) + a(k) − f (k) + p(k) − q −1 (k)λδ(k)u(k − 1) − cd(k) = sr (k + 1) − s¯(k) − q −1 (k)λδ(k)u(k − 1) − cd(k)
(19)
Delaying each term in (8) a step gives s(k) = Σ(k − 1) + a(k − 1) − f (k − 1) − g(k − 1)u(k − 1) − cd(k − 1)
(20)
According to (11), (10) and (20), yields s¯(k) = −cd(k − 1) − φ(k − 1)u(k − 1)
(21)
Substituting (21) into (19) gives s(k + 1) = sr (k + 1) + cd(k − 1) + φ(k − 1)u(k − 1) − q −1 (k)λδ(k)u(k − 1) − cd(k) = sr (k + 1) + cd(k − 1) − cd(k) + φ(k − 1) − q −1 (k)λδ(k) u(k − 1) (22) In terms of (5), q(k) > 0 and φ(k) > 0, one can see q(k)φ(k) = g(k) − φ(k) φ(k) g(k) + g 2 (k) − 4λ g(k) + g 2 (k) − 4λ g 2 (k) − g 2 (k) − 4λ = g(k) − = =λ 2 2 4 λ λ ⇒ = 1 ⇒ 1− φ(k − 1) = 0 ⇒ φ(k − 1) − q −1 (k)λδ(k) = 0 q(k)φ(k) q(k)φ(k) Therefore, (22) reduces to s(k + 1) = sr (k + 1) + cd(k − 1) − cd(k) Theorem 1. The closed-loop error system (2) is stable robustly under the control law (18), such that the following inequality holds, |cd(k) − cd(k − 1)| ≤ ξ
(23)
where ξ is the boundary of the change rate of uncertainty and is a known positive constant. Proof. Because SMRT can be chosen as arbitrary trajectory which converges to sd or a vicinity of sd = 0, that is ∃kN < ∞, ∀k > kN , such that the following inequality holds, |sr (k + 1)| < η (24)
Sliding Mode Prediction Based Tracking Control
91
where η is the boundary of sliding mode reference trajectory and is a positive constant. When SMRT (12) is used, |sr (k + 1)| = 0, thus η = 0; if SMRT (13) is applied, due to 0 < α < 1, obviously ∃kN < ∞, ∀η > 0, k > kN , such that |sr(k+1) | < η; once SMRT (14) is selected, η = Ω. Here, let ε = η + ξ, clearly ε is a positive constant as a result. On account of (23) and (24), one can see when k > kN ,
|s(k + 1)| = sr (k + 1)− cd(k)−cd(k − 1) ≤ |sr (k + 1)|+ cd(k)−cd(k − 1) < η + ξ = ε
Therefore, there exists |s(k + 1)| < ε, namely, system (2) satisfies the reaching condition of quasi-sliding mode [11] in the ε vicinity of sliding surface S = {e(k)|s e(k) = 0}. Hence, the practical sliding mode motion of the closed-loop system will definitely converge to a ε vicinity of sliding surface and stay in it subsequently. Since the stability and dynamic performance of ideal sliding mode has been guaranteed by (3), the closed-loop system (2) with control law (18) is robustly stable. Consequently, the design objective is completed, namely, the state vector x(k) can track desired state vector xd (k) in a stable manner, even when uncertainty appears in the system. Remark 4. Generally speaking, it is sound to assume that the change rate of uncertainty is bounded [11], namely, (23) can be satisfied. Especially, for slowly varying uncertainty or sample period T → 0, quasi-sliding mode band width ε will be tiny, and ξ may be small as a result.
4
Numerical Simulation
Consider the following third order discrete-time nonlinear uncertain system, x1 (k + 1) = x2 (k) + d1 (k) x2 (k + 1) = x3 (k) + d2 (k)
(25)
x3 (k + 1) = f (k) + g(k)u(k) + d3 (k) where f (k) = x1 (k) + x1 (k)x2 (k) + x23 (k), g(k) = 1 + e−k , Δf (k) = 0.2f (k), Δg(k) = −0.3g(k), d1 (k) = 0.1 sin(3k), d2 (k) = 0.05 cos(5k) and d3 (k) = Δf (k) + Δg(k)u(k) + 0.025 sin(10k). ⎧ 1 ≤ k ≤ 50 ⎨ 0.5 + 0.1 sin(50k), Supposing a(k) = 0.25 + 0.1 sin(50k), 51 ≤ k ≤ 150 , choosing c = [ 0.02, −0.3, 1 ], ⎩ −0.75 + 0.1 sin(50k), 151 ≤ k ≤ 200
γ λ = 0.15 and 1−β = 0.5, γ = 0.1, Ω = 1−β in SMRT (14), assuming uncertainty influences system from k = 101 to k = 105. The simulation results of the closedloop system with the presented algorithm are illustrated in Fig. 1–2. Because reaching law method is widespread in SMC area, the system (25) is controlled under reaching law method [2] with boundary layer [10] as well. That
L. Xiao and Y. Zhu
Uncertainty appears from k=101 to k=105
Uncertainty appears from k=101 to k=105
1
1 ↓ max= 0.1212
← max= 0.3221
−1
0
50
100 k
150
200
Input uk
0
↓ max=0.0896
0
−1
1 −2 0
0
50
100 k
150
200
1 0 −1
0
50
100 k
150
200
1
0
50
100 k
150
k
−1
Switching function s
Tracking error e
3
Tracking error e
2
Tracking error e
1
92
0.5 ↓ max= 0.1298 0 −0.5
200
−1
0
50
100 k
150
200
Fig. 1. Algorithm in this paper e(k) Fig. 2. Algorithm in this paper (u(k),s(k)) Uncertainty appears from k=101 to k=105 1 ← max= 0.3221 Input uk
0 ↑ min= −0.2083 −1
0
50
1
100 k
150
200
50
100 k
150
200
50
100 k
150
200
0
0
50
100 k
150
k
0
1
−1
0
1 Switching function s
−1
0.5 0 ↑ min= −0.1728
−0.5
200
−1
0
50
100 k
150
200
Fig. 4. [2] with boundary layer (u(k),s(k))
Uncertainty appears from k=101 to k=105
Uncertainty appears from k=101 to k=105
1
1 ← max= 0.4375 ↑ min= −0.1521 0
50
100 k
150
200
Input uk
0 −1
0 ↑ min= −0.2728 −1
1 −2 0 −1
50
100 k
150
200
1 0
50
100 k
150
200
1 0 −1
0
0
50
100 k
150
200
Fig. 5. DTSM algorithm in [8] e(k)
k
1
Tracking error e
2
Tracking error e
3
↑ min= −0.3290 −1
−2
Fig. 3. [2] with boundary layer e(k)
Tracking error e
0
0
Switching function s
Tracking error e
3
Tracking error e
2
Tracking error e
1
Uncertainty appears from k=101 to k=105 1
0.5 0 ↑ min= −0.1521 −0.5 −1
0
50
100 k
150
200
Fig. 6. DTSM algorithm in [8] (u(k),s(k))
Sliding Mode Prediction Based Tracking Control
93
That is, let s(k+1) = (1−β)s(k) − γ\,\mathrm{sat}\big(s(k)/\Omega\big). The same desired state vector x_d(k), switching function, uncertainty, 1−β, γ and Ω are employed; the results are shown in Figs. 3-4. In addition, we apply the discrete-time terminal sliding mode (DTSM) algorithm of [8] to system (25). According to [8], the switching function is s(k) = e_3(k) and the control law is u(k) = g^{-1}(k)\big(a(k) − f(k)\big). The same desired state vector x_d(k) and uncertainty are used; the results are shown in Figs. 5-6. Comparing Figs. 1-2 with Figs. 3-6, after the uncertainty appears, the maximum absolute values of the tracking error |e(k)|, the switching function |s(k)| and the control signal |u(k)| are all smaller in Figs. 1-2; thus the closed-loop system possesses stronger robustness under the control algorithm of this paper.
5 Conclusions

The robust tracking control algorithm presented in this paper combines a predictive control strategy with the sliding mode control method. Because of the designed sliding mode prediction model, future information about the sliding mode can be used. Due to feedback correction and receding horizon optimization, the desired control law is obtained and can be optimized continually. The robust stability analysis proves that the closed-loop system has strong robustness to uncertainty with unknown bound. The simulation results verify the advantages of the proposed algorithm.
References 1. Utkin, V.: Variable structure systems with sliding modes. IEEE Trans. Automat. Contr. 22, 212–222 (1977) 2. Gao, W.B., Wang, Y., Homaifa, A.: Discrete-time variable structure control system. IEEE Trans. Ind. Electron. 42, 117–122 (1995) 3. Sun, M., Wang, Y., Wang, D.: Variable-structure repetitive control: A discrete-time strategy. IEEE Trans. Ind. Electron. 52, 610–616 (2005) 4. Chen, X.: Adaptive sliding mode control for discrete-time multi-input multi-output systems. Automatica 42, 427–435 (2006) 5. Munoz, D., Sbarbaro, D.: An adaptive sliding-mode controller for discrete nonlinear systems. IEEE Trans. Ind. Electron. 47, 574–581 (2000) 6. Liao, T.-L., Chien, T.-I.: Variable structure controller design for output tracking of a class of discrete-time nonlinear systems. JSME International Journal Series C 45, 462–469 (2002) 7. Zheng, Y., Dimirovski, G.M., Jing, Y., Yang, M.: Discrete-time sliding mode control of nonlinear systems. In: Proceedings of the 2007 American Control Conference, New York, USA, July 11-13, pp. 3825–3830 (2007) 8. Janardhanan, S., Bandyopadhyay, B.: On discretization of continuous-time terminal sliding mode. IEEE Trans. Automat. Contr. 51, 1532–1536 (2006) 9. Xi, Y.G.: Predictive control, 1st edn., pp. 18–90. National Defence Industry Press, Beijing (1993) 10. Slotine, J.J., Sastry, S.S.: Tracking control of nonlinear systems using sliding surfaces with application to robot manipulator. Int. J. Contr. 38, 465–492 (1983) 11. Bartoszewicz, A.: Discrete-time quasi-sliding-mode control strategies. IEEE Trans. Ind. Electron. 45, 633–637 (1998)
A Single Shot Associated Memory Based Classification Scheme for WSN

Nomica Imran and Asad Khan

School of Information Technology, Monash University, Clayton, Victoria, Australia
{nomica.choudhry,asad.khan}@infotech.monash.edu.au
Abstract. Identifier based Graph Neuron (IGN) is a network-centric algorithm which envisages a stable and structured network of tiny devices as the platform for parallel distributed pattern recognition. The proposed scheme is based on a highly distributed associative memory, which enables objects to memorize some of their internal critical states for real-time comparison with those induced by transient external conditions. The approach not only conserves the power resources of the sensor nodes but also scales effectively to large wireless sensor networks. In addition, our proposed scheme overcomes the issue of false-positive detection, from which existing associative memory based solutions suffer, and hence ensures accurate results. We compare the Identifier based Graph Neuron with two existing associative memory based event classification schemes, and the results show that it correctly recognizes and classifies incoming events in a comparable amount of time and number of messages. Keywords: Classification, Pattern Identification, Associated Memory Based Techniques, Wireless Sensor Networks.
1 Introduction

Wireless sensors are spatially distributed, tiny, battery-operated devices. These sensors work autonomously and monitor physical or environmental conditions (e.g. heat, pressure, sound, light, electro-magnetic field, vibration, etc.). Each sensor comprises a receiver, to sense the environment, and a transmitter, to disseminate the sensed data. Besides sensing and transmitting, current sensor nodes are also capable of storing and processing. Wireless sensor networks play a central role in achieving the goal of truly ubiquitous computing and smart environments. These sensor networks sense their environment and produce large amounts of data. The generated data are usually in raw form and of little direct use; to extract any meaning from the data, they must be processed. The required data processing is application specific. Some applications prefer data fusion or aggregation: the processed data are combined from multiple sources to achieve inference. This aggregated information is often useful for other complex applications, including network monitoring, load balancing, identifying hot spots, and
reaching consensus in distributed control and decision applications. However, these schemes suffer from a central point of failure. In certain other applications [1], sensors must be deployed densely; for example, a large number of sensor nodes could be deployed over a battlefield to detect enemy intrusion. Because of the dense deployment, most of the generated data are redundant and of no use. Transferring all of this data (including the redundant part) to the base station is not a wise choice, since data transmission consumes more energy than any other operation: the radio of a sensor node is usually its most power-hungry component, and data communication takes up approximately 90% of the scarce battery power of a sensor node. Hence, reducing communication is desirable for an efficient solution. A simple approach to reducing communication is to reduce the amount of data collected by sparsely deploying the sensors only at key locations. Identifying such key spots is itself a challenging issue. Moreover, the data collected from sparsely deployed sensors are usually not consistent and are therefore hard to verify for accuracy. Other applications demand an even more generalized approach, where information is extracted from the streams of data in the form of patterns. Existing distributed pattern recognition methods are also not considered feasible for wireless networks. Some of these approaches focus on minimizing the error rate by creating feature patterns: the raw patterns are first transformed to lower-dimensional feature vectors preserving the salient class information of the original measurements. However, such a transformation cannot improve the theoretically optimal recognition results. Other methods are based on probabilistic approaches such as Bayesian classifiers. These recognition methods require a large training set, which in turn requires more resources, making them hard to apply in WSNs. Traditional signal-processing-based pattern detection schemes also prove too complex for WSNs. Current technologies thus cannot fully address the problem, and a better approach needs to be developed. In this paper, we present a light-weight pattern classification scheme, called Identifier based Graph Neuron (IGN). It is a neural network based pattern recognition algorithm; the motivation for using neural network based methods in WSNs is the analogy between WSNs and ANNs. The proposed classifier is built with the general characteristics of WSNs in mind. One of the peculiarities of this technique is the use of parallel in-network processing to address scalability issues effectively. At the same time, we have tried to keep the scheme as close to the network as possible: most of the processing is carried out in the network through in-network processing, reducing the data as much as possible and thus saving communication cost.
2 Related Work
One well-known neural network is the recurrent neural network (RNN) [2], [3]. An RNN is a class of neural network in which connections between units form a directed cycle. This creates an internal state of the network which allows it
to exhibit dynamic behavior. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs. Though the performance of these techniques is quite reasonable, they still have some intrinsic drawbacks, which become more prominent when the networks are applied to pattern recognition in noisy, high-resolution streams. The major concern is that their training is very slow and they require a lot of memory. At the same time, their execution and analysis are not real-time (statistical methods), and reasonable performance is restricted to specific temporal problems (LSTM). Without extra adaptation these methods suffer from noisy input, making them ill-suited for wireless sensor network applications. Apart from these recognition strategies, the development of Hopfield's associative memory [6] has opened new opportunities in pattern recognition. Associative memory applications provide means to recall patterns using similar or incomplete patterns. Associative memory differs from conventional computer memory: in conventional memory, contents are stored and recalled based on an address, whereas in an associative memory architecture, contents are recalled based on how closely they are associated with the input (probe) data. Khan [7] replaced the term content-addressable memory, introduced by Hopfield [8], with associative memory (AM); hence, content-addressable memory is used interchangeably with AM. The spatio-temporal encoded, or spiking, neurons developed by Hopfield [6] draw upon the properties of loosely coupled oscillators. However, the problem of scaling up the AM capacity remains. Izhikevich [9] states that although some new capabilities in differentiating between similar inputs have been revealed, there is no clear evidence of superiority over the classical Hopfield model in terms of an overall increase in associative memory capacity. Hence, one of the motivations behind GN development was to create an AM mechanism that scales better than contemporary methods. Furthermore, we needed an architecture whose accuracy would not depend on preprocessing steps such as pattern segmentation [10] or training [7]. From this discussion of pattern recognition techniques, GN [11] appears to be a promising AI architecture for recognizing patterns in WSNs.
3 A Simple GN-Based Classifier
The communication between GN nodes proceeds through the following stages:

1. Stage I: Each p(position, value) pair of the input pattern P is sequentially broadcast to all the nodes in the network. Those GN nodes which are predetermined to receive these values respond by storing the input pairs.
2. Stage II: The end of the input pattern is determined by a synchronization phase, which informs all the nodes within the network about the end of an incoming pattern.
3. Stage III: Each GN node contacts its neighboring node(s) to determine whether these adjacent nodes have encountered any p(position, value) pairs during the current input-pattern cycle.

These stages can be accomplished in a fully parallel manner, which not only addresses scalability concerns effectively but also decreases processing delays so as to maintain real-time functionality.
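To make the message flow concrete, the following minimal Python sketch mimics the three stages over a dictionary of nodes; the data structures and the function name are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the three GN communication stages (names are illustrative).
def gn_process_pattern(nodes, pattern):
    # Stage I: broadcast each p(position, value) pair; matching nodes store it.
    active = []
    for pos, val in enumerate(pattern, start=1):
        node = nodes.get((val, pos))        # node predetermined for (value, position)
        if node is not None:
            node["current"] = (pos, val)    # node stores the input pair
            active.append(node)

    # Stage II: synchronization -- all active nodes learn the pattern has ended.
    for node in active:
        node["pattern_complete"] = True

    # Stage III: each active node asks its adjacent columns what they received
    # and forms a local sub-pattern (left value, own value, right value).
    by_pos = {n["current"][0]: n for n in active}
    for node in active:
        pos, val = node["current"]
        left, right = by_pos.get(pos - 1), by_pos.get(pos + 1)
        node["sub_pattern"] = ((left["current"][1] if left else "-") + val +
                               (right["current"][1] if right else "-"))
    return active
```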
3.1 False Positive Detection Problem in GN
Each GN in a GN application requires only three parameters to process an input pattern: its own value and the values of its adjacent GNs. This restricted view of the GNs can lead to inaccurate results. For instance, several GNs can recall their sub-patterns simultaneously, which in turn leads to a full recall; however, the pattern which is recalled may never have been presented before. Thus, from each active GN's perspective the recall of its sub-pattern is an accurate action, yet the recall of the pattern by the array is not justified, and the overall GN application fails in such a scenario. The following example explains the false-positive detection problem in detail.

Example of False Positive Detection in GN. Assume a GN-based classifier that receives patterns of length L = 5 over E = {x, y}. The constructed GN overlay has two rows, one each for x and y, and five columns. Suppose the overlay has already processed two input patterns, xxxyy and yxxyx. These patterns were processed by nodes n(x, 1), n(x, 2), n(x, 3), n(y, 4), n(y, 5) and n(y, 1), n(x, 2), n(x, 3), n(y, 4), n(x, 5), respectively, as shown in Table 1. Now, upon arrival of a third pattern xxxyx, nodes n(x, 1), n(x, 2), n(x, 3), n(y, 4), n(x, 5) change their state from inactive to active. Each active node transmits its value and position to its left and right active neighbors. On receipt of this information, every active node locally constructs a sub-pattern. Subsequently, all active nodes find their locally constructed sub-patterns in their respective bias arrays and individually flag the incoming pattern as a recall. The recalls of all five sub-patterns by the individual nodes are correct from their local perspective, but the overall recall of the pattern by the GN is incorrect. This leads the GN-based classifier to falsely conclude that the incoming pattern xxxyx is already stored. This problem is commonly known as a false-positive recall. Table 1 shows the bias arrays of the nodes with non-empty bias-array values after the arrival of the third pattern xxxyx.

Table 1. Example: A GN-Based Classifier (sub-pattern seen by each node per input pattern; "—" marks an inactive node)

Input pattern   n(x,1)  n(y,1)  n(x,2)  n(x,3)  n(y,4)  n(y,5)  n(x,5)
xxxyy           -xx     —       xxx     xxy     xyy     yy-     —
yxxyx           —       -yx     yxx     xxy     xyx     —       yx-
xxxyx           -xx     —       xxx     xxy     xyx     —       yx-
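The false-positive recall can be reproduced in a few lines. The sketch below stores the sub-patterns of the first two patterns and then probes with xxxyx; the helper function is hypothetical and only illustrates the restricted local view of each node.

```python
# Illustrative reproduction of the false-positive example.
def sub_patterns(pattern):
    """Local sub-pattern seen by the active node at each position."""
    padded = "-" + pattern + "-"
    return [(i + 1, pattern[i], padded[i:i + 3]) for i in range(len(pattern))]

bias = {}  # (value, position) -> set of stored sub-patterns
for stored in ("xxxyy", "yxxyx"):
    for pos, val, sub in sub_patterns(stored):
        bias.setdefault((val, pos), set()).add(sub)

probe = "xxxyx"  # never stored, yet every node recalls its sub-pattern
recalls = [sub in bias.get((val, pos), set())
           for pos, val, sub in sub_patterns(probe)]
print(all(recalls))  # True -> the array falsely reports a full recall
```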
4 Proposed Scheme
In an Identifier-based GN (IGN) classifier, each node n(ei, j) is programmed to respond only to a specific element ei at a particular position j in a pattern. That is, node n(ei, j) can process only those patterns in P that have ei at the j-th position. Each node maintains an active/inactive state flag to indicate whether it is processing the incoming pattern. Initially all nodes are inactive. Upon arrival of a pattern, if a node finds its programmed element ei at the given position, it switches its state to active; otherwise it remains inactive. Only active nodes participate in our algorithm; inactive nodes remain idle. At any time, there are exactly L active nodes in an IGN. Hence, every active node n(ei, j) with j ≠ 1, L has exactly one active left neighbor and exactly one active right neighbor, whereas the terminal nodes n(ei, 1) and n(ei, L) have only one active right or left neighbor, respectively.
4.1 System Model
In this section, we model the wireless sensor network; Table 2 summarizes the notation used in this paper. Let there be N sensor nodes in the wireless sensor network, each sensing a particular kind of data from its surroundings. Let E = {e1, e2, . . . , eE} be a non-empty finite set of such data elements. We find it convenient to describe input data in the form of patterns. A pattern over E is a finite sequence of elements from E; the length of a pattern equals the number of sensors in the system. We define P as the set of all possible patterns of length L over E: P = {p1, p2, . . . , pP}, where P is the total number of possible unique patterns and can be computed as P = E^L. For example, if E = {x, y}, then the set P of patterns of length, say, 3 over E is P = {xxx, xxy, xyx, xyy, yxx, yxy, yyx, yyy}. We model the IGN as a structured overlay G = E × L, where L = {1, 2, . . . , L}: G = {n(ei, j)} for all ei ∈ E, j ∈ L, where n(ei, j) is the node in G at the i-th row and j-th column. The GN can thus be visualized as a two-dimensional array of E rows and L columns, with E × L nodes in total. We refer to all the nodes in column (j − 1) as the left neighbors of any node n(∗, j) in the j-th column; similarly, all the nodes in column (j + 1) are called the right neighbors of n(∗, j).
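As a quick illustration of the pattern space P, the following snippet enumerates all E^L patterns for the example above; it is a simple combinatorial check, not part of the protocol.

```python
from itertools import product

E = ("x", "y")          # element set
L = 3                   # pattern length
P = ["".join(p) for p in product(E, repeat=L)]
print(len(P) == len(E) ** L)   # True: |P| = E^L = 8 patterns
print(P)  # ['xxx', 'xxy', 'xyx', 'xyy', 'yxx', 'yxy', 'yyx', 'yyy']
```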
Table 2. Notation of the proposed system model

Symbol      Description
N           number of sensor nodes
E           set of all elements {e1, e2, . . . , eE}
E           size of set E
L           pattern length
P           set of all patterns of length L over E
P           size of set P, equal to E^L
G           GN overlay, a two-dimensional array of E × L nodes
n(ei, j)    node at the i-th row and j-th column in G
Ls          local state of the active node n(ei, j)

4.2 Proposed Protocol
On arrival of an input pattern P, each active node n(ei, j) stores ei for its j-th position. Each node n(ei, j) then sends its matched element ei to its active neighbors at positions (j + 1) and (j − 1); the IGNs at the edges send their matched elements to their penultimate neighbors only. Upon receipt of a message, the active neighboring nodes update their bias arrays. Each active node n(ei, j) assigns a local state Ls to the received value: the local state Ls is Recall if the added value is already present in the node's bias array, and Store if the value is new. An <ID> is generated for each added value. Certain rules must be observed by each active node n(ei, j) while creating states and assigning <IDs> to those states:

Rule 1: Store(Si) > Recall(Ri). Store Si has a natural superiority over Recall Ri, i.e., Store(Si) > Recall(Ri). If the local state Ls of node n(ei, j) is Recall but it receives a Store command from any of its neighbors (j + 1) or (j − 1), it updates its own state from Recall(Ri) to Store(Si).

Rule 2: All new elements. If any of the elements presented to G does not exist in the bias array of any of the active nodes n(ei, j), the pattern is new. Each active node n(ei, j) creates a new <ID> by incrementing the maximum <ID> already stored in its bias array by 1.

Rule 3: All recalls with the same ID. If every ei presented to G is a recall of a previously stored pattern with the same <ID>, the pattern is repetitive; the same <ID> is allocated to this pattern.

Rule 4: All recalls with different IDs. If all the ei of the pattern P presented to G are recalls of already stored patterns but with different <IDs>, the pattern is new. Each active node n(ei, j) finds the max(ID) in its bias array and increments it by 1.
Rule 5: Mix of Store and Recall. If the final decision is Store due to a mix of Store and Recall states, each active node n(ei, j) again finds the max(ID) in its bias array and increments it by 1.

After generating local <IDs> for the generated states, each active node n(ei, j) enters phase-transition mode. During the first round of phase transition, all active nodes n(ei, j) share their locally generated <ID> and Ls with their (j + 1) and (j − 1) neighbors. On receiving these values, each n(ei, j) compares them with its local values. If a received value equals the local value, the node's state does not change; if a received value differs from the local value, the node upgrades its local value according to the following rules:

Transition Rule 1. If the active node n(ei, j) receives a greater value from its left neighbor (j − 1), it upgrades its local state and transfers the updated value to its right neighbor (j + 1) only.

Transition Rule 2. If the values received from both neighbors (j + 1) and (j − 1) are smaller than the local value, node n(ei, j) upgrades its value and transfers this new value to both of its neighbors.

When the pattern is resolved, the generated <ID> is stored in the bias array of each active IGN. The bias array will not contain duplicate <IDs>, although the <IDs> in the bias array need not be in order. Once the pattern has been stored, a signal is sent out within the network informing all the IGN nodes that the pattern has been stored; this is called the pattern resolution phase.
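The store/recall decision and the ID propagation of the phase transition can be sketched as follows. For brevity the global maximum ID is computed directly rather than by iterative neighbor exchange, Rules 1–5 are condensed into one branch, and all names are illustrative assumptions.

```python
# Sketch of the IGN store/recall decision and max-ID phase transition.
def resolve_pattern(active_nodes, sub_patterns):
    # Each active node identifies its local state: Recall if the sub-pattern
    # already exists in its bias array, Store otherwise.
    states, ids = [], []
    for node, sub in zip(active_nodes, sub_patterns):
        if sub in node["bias"]:
            states.append("Recall")
            ids.append(node["bias"][sub])
        else:
            states.append("Store")
            ids.append(max(node["bias"].values(), default=0) + 1)

    # Store dominates Recall (Rule 1); any new element or recalls with
    # differing IDs force a new pattern ID (Rules 2, 4, 5).
    if "Store" in states or len(set(ids)) > 1:
        final_id = max(max(n["bias"].values(), default=0)
                       for n in active_nodes) + 1
        decision = "Store"
    else:                       # all recalls with the same ID (Rule 3)
        final_id, decision = ids[0], "Recall"

    # Phase transition: neighbors would exchange IDs until every node holds
    # the maximum; here the maximum is simply written into each bias array.
    if decision == "Store":
        for node, sub in zip(active_nodes, sub_patterns):
            node["bias"][sub] = final_id
    return decision, final_id
```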
5 Simulation Work
To verify the accuracy of our proposed IGN algorithm for pattern recognition, we conducted a series of tests involving random patterns. We developed an implementation of IGN in Java (Eclipse) and performed several runs of experiments, each time choosing random patterns. We constructed a database of stored patterns to which patterns can also be added manually. The performance metrics are accuracy and time. To estimate the communication overhead of HGN, two factors are considered: (1) the bias-array size, and (2) the time required by the HGN array to execute a request. In the case of IGN, the number of bias-array entries is significant but not as high as in HGN, since IGN does not have to create hierarchies; the execution time is also lower than that of HGN, since each node communicates only with its immediate neighbors. In IGN the decision is not taken by a top node, so considerable processing and communication cost is saved, and from a scalability point of view the storage requirements of the bias array do not grow significantly with the number of stored patterns. It should be mentioned that the implementation of HGN requires asynchronous communication and mutually exclusive processes; these features can be implemented easily in Java, compared with other real-time programming
Fig. 1. (a) Information about all the active nodes in IGN (b) Recall percentage of active nodes in IGN
Fig. 2. Execution time for selected patterns - (a) Average execution time for 1000 patterns in IGN, GN and HGN, (b) Average execution time for 2000 patterns in IGN and GN
Fig. 3. Execution time for selected patterns - (a) Average execution time for 500 patterns in IGN, GN and HGN, (b) Average execution time for 5000 patterns in IGN and GN
Fig. 4. Execution time for selected patterns - (a) Average execution time for six selected patterns in IGN, GN and HGN, (b) Average execution time for six selected patterns in IGN and GN
languages such as C or Ada. Fig. 1 shows the number of entries in the bias arrays of the IGNs while processing a database of 1000 patterns, in which each pattern consists of 7 elements in 5 positions; the percentage of recalls for each of the GNs is also illustrated. The results in Fig. 2 show that the pattern detection time in IGN is much lower than in HGN. This is mainly because, as the number of stored patterns increases, the search must pass through higher layers of the HGN hierarchy, which delays the final outcome. To investigate the impact on response time, Fig. 3 illustrates a comparison between GN, IGN and HGN. The response time of IGN is not entirely consistent because the overhead of message communication varies from pattern to pattern. A comparison between the response times of GN and IGN for selected patterns is made in Fig. 4.
6 Conclusion and Future Work
In this paper we have proposed an in-network pattern matching approach that provides scalable and energy-efficient pattern recognition within a sensor network. The IGN algorithm not only provides a single-cycle learning model, which is remarkably suitable for real-time applications, but also overcomes the crosstalk issue of the normal GN approach by delivering accurate results. It is a highly scalable solution that works close to the network and hence saves communication cost, prolonging the lifetime of the network.
References

1. Yu, L., Wang, N., Meng, X.: Real-time forest fire detection with wireless sensor networks. In: Proceedings of the 2005 International Conference on Wireless Communications, Networking and Mobile Computing, vol. 2, pp. 1214–1217 (2005)
2. Obst, O.: Poster abstract: Distributed fault detection using a recurrent neural network. In: Proceedings of the 2009 International Conference on Information Processing in Sensor Networks, IPSN 2009, pp. 373–374. IEEE Computer Society, Washington, DC, USA (2009) 3. Sollacher, R., Gao, H.: Efficient online learning with spiral recurrent neural networks. In: Neural Networks, IJCNN 2008, pp. 2551–2558 (2008) 4. Fu, K.S., Aizerman, M.A.: Syntactic methods in pattern recognition. IEEE Transactions on Systems, Man and Cybernetics 6(8), 590–591 (1976) 5. Doumit, S., Agrawal, D.: Self-organized criticality and stochastic learning based intrusion detection system for wireless sensor networks. In: Military Communications Conference, MILCOM 2003, vol. 1, pp. 609–614. IEEE, Los Alamitos (2003) 6. McEliece, R.J., Posner, E.C., Rodemich, E.R., Venkatesh, S.S.: The capacity of the hopfield associative memory. IEEE Trans. Inf. Theor. 33(4), 461–482 (1987) 7. Muhamad Amin, A.H., Raja Mahmood, R.A., Khan, A.I.: Analysis of pattern recognition algorithms using associative memory approach. In: A Comparative Study Between the Hopfield Network and Distributed Hierarchical Graph Neuron (dhgn), pp. 153–158. IEEE, Los Alamitos (2008) 8. Kim, J., Hopfield, J.J., Winfree, E.: Neural network computation by in vitro transcriptional circuits 2004. In: Saul, L.K., Weiss, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems, pp. 681–688 (2007) 9. Izhikevich, E.M.: Dynamical Systems in Neuroscience: The Geometry of Excitability and Bursting (Computational Neuroscience), 1st edn. The MIT Press, Cambridge (2006) 10. Basirat, A.H., Khan, A.I.: Building context aware network of wireless sensors using a novel pattern recognition scheme called hierarchical graph neuron. In: ICSC 2009: Proceedings of the 2009 IEEE International Conference on Semantic Computing, pp. 487–494. IEEE Computer Society, Washington, DC, USA (2009) 11. Basirat, A.H., Khan, A.I.: Building context aware network of wireless sensors using a novel pattern recognition scheme called hierarchical graph neuron. In: IEEE International Conference on Semantic Computing, ICSC 2009 (2009) 12. Nasution, B.B., Khan, A.: A hierarchical graph neuron scheme for real-time pattern recognition. IEEE Transactions on Neural Networks, 212–229 (2008) 13. Mahmood, R., Muhamad Amin, K.A.: A distributed hierarchical graph neuronbased classifier: An efficient, low-computational classifier. In: First International Conference on Intelligent Networks and Intelligent Systems, ICINIS 2008 (2008) 14. Baqer, M., Khan, A.I., Baig, Z.A.: Implementing a graph neuron array for pattern recognition within unstructured wireless sensor networks. In: Enokido, T., Yan, L., Xiao, B., Kim, D.Y., Dai, Y.-S., Yang, L.T. (eds.) EUC-WS 2005. LNCS, vol. 3823, pp. 208–217. Springer, Heidelberg (2005)
Dynamic Structure Neural Network for Stable Adaptive Control of Nonlinear Systems

Jingye Lei

College of Engineering, Honghe University, 661100, Mengzi, Yunnan, China
[email protected]
Abstract. In this paper an adaptive control strategy based on neural networks for a class of nonlinear systems is analyzed. A simplified algorithm is presented that combines techniques from generalized predictive control theory with the gradient descent rule to accelerate learning and improve convergence. Taking the neural network as a model of the system, control signals are obtained directly by minimizing the cumulative differences between a setpoint and the output of the model. The applicability to nonlinear systems is demonstrated by simulation experiments. Keywords: adaptive control; neural network; nonlinear system; predictive control; gradient descent rule; simulation.
1 Introduction

Great progress has been witnessed in the neural network (NN) control of nonlinear systems in recent years, and it has evolved into a well-established technique in advanced adaptive control. Adaptive NN control approaches have been investigated for nonlinear systems with matching [1], [2], [3], [4] and nonmatching conditions [5], [6], as well as for systems with output-feedback requirements [7], [8], [9], [10], [11]. The main trend in recent neural control research is to integrate NNs, including multilayer networks [2], radial basis function networks [12], and recurrent ones [13], with the main nonlinear control design methodologies. Such integration significantly enhances the capability of control methods in handling many practical systems characterized by nonlinearity, uncertainty, and complexity [14], [15], [16]. It is well known that NN approximation-based control relies on the universal approximation property over a compact set in order to approximate unknown nonlinearities in the plant dynamics. The widely used structures of neural-network-based control systems are similar to those employed in adaptive control: a neural network is used to estimate the unknown nonlinear system, the network weights are updated using the network's output error, and the adaptive control law is synthesized based on the output of the network. The major difficulty is that the system to be controlled is nonlinear, with great diversity and complexity and without universal system models. It has been proved that a neural network realizes a complete mapping. Using this property, an adaptive
predictive control algorithm is developed to solve the tracking control problem for such systems.
2 Problem Statement

Assume that the unknown nonlinear system to be considered is expressed by

$$y(t+1) = f\big(y(t), y(t-1), \ldots, y(t-n), u(t), u(t-1), \ldots, u(t-m)\big) \qquad (1)$$
where y(t) is the scalar output of the system, u(t) is the scalar input to the system, f(·) is the unknown nonlinear function to be estimated by a neural network, and n and m are the known structural orders of the system. The purpose of our control algorithm is to select a control signal u(t) such that the output y(t) is made as close as possible to a prespecified setpoint r(t). Figure 1 shows the overall structure of the closed-loop control system, which consists of the system (1), a feedforward neural network that estimates f(·), and a controller realized by an optimizer.
Fig. 1. Neural-network control system
Figure 2 shows the neural network architecture. A two-layer neural network is used to learn the system, and the standard backpropagation algorithm is employed to train the weights. The activation functions are the hyperbolic tangent for the first layer and linear for the second layer. Since the input to the neural network is

$$p = [y(t), y(t-1), \ldots, y(t-n), u(t), u(t-1), \ldots, u(t-m)] \qquad (2)$$
the neural model for the unknown system (1) can be expressed as

$$\hat{y}(t+1) = \hat{f}\big(y(t), y(t-1), \ldots, y(t-n), u(t), u(t-1), \ldots, u(t-m)\big) \qquad (3)$$
where ŷ(t+1) is the output of the neural network and f̂ is the estimate of f. Since the backpropagation training algorithm guarantees that

$$[y(t+1) - \hat{y}(t+1)]^2 = \min \qquad (4)$$
Fig. 2. The neural-network structure
ŷ(t+1) is also referred to as the predicted output of the system (1). Therefore the control signal can be selected such that ŷ(t+1) is made as close as possible to r(t).
3 Adaptive Algorithm

Take an objective function J as follows:

$$J = \frac{1}{2} e^2(t+1) \qquad (5)$$

where

$$e(t+1) = r(t+1) - \hat{y}(t+1) \qquad (6)$$
The control signal u(t) should therefore be selected to minimize J. Using the neural network structure, (3) can be rewritten as
$$\hat{y}(t+1) = w_2\big[\tanh(w_1 p + b_1)\big] + b_2 \qquad (7)$$
where w₁, w₂, b₁ and b₂ are the weight and bias matrices of the neural network. To minimize J, u(t) is recursively calculated using a simple gradient descent rule:

$$u(t+1) = u(t) - \eta \frac{\partial J}{\partial u(t)} \qquad (8)$$
where η > 0 is a learning rate. It can be seen that the controller relies on the approximation made by the neural network; therefore it is necessary that ŷ(t+1) approach the real system output y(t+1) asymptotically, which can be achieved by keeping the neural network training online. Differentiating (5) with respect to u(t) gives

$$\frac{\partial J}{\partial u(t)} = -e(t+1) \frac{\partial \hat{y}(t+1)}{\partial u(t)} \qquad (9)$$
where ∂ŷ(t+1)/∂u(t) is the gradient of the neural network model with respect to u(t). Substituting (9) into (8), we have
$$u(t+1) = u(t) + \eta\, e(t+1) \frac{\partial \hat{y}(t+1)}{\partial u(t)} \qquad (10)$$
The gradient can then be analytically evaluated by using the known neural network structure (7) as follows:
$$\frac{\partial \hat{y}(t+1)}{\partial u(t)} = w_2\big[\mathrm{sech}^2(w_1 p + b_1)\big]\, w_1 \frac{dp}{du} \qquad (11)$$
where

$$\frac{dp}{du} = [0, 0, \ldots, 0, 1, 0, \ldots, 0]' \qquad (12)$$

is the derivative of the input vector p with respect to u(t). Finally, (10) becomes

$$u(t+1) = u(t) + \eta\, e(t+1)\, w_2\big[\mathrm{sech}^2(w_1 p + b_1)\big]\, w_1 \frac{dp}{du} \qquad (13)$$
Equation (13) can now be used in a computer program for real-time control. To summarize, the adaptive algorithm is:

1) produce ŷ(t+1) using (7);
2) find e(t+1) using (6);
3) update the weights using the backpropagation algorithm;
4) compute the new control signal from (13);
5) feed u(t+1) to the system;
6) go to step 1).
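As an illustration, one cycle of this loop might look like the following Python sketch (using NumPy). The layer sizes, the learning rate and the index of u(t) within p are assumptions, and the backpropagation update of the weights (step 3) is omitted for brevity.

```python
import numpy as np

# One cycle of the adaptive algorithm, Eqs. (6), (7), (11), (13).
# Assumed shapes: w1 (H, len(p)), b1 (H,), w2 (H,), b2 scalar.
def control_step(w1, b1, w2, b2, p, r_next, u_index, eta=0.05):
    h = np.tanh(w1 @ p + b1)                 # hidden layer, Eq. (7)
    y_hat = float(w2 @ h + b2)               # predicted output
    e = r_next - y_hat                       # tracking error, Eq. (6)
    # gradient of y_hat w.r.t. u(t): sech^2 = 1 - tanh^2, Eq. (11)
    sech2 = 1.0 - h ** 2
    dy_du = float(w2 @ (sech2 * w1[:, u_index]))
    u_new = p[u_index] + eta * e * dy_du     # control update, Eq. (13)
    return u_new, y_hat, e
```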
4 Adaptive Predictive Control Algorithm

The algorithm described in Section 3 can be improved using techniques from generalized predictive control theory [5], which considers not only the instant value of the control signal but also its future values. As a result, future values of the setpoint and of the system output are needed to formulate the control signal. Since the neural network model (3) represents the plant asymptotically, it can be used to predict future values of the system output. For this purpose, let T be a prespecified positive integer and denote

$$R_{t,T} = [r(t+1), r(t+2), \ldots, r(t+T)]' \qquad (14)$$
as the future values of the setpoint and

$$\hat{Y}_{t,T} = [\hat{y}(t+1), \hat{y}(t+2), \ldots, \hat{y}(t+T)]' \qquad (15)$$
as the predicted output of the system using the neural network model (7); then the following error vector

$$E_{t,T} = [e(t+1), e(t+2), \ldots, e(t+T)]' \qquad (16)$$
can be obtained, where

$$e(t+i) = r(t+i) - \hat{y}(t+i) \qquad (17)$$
Defining the control signals to be determined as

$$U_{t,T} = [u(t+1), u(t+2), \ldots, u(t+T)]' \qquad (18)$$
and assuming the following objective function

$$J_1 = \frac{1}{2}\big[E_{t,T}^{T} E_{t,T}\big] \qquad (19)$$
then our purpose is to find U_{t,T} such that J₁ is minimized. Using the gradient descent rule, it can be obtained that

$$U_{t,T}^{k+1} = U_{t,T}^{k} - \eta \frac{\partial J_1}{\partial U_{t,T}^{k}} \qquad (20)$$

where

$$\frac{\partial J_1}{\partial U_{t,T}^{k}} = E_{t,T} \frac{\partial \hat{Y}_{t,T}}{\partial U_{t,T}^{k}} \qquad (21)$$
and

$$\frac{\partial \hat{Y}_{t,T}}{\partial U_{t,T}^{k}} =
\begin{bmatrix}
\dfrac{\partial \hat{y}(t+1)}{\partial u(t)} & 0 & \cdots & 0 \\
\dfrac{\partial \hat{y}(t+2)}{\partial u(t)} & \dfrac{\partial \hat{y}(t+2)}{\partial u(t+1)} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial \hat{y}(t+T)}{\partial u(t)} & \dfrac{\partial \hat{y}(t+T)}{\partial u(t+1)} & \cdots & \dfrac{\partial \hat{y}(t+T)}{\partial u(t+T-1)}
\end{bmatrix} \qquad (22)$$
It can be seen that each element in the above matrix can be found by differentiating (3) with respect to each element in (18). As a result, it can be obtained that

$$\frac{\partial \hat{y}(t+n)}{\partial u(t+m-1)} = \frac{\partial \hat{f}(p)}{\partial u(t+m-1)} + \sum_{i=m}^{n-1} \frac{\partial \hat{f}(p)}{\partial \hat{y}(t+i)} \left[\frac{\partial \hat{y}(t+i)}{\partial u(t+m-1)}\right] \qquad (23)$$

for n = 1, 2, …, T and m = 1, 2, …, T.
Equation (22) is the well-known Jacobian matrix, which must be calculated using (23) every time a new control signal has to be determined. This could result in a large computational load for a big T. Therefore a recursive form for calculating the Jacobian matrix is given in the following, so that the algorithm can be applied to real-time systems with fast responses.
$$\frac{\partial \hat{y}(t+n)}{\partial u(t+m-1)} = \frac{\partial \hat{y}(t+n-1)}{\partial u(t+m-1)} \left[1 + \frac{\partial \hat{f}(p)}{\partial \hat{y}(t+n-1)}\right] \qquad (24)$$
From (24) it can be seen that, in order to find all the elements of the Jacobian matrix, it is only necessary to calculate the diagonal elements and the partial derivatives of f̂(·) with respect to the previous predicted output; therefore less calculation is required to formulate the Jacobian matrix. The two derivative terms in (24) are given by

$$\frac{\partial \hat{y}(t+n-1)}{\partial u(t+m-1)} = w_2\big[\mathrm{sech}^2(w_1 p + b_1)\big]\, w_1 \frac{dp}{du} \qquad (25)$$

and

$$\frac{\partial \hat{f}(p)}{\partial \hat{y}(t+n-1)} = w_2\big[\mathrm{sech}^2(w_1 p + b_1)\big]\, w_1 \frac{dp}{d\hat{y}} \qquad (26)$$

where

$$\frac{dp}{d\hat{y}} = [1, 0, \ldots, 0, 0, 0, \ldots, 0]' \qquad (27)$$
Equations (25) and (26) can now be used in a computer program to calculate the Jacobian matrix, and the adaptive predictive algorithm is summarized as follows:

1) select a T;
2) find the new predicted output ŷ(t+1) using (7);
3) calculate ∂ŷ(t+n−1)/∂u(t+m−1) and ∂f̂(p)/∂ŷ(t+n−1) via (25) and (26);
4) update the vector p, using the new ŷ(t+1) calculated in step 2) and u(t+n−1) from the vector of future control signals (18);
5) use (24) and the results obtained in step 3) to calculate the off-diagonal elements of the Jacobian;
6) use (20) to form a new vector of future control signals;
7) apply the u(t+1) found in step 6) to close the control loop;
8) return to step 1).
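A sketch of steps 3)–6) is given below: the Jacobian is filled column by column with the recursion (24), and the control vector is updated with (20). The inputs diag and df_dy stand for the quantities in (25) and (26) and are assumed to be supplied by the network; the index convention is an illustrative choice.

```python
import numpy as np

# Sketch of the Jacobian recursion (24) and the gradient update (20).
def build_jacobian(diag, df_dy):
    """diag[n]  ~ diagonal entries of (22), from Eq. (25);
       df_dy[n] ~ d f_hat / d y_hat at prediction step n, from Eq. (26)."""
    T = len(diag)
    J = np.zeros((T, T))
    for m in range(T):
        J[m, m] = diag[m]
        for n in range(m + 1, T):
            J[n, m] = J[n - 1, m] * (1.0 + df_dy[n - 1])   # Eq. (24)
    return J

def predictive_update(U, R, Y_hat, diag, df_dy, eta=0.05):
    E = R - Y_hat                        # error vector, Eq. (16)
    J = build_jacobian(diag, df_dy)      # Eq. (22) via the recursion
    # gradient step on J1, Eq. (20); note dJ1/dU = -J' E for E = R - Y_hat
    return U + eta * J.T @ E
```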
5 Simulation

The following simulations with the adaptive algorithm and the adaptive predictive algorithm show the systems tracking a square wave. Consider the SISO nonlinear system

$$y(t+1) = \frac{2.6\,y(t) - 1.2\,y(t-1) + u(t) + 1.2\,u(t-1) + \sin\big(u(t) + u(t-1) + y(t) + y(t-1)\big)}{1 + u^2(t) + u^2(t-1) + y^2(t) + y^2(t-1)} \qquad (28)$$
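A direct implementation of this benchmark plant is straightforward; the grouping of the sine term follows the reconstruction of (28) above and should be treated as an assumption.

```python
import math

# Benchmark plant of Eq. (28), as reconstructed here.
def plant(y, y1, u, u1):
    num = (2.6 * y - 1.2 * y1 + u + 1.2 * u1
           + math.sin(u + u1 + y + y1))
    den = 1.0 + u ** 2 + u1 ** 2 + y ** 2 + y1 ** 2
    return num / den
```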
Figures 3 and 4 show the simulation results with the adaptive control and the adaptive predictive control, respectively. It can be seen that the adaptive predictive controller achieves excellent control performance, with better stability and convergence, fewer learning parameters, and less calculation.
Fig. 3. Adaptive tracking control
Fig. 4. Adaptive predictive tracking control
6 Conclusions

A neural-network-based adaptive control strategy is presented in this paper for general unknown nonlinear systems, where a simplified formulation of the control signals is obtained by combining a feedforward neural network with an optimization scheme. The neural network is used online to estimate the system, and the backpropagation training algorithm is applied to train the weights. Taking the resulting neural network estimate as a known nonlinear dynamic model for the system, control signals can be obtained directly using the well-established gradient descent rule. An improved algorithm is also included in which both instant and future values of the control signals are derived. Simulations have successfully demonstrated the use of the proposed method.
References

1. Tan, Y., Keyser, R.D.: Neural-network-based Adaptive Predictive Control. Advances in MBPC, 77–88 (1993)
2. Chen, F.C., Khalil, H.K.: Adaptive Control of Nonlinear Systems Using Neural Networks. Int. J. Cont. 55, 1299–1317 (1994)
3. Man, Z.H.: An Adaptive Tracking Controller Using Neural Networks for a Class of Nonlinear Systems. IEEE Trans. Neural Networks 9(5), 947–954 (1998)
4. Fabri, S., Dadirkamanathan, V.: Dynamic Structure Neural Networks for Stable Adaptive Control of Nonlinear Systems. IEEE Trans. Neural Networks 6(5), 1151–1165 (1995)
5. Jin, L., Nikiforuk, P.N., Gupta, M.M.: Adaptive Tracking of SISO Nonlinear Systems Using Multilayered Neural Networks. In: Proceedings of the American Control Conference, Chicago, pp. 56–62 (1992)
6. Rovithakis, G.A., Christodoulou, M.A.: Direct adaptive regulation of unknown nonlinear dynamical systems via dynamic neural networks. IEEE Trans. Syst., Man, Cybern. 125, 1578–1595 (1995)
7. Hayakawa, T., Haddad, W.M., Hovakimyan, N.: Neural Network Adaptive Control for a Class of Nonlinear Uncertain Dynamical Systems With Asymptotic Stability Guarantees. IEEE Transactions on Neural Networks 19(1), 80–89 (2008)
8. Hayakawa, T., Haddad, W.M., Hovakimyan, N., Chellaboina, V.: Neural network adaptive control for nonlinear nonnegative dynamical systems. IEEE Transactions on Neural Networks 16(2), 399–413 (2005)
9. Deng, H., Li, H.X., Wu, Y.H.: Feedback-Linearization-Based Neural Adaptive Control for Unknown Nonaffine Nonlinear Discrete-Time Systems. IEEE Transactions on Neural Networks 19(9), 1615–1625 (2008)
10. Li, T., Feng, G., Wang, D., Tong, S.: Neural-network-based simple adaptive control of uncertain multi-input multi-output nonlinear systems. Control Theory & Applications 4(9), 1543–1557 (2010)
11. Chen, P.C., Lin, P.Z., Wang, C.H., Lee, T.T.: Robust Adaptive Control Scheme Using Hopfield Dynamic Neural Network for Nonlinear Nonaffine Systems. In: Zhang, L., Lu, B.-L., Kwok, J. (eds.) ISNN 2010. LNCS, vol. 6064, pp. 497–506. Springer, Heidelberg (2010)
12. Sun, H.B., Li, S.Q.: General nonlinear system MRAC based on neural network. Application Research of Computers 26(11), 4169–4178 (2009)
13. Poznyak, A.S., Sanchez, E.N., Yu, W.: Differential Neural Networks for Robust Nonlinear Control: Identification, State Estimation and Trajectory Tracking. World Scientific, Singapore (2001)
14. Farrell, J.A., Polycarpou, M.M.: Adaptive Approximation Based Control: Unifying Neural, Fuzzy and Traditional Adaptive Approximation Approaches. Wiley, Hoboken (2006)
15. Ge, S.S., Lee, T.H., Harris, C.J.: Adaptive Neural Network Control of Robotic Manipulators. World Scientific, River Edge (1998)
16. Lewis, F.L., Jagannathan, S., Yesilidrek, A.: Neural Network Control of Robot Manipulators and Nonlinear Systems. Taylor and Francis, London (1999)
A Position-Velocity Cooperative Intelligent Controller Based on the Biological Neuroendocrine System

Chongbin Guo¹,², Kuangrong Hao¹,²,*, Yongsheng Ding¹,², Xiao Liang¹, and Yiwen Dou¹

¹ College of Information Sciences and Technology
² MOE Engineering Research Center of Digitized Textile & Fashion Technology
Donghua University, Shanghai 201620, China
[email protected]
Abstract. Based on the multi-loop regulation mechanism of the neuroendocrine system (NES), this paper introduces a novel position-velocity cooperative intelligent controller (PVCIC), which improves the performance of the controlled plant. Corresponding to the NES, the PVCIC structure consists of four sub-units. The planning unit (PU) is the motion input unit for the desired velocity and position signals. The coordination unit (CU) performs position-velocity coordination with a designed soft-switching algorithm. The identification optimization control unit (IOCU), inspired by hormone regulation theory, is the key control unit and includes a PID controller, a hormone identifier and a hormone optimizer; the hormone identifier and hormone optimizer identify the control error and optimize the PID controller parameters, respectively. The execution unit (EU) is the executor, comprising the driver and the plant. Promising simulation results indicate that the PVCIC is practical and useful, with better performance than the conventional PID controller. Keywords: Neuroendocrine system; Hormone regulation; Position-velocity controller; Cooperative intelligent control.
1 Introduction

Bio-intelligent control systems and algorithms, such as artificial neural networks, evolutionary algorithms and artificial immune systems, usually do not require an accurate mathematical model of the plant and achieve better control performance through physiological regulation [1], [2], [3]. However, some of them can be complicated or hard to realize in engineering practice. Therefore, it is necessary to develop simpler and more efficient control methods from new biological mechanisms for complicated systems [4]. Besides the nervous system and the immune system, the neuroendocrine system (NES) is one of the three major physiological systems in the human body, and it has some outstanding cooperative modulation mechanisms. It has at least three types of feedback loops: long, short and ultra-short. Through this multi-loop feedback mechanism, the NES controls multiple hormones harmoniously,
* Corresponding author.
meanwhile maintaining high self-adaptability and stability in the human body [5]. Therefore, novel control methods can be developed by exploring and imitating the functionalities of the NES. Such an approach can also help in designing controllers for cooperative motion control scenarios such as position-velocity cooperative intelligent control. In previous work, several intelligent controllers based on the neuroendocrine or endocrine system have been designed and have obtained good results in the control field. Neal and Timmis [6] proposed the first artificial endocrine system (AES) in 2003; the AES was defined to include hormone secretion, hormone regulation and hormone control, and this theory was applied to design a useful emotional mechanism for robot control. Córdova and Cañete [7] discussed in conceptual terms the feasibility of designing an artificial NES in a robot and reflected upon the bionic issues associated with highly complex automatons. Liu et al. [8] designed a two-level controller based on the neuroendocrine regulation principle, which includes a primary controller and an optimization controller and can not only achieve accurate control but also adjust control parameters in real time. Ding et al. [9] developed a bio-inspired decoupling controller from the bi-regulation principle of growth hormone in the NES. These controllers perform well and provide new ideas for the motion control field, but none of them is designed for position-velocity cooperative intelligent control. Position-velocity cooperative intelligent control, as considered in this paper, is defined as follows: the controlled plant may require a fine uniform velocity, a fast start, a smooth stop, an accurate position and strong self-adaptability [10, 11]. In this paper, based on the multi-loop regulation mechanism of the NES [12], a novel position-velocity cooperative intelligent controller (PVCIC) is proposed, with its control structure, detailed algorithm design and parameter-tuning steps. The PVCIC consists of a planning unit (PU), a coordination unit (CU), an identification optimization control unit (IOCU) and an execution unit (EU). The PU is primarily responsible for processing and transmitting the desired velocity and position signals to the CU. The CU is designed around the long-loop feedback mechanism and acts as a coordinator between position and velocity. The IOCU is the key control unit; it is designed around the short-loop feedback mechanism and sends the control signal via a PID controller. Meanwhile, based on the ultra-short-loop feedback mechanism and hormone regulation mechanisms [13, 14], the control error is identified via a hormone identifier and the PID parameters are optimized via a hormone optimizer. The EU is the execution unit, which includes the driver and the plant. The simulation results demonstrate that the PVCIC not only ensures position accuracy but also keeps the plant operating smoothly, and that the accuracy, stability, responsiveness and self-adaptability of the PVCIC are better than those of the conventional PID controller. The main contribution of this paper is a bio-inspired cooperative intelligent control approach based on the multi-loop regulation mechanism of the NES, together with an identification and optimization approach, based on the principle of hormone regulation, that improves the global result.
Furthermore, simulating the biological mechanism of the NES in the control system provides a new and efficient idea for position-velocity cooperative intelligent control, and these approaches can be extended to achieve more complex multi-objective cooperative intelligent control. The paper is organized as follows. Section 2 introduces the multi-loop regulation mechanism of the NES and then presents the PVCIC structure. In Section 3, the detailed control algorithm design of the PVCIC is elaborated. The parameter-tuning methods
are introduced in Section 4. Simulation results are given to verify the effectiveness of the proposed controller in Section 5. Finally, the work is concluded in Section 6.
2 PVCIC Structure Design Inspired from NES

2.1 Regulation Mechanism of NES

A classic instance of the NES can be viewed as a multiple closed-loop feedback control system, which is essential for the stability and agility of the inner environment of the human body, as shown in Fig. 1. The regulation mechanism of neuroendocrine hormones can be summarized as follows: the central nervous system processes nerve impulses and transmits the appropriate response to the hypothalamus. The hypothalamus receives the nerve impulses and secretes the relevant tropic hormone (TH), which stimulates the pituitary to secrete releasing hormone (RH); the RH then stimulates the gland to secrete hormones. There are at least three types of feedback: ultra-short-loop, short-loop and long-loop. In the ultra-short loop the concentration of RH is fed back to the pituitary; in the short loop the hormone concentration is fed back to the pituitary; and in the long loop the hormone concentration is fed back to the hypothalamus. Through this multi-loop feedback mechanism the hormone concentrations are kept stable [12].
Fig. 1. Regulation mechanism of NES
2.2 Controller Structure Design

According to the multi-loop regulation mechanism of the NES, a novel PVCIC is proposed, as shown in Fig. 2. The structure and sub-units of the PVCIC parallel those of the NES: the sub-units PU, CU, IOCU and EU can be regarded as the central nervous system, hypothalamus, pituitary and gland, respectively. The PU is primarily responsible for processing and transmitting the desired velocity and position signals to the CU. The CU acts as a position-velocity coordinator and is designed around the long-loop feedback mechanism; it receives the actual position feedback signal and transmits the processed signal to the PU. The IOCU is the key control unit, which includes a PID controller, a hormone identifier and a hormone optimizer. The PID controller is the main control module; the hormone identifier is designed to identify the control error and create the corresponding corrected factor; and the hormone optimizer is responsible for optimizing the control parameters of the PID controller. There are two further feedback loops, short and ultra-short. In the short loop, the
Fig. 2. The structure of PVCIC
actual velocity signal is fed back to the PID controller. In the ultra-short loop, the adjusted parameters are fed back to the PID controller via the hormone identifier and the hormone optimizer. The EU corresponds to an execution unit, which includes the driver module and the plant module.
3 Algorithm Design

The PU is designed to plan the desired velocity Vin(t) and the desired position Pin(t). The EU is designed as an executor that outputs the actual position signal P(t); we read P(t) from a sensor and then calculate the actual velocity signal V(t). To describe the control algorithm of the PVCIC clearly, the designs of the CU and the IOCU are detailed in the following.

3.1 CU Design

The CU is designed as a coordinator between velocity and position. Position-velocity mode is used when the actual position is far from the desired position, so that the plant can achieve a fine uniform velocity and a fast start; position-position mode is used when the actual position is close to the desired position, so that the plant can achieve an accurate position and a smooth stop. The rule for automatic switching is [10, 11]:
$$\text{strategy} = \begin{cases} \text{position-velocity}, & r > r_c \\ \text{position-position}, & r < r_c \end{cases} \qquad (1)$$
where r is the distance from the actual position to the desired position and r_c is a constant distance. The control algorithm is designed as

$$U_1(t) = \begin{cases} V_{in}(t), & |e_1(t)| > \delta \\ e_1(t) \cdot K_c, & |e_1(t)| < \delta \end{cases} \qquad (2)$$
where

$$K_c = \frac{V(t_{switch})}{e_1(t_{switch})} \qquad (3)$$
Here U₁(t) is the output of the CU, e₁(t) is the position error between the desired and actual positions, δ is a position switching coefficient, K_c is a conversion coefficient, and t_switch is the initial time of the switching process.
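A compact sketch of this soft-switching law is given below; the stateful class wrapper and the zero-error guard are illustrative additions, not part of the original design.

```python
# Sketch of the CU soft-switching law, Eqs. (1)-(3).
class CoordinationUnit:
    def __init__(self, delta):
        self.delta = delta      # position switching coefficient
        self.Kc = None          # conversion coefficient, fixed at switch time

    def output(self, v_in, v_actual, e1):
        if abs(e1) > self.delta:             # position-velocity mode
            self.Kc = None
            return v_in
        if self.Kc is None:                  # first step after switching
            self.Kc = v_actual / e1 if e1 != 0 else 0.0   # Eq. (3)
        return e1 * self.Kc                  # position-position mode, Eq. (2)
```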
3.2 IOCU Design

In this unit, a conventional PID controller is chosen as the initial controller, and its control parameters are then adjusted in real time by hormone identification and hormone optimization.

(1) Initial PID controller. The initial control law of the PID controller obeys the conventional PID control algorithm

$$U_2(t) = K_p^0 \cdot e_2(t) + K_i^0 \cdot \int e_2(t)\,dt + K_d^0 \cdot \frac{de_2(t)}{dt} \qquad (4)$$
where e₂(t) is the error between U₁(t) and the actual velocity, U₂(t) is the output of the PID controller, and K⁰p, K⁰i and K⁰d are the initial control parameters of the PID controller.

(2) Hormone identifier. In the NES, an organ can identify stimuli and secrete hormones with high precision within its working scope; when the stimulus exceeds the controlled scope, however, the hormone secretion rate saturates at its upper limit. The control-error identification approach follows this principle of hormone secretion. The absolute value of the control error e₂(t) is calculated first and then mapped to the corresponding regulation scope. The hormone identification error 0 ≤ E(t) ≤ 1 is designed as

$$E(t) = \begin{cases} \dfrac{|e_2(t)|}{e_{max} - e_{min}}, & |e_2(t)| < e_{max} - e_{min} \\[2mm] 1, & |e_2(t)| \ge e_{max} - e_{min} \end{cases} \qquad (5)$$
where e_max and e_min are the high and low limit errors of the optimal scope, respectively. The hormone secretion rate is always nonnegative and monotone [13]: it is monotonically increasing for a positive control action and decreasing for a negative one. The increasing and decreasing hormone secretion regulation mechanisms therefore follow Hill functions [14]. The corresponding increasing corrected factor is designed as

$$F_{up}(E(t)) = \frac{E(t)^{n_1}}{E(t)^{n_1} + 1} \qquad (6)$$

and the decreasing corrected factor is
$$F_{down}(E(t)) = \frac{1}{E(t)^{n_2} + 1} \qquad (7)$$
where n₁ and n₂ are Hill coefficients.

(3) Hormone optimizer. If the secretion rate of hormone A is regulated by the concentration of hormone B, the relationship between them is [14]

$$S_A(C_B) = a\,F_{up,(down)}(C_B) + S_{A0} \qquad (8)$$
where S_A is the secretion rate of hormone A, C_B is the concentration of hormone B, S_{A0} is the basal secretion rate of hormone A, and 0 ≤ a ≤ 1 is a coefficient of F_{up,(down)}. Therefore, the control algorithm of the hormone secretor can be designed as

$$S(t) = S_0 + F_{up}(E(t)) \cdot (S_H - S_0) + F_{down}(E(t)) \cdot (S_L - S_0) \qquad (9)$$
where S(t) is the real-time secretion rate and S₀ is the initial secretion rate; when the error e₂(t) is large the hormone secretion rate tends to S_H, and otherwise it tends to S_L. It should also be noted that if n₁ = n₂, Eq. (9) can be expressed as

$$S(t) = S_L + F_{up}(E(t)) \cdot (S_H - S_L) \qquad (10)$$

or

$$S(t) = S_H + F_{down}(E(t)) \cdot (S_L - S_H) \qquad (11)$$
Ultimately, Eq. (9) is used to optimize the initial control parameters of the PID controller. For example, when the control error e₂(t) is large, the proportional gain K⁰p is adjusted toward K^H_p; in contrast, when the error is small it is adjusted toward K^L_p so as to eliminate the control error quickly. The correction rules for the integral coefficient K⁰i and the differential coefficient K⁰d are similar to that for the proportional gain. The parameter-adjustment algorithm is

$$\begin{cases} K_p(t) = K_p^0 + F_{up}(E(t)) \cdot (K_p^H - K_p^0) + F_{down}(E(t)) \cdot (K_p^L - K_p^0) \\ K_i(t) = K_i^0 + F_{up}(E(t)) \cdot (K_i^H - K_i^0) + F_{down}(E(t)) \cdot (K_i^L - K_i^0) \\ K_d(t) = K_d^0 + F_{up}(E(t)) \cdot (K_d^H - K_d^0) + F_{down}(E(t)) \cdot (K_d^L - K_d^0) \end{cases} \qquad (12)$$
where K_p(t) is the optimized proportional gain, K_i(t) the optimized integral coefficient, K_d(t) the optimized differential coefficient, and K^H_j and K^L_j (j = p, i, d) are the limit parameter values used when the control error is large or small, respectively.
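The identifier (5) and optimizer (12) reduce to a few lines of code. The sketch below uses the parameter names of Table 1 and is an illustrative rendering rather than the authors' implementation.

```python
# Sketch of the hormone identifier, Eqs. (5)-(7), and gain optimizer, Eq. (12).
def hill_up(E, n):
    return E ** n / (E ** n + 1.0)

def hill_down(E, n):
    return 1.0 / (E ** n + 1.0)

def optimize_gain(e2, K0, KH, KL, e_max=0.2, e_min=-0.2, n1=2, n2=2):
    span = e_max - e_min
    E = min(abs(e2) / span, 1.0)        # identification error, Eq. (5)
    return (K0 + hill_up(E, n1) * (KH - K0)
               + hill_down(E, n2) * (KL - K0))   # Eq. (12)

# Example with the proportional gain of Table 1:
Kp = optimize_gain(e2=0.5, K0=0.4, KH=1.0, KL=0.1)
```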
(4) Optimized PID controller. With the initial PID control parameters adjusted dynamically by the hormone identifier and hormone optimizer, the optimized control law is

$$U_2(t) = K_p(t) \cdot e_2(t) + K_i(t) \cdot \int e_2(t)\,dt + K_d(t) \cdot \frac{de_2(t)}{dt} \qquad (13)$$
4 Parameters Tuning

4.1 Tuning Steps

(1) Tune the initial PID control parameters. First, put only the short-loop feedback control (comprising the desired and actual velocity signals, the PID controller and the EU) into action, and tune the initial control parameters K⁰p, K⁰i and K⁰d approximately.

(2) Tune the high- and low-limit PID parameters. As in step 1, put only the short-loop feedback control into action. When the control error e₂(t) is large, tune the parameters K^H_p, K^H_i and K^H_d so that the plant stabilizes quickly with little or no overshoot; in contrast, tune K^L_p, K^L_i and K^L_d to ensure a more accurate velocity.

(3) Determine the high- and low-limit hormone identification errors. According to the results of steps 1 and 2, determine the high-limit error e_max and low-limit error e_min of the optimal working scope.

(4) Determine the Hill coefficients. Building on step 1, add the ultra-short-loop feedback and determine the Hill coefficients n₁ and n₂; choosing suitable Hill coefficients yields better velocity control performance.

(5) Determine the position switching coefficient. Put the whole PVCIC into action and determine the position switching coefficient δ so that the control strategy switches smoothly.
5 Results and Discussion

A typical mathematical model of a robot servo control system is chosen as the controlled plant for the simulation experiments, as shown in Eq. (14) [15]:

$$G(s) = \frac{6000}{s^3 + 128\,s^2 + 10240\,s} \qquad (14)$$
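For reference, the open-loop plant can be written down with SciPy as follows; the reconstruction of the denominator ordering above is an assumption, and this snippet only inspects the plant before closing the loop.

```python
from scipy import signal

# Eq. (14) as reconstructed: G(s) = 6000 / (s^3 + 128 s^2 + 10240 s)
G = signal.TransferFunction([6000], [1, 128, 10240, 0])
t, y = signal.step(G)   # open-loop step response of the servo model
```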
To verify the control effectiveness clearly, the proposed PVCIC is compared with conventional PID controllers (i.e., the PVCIC without the hormone identifier and hormone optimizer) to determine whether the hormone regulation method yields better global control results. To make the contrast clearer, the three groups of control parameters (PID-0, PID-H and PID-L) of the conventional PID controllers are the same
Fig. 3. The contrast control effectiveness: (a) position contrast effect; (b) velocity contrast effect
Table 1. Control parameter sets

K_p^0  K_i^0  K_d^0   K_p^H  K_i^H  K_d^H   K_p^L  K_i^L  K_d^L   e_max  e_min  n_1  n_2  δ
0.4    40     0.006   1      120    0.005   0.1    20     0.01    0.2    −0.2   2    2    0.1
as K⁰j, K^H_j and K^L_j (j = p, i, d) in the PVCIC, respectively; the other control parameters are identical, and the parameter sets are listed in Table 1. The desired input signals are Pin(t) = 1 and Vin(t) = 1 in all cases. All control algorithms are developed in the Matlab/Simulink environment with a fixed sample time of T = 0.001 s. The simulation results show that, with the multi-loop feedback structure and the soft-switching algorithm, all controllers achieve cooperative control of position and velocity. Fig. 3(a) shows that the PVCIC makes position control stable and fast without overshoot, owing to its better responsiveness and adaptability in the switching process. Fig. 3(b) shows that in the starting process the PVCIC achieves a faster velocity response with little overshoot and better velocity stability than the conventional PID controllers; in the stopping and switching processes, thanks to its strong self-adaptability, the PVCIC stops quickly and smoothly without negative velocity overshoot. From Fig. 3(b) we can also see that PID-H has a fast velocity response in the starting process but is slow in the switching process, while PID-L has no overshoot in the starting process but a large negative overshoot in the stopping process; the PVCIC automatically combines their advantages and thus exhibits good intelligent control effectiveness.
6 Conclusions

Based on the biological neuroendocrine regulation network structure and its working mechanism, a novel and simple PVCIC, with its network structure, detailed algorithm design and parameter-tuning steps, has been proposed for position-velocity cooperative intelligent control. An effective control structure with a soft-switching algorithm achieves the cooperative strategy; meanwhile, based on the principles of hormone regulation, hormone identification and hormone optimization are proposed to identify the control error and optimize the control parameters. The simulation results demonstrate that the PVCIC not only ensures position accuracy but also keeps the plant operating smoothly, and that its accuracy, stability, responsiveness and self-adaptability are better than those of the conventional PID controller. To the authors' knowledge, this is the first position-velocity cooperative intelligent control approach based on the NES. Designing motion controllers inspired by the NES thus provides a new and efficient method for position-velocity cooperative intelligent control.
C. Guo et al.
algorithm is proposed to achieve cooperative strategy. Meanwhile, based on the principles of hormone regulation, hormone identification and hormone optimization are proposed to identify control error and optimize control parameters. The simulation experimental results demonstrate that the PVCIC can not only ensure position accuracy but also keep the plant operating smoothly; and the accuracy, the stability, the responsiveness and the self-adaptability of the PVCIC are better than those of the conventional PID controller. According to the knowledge of the authors, this is the first time that a position-velocity cooperative intelligent control approach based on the NES was proposed. Designing motion controller inspired from NES also provides a new and efficient method for the position-velocity cooperative intelligent control.
Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Nos. 60975059, 60775052), the Support Research Project of the National ITER Program (No. 2010GB108004), the Specialized Research Fund for the Doctoral Program of Higher Education from the Ministry of Education of China (No. 20090075110002), and Projects of the Shanghai Committee of Science and Technology (Nos. 10JC1400200, 10DZ0506500, 09JC1400900).
References 1. Mukherjee, A., Zhang, J.: A reliable multi-objective control strategy for batch processes based on bootstrap aggregated neural network models. Journal of Process Control 18, 720– 734 (2007) 2. Bagis, A., Karaboga, D.: Evolutionary algorithm-based fuzzy pd control of spillway gates of dams. Journal of the Franklin Institute 344, 1039–1055 (2007) 3. Xie, F.-w., Hou, Y.-f., Xu, Z.-p., Zhao, R.: Fuzzy-immune control strategy of a hydroviscous soft start device of a belt conveyor. Mining Science and Technology (China) 19, 544– 548 (2009) 4. Mitra, S., Hayashi, Y.: Bioinformatics with soft computing. IEEE Transactions on System, Man and Cybernetics: Part C 36, 616–635 (2006) 5. Savino, W., Dardenne, M.: Neuroendocrine control of thymus physiology. Endocrine Reviews 21, 412–443 (2000) 6. Neal, M., Timmis, J.: A useful emotional mechanism for robot control. Informatica (Slovenia) 27, 197–204 (2003) 7. Córdova, F.M., Cañete, L.R.: The challenge of designing nervous and endocrine systems in robots. International Journal of Computers, Communications & Control I, 33–40 (2006) 8. Liu, B., Ren, L., Ding, Y.: A novel intelligent controller based on modulation of neuroendocrine system. In: Wang, J., Liao, X.-F., Yi, Z. (eds.) ISNN 2005. LNCS, vol. 3498, pp. 119–124. Springer, Heidelberg (2005) 9. Ding, Y.S., Liu, B., Ren, L.H.: Intelligent decoupling control system inspired from modulation of the growth hormone in neuroendocrine system. Dynamics of Continuous, Discrete & Impulsive Systems, Series B: Applications & Algorithms 14, 679–693 (2007) 10. Sun, Z., Xing, R., Zhao, C., Huang, W.: Fuzzy auto-tuning pid control of multiple joint robot driven by ultrasonic motors. Ultrasonics 46, 303–312 (2007)
11. Farkhatdinov, I., Ryu, J.-H., Poduraev, J.: A user study of command strategies for mobile robot teleoperation. Intelligent Service Robotics 2, 95–104 (2009)
12. Keenan, D.M., Licinio, J., Veldhuis, J.D.: A feedback-controlled ensemble model of the stress-responsive hypothalamo-pituitary-adrenal axis. PNAS 98, 4028–4033 (2001)
13. Vargas, P.A., Moioli, R.C., de Castro, L.N., Timmis, J., Neal, M., Von Zuben, F.J.: Artificial homeostatic system: A novel approach. In: Capcarrère, M.S., Freitas, A.A., Bentley, P.J., Johnson, C.G., Timmis, J. (eds.) ECAL 2005. LNCS (LNAI), vol. 3630, pp. 754–764. Springer, Heidelberg (2005)
14. Farhy, L.S.: Modeling of oscillations of endocrine networks with feedback. Methods Enzymol. 384, 54–81 (2004)
15. Guo, C., Hao, K., Ding, Y.: Parallel robot intelligent control system based on neuroendocrine method. Journal of Mechanical Electrical Engineering 27, 1–4 (2010) (in Chinese)
A Stable Online Self-Constructing Recurrent Neural Network Qili Chen, Wei Chai, and Junfei Qiao Intelligent Systems Institute, Electronic Information and Control Engineering, Beijing University of Technology, Beijing, 100124, China [email protected], [email protected]
Abstract. A new online self-constructing recurrent neural network (SCRNN) model is proposed, whose network structure can adjust to the specific problem in real time. If the approximation performance of the SCRNN is insufficient, the SCRNN can create a new neural network state to increase its learning ability. If a neural network state of the SCRNN is redundant, it is removed to simplify the network structure and reduce the computational load; otherwise, if a hidden neuron of the SCRNN is significant, it is retained. Meanwhile, the feedback coefficients are adjusted by a synaptic normalization mechanism to ensure the stability of the network state. The proposed method effectively generates a recurrent neural model with high accuracy and a compact structure. Simulation results demonstrate that the proposed SCRNN has a self-organizing ability, which can determine the structure and parameters of the recurrent neural network automatically, and that the network has better stability.
Keywords: self-constructing; recurrent neural network; dynamic system.
1 Introduction

RNN theory has been studied for more than ten years because of its wide application in system control, image processing, speech recognition and many other fields. As is well known, the size of an RNN depends on the specific real object, so recurrent network design must be addressed whenever the network is applied to a real object. However, the dynamic characteristics of the network bring stability problems in applications; furthermore, the relative complexity of the network structure itself makes the stability conditions calculated by theory difficult to apply. These factors make the design of recurrent networks difficult in practical applications. From our literature review, we found that most existing RNNs lack a good mechanism to determine their structure sizes automatically. Park et al. [1] proposed a self-structuring neural network control method, which can create new neurons in the hidden layer to decrease the approximation error; unfortunately, the proposed approach cannot prevent the structure of the neural network from growing unboundedly. To avoid this problem, Gao and Er [2] used an error reduction ratio with QR decomposition to prune the rules; however, the design procedure is overly complex. Hsu and Wang [3] construct the network structure automatically according to the identified dimension, but the dimension of the system is determined
by input and output data, so this algorithm is not suitable for online adjustment of the network structure. Hsu [4] proposed an algorithm to determine whether to eliminate existing hidden neurons that are inappropriate. The neurons removed by the pruning algorithm contain a small amount of information, which leads to different levels of information loss depending on the threshold value. At the same time, the dynamic characteristics of the recurrent neural network change, because removing a neuron separately destroys the network structure. For the recurrent neural network structure in this paper, we design an algorithm for online adjustment of the recurrent network structure. The rest of this paper is organized as follows. In Section 2, we introduce the recurrent network structure. Section 3 presents an online adjustment algorithm and analyzes the dynamic characteristics of recurrent neural networks. Section 4 provides computer simulations of dynamic applications to validate the effectiveness of the proposed network. Finally, the conclusions are presented in Section 5.
2 Structure of Novel Recurrent Neural Network

In this paper, we realize a conventional Wiener model with a novel recurrent neural network structure. A large number of studies have indicated the superior capability and effectiveness of Wiener models in nonlinear dynamic system identification and control [5-7]. Based on the basic principle of Wiener models, we propose a simple recurrent neural network. With certain dynamic characteristics, this local-recurrent global-feedback neural network has a strong ability to approximate complex nonlinear dynamic systems. The network structure is shown in Fig. 1.
Fig. 1. The structure of SCRNN
In this network structure, recurrence exists only as self-feedback among the neurons in the hidden layer; because of the subsequent static nonlinear layer, the information of the previous time step is in essence still sent to the hidden-layer neurons at the next time step. Therefore, this network is equivalent to one with a recurrent hidden layer. The state space of the SCRNN model can be described as follows:
$X(k+1) = W^h X(k) + W^i U(k)$  (1)
$y(k) = W^o f(W^{34} X(k))$  (2)
where $f(\cdot)$ is the sigmoid function.
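To make the state-space form (1)-(2) concrete, here is a minimal sketch of one SCRNN forward step in Python with NumPy; the matrix names mirror the symbols above, and tanh is used as a stand-in for the sigmoid-type nonlinearity $f(\cdot)$:

```python
import numpy as np

def scrnn_step(X, u, W_h, W_i, W_34, W_o):
    """One forward step of the SCRNN state-space model (1)-(2).

    X: state vector at time k (length l); u: input U(k) (length m).
    W_h is the diagonal self-feedback matrix, W_i the input weights,
    W_34 the weights into the static nonlinear layer, W_o the output
    weights. tanh stands in for the sigmoid-type nonlinearity f(.).
    """
    X_next = W_h @ X + W_i @ u          # eq. (1): linear dynamic part
    y = W_o @ np.tanh(W_34 @ X)         # eq. (2): static nonlinear part
    return X_next, y
```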
3 Online Self-Constructing Algorithm

The number of neurons in the hidden layer would otherwise have to be determined by trial and error to achieve favorable approximation. To solve this problem, this paper proposes a simple growing and pruning algorithm to self-structure the neural network online. When the approximation performance of the SCRNN is insufficient and the growing condition is satisfied, the SCRNN creates a new state to increase its learning ability; when the approximation performance is insufficient but the growing condition is not satisfied, the SCRNN trains the network with the learning algorithm to increase its learning ability; when a state of the SCRNN is redundant, it is removed to simplify the structure of the neural network and reduce the computational load.

3.1 Growing Algorithm and Pruning Algorithm
The growing algorithm splits up a neuron and creates a new network state at the k-th sampling time if the following condition is satisfied [1]:

$\dfrac{\sum_{k=1}^{l}\left(|w^h_{ki}| + |w^{34}_{ki}|\right) + |w^o_i|}{\sum_{i=1}^{l}\left(\sum_{k=1}^{l}\left(|w^h_{ki}| + |w^{34}_{ki}|\right) + |w^o_i|\right)} > G_{th}$  (3)
where $G_{th}$ denotes the disjunction threshold value. When the approximated nonlinear functions are too complex, the disjunction threshold should be chosen as a small value, so that neurons can be easily created. This criterion is derived from the observation that if the left-hand side of (3) is larger than the threshold, then precise approximation is hard to achieve because the weight share of the i-th neuron is relatively large, which causes large updates of the weight values. For that reason, if condition (3) is satisfied, the weight $W^o$ is split up and a new network state $x'_i(k)$ is created, as shown in Fig. 2. The new weights $w'^o_i$ connected to the $i'$-th neuron are decided as follows:
$w'^o_i(k+1) = \alpha\, w^o_i(k)$  (4)
where $\alpha$ is a positive constant. The weights $w^o_i$ remaining at the $i$-th neuron are decided as follows:
$w^o_i(k+1) = (1-\alpha)\, w^o_i(k)$  (5)
and the new network state $x'_i(k)$ and the original network state $x_i(k)$ are decided as follows:
$\begin{bmatrix} X_i(k+2) \\ X'_i(k+2) \end{bmatrix} = \begin{bmatrix} w^h_i(k) & 0 \\ 0 & w^h_i(k) \end{bmatrix} \begin{bmatrix} X_i(k+1) \\ X'_i(k+1) \end{bmatrix} + \begin{bmatrix} w^i_i(k) \\ w^i_i(k) \end{bmatrix} U(k+1)$  (6)

$\begin{bmatrix} x_i(k+1) \\ x'_i(k+1) \end{bmatrix} = \begin{bmatrix} (1-\beta)\, w^{34}_i(k) & \beta\, w^{34}_i(k) \\ \beta\, w^{34}_i(k) & (1-\beta)\, w^{34}_i(k) \end{bmatrix} \begin{bmatrix} X_i(k+1) \\ X'_i(k+1) \end{bmatrix}$  (7)
Fig. 2. Schematic illustration of the growing algorithm
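A minimal sketch of the splitting step of equations (4)-(7) in Python with NumPy follows; the function and parameter names are illustrative, and only the 2x2 block that the equations specify is touched:

```python
import numpy as np

def split_neuron(i, W_h, W_i, W_34, W_o, alpha=0.5, beta=0.5):
    """Split hidden neuron i into i and i' following eqs. (4)-(7).

    The new state copies the dynamics of the old one (eq. (6)), W_34
    mixes the duplicated states via beta (eq. (7)), and the output
    weight of neuron i is shared via alpha (eqs. (4)-(5)). The state
    vector X itself is extended elsewhere with X'_i(k) = X_i(k).
    """
    l = W_h.shape[0]
    W_h = np.pad(W_h, ((0, 1), (0, 1)))       # zero-padded new row/col
    W_h[l, l] = W_h[i, i]                     # same self-feedback, eq. (6)
    W_i = np.vstack([W_i, W_i[i]])            # same input row, eq. (6)

    W_34 = np.pad(W_34, ((0, 1), (0, 1)))
    w = W_34[i, i]
    W_34[i, i] = W_34[l, l] = (1 - beta) * w  # eq. (7)
    W_34[i, l] = W_34[l, i] = beta * w

    w_o = W_o[:, i].copy()
    W_o = np.hstack([W_o, (alpha * w_o)[:, None]])   # eq. (4)
    W_o[:, i] = (1 - alpha) * w_o                    # eq. (5)
    return W_h, W_i, W_34, W_o
```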
The pruning algorithm deletes a network state at the k-th sampling time if certain conditions are satisfied. Here $W^i$ can be represented as $[w^i_1\ w^i_2\ \cdots\ w^i_l]^T$, where $w^i_i$ is a row vector of $W^i$, and $W^{34}$ can be represented as $[w^{34}_1\ w^{34}_2\ \cdots\ w^{34}_l]$, where $w^{34}_i$ is a column vector of $W^{34}$. If condition (8) is satisfied, we consider $X_i(k)$ and $X_j(k)$ to be linearly dependent; therefore, one of them can be deleted, as shown in Fig. 3(a):

$\begin{cases} X_i(k) = X_j(k), & i \neq j \\ w^h_i(k) = w^h_j(k), & i \neq j \\ X_i(0) = X_j(0), & i \neq j \end{cases}$  (8)

We consider the states $x_i(k)$ and $x_j(k)$ to be linearly dependent at the k-th sampling time if the following condition is satisfied, in which case the pruning algorithm deletes the state $x_j(k)$, as shown in Fig. 3(b):

$w^{34}_j(k) = a\, w^{34}_i(k), \quad a \in \mathbb{R},\ i \neq j$  (9)

where $w^{34}_i(k)$ is the $i$-th row vector of $W^{34}(k)$.
The pruning algorithm deletes $X_j(k)$ and $x_j(k)$ only if the following condition is satisfied:

$N_X = l - \operatorname{rank}(W^{34}(k+1))$  (10)

where $N_X$ represents the number of linearly dependent states $X_i(k)$ and $l$ is the number of rows of the weight matrix $W^{34}(k+1)$. After pruning $x_j(k)$ and $X_j(k)$, the weights are updated as:

$W^i(k+1) = [w^i_1(k)\ \cdots\ w^i_{j-1}(k)\ w^i_{j+1}(k)\ \cdots\ w^i_l(k)]^T$  (11)

$W^h(k+1) = [w^h_1(k)\ \cdots\ w^h_{j-1}(k)\ w^h_{j+1}(k)\ \cdots\ w^h_l(k)]^T$  (12)

$\hat{W}^{34}(k+1) = [w^{34}_1(k)\ \cdots\ w^{34}_{i-1}(k)\ \ w^{34}_i(k)+w^{34}_j(k)\ \ w^{34}_{i+1}(k)\ \cdots\ w^{34}_{j-1}(k)\ w^{34}_{j+1}(k)\ \cdots\ w^{34}_l(k)]$  (13)

$W^{34}(k+1) = [\hat{w}^{34}_1(k)\ \cdots\ \hat{w}^{34}_{j-1}(k)\ \hat{w}^{34}_{j+1}(k)\ \cdots\ \hat{w}^{34}_l(k)]$  (14)

$W^o(k+1) = [w^o_1(k)\ \cdots\ w^o_{i-1}(k)\ \ w^o_i(k)+\alpha\, w^o_j(k)\ \ w^o_{i+1}(k)\ \cdots\ w^o_{j-1}(k)\ w^o_{j+1}(k)\ \cdots\ w^o_l(k)]$  (15)
This method grows and deletes the states of the recurrent neural network without affecting the stability of the network states; the specific analysis is given in Section 3.3, which analyzes the stability of the SCRNN state during the structure adjustment process.

3.2 Learning Algorithm
Gradient descent is used to adjust the parameters in the learning process. The error function for the weight adjustment of the recurrent neural network is defined as:

$J(k) = \frac{1}{2} E(k)^2 = \frac{1}{2} \sum_j \left[e_j(k)\right]^2 = \frac{1}{2} \sum_j \left[Y_j(k) - y_j(k)\right]^2, \quad j = 1, 2, \ldots, n$  (16)
where $Y_j(k)$ is the target output of the system and $y_j(k)$ is the actual output of the recurrent network. The weight adjustment formulas of the recurrent neural network can be obtained by gradient descent. To keep the feedback weights in (-1, 1) and thereby keep the recurrent neural network state stable, a synaptic normalization (SN) mechanism [8] is added to the regulation process. SN proportionally adjusts the values of the feedback connections of a neuron so that they sum to a constant value. Specifically, the $w^h_i$ connections are rescaled at every time step according to:
$w^h_i(k) = w^h_i(k) \,/\, \sum_i w^h_i(k)$  (17)

Fig. 3. Schematic illustration of the pruning algorithm
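A one-line sketch of the rescaling rule (17) in Python with NumPy, assuming positive self-feedback weights as the plain sum in (17) suggests:

```python
import numpy as np

def synaptic_normalize(w_h):
    """Eq. (17): proportionally rescale the self-feedback weights so
    that they sum to 1, preserving their relative strengths. With
    positive weights this also keeps each of them inside (-1, 1)."""
    return w_h / np.sum(w_h)
```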
This rule does not change the relative strengths of the synapses established by the gradient descent algorithm, but regulates the total incoming drive a neuron receives. The general online learning algorithm, written in vector-matrix form, is given by the following equation:
(18)
where $W$ is the weight matrix being modified ($W^o$, $W^h$, $W^{34}$, $W^i$), $\Delta W(k)$ is the weight matrix correction ($\Delta W^o$, $\Delta W^h$, $\Delta W^{34}$, $\Delta W^i$), $\eta$ is a diagonal matrix of normalized learning-rate parameters, and $\alpha$ is a diagonal matrix of normalized momentum parameters.

3.3 The Stability Analysis
The stability of the SCRNN state can be established through the following analysis. The necessary and sufficient condition for the stability of a discrete linear system is that all roots of the characteristic equation lie within the unit circle of the z-plane. Since $W^h$ is a diagonal matrix, the state of this recurrent neural network is stable as long as the absolute values of the diagonal elements are less than 1. In the growing process, according to (4)-(7), we obtain:

$X_i(k) = X'_i(k), \quad x_i(k) = x'_i(k), \quad y'(k) = y(k)$  (19)

In the pruning process, $x_i(k)$ and $x_j(k)$ are combined. According to (11)-(15), we obtain:

$x_j(k) = \alpha\, x_i(k)$  (20)
$y'(k) = y(k)$  (21)
With either the growing algorithm or the pruning algorithm, the change of the network structure does not affect the mapping relation between inputs and outputs. Therefore, the stability of the SCRNN is not affected while the structure is changing.
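As a small illustration, the diagonal-stability condition above can be checked directly; a minimal sketch in Python with NumPy:

```python
import numpy as np

def state_is_stable(W_h):
    """Condition of Section 3.3: with a diagonal W^h, the SCRNN state
    is stable iff every diagonal entry has magnitude less than 1."""
    return bool(np.all(np.abs(np.diag(W_h)) < 1.0))
```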
4 Experimental Results

To validate the performance of the SCRNN, we conducted extensive computer simulations on dynamic system identification problems. The SCRNN can adjust its network structure according to the difficulty of the specific problem; therefore, we approximate three dynamic systems to verify the adaptive ability of the SCRNN. In addition, all simulation results are compared with those of existing fixed-structure neural network methods found in the literature. The system input signals are:

$u_1(k) = u_2(k) = \sin(0.1k) + \sin(0.167k)$  (22)
$u_3(k) = \sin(0.1k) + \sin(1.167k)$  (23)
System 1 [3], system 2 and system 3 can be expressed as (24), (25) and (26), respectively. For systems 1 and 2, the neural network is trained with the first 1000 steps of data, and the resulting neural network model is then used to predict the following 500 steps; the results are shown in Fig. 4. For system 3, changes of the system inputs have a great influence on the system outputs, and the dynamic characteristics are comparatively complicated. With the changed input signals, the neural network is trained with the first 550 steps of data, and the resulting model is then used to predict the following 200 steps.

$\begin{cases} x(k) = -1.4318\, x(k-1) - 0.6065\, x(k-2) - 0.9044\, u_1(k-1) + 0.0883\, u_1(k-2) \\[4pt] y_1(k) = f[x(k)] = \dfrac{0.3163\, x(k)}{0.1 + 0.9\,[x(k)]^2} \end{cases}$  (24)

$y_2(k) = \dfrac{1.39\, y_2(k-1)\, y_2(k-2)\, \left[y_2(k-1) + 0.25\right]}{1 + y_2(k-1)^2 + y_2(k-2)^2} + u_2(k-1)$  (25)

$y_3(k) = 0.4\, y_3(k-1) + 0.4\, y_3(k-2) + 0.6\, u_3(k-1)^3 + 0.1$  (26)
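For reference, a minimal Python sketch that generates the data of the three benchmark systems (22)-(26); the array indexing and zero initial conditions are illustrative assumptions:

```python
import numpy as np

def simulate(steps=1500):
    k = np.arange(steps)
    u12 = np.sin(0.1 * k) + np.sin(0.167 * k)        # eq. (22): u1 = u2
    u3 = np.sin(0.1 * k) + np.sin(1.167 * k)         # eq. (23)

    x = np.zeros(steps)
    y1, y2, y3 = np.zeros(steps), np.zeros(steps), np.zeros(steps)
    for t in range(2, steps):
        # system 1, eq. (24): linear dynamics + static nonlinearity
        x[t] = (-1.4318 * x[t-1] - 0.6065 * x[t-2]
                - 0.9044 * u12[t-1] + 0.0883 * u12[t-2])
        y1[t] = 0.3163 * x[t] / (0.1 + 0.9 * x[t] ** 2)
        # system 2, eq. (25)
        y2[t] = (1.39 * y2[t-1] * y2[t-2] * (y2[t-1] + 0.25)
                 / (1 + y2[t-1] ** 2 + y2[t-2] ** 2) + u12[t-1])
        # system 3, eq. (26)
        y3[t] = 0.4 * y3[t-1] + 0.4 * y3[t-2] + 0.6 * u3[t-1] ** 3 + 0.1
    return y1, y2, y3
```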
Fig. 4. The prediction of systems 1 and 2 based on the SCRNN. The star-line represents the system outputs, while the asterisks represent the SCRNN output.
The results are shown in Fig. 5(b); the dynamic characteristics of system 3 are comparatively complicated, and the adjustment process of the number of hidden-layer neurons is shown in Fig. 5(a). The experimental results show that comparatively complex dynamic systems require a more complex recurrent neural network to approximate them. For different systems to be approximated, the SCRNN can organize its network structure by itself. A fixed-structure recurrent neural network (FSRNN) with 12 hidden-layer neurons was also used to predict the above three systems; the results are shown in Fig. 6. The MSE of the SCRNN and FSRNN predictions on the three systems is shown in Table 1. The experimental results show that a fixed-structure neural network only fits the approximation of a specific type of system, and the problem cannot be well resolved when the complexity of the system changes.
Fig. 5. The prediction of system 3 based on the SCRNN. The star-line represents the system outputs, while the solid line represents the SCRNN output. The left figure shows the change in the number of neurons in the SCRNN hidden layer.
Fig. 6. (a) and (b) are the predictions of systems 1 and 2 based on the FSRNN. The star-line represents the system outputs, while the asterisks represent the FSRNN output. (c) is the prediction of system 3 based on the FSRNN. The star-line represents the system outputs, while the solid line represents the FSRNN output.

Table 1. The MSE of SCRNN and FSRNN prediction on three systems
                          System 1    System 2    System 3
MSE (self-constructing)   0.00058     0.2989      1.9335
MSE (fixed-structure)     0.0017      0.9501      4.5833
5 Conclusion

A novel online self-constructing recurrent neural network (SCRNN) model has been proposed for nonlinear identification problems. The major contributions of this paper are: 1) in contrast to fixed-structure recurrent neural networks, the network is self-constructing; for different systems, researchers no longer need to determine the network structure based on experience, since the SCRNN can adjust its own structure according to the complexity of the problem; 2) the state of the SCRNN becomes stable over time, and even during the restructuring process the stability of the network state can still be guaranteed.
Acknowledgments. This work is supported by the National Natural Science Foundation of China under Grant No.60873043 and No. 61034008; the National 863 Scheme Foundation of China under Grant No. 2007AA04Z160; the Doctoral Fund of Ministry of Education of China under Grant No. 200800050004; the Beijing Municipal Natural Science Foundation under Grant No. 4092010, and 2010 PhD Scientific Research Foundation, Beijing University of Technology (No. X0002011201101).
References

1. Park, J.H., Huh, S.H., Seo, S.J., Park, G.T.: Direct adaptive controller for nonaffine nonlinear systems using self-structuring neural networks. IEEE Transactions on Neural Networks 16(2), 414–422 (2005)
2. Gao, Y., Er, M.J.: Online adaptive fuzzy neural identification and control of a class of MIMO nonlinear systems. IEEE Transactions on Fuzzy Systems 11(4), 462–477 (2003)
3. Hsu, Y.L., Wang, J.S.: A Wiener-type recurrent neural network and its control strategy for nonlinear dynamic applications. Journal of Process Control 19, 942–953 (2008)
4. Hsu, C.F.: Intelligent position tracking control for LCM drive using stable online self-constructing recurrent neural network controller with bound architecture. Control Engineering Practice 17, 714–722 (2009)
5. Hunter, I.W., Korenberg, M.J.: The identification of nonlinear biological systems: Wiener and Hammerstein cascade models. Biological Cybernetics 55, 135–144 (1986)
6. Janczak, A.: Identification of Nonlinear Systems Using Neural Networks and Polynomial Models. Springer, New York (2005)
7. Ni, X., Verhaegen, M., Krijgsman, A.J., Verbruggen, H.B.: A new method for identification and control of nonlinear dynamic systems. Engineering Applications of Artificial Intelligence 9, 231–243 (1996)
8. Lazar, A., Pipa, G., Triesch, J.: SORN: a self-organizing recurrent neural network. Frontiers in Computational Neuroscience 3(23), 1–9 (2009)
Evaluation of SVM Classification of Avatar Facial Recognition Sonia Ajina1, Roman V. Yampolskiy2, and Najoua Essoukri Ben Amara1 1 National Engineering School of Sousse, University of Sousse, Tunisia [email protected], [email protected] 2 Computer Engineering and Computer Science, Louisville University, USA [email protected]
Abstract. The security of virtual worlds has become a major challenge. The huge progress of Internet technologies, the massive revolution of media, and the growing use of electronic finance are deployed in a world where the law of competition forces financial institutions to invest huge amounts of capital in persistent digital worlds like Second Life or Entropia Universe, whose economic impact is quite real [1]. Thus, virtual communities are rapidly becoming the next frontier for cyber-crime, so it is necessary to develop suitable tools for the protection of virtual environments, similar to those deployed in the real world, such as biometric security systems. In this paper, we present a biometric recognition system for the faces of non-biological entities (avatars), based on the wavelet transform for characterization and Support Vector Machines for classification. This system is able to identify and verify avatars when they access certain information or system resources in virtual communities. We also evaluate the performance of our avatar authentication approach, focusing specifically on the classification stage. The obtained results are promising and encouraging for a first contribution within the framework of this new research field.

Keywords: Biometric, Face Recognition (FR), Avatar, Virtual world, Support Vector Machines (SVM), Discrete Wavelet Transform (DWT).
1 Introduction

Domestic and industrial robots, intelligent software agents, virtual world avatars and other artificial entities are quickly becoming a part of our everyday life. Just as it is necessary to be able to accurately authenticate the identity of human beings, it is becoming essential to be able to determine the identity of the non-biological entities rapidly infiltrating all aspects of modern society [2]. From the first access to a virtual world, the user is given the opportunity to choose the appearance of his avatar and to benefit from the different assets of the virtual community, such as interaction and communication with other users in several possible ways (business meetings, sports and games, socialization and group activities, live events,
conferences, collaborative projects, e-learning, etc.), even if they are not physically in the same location [3]. However, the use of these digital metaverses requires special vigilance in view of their vulnerability to run-of-the-mill criminals interested in conducting identity theft, fraud, tax evasion, illegal gambling, smuggling, defamation, money laundering and other traditional crimes causing real social, cultural and economic damage [4]. Henceforth, with over 1 billion dollars invested in 35 virtual worlds in the past year [5], the installation of a biometric system for securing virtual worlds, similar to those developed for the real world, is necessary, since there are simply no existing techniques for identity verification of intelligent entities other than self-identification [1]. This research work, which follows a line of investigation presented by Boukhris [6], sheds light on biometric avatar face recognition, a topic which has not been addressed before. The rest of this paper is organized as follows: in Section 2, we describe our biometric approach for avatar face authentication, focusing on the classification stage, after having detailed the typical architecture of a biometric facial recognition system. In Section 3, we present the principal experiments carried out as well as the recorded results. Finally, we summarize in the last section the conclusions and possible future directions.
2 System Description

2.1 Problematic

In the literature, several research works [7, 8, 9, 10] have focused on biometric human face recognition. The technological revolution and the appearance of virtual worlds have created a new need, manifested in the face recognition of virtual characters. These two subfields of research present many similarities as well as divergences. Indeed, the human face is characterized by an infinite diversity of grimaces, facial expressions and characteristics of the physical entities forming the face (nose, eyes, mouth, skin, etc.). Thanks to this important variation, the distinction between human faces proves easier compared with the characterization of avatar faces, where facial expressions and characteristics are limited, since they are fixed beforehand by the choice of the creator. Therefore, a biometric recognition system for avatar faces has to ensure a high level of characterization. It should be based on a set of discriminant characteristics ensuring the best texture description of faces, in order to compensate for possible resemblance problems in virtual communities.

2.2 Developed Recognition System

Our biometric recognition system for avatar faces is inspired by the human face recognition system developed by our team SID (Signal, Image and Document) [6, 11]. As mentioned, the ultimate goal of an avatar face recognition system is to differentiate between the different non-biological entities wishing to access some resources in the virtual world. To achieve this goal, it is absolutely necessary to obtain a reference dataset for all faces of the non-biological entities known by the system. To collect this dataset, we used a special web site devoted to avatar creation [12].
During the processing of a facial image, we use the wavelet transform to extract its global characteristics, and we then employ the SVM technique as a classifier for recognition. After the collection stage and the acquisition of the different faces, a pre-processing step is necessary to prepare the feature vectors for the characterization step. These characteristics, associated with each face, are assumed to be invariant for each subject and different from one subject to another [13, 14]. Then, during the face recognition stage, the system compares the feature vector associated with the face to be recognized with all the other feature vectors associated with the images in the reference base. This allows the recognition of the entity whose face shows the greatest resemblance and whose vector is the most similar.

2.3 Extracted Features for Characterization

Characterization is a key step in our facial identification system, during which we have to extract characteristic parameters that describe well the main variations of the images to be treated. It allows the identification of images from a set of discriminant features in order to ensure, thereafter, a better separation between the different classes (entities). Generally, the features dealing with face recognition may be classified as belonging to either a geometric or a global approach [15]. The geometric approach relates to the extraction of local features from different parts of the human face (such as the nose, mouth and eyes) [16], whereas the holistic or global approach works on the face as a complete single entity, based on the pixel information of the texture, in order to extract global information that characterizes each facial image [17]. Our approach is classified as holistic, since it is based on certain global features derived from textural analysis. Indeed, we used the wavelet transform, a tool for image processing and texture analysis, for the extraction of a set of global characteristics. The wavelet transform allows us to obtain derived images corresponding to the approximation image and the horizontal, vertical and diagonal details [18]. The choice of the wavelet family used and the corresponding decomposition level greatly influence the recognition results obtained at the output of our system. They are determined experimentally from several tests carried out on a number of wavelet families. The best results were recorded using the Symlet wavelet family with decomposition level 5, as shown in Figure 1.
Fig. 1. Wavelet decomposition using “Symlet 8” wavelet family with decomposition level 5
In order to find the most discriminant parameters allowing a better characterization of avatar facial images, we selected several characteristics recommended in the literature among those most suitable for textural analysis (contrast, correlation, entropy, standard deviation and energy), and we studied their relevance on our dataset. Naturally, a higher diversity of characteristics makes authentication easier [19]. We chose to use two types of primitives: textural and statistical primitives, resulting respectively from the wavelet decomposition and from the co-occurrence matrix associated with the image. From the wavelet transform, we retained the average and the standard deviation of the approximation matrix, the standard deviations of the matrices associated with the horizontal, vertical and diagonal details of each RGB component, as well as the entropy of the color image. Moreover, we added to these 16 characteristics 12 other parameters to define each face image as well as possible. From the statistical characteristics, we used the third-order moment (skewness) and the fourth-order moment (kurtosis) of the approximation matrices of each color component. We also considered second-order metrics extracted from the co-occurrence matrix, namely the contrast and energy of each RGB component of the original image. The result of this stage is a feature vector of 28 parameters, illustrated in Figure 2:

- Approximation matrix for each RGB component: average + standard deviation + skewness + kurtosis
- Horizontal details matrix for each RGB component: standard deviation
- Vertical details matrix for each RGB component: standard deviation
- Diagonal details matrix for each RGB component: standard deviation
- Color image matrix: entropy
- Red component: contrast + energy
- Blue component: contrast + energy
- Green component: contrast + energy
Fig. 2. The set of parameters composing the feature vector
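A minimal sketch of such a feature extractor in Python, using PyWavelets and SciPy; the 'sym8'/level-5 choice follows the paper, but the entropy and contrast/energy computations below are simplified stand-ins for the histogram- and co-occurrence-based metrics, and the function name is illustrative:

```python
import numpy as np
import pywt
from scipy.stats import skew, kurtosis

def face_features(rgb):
    """Build a 28-parameter feature vector in the spirit of Fig. 2 from
    an RGB face image (H x W x 3 float array)."""
    feats = []
    for c in range(3):                                   # R, G, B
        coeffs = pywt.wavedec2(rgb[:, :, c], 'sym8', level=5)
        cA = coeffs[0]                                   # approximation
        cH, cV, cD = coeffs[1]                           # coarsest details
        feats += [cA.mean(), cA.std(),
                  skew(cA.ravel()), kurtosis(cA.ravel())]
        feats += [cH.std(), cV.std(), cD.std()]
    # entropy of the color image (histogram entropy)
    hist, _ = np.histogram(rgb, bins=256)
    p = hist[hist > 0] / hist.sum()
    feats.append(-(p * np.log2(p)).sum())
    # contrast and energy per channel, approximated from horizontal
    # neighbour differences instead of a full co-occurrence matrix
    for c in range(3):
        ch = rgb[:, :, c]
        d = ch[:, 1:] - ch[:, :-1]
        feats += [np.mean(d ** 2), np.mean(ch ** 2)]
    return np.array(feats)                               # 3*7 + 1 + 3*2 = 28
```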
2.4 SVM Classification

Numerical classification techniques have always been present in pattern recognition, and particularly in automatic face recognition.
According to Mari and Napoli [20], classification is undertaken to describe the relations between the different objects on the one hand, and between each object and its parameters on the other hand, so that accurate authentication can be done. This process, also called categorization, allows the construction of a partition of the whole set of objects into a set of homogeneous classes. Indeed, we carried out the classification step using a supervised learning technique called Support Vector Machines, based on the search for an optimal separating hyperplane between the different types of samples [16]. This technique has proven performance even on a large number of samples, contrary to the unsupervised k-means classification technique, which only allows a simple separation between classes. We note that the linear, polynomial and RBF kernels are the most used kernels in automatic classification based on SVM techniques. Various tests led to the choice of an SVM architecture whose kernel is a Radial Basis Function (RBF), defined by (1):

$K_{RBF}(x, y) = e^{-\frac{\|x - y\|^2}{2\sigma^2}}$  (1)
where $\sigma$ is the parameter to vary for the RBF kernel. This kernel is suitable for almost all types of classification problems and offers a good detection rate [16].
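A minimal classification sketch with scikit-learn's SVC, assuming X_train/y_train and X_test/y_test hold the 28-dimensional feature vectors and avatar labels produced by the previous step; the C value is an illustrative hyperparameter, and gamma plays the role of $1/(2\sigma^2)$ in the RBF kernel of eq. (1):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X_train: (1200, 28) feature vectors, y_train: the 100 avatar classes;
# X_test / y_test: the 600 test images (assumed to exist already).
clf = make_pipeline(StandardScaler(),
                    SVC(kernel='rbf', gamma='scale', C=10.0))
clf.fit(X_train, y_train)
print('recognition rate:', clf.score(X_test, y_test))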
3 Experimental Tests and Results

In this section, we briefly present an overview of the collected dataset and we describe the different tests and obtained results, focusing on the classification stage.

3.1 Dataset Description

In order to evaluate the robustness and reliability of our avatar face recognition system, we collected a large and diversified dataset of virtual characters, which even addresses the problems of disguise or identity usurpation in virtual worlds. By considering the best-known human face databases in the literature on biometric identification [21, 22, 23, 24], we noticed the significant effect of variation in pose, contrast and brightness on recognition results, as well as the importance of similarity in image type (same size, resolution and format) in order to guarantee the performance of a face recognition system. Our new gallery contains a set of 1800 facial images belonging to 100 classes. Each avatar is represented by a series of 18 different images collected through the web site "MyWebFace" [12]. All images of our database are in RGB-encoded BMP format, taken in frontal position with a white homogeneous background and under the same lighting conditions. The image size is fixed to 150 x 175 pixels at a resolution of 90 PPI in order to improve the recognition rate obtained by our system. This dataset is divided into two parts: the first contains 1200 avatar faces (12 facial expressions for each subject) used for the training phase, whereas the second part is reserved for the testing phase and contains the remaining 600 images (6 images with accessories for each avatar), used to evaluate the performance measures of the classification tasks. We chose randomly six images for testing and twelve images for training.
3.2 Tests

Tables 1 and 2 summarize the main tests carried out to retain the best wavelet family and its corresponding decomposition level.

Table 1. Recognition rate and classification threshold for tested wavelet families

Wavelet Family                 Recognition Rate   Classification Threshold
Discrete Meyer Wavelet         94.1517%           0.527
Haar (or db1)                  93.65%             0.520
Bi-orthogonal   Bior1.1        93.6517%           0.518
                Bior1.5        93.83%             0.532
                Bior2.2        94.3167%           0.539
                Bior6.8        94.3333%           0.552
Coiflet         Coif1          94.3467%           0.537
                Coif3          94.1683%           0.544
                Coif5          93.5217%           0.539
Daubechies      Db2            94%                0.529
                Db5            94.3467%           0.538
                Db9            94.34%             0.544
                Db15           93.335%            0.521
Reverse         Rbio1.1        93.65%             0.520
Bi-orthogonal   Rbio3.3        94.16%             0.540
                Rbio4.4        93.855%            0.520
Symlet          Sym2           94.015%            0.530
                Sym5           94.325%            0.539
                Sym8           94.5417%           0.557
                Sym20          93.4017%           0.533
The various tests with these discrete wavelet families showed that the recognition results are extremely similar (ranging between 93% and 95%), but the best recognition rate was recorded with the wavelet family "Symlet 8", which yielded a recognition rate of 94.5417% with an optimal classification threshold of 0.557. Once the wavelet family and its index are selected, the best decomposition level of this family should be determined.

Table 2. Recognition results of the retained wavelet family for different decomposition levels

Decomposition Level   Recognition Rate   Classification Threshold   Equal Error Rate
Level 1               93.6500%           0.520                      6%
Level 2               94.1517%           0.527                      5.4%
Level 3               94.2283%           0.539                      5.8%
Level 4               94.5083%           0.593                      5.5%
Level 5               95.4917%           0.618                      4.5%
Level 6               94.8367%           0.624                      5.16%
Level 7               94.8983%           0.620                      5.1%
3.3 Evaluation of Our Classification Methodology

To identify the problems in avatar face recognition encountered by our approach, we show in Table 3 the Recognition Rate (RR), True Rejection Rate (TRR) and False Acceptance Rate (FAR) calculated for a sample of five randomly selected virtual subjects.

Table 3. Recognition results for a sample of avatars

Avatar   RR (%)    TRR (%)   FAR (%)
N°2      92.333    33.333    7.4074
N°3      98.667    0         1.3468
N°42     78.6667   0         21.5488
N°68     99.8333   16.6667   0
N°79     100       0         0
At first glance, the calculated rates show a remarkable variation from one subject to another. Furthermore, we note that the presence of accessories like caps, glasses and other structural components affects the obtained recognition rates. Indeed, comparing these results, we observe that the lowest average Recognition Rate (RR) and, most importantly, the worst TRR and FAR are obtained with avatar N°42 of our dataset. This is explained by the fact that the large-size accessories present in some images of this avatar considerably affect the authentication results, while unremarkable or barely visible accessories, as in the case of subject N°79, generate ideal TRR and FAR (0%) and an optimal RR (100%). Figure 3 shows the difference in the nature of the added accessories.
Fig. 3. Illustration of accessory variation ((a): Subject N°42 presenting an important accessory variation; (b): Subject N°79 presenting an unremarkable accessory)
According to the False Acceptance experiments, we also notice that misclassified test images are assigned to more than one reference image, owing to the nature of the classifier used in the classification stage for the calculation of close distances between feature vectors. In addition, we notice that wide variations in facial expressions, image size, digital resolution and lighting conditions, as well as the choice of images used for the training and testing processes, greatly influence the recognition accuracy of our recognition system [13, 14]. For a better evaluation of the classification quality, we present the Confusion Matrix (contingency table) evaluated on the first five avatars of our dataset, as shown in Table 4.

Table 4. Confusion Matrix
                     Classification Avatar
Reference   N°1   N°2   N°3   N°4   N°5
N°1          6     0     0     2     1
N°2          1     4     2     0     1
N°3          0     1     6     0     0
N°4          2     0     1     5     0
N°5          0     1     0     1     6
This contingency table is obtained by comparing the classified data with reference data, which should be different from the data used to perform the classification process [25]. According to Table 4, the system misclassifies subject N°1 by confusing it with one reference image of subject N°2 and two reference images of subject N°4. Due to the size limitations of the paper, we are limited to the representation of a 5x5 confusion matrix, a part of our 100x100 confusion matrix representing the total number of subjects in our dataset. Generally, the confusion matrix illustrates the errors and the possible risks of confusion of each subject with the other subjects. We note that the risk of errors is relatively high, due to the possibility of assigning a test image of any subject to a set of reference images corresponding to different subjects.

3.4 Recorded Results

The experiments of Boukhris [6] suggest that the "Daubechies 9" wavelet family with a second decomposition level provides a recognition rate of 92.5%. In our approach, we increased the number of experiments to finally select the "Symlet 8" wavelet family, among several wavelet families, with a fifth decomposition level. The average recognition rate is 95.4917%, obtained for a classification threshold of 0.618. Beyond this threshold, the effectiveness of the classification is no longer acceptable.
Indeed, we used an automatic classification threshold while passing from 0 to 1 with a step of 0.001. The optimal threshold is that which corresponds to the highest Recognition Rate. It can be also determined thanks to the distribution curve (intersection between FAR and TRR) as shown in Figure 4 which allows us to judge the performance of our system differently.
4 Conclusion and Future Works This review paper exposes a new subfield of security research concerning virtual worlds in which we describe a facial biometric recognition approach for virtual characters (avatars), based on exploration of several global features in the image’s texture resulting from DWT and SVM techniques for classification stage. We focused also on the characterization level to determine the best family and wavelet decomposition level fitting best for this methodology followed. System performances and classification effectiveness were tested on a database of 1800 images of avatar visages. The tests have led to the choice of “Symlet 8” wavelet family with a decomposition level 5 and Radial Basis Function for SVM kernel among linear, polynomial and Gaussian Kernels. The experimental results are very promising. Actually, we intend to improve and consolidate the performance of our authentication system and deal with problems encountered on the level of accessory and poses variation by using simultaneously other biometric techniques in order to produce multimodal system capable of authenticating both biological (human being) and nonbiological (avatars) entities.
Evaluation of SVM Classification of Avatar Facial Recognition
141
References 1. Oursler, J.N., Price, M., Yampolskiy, R.V.: Parameterized Generation of Avatar Face Dataset. In: 14th International Conference on Computer Games: AI, Animation, Mobile, Interactive Multimedia, Educational & Serious Games, Louisville, KY, pp. 17–22 (2009) 2. Gavrilova, M.L., Yampolskiy, R.V.: Applying Biometric Principles to Avatar Recognition. In: International Conference on Cyberworlds, pp. 179–186. IEEE Press, Canada (2010) 3. Peachey, A., Gillen, J., Livingstone, D., Smith-Robbins, S.: Researching Learning in Virtual Worlds. Human–Computer Interaction Series, pp. 369–380. Springer, Heidelberg (2010) 4. Klare, B., Yampolskiy, R.V., Jain, A.K.: Face Recognition in the Virtual World: Recognizing Avatar Faces, Technical Report from MSU Computer Science and Engineering (2011) 5. Science & Technology News, http://www.virtualworldsmanagement.com/2007/index.html 6. Boukhris, M., Essoukri Ben Amara, N.: First contribution to Avatar Facial Biometric, Endof-studies Project. National Engineering school of Sousse, Tunisia (2009) 7. Lu, J., Kostantinos, A., Plataniotis, K.N., Venetsanopoulos, A.N.: Face Recognition Using LDA Based Algorithms. IEEE Transactions on Neural Networks 14(1), 195–200 (2003) 8. Turk, M.: Interactive-time vision: face recognition as a visual behavior, PhD Thesis, Massachusetts Institute of Technology (1991) 9. Baron, R.J.: Mechanisms of human facial recognition. International Journal International Journal of Man Machine Studies 15, 137–178 (1981) 10. Kanade, T.: Picture Processing System by Computer Complex and Recognition of Human Faces. PhD Thesis, Kyoto University, Japan, pp. 25–37 (1973) 11. Saidane, W.: Contribution à la biométrie du visage, End-of-studies Project. National Engineering School of Sousse, Tunisia (2008) 12. MyWebFace, http://home.mywebface.com/faceApp/index.pl 13. Ajina, S., Yampolskiy, R.V., Essoukri Ben Amara, N.: Avatar Facial Biometric Authentication. In: 2nd International Conference on Image Processing Theory, Tools and Applications, France, pp. 2–5 (2010) 14. Ajina, S., Yampolskiy, R.V., Essoukri Ben Amara, N.: Authentification de Visages d’Avatars. In: 17th Conception and Innovation Colloquy, CONFERE 2010 Symposium, Tunisia (2010) 15. Heseltine, T.D.: Face Recognition: Two-Dimensional and Three-Dimensional Techniques. PhD Thesis, University of York, UK (2004) 16. Huang, J., Blanz, V., Heisele, B.: Face Recognition Using Component-Based SVM Classification and Morphable Models. In: Lee, S.-W., Verri, A. (eds.) SVM 2002. LNCS, vol. 2388, p. 334. Springer, Heidelberg (2002) 17. Jourani, R.: Face recognition. End-of-studies project, Faculty of Sciences Rabat- Mohammed V - University Agdal (2006) 18. Audithan, S., Chandrasekaran, R.M.: Document Text Extraction from Document Images Using Haar Discrete Wavelet Transform. European Journal of Scientific Research 36(4), 502–512 (2009) 19. Ghomrassi, R.: Analyse de Texture par Ondelettes. End-of-studies project, Tunisia (2008) 20. Mari, J.F., Napoli, A.: Aspects de la classification. In: 7th Journey of Francophone Classification, France (1999) 21. The Color FERET Database, http://face.nist.gov/colorferet/request.html
142
S. Ajina, R.V. Yampolskiy, and N. Essoukri Ben Amara
22. The Yale Face Database, http://cvc.yale.edu/projects/yalefaces/yalefaces.html 23. The AT&T Database, http://www.cl.cam.ac.uk/research/dtg/attarchive/ facedatabase.html 24. The ORL Database, http://www.cl.cam.ac.uk/research/dtg/attarchive/ facedatabase.html 25. Traitement des données de télédétection. Girard M. C, DUNOD Edition, pp. 328–333, Paris (1999)
Optimization Control of Rectifier in HVDC System with ADHDP* Chunning Song1, Xiaohua Zhou1,2, Xiaofeng Lin1, and Shaojian Song1 1
College of Electrical Engineering, Guangxi University, Guangxi Nanning 530004, China 2 Department of Electronic Information and Control Engineering, Guangxi University of Technology, Guangxi Liuzhou 545006, China
Abstract. A novel nonlinear optimal controller for a rectifier in an HVDC transmission system, using artificial neural networks, is presented in this paper. Action dependent heuristic dynamic programming (ADHDP), a member of the adaptive critic designs family, is used for the design of the rectifier neurocontroller. This neurocontroller provides optimal control based on reinforcement learning and approximate dynamic programming (ADP). A series of simulations for a rectifier in a double-ended unipolar HVDC system model, with the proposed neurocontroller and with a conventional PI controller, was carried out in the MATLAB/Simulink environment. The simulation results show that the proposed controller performs better than the conventional PI controller: the DC line current in the HVDC system with the proposed controller can quickly track changes of the reference current, and collapse of the DC line current is prevented when large disturbances occur.

Keywords: Optimal control, rectifier, HVDC transmission system, approximate dynamic programming (ADP), action dependent heuristic dynamic programming (ADHDP).
1 Introduction

HVDC technology is used in transmission systems to transmit bulk electric power over long distances by cable or overhead lines. It is also used to interconnect asynchronous AC systems having the same or a different frequency. Figure 1 shows a simplified schematic picture of an HVDC system, with the basic principle of transferring electric energy from one AC system or node to another, in any direction. The system consists of three blocks: the two converter stations and the DC line. Within each station block, several components are involved in the conversion of AC to DC and vice versa.
This work was supported in part by the Natural Science Foundation of China under Grant 60964002; the Natural Science Foundation of Guangxi Province of China under Grant 0991057; the Science & Research Foundation of Educational Commission of Guangxi Province of China under Grant 200808MS03.
Fig. 1. Structure of an HVDC system
In order to improve the performance of traditional controllers for converters in HVDC systems, many researchers have worked in this field. In recent years, a large number of HVDC controller schemes based on various control techniques have been proposed [1-3] to improve the transient and dynamic stability of power systems. More recently, intelligent control techniques such as fuzzy logic and neural networks have been applied to HVDC control. However, most of the results are still at an early stage of theoretical investigation [4-7]. Artificial neural networks (ANNs) are good at identifying and controlling complex nonlinear systems [8]. They are suitable for multi-variable applications, where they can easily identify the interactions between the inputs and outputs. This paper presents a new technique, not found before in the design of a neurocontroller for the rectifier of an HVDC system, that is based on critic training. The design of a neurocontroller using the action dependent heuristic dynamic programming (ADHDP) method, a type of ADP, is discussed, and simulation results are presented to show that critic-trained neural networks have the ability to control the rectifier of an HVDC system.
2 Principle of ADHDP Method

Suppose that one is given a discrete-time nonlinear system:
$x(t+1) = F[x(t), u(t), t]$  (1)
where $x$ represents the state vector of the system and $u$ denotes the control action. Given an initial state $x(i)$, the goal of dynamic programming is to choose the control sequence $u(k)$, $k = i, i+1, \ldots$, that minimizes the system cost-to-go function:

$J[x(i), i] = \sum_{k=i}^{\infty} \gamma^{k-i}\, U[x(k), u(k), k]$  (2)
where $U$ is the utility function and $\gamma$ is the discount factor, with $0 < \gamma \leq 1$. The goal of the ADHDP method is to obtain an approximate solution of the Bellman equation:
$J^*[x(t), t] = \min_{u(t)} \left( U[x(t), u(t), t] + \gamma\, J^*[x(t+1), t+1] \right)$  (3)
Obviously, this method obtains a sub-optimal solution and significantly reduces the computational cost of dynamic programming. Figure 2 shows the schematic principle of ADHDP [9, 10]; its components include an action network and a critic network. The input of the action network is the system state $x(t)$, and its output is the control variable $u(t)$ for the current state. The inputs of the critic network are the system state $x(t)$ and the control $u(t)$ for the current state; its role is to evaluate the control, and its output is the approximation of the cost function $J(t)$. The dashed line in Fig. 2 shows the error back-propagation training path. The objective of the action network is to minimize the cost function $J$; its weights are updated based on the output of the critic network. The weights of the critic network are determined by the difference between the actual output and the expected output.
Fig. 2. Structure of ADHDP controller
The critic network adopts the idea of supervised learning; the weight values of the network are adjusted using the error function defined as follows:
$e_c(t) = J(t) - \gamma\, J(t+1) - U(t)$  (4)
The error function used to adjust the weight values of the action network is the approximation of the cost function $J(t)$ itself:
$e_a(t) = J(t)$  (5)
3 Design of ADHDP-Based Rectifier Controller

Fig. 3 shows the structure of the ADHDP-based rectifier controller. When designing the ADHDP-based rectifier controller in this paper, we adopt the constant current control mode, and the critic network and action network are both designed as three-layer BP neural networks.
Fig. 3. Structure of ADHDP rectifier controller
Fig. 4 shows the structures of the action neural network (a) and the critic neural network (b) of the ADHDP controller. In order to obtain better dynamic performance and learning effect, the inputs of the action network include an error signal obtained from the reference current $I_{dref}$ and the values of the DC line current $I_d$ measured at times t, t-1 and t-2, denoted $\Delta I_d(t)$, $\Delta I_d(t-1)$ and $\Delta I_d(t-2)$; its output is the control signal $u(t)$, which represents the trigger angle $\Delta\alpha$ of the rectifier, and its hidden layer has six neurons. In the critic network, the inputs include $\Delta I_d(t)$, $\Delta I_d(t-1)$, $\Delta I_d(t-2)$ and $u(t)$; its output is the cost function $J$, with the error function $e_c(t) = J(t) - \gamma J(t+1) - U(t)$, and its hidden layer has ten neurons.
Fig. 4. Structure of action neural network (a) and critic neural network (b) for ADHDP controller
The neural network expressions of the critic network are:
$\begin{cases} Jh_1 = w_{c1} X_c(t) \\[4pt] f\_Jh_1 = \dfrac{1 - e^{-Jh_1}}{1 + e^{-Jh_1}} \\[4pt] J = w_{c2}\, f\_Jh_1 \end{cases}$  (6)
where $X_c(t)$ is the input of the critic network, $J$ is the output, $f\_Jh_1$ is the hidden-layer output, and $w_{c1}$ and $w_{c2}$ are the weights of the network at the first and second layers, respectively. According to the error function defined in equation (4), the critic network minimizes the following objective function:
$E_c(t) = \frac{1}{2} \sum_t \left[J(t) - \gamma\, J(t+1) - U(t)\right]^2 = \frac{1}{2}\, e_c^2(t)$  (7)
Using gradient descent, $w_{c1}$ and $w_{c2}$ are updated with the following formulas:
$w_{c1} = w_{c1} - l_c \cdot \dfrac{\partial E}{\partial w_{c1}}$  (8)

$\dfrac{\partial E}{\partial w_{c1}} = e_c(t)\, \dfrac{\partial J}{\partial w_{c1}} = e_c(t)\, \dfrac{\partial J}{\partial f\_Jh_1} \dfrac{\partial f\_Jh_1}{\partial Jh_1} \dfrac{\partial Jh_1}{\partial w_{c1}} = e_c(t)\, w'_{c2}(t)\, \dfrac{\partial f\_Jh_1}{\partial Jh_1}\, X'_c(t)$  (9)

$w_{c2} = w_{c2} - l_c \cdot \dfrac{\partial E}{\partial w_{c2}}$  (10)

$\dfrac{\partial E}{\partial w_{c2}} = e_c(t)\, \dfrac{\partial J}{\partial w_{c2}} = e_c(t)\, f\_Jh_1$  (11)

where $\gamma$ is the discount factor and $l_c$ is the learning rate of the critic network.
⎧uh1 = wa1 X a (t ) ⎪ 1 − e − uh1 ⎪ f _ uh 1 = ⎨ 1 + e −uh1 ⎪ ⎪⎩u = wa 2 f _ uh1
(12)
X a (t ) is the input of action network, u is the output, f _ uh1 is the layer output. wa1 and wa 2 are the network weights at first and second layer. Where
According to the error function definited as equation (5), the action network minimize objective function as follow:
148
C. Song et al.
E a (t ) =
1 2 ea (t ) 2
(13)
As the gradient descent, the updating of weight wc1 and wc 2 with the following formula:
wa1 = wa1 + Δwa1 Δwa1 = −la ⋅
∂Ea ∂Ea ∂ea ∂J ∂J = −la ⋅ = −la ⋅ ea ⋅ ∂wa1 ∂ea ∂J ∂wa1 ∂wa1
wa 2 = wa 2 + Δwa 2 Δwa2 = −la ⋅
∂Ea ∂Ea∂ea ∂J ∂J = −la ⋅ = −la ⋅ ea ⋅ ∂wa2 ∂ea ∂J ∂wa2 ∂wa2
(14)
(15)
(16)
(17)
Where l a is the learning rate of the action network. Utility function U (t ) can reflect the control effect of controller[11], in this paper is given in equation (18):
U(t) = [4ΔI d (t) + 4ΔI d (t −1) +16ΔI d (t − 2)]2 Where ΔI d (t ) , ΔI d (t
U (t ) defined (18)
− 1) and ΔI d (t − 2) are the value of input of the action net-
work at time t, t-1 and t-2. The action network optimizes the overall cost over the time horizon of the problem by minimizing the function J. It basically provides the optimal control output to the plant.
4 Simulation Results In simulation test,a dulble-ended unipolar HVDC system model is built to test or verify performance of ADHDP rectifier controller. The rectifier is 12 pulse converter in this model. The test is carried out in the MATLAB/Simulink environment,and the rectifier adopts constant current control mode,the inverter adopts constant γ control mode. In order to show the control effects of this controller, the simulation results of ADHDP controller compared with traditional PI controller. Case 1: Simulation results during series changing of reference current In this case, the reference current changes at different simulation time to show steady state performance of controller,thoes changes are indicated in table 1. The PI controller and the ADHDP-based controller are worked in this system respectively.
Optimization Control of Rectifier in HVDC System with ADHDP
149
Fig.5 shows the currents curve of DC line at the control of PI and ADHDP controller,as we can see, the maximum of direct current of DC line at the control of PI controller is 1.35pu,and it further surpasses the reference current 1.0pu.However, the maximum of direct current of DC line at the control of ADHDP controller is close to the reference current 1.0pu,in addition,the response speed of ADHDP controller is faster than that of PI controller when reference current changes. Table 1. Changing of reference current time 0s 0.4s 0.7s
Changing of reference current 1p.u Down to 0.8 p.u Return to 1p.u
1.4
Id&Idref/p.u
1.2
Id(PI) Idref
1 0.8
Id(ADHDP)
0.6 0.4 0.2 0
0.2
0.4
0.6
0.8
1
t/s
Fig. 5. Current of DC line during changing of reference current
Case 2: Simulation results during a DC line short circuit fault. A DC line short circuit is a typical fault of an HVDC system. In this case, the DC line short circuit fault occurred at 0.5 s and was subsequently cleared. The PI controller and the ADHDP-based controller are applied to the system in turn. Fig. 6 illustrates the DC line current curve under the ADHDP controller, and Fig. 7 shows the DC line current curve under the PI controller. We can see that the ADHDP-based controller stabilizes the DC line current at 1.0 p.u. by 0.6 s, whereas the PI controller does not stabilize the current at 1.0 p.u. until 0.7 s. The ADHDP-based controller thus achieves stability faster and with less overshoot than the traditional PI controller.
Fig. 6. Current of DC line during a DC line short circuit with the ADHDP controller
Fig. 7. Current of DC line during a DC line short circuit with the PI controller
5 Conclusions

In this paper, a novel optimal neurocontroller based on the adaptive critic designs technique is designed for the rectifier of a double-ended unipolar HVDC system. Its structure is relatively simple and does not depend on the mathematical model of the plant; in addition, it can be trained online in the MATLAB/Simulink environment. The numerical simulation results with the ADHDP-based neurocontroller show that the DC line current can track changes of the reference current and that collapse of the DC line current is prevented when large disturbances occur; it is more effective in
controlling the rectifier of the HVDC system than the conventional PI controller. The proposed neurocontroller achieves optimal control through online learning and can adapt to changes in the size and complexity of the power system; the prospects for this controller in the field are therefore very broad.
Network Traffic Prediction Based on Wavelet Transform and Season ARIMA Model Yongtao Wei, Jinkuan Wang, and Cuirong Wang 1
School of Information Science and Engineering, Northeastern University, Shenyang, China [email protected], {wjk,wcr}@mail.neuq.edu.cn
Abstract. To deal with the characteristics of network traffic, a prediction algorithm based on the wavelet transform and the seasonal ARIMA model is introduced in this paper. The complex correlation structure of the historical network traffic is exploited with the wavelet method. For the traffic series at each time scale, self-similarity is analyzed and a suitable prediction model is selected; the resulting series is then reconstructed with the wavelet method. Simulation results show that the proposed method achieves higher prediction accuracy than a single prediction model. Keywords: Network Traffic; Wavelet; Season ARIMA.
1 Introduction
In the area of data communications, and particularly in network traffic prediction, fractal models and wavelet analysis play an important role. The work of Leland et al. [1] and subsequent studies demonstrated that network traffic loads exhibit fractal properties such as self-similarity, burstiness, and long-range dependence (LRD) [2]. Classical Poisson or Markov models are inadequate, and these properties influence network performance strongly; for example, performance predictions based on classical traffic models are often far too optimistic when compared with actual performance on real data. Fractal traffic models have provided exciting new insights into network behavior and promise new algorithms for network data prediction and control. Other studies show the application of wavelet analysis to traffic modeling [3-5]. Accurately predicting the traffic on a virtual network plays an important role in cost control and dynamic bandwidth allocation. However, traffic variations at different timescales are caused by different network mechanisms. Variations at small timescales (on the order of ms or below) are caused by buffers, scheduling algorithms, etc. Variations at larger timescales (on the order of 100 ms) are caused by traffic and congestion control protocols, e.g. TCP. Variations at even larger timescales are caused by routing changes and by daily and weekly cyclic shifts in user populations. Finally, long-term traffic changes are caused by long-term growth of the user population as well as by increases in the bandwidth requirements of users due to the emergence of new network applications.
Considering the complex scaling behavior of network traffic, several wavelet-based traffic predictors have been proposed. Mao [6] proposed a wavelet-based ARIMA method, predicting the traffic component at each time scale with an ARIMA scheme; Yin et al. [7] used a similar approach to predict wireless traffic. In this paper, we develop a new wavelet-based signal model for positive, stationary, LRD data, even though characterizing positive data in the wavelet domain is problematic for general wavelets. The wavelet transform, which has been researched thoroughly for economic forecasting, offers a new kind of breakthrough when applied to network traffic: it decorrelates long-range-dependent traffic, so problems that are hard to solve in the time domain can be solved after a layer-by-layer transform to the frequency domain. The approximation coefficients of the wavelet algorithm (with the Haar wavelet filter) can then be treated as a stationary series [6]. With the ARIMA(p, d, q) modeling method, differencing is applied so that an ARMA model can be established on the smoothed approximation coefficients. Combining wavelet analysis with classical time series modeling thus provides a new approach to modeling and forecasting non-stationary time series, and improves forecast accuracy. The rest of this paper is organized as follows: Section 2 describes the framework of the wavelet-based combination model, including the Mallat algorithm for decomposition and the combination model for prediction; we then present the experiments and conclusions.
2 Wavelet and SARIMA Model Based Prediction
2.1 Wavelet Transformation
The wavelet method is used in time series processing for decomposition and reconstruction [8]. Using multi-scale wavelet decomposition and synthesis, the time series can be decomposed, at different scales, into detail coefficients and approximation coefficients. The Mallat decomposition and reconstruction algorithm is as follows. Suppose f(t) ∈ L²(R) is the original signal; its wavelet transform is given by formula (1):
$(Wf)(b,a) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{+\infty} f(t)\, \psi\!\left(\frac{t-b}{a}\right) dt, \qquad f(t) \in L^2(R)$   (1)
where a, b ∈ R, a ≠ 0. The parameter a is called the dilation factor and b the translation (time-lapse) factor; (Wf)(b,a) is the wavelet transform coefficient and ψ(t) is the basic wavelet. In practice, f(t) is usually given in discrete form, so the multi-scale pyramid method (the Mallat discrete algorithm [8-9]) can be employed for the calculation. It can be expressed as:
$f_{j+1} = \theta_j + f_j, \qquad j = N-1, N-2, \ldots, N-M$   (2)
where N is the initial scale and M is the decomposition depth; θ_j and f_j are obtained by formula (3):

$f_j = \sum_m C_m^j \phi_{j,m}, \qquad \theta_j = \sum_m d_m^j \psi_{j,m}, \qquad j = N-1, N-2, \ldots, N-M$   (3)

where $\phi_{j,m}$ is the scaling function and $\psi_{j,m}$ the wavelet function; the coefficients $C_m^j$ and $d_m^j$ can be expressed as:

$C_m^j = \sum_k h_{k-2m} C_k^{j+1}, \qquad d_m^j = \sum_k g_{k-2m} C_k^{j+1}, \qquad j = N-1, N-2, \ldots, N-M$   (4)
The filters $h_{k-2m}$ and $g_{k-2m}$ depend on the choice of the generating wavelet. θ_j and f_j (j = N-1, N-2, …, N-M) are called the low-frequency and high-frequency components at scale N, respectively.
2.2 SARIMA Model
The ARMA(p, q) model consists of two parts, an autoregressive (AR) part and a moving average (MA) part, where p is the order of the autoregressive part and q the order of the moving average part [10]. The AR(p) and MA(q) parts are described by formulas (5) and (6):

$X_t = \varphi_1 X_{t-1} + \varphi_2 X_{t-2} + \cdots + \varphi_p X_{t-p} + \mu_t$   (5)

$\mu_t = \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q}$   (6)
The ARIMA model [11,12] is built on a Markov random process; it reflects dynamic characteristics by absorbing the advantages of regression analysis while carrying forward the merits of moving averages. An ARIMA(p, d, q) model is an ARMA(p, q) model that has been differenced d times. In many practical problems, such as network traffic, the observed sample series {X_t, t = 0, 1, 2, …} is generally non-stationary, but after d finite-difference operations the sequence becomes stationary. For a non-negative integer d, {X_t} is an ARIMA(p, d, q) sequence if

$\Phi(B)(1-B)^d X_t = \theta(B)\varepsilon_t$   (7)
where Φ(B) and θ(B) are the characteristic polynomials of degrees p and q respectively, p and q being positive integers:

$\Phi(B) = 1 - \Phi_1 B - \cdots - \Phi_p B^p, \qquad \theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q$   (8)
where B is the delay (backshift) operator, B x_t = x_{t-1}, and ε_t is a Gaussian white noise sequence following a WN(0, σ²) distribution. The seasonal part of the SARIMA model is expressed as:

$U(B^S)(1-B^S)^D X_t = V(B^S)\varepsilon_t$   (9)

where $U(B^S) = 1 - \Gamma_1 B^S - \cdots - \Gamma_k B^{kS}$ and $V(B^S) = 1 - H_1 B^S - \cdots - H_m B^{mS}$.
2.3 Wavelet Transform and Prediction Model
The network traffic time series X_t is first decomposed into n layers with the wavelet method. For each layer, the traffic is predicted with a corresponding model, and the results are then reconstructed with the wavelet method.
1. Wavelet decomposition. We use a redundant wavelet transform, i.e. the Mallat wavelet transform, to decompose the history traffic.
2. Multi-scale prediction. The wavelet coefficients and the scaling coefficients are predicted independently at each scale. A single-branch reconstruction projects the signal onto a specific frequency range, so analyzing the single-branch reconstructed signal excludes interference from the other frequency components. The reconstructed detail coefficients are generally treated as a smooth process and can be modeled and predicted with the SARIMA and ARIMA models.
3. Reconstruction. We use the predicted wavelet coefficients and scaling coefficients to construct the predicted value of the original traffic.
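As an illustrative sketch of this three-step procedure (decompose, predict per scale, reconstruct), the Python fragment below combines the pywt and statsmodels packages; the wavelet name, decomposition depth, and fixed model order are assumptions for illustration, not the per-scale models identified in Section 3.

    import numpy as np
    import pywt
    from statsmodels.tsa.arima.model import ARIMA

    def wavelet_predict(x, wavelet="db4", level=3, horizon=1):
        """Sketch: forecast each single-branch reconstruction, sum the results."""
        coeffs = pywt.wavedec(x, wavelet, level=level)   # [cA3, cD3, cD2, cD1]
        forecast = np.zeros(horizon)
        for i in range(len(coeffs)):
            # Single-branch reconstruction: keep one band, zero the others, so
            # each band is analyzed free of the other frequency components.
            kept = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
            band = pywt.waverec(kept, wavelet)[: len(x)]
            # Fixed illustrative order; in practice each band gets its own
            # Box-Jenkins-identified (S)ARIMA model.
            fit = ARIMA(band, order=(1, 0, 1)).fit()
            forecast += fit.forecast(steps=horizon)
        return forecast

    # Example on synthetic "traffic": a periodic component plus noise.
    t = np.arange(512)
    traffic = 10 + 2 * np.sin(2 * np.pi * t / 64)
    traffic += np.random.default_rng(1).normal(0, 0.3, t.size)
    print(wavelet_predict(traffic, horizon=4))

Because the single-branch reconstructions sum to the original signal, summing the per-band forecasts yields a forecast of the traffic itself.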
3 Network Traffic Prediction
A virtual network often carries a single service, so we use packet traces from the WIDE backbone [13] for the experiment — a daily trace of a trans-Pacific line (/samplepoint-B/2006/) — from which we extract the HTTP traffic as the original traffic data. Figure 1 shows the original traffic data over the observed time period.
Fig. 1. Traffic rate of the trace measured by WIDE backbone
Fig. 2. Wavelet coefficients and Scaling coefficients
Using the db4 wavelet and formulas (2)-(4) for the time series decomposition, we obtain the detail coefficients cd1, cd2, cd3 and the approximation coefficients ca3. Figure 2 shows the wavelet coefficients and the scaling coefficients of the WIDE traffic trace. A visual inspection indicates that the wavelet coefficients can reasonably be treated as a stationary time series with zero mean; they can therefore be modeled with an ARMA(p, q) model, or equivalently an ARIMA(p, 0, q) model. There is significant non-stationarity in the scaling coefficients, which becomes more obvious when they are examined over a longer time period. The Box-Jenkins (B-J) forecasting methodology is used to establish the ARIMA(p, d, q) model for prediction at each scale. The B-J methodology involves four steps [12]:
1. Tentative identification of the model parameters, done by examining the sample autocorrelation function and the sample partial autocorrelation function [12] of the time series Xt.
2. Estimation. Once the model is established, the model parameters can be estimated using either a maximum likelihood approach or a least-mean-square approach. In this paper both approaches were tried and their results are almost exactly the same, so we stick to the least-mean-square approach for its simplicity.
3. Diagnostic checking, used to see whether the model that has been tentatively identified and estimated is adequate. This is done by examining the sample autocorrelation function of the error signal, i.e. the difference between the predicted and real values; if the model is inadequate, it must be modified and improved.
4. Once a model is determined, it is used to predict the future time series.
Table 1 shows the model parameters of the ARIMA model at each scale. Three scales are chosen; the choice of the number of scales is a tradeoff between
the model complexity and accuracy: further increasing the number of scales significantly increases the complexity of the algorithm with only a marginal increase in accuracy.

Table 1. The prediction models

Scale                 | Model
Wavelet coefficient 1 | ARMA(1,0,4)
Wavelet coefficient 2 | ARIMA(3,0,4)
Wavelet coefficient 3 | ARIMA(3,0,7)
Scale coefficient 3   | ARIMA(2,1,7)
Most of the noise in the model comes from the wavelet coefficients at scale 1. In comparison with the wavelet and scaling coefficients at other scales, the scale-1 wavelet coefficients have very weak autocorrelations and a white-noise-like power spectral density — they are almost white noise. It is the wavelet coefficients at scale 1 that limit the overall performance achievable by the prediction algorithm. Figure 3 shows the reconstructed prediction result.
Fig. 3. Reconstructed series after prediction
4 Result Analysis
To achieve a fair comparison, the same trace is also predicted with a single ARIMA model; the parameters of the ARIMA model without wavelet decomposition are p = 1, d = 1, q = 5. To measure the performance of the prediction algorithm, two metrics are used. One is the normalized mean square error (NMSE):
$\mathrm{NMSE} = \frac{1}{N} \sum_{n=1}^{N} \frac{\left(X(n) - \hat{X}(n)\right)^2}{\mathrm{var}(X(n))}$   (10)
where $\hat{X}(n)$ is the predicted value of X(n) and var(X(n)) denotes the variance of X(n). The other is the mean absolute relative error (MARE), defined as:
$\mathrm{MARE} = \frac{1}{N} \sum_{n=1}^{N} \frac{\left| X(n) - \hat{X}(n) \right|}{X(n)}$   (11)
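A direct implementation of metrics (10) and (11) is straightforward; in the sketch below var(X(n)) is read as the variance of the observed series, which is the usual interpretation of (10).

    import numpy as np

    def nmse(x, x_hat):
        # Normalized mean square error, formula (10).
        x, x_hat = np.asarray(x, float), np.asarray(x_hat, float)
        return np.mean((x - x_hat) ** 2) / np.var(x)

    def mare(x, x_hat):
        # Mean absolute relative error, formula (11).
        x, x_hat = np.asarray(x, float), np.asarray(x_hat, float)
        return np.mean(np.abs(x - x_hat) / x)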
Table 2 shows the performance of the prediction algorithm. For comparison, the performance of the traffic prediction algorithm using the ARIMA model without wavelet decomposition is also shown.

Table 2. The prediction performance

               | NMSE   | MARE
Proposed Model | 0.1526 | 0.1731
ARIMA Model    | 0.1650 | 0.2132
As shown in the table above, the proposed algorithm gives better performance than the ARIMA model without wavelet decomposition both in NMSE and MARE.
5 Conclusion
Because of the correlation features of network traffic, a traditional model — a single network traffic model — cannot make good predictions. This paper introduces wavelet technology: the wavelet transform is applied to traffic with long-range dependence, which is not easy to handle in the time domain, transforming the problem into the frequency domain. We proposed a combined network traffic prediction algorithm based on time-scale decomposition. The history traffic data is first decomposed into different timescales using the Mallat wavelet transform; the wavelet coefficients and the scaling coefficients are predicted independently at each timescale using the SARIMA and ARIMA models; and the predicted wavelet and scaling coefficients are summed to give the predicted traffic value. As traffic variations at different timescales are caused by different network mechanisms, the proposed time-scale decomposition approach can better capture the correlation structure of traffic caused by these mechanisms, which may not be obvious when examining the history data directly.
Acknowledgment
This work is supported by the National Natural Science Foundation of China under Grants No. 60874108 and 60904035.
References
1. Leland, W., Taqqu, M., Willinger, W., Wilson, D.: On the self-similar nature of Ethernet traffic (extended version). IEEE/ACM Trans. Networking, 1–15 (1994)
2. Papagiannaki, K., Taft, N., Zhang, Z.-L.: Long-term forecasting of Internet backbone traffic: Observations and initial models. In: IEEE INFOCOM 2003 (2003)
3. Hongfei, W.: Wavelet-based multi-scale network traffic prediction model. Chinese Journal of Computers 29(1), 166–170 (2006)
4. Leiting, Y.: A network traffic prediction of wavelet neural network model. Computer Applications 26(3), 526–528 (2006)
5. Garcia-Trevino, E.S., Alarcon-Aquino, V.: Single-step prediction of chaotic time series using wavelet-networks. In: Electronics, Robotics and Automotive Mechanics Conference, p. 6 (2006)
6. Mao, G.Q.: Real-time network traffic prediction based on a multiscale decomposition. In: Lorenz, P., Dini, P. (eds.) ICN 2005. LNCS, vol. 3420, pp. 492–499. Springer, Heidelberg (2005)
7. Yin, S.Y., Lin, X.K.: Adaptive Load Balancing in Mobile Ad hoc Networks. In: Proceedings of IEEE Communications Society / WCNC 2005, pp. 1982–1987 (2005)
8. Mallat, S., Hwang, W.L.: Singularity detection and processing with wavelets. IEEE Trans. on Information Theory 38(4) (1992)
9. Chatfield, C.: Model uncertainty, data mining and statistical inference. J. Roy. Statist. Soc. A 158 (1995)
10. Zouboxian, L.: ARMA model based network traffic prediction. Computer Research and Development (2002)
11. Groschwitz, N.K., Polyzos, G.C.: A time series model of long-term NSFNET backbone traffic. In: IEEE International Conference on Communications (1994)
12. Bowerman, B.L., O'Connell, R.T.: Time Series Forecasting: Unified Concepts and Computer Implementation, 2nd edn. PWS Publishers (1987)
13. http://mawi.wide.ad.jp/mawi/
Dynamic Bandwidth Allocation for Preventing Congestion in Data Center Networks Cong Wang1, Cui-rong Wang2, and Ying Yuan1 1
School of Information Science and Engineering, Northeastern University, Shenyang, 11004, China [email protected], [email protected] 2 Department of Information, Northeastern University at Qinhuangdao, 066004, China [email protected]
Abstract. Running multiple virtual machines over a real physical machine is a promising way to provide agility in current data centers. Virtual machines belonging to one application are striped over multiple nodes, and the traffic they generate often shares the substrate network with the traffic of other applications. Under such conditions, clients can experience severely degraded performance, such as TCP throughput collapse and network congestion due to competing traffic. The basic cause of this problem is that network traffic from multiple sources sharing the same link can cause transient overloads in that link. In this paper, we make the case that network virtualization opens up a new set of opportunities to solve such congestion problems. We present an architecture that compartmentalizes the virtual machines of the same application into the same virtual network by network slicing, and that divides the role of the traditional ISP into two — infrastructure providers and service providers — to achieve the commercial agility needed in particular by cloud computing. We also present a dynamic bandwidth allocation mechanism that can prevent congestion and maximize the utilization of the substrate network. Experimental results show that the network slicing mechanism and the bandwidth allocation algorithm can prevent network congestion significantly. Keywords: Virtualization; network slicing; data center network; bandwidth allocation.
1 Introduction
In recent years, more and more data centers are being built to provide increasingly popular online application services, such as search, e-mail, IM, Web 2.0, and gaming. These data centers often host bandwidth-intensive services such as distributed file systems (e.g., GFS [1]) and structured storage (e.g., BigTable [2]). Another related trend is CPU virtualization, which leads administrators to consolidate more virtual machines (VMs) on fewer physical servers and a single substrate network topology. The substrate network can become overwhelmed by high-throughput traffic that can be bursty and synchronized, leading to packet losses. The resulting network can easily lose effectiveness and possibly suffer congestion collapse. In addition, letting traffic belonging to different applications compete in the same substrate network leads to confusion in management or even to serious breakdowns.
The major cause of congestion is that traffic from various services carries multiple custom Quality of Service (QoS) requirements, and the lack of coordination among these flows may crash network links. Hardware and software mechanisms to control network congestion or support QoS are far from well developed [3]. In modern data centers, especially those moving towards cloud computing, not all network traffic is TCP-friendly; in fact, a growing proportion of traffic — streaming media, voice/video over IP, and peer-to-peer traffic — is not. Although TCP is self-regulating, even a small amount of non-TCP-friendly traffic can disrupt fair sharing of switched network resources [4]. We argue that network virtualization can attenuate the ossifying forces of current Ethernet and stimulate innovation by enabling diverse network architectures to cohabit on a shared physical substrate [5]. This paper leverages network virtualization to proactively prevent network congestion and provide more agility. In the presented architecture, virtual machines of the same application in the data center are partitioned into the same virtual network by network slicing. The business model can therefore be divided into two parts: the infrastructure provider, who builds the data center fabric, and the service provider, who rents virtual machines to run its own applications and provide Internet services. We also give a dynamic bandwidth allocation algorithm, hosted in the switches and based on Particle Swarm Optimization (PSO), that can prevent congestion and maximize the utilization of the substrate network.
2 The Network Virtualization Architecture
Network virtualization is a promising way to support diverse applications over a shared substrate by running multiple virtual networks customized for different performance objectives. Researchers have proposed that future networks run multiple virtual networks, with separate resources for each virtual network [6]. Leveraging network virtualization technology, the supervisor can classify the virtual machines of the same application into the same virtual network, so as to achieve the agility needed by cloud computing. In this setting the fabric owner acts as the substrate provider supplying the physical hardware, while service providers rent virtual machines and virtual networks to run their applications and provide services; this yields a flexible lease mechanism better suited to modern commercial operations. Today, network virtualization is moving from fantasy to reality, as many major router vendors start to support both switch virtualization and switch programmability [7]. Our network virtualization architecture is also based on such switches. Fig. 1 depicts the substrate node architecture, which is composed of three modules: substrate resources, virtual management, and virtual switches. The substrate resources module comprises the physical resources of the real switch, such as network interfaces, RAM, and IO (buffers). The virtual management module is a middleware layer that abstracts the substrate resources for the upper layer; it contains an adaptive-control unit to dynamically build or modify the parameters of virtual networks and a monitoring unit to monitor the running status of each virtual network.
162
C. Wang, C.–r. Wang and Y. Yuan
Fig. 1. Substrate switch architecture for virtualization
Virtual switches are logically independent: each of them has its own modules, such as a forwarding table, queues and a virtual CPU (CPU virtualization is omitted in Fig. 1), and can therefore run custom network protocols or forwarding strategies on its own. If each virtual switch is assigned appropriate resources, different virtual networks can be built above the fixed substrate physical network, as in Fig. 2.
Fig. 2. Virtual networks above substrate physical network
Fig. 2 illustrates two virtual networks above a fixed substrate physical network. The substrate network has five switches and six physical links, so its topology is fixed. By contrast, the two virtual networks have different topologies, with different numbers of switches and links. That is what network virtualization aims at: it provides much more agility to build and modify network topologies and running parameters. Note that the virtual machines connected to the switches are omitted. In this architecture, the data center owner can slice the substrate network into many virtual networks and lease them to service providers. A service provider can run its own service applications in the rented virtual network and can change the number of virtual nodes according to its practical requirements. The data center owner, as the substrate provider, only needs to control and monitor the virtual networks running in the data center. With a commercial multi-tenancy strategy and the dynamic bandwidth allocation algorithm between virtual networks presented in Section 3, the data center owner can achieve the best revenue while the service providers gain a sufficient QoS guarantee in their virtual networks.
3 Dynamic Bandwidth Allocation
Virtual networks are constructed over the substrate physical network by subdividing each physical switch and each physical link into multiple virtual switches and virtual links. The substrate runs schedulers that arbitrate access to the substrate resources, giving each virtual network the illusion that it runs on a dedicated physical infrastructure. In this case, the most important element of a QoS guarantee is the bandwidth assurance on each virtual link. In future data center networks oriented towards cloud computing, there will be many virtual machines running on physical machines. In a non-virtualized network, multiple virtual machines sharing the same physical link with the different traffic types discussed in Section 1 will lead to serious congestion and connection collapse. On the other hand, if fair sharing of bandwidth cannot be provided in a virtualized network, the results may be even worse, since some virtual networks may obtain little bandwidth on some virtual links. Smoothly running multiple virtual networks requires a dynamic adaptive bandwidth allocation algorithm hosted in the switches. This work is executed jointly by the adaptive-control unit and the monitoring unit of the virtual management module shown in Fig. 1 and discussed in Section 2. The monitoring unit monitors each physical port of the switch and gathers the status of each virtual link, and the adaptive-control unit calculates how much bandwidth each virtual link should be assigned on that port, based on the parameters the monitoring unit delivers. To provide agility, the adaptive-control unit rebalances the allocations between the virtual networks when the monitored parameters change. Let
$s^{(k)}$ denote the congestion price for virtual network k on that virtual port. The monitoring unit sends probe packets from each port periodically to collect the congestion price of each virtual network; a mechanism similar to TCP congestion control is used to calculate $s^{(k)}$ [8]. The congestion prices are summed over each virtual machine and interpreted as end-to-end packet loss or delay in each virtual link belonging to the physical port.
$s^{(k)}(t+T) = \left[\, s^{(k)}(t) - \beta \left( y^{(k)} - z^{(k)}(t) \right) \right]^{+}$   (1)

where t is time, T is the monitoring timescale, $y^{(k)}$ is the bandwidth assigned to virtual network k on that physical port, and $z^{(k)}$ is the link rate on that virtual link. $s^{(k)}$ is updated for virtual network k based on the difference between the virtual link load and the virtual link capacity. The stepsize β moderates the magnitude of the update and reflects the tradeoff between convergence speed (large stepsizes) and stability (small stepsizes). The $[\,\cdot\,]^{+}$ implies that $s^{(k)}$ must be nonnegative.
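A minimal sketch of update (1) for the virtual networks sharing one physical port is given below; the value of β and the example numbers are illustrative assumptions.

    import numpy as np

    BETA = 0.1  # illustrative stepsize

    def update_prices(s, y, z):
        """Per-virtual-network congestion-price update (1) on one port.

        s: current prices, y: assigned bandwidths, z: measured link rates."""
        return np.maximum(0.0, s - BETA * (y - z))   # [.]+ keeps prices nonnegative

    # One monitoring period with two virtual networks on a 10 MB/s port.
    s = np.array([0.5, 0.5])
    y = np.array([6.0, 4.0])        # allocated bandwidth
    z = np.array([7.0, 3.0])        # observed load
    print(update_prices(s, y, z))   # price rises for VN 1 (load > allocation)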
Each virtual edge node updates its path rates based on the local performance objective and the congestion level on its virtual paths, but all virtual networks are subject to the bandwidth constraint. We model this as the following optimization:

$\text{minimize} \quad \sum_k \omega^{(k)} \varphi^{k} \left[\, s^{(k)}(t) - \beta \left( y^{(k)} - z^{(k)}(t) \right) \right]^{+}$
$\text{subject to} \quad \sum_k y^{(k)} \le C$
$\text{variables} \quad y^{(k)}, \; \forall k$   (2)

where $\omega^{(k)}$ is a priority of virtual network k assigned by the substrate provider, and $\varphi^{k} = C / y^{(k)}$ is a penalty parameter that prevents a virtual network from subscribing too much bandwidth under heavy load. The goal of the model is to optimize the aggregate utility of bandwidth over all virtual networks: the substrate provider's dynamic bandwidth allocation determines how satisfied each virtual network is with its allocated bandwidth by minimizing the sum of the priority-weighted congestion prices $s^{(k)}$ of the virtual networks. We use Particle Swarm Optimization (PSO) [9] to solve this optimization problem. Suppose that the search space is N-dimensional and a particle swarm consists of n particles; then the i-th particle of the swarm can be represented by an N-dimensional vector $x = \{x_1, x_2, \ldots, x_N\}$, and its velocity by another N-dimensional vector $v = \{v_1, v_2, \ldots, v_N\}$. Every particle has a fitness determined by the objective function of the optimization problem, and knows its best previously visited position Pbest and its present position $x_i$. Every particle also knows the position Gbest of the best individual of the whole swarm. The velocity of each particle is influenced by the particle's own flying experience as well as its neighbors' flying experience; this velocity, added to the previous position, generates the current position of the particle as follows:
$v(t+1) = w \cdot v(t) + rand() \cdot c_1 \cdot \left(P_{Best}(t) - x(t)\right) + rand() \cdot c_2 \cdot \left(G_{Best} - x(t)\right)$
$x(t+1) = x(t) + v(t+1)$   (3)
where w is the inertia weight, rand() is a random number drawn uniformly from the interval [0,1], and c1 and c2 are two positive constants called acceleration coefficients. In PSO, each particle uses equation (3) to determine its next velocity and position. A violation term, a function of the constraint, is introduced. The fitness and violation are expressed as follows:
$F_{\mathrm{fitness}} = \min \sum_k \omega^{(k)} \varphi^{k} \left[\, s^{(k)}(t) - \beta \left( y^{(k)} - z^{(k)}(t) \right) \right]^{+}, \qquad F_{\mathrm{violation}} = \max\!\left(0, \; \sum_k y^{(k)} - C \right)$
The particle competition algorithm for optimal model (2) is:

    Function PSO(PBest^(n-1), GBest^(n-1))
      {assume that each particle is already initialized with a random bandwidth
       allocation in the swarm, and n is the number of particles};
      PBest^(n-1) = ∅, GBest^(n-1) = ∅;
      for (i = 1 to n) {
        Update velocity and position using expression (3);
        Calculate the F_fitness and F_violation of particle i;
        if F_fitness(P_i^(n-1)) > F_fitness(x_i) && F_violation(x_i) = 0
          then P_i^n = x_i; else P_i^n = P_i^(n-1);
        if F_fitness(x_i) < GBest^(n-1) && F_violation(x_i) = 0
          then GBest^n = x_i;
      }
      if GBest^n = ∅ then GBest^n = GBest^(n-1);
      return GBest^n, PBest^n;
Giving a congestion price s of virtual networks connected to one physical port of the switch, after server iteration, the stable bandwidth allocation rate will be gained, and the substrate provider can set the fair-balanced bandwidth at this port according to the rate. The monitoring unit will send probe packet to collect congestion price for each virtual networks every T timescale. When it finds that the congestion price is changed in a physical port, the bandwidth among virtual networks will be re-allocated by the above algorithm.
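A runnable counterpart of the particle competition algorithm is sketched below in Python; the swarm size, inertia weight, acceleration coefficients, the stepsize β inside the fitness, and the example inputs are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def pso_allocate(s, z, omega, C, n_particles=30, iters=100,
                     w=0.7, c1=1.5, c2=1.5, beta=0.1):
        k = len(s)
        x = rng.uniform(0, C, (n_particles, k))      # candidate allocations y
        v = np.zeros_like(x)

        def fitness(y):
            phi = C / np.maximum(y, 1e-9)            # penalty term phi^k = C / y^(k)
            return float(np.sum(omega * phi * np.maximum(0.0, s - beta * (y - z))))

        def feasible(y):
            return np.sum(y) <= C                    # F_violation = 0

        pbest = x.copy()
        pcost = np.array([fitness(p) for p in pbest])
        gbest = pbest[np.argmin(pcost)].copy()
        for _ in range(iters):
            r1, r2 = rng.random(x.shape), rng.random(x.shape)
            v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
            x = np.clip(x + v, 0.0, C)
            for i in range(n_particles):
                f = fitness(x[i])
                if f < pcost[i] and feasible(x[i]):  # better and feasible
                    pbest[i], pcost[i] = x[i].copy(), f
            gbest = pbest[np.argmin(pcost)].copy()
        return gbest

    # Example: two virtual networks sharing a 10 MB/s port.
    print(pso_allocate(s=np.array([0.4, 0.6]), z=np.array([6.0, 3.0]),
                       omega=np.array([1.0, 1.0]), C=10.0))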
4 Performance Evaluation
We implemented the bandwidth allocation algorithm discussed in Section 3 in an OpenFlow VM environment. OpenFlow is an open standard to support programmable switches: upon its low-level primitives, researchers can build networks with new high-level properties. It is in the process of being implemented by major switch vendors and hence has good prospects for practical application in future network fabrics. From
the 1.0 release, OpenFlow began to support QoS by queue slicing. Though the current release provides only bare-minimum support for QoS-related network slicing, i.e. rate limiting (10 MB) and a minimum bandwidth guarantee, we believe it will not be long before a more advanced version can be deployed in practical networks. Within the limitations of the current software, we performed a simple but sufficiently persuasive experiment. The test topology is shown in Fig. 3, where PC 1 has two ports, eth1 and eth2, and PC 2 has one port connected to the OpenFlow switch.
Fig. 3. Topology for experiment
On PC 1 we use two iperf clients to generate a TCP flow and a UDP flow destined for PC 2, which runs an iperf server. We oversubscribe a link so that traffic becomes congested on Link 3. We measure each flow's assigned bandwidth with and without network virtualization to verify whether the network virtualization mechanism works more efficiently. We set both virtual networks' priorities to 1 and use a large time scale T = 1 s for easy accounting.
Fig. 4. Assigned bandwidth of both traffic under non-virtualized and virtualized case
In the non-virtualized case, shown in Fig. 4a, when the UDP flow was started the bandwidth assigned to the TCP flow dropped seriously. That would not be acceptable for any TCP application in a practical data center network. In Fig. 4b, because the substrate network was sliced into two virtual networks running the dynamic bandwidth allocation algorithm, the QoS of both flows (i.e. both virtual networks) is guaranteed. In our implementation both virtual networks have equal priority, so they fairly
share the bandwidth when they oversubscribe the link simultaneously. This confirms that network virtualization can provide an efficient mechanism to prevent congestion in data center networks full of virtual machines, and also a more agile way to guarantee QoS.
5 Conclusion
This paper showed that network virtualization offers new opportunities to alleviate congestion-driven performance problems in data center networks that run large numbers of virtual machines for various applications generating different types of traffic. We presented a network virtualization architecture and a bandwidth allocation algorithm for virtual networks. Although there may be some distance to a practical, fully mature system, we believe that network virtualization and dynamic bandwidth allocation are well suited to future data center networks, especially in the context of the multi-tenancy mechanism of cloud computing.
References 1. Ghemawat, S., Gobioff, H., Leung, S.: The Google File System. In: ACM SOSP 2003 (2003) 2. Chang, F., et al.: Bigtable: A Distributed Storage System for Structured Data. In: OSDI 2006 (2006) 3. Kant, K.: Towards a virtualized data center transport protocol. In: Proc. of INFOCOM Workshop on High Speed Networks (2008) 4. Rajanna, V.S., Shah, S., Jahagirdar, A., Gopalan, K.: XCo: Explicit Coordination for Preventing Congestion in Data Center Ethernet. In: Proc. of 6th IEEE International Workshop on Storage Network Architecture and Parallel I/Os, Incline Village, NV, USA (2010) 5. Keller, E., Lee, R., Rexford, J.: Accountability in hosted virtual networks. In: ACM SIGCOMM Workshop on Virtualized Infrastructure Systems and Architectures, VISA (2009) 6. He, J., Zhang-Shen, R., et al.: DaVinci: Dynamically Adaptive Virtual Networks for a Customized Internet. In: ACM CoNEXT 2008, Madrid, Spain, December 10-12 (2008) 7. Cisco opening up IOS (2010), http://www.networkworld.com/news/2007/121207-cisco-ios.html 8. Low, S.H.: A duality model of TCP and queue management algorithms. IEEE/ACM Trans. Networking 11, 525–536 (2003) 9. Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: Proceedings of IEEE International Conference on Neural Networks (1995)
Software Comparison Dealing with Bayesian Networks Mohamed Ali Mahjoub1 and Karim Kalti2 1
Preparatory Institute of Engineer of Monastir (IPEIM) 5019 Monastir, Tunisia [email protected] 2 University of Monastir (FSM) Monastir, Tunisia [email protected]
Abstract. This paper presents a comparative study of tools dealing with Bayesian networks. Indeed, Bayesian networks are mathematical models now increasingly used in the field of decision support and artificial intelligence. Our study focuses on methods for inference and learning. It presents a state of the art in the field. Keywords: Bayesian network, software, comparison.
1 Introduction
Bayesian networks are part of the family of graphical models [1],[3]. They bring together, in a single formalism, graph theory and probability theory to provide effective, intuitive tools for representing a probability distribution over a set of random variables. For handling and scheduling algorithms dealing with Bayesian networks, several libraries have been initiated. The purpose of this paper is, first, to study these libraries and see what each of them provides: the implemented algorithms, the data types supported and the development interface. We thus present a synthesis of Bayesian networks. We first present the formalism of Bayesian networks, covering the fundamental notions and the associated application areas. In this article, we review some of the more popular and/or recent software packages for dealing with graphical models.
2 Formalism of Bayesian Networks
One of the key issues in Artificial Intelligence research is the ability to design and develop dynamic, evolving systems. Such systems must be equipped with intelligent behaviors so that they can learn and reason. But in most cases the knowledge acquired is not adequate for the system to take the most appropriate decision. To answer such questions, several methodologies have been proposed, but probabilistic approaches are the best suited, not only for reasoning with uncertain knowledge and belief, but also for structuring knowledge representation. These probabilistic approaches are called Bayesian networks. They are a compact representation of a
joint probability distribution over variables, on the basis of the concept of conditional independence. In addition, they aim to facilitate the description of a collection of beliefs by making explicit the relationships of causality and conditional independence among these beliefs, and they provide an efficient way to update the strength of beliefs when new evidence is observed.
2.1 Basic Concepts
We first recall Bayes' theorem. Given two events A and B, Bayes' theorem can be stated as follows:

P(A|B) = P(B|A) * P(A) / P(B)   (1)

where P(A) is the prior probability of A, P(B|A) is the likelihood function of A, and P(B) is the prior probability of B; P(A|B) is then the posterior probability of A given B. A Bayesian network is an acyclic directed graph, i.e. a directed graph in which no circuit is possible. Each node of a Bayesian network carries a label that is an attribute of the problem. These attributes are binary and can take (with some probability) the value TRUE or FALSE, which means that a random variable is associated with each attribute. A Bayesian network is defined by:
- a directed graph without circuits G = (V, E), where V is the set of nodes of G and E is the set of arcs of G;
- a finite probability space (Ω, Z, p);
- a set of random variables associated with the nodes of the graph, defined on (Ω, Z, p), such that

$p(V_1, V_2, \ldots, V_n) = \prod_i p(V_i \mid P(V_i))$   (2)

where P(V_i) is the set of causes (parents) of V_i in the graph G. A Bayesian network is thus composed of two components: a causal directed acyclic graph (the qualitative representation of knowledge) and a set of local probability distributions¹. After defining the different values that a characteristic can take, the expert must indicate the different relationships between these characteristics. Finally, the definition of the different probabilities is necessary to complete the network: each value (state) of a given node must be given a probability of occurrence.
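As a small numerical illustration of formulas (1) and (2) — the probabilities below are invented for the example and come from none of the packages discussed:

    # Worked example of Bayes' theorem (1): a test T for a condition C.
    p_c = 0.01                      # prior P(C)
    p_t_given_c = 0.95              # likelihood P(T|C)
    p_t_given_not_c = 0.05          # false-positive rate P(T|~C)

    p_t = p_t_given_c * p_c + p_t_given_not_c * (1 - p_c)   # total probability P(T)
    p_c_given_t = p_t_given_c * p_c / p_t                   # posterior P(C|T)
    print(round(p_c_given_t, 3))    # ~0.161

    # Factorization (2) for a tiny network A -> B: P(A,B) = P(A) * P(B|A).
    p_a = 0.3
    p_b_given_a = {True: 0.9, False: 0.2}
    p_joint = {(a, b): (p_a if a else 1 - p_a) *
                       (p_b_given_a[a] if b else 1 - p_b_given_a[a])
               for a in (True, False) for b in (True, False)}
    print(sum(p_joint.values()))    # 1.0: a valid joint distribution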
3 Problems Associated with Bayesian Networks
There are several problems in the use of Bayesian networks; we cite the main ones [1]. First, the correspondence between the graphical structure and the associated probabilistic structure reduces all inference problems to problems in graph theory; these problems are, however, relatively complex and give rise to much research. The second difficulty of Bayesian networks lies precisely in transposing the causal graph into a probabilistic representation: even if the only probability
¹ Quantitative representation of knowledge.
tables needed to define the entire probability distribution are those of each node conditioned on its parents, the definition of these tables is not always easy for an expert. A third problem of Bayesian networks, the automatic learning of the structure, also remains rather complex.
4 Study on Applications of Learning Bayesian Networks
We present a study on the tools manipulating Bayesian networks: what each of them provides, the inference and learning algorithms implemented, and the exchange formats supported for interaction between these programs. We also compare these tools in tabular form to ease their understanding. For handling and scheduling algorithms dealing with Bayesian networks, several libraries and software packages have been developed; we cite those on which the most research has been conducted.
BAYES NET TOOLBOX (BNT)
BNT is an open-source Matlab library² now supported by many researchers, given its integration of several inference and learning algorithms for Bayesian networks. However, this library is still difficult for non-specialists to use because it requires some knowledge of Matlab and of BNT itself: manipulation is done by writing code in a Matlab text editor. BNT offers several inference algorithms for discrete, Gaussian, or mixed (conditional Gaussian) Bayesian networks, such as variable elimination, junction tree, QuickScore, Pearl's exact algorithm (for polytrees), and approximate, sampling-based methods (likelihood weighting and Gibbs sampling). For parameter learning, BNT provides maximum likelihood or maximum a posteriori estimation for complete data, and the EM algorithm for incomplete data. For structure learning, BNT uses several scoring functions such as the BIC criterion. The only save format supported by BNT is the Matlab ".m" format.
BAYESIALAB
BayesiaLab is a product of Bayesia (www.bayesia.com), a French company dedicated to decision-support and learning methods from artificial intelligence and their operational applications (industry, services, finance, etc.). BayesiaLab is a complete laboratory for the manipulation and study of Bayesian networks. It is developed in Java and is currently available in French, English and Japanese. BayesiaLab can treat the entire modeling chain of studying a system with a Bayesian network. Learning in BayesiaLab uses a text file or an ODBC link describing the set of cases. The application offers exact and approximate inference, but the algorithms are not specified. The save formats supported by BayesiaLab are "XBL", "BIF", "DNE" and "NET".
NETICA
Netica is the most widely distributed Bayesian network software in the world³. It is used for diagnosis, prevention or simulation in the fields of finance, environment,
² Following the work of Kevin Murphy. ³ Developed in 1992 by Norsys, which offers a free version of the software limited to 15 variables and samples of 1,000 cases for learning from data.
medicine, industry and other fields. Netica is very widespread in the use of Bayesian networks, but it implements only one inference algorithm: the junction tree. On the other hand, Netica offers a graphical interface for easy operation; a model built by learning from data can easily be transformed, and the relationships between its variables explored, by inverting links or absorbing nodes while keeping the overall probability distribution of the Bayesian network unchanged. Learning in Netica uses a tab-delimited text or CSV file, or an ODBC link, describing the set of cases. The save formats supported by Netica are "DNE" and "NETA"; the application also allows importing a Bayesian network in the "DSC", "DXP" and "NET" formats.
HUGIN
Hugin is one of the major tools for Bayesian networks. It is a commercial product with functionality similar to BNT, and it was one of the first packages to model DAGs. This tool provides a graphical environment and a development environment that help define and build knowledge bases founded on Bayesian networks. Hugin supports a single inference algorithm, the junction tree; the junction tree can be inspected and the triangulation method changed. Structure and parameter learning is driven from Java, C or Visual Basic. Parameter learning in Hugin uses the EM algorithm, while structure learning is ensured by two algorithms, PC and NPC. The save formats supported by Hugin are "OOBN", "HKB" and "NET".
JAVABAYES
JavaBayes is a set of Java tools to create and manipulate Bayesian networks⁴. The system consists of a graphical editor, an inference core and a set of parsers. The graphical editor allows the user to create and modify Bayesian networks. Methods in JavaBayes are not commented and many variables do not have meaningful names, which makes the code difficult to understand. In addition, JavaBayes still has many bugs and shortcomings in its interface; it is difficult to handle, for example when saving a network or importing it into another program. The inference algorithms implemented in this system are variable elimination and the junction tree. JavaBayes can use models with sets of distributions to compute intervals of posterior distributions or intervals of expectations, but it offers no algorithms for parameter or structure learning. The save formats supported by JavaBayes are "BIF" and "XML".
GENIE
Released in 1998 by Druzdzel's decision systems group, GeNIe (Graphical Network Interface) is a development environment for decision support and the construction of Bayesian networks, characterized by its inference engine SMILE (Structural Modeling, Inference, and Learning Engine). GeNIe offers several inference algorithms as well as several save formats for exchanging networks between applications. It essentially uses the junction tree and polytree algorithms for inference, plus several approximate algorithms that can be used when networks become too large for clustering (logic sampling, likelihood weighting,
⁴ Developed by Fabio Gagliardi Cozman in 1998 and licensed under the GNU General Public License.
self-importance and heuristic importance sampling, and backward sampling). Parameter learning and structure learning are both supported. The save formats supported by GeNIe are "xDSL", "DSL", "NET", "DNE", "DXP", "ERG" and "DSC".
BNJ
BNJ (Bayesian network tools in Java) is a set of Java tools for research and development of Bayesian networks. The project was developed in the KDD laboratory at Kansas State University and is Open Source under the GNU license⁵; its latest version was published in April 2006. It provides a set of inference algorithms for Bayesian networks. In BNJ it is possible to define two types of probability distribution for the nodes: discrete tabular and continuous. The Bayesian networks created are stored in XML files. BNJ provides two categories of inference algorithms: exact inference ("junction tree", "variable elimination with optimization") and approximate inference built on the exact algorithms. Some methods use sampling, such as "Adaptive Importance Sampling (AIS)", "Logic Sampling" and "Forward Sampling"; other methods apply the exact algorithms to a selection of arcs of the graph, such as "Kruskal-Polytree", "BCS" and "PTReduction". On the other hand, browsing all the source files of this toolkit, we found no implementation of parameter or structure learning algorithms. The save formats supported by BNJ are "NET", "Hugin" and "XML".
MSBNX
MSBNX is a component-based Windows application to create and evaluate Bayesian networks. MSBNX uses the junction tree algorithm for inference. There is no learning algorithm for parameters or structure. The save formats supported by MSBNX are "XBN", "DSC" and "XML" (the XML format has a design specific to MSBNX, different from the XML formats of other applications, which provide a standard format).
SAMIAM
SamIam has two main components: a graphical interface and an inference engine. The graphical interface allows users to develop Bayesian network models and save them in a variety of formats. The inference engine covers many tasks: classical inference, parameter estimation, sensitivity analysis, and explanation based on MAP and MPE. SamIam is free software including a variety of inference algorithms, all based on the junction tree; it supports three implementations of the junction tree algorithm: the Hugin architecture, the Shenoy-Shafer architecture, and a new architecture that combines the best of the previous two. SamIam uses the EM (Expectation Maximization) algorithm for estimating network parameters from data, and adopts the Hugin "File" format for specifying data as a set of cases⁶. The save formats supported by SamIam are "NET" and "HUGIN".
⁵ GNU: General Public License.
⁶ It also includes utilities to generate data randomly from a given network and to store data in files.
UNBBAYES
UnBBayes is an open-source framework for modeling, learning and probabilistic reasoning with networks. It offers a variety of inference and learning algorithms, but errors are often handled poorly: the software can enter an infinite loop that freezes the system, owing to the lack of exception generation for software errors. UnBBayes uses the junction tree, Likelihood Weighting and Correct Classification Review algorithms for inference, and the K2, B, V and Incremental Learning algorithms for learning. The save formats supported by UnBBayes are "NET", "UBF" and "XML"; the application also allows importing a Bayesian network in the "OWL" format.
PROBT
ProBT is a C++ library whose commercial and industrial exploitation has been granted exclusively to the company Probayes. ProBT is a very powerful package that offers a free version for research purposes, but it does not provide the source code itself; it offers only an explanation of the different classes from which it is built. For exact inference, ProBT uses the Successive Restrictions Algorithm (SRA). For approximate inference, several approximation schemes are used, such as Monte Carlo and simultaneous evaluation and maximization. ProBT also provides parameter learning algorithms: for complete data, an algorithm based on maximum likelihood; for incomplete data, an algorithm based on the EM principle. On the other hand, ProBT contains no implementation of Bayesian network structure learning⁷. The save format supported by ProBT is an "XML" format specific to this application; it also offers the possibility to import an Excel file once the data are organized in tables with graphs.
ANALYTICA
Analytica is a shareware package for creating, analyzing and modeling graphical models such as Bayesian networks. A limited trial version can be obtained by filling out a form with a telephone number on the vendor's website, after which they phone you for your opinion of the software. Analytica offers a single inference algorithm, specific to the application, that cannot be matched to the inference algorithms known in the Bayesian network literature: it uses its own inference engine, ADE (the Analytica Decision Engine). There is no learning algorithm for parameters or structure. The only save format supported by Analytica is "ANA".
BNET BUILDER
BNet.Builder is Bayesian network software developed in Java, and its use is very easy. It relies on an embedded inference engine that does not document which inference algorithm is used. Browsing all the documentation and help, and using the software, there is no evidence that it performs structure or parameter learning. The save formats supported by BNet.Builder are "XBN", "DNE" and "NET".
⁷ Recent projects are under development, such as the MWST and K2 algorithms for learning structure from complete data.
PNL
PNL (Open Source Probabilistic Networks Library) is a tool for working with graphical models. It supports directed and undirected models, discrete and continuous variables, and various inference and learning algorithms.
BAYES BUILDER
BayesBuilder is free software for the construction of Bayesian networks. It is only available on Windows, because the inference engine is written in C++ and has so far been compiled only for Windows⁸. The software uses an inference engine without stating which inference algorithm is used, which makes it a black box from an algorithmic perspective. It supports neither parameter learning nor structure learning. The save format supported by BayesBuilder is "bbnet"; the application also allows importing a Bayesian network in the "NET", "DNET", "DSC" and "BIF" formats.
XBAIES
This software handles Bayesian networks through dialogs for inference and learning. It has no graphical interface for entering the network: a network is entered with rule-based forms, which makes it difficult to use now that new generations of expert-system applications are based on graphical models rather than on rule-based systems. XBAIES uses only one inference algorithm, the junction tree. Parameter and structure learning are supported. XBAIES does not provide a format for saving a Bayesian network; a network is entered literally using the buttons.
OPENBUGS
BUGS is Bayesian network software that does not display the networks built; it offers limited command lines similar to those of Matlab, without a graphical display of the Bayesian network. The software belongs to the older generation of expert systems because it is based on rules and not on graphical models. The inference algorithm used in BUGS is Gibbs sampling. BUGS supports parameter learning but not structure learning. The save format supported by OpenBUGS is "BUGS".
BAYESIAN KNOWLEDGE DISCOVERER / BAYESWARE DISCOVERER
Bayesian Knowledge Discoverer is free software that has been replaced by a commercial version, Bayesware Discoverer, which offers a 30-day trial. The software provides a graphical interface with powerful visualization options and is available for Windows, Unix and Macintosh. It uses the junction tree algorithm for inference. Bayesware supports structure and parameter learning from complete and incomplete data; the algorithm used is the approximate "bound and collapse". The save format supported by BKD/BD is "DBN"; the application also allows importing a Bayesian network in the "DSC" format.
8
The GUI program is written in Java and it is easy to use.
Software Comparison Dealing with Bayesian Networks
175
VIBES VIBES (Variational Inference for Bayesian Networks) is a java program for Bayesian networks. This software uses a single inference algorithm not known in the literature of Bayesian networks. VIBES provides an inference engine to clean it VIBES (Variational Inference for Bayesian Networks). The learning parameters is supported while learning structure is not supported. The save format supported by VIBES format is "XML" which has a specific design for this application.
5 Comparison of Tools Manipulating Bayesian Networks To better understand the tools manipulating Bayesian networks, we propose in this section a comparative analysis, summarized in two tables. First, we propose a comparison of applications of Bayesian networks in the form of a table based on the type Table 1. Table of comparison tools manipulating Bayesian networks
Tools name BNT BayesiaLab NETICA HUGIN JavaBayes
Type Source Library Matlab/ c Library No software No software No software Java
GeNIe BNJ MSBNX
software software software
No Java No
Free Free Free
SamIam
software
Java
Free
UnBBayes
software
Java
Free
ProBT Analytica BNet Builder Bayes builder OpenBugs
Library software software software software
C++ No No No No
Free PLEV PLEV PLEV Free
BKD/BD PNL
software Librairy
No Yes
PLEV Free
VIBES
software
Java
Free
PLEV : Pay-limited evaluation version
License Free PLEV PLEV PLEV Free
Web site http://www.cs.ubc.ca/~murph yk/Software/BNT/bnt.html http://www.bayesia.com http://www.norsys.com http://www.hugin.com http://www.cs.cmu.edu/~java /bayes/Home http://genie.sis.pitt.edu http://bnj.sourceforge.net http://research.microsoft.com/ enus/um/redmond/groups/adapt/ /msbnx http://reasoning.cs.ucla.edu/sa miam http://unbbayes.sourceforge.n et http://www.probayes.com http://www.lumina.com http://www.cra.com http://www.snn.ru.nl http://www.mrc/bsu.cam.ac.uk/bugs http://bayesware.com http://sourceforge.net/projects //openpnl http://www.johnwinn.org/ et http://vibes.sourceforge.net/
176
M.A. Mahjoub and K. Kalti
of tool, the availability of source code and what language used for development, the software license and the site Official web application. Second we propose a second comparison chart applications based on the types of inference algorithms used in learning, the formats supported and types of variables used. On the other hand all tools use discrete variables. A more extensive comparison can be found at this [2]. Table 2. Technical Comparison of tools manipulating Bayesian networks
Tools name BNT BayesiaLab NETICA
Inference E/A E/A Exact
Learning P/S P/S P
HUGIN JavaBayes GeNIe
Exact Exact E/A
P/S -
BNJ MSBNX SamIam
E/A Exact Exact
P
UnBBayes ProBT Analytica AgenaRisk BNet Builder Bayes builder
Exact E/A Exact E/A Exact E/A
S P/S P -
XBAIES OpenBugs BKD/BD VIBES
Exact E/A E/A E/A
P/S P P/S P
Supported formats m BIF, xbl, dne et net. dne, neta, dsc, dxp, net. Oobn, hkb, net. BIF et XML xdsl, dsl, net, dne, dxp, erg et dsc XML, net et hugin xbn, dsc et XML. net, hugin, dsl, xdsl, dsc, dne, dnet, erg. XML, net et ubf. xml ana cmp xbn, dne et net. bbnet, net, dnet, dsc, BIF bugs dbn et dsc xml
Variables
C/D C/D C/D C/D Discrete Discrete Discrete Discrete C/D Discrete C/D C/D C/D C/D D C/D C/D C/D C/D
P/S : parameter/structure C/D: continous and discrete E/A : Exact/Approximate
6 Conclusion Graphical models are a way to represent conditional independence assumptions by using graphs. The most popular ones are Bayesian networks. To conclude, we can say that BNT library is most useful tool for Bayesian networks. Also we note that all softwares are paying while all the libraries are free.
Software Comparison Dealing with Bayesian Networks
177
References 1. Naïm, P., Wuillemin, P.-H., Leray, P., Pourret, O., Becker, A.: Ré-seaux Bayésiens, 3rd edn. Eyrolles, 2. Murphy, K.: Software for Graphical models: a review. ISBA Bulletin (December 2007) 3. Pearl, J.: Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge (2000)
Adaptive Synchronization on Edges of Complex Networks Wenwu Yu Department of Mathematics, Southeast University, Nanjing 210096, China [email protected], [email protected]
Abstract. This paper studies distributed adaptive control for synchronization on edges of complex networks. Some distributed adaptive strategies are designed on all the coupling weights of the network for reaching synchronization, which are then extended to the case where only a small fraction of coupling weights are adjusted. A general criterion is given and it is found that synchronization can be reached if the corresponding edges and nodes of the updated coupling weights form a spanning tree. Finally, simulation examples are given to illustrate the theoretical analysis. Keywords: Synchronization; Complex network; Distributed adaptive strategy; Pining control; Leader-follower control.
1
Introduction
Networks exist everywhere nowadays, such as biological neural networks, ecosystems, social network, metabolic pathways, the Internet, the WWW, electrical power grids, etc. Typically, each network is composed of the nodes representing individuals in the network and the edges representing the connections among them. Recently, small-world and scale-free complex network models were proposed by Watts and Strogatz (WS) [13], and Albert and Barabasi (BA) [2]. Thereafter, the study of various complex networks has attracted increasing attention from researchers in various fields of physics, mathematics, engineering, biology, and sociology. Synchronization, a typical collective behavior in nature, has received a lot of interests, since the pioneering work of Pecora and Carroll [9] due to their potential applications in secure communications, chemical reactions, biological systems, and so on. Usually, there are large numbers of nodes in real-world complex networks nowadays. Therefore, much attention has been paid to the study of synchronization in various large-scale complex networks with network topology [1,7,8,10,11,14,15,16,18,19,20,21]. In [10,11], local synchronization was investigated by the transverse stability to the synchronization manifold, and a distance from the collective states to the synchronization manifold was defined in [14,15], based on which some results were obtained for global synchronization of complex networks and systems [7,8,16,18,19,20,21]. D. Liu et al. (Eds.): ISNN 2011, Part III, LNCS 6677, pp. 178–187, 2011. c Springer-Verlag Berlin Heidelberg 2011
Adaptive Synchronization on Edges of Complex Networks
179
Recently, in order to save control cost, pinning control has been widely investigated in complex networks [3,12,16,20,21]. If the network can not be synchronized by itself, some controllers may be applied on a small fraction of all the nodes to force the network to synchronize, which is known as pinning control. However, many derived conditions for ensuring network synchronization were sufficient and thus a little conservative. Then, a lot of works have been devoted to use adaptive strategies to adjust network parameters to get better conditions for reaching network synchronization based on the previous results in adaptive synchronization in nonlinear systems [17,22]. For example, in [3,20,24,25], the adaptive laws were applied on the control gains from the leader to the nodes, and in [3,20], adaptive schemes were designed on the network coupling strength which are centralized algorithms. However, there are few works about how to update the coupling weights of the network for reaching synchronization. In [23], an algorithm was proposed on the updated coupling weights for reaching network synchronization but the theoretical analysis was not shown. Recently, a general distributed adaptive strategy on the coupling weights was proposed and some theoretical conditions were also derived for reaching network synchronization in [4]. However, the conditions in [4] are not easy to apply and in addition, it is impossible to update all the coupling weights as in [4,23]. A basic contribution of this paper is that a simple distributed adaptive strategy is applied on the coupling weights of the network and the derived conditions are very easy to apply. For example, if the corresponding edges and nodes with updated coupling weights form a spanning tree, synchronization can be reached in the network, i.e., updating N − 1 coupling weights is enough to achieve network synchronization. Another contribution of this paper is that pinning weights control is applied both on a small fraction of the controlled nodes and coupling weights which originates from pinning nodes control in the literature [3,12,16,20,21,24,25]. Additionally, some novel distributed adaptive control laws are proposed for reaching network synchronization in this paper. The rest of the paper is organized as follows. In Section 2, some preliminaries are briefly outlined. Distributed adaptive control strategies are proposed in Section 3. In Section 4, simulation examples are given on the random and scale-free networks to illustrate the theoretical analysis. Conclusions are finally drawn in Section 5.
2
Preliminaries
Let G = (V, E, G) be a weighted undirected graph of order N , with the set of nodes V = {v1 , v2 , · · · , vN }, the set of undirected edges E ⊆ V × V, and a weighted adjacent matrix G = (Gij )N ×N . An undirected edge in graph G is denoted by eij = (vi , vj ). If there is an edge between node vi and node vj , then Gij (t) = Gji (t) > 0 is the time-varying weight associated with the edge eij ; otherwise, Gij (t) = Gji (t) = 0 (j = i). As usual, we assume there is no self-loop in G. Consider the following complex dynamical network consisting of N nodes with linearly diffusive coupling [7,8,10,11,14,15,18,19].
180
W. Yu
x˙ i (t) = f (xi (t), t) +
N
Gij (t)Γ (xj (t) − xi (t)), i = 1, 2, · · · , N,
(1)
j=1,j=i
where xi (t) = (xi1 (t), xi2 (t), · · · , xin (t))T ∈ Rn is the state vector of the ith node, f : Rn ×R+ −→ Rn is a nonlinear vector function, Γ = diag(γ1 , . . . , γn ) ∈ Rn×n is a semi-positive definite inner coupling matrix where γj > 0 if two nodes can communicate through the jth state, G(t) = (Gij (t))N ×N is the coupling configuration matrix representing the topological structure of the time-varying network, and the Laplacian matrix L is defined by Lij (t) = −Gij (t), i = j; Lii (t) = −
N
Lij (t),
(2)
j=1,j=i
which ensures the diffusion that
N j=1
Lij (t) = 0 for all t. Equivalently, network
(1) can be rewritten in a simpler form as follows: x˙ i (t) = f (xi (t), t) − c
N
Lij (t)Γ xj (t), i = 1, 2, · · · , N.
(3)
j=1
The objective of control here is to find some adaptive control laws acting on Gij (t) under fixed network topology such that the solutions of the controlled network (3) can reach synchronization, i.e., lim xi (t) − xj (t) = 0, ∀ i, j = 1, 2, · · · , N.
t→∞
(4)
Here, it is emphasized that the network structure is fixed and only the coupling weights can be time-varying, i.e., if there is no connection between nodes i and j, Gij (t) = Gji (t) = 0 for all t. Assumption 1. There exist a constant diagonal matrix H = diag(h1 , . . . , hn ) and a positive value ε > 0 such that (x − y)T (f (x, t) − f (y, t)) − (x − y)T HΓ (x − y) ≤ −ε(x − y)T (x − y), ∀x, y ∈ Rn .
(5)
Note that Assumption (5) is very mild. For example, all linear and piecewise linear functions satisfy this condition. In addition, if ∂fi /∂xj (i, j = 1, 2, . . . , n) are bounded and Γ is positive definite, the above condition is satisfied, which includes many well-known systems. Without loss of generality, only connected networks are considered throughout the paper. Otherwise, one can consider the synchronization on each connected component of the network.
Adaptive Synchronization on Edges of Complex Networks
181
Lemma 1. [6] (1) The Lapacian matrix L in the undirected network G is semi-positive definite. It has a simple eigenvalues 0 and all the other eigenvalues are positive if and only if the network is connected. (2) The second smallest eigenvalue λ2 (L) of the Laplacian matrix L satisfies xT Lx λ2 (L) = min . xT 1N =0,x=0 xT x N N 1 (3) For any η = (η1 , . . . , ηN )T ∈ RN , η T Lη = Gij (ηi − ηj )2 . 2 i=1 j=1
3
Distributed Adaptive Control in Complex Networks
In this section, some distributed adaptive laws on the weights Lij for i = j are proposed, which result in the obtained adaptive laws on Gij since Lij = −Gij N 1 for i = j. Let x ¯= xj . Then, one gets N j=1 x ¯˙ (t) =
N 1 f (xj (t), t). N j=1
(6)
Subtracting (6) from (3) yields the following error dynamical network e˙ i (t) = f (xi (t), t) −
N N 1 f (xj (t), t) − c Lij (t)Γ ej (t), i = 1, 2, . . . , N, (7) N j=1
j=1
where ei = xi − x ¯, i = 1, . . . , N . 3.1
Distributed Adaptive Control in Complex Networks
Theorem 1. Suppose that Assumption 1 holds. The network (3) is synchronized under the following distributed adaptive law L˙ ij (t) = −αij (xi − xj )T Γ (xi − xj ), 1 ≤ i = j ≤ N,
(8)
where αij = αji are positive constants, 1 ≤ i = j ≤ N . Proof. Consider the Lyapunov functional candidate: V (t) =
N N N 1 T ei (t)ei (t) + 2 i=1
i=1 j=1,j=i
c (Lij (t) + σij )2 , 4αij
where σij = σji are positive constants to be determined, 1 ≤ i = j ≤ N .
(9)
182
W. Yu
The derivative of V (t) along the trajectories of (7) gives V˙ =
N
eTi (t)e˙ i (t) +
i=1
N N i=1 j=1,j=i
⎡
c (Lij (t) + σij )L˙ ij (t) 2αij
⎤ N N 1 = eTi (t) ⎣f (xi (t), t) − f (xj (t), t) − c Lij (t)Γ ej (t)⎦ N j=1 i=1 j=1 N
N N c (Lij (t) + σij )(xi − xj )T Γ (xi − xj ) 2 i=1 j=1,j=i N N 1 T = ei (t) f (xi (t), t) − f (¯ x, t) + f (¯ x, t) − f (xj (t), t) N j=1 i=1 N N N c −c Lij (t)Γ ej (t) − (Lij (t) + σij )(ei − ej )T Γ (ei − ej ). (10) 2
−
j=1
Since
N
i=1 j=1,j=i
eTi (t) = 0, one has
i=1
N i=1
eTi (t) f (¯ x, t) −
1 N
f (x (t), t) = 0. From j j=1
N
Assumption 1, it follows that N
eTi (t) [f (xi (t), t) − f (¯ x, t)] ≤ −ε
i=1
N
eTi (t)ei (t) +
i=1
N
eTi (t)HΓ ei (t). (11)
i=1
Define the Laplacian matrix Σ = ( σij )N ×N where σ ij = −σij , i = j; σ ii = N − σ ij . In view of Lemma 1, one obtains j=1,j=i
c
N N
(Lij (t) + σij )(ei − ej )T Γ (ei − ej )
i=1 j=1,j=i
= −2c
N N
Lij (t)eTi Γ ej + 2c
i=1 j=1
N N
σ ij (t)eTi Γ ej .
(12)
i=1 j=1
Therefore, combining (10)-(12) and by Lemma 1, one finally has V˙ ≤ −ε
N i=1
eTi (t)ei (t) +
N
eTi (t)HΓ ei (t) − c
i=1
N N
σ ij (t)eTi Γ ej
i=1 j=1
T
= e (t) [−ε(IN ⊗ In ) + (IN ⊗ HΓ ) − c(Σ ⊗ Γ )] e(t) ≤ eT (t) [−ε(IN ⊗ In ) + (IN ⊗ HΓ ) − cλ2 (Σ)(IN ⊗ Γ )] e(t).
(13)
By choosing σij sufficiently large such that cλ2 (Σ) > maxj (hj ), then one gets (IN ⊗ HΓ ) − cλ2 (Σ)(IN ⊗ Γ ) ≤ 0. Therefore, one has V˙ ≤ −εeT (t)e(t),
Adaptive Synchronization on Edges of Complex Networks
where e(t) = (eT1 (t), eT2 (t), . . . , eTN (t))T . This completes the proof.
183
2
Remark 1. The dynamical coupling weights are adjusted according to the local synchronization property between the node and its neighbors in [23]. A new type of decentralized local strategies was then proposed and detailed theoretical analysis was also given in [4]. However, the derived conditions are conservative and not easy to apply. In Theorem 1, by designing a simple distributed adaptive law on the coupling weights, the conditions in [4] are totally removed. As long as Assumption 1 is satisfied, the network can reach synchronization under (8). In addition, one can also let the convergence rate αij be very small if the communication ability between nodes i and j are limited. 3.2
Distributed Adaptive Pinning Weights Control in Complex Networks
In Theorem 1, all the coupling weights are adjusted according to the distributed adaptive law (8). Though the algorithm is local and decentralized, it is literally impossible to update all the coupling weights of the network. Next, we aim to update a minority of the coupling weights so that synchronization can still be reached, which is called pinning weights control here. Assume that the pinning strategy is applied on a small fraction δ (0 < δ < 1) of the coupling weights in network (3). Suppose that the coupling weights Lk1 m1 , Lk2 m2 , . . . , Lkl ml are selected, where l = δN represents the integer part of the real number δN . E, G) be a weighted undirected graph, with the set of nodes Let G = (V, V = {vk1 , vk2 , · · · , vkl , vm1 , vm2 , · · · , vml }, the set of undirected edges e ki mi = and a weighted adjacent matrix G = (G ij )N×N where G ki mi (t) = ( vki , v mi ) ∈ E, be Gmi ki (t) > 0, i = 1, . . . , l. It is very clear that G is a subgraph of G. Let L the corresponding Laplacian matrix of the adjacency matrix G. Theorem 2. Suppose that Assumption 1 holds. Under the following distributed adaptive law L˙ mi ki (t) = L˙ ki mi (t) = −αki mi (xki − xmi )T Γ (xki − xmi ), i = 1, . . . , l, (14) where αki mi = αmi ki are positive constants, the network (3) is synchronized if there are positive constants σ mi ki such that (IN ⊗ HΓ ) − cλ2 (L)(IN ⊗ Γ ) ≤ 0,
(15)
where L = (Ljs )N ×N is a Laplacian matrix with Ljs = Lsj = − σmi ki when j = mi and s = ki ; otherwise Ljs = Lsj = Ljs , i = 1, . . . , l, 1 ≤ j = s ≤ N . Proof. Consider the Lyapunov functional candidate: V (t) =
N
l
i=1
i=1
1 T c ei (t)ei (t) + (Lmi ki (t) + σ mi ki )2 , 2 2αki mi
(16)
184
W. Yu
where σ mi ki = σ ki mi are positive constants to be determined, i = 1, 2, . . . , l. The derivative of V (t) along the trajectories of (16) gives ⎡ ⎤ N N V˙ = eTi (t) ⎣f (xi (t), t) − f (¯ x, t) − c Lij (t)Γ ej (t)⎦ i=1
−c
j=1 l
(Lmi ki (t) + σ mi ki )(eki − emi )T Γ (eki − emi )
i=1 T
≤ e (t) [−ε(IN ⊗ In ) + (IN ⊗ HΓ ) − c(L ⊗ Γ )] e(t).
(17)
The proof is completed. 2 In Theorem 2, a general criterion is given for reaching synchronization by using the designed adaptive law (14). Certainly, one can solve (15) by using the LMI approach. However, sometimes the condition in (15) is difficult to apply in a very large scale network. Next, we aim to simplify the condition in (15). Corollary 1. Suppose that Assumption 1 holds. If the network G contains a spanning tree, the network (3) is synchronized under the following distributed adaptive law L˙ mi ki (t) = L˙ ki mi (t) = −αki mi (xki − xmi )T Γ (xki − xmi ), i = 1, . . . , l, (18) where αki mi = αmi ki are positive constants. Proof. Consider the Lyapunov functional candidate in (16) and take the derivative of V (t) along the trajectories of (16) gives
⊗ Γ ) e(t), (19) V˙ ≤ eT (t) −ε(IN ⊗ In ) + (IN ⊗ HΓ ) − c(L ⊗ Γ ) − c(Σ are Laplacian matrix defined by L = L − L and Σ = (− where L and Σ σij )N ×N with σ mi ki > 0 and 0 otherwise, i = 1, . . . , l. It is easy to see that each offdiagonal entry in L is the same as in L except that Lmi ki = Lki mi = 0. Since G can be sufficiently large if one chooses sufficiently contains a spanning tree, λ2 (Σ) large positive constants σ ki mi , i = 1, . . . , N . Therefore, one finally gets
N ⊗ Γ ) e(t), V˙ ≤ eT (t) −ε(IN ⊗ In ) + (IN ⊗ HΓ ) − cλ2 (Σ)(I (20) This completes the proof.
2
Remark 2. In Corollary 1, if the corresponding edges and nodes of the updated coupling weights form a spanning tree, then synchronization in network (3) can be reached. Therefore, the minimal number of the updated coupling weights under this scheme is N − 1, which is much lower than updating all the coupling weights. Note that at the first stage of adaptive control in complex networks, the general scheme is proposed here. Some detailed pinning weights strategies will be further investigated later.
Adaptive Synchronization on Edges of Complex Networks
4
185
Simulation Examples
In this section, a simulation example is provided to demonstrate the theoretical analysis in this paper.
xi1
10 0 −10
0
1
2
3
4
5
3
4
5
3
4
5
t
xi2
1 0 −1
0
1
2 t
xi3
5 0 −5
0
1
2 t
Fig. 1. Orbits of the states xij in network i = 1, 2, . . . , 100 and j = 1, 2, 3
1.15
Gij
1.1
1.05
1
0.95
0
1
2
3
4
5
t
Fig. 2. Orbits of the weights Gij in network i, j = 1, 2, . . . , 100
Suppose that the nonlinear dynamical system s(t) ˙ = f (s(t), t) is described by the Chua’s circuit ⎧ ⎨ s˙ 1 = η(−s1 + s2 − l(s1 )), s˙ 2 = s1 − s2 + s3 , (21) ⎩ s˙ 3 = −βs2 , where l(x1 ) = bx1 + 0.5(a − b)(|x1 + 1| − |x1 − 1|). The system (21) is chaotic when η = 10, β = 18, a = −4/3, and b = −3/4. In view of Assumption 1, by computation, one obtains θ = 5.1623. Consider an ER random network [5], and suppose there are N = 100 nodes with probability p = 0.15, which contains about pN (N −1)/2 ≈ 724 connections. If there is a connection between node i and j, then Gij (0) = Gji (0) = 1 (i = j), i, j = 1, 2, . . . , 100, and the coupling strength c = 0.2. A simulation-based analysis on the random network is performed by using random weights pinning scheme. In the random weights pinning scheme, one randomly selects δpN (N − 1)/2 weights with a small fraction δ = 0.35 to pin. By Corollary 1, since the network G is connected, the network is synchronized under the distributed designed adaptive law. The orbits of the states xij are shown in Fig. 1 for i = 1, 2, . . . , 100 and j = 1, 2, 3. The updated weights Gij are illustrated in Fig. 2.
186
W. Yu
From Fig. 2, it is easy to see that the coupling weights change a little above 1, which is very hard to check from other results in the literature for a very small coupling strength c. Therefore, by using distributed adaptive pinning weights control, one can find that synchronization can be reached in the network under very mild conditions.
5
Conclusions
In this paper, distributed adaptive control for synchronization in complex networks has been studied. Some distributed adaptive strategies have been designed on all the coupling weights of the network for reaching synchronization, which were then extended to the case where only a small fraction of coupling weights are adjusted. A general criterion was given and it was found that synchronization can be reached if the corresponding edges and nodes with updated coupling weights form a spanning tree. Some novel distributed adaptive strategies on the coupling weights of the network have been proposed. However, it is still not understood what kind of pinning schemes may be chosen for a given complex network to realize synchronization if the number of the selected coupling weights is less than N − 1. The study of adaptive control on coupling weights of complex network is at the beginning, and many works about the detailed pinning weights strategies will be further investigated in the future. Acknowledgments. The author would like to thank his supervisor Prof. Guanrong Chen, Dr. Pietro DeLellis, Prof. Mario di Bernardo, and Prof. J¨ urgen Kurths for inspiring discussions and helpful comments.
References 1. Arenas, A., Diaz-Guilera, A., Kurths, J., Moreno, Y., Zhou, C.: Synchronization in complex networks. Physics Reports 468(3), 93–153 (2008) 2. Barab´ asi, A.L., Albert, R.: Emergence of scaling in random networks. Science 286, 509–512 (1999) 3. Chen, T., Liu, X., Lu, W.: Pinning complex networks by a single controller. IEEE Trans. Circuits Syst. I 54(6), 1317–1326 (2007) 4. DeLellis, P., di Bernardo, M., Garofalo, F.: Novel decentralized adaptive strategies for the synchronization of complex networks. Automatica 45, 1312–1318 (2009) 5. Erd¨ os, P., R´enyi, A.: On random graphs. Pub. Math. 6, 290–297 (1959) 6. Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, Cambridge (1985) 7. Lu, W., Chen, T.: Synchronization of coupled connected neural networks with delays. IEEE Trans. Circuits Syst. I 51(12), 2491–2503 (2004) 8. L¨ u, J., Chen, G.: A time-varying complex dynamical network models and its controlled synchronization criteria. IEEE Trans. Auto. Contr. 50(6), 841–846 (2005)
Adaptive Synchronization on Edges of Complex Networks
187
9. Pecora, L.M., Carroll, T.L.: Synchronization in chaotic systems. Phys. Rev. Lett. 64(8), 821–824 (1990) 10. Wang, X., Chen, G.: Synchronization in scale-free dynamical networks: robustness and fragility. IEEE Trans. Circuits Syst. I 49, 54–62 (2002) 11. Wang, X., Chen, G.: Synchronization in small-world dynamical networks. Int. J. Bifurcation and Chaos 12, 187–192 (2002) 12. Wang, X., Chen, G.: Pinning control of scale-free dynamical networks. Physica A 310, 521–531 (2002) 13. Watts, D.J., Strogatz, S.H.: Collective dynamics of ‘small-world’ networks. Nature 393, 440–442 (1998) 14. Wu, C.: Synchronization in arrays of coupled nonlinear systems with delay and nonreciprocal time-varying coupling. IEEE Trans. Circuits Syst. II 52(5), 282–286 (2005) 15. Wu, C., Chua, L.O.: Synchronization in an array of linearly coupled dynamical systems. IEEE Trans. Circuits Syst. I 42(8), 430–447 (1995) 16. Xiang, J., Chen, G.: On the V-stability of complex dynamical networks. Automatica 43, 1049–1057 (2007) 17. Yu, W., Cao, J., Chen, G.: Robust adaptive control of unknown modified CohenGrossberg neural networks with delay. IEEE Trans. Circuits Syst. II 54(6), 502–506 (2007) 18. Yu, W., Cao, J., Chen, G., L¨ u, J., Han, J., Wei, W.: Local synchronization of a complex network model. IEEE Trans. Systems, Man, and Cybernetics-Part B 39(1), 230–441 (2009) 19. Yu, W., Cao, J., L¨ u, J.: Global synchronization of linearly hybrid coupled networks with time-varying delay. SIAM Journal on Applied Dynamical Systems 7(1), 108–133 (2008) 20. Yu, W., Chen, G., L¨ u, J.: On pinning synchronization of complex dynamical networks. Automatica 45, 429–435 (2009) 21. Yu, W., Chen, G., Wang, Z., Yang, W.: Distributed consensus filtering in sensor networks. IEEE Trans. Systems, Man, and Cybernetics-Part B (in press) 22. Yu, W., L¨ u, J., Chen, G., Duan, Z., Zhou, Q.: Estimating uncertain delayed genetic regulatory networks: an adaptive filtering approach. IEEE Trans. Auto. Contr. 54(4), 892–897 (2009) 23. Zhou, C., Kurths, J.: Dynamical weights and enhanced synchronization in adaptive complex networks. Phys. Rev. Lett. 96, 164102 (2006) 24. Zhou, J., Lu, J., L¨ u, J.: Adaptive synchronization of an uncertain complex dynamical network. IEEE Trans. Auto. Contr. 51(4), 652–656 (2006) 25. Zhou, J., Lu, J., L¨ u, J.: Pinning adaptive synchronization of a general complex dynamical network. Automatica 44(4), 996–1003 (2008)
Application of Dual Heuristic Programming in Excitation System of Synchronous Generators* Yuzhang Lin and Chao Lu Department of Electrical Engineering, Tsinghua University, 10084 Beijing, China [email protected]
Abstract. A new method, Dual Heuristic Programming (DHP) is introduced for the excitation control system of synchronous generator in this paper. DHP is one type of Approximate Dynamic Programming (ADP), which is an adaptive control method. Nonlinear function fitting method is used to approximate the performance index function of dynamic programming, so that ADP can get optimal control for the plant. In this paper, DHP implements the adaptive control of the excitation system of synchronous generators. Results show that the DHP method performs better than the conventional PID method in excitation control of synchronous generators. The DHP method has obtained a more rapid response. More over, it can optimize the performance globally, which reduced the swinging of excitation voltage and the energy consumption. Keywords: Dual Heuristic Programming, Adaptive Dynamic Programming, Excitation System, Synchronous Generator, Neural Network.
1 Introduction The excitation system of generators provides excitation current to the generator in normal situation. And also it regulates the excitation current according to the load of the generator, so that the terminal voltage is stable. This kind of regulating method is called Automatic Voltage Regulator (AVR). Conventional automatic voltage regulator is a linear control method. The model is linearized at the set point in the conventional method, and the control algorithm is usually PID. The conventional method depends on an accurate math model. In addition, it is effective only at set point, if not, the performance will be degraded. So we should develop a better control method. Approximate dynamic programming was proposed by Werbos[1], based on the dynamic programming. It is a cross-discipline involving artificial neural network, dynamic programming, and reinforcement learning. ADP can deal with the disturbances and uncertain situations. It approximates the cost function in bellman equation to avoid the curse of dimensionality in dynamic programming. So it can be applied to large scale and complicated nonlinear systems. In [1], ADP approaches were classified into four main schemes: Heuristic Dynamic Programming (HDP), Dual Heuristic *
This work was supported by the Students Research Training of Tsinghua University.
D. Liu et al. (Eds.): ISNN 2011, Part III, LNCS 6677, pp. 188–199, 2011. © Springer-Verlag Berlin Heidelberg 2011
Application of Dual Heuristic Programming in Excitation System
189
Programming (DHP), Action Dependent Heuristic Dynamic Programming (ADHDP), and Action Dependent Dual Heuristic Programming (ADDHP). In this paper, the Dual Heuristic Programming is applied to excitation control system of synchronous generators. And its function is Automatic Voltage Regulator. The proposed control algorithm is designed for a single machine infinite bus power system, and numerical simulation is carried out. The results show that the response of the proposed method is more rapid than the conventional PID method, and also it can optimize the performance globally.
2 Dual Heuristic Programming Dynamic Programming can solve problems that have the form of maximum or minimum in engineering and economic fields theoretically. But actually, the curse of dimensionality prevents its direct adoption in many real-world control problems. DP’s dependence on system numerical model is another disadvantage. So the DP algorithm works well only for small scale problems or simple problems. Approximate Dynamic Programming employs parameterized function structure to approximate the performance index function and control strategy, under the guarantee of principle of optimality. So the optimal control and optimal performance index can be obtained with the ADP algorithm, and also the curse of dimensionality is prevented. This algorithm provides a feasible method to solve large scale and complicated nonlinear system optimal control problems. 2.1 Principle of Dual Heuristic Programming Considered the following discrete time nonlinear system
x ( k + 1) = F ( x ( k ) , u ( k ) , k )
(1)
Where x ∈ R n denotes the system states. u ∈ R m denotes the control variables. The performance index function (cost function) is ∞
J ( x ( k ) , k ) = ∑ γ t − kU ( x ( t ) , u ( t ) , t )
(2)
t =k
Where U is utility function, γ is discount factor, and 0 < γ ≤ 1 . The J function is the cost function of the system, and it depends on the initial time step and the initial system state x ( k ) . Formulas (2) can be transformed as following
J ( x ( k ) ) = U ( x ( k ) , u ( k ) ) + γ J ( x ( k + 1) )
(3)
In algorithm of DHP, it does not approximate cost function J directly, but it ap∂J ( x ( k ) ) proximate the co-state , the derivate of function J with respect to the sysx (k ) tem state x . The co-state can be represented by
190
Y. Lin and C. Lu
∂J ( x ( k ) ) ∂x ( k )
Here, λ ( k ) =
∂J ( x ( k ) ) ∂x ( k )
=
∂U ( x ( k ) , u ( k ) ) ∂x ( k )
+γ
∂J ( x ( k + 1) )
(4)
∂x ( k )
.
According to the Bellman equation, the problem can be solved as following: J * ( x ( k ) , k ) = min ⎡⎣U ( x ( k ) , u ( k ) , k ) +γ J * ( x ( k + 1) , k + 1) ⎤⎦ u( k )
(5)
The optimal control must be satisfied to the first order derivative condition, that is
∂J * ( x ( k ) ) ∂u ( k )
= =
∂U ( x ( k ) , u ( k ) ) ∂u ( k )
∂U ( x ( k ) , u ( k ) ) ∂u ( k )
+ +
∂J * ( x ( k + 1) ) ∂u ( k )
∂J
*
( x ( k + 1) ) ∂F ( x ( k ) , u ( k ) )
∂x ( k + 1)
(6)
∂u ( k )
So the optimal control is
⎛ ∂J * ( x ( k ) ) ∂U ( x ( k ) , u ( k ) ) − u* = arg min ⎜ u ⎜ ∂u ( k ) ∂u ( k ) ⎝ ∂J * ( x ( k + 1) ) ∂F ( x ( k ) , u ( k ) ) ⎞ ⎟ − ⎟ ∂x ( k + 1) ∂u ( k ) ⎠
(7)
2.2 Neural Network Implementation for DHP
The structure of neural network implementation for DHP is shown as Fig. 1. There are three neural networks in it. They are Action Network, Model Network, and Critic Network.
x (t )
u(t )
∂J (t ) ∂x (t )
xˆ(t + 1) ∂J (t + 1) ∂U (x (t ), u(t )) ∂x (t + 1) ∂x (t )
Fig. 1. The Structure of DHP
Action Network is similar to the conventional controller. Its input is the system state x ( k ) . The output is the current control variable u ( k ) . Model Network is to get
Application of Dual Heuristic Programming in Excitation System
191
the system state’s estimation of next time step. Its input and output is system state x ( k ) and control variable u ( k ) respectively. The input of Critic Network is system state x ( k ) , and the output λˆ ( k ) is the estimation of co-state λ ( k ) =
∂J ( x ( k ) ) ∂x ( k )
. The
three networks can be replaced by any other parameterized function structures, such as polynomial, neural network, fuzzy logic and so on. In this paper, it is a three layered feed-forward neural network. Assume the number of the neurons of hidden layer is denoted by l , the weight matrix between the input layer and hidden layer is denoted by V , the weight matrix between the hidden layer and the output layer is denoted by W . This network can be represented by F ( X ,V ,W ) = W T σ (V T X
)
(8)
e zi − e − zi Where σ (V T X ) ∈ R l , ⎡⎣σ ( z ) ⎤⎦ i = zi , i = 1," , l are activation functions. e + e − zi 2.2.1 Model Network The functions of model network are to estimate the system state of next time step and transmit learning error of other network, so it must approximate the plant as much as possible. Before the whole algorithm goes on, the model network should be trained with a large number of data which contains lots of information of the plant. In addition, the model network can be trained online too, so that the model network can adaptive to the system’s dynamic when the set point or the environment are changed. The error function is defined as
eM ( k + 1) = x ( k + 1) − xˆ ( k + 1) EM ( k + 1) =
1 T em ( k + 1) em ( k + 1) 2
(9) (10)
Where xˆ ( k + 1) is the output of the model network. According to the gradient-based algorithm, minimize the defined error function EM , the weight update rule for the model network is given by wM ( k + 1) = wM ( k ) + ΔwM ( k )
(11)
⎡ ∂E ( k + 1) ⎤ ΔwM ( k ) = lM ⎢ − ⎥ ⎣⎢ ∂wM ( k + 1) ⎦⎥
(12)
where lM is the learning rate of model network, and wM the weight vector of the model network.
192
Y. Lin and C. Lu
2.2.2 Critic Network
Critic network is trained to approximate co-state
∂J ( x ( k ) ) ∂x ( k )
, the derivative of cost
function J . For a controllable system, if neural network convergent to the cost function correctly, the controller can guarantee the system stable. In order to get an approximation of the co-state, define the error of the network as eC ( k ) = λ ( k ) −
∂U ( x ( k ) , u ( k ) ) ∂x ( k )
EC =
−γ
∂J ( x ( k + 1) ) ∂x ( k )
1 T eC ( k ) eC ( k ) 2
(13)
(14)
Minimize objective function EC to train critic function. Where ∂U ( x ( k ) , u ( k ) ) ∂x ( k )
∂J ( x ( k + 1) ) ∂x ( k )
=
∂U ( x ( k ) ) ∂x ( k )
+
∂U ( u ( k ) ) ∂u ( k ) ∂u ( k )
∂x ( k )
∂J ( x ( k + 1) ) ⎛ ∂xˆ ( k + 1) ∂xˆ ( k + 1) ∂u ( k ) ⎞ + ⎜ ⎟ ∂xˆ ( k + 1) ⎜⎝ ∂x ( k ) ∂u ( k ) ∂x ( k ) ⎟⎠ ∂xˆ ( k + 1) ∂xˆ ( k + 1) ∂u ( k ) = λ ( k + 1) + λ ( k + 1) ∂x ( k ) ∂u ( k ) ∂x ( k )
(15)
=
(16)
According to the gradient-based algorithm, the weight update rule for the critic network is given by wC ( k + 1) = wC ( k ) + ΔwC ( k ) ⎡ ∂E ( k ) ⎤ ΔwC ( k ) = lC ⎢ − C ⎥ = lC ⎣⎢ ∂wC ( k ) ⎦⎥
⎡ ∂EC ( k ) eC ( k ) ∂λ ( k ) ⎤ ⎢− ⎥ ⎣⎢ eC ( k ) ∂λ ( k ) ∂wC ( k ) ⎦⎥
(17)
(18)
where lC is the learning rate of the critic network and wC is the weight vector of the critic network. 2.2.3 Action Network In order to get optimal performance, DHP obtains the optimal control by training the action network. And the objective is to minimize the cost function J . Because function J is a positive, zero is minimal. Then define the error function as
EA = J ( t ) − 0 = J (t )
(19)
According to the gradient-based algorithm, the weight update rule for the action network is given by
Application of Dual Heuristic Programming in Excitation System
193
wa (k + 1) = wa (k ) + Δwa (k )
(20)
⎡ ∂J ( k ) ⎤ Δ wA ( k ) = l A ⎢ − ⎥ ⎣⎢ ∂wA ( k ) ⎦⎥
(21)
Where l A is the learning rate of the action network and wA is the weight vector of the action network. 2.3 Procedure of DHP Method
Given the above preparation, now the DHP method is summarized as follows: Step 1: Given the computation precision ε , initialize the weights of each network. Step 2: According the model network weight update rule, train the model network. Step 3: Do the forward computations of action network, model network, and critic network. Step 4: According to the critic network weight update rule, train the critic network. Step 5: Do the forward computations of critic network. Step 6: According to the action network weight update rule, train the action network. Step 7: Jump to step 3, until the desired performance is satisfied.
3 The Application of DHP in Excitation System of Synchronous Generators 3.1 Single Machine Infinite Bus Power System
In order to facilitate the analysis, the power system is simplified into a single machine infinite bus system in this paper. The single machine infinite bus system includes synchronous generator, turbine, speed governor, transmission line connected to infinite bus and exciter and AVR excitation control system. ΔP
–
Pref
+
Σ
turbine
governor
Pm
Δω
Synchronous generator
Z=Re+jXe
Vfd
exciter
Ve AVR
Um
Vt
–
Σ
+
Vref
Fig. 2. Single machine infinite bus power system
194
Y. Lin and C. Lu
In Fig. 2, Re , X e represent the transmission line parameters. Δω represents the speed variation with respect to speed of operation, U m represents the voltage of infinite bus, Vt represents the terminal voltage of the generator, Ve represents the input voltage of exciter. V fd represent the output voltage of exciter, Vref represents the reference voltage of terminal voltage. Pm represents the mechanical power. Pref represents the reference power of the mechanical power. 3.2 The Excitation Controller Design Based on DHP
In this paper, DHP method replaces the conventional PID AVR of the excitation system. And the three networks are three layered feed-forward neural network, and the activation function of the hidden layer is bipolar sigmoidal function, and the activation function of the output layer is pure linear function The training strategy is as follows: 1training the model network first, then training the action network and critic network simultaneously. The most important objective of AVR is to let the terminal voltage tracking the reference terminal voltage quickly, and then maintaining the voltage at the set point. So define the following utility function of the system as U (t ) = [(0.4ΔVt (t ) + 0.4ΔVt (t − 1) + 0.16ΔVt (t − 2))]2
(22)
3.2.1 The Design of Model Network The input of model network is selected as the voltage variation with respect to the reference terminal voltage ΔVt , the excitation adjusting voltage ΔVe , and two time
step of their delayed value, that is, ⎡⎣ ΔVt ( k ) ΔVt ( k − 1) ΔVt ( k − 2 ) ΔVe ( k ) ΔVe ( k − 1) ΔVe ( k − 2 )⎤⎦ . The output is the next time step of terminal voltage variaT
tion ΔVt ( k + 1) . There are 15 neurons at the hidden layer. So the structure of the model network is 6-15-1. ΔVt (k )
Wm1
ΔVt (k − 1)
Wm 2
ΔVt (k − 2)
ΔVˆt (k + 1)
ΔVe (k ) ΔVe (k − 1) ΔVe (k − 2)
Input layer
Hidden layer
Output layer
Fig. 3. The topology of model network
Application of Dual Heuristic Programming in Excitation System
195
First, define the input of model network Min = ⎡⎣ ΔVt ( k ) ΔVt ( k − 1) ΔVt ( k − 2 ) ΔVe ( k ) ΔVe ( k − 1) ΔVe ( k − 2 ) ⎤⎦ . The forward computation is T
6
Mh1 j ( t ) = ∑ Min ( k ) Wm1ij ( k ) i =1
Mh 2 j ( t ) =
1− e
− Mh1 j ( t )
1+ e
− Mh1 j ( t )
(23)
15
ΔVˆt ( k + 1) = ∑ Mh 2 j ( t ) Wm2 j ( t ) j =1
Where Mh1 is the output vector of input layer. Mh 2 is the output vector of hidden layer. Wm1 is the weight matrix between input and hidden layer. Wm2 is the weight matrix between hidden and output layer. The update rule of the model network is ΔWm 2 ( k ) = lM ( k ) Mh 2Τ ( k ) eM ( k + 1) ΔWm1( k ) =
{
(24)
}
1 lM ( k ) MinΤ ( k ) ⎡⎣ eM ( k + 1)Wm2Τ (k ) ⎤⎦ ⋅ [1 − Mh2( k ) ⋅ Mh 2( k )] 2
(25)
Where “ ⋅ ” represents “entrywise product”. 3.2.2 The Design of Critic Network According to the definition of utility function U ( t ) , there are 3 input neurons. So the T input vector is ⎡⎣ ΔVt ( k ) ΔVt ( k − 1) ΔVt ( k − 2 )⎤⎦ . The output is Jˆ , the estimation of function J . And there are 8 neurons at the hidden layer. So the structure of the critic network is 3-8-1.
Let the input of the critic network Cin = ⎡⎣ ΔVt ( t ) ΔVt ( t − 1) ΔVt ( t − 2 ) ⎤⎦ , then T
the forward computation is 3
Ch1 j ( k ) = ∑ Cin ( k ) Wc1ij ( k ) i =1
Ch2 j ( k ) =
1− e
− Ch1 j ( k )
1+ e
− Ch1 j ( k )
(26)
8
Jˆ ( k ) = ∑ Ch2 j ( k ) Wc 2 j ( k ) j =1
Where Ch1 is the output vector of input layer. Ch2 is the output vector of hidden layer. Wc1 is the weight matrix between input and hidden layer. Wc 2 is the weight matrix between hidden and output layer.
196
Y. Lin and C. Lu
The update rule of the critic network is
ΔWc 2 ( k ) = −lC (k )Ch2Τ ( k ) eC ( k )
(27)
{
}
1 ΔWc1( k ) = − lC ( k ) ΔVt T ( k ) ⎡⎣eC ( k )Wc 2Τ ( k ) ⎤⎦ ⋅ ⎡⎣1 − Ch2 ( k ) ⋅ Ch2 ( k ) ⎤⎦ 2 Where “ ⋅ ” represents “entrywise product”.
3.2.3 The Design of Action Network The input of action network is the same as critic network. Output is the excitation adjusting voltage ΔVe ( k ) , and it is the control variable. The hidden layer also have 8
neurons. So the structure of the action network is 3-8-1. Let Ain = ⎡⎣ ΔVt ( k ) ΔVt ( k − 1) ΔVt ( k − 2 ) ⎤⎦ , then the forward computation is T
3
Ah1 j ( k ) = ∑ Aini ( k )Wa1ij ( k ) i =1
Ah2 j ( k ) =
1− e
− Ah1 j ( k )
1+ e
− Ah1 j ( k )
(28)
8
ΔVe ( k ) = ∑ Ah2 j ( k )Wa 2 j ( k ) j =1
Where Ah1 is the output vector of input layer. Ah2 is the output vector of hidden layer. Wa1 is the weight matrix between input and hidden layer. Wa 2 is the weight matrix between hidden and output layer. The update rule of the action network is
⎛ ∂J ( k + 1) ⎞ ΔWa 2 ( k ) = −l A ( k ) Ah2Τ ( k ) ⎜⎜ 2ΔVe ( k ) + γ ⎟ ∂ΔVe ( k ) ⎟⎠ ⎝
(29)
∂J (k + 1) 1 = λ (k + 1)Wm2Τ (k )Wm1u ( k ) ⋅ ⎡⎣1 − Mh2 ( k ) ⋅ Mh2 ( k )⎤⎦ ∂u (k ) 2
(30)
1 γ ⎛ ΔWa1(k ) = − l A ( k ) ΔVt T ( k ) ⎜ 2ΔVe ( k ) + λ ( k + 1)Wm2Τ ( k ) 2 2 ⎝
(Wm1 ( k ) ⋅ (1 − Mh2 ( k ) ⋅ Mh2 ( k ) ) ) (Wa 2 ( k ) ⋅ (1 − Ah2 ( k ) ⋅ Ah2 ( k )) )
T
u
Τ
Where “ ⋅ ” represents “entrywise product”.
)
(31)
Application of Dual Heuristic Programming in Excitation System
197
3.3 Simulation and Results
According to the description of previous sections, a numerical simulation is set up at the MATLAB/SIMULINK environment, containing DHP Controller module and the single machine infinite bus power system module. A conventional PID method simulation is set up too. In the single machine infinite bus system, the parameters of the generator is as follows: Pn = 200MVA , Vn = 16.8kVrms , f n = 50 Hz . All values in the simulation are expressed in standard per unit (p. u.). The simulation set the system in fault status to test the DHP based excitation controller. The fault is three phase short circuit at the time span of 6.0s ~ 6.1s. Results show the control effect of DHP based controller and conventional PID based controller. The simulation results are shown in Fig. 4, Fig. 5, and Fig. 6. 1.2 1.0 DHP PID
Vt
0.8 0.6 0.4 0.2 0.0
6
8 time (s)
10
Fig. 4. The terminal voltage with three phase fault
0.01
DHP PID
Δω
0.00
-0.01
6
8
time (s)
10
12
Fig. 5. The speed variation with three phase fault
DHP PID
ΔVe
3 2 1 0 -1
6
7
8 time (s)
9
10
Fig. 6. The excitation adjusting voltage with three phase fault
198
Y. Lin and C. Lu
Fig. 4 shows the terminal voltage response diagram when the system at with three phase fault In this status, the terminal voltage drops down significantly. When the fault is clear, DHP based controller has a more rapid response to recover the voltage than conventional PID. Fig. 5 shows the speed variation response diagram at fault status. This also shows that the DHP based controller has a slightly more rapid response than conventional PID. Fig. 6 shows the controller’s output, the excitation adjusting voltage’s diagram. This shows that DHP based controller has a more rapid response than conventional PID, and the DHP based controller’s output does not swing up. This will reduce the loss of actuator and energy consumption.
4 Conclusions This paper introduced a DHP based automatic voltage regulator to replace the conventional PID based one. DHP does not depend on system’s accurate numerical model. It learns the system dynamics from existing data. In addition the model network can be updated online, to learn new dynamics caused by disturbances and environment changes The action network is also adaptive with environment changes. As the results show, the control signal does not swing up. This advantage shows that the DHP method can optimize the system performance globally. It also reduces the loss of actuator and energy consumption.
References 1. Werbos, P.J.: Approximate dynamic programming for realtime control and neural modeling. In: White, D.A., Sofge, D.A. (eds.) Handbook of Intelligent Control, Van Nostrand Reinhold, New York (1992) 2. Liu, W., Venayagamoorthy, G.K., Wunsch, D.C.: Design of Adaptive Neural Network Based Power System Stabilizer. Neural Networks 16, 891–898 (2003) 3. Wang, F.-Y., Zhang, H., Liu, D.: Adaptive Dynamic Programming: An Introduction. Computational Intelligence Magazine, 39–47 (2009) 4. Seong, C.-Y., Widrow, B.: Neural Dynamic Optimization for Control Systems—Part I: Background. IEEE Trans. on Systems, Man and Cybernetics- Part B 31(4), 482–489 (2001) 5. Liu, D., Xiong, X., Zhang, Y.: Action-dependent adaptive critic designs. In: Proceeding of International Joint Conference on Neural Networks (IJCNN 2001), July 15-19, vol. 2, pp. 990–995 (2001) 6. Si, J., Wang, Y.-T.: On-line learning control by association and reinforcement. IEEE Trans. on Neural Networks 12(2), 264–276 (2001) 7. Hunt, K.J., Sbarbaro, D., Zbikowski, R., Gawthrop, P.J.: Neural networks for control systems -a survey. Automatica 28(6), 1083–1112 (1992) 8. Prokhorov, D., Wunsch, D.: Adaptive critic designs. IEEE Transactions on Neural Net-works 8(5), 997–1007 (1997)
Application of Dual Heuristic Programming in Excitation System
199
9. Mohagheghi, S., del Valle, Y., Venayagamoorthy, G.K.: A Proportional-Integrator Type Adaptive Critic Design-Based Neurocontroller for a Static Compensator in a Multimachine Power System. IEEE Transactions on Industrial Electronics 54(1), 86–96 (2007) 10. Wei, Q., Zhang, H., Liu, D.: An Optimal Control Scheme for a Class of Discrete-Time Nonlinear Systems with Time Delays Using Adaptive Dynamic Programming. Acta Automatica Sinica (2008) 11. Lu, C., Si, J., et al.: Direct heuristic dynamic programming for damping oscillations in a large power system. IEEE Trans. Syst., Man., Cybern. B 38(4), 1008–1013 (2008)
A Neural Network Method for Image Resolution Enhancement from a Multiple of Image Frames Shuangteng Zhang1 and Ezzatollah Salari2 2
1 Department of Computer Science, Eastern Kentucky University, Richmond, KY 40475 Department of Electrical Eng. & Comp. Science, University of Toledo, Toledo, OH 43607
Abstract. This paper presents a neural network based method to estimate a high resolution image from a sequence of low resolution image frames. In this method, a multiple of layered feed-forward neural networks is specially designed to learn an optimal mapping from a multiple of low resolution image frames to a high resolution image through training. Simulation results demonstrate that the proposed neural networks can successfully learn the mapping in the presented examples and show a good potential in solving image resolution enhancement problem, as it can adapt itself to the various changeable conditions. Furthermore, due to the inherent parallel computing structure of neural network, this method can be applied for real-time application, alleviating the computational bottlenecks associated with other methods and providing a more powerful tool for image resolution enhancement. Keywords: Image resolution, image enhancement, neural network, image sequence analysis.
1 Introduction Super-resolution images are very important in many image processing applications. One way to obtain such high-resolution images is to apply image reconstruction or restoration techniques to reconstruct or restore the needed high-resolution images from the corresponding low-resolution ones. Following a mathematical approach, this reconstruction or restoration problem is considered to be highly ill-posed when an improved resolution is required to be obtained from a single low resolution image. However, the problem can lead to a more stable solution when there exists a sequence of images corresponding to the same object or scene. A common technique is to use the relative motions between two consecutive low-resolution image frames and reconstruct a high-resolution image from a multiple of low-resolution images. Several approaches to enhancing the image resolution using a multiple of lowresolution images have been proposed in recent years. Tsai and Huang [1] introduced a Discrete Fourier Transform (DFT) based algorithm to reconstruct a high resolution image from a multiple of input image frames and the method was further enhanced by Kim [2] to handle the noise and the blurring introduced by the imaging device. Stark and Oskoul proposed a convex projection method to obtain high-resolution images from coarse detector data [3]. Hardie et. al. [4] presented a method in which the translation and rotational motion is considered and a high-resolution image is obtained by minimizing a cost function in an iterative algorithm and this method was further D. Liu et al. (Eds.): ISNN 2011, Part III, LNCS 6677, pp. 200–208, 2011. © Springer-Verlag Berlin Heidelberg 2011
A Neural Network Method for Image Resolution Enhancement
201
enhanced by exploiting non-global scene motion [5]. Freeman et. al. [6] developed a learning based technique in which fine details of image regions are learned from a low resolution in a training set. The learned relationships are used to predict the fine details in other images. Techniques based on maximum a posteriori or maximum likelihood estimation were also used in image resolution improvement [7] and [8]. Most of the methods mentioned above are expressed in terms of an optimization problem and try to minimize some sort of cost function. Solutions to such optimization problems require intensive computations which are very time consuming. In this paper, a neural network based method is proposed to estimate a high resolution image from a sequence of low resolution image frames. This method uses a multiple of layered feed-forward neural networks that is particularly designed to be capable of learning an optimal mapping from a multiple of low resolution image frames to a high resolution image through training. Simulation results demonstrate that the proposed neural networks has good potential in solving image resolution enhancement problem, as it can adapt itself to the various changeable conditions. Furthermore, due to the inherent massively parallel computing structure of neural network, this new method can be used for real-time application, alleviating the computational bottlenecks associated with other methods and providing a more powerful tool for image resolution enhancement.
2 Problem Statement When the detector array of the imaging systems is not sufficiently dense to meet the Nyquist criterion, the obtained digital images are undersampled and have a lower resolution. Assuming that the downsampling factor is L, a simple model of obtaining a low-resolution image of size (N1,N2) from a high-resolution image of size (L x N1, L x N2) can be expressed by the following equation, L x N1 L x N 2
g (m, n)
¦ ¦h
m , n , k ,l
k 1
z (k , l ) K (m, n)
(1)
l 1
where g(m,n) and z(k,l) are the low-resolution and the high-resolution images, respectively, hm,n,k,l is the contribution of the (k,l)th pixel in the high-resolution image to the (m,n)th pixel in the low-resolution image, η(m,n) is the Gaussian noise added to the (m,n)th pixel in the low-resolution image. The problem here is to recover the highresolution image z from P low-resolution images gp, p=1,2,…P. A neural network method is proposed to solve this problem. This method takes advantage of the powerful learning ability of feed-forward neural networks and relative motions between the low-resolution images to recover the lost information due to undersampling, and therefore, generate reconstructed high-resolution image. Assume that the relative motion between a reference low-resolution image gr (can be anyone of the low-resolution images) and other images gp is represented by using the related rotational and translational parameter triplet, (θp, hpx, hpy), the details of the proposed neural network method are provided in the following sections.
202
S. Zhang and E. Salari
3 The Proposed Neural Network Structure The proposed neural network is a multiple layered feed-forward network and can be functionally divided into three sub-networks: preliminary resolution enhancement (PRE), additional information (AI), and output networks. Figure 1 shows the overall diagram of the proposed network. Note, Ia and Ib denote two sets of the external inputs for the preliminary resolution enhancement and additional information sub-networks, respectively, and O represents the set of outputs for the whole network. The PRE network computes high-resolution image pixel information from the reference lowresolution image. The AI network is constructed to extract additional high-resolution image information from the other low-resolution images. The output network combines the calculated high-resolution image pixel information with the additional information from the AI network to obtain the expected high-resolution image. Preliminary Resolution Enhancement Network
Ia
Ya O b
Y
Ib
Output Network
Additional Information Network
Fig. 1. Diagram of the proposed neural network
3.1 The PRE Network As mentioned above, the PRE network is used to compute preliminary high-resolution image pixel information from the reference low-resolution image. This function can be achieved by using a feed-forward neural network. The learning capability of a multi-layered feed-forward network makes it a universal approximator. With proper structure design, it can be trained to learn the optimal mapping from low-resolution image to high-resolution image pixels. The proposed PRE network is designed to be a fully connected three-layered feed-forward neural network as shown in Figure 2. Assume that Ik (k=1, 2,…, 9) denotes the k-th pixel value of a 3x3 window in the reference low-resolution image frame gr, the network takes the vector Ia=(I1, I2, …, I9) as the input and generates M=L2 high-resolution image pixel values yak (k=1, 2, …, M) corresponding to the central pixel of the 3x3 window in the reference image gr. The behavior of the network can be formulated as,
y ka
Q
9
j 1
i 1
f (¦ u ajk f (¦ vija I i )), k
1, 2, ..., M
(2)
A Neural Network Method for Image Resolution Enhancement
ya1
ya2
203
yaM xxx
Ua xxx
Va
I1
I2
I3
I4
I5
I6
I7
I8
I9
Fig. 2. Neural network structure of the PRE network
where yak is the output of the k-th neuron in the ouput layer, uajk is the weight of the connection between the j-th neuron in the hidden layer and the k-th neuron in the output layer, vaij is the connection weight between the i-th neuron in the input layer and jth neuron in the hidden layer, and Q is the number of the neurons in the hidden layer. The activation function of the neurons f(.) is,. f ( x)
2 /(1 e Ox ) 1
(3)
where λ > 0 is the neuron activation function coefficient determining the steepness of the function. 3.2 The AI Network Using only the PRE network can not generate high quality high-resolution image since no enough information can be obtained from only the reference low-resolution image to recover the lost information due to the undersampling. The AI network takes use of the relative motion between the reference image gr and the other low-resolution image gp to extract additional information for the further enhancement of the image resolution in the output network. The proposed IA network structure is shown in Figure 3. It is also a fully connected three-layered feed-forward network. The input of the IA network consists of P-1 vectors with each vector having elements of Ip1, Ip2, …, Ip9, θp, hpx, and hpy. Note Ipk (k=1, 2, …, 9) is the k-th pixel value of a 3x3 window in the p-th low-resolution image, and θp, hpx, and hpy are the rotational and translational motion parameters between the p-th low-resolution image and the reference image gr. Representing the input of the IA network by the vector Ib=( Ib1, Ib2, …, Ib12(P-1))= (I11, I12, … , I19, θ1, h1x, h1y, I21, I21, …, I29, θ2, h2x, h2y, … , I(P-1)1, I(P-1)2, … , I(P-1)9, …, θ(P-1), h(P-1)x, h(P-1)y), the output of the network can be expressed as,
204
S. Zhang and E. Salari yb1
yb2
ybM
xxx Ub xxx b
V
xxx
xxx
I11 x x x I19 T1
xxx
xxx
h1x h1y I21 x x x I29 T2 h2x h2y x x x I(P-1)1 x x x I(P-1)9 T(P-1) h(P-1)x h(P-1)y
Fig. 3. Neural network structure of the AI network
R
12( P 1)
j 1
i 1
f (¦ u bjk f (
y kb
¦v
b ij
I ib )), k
1, 2, ..., M
(4)
where ybk is the output of the k-th neuron in the ouput layer, ubjk is the connecting weight between the j-th neuron in the hidden layer and the k-th neuron in the output layer, vbij is the weight of connection between the i-th neuron in the input layer and jth neuron in the hidden layer, R and M are the number of the neurons in the hidden and output layer, respectively, P is the number of the low-resolution images. The activation function of the neurons is the same as equation (3). The AI network requires the motion parameters between the reference lowresolution image gr and the other images gp as part of the input. However, in the practical applications, these parameters are usually unknown. There exist some approaches for the estimate of the parameters [9] and [10]. Here we propose a neural network to estimate the motion parameters and incorporate it into the IA network. We use P-1 neural networks, each calculating the parameters (θp, hpx, hpy) for one of the (P-1) low-resolution images and having the same structure as shown in Figure 4. As in the PRE and AI networks, assume that Ik and Ipk (k=1, 2,…, 9) denote the k-th pixel value of a 3x3 window in the reference low-resolution image frame gr and the p-th low-resolution image frame gp, respectively, the output Yc = (yc1, yc2, yc3)=(θp, hpx, hpy) of the motion parameter network with input of Ic=(Ic1, Ic2, …, Ic18)=( I1, I2, …, I9, Ip1, Ip2, Ip9) can be obtained by using the following equation,
y kc
S
18
j 1
i 1
f (¦ u cjk f (¦ vijc I ic )), k
1, 2, 3
(5)
where ucjk and vcij as in equation (4) are the connection weights between input, hidden and output layers, and S is the number of the neurons in the hidden layer.
A Neural Network Method for Image Resolution Enhancement
yc1,
yc2,
205
yc3
Uc xxx
Vc xxx
I1
I2
xxx
xxx
I9
Ip1
Ip2
xxx
Ip9
Fig. 4. Neural network structure of the motion parameter network
3.3 Output Network The output network is a two-layered feed-forward network as shown in Figure 5. It combines the outputs of the PRE and AI networks as input vector, (ya1, ya2, …, yaM, yb1, yb2, …, ybM), to generate a set of high-resolution image pixel values, Z=(z1, z2, …, zM). The output zm can be calculated by the following equation, M
f ( y ma wmm ¦ w( p k ) m y kb )
zm
(6)
k 1
Note, wkm is the connecting weight between the k-th neuron in the input layer and mth neuron in the output layer. z1
z2
zM xxx
W xxx
ya1
ya2
xxx
xxx
yaM
yb1
yb2
xxx
ybM
Fig. 5. The neural network structure of the output network
206
S. Zhang and E. Salari
4 Training the Proposed Network The proposed neural network consists of three multi-layered feed-forward subnetworks. These three networks work together to form a big feed-forward network to reconstruct a high-resolution image from a multiple of low-resolution images. The capability of the network can be achieved by training it to learn the optimal mapping from a set of low-resolution image frames to high-resolution image through adjusting the neuron connection weights. The standard Error Back-Propagation Training algorithm (EBPT) [11] can be used for the training. EBPT is a gradient descent algorithm that minimizes some cost function. The following error functions are used for the training, 2
Ea
1 M M 1 M (Oi z i ) 2 J ¦ (¦D ij z j ) ¦ 2 i 1 j1 2 i 1
Eb
1 M M 1 M (Oi y ia ) 2 J ¦ (¦D ij y ia ) ¦ 2 i 1 j1 2 i 1 Ec
1 3 ¦ ( Di y c i ) 2 2 i 1
(7) 2
(8)
(9)
where Oi is the expected/desired output of the proposed network, Di is the desired output of motion parameter network, zi, yai and yci are the observed output of the whole network, the PRE sub-network and the motion parameter network, respectively. Note, in equations (7) and (8), the second part calculates the summation of the squared Laplacian of each generated high-resolution image pixel value zm. By adding the additional part to the functions (7) and (8), the training of the network searches the optimal solution in the reduced search space, and thus generates a better training result. Here the coefficient αij is used for the calculation of Laplacian of each generated high-resolution image pixel and the coefficient γ is used to balance the solution generated by the network and is set to a small value. As mentioned earlier, M is the number of the neurons in the output layer of the output network, where M=L2 (L is the downsampling factor).
5 Preliminary Experimental Results The proposed neural network was implemented. In our experiments, we used 16 lowresolution images to generate a high-resolution image with a size of 4 times of the low-resolution image in each dimension. Therefore, P equals to 16, the downsampling factor L=4, and M=L2=16. Synthetically generated image frames were used in our experiments. To generate the low-resolution images synthetically, we used an eight bit gray level image of size 512 x 512. At first, sixteen sets of motion parameters (θp, hpx, hpy) with subpixel magnitudes were randomly generated. Then, part of the original image was rotated and translated by using the sixteen sets of motion parameters, respectively, to obtain sixteen variations of high-resolution images of size 320 x 320. These sixteen images
A Neural Network Method for Image Resolution Enhancement
207
were further blurred by a 5 x 5 moving-average filter and then decimated by a factor of 4 to produce sixteen low-resolution images of size 80 x 80. Follow the procedures, two sets of sixteen low-resolution and their corresponding high-resolution images were generated for ‘lenna’ and ‘boat’ images, respectively. The original high-resolution images plus the generated low-resolution images as well as the generated motion parameter sets were used for the training of the neural network. After the training process, the network was used to reconstruct the highresolution image using the sixteen low-resolution lenna images. Figures 6, 7, 8 and 9 show the original and the generated high-resolution lenna images using pixel replication, bilinear interpolation, and the proposed neural network, respectively.
10
50
20
100 30
150
40
200
50 60
50
100
150
10
200
Fig. 6. Orginal high resolution image
50
100
100
150
150
200
200
100
150
200
30
40
50
60
Fig. 7. A single low-resolution image
50
50
20
50
100
150
200
Fig. 8. Reconstructed lenna image using Fig. 9. Reconstructed lenna image using the bilinear interpolation proposed neural network
208
S. Zhang and E. Salari
6 Summary and Future Work A novel neural network based method for high-resolution image reconstruction from a multiple of low-resolution image frames has been proposed. The preliminary experimental results show that the proposed neural network can successfully learn the mapping from a set of the low-resolution image sequences to the high-resolution image, and therefore, it is promising in solving the ill-posed high-resolution image reconstruction problem. To further investigate the proposed neural network and its capability of image resolution enhancement, more research work is needed. Some directions of future work can be: (1) investigating better neural structure that is more suitable for the task; (2) Designing complete training scheme for a better training result, and (3) testing the network on variety of images.
References 1. Tsai, R.Y., Huang, T.S.: Multiframe Image Restoration and Registration. In: Huang, T.S. (ed.) Advances in Computer Vision and Image Processing, pp. 317–339. JAI Press Inc., United Kingdom (1984) 2. Kim, S.P., Su, W.: Recursive High-resolution Reconstruction of Blurred Multi-frame Images. IEEE Trans. Image Processing 2, 534–539 (1993) 3. Stark, H., Oskoui, P.: High-resolution Image Recovery From Image-plane Arrays, Using Convex projections. J. Opt. Soc. Amer. A. 6, 1715–1726 (1989) 4. Hardie, R.C., Barnard, K.J., Bognar, J.G., Watson, E.A.: High-resolution Image Reconstruction from a Sequence of Rotated and Translated Frames and its Application to an Infrared Imaging System. Opt. Eng. 37, 247–260 (1998) 5. Tuinstra, T.R., Hardie, R.C.: High Resolution Image Reconstruction from Digital Video by Exploitation of Non-global Motion. Optic. Eng. 38, 806–814 (1999) 6. Freeman, W.T., Jones, T.R., Pasztor, E.C.: Example-Based Super-Resolution. IEEE Comput. Graph. Appl. 22, 56–65 (2002) 7. Schultz, R.R., Stevenson, R.L.: Extraction of High-resolution Frames from Video Sequences. IEEE Trans. Image Processing 5, 996–1011 (1996) 8. Hardie, R.C., Barnard, K.J., Armstrong, E.E.: Joint MAP Registration and Highresolution Image Estimation Using a Sequence of Undersampled Images. IEEE Trans. Image Processing 6, 1621–1633 (1997) 9. Irani, M., Peleg, S.: Improving resolution by image registration. CVGIP: Graph. Models and Image Process. 53, 231–239 (1991) 10. Lucas, B.D., Kanade, T.: An iterative image registration technique with an application to stereo vision. In: Proc. 7th International Joint Conference on Artificial Intelligence, pp. 674–679. Morgan Kaufmann Publishers Inc., San Francisco (1981) 11. Haykin, S.: Neural Networks: A Comprehensive Foundation, 2nd edn. Prentice Hall, Upper Saddle (1999)
Multiparty Simultaneous Quantum Secure Direct Communication Based on GHZ States and Mutual Authentication Wenjie Liu1, Jingfa Liu1, Hao Xiao2, Tinghuai Ma1, and Yu Zheng1 1
School of Computer and software, Nanjing University of Information Science and Technology, Nanjing, China 2 School of Information and Engineering, Huzhou Teachers College, Huzhou, China {wenjieliu,jfliu}@nuist.edu.cn, [email protected], {thma,yzheng}@nuist.edu.cn
Abstract. In the modern communication, multiparty communication plays a important role, and becomes more and more popular in the e-business and e-government society. In this paper, an efficient multiparty simultaneous quantum secure direct communication protocol is proposed by using GHZ states and dense coding, which is composed of two phases: the quantum state distribution process and the direct communication process. In order to prevent against the impersonators, a mutual identity authentication method is involved in both of the two processes. Analysis shows that it is secure against the eavesdropper’s attacks, the impersonator’s attacks, and some special Trent’s attacks (including the attack by using different initial states). Moreover, Comparing with the other analogous protocols, the authentication method is more simple and feasible, and the present protocol is more efficient. Keywords: quantum cryptography communication, QSDC, simultaneous communication, mutual authentication, GHZ states.
multiparty
1 Introduction The principles in quantum mechanics, such as no-cloning theorem, uncertainty principle, and entanglement, provide some interesting ways for secure communication. For example, quantum key distribution (QKD), the most mature branch of quantum communication, provides a novel way for two legitimate parties to share a common secret key with unconditional security [1-3]. Quantum secret sharing (QSS) is another branch, which supplies a secure way not only for sharing a classical message [4, 5], but also for sharing an unknown quantum state [6, 7]. Different from QKD and QSS, quantum direct communication (QDC) is to transmit directly the secret message without generating a secret encryption key between the communication parties in advance. Due to its instantaneous, QDC may be used in some urgent circumstances, and it was proposed and actively pursued by some groups [8-15]. In essence, these QDC schemes can be divided into two categories: quantum secure direct communication (QSDC) [8-11] and deterministic secure quantum D. Liu et al. (Eds.): ISNN 2011, Part III, LNCS 6677, pp. 209–218, 2011. © Springer-Verlag Berlin Heidelberg 2011
210
W. Liu et al.
communication (DSQC) [12-15]. Their ultimate difference is that the receiver can read out the secret message only after he exchanges at least one bit of classical information for each qubit with the sender in DSQC protocol. Recently, Lee et al. [16] proposed two QSDC protocols with authentication (LLY for short). However, Zhang et al. [17] indicated these protocols were insecure against the authenticator Trent’s attacks, and put forward two modified protocols by using the Pauli σ Z operation instead of the original bit-flip operation σ X (ZLW). And Liu et al. [18] put forward two corresponding efficient protocols (LCL) by using four Pauli operations { I , σ X , iσ Y , σ Z }. Unfortunately, in 2009, Qin et al. [19] claimed that in LCL protocols Trent and an eavesdropper Eve can elicit half of information about the secret message from the public declaration. And then, Yen et al. [20] also pointed out that ZLW protocols are still insecure when Trent would prepare different initial states, and presented a new DSQC protocol with mutual authentication and entanglement swapping (YHG) to make up this kind of security leakage. In these above proposed protocols [8-20], the one-to-one communication mode is widely discussed, but the multiparty simultaneous communication mode is rarely discussed deeply. In this paper, we present an efficient multiparty simultaneous QSDC protocol (MSQP) with multi-particle Greenberger-Horne-Zeilinger (GHZ) states and mutual authentication. What’s more, the mutual authentication procedures is included in both the quantum state distribution process and the direct communication process. In order to prevent against some special Trent’s attacks, such as the attacks proposed by Zhang et al. [17] (ZLW attacks), the attack of using different initial states propose by Yen et al. [20] (YHG attack), the communication users wouldn’t send qubits to Trent. That is, Trent acts as a “genuine” authenticator who is only used to authenticate the users’ identities.
2 Preliminaries The quantum-mechanical analog of the classical bit is the qubit, a two-level quantum system, which can be written as
ϕ =α 0 + β 1 . Where
0
(1)
represents the horizontal polarization, 1
represents the vertical
polarization, α and β are complex numbers that specify the probability amplitudes of the corresponding states, and α + β = 1 . Some other essential notations are defined as follows: 1 1 (1) φ ± , ψ ± (Bell states): φ ± = ( 00 ± 11 ), ψ ± = ( 01 ± 10 ) . 2 2 2
P±
(2) Q
±
, =
1 2
Q±
,
R±
S±
,
(| 001〉± | 110〉 ) , R
2
±
=
(GHZ states): 1 2
(| 010〉± | 101〉 ) , S
P ±
±
=
=
1
1
(| 011〉± | 100〉 ) .
2
(| 000〉± | 111〉 ) ,
2
Multiparty Simultaneous Quantum Secure Direct Communication
(3) I , σ x , iσ y , σ z , H
σx = 0 1 + 1 0 H=
I= 0 0 +1 1 ,
(Unitary operations):
iσ y = 0 1 − 1 0
,
,
211
σz = 0 0 − 1 1
,
1
(0 0 − 1 1 + 0 1 + 1 0). 2 (4) h(.)(One-way hash function): h :{0, 1}* × {0, 1}l → {0, 1}c , where the asterisk, l, and c represent an arbitrary length, the length of a counter, and a fixed number, respectively. (5) IDuser ( User’s identity number): The user’ identity number IDA ( IDB , or IDC ) is only known to the authenticator Trent except Alice (Bob, or Charlie). (6) AK user (User’s authentication key): AKuser = h( IDuser , cuser ) , where cuser is the
counter of calls on the hash function. And the length of AK user is luser .
3 Multiparty Simultaneous QSDC Protocol For simplicity, we take the three-party simultaneous QSDC as example, and assume there are four parties: the senders (Alice and Bob) want to send secret messages to the distant receiver Charlie simultaneously, and Trent is the third-party authentication center who will be used to supply the three-particle GHZ states and authenticate the users. It is composed of two phases: the quantum state distribution process and the many-to-one direct communication process. 3.1 Quantum State Distribution Process
The purpose of the quantum state distribution process is to let the three users (Alice, Bob and Charlie) safely share the four-particle GHZ states (shown in Fig.1). Suppose AK A ( AK B , AK C ) is the user Alice’s (Bob’s, Charlie’s) authentication key, shared between Trent and Alice (Bob, Charlie) in advance. The procedures are as follows: receive VA , VA and VC ;
I or H decode ⎯⎯⎯⎯⎯ →VA (VA , VC ); SK A ( SK B , SK C ) -------------prepare GHZ states ( S A , S B , SC ); I or H encode : ⎯⎯⎯⎯⎯ → S A ( S A , SC ); SK A ( SK B , SKC ) prepare VAC , VBC ; I or H encode : ⎯⎯⎯ → V ,VBC SKC AC
S A +VAC
VA
S B +VBC
VB
SC
VC
prepare VA to authenticate Trent;
I or H encode : ⎯⎯⎯ SK A → V
A
;
-------------obtain ( S A + VAC ) sequence; prepareVB to authenticate Trent; I or H I or H encode : ⎯⎯⎯ decode : ⎯⎯⎯ → SA ; SK B →VB ; SK A v qubits ---S A ⎯⎯⎯ → outcomes. z-basis obtain ( S B + VBC ) sequence; I or H decode : ⎯⎯⎯ SK B → S A ; same positions S B ⎯⎯⎯⎯→ outcomes. z-basis
prepare VC to authenticate Trent;
I or H encode : ⎯⎯⎯ →V SKC
C
;
-------------obtain S C sequence; I or H decode : ⎯⎯⎯ SKC → S C ; same positions SC ⎯⎯⎯⎯→ outcomes. z-basis
Fig. 1. The quantum state distribution process of three-party MSQP
212
W. Liu et al.
Step 1: Alice prepares an ordered v decoy photons sequence, namely VA sequence,
each of which is randomly chosen in { 0 , 1 }. And these decoy photons will be used for verifying the identity of Trent. It should be noted that the length v is a sufficient large for checking the noise of the quantum channel. In general, v >> l A , where l A is the length of Alice’s authentication key, AK A . Alice makes an operation H or I on each particle of VA sequence according to the bit value of AK A being 1 or 0. For instance, if the bit value is 1, an H operation is performed on the corresponding particle; otherwise, an I operation is applied (i.e., nothing is done). Since the number v is normally larger than the length l A of AK A , the key bits of AK A will be
regenerated by IDA and a new cA for operation performing. The process will be repeated until all v particles in VA sequence are encoded with corresponding H or I operations. Finally, Alice then sends VA sequence to Trent. Step 2: After receiving VA sequence, Trent decodes each of the particles by an
operation H or I also according to the bit value 1 or 0 of AK A as Alice did in Step (1). If this “Trent” is the genuine authenticator Trent, the state of each of the ordered v decoy photons will return to its initial state. Trent then measure these decoy photons in the Z-basis { 0 , 1 }, and publish the outcomes to Alice over public channel. Step 3: Alice is going to authenticate Trent as well as to check the presence of Eve. She compares her initial states of VA sequence with Trent’s measurement outcomes.
Since only the “real” Trent knows her authentication key, AK A , then if they have a sufficient large number of results that are the same, Alice would accept that Trent is the genuine Trent (the authenticator). Otherwise, if the error rate is too high, she just stops the procedure. Simultaneously, Bob and Charlie authenticate Trent in the same way, respectively. If all the users ensure Trent is the “genuine” authenticator, then continue the following steps to be authenticated by Trent. Step 4: Trent prepares an ordered sequence consisted of N (i.e., equals to n+v) threeparticle GHZ states, and each of the states is
ϕi = P + =
1 2
( 000
ABC
+ 111
ABC
) (i = 1, 2,L , N )
(2)
where the subscript indicates the ordering number of quadruplets, A, B and C represent three qubits of each GHZ state. Trent then takes particle A from each GHZ state to form an ordered particle sequence, called S A sequence. Similarly, the remaining partner particles compose S B sequence, SC sequence. And he encodes S A ( S B , SC ) with Alice’s (Bob’s, Charlie’s) authentication key AK A ( AK B , AK C ).
For example, if the bit value of AK A is 1, an H operation is performed on the corresponding particle; otherwise, an I operation is applied. The new authentication keys will be regenerated when li is not long enough.
Multiparty Simultaneous Quantum Secure Direct Communication
213
Step 5: On the same time, Trent prepares two ordered v decoy photon sequences, each of which is randomly chosen in { 0 , 1 }, namely VAC and VBC , respectively. He encodes VAC and VBC sequences according to the authentication keys AK C and AK C , respectively, as described in Step (1). Trent inserts these decoy photons of VAC ( VBC ) into S A ( S B ) sequence with random positions, and keeps these positions in secret. Then he sends the ( S A + VAC ) , ( S B + VBC ) and SC sequences to Alice, Bob and Charlie, respectively. Step 6: After making sure that Alice, Bob and Charlie have received their sequences, respectively, Trent will authenticate Alice, Bob and Charlie as well as check the security of the channels. Taking Alice as example, Trent tells Alice the positions of VAB , and Alice decodes S A by the operations {H, I} according to AK A as Trent did in Step (4). If this “Alice” is the real user Alice, the state of each qubit in VAC will return to its initial state. On the same time, Bob and Charlie perform the similar operations on S B and SC with AK B and AK C as Alice did, respectively. Step 7: Alice selects randomly v (sufficiently large) qubits from the decoded S A sequence, and makes Z-basis measurement on them, so do Bob and Charlie on the same positions, respectively. Step 8: Alice, Bob and Charlie compare their measurement results through the public channel. If the error rate is higher than expected, then Alice, Bob and Charlie abort the protocol. Otherwise they can confirm that their counter parts are legitimate and the channels are all secure. In this process, Trent is verified by all users (Alice, Bob and Charlie) in Step 1-3, each user is also verified by Trent in Step 4-8. If the error rate of Alice’s, Bob’s and Charlie’s outcomes is under the expectation, then the GHZ states are safely distributed to Alice and Bob, and then they can continue the next phase. 3.2 Many-to-One Direct Communication Process
In this process, Alice and Bob will utilize the remained n GHZ states as message carriers: they encode their messages on the particles in S A , S B sequences, and '
prepare VA' to authenticate sender; I or H encode : ⎯⎯⎯ →VA' ; SK A ( I , σ x , iσ y , σ z ) → SA; encode: ⎯⎯⎯⎯⎯ ' insert VAC , VA into S A .
obtain ( S A +VAC +VA ) sequence; ' B
obtain ( S B +VBC +V ) sequence; I or H decode : ⎯⎯⎯ SKC → VAC , VBC ; verify receiver's identity with VAC , VBC ; verify sender's identity with VA' ,VB' ; GHZ basis S A S B SC ⎯⎯⎯⎯ → outcomes.
(S A +VAC +VA' ) '
prepare VB to authenticate sender;
I or H encode : ⎯⎯⎯ →V ; SK B ( I , iσ y ) encode: ⎯⎯⎯→ S B ; insert VBC , V into S B ;
(S B +VBC +VB' )
'
B
'
B
Fig. 2. The many-to-one direct communication process of three-party MSQP
214
W. Liu et al.
transmit them to Charlie, respectively. The procedures are shown as follows (also shown in Fig.2): Step 1: Alice (Bob) prepares an ordered v decoy photons sequence, namely VA' ( VB' )
sequence, and encodes VA' ( VB' )according to AK A ( AK B ). Step 2: Alice encodes message on the remanded n-length S A sequence by performing
the four unitary operations {I , σ x ,iσ y , σ z } : two bits message corresponds to a unitary operation on a qubit, “00”Æ I , “01”Æ σ x , “10”Æ iσ y , and “11”Æ σ z . At the same time, Bob encodes his message on the remanding S B sequence as follows: “0”Æ I , “1” Æ iσ y . Then, the original ϕi becomes one of the following states. I ⊗ I ⊗ I ϕi =
1
( 000
2
σ x ⊗ I ⊗ I ϕi =
1
iσ y ⊗ I ⊗ I ϕ i =
1
2 2
σ Z ⊗ I ⊗ I ϕi =
1
I ⊗ iσ y ⊗ I ϕi =
1
( 000
1 2
ABC
1 2
( 110
( − 010
) = P+
ABC
+ 011
− 111
ABC
( − 110
ABC
+ 011
ABC
( − 010
2
iσ y ⊗ iσ y ⊗ I ϕi =
ABC
( − 100
1
σ x ⊗ iσ y ⊗ I ϕi =
σ Z ⊗ iσ y ⊗ I ϕi =
( 100
2 2
+ 111
ABC
) = S+
ABC
(4) ABC
) =| S − 〉 ABC
(5)
) =| P − 〉 ABC
(6)
ABC
+ 101
(3) ABC
ABC
) = R−
ABC
+ 001
ABC
) = Q−
ABC
− 001
ABC
) = Q+
ABC
+ 101
ABC
) = R+
ABC
(7) (8)
ABC
(9) ABC
(10) ABC
Step 3: Alice inserts these decoy photons of VAC and VA' into S A sequence with random positions, and keeps these positions in secret. It should be noted that the new positions of VAC is only known for Alice instead of Trent. On the same time, Bob also
inserts the photons of VBC and VB' into S B sequence in the same way. It should be noted that VAC , VBC will be used to verify the receiver’s identity, and VA' , VB' will be used to verify the senders’ identities. Step 4: Alice sends the message-encoded ( S A + VAC + VA' ) sequence to Charlie, Bob sends ( S B + VBC + VB' ) sequence to Charlie, respectively. Step 5: After Charlie has received ( S A + VAC + VA' ) sequence, Alice announces the
positions of VAC to Charlie over the public channel. According the announcement,
Multiparty Simultaneous Quantum Secure Direct Communication
215
Alice will authenticate the receiver (i.e., Charlie). He asks Charlie to perform the corresponding H or I operation on the particles of VAC according to the bit value of AK C being 1 or 0. If the receiver is the “true” Charlie, each qubit in VAC will return to its initial state Trent prepared. And then, Charlie measures the decoy photons of VAC in Z-basis, and tells Trent the measurement. So, Trent can judge the receiver’s identity by comparing the measurement outcomes with the initial state. If the receiver is the “true” Charlie, Charlie is going to authenticate the sender (i.e., Alice) too. After Alice tells Charlie the positions of VA' firstly, Charlie measures the decoy photons of VAC randomly in Z-basis { 0 , 1 } or X-basis {{ + , − }} and tells Trent the
outcomes and the corresponding bases. Then, Trent draws out those outcomes measured in a right basis. That is, if the bit of AK B is 0, the corresponding qubit is measured in Z-basis; if the bit of AK B is 1, the corresponding qubit is measured in Xbasis. Trent asks Alice to tell the corresponding initial states, and he can judge the sender’s identity by comparing the outcomes with their initial state. On the same time, Bob does the same authentication procedures with VBC and VB' as Alice. In a word, if the sender (or receiver) is not legitimate or the channel is not safe, then the users will abort the process. But, they will continue the next step when the users are authenticated to be legitimate as well as the channel is safe. Step 6: Charlie performs GHZ basis measurement on pairs of particles consisting of S A , S B and SC sequence. From their outcomes, she can deduce the probable operations performed by Alice and Bob, respectively (shown in Table 1).
For example, if Charlie’s measurement outcome is S + , she can infer that Alice had performed a σ x operation and Bob had chosen a I operation, that is, Alice’s message is “01”, Bob’s message is “0”. Table 1. Relations of Alice’s operation, Bob’s operation and Charlie’s measurement in the many-to-one direct communication process of the three-party MSQP Charlie’s Measurement P+
S S P R
R
Bob’s Operation
Bob’s message
I
00
I
0
ABC
σx
01
I
0
ABC
iσ y
10
I
0
ABC
σz
11
I
0
ABC
I
00
iσ y
1
ABC
σx
01
iσ y
1
ABC
iσ y
10
iσ y
1
ABC
σz
11
iσ y
1
− − −
Q
Alice’s message
ABC
+
Q
Alice’s Operation
− +
+
216
W. Liu et al.
4 Security and Efficiency Analysis 4.1 Security Analysis
Since the present protocol is similar to those analogous protocols [16-20], which are proven to be secure against outer Eve’s attacks and the impersonator’s attacks, we just analyze the security against the special Trent’s attacks here. At first, we consider the Trent’s attack proposed by Zhang et al. [17] (ZLW attack). Under this kind of attack, Trent will intercept all the particles transmitted from the senders to the receiver, and attempt to make a certain measurement in the direct communication process for stealing the secret message. Fortunately to us, there is no redundant particle of each EPR pairs to be kept in Trent’s hand, even if she can impersonate Charlie to gain all the transmitted particles, she sill cannot get any information just acting as the impersonator role. So, the present scheme is secure against ZLW attack. Finally, we take account of YHG attack, where Trent uses a different initial state instead of the original state 1 2 (| 000〉+ | 111〉 ) . Under this attack, In order to avoid be detected by the channel checking procedure in Step 8 of the quantum distribution process, Trent will prepare the different initial state 1 2 (| + + +〉 + | − − −〉 ) instead of 1 2 (| 000〉+ | 111〉 ) . Then, after Alice and Bob finished message encoded, the GHZ state becomes one of the following, I ⊗I ⊗I ϕ' =
σx ⊗ I ⊗ I ϕ ' = iσ y ⊗ I ⊗ I ϕ ' =
σZ ⊗ I ⊗ I ϕ ' = I ⊗ iσ y ⊗ I ϕ ' =
1 2 1
( +++
2 1
2
1 2 1
ABC
+ −−−
ABC
)=
( +++
ABC
− −−−
ABC
)=
( −++
ABC
− +−−
ABC
)=
ABC
+ +−−
ABC
)=
( −++
1
( + − + ABC − − + − ABC ) = 2 1 ( + − + ABC + − + − ABC ) = σ x ⊗ iσ y ⊗ I ϕ ' = 2 1 ( − − + ABC + + + − ABC ) = iσ y ⊗ iσ y ⊗ I ϕ ' = 2 1 σ Z ⊗ iσ y ⊗ I ϕ ' = ( − − + ABC − + + − ABC ) = 2
2 1
( φ+
0
AB
(ψ +
C
+ ψ+
C
+ φ+
C
+ φ−
AB
1 C)
(11)
1 C)
(12)
AB
AB
1 C)
(13)
AB
1 C)
(14)
( φ− 1 C − ψ− 0 C) AB AB 2 1 ( φ− 0 C −ψ− 1 C) AB AB 2 1 ( φ+ 0 C − ψ+ 1 C) AB AB 2 1 ( φ + AB 1 C − ψ + AB 0 C ) 2
(15)
2 1
2 1
2 1
(ψ −
( φ−
AB
AB
AB
0
0
0
C
+ ψ−
(16) (17) (18)
Trent will intercept the transmitted qubits from Alice and Bob and draw out S A and S B sequence (because he know all users’ authentication keys), measures them in Bell basis to steal some information about message. For example, suppose Trent’s measurement is φ + , according to Equations (11), (12), (17) and (18), Alice’s
Multiparty Simultaneous Quantum Secure Direct Communication
217
possible operation is one of {I , σ x ,iσ y , σ z } with the same probability 1/4, Bob’s possible operation is one of {I , iσ y } with the same probability 1/2, that is to say that there is no message to be leaked to Trent. In summary, our MSQP is secure against the eavesdropper’s attacks, the impersonator’s attacks, ZLW attacks and YHG attack. Finally, we conclude the security of some analogous protocols (shown in Table 2). 4.2 Efficiency Analysis
Recently, Li et al. [15] presented a useful efficiency coefficient to describe the total efficiency of a quantum communication, which is defined as follows.
ηtotal = mu (qt + bt )
(19)
where mu is the numbers of message transmitted, qt is the number of total qubits used and bt is the numbers of the classical bits exchanged. In the present MSQP, the communication users don’t need any classical information to help the receiver reading out the secret message, and three bits of quantum information (a three-qubit GHZ state) are utilized to communicate three bits of secret message, that is, mu = 3, qt = 3 and bt = 0. Thus, its total efficiency is ηour = 3/(3+0) = 100% in theory. In this way, the other analogous protocols’ efficiency can be computed (see Table 2). Table 2. Security and efficiency of the other analogous protocols and our protocol Protocol LLY ZLW LCL YHG Ours
Eve’s attacks Yes Yes Yes Yes Yes
Impersonator’s attacks Yes Yes Yes Yes Yes
ZLW attacks No Yes No Yes Yes
YHG attack No No Yes Yes Yes
Total Efficiency 25% 25% 50% 25% or 50% 100%
From Table 2, it is clear that our proposed protocol is more efficient than the other analogous protocols.
5 Conclusions As we all know, multiparty communication plays an important role in the modern communication. And it becomes more and more popular in the e-business and e-government society because of a certain demand. In this paper, an efficient MSQP by using GHZ states and dense coding is proposed, which is secure against the eavesdropper’s attacks, the impersonator’s attacks, and some special Trent’s attacks (including ZLW attacks and YHG attack). By introducing the decoy photon checking technology to check the channel as well as authenticating the participators, the present scheme is more feasible and economy with present-day technique. Moreover, the total efficiency of the present scheme is higher than those analogous protocols too.
218
W. Liu et al.
Acknowledgment This work is supported by the Research Foundation of NUIST (20080298), the Natural Science Foundation of Jiangsu Province, China(BK2010570) and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (09KJB520008).
References 1. Bennett, C.H., Brassard, G.: Quantum cryptography: Public-key distribution and tossing. In: Proc. of IEEE International Conference on Computers, Systems and Signal Processing, Bangalore, India, pp. 175–179. IEEE Press, Los Alamitos (1984) 2. Ekert, K.: Quantum cryptography based on Bell’s theorem. Phys. Rev. Lett. 67, 661–663 (1991) 3. Bennett, H.: Quantum cryptography using any two nonorthogonal states. Phys. Rev. Lett. 68, 3121–3124 (1992) 4. Hillery, M., Bužek, V., Berthiaume, A.: Quantum secret sharing. Phys. Rev. A 59, 1829 (1999) 5. Deng, F.G., Long, G.L., Zhou, H.Y.: An efficient quantum secret sharing scheme with Einstein–Podolsky–Rosen pairs. Phys. Lett. A 340, 43–50 (2005) 6. Cleve, R., Gottesman, D., Lo, H.K.: How to Share a Quantum Secret. Phys. Rev. Lett. 83, 3 (1999) 7. Lance, M., Symul, T., Bowen, W.P., Sanders, B.C., Lam, P.K.: Tripartite Quantum State Sharing. Phys. Rev. Lett. 92, 17 (2004) 8. Boström, K., Felbinger, T.: Deterministic Secure Direct Communication Using Entanglement. Phys. Rev. Lett. 89, 187902 (2002) 9. Wójcik, A.: Eavesdropping on the ‘ping-pong’ quantum communication protocol. Phys. Rev. Lett. 90, 157901 (2003) 10. Deng, F.G., Long, G.L., Liu, X.S.: Two-step quantum direct communication protocol using the Einstein-Podolsky-Rosen pair block. Phys. Rev. A 68, 042317 (2003) 11. Wang, C., Deng, F.G., Long, G.L.: Multi-step quantum secure direct communication using multi-particle Green-Horne-Zeilinger state. Opt. Commun. 253, 15–20 (2005) 12. Beige, A., Englert, B.G., Kurtsiefer, C., Weinfurter, H.: Secure communication with single-photon two-qubit states. Acta Phys. Pol. A 101, 357–361 (2002) 13. Yan, F.L., Zhang, X.: A scheme for secure direct communication using EPR pairs and teleportation. Euro. Phys. J. B 41, 75–78 (2004) 14. Gao, T., Yan, F.L., Wang, Z.X.: Deterministic secure direct communication using GHZ states and swapping quantum entanglement. J. Phys. A: Math. Gen. 38, 5761–5770 (2005) 15. Li, X.H., Deng, F.G., Li, C.Y., Liang, Y.J., Zhou, P., Zhou, H.Y.: Deterministic Secure Quantum Communication Without Maximally Entangled States. J. Kerean Phys. Soc. 49, 1354–1360 (2006) 16. Lee, H., Lim, J., Yang, H.: Quantum direct communication with authentication. Phys. Rev. A 73, 042305 (2006) 17. Zhang, Z.J., Liu, J., Wang, D., Shi, S.H.: Comment on Quantum direct communication with authentication. Phys. Rev. A 75, 026301 (2007) 18. Liu, W.J., Chen, H.W., Li, Z.Q., Liu, Z.H.: Efficient quantum secure direct communication with authentication. Chin. Phys. Lett. 25, 2354–2357 (2008) 19. Qin, S.J., Wen, Q.Y., Meng, L.M., Zhu, F.C.: High Efficiency of Two Efficient QSDC with Authentication Is at the Cost of Their Security. Chin. Phys. Lett. 26, 020312 (2009) 20. Yen, A., Horng, S.J., Goan, H.S., Kao, T.W., Chou, Y.H.: Quantum direct communication with mutual authentication. Quantum Information and Computation 9, 376–394 (2009)
A PSO-Based Bacterial Chemotaxis Algorithm and Its Application Rui Zhang, Jianzhong Zhou*, Youlin Lu, Hui Qin, Huifeng Zhang Collage of Hydorpower Information Engineering Huazhong University of Scence and Technology Wuhan, Hubei, 430074, China [email protected], [email protected]
Abstract. In this paper, a new two-phased optimization algorithm called BC-PSO is presented. Bacterial Chemotaxis Algorithm is a biologically inspired optimization method which is analyzing the way bacteria react to chemoattractants in concentration gradients. Aiming at the shortcomings of BC which is lack of global searching ability with low speed, PSO is introduced to accelerate convergence before BC works. With some famous test functions been used, the numerical experiment results and comparative analysis prove that it outperforms standard PSO and GA. Finally, the hybrid algorithm is applied to solve the problem of mid-long term optimal operation of cascade hydropower stations. Comparing with the other algorithms, the operation strategy shows its higher efficiency. Keywords: bacterial chemotaxis algorithm; particle swarm operation; optimal operation; cascade hydropower station.
1 Introduction Bacterial Chemotaxis Algorithm (BC) is a biologically inspired optimization method which was pioneered by Bremermann[1] in 1974 and then proposed by Muller[2] in 2002. This novel optimization strategy is developed from bacteria processes by analogy to the way bacteria react to chemoattractants in concentration gradients[2]. On account of its simplicity and robustness, this evolution strategy has high performance in many optimization problems. However, in allusion to the limitations of the strategy which bacteria is considered individuals, some scholar proposed modified algorithm by introducing social interaction into this model[3]. Some introduce chemotaxis mechanism into PSO by using both attraction force and repulsion force[5]. Recently, modified BC method is developed to solve shop scheduling problem[4-6]. In this paper, we focus on improving the ability of global search as well as increase searching speed of the bacteria chemotaxis algorithm and develop a new two-phased optimization algorithm called BC-PSO. Particle Swarm Optimization (PSO), first introduced by Kenny and Eberhart[7], is one of the population-based optimization technique. In PSO model, all the individuals called particles change their positions *
Corresponding author.
D. Liu et al. (Eds.): ISNN 2011, Part III, LNCS 6677, pp. 219–227, 2011. © Springer-Verlag Berlin Heidelberg 2011
220
R. Zhang et al.
with environment information. Aiming at the shortcomings of BC which is lack of global searching ability with low speed, PSO is introduced to execute the global search first and then stochastic local search works by BC. Meanwhile, elitism preservation and orthogonal initiation method are used in the paper. Combining PSO with BC can achieve global search with high searching speed and prevent the search process from falling into the local optimal solution. This hybrid algorithm is applied to solve the problem of mid-long term optimal operation of cascade hydropower stations. Comparing with the other algorithms, the operation strategy shows its feasibility and efficiency This paper is organized as follows. In Section 2, we introduce bacterial chemotaxis algorithm and standard PSO. In section 3, a new optimization algorithm called BCPSO is proposed. In section 4, we verify the effectiveness of the novel algorithm through numerical experiments. In section 5, BC-PSO is applied to the problem of mid-long term optimal operation of cascade hydropower stations. Finally, we conclude the paper in Section 6.
2 Bacterial Chemotaxis Algorithm Bacterial Chemotaxis Algorithm is a newly developed stochastic gradient evolutionary algorithm. Differed from the interaction models for behavior of social insects, bacteria is considered individual and social interaction is not used in the model. The movement of bacteria depends on its direction and the duration of the next movement step while this information is obtain by comparing an environmental property at two different time steps. On account of its simplicity and robustness, this evolution strategy is worthy of further research. In order to describe the optimization algorithm of bacterial chemotaxis clearly, the motion of a single bacterium in two dimensions is as follows. The strategy in n-dimension can be found in [2]. BC is presented below: Step 1: Computing the velocity ν which is assumed to be a constant value ν=cont.
(1)
Step 2: Computing the duration of the trajectory τ from the distribution of an exponential probability density function: P( X = τ ) =
1 −τ / T e T
(2)
The time T is given by ⎧ ⎪T0 , ⎪ T =⎨ ⎪T (1 + b f pr ), ⎪ 0 l pr ⎩
for for
f pr l pr f pr l pr
≥0
(3) <0
Where T0: The initial mean time; fpr: The difference between the actual and the previous function value;
A PSO-Based Bacterial Chemotaxis Algorithm and Its Application
221
lpr: The length of the previous setp; b: The dimensionless parameter. Step 3: Computing the new direction. The angle α between the previous and the new direction is decided by Gaussian probability density distribution for turning right or left respectively 1
P( X = α , v = μ ) =
e
−
(α − v ) 2
P( X = α , v = −μ ) =
2σ 2
σ 2π The probability density distribution for the angle α P( X = α ) =
1
σ 2π
e
1 [P( X = α, v = μ) + P( X = α, v = −μ)] 2
−
(α −v ) 2 2σ 2
(4)
(5)
Where the expectation value is μ= E(X) = 62° and the standard deviation is σ = Var(X ) = 26° . When fpr / lpr<0, the variation of the expectation value and the variance are indicated as follows:
μ = 62°(1 − cos(θ ))
(6)
σ = 26°(1 − cos(θ ))
(7)
Where −τ τ
cos( θ ) = e c pr Where τc is the correlation time, and τpr the duration of the previous step.
(8)
Step 4: Computing the new position. →
→
→
x new = x old + n μ l
(9) →
Where the length of the path l is given by l=ντ and the vector n μ is the normalized new direction. In the proposed algorithm, there are four parameters to be determined: T0, b, τc, ν. They depend on the chosen target precision ε. From the given precision we can compute the values[2]:
T0 = ε 0.3 ⋅ 10 −1.73 b = T0 (T0−1.54 ⋅ 10 0.6 ) τc = (
b 0 .31 ) ⋅ 10 1 .16 T0
(10) (11) (12)
In order to improve the algorithm, the automatic change of all strategy parameters is adapted during the optimization.
3 The PSO-Based Bacterial Chemotaxis Algorithm Generally speaking, Bacterial Chemotaxis Algorithm is a stochastic gradient search strategy because the trajectory only depends on the information of previous step or several steps before. For its randomness, this optimization algorithm can prevent the search process form falling into the local optimal solution and becoming premature
222
R. Zhang et al.
convergence. Whereas, some shortcomings still exist as a stochastic optimization strategy based on individual search. Firstly, single individual gathers limited gradient information during the search and it will waste too much time at the beginning of optimization. Meanwhile, when the difference of gradient between the previous and current step is below a certain constant, individuals will move stochastic[3]. With the precision become higher, the searching space cannot be ensured in the neighborhood of global optimal solution. Particle swarm optimization(PSO) is a efficient population-based optimization technique with high searching speed whose particles adapts their positions by using their own experience as well as the whole swarm’s. However, escaping from the local optimum and preventing premature convergence are two inevitable difficulties. Especially when the problem becomes more complex, the possibility for finding global optimum sharply decreases. The evolutionary strategy of the particle is shown as follows: ⎧⎪ v ijt +1 = w ⋅ v ijt + c1 ⋅ r1 ⋅ ( p ijt − x ijt ) + c 2 ⋅ r2⋅ ⋅ ( p gjt − x ijt ) ⎨ t +1 t t ⎪⎩ x ij = x ij + v ij
(13)
In order to make use of the local search based on the bacterial chamotaxis algorithm, PSO is introduced to improve the ability of global search as well as increase the optimization searching speed. Therefore, a two-phased optimization algorithm called BC-PSO is proposed to combine the advantages of BC and PSO. Meanwhile, elitism preservation and orthogonal initiation method are adopted. The novel hybrid method is presented as follows: Step 1: Initialization. Initialize n particles with random velocities in the problem space. Meanwhile, orthogonal initiation[10] is used for the position of particles by scattering the position uniformly over the feasible solution space. Initialize the parameters of the PSO: the acceleration constants c1, c2, inertia weight factor ω. Initialize the parameters of the BC: the start and end precision εinit, εend, precision reducing ratio α, single-pass maximal iterative generation gen. Initialize the elitism preservation ratio η. Step 2: Searching based on PSO. Evaluate the fitness of each particle in the swarm. Compare each particle’s fitness with its personal best position and global best position ptij, ptgj, if its fitness is better, replace ptij, ptgj. Update the velocity and position vtij, xtij of each particle by using (13) and then evaluate the updated particles. Step 3: Elitism preservation According to the fitness, sort the particles in sequence. Maintain the outstanding solution of the particles using the elitism preservation ratio η. Step 4: Searching based on BC Update the maintained outstanding particles by using the strategy of bacterial chamotaxis algorithm which the procedures are provided in Section 3.
A PSO-Based Bacterial Chemotaxis Algorithm and Its Application
223
Evaluate the updated particles and update the particle’s personal best position and global best position ptij, ptgj. Step 5: Stop the iteration process if the end precision is met and output the global best solution. Otherwise, return to Step 2 as well.
4 Benchmark Function Tests In order to test the effect of BC-PSO, we use several well-know functions: Sphere function: f1 ( x ) =
N
∑x i =1
x i ∈ [− 5 . 12 ,5 .12 ]
2 i
(14)
Rosenbrock function: N −1
[
f 2 ( x ) = ∑ 100 ⋅ ( x i2 − x i +1 ) 2 + ( x i − 1) 2 i =1
]
x i ∈ [− 5.12,5.12 ]
(15)
Rastrigin function: N
[
f 3 ( x ) = ∑ x i2 − 10 cos( 2πx i ) + 10 i =1
]
x i ∈ [− 5.12,5.12 ]
(16)
x i ∈ [− 600 , 600
(17)
Griewank function: f 4 ( x) = 1 +
N
∑
i =1
x i2 − 4000
N
∏ cos( i =1
xi /
i)
]
The global optimum solution of all the function is known to be zero when all the variables of function (14), (16), (17) equal to zero while (15) equal to one. For all the functions, PSO parameters are set like this: n=20, c1=c2=2, ω=0.6, and the BC search parameters are εinit=2, εend=0, α=1.25, gen=10, η=0.2, the fitness is set as the function values. Meanwhile, the genetic algorithm and particle swarm optimization are used so as to compare the result of benchmark test. The parameters for the PSO are the same as the BC-PSO, while for GA the crossover rate is set as 0.8 and mutational rate 0.1. The test functions are solved with 30 dimensions. In order to eliminate the influence of randomicity, we performed 10 independent runs for BCPSO and standard PSO, record: 1) the mean number of the function evaluations, 2) the mean function value, and 3) the globally minimal function value. The test results are shown in table 1and figure 1. Figure 1 shows the performance of BC-PSO, and we can see that the BC-PSO outperforms standard PSO and GA in solving all the benchmark functions. Since the Rosenbrock function is a multimodal pathological function, the complexity make the mean function value is 0.012 after 30020 function evaluations. From the table 1, it can be concluded that BC-PSO has much better accuracy than standard PSO and GA. Especially for the functions with lots of local optimal solutions such as Rosenbrock, Rastrigin and Griewank, the mean value of function is much close to the global minimum, which indicates the BC-PSO can get rid of the local optimal and voids premature convergence.
224
R. Zhang et al. Table 1. Result of the benchmark function tests
Function f1 f2 f3 f4
Algorithm
BC-PSO PSO GA BC-PSO PSO GA BC-PSO PSO GA BC-PSO PSO GA
Mean function value 2.59946E-39 1.736271647 0.000618552 0.012008277 125.6483068 71.36654509 0 49.22921504 22.39462404 0 51.00896195 0.388103521
Globally minimal function value 8.90E-42 0.19097903 0.000779 0.00429157 52.921745 26.4114 0 27.5128809 2.362801 0 18.7055358 0.30949
Function evaluations 300020 300020 300020 300020 300020 300020 227846 300020 300020 211268 300020 300020
Fig. 1. Comparison of four functions
5 Optimal Operation of Cascade Hydropower Stations Based on BC-PSO The mid-long term optimal operation of cascade hydropower stations is a typical multi-dimensional optimization problem. Aiming at the characteristics of optimal operation of cascade hydropower stations, lots of systematic methods and algorithms [8-9] have been developed. In this section, we apply the BC-PSO to solve Three Gorges cascade optimal operation problem. According to control the water release through the hydropower station, the operating strategy can maximize the total benefit of the cascade hydropower generation, which subject to a number of constraints.
A PSO-Based Bacterial Chemotaxis Algorithm and Its Application
225
A. Objective function The optimal operation model is maximum of Three Gorges cascade generated energy of one year: T N ⎧ i ⎪ E = max ∑ ∑ N t Δ ti t =1 i =1 ⎨ ⎪N i = K Qi H i i t t ⎩ t
(18)
Where E is the total generated energy; Niy is the average output of the i-th hydropower station at t-th period; Δti is the length of period; Ki is the output coefficient; Qit is the power flow; Hit means the water head. B. Constrains Water Balance Constraint
⎧⎪Vti+1 =Vti + (Iti − Qti − Sti )Δt (i =1,2,...,N; t =1,2,...,T) ⎨i i −1 i ⎪⎩It = Qt −τ i + qt
(19)
Where Vit+1 is the reservoir storage at the end of the period t while Vit is at the beginning, Iit is the inflow while Qit is the outflow in the same time. Sit is the losses of the reservoir. qit is the region inflow while Qi-1t-τi is the outflow. Power generation Constraint i i Nmin ≤ Nti ≤ Nmax (i = 1,2,..., N; t = 1,2,...,T )
(20)
i i Qmin ≤ Qti ≤ Qmax (i = 1,2,..., N ; t = 1,2,...,T )
(21)
Outflow Constrain
Boundary Condition i ⎧⎪ Z 0i = Z begin ⎨ i i ⎪⎩ Z T − 1 = Z end
(22)
Since Three Gorges hydropower station is an annual regulating storage reservoir and Gezhouba hydropower station is a daily regulation reservoir, the control period is set as month while water level control method is used to Gezhouba station. Considering the complexity of these constraints, power generation constraint is transformed into water level range in each operating period. In this paper, BC-PSO is unified with power station operation and water levels are corresponded to particles’ positions. Meanwhile, scheduling period is set to dimension of the searching space and the total generated energy is the fitness. All the parameters are the same as Section 5. The inflow of Three Gorges reservoir at the inflowing water frequency 0.01 is chosen to calculate. The operation results are shown in table 2. In order to compare the results of different strategies, standard PSO and DDDP are used to solve the cascade operation problem. The parameters of SPSO are all the same to the BC-PSO. For the operation method of DDDP, state variable is period water level and the range of each period water level is divided into 600 discrete points. The calculating results for comparison are shown in table 3.
226
R. Zhang et al. Table 2. Optimal operation results of Three Gorges cascade hydropower stations
Power station Inflow month (m3/s) 5590 1 5130 2 6610 3 11300 4 19100 5 28700 6 47400 7 45600 8 44400 9 29600 10 15400 11 7940 12 E(108kwh)
Sanxia Z Q (m) (m3/s) 175 5590 175 5137 174.98 9455 166.96 16900 145 19100 145 28700 145 47400 145 45600 145 35854 175 29600 175 15400 175 7940 SX
1104.07
N (104kw) 525 481 862 1314 1297 1638 1513 1525 1863 1867 1454 749 GZB
H (m) 109.81 109.88 105.41 89.21 77.81 76.16 72.30 72.68 89.66 106.00 108.40 109.41
163.53
Z (m) 64.5 64.5 64.5 64.5 64.5 64.5 64.5 64.5 64.5 64.5 64.5 64.5 total
Gezhouba Q N (m3/s) (104kw) 6149 122 5650 113 10400 190 18590 287 21010 277 31570 209 52140 114 50160 121 39440 169 32560 204 16940 271 8734 165 Process 1267.60 time
H (m) 23.87 24.12 21.79 18.35 17.48 14.25 9.86 10.22 12.36 14.00 18.98 22.58
3.76s
Table 3. Comparison between different algorithms Algorithm DDDP PSO BC-PSO
SX 1098.05 1098.22 1104.07
E(108kwh) GZB 151.23 160.58 163.53
TOTAL 1249.28 1258.80 1267.60
Processing time
6.41s 4.54s 3.76s
As shown in the table 3, the reservoir cascade optimal result computed by BC-PSO algorithm is better than DDDP and PSO, which the processing times of DDDP and PSO are longer. The total power generation computed by BC-PSO is more than PSO 8.8(108kwh) and more than DDDP 18.32(108kwh). For a discrete searching strategy, the optimization results of DDDP are influenced by the state variable precision and the initial solution. If the precision improved by dividing the water level into much more discrete points, the calculation time will increase dramatically. While BC-PSO can search the solution space continuously in the range of water level, so that a better optimal solution may be found.
6 Conclusions In this paper, a novel modified PSO-based bacteria chemotaxis algorithm which is aiming at the combination of the advantages of BC and PSO is proposed. By analyzing the results of the benchmark test, the algorithm can improve the global searching ability and increase convergence speed as well as getting rid of the local optimal solution in time which outperforms PSO. After that, investigating the performance of BC-PSO in optimization on more complex problems such as mid-long
A PSO-Based Bacterial Chemotaxis Algorithm and Its Application
227
term optimal operation of cascade hydropower stations is suggested. According to the application research on Three Gorges cascade optimal operation, BC-PSO can maximize the yearly total cascade hydropower generation benefit and solve the “dimension disaster” problem which may come out when DDDP is used. Meanwhile, the optimal result can offer a scientific guarantee for the operation of Three Gorges cascade hydropower stations. Acknowledgements. This work is granted by the Special Research Foundation for National S&T Supported Plan of China (No.2008BAB29B08), the Public Welfare Industry of the Ministry of Science and Technology and the Ministry of Water Resources (No.200701008).
References 1. Bremermann, H.J.: Chemotaxis and Optimization. J. Franklin Inst. 297, 397–404 (1974) 2. Muller, S.D., Marchetto, J., Airaghi, S., Koumoutsakos, P.: Optimization Based on Bacterial Chemotaxis. IEEE Transaction of Evolutionary Computation 6(1), 16–29 (2002) 3. weiwu, L., Hui, W., Zhijun, Z., Jixin, Q.: Function optimization method based on bacterial colony chemotaxis. Journal of Circuits and Systems 2, 58–63 (2005) 4. Guzman, M.A., Delgado, A., De Carvalho, J.: A novel multiobjective optimization algorithm based on bacterial chemotaxis. In: Engineering Applications of Artificial Intelligence, pp. 292–301 (2010) 5. Niu, B., Zhu, Y., He, X., Zeng, X.: An improved Particle Swarm Optimization Based on Bacterial Chemotaxis. In: Procedings of the 6th Word Congress on Intelligent Control and Automation, June 21-23 (2006) 6. Li, G., Liao, H., Chen, H.: Improved Bacterial Colony Chemotaxis Algorithm and its application in Available Transfer Capability. In: 2009 Fifth International Conference on Natural Computation (2009) 7. Eberhart, R., Kennedy, J.: New optimizer using particle swarm theory. In: Proceeding of the 1995 6th International Symposium on Micro Machine and Human Science, October 1995, pp. 39–43 (1995) 8. Xie, W., Ji, C., Li, X.: Particle Swarm Optimization Based on Cultural Algorithm for Short-term optimal Operation of Cascade Hydropower Stations. In: 2009 Fifth International Conference on Natural Computation (2009) 9. Lihua, C., Yakun, Z., Yadong, M., Na, Y.: Particle Swarm Optimization for long-term Operation of Cascade Hydropower Stations. In: 2010 Asia-Pacific Power and Energy Engineering Conference (APPEEC), March 28-31, pp. 1–4 (2010) 10. Leung, Y., Wang, Y.: An Orthogonal Genetic Algorithm with Quantization for Global Numerical Opti mization. IEEE Transactions on Evolutionary Computation 5(1) (February 2001)
Predicting Stock Index Using an Integrated Model of NLICA, SVR and PSO Chi-Jie Lu1,*, Jui-Yu Wu2, Chih-Chou Chiu3, and Yi-Jun Tsai3 1
Department of Industrial Engineering and Management, Ching Yun University, Jung-Li 320, Taoyuan, Taiwan, R.O.C. [email protected] 2 Department of Business Administration, Lunghwa University of Science and Technology, Guishan 333, Taoyuan,Taiwan, R.O.C. 3 Department of Business Management, National Taipei University of Technology, Taipei 10608, Taipei, Taiwan, R.O.C.
Abstract. Predicting stock index is a major activity of financial firms and private investors. However, stock index prediction is regarded as a challenging task of the prediction problem since the stock market is a complex, evolutionary, and nonlinear dynamic system. In this study, a stock index prediction model by integrating nonlinear independent component analysis (NLICA), support vector regression (SVR) and particle swarm optimization (PSO) is proposed. In the proposed model, first, the NLICA is used as preprocessing to extract features from observed stock index data. The features which can be used to represent underlying/hidden information of the original data are then served as the inputs of SVR to build the stock index prediction model. Finally, PSO is applied to optimize the parameters of the SVR prediction model since the parameters of SVR must be carefully selected in establishing an effective and efficient SVR model. Experimental results on Shanghai Stock Exchange composite (SSEC) closing cash index show that the proposed stock index prediction method is effective and efficient compared to the four comparison models. Keywords: Stock index prediction, nonlinear independent component analysis, support vector regression, particle swarm optimization.
1 Introduction

Stock index prediction is an important issue for all private and institutional investors when making investment decisions. However, it is a difficult task to develop an effective and efficient prediction model since the stock market is a complex, evolutionary, and nonlinear dynamic system. Stock index data may be influenced by underlying factors such as seasonal variations or economic events [1,2]. Therefore, in developing a stock index prediction model, the first important step is usually feature extraction, which reveals the underlying/interesting information that cannot be found directly in the observed data. The performance of predictors can be improved by using these features as inputs. *
Corresponding author.
Independent component analysis (ICA) is a novel feature extraction technique. It aims at recovering independent sources from their mixtures without knowing the mixing procedure or any specific knowledge of the sources [3]. The independent sources, called independent components (ICs), are hidden information of the observable data. ICA models can be classified into linear ICA (LICA) and nonlinear ICA (NLICA) models. The linear ICA model is based on the assumption that the observed mixture data are generated by a linear mixing of ICs [3]. Conversely, the nonlinear ICA model assumes that the observations are a nonlinear combination of ICs. Linear ICA is suitable when it is reasonable to assume that the mixing mechanism is linear. However, in many realistic cases, the observed mixture data (i.e., stock index data) might be a nonlinear mixture of latent source signals [4]. Since the nonlinear mixing model is more realistic and practical for stock index data, NLICA has the potential to become a powerful tool for extracting underlying/interesting information from stock index data. Nonlinear ICA remains an active field of research, and few applications of NLICA to stock price data have been proposed in the literature [4,5,6]. For stock index prediction, support vector regression (SVR), a novel neural network algorithm developed from statistical learning theory [7], has been widely applied for building prediction models. This is largely due to the structural risk minimization principle in SVR, which allows researchers to obtain better generalization from limited-size datasets. SVR has attracted much attention in recent years and has become an important method for stock index forecasting problems [6,7,8]. Lu et al. [6] combined NLICA and SVR for stock price forecasting: they first applied NLICA to extract features and then used the features as inputs for building the SVR model. Their results showed that the prediction accuracy of the SVR approach integrated with NLICA can be better than that of SVR models combined with linear ICA and principal component analysis (PCA), respectively. However, the performance of this integrated NLICA and SVR model can be improved by carefully selecting the parameters of SVR, since it is well known that the estimation accuracy of SVR depends on the setting of its parameters [9]. The parameter selection of SVR is usually based on trial-and-error or cross-validation methods, or on the user's prior knowledge and/or expertise. However, the cross-validation method is very time consuming and data-intensive, while the user-expertise method faces the risk of relying on a single parameter set and is not appropriate for non-expert users. Particle swarm optimization (PSO) is an evolutionary computation/heuristic technique originally developed by [10]. It has been found to be robust and easily implemented in solving nonlinear optimization problems [10]. Therefore, PSO is used in this study to search for the best parameter set of SVR in order to improve the accuracy of the SVR prediction model. In this study, a stock index prediction model integrating NLICA, SVR and PSO (called the NLICA-SVR-PSO model) is proposed. In the proposed model, NLICA is first used as preprocessing to extract features from the collected stock index data. The features, which represent underlying/hidden information of the original data, then serve as the inputs of SVR to build the stock index prediction model.
Finally, PSO is applied to optimize the parameters of the SVR prediction model. In order to evaluate the performance of the proposed approach, the Shanghai Stock Exchange composite (SSEC) closing cash index is used as the illustrative example.
The rest of this paper is organized as follows. Section 2 gives brief overviews of nonlinear ICA, SVR and PSO. The proposed model is described in Section 3. Section 4 presents the experimental results and this paper is concluded in Section 5.
2 Methodology

2.1 Independent Component Analysis

Linear ICA has become a well-researched area. In the linear ICA model [3], it is assumed that m measured variables, $\mathbf{x} = [x_1, x_2, \ldots, x_m]^T$, can be expressed as linear combinations of n unknown latent source components $\mathbf{s} = [s_1, s_2, \ldots, s_n]^T$:

$$\mathbf{x} = \sum_{j=1}^{n} \mathbf{a}_j s_j = \mathbf{A}\mathbf{s} \qquad (1)$$

where $\mathbf{a}_j$ is the j-th column of the unknown mixing matrix $\mathbf{A}$. Here we assume $m \ge n$ so that $\mathbf{A}$ is a full-rank matrix. The vector s is the latent source data that cannot be directly observed from the observed mixture data x. Linear ICA aims to estimate the latent source components s and the unknown mixing matrix A from x under appropriate assumptions on the statistical properties of the source distribution. Thus, the linear ICA model intends to find a de-mixing matrix W such that

$$\mathbf{y} = \sum_{j=1}^{n} \mathbf{w}_j x_j = \mathbf{W}\mathbf{x}, \qquad (2)$$

where $\mathbf{y} = [y_1, y_2, \ldots, y_n]^T$ is the independent component vector. The elements of y must be statistically independent and are called independent components (ICs). The ICs are used to estimate the source components $s_j$. The vector $\mathbf{w}_j$ in Eq. (2) is the j-th row of the de-mixing matrix W.

Since the nonlinear ICA model assumes that the observations are a nonlinear combination of latent sources, it can be formulated as the estimation of the following model [3]:

$$\mathbf{x} = F(\mathbf{s}) \qquad (3)$$

where x and s denote the data and source vectors as before, and F is an unknown nonlinear transformation function. Assume for simplicity that the number of independent components equals the number of mixtures. The nonlinear ICA problem then consists of finding a mapping $G: \mathbb{R}^n \to \mathbb{R}^n$ that yields components

$$\mathbf{y} = G(\mathbf{x}) \qquad (4)$$

which are statistically independent. Nonlinear ICA is still less studied than linear ICA. In this study, the MISEP method proposed by Almeida [4] is adopted to solve for the ICs. MISEP uses adaptive nonlinearities at the output. These nonlinearities are intimately related to the statistical distributions of the components, and this adaptivity allows the method to deal with components with a wide range of distributions.
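Although MISEP itself has no mainstream off-the-shelf implementation, the linear model of Eqs. (1)-(2) is easy to demonstrate. The following minimal sketch (our own illustration using scikit-learn's FastICA, a linear ICA estimator, not the paper's method) recovers two latent sources from their linear mixtures:

```python
import numpy as np
from sklearn.decomposition import FastICA

# synthetic demo of Eq. (1): mix two latent sources with a random matrix A
rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(3 * t), np.sign(np.sin(5 * t))]   # latent sources s
A = rng.normal(size=(2, 2))                        # unknown mixing matrix A
X = S @ A.T                                        # observed mixtures x = As

ica = FastICA(n_components=2, random_state=0)
Y = ica.fit_transform(X)      # estimated ICs y = Wx (up to order and scale)
W = ica.components_           # estimated de-mixing matrix W
```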
MISEP tries to perform Eq. (4) according to a mutual information criterion. The mutual information of the components of y is defined as [4]

$$I(\mathbf{y}) = \sum_i H(y_i) - H(\mathbf{y}) \qquad (5)$$

where H denotes Shannon's differential entropy, $H(\mathbf{y}) = -\int p(\mathbf{y}) \log p(\mathbf{y})\, d\mathbf{y}$, for continuous variables. The mutual information I(y) is non-negative and is zero if the components of y are mutually statistically independent; p(y) represents the probability density of the multidimensional random variable y, and $p_i(y_i)$ denotes the marginal density of the i-th component of y.

The mutual information is not affected by invertible transformations made on a per-component basis. That is, if one performs invertible, possibly nonlinear, transformations on individual components of y, $z_i = \psi_i(y_i)$, then I(z) = I(y). However, Eq. (5) shows that to estimate the mutual information I(y) one needs to estimate the marginal densities $p_i(y_i)$. In principle, the joint density p(y) should also be estimated. In practical situations one normally has access only to a finite set of observations x. From these, the MISEP method can compute a finite set of vectors y = G(x), given a transformation G, but it does not have access to the actual densities $p_i(y_i)$ [4]. These densities have to be estimated from the data. MISEP estimates the components' cumulative probability functions jointly with the optimization of the transformation G. The resulting ICA method has the advantage of using a single, specialized multilayer perceptron (MLP), optimized according to a single objective function, the output entropy.

2.2 Support Vector Regression

Support vector regression (SVR) is an artificial intelligence forecasting tool based on statistical learning theory and the structural risk minimization principle [7]. It can be expressed as the following equation:

$$f(\mathbf{x}) = (\mathbf{z} \cdot \phi(\mathbf{x})) + b \qquad (6)$$

where z is a weight vector, b is a bias, and $\phi(\mathbf{x})$ is a kernel-induced nonlinear mapping that transforms the nonlinear input into a linear mode in a high-dimensional feature space. Traditional regression obtains the coefficients by minimizing the squared error, which can be considered as empirical risk based on a loss function. Vapnik [7] introduced the so-called ε-insensitive loss function to SVR. It can be expressed as:

$$L_\varepsilon(f(\mathbf{x}) - q) = \begin{cases} |f(\mathbf{x}) - q| - \varepsilon & \text{if } |f(\mathbf{x}) - q| \ge \varepsilon \\ 0 & \text{otherwise} \end{cases} \qquad (7)$$

where q is the target output and ε defines the region of ε-insensitivity: when the predicted value falls inside the band area, the loss is zero; conversely, if the predicted value falls outside the band area, the loss equals the difference between the predicted value and the margin.
Considering the empirical risk and structural risk synchronously, the SVR model can be constructed to minimize the following program:

$$\min \; \frac{1}{2}\mathbf{z}^T\mathbf{z} + C\sum_i \left(\xi_i + \xi_i^*\right)$$
$$\text{subject to} \; \begin{cases} q_i - \mathbf{z}^T\mathbf{x}_i - b \le \varepsilon + \xi_i \\ \mathbf{z}^T\mathbf{x}_i + b - q_i \le \varepsilon + \xi_i^* \\ \xi_i, \xi_i^* \ge 0 \end{cases} \qquad (8)$$

where i = 1, ..., n indexes the training data; $\sum_i(\xi_i + \xi_i^*)$ is the empirical risk; $\frac{1}{2}\mathbf{z}^T\mathbf{z}$ is the structural risk preventing over-learning and lack of generalization; and C is a modifying coefficient representing the trade-off between empirical risk and structural risk. Equation (8) is a quadratic programming problem. After selecting a proper modifying coefficient (C), band-area width (ε) and kernel function (K), the optimum of each parameter can be resolved through the Lagrange function. Cherkassky and Ma [9] proposed that the radial basis function (RBF) is suited for solving most forecasting problems, so this paper uses the RBF with parameter σ = 0.2 as the kernel function in SVR modeling.
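As an illustration of Eq. (6) with the RBF kernel, an SVR of this form can be instantiated in a few lines. This is a sketch using scikit-learn's SVR as a stand-in for LIBSVM; the toy data and the mapping gamma = 1/(2σ²) for the paper's σ = 0.2 are our assumptions:

```python
import numpy as np
from sklearn.svm import SVR

# toy stand-in data: rows are feature vectors (e.g., extracted ICs)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 4))
y_train = X_train @ np.array([0.5, -0.2, 0.1, 0.3]) + 0.05 * rng.normal(size=100)

sigma = 0.2                                                  # paper's RBF width
model = SVR(kernel='rbf', gamma=1.0 / (2 * sigma**2), C=1.0, epsilon=0.1)
model.fit(X_train, y_train)
y_pred = model.predict(X_train[:5])                          # in-sample check
```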
2.3 Particle Swarm Optimization

PSO is a population-based optimization tool. It resembles the social interaction of a flock of flying birds: individuals in the group evolve by cooperation and competition among themselves through generations [10]. Each individual, called a "particle," adjusts its flight according to its own flying experience and its companions' flying experience. Each particle with its current position represents a potential solution to the problem at hand. In PSO, a number of particles, which simulate a group of flying birds, are simultaneously used to find the best fitness in the search space. At each iteration, every particle keeps track of its personal best position by dynamically adjusting its flying velocity. The new velocity is evaluated from its current velocity and the distances of its current position with respect to its previous best local position and the global best position. After a sufficient number of iterations, the particles will eventually cluster around the neighborhood of the fittest solution. Assume there are a total of t particles in the PSO. Let the i-th particle be represented as $K_i = [k_{i1}, k_{i2}, \ldots, k_{in}]$, i = 1, 2, ..., t. The best previous position (i.e., the position giving the best fitness value) of particle i, called the personal best, is recorded and represented as $P_i^L = [p_{i1}^L, p_{i2}^L, \ldots, p_{in}^L]$. Let $P^g = [p_1^g, p_2^g, \ldots, p_n^g]$ denote the best particle giving the current optimal fitness value among all particles in the population. The rate of position change (i.e., the velocity) of particle i is represented by $V_i = [v_{i1}, v_{i2}, \ldots, v_{in}]$. Each particle moves over the search space with a velocity dynamically adjusted according to its historical behavior and its companions'. The velocity and position of a particle are updated according to the following equations:
$$v_{ij}^{new} = d \cdot v_{ij} + c_1 \cdot rand_1 \cdot (p_{ij}^L - k_{ij}) + c_2 \cdot rand_2 \cdot (p_j^g - k_{ij}) \qquad (9)$$
$$k_{ij}^{new} = k_{ij} + v_{ij}^{new}, \quad i = 1, 2, \ldots, t; \; j = 1, 2, \ldots, n \qquad (10)$$

where rand1 and rand2 are two random numbers in the range between 0 and 1, d is a weighting (inertia) function, and c1 and c2 are two positive constants. The second term on the right-hand side of the velocity equation in Eq. (9) is interpreted as the "cognition" part, which represents the private thinking of the particle itself. The third term in Eq. (9) is the "social" part, which represents the collaboration among the particles [10]. The two positive constants c1 and c2 are the weights used to regulate the acceleration toward self-cognition (the local best) and social interaction (the global best). Kennedy et al. [10] suggested c1 = c2 = 2 since this on average makes the weights for the "cognition" and "social" parts equal to 1; these values are also adopted in this study. The velocity equation in Eq. (9) calculates the particle's new velocity in each dimension from its previous velocity and the distances of its current position from its own best experience (i.e., position) and the group's best experience. The particle then moves toward a new position according to the position update equation in Eq. (10). The performance of each particle is measured by its fitness value.
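The update rule of Eqs. (9)-(10) is compact enough to state directly in code (a minimal sketch; the array shapes are our own convention):

```python
import numpy as np

def pso_step(K, V, P_local, p_global, d=0.5, c1=2.0, c2=2.0):
    """One PSO iteration per Eqs. (9)-(10): K and V are (t, n) position and
    velocity arrays; P_local holds each particle's best position, p_global
    the swarm's best position (a length-n vector)."""
    t, n = K.shape
    rand1 = np.random.rand(t, n)
    rand2 = np.random.rand(t, n)
    V_new = d * V + c1 * rand1 * (P_local - K) + c2 * rand2 * (p_global - K)  # Eq. (9)
    K_new = K + V_new                                                         # Eq. (10)
    return K_new, V_new
```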
3 The Integrated NLICA-SVR-PSO Scheme

This paper proposes a stock index prediction model integrating NLICA, SVR and PSO (called the NLICA-SVR-PSO model). The research scheme of the proposed methodology is presented in Figure 1 and consists of three stages. The first stage of the proposed model is to smooth the forecasting variables since, in conventional prediction models, the studied time series is usually smoothed in the first step through a suitable discrete differentiation preprocess. The original data x is processed by a smoothing (logarithmic) step to obtain its smoothed form $\mathbf{x}^s = (x_1^s, x_2^s, \ldots, x_m^s)^T$. In the second stage, the ICs containing hidden information of the prediction variables are used as input variables to construct the SVR forecasting model. As discussed in Section 2.2, the performance (estimation accuracy) of SVR depends on the choice of two parameters: the regularization constant C and the loss function parameter ε. Therefore, in the third stage, PSO is used to determine the best parameters of SVR. The process of optimizing the SVR parameters with PSO is as follows:

1. PSO initialization: Randomly generate the initial value and velocity of each particle. A particle is composed of the parameters C and ε. Set the PSO parameters, including the number of particles, the particle dimension, the number of maximal iterations (i.e., the stopping condition), and the positive constants (c1 and c2).
2. Evaluate the fitness of particles: the fitness function used in this study is the RMSE, $RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(r_i - f_i)^2}$, where $r_i$ and $f_i$ represent the actual and predicted
value, respectively, and n is the total number of data points. Compute the fitness value of each particle; the objective of the PSO is to minimize the fitness value.
3. Update particle information: Update the global and personal bests according to the fitness evaluation results, and update the velocity and position of each particle. The velocity of each particle is computed by Eqs. (9) and (10).
4. Stopping condition check: if the stopping condition (i.e., the predefined maximum iteration) is met, the PSO procedure stops and the optimal SVR parameters are given by the global-best particle; otherwise, go to step 2.
Finally, the SVR model with the optimal parameters is used to generate the final stock index prediction results.
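Stages 2-3 can be sketched as follows (an illustration only, using scikit-learn's SVR in place of LIBSVM; the search ranges for C and ε and the helper names are our assumptions, not the authors' settings):

```python
import numpy as np
from sklearn.svm import SVR

def rmse_fitness(params, X_tr, y_tr, X_val, y_val):
    """PSO fitness of step 2: validation RMSE of an SVR with candidate (C, eps)."""
    C, eps = params
    model = SVR(kernel='rbf', gamma=12.5, C=max(C, 1e-3), epsilon=max(eps, 1e-4))
    model.fit(X_tr, y_tr)
    pred = model.predict(X_val)
    return np.sqrt(np.mean((y_val - pred) ** 2))

def pso_optimize(X_tr, y_tr, X_val, y_val, n_particles=10, iters=50):
    """Steps 1-4 above: particles are (C, epsilon) pairs, updated by Eqs. (9)-(10)."""
    rng = np.random.default_rng(0)
    K = rng.uniform([0.1, 0.001], [100.0, 1.0], size=(n_particles, 2))  # positions
    V = np.zeros_like(K)
    P_loc = K.copy()
    f_loc = np.array([rmse_fitness(k, X_tr, y_tr, X_val, y_val) for k in K])
    g = P_loc[f_loc.argmin()].copy()
    for _ in range(iters):
        d = 0.5 + rng.random() * 0.5                               # inertia weight d
        r1, r2 = rng.random(K.shape), rng.random(K.shape)
        V = d * V + 2.0 * r1 * (P_loc - K) + 2.0 * r2 * (g - K)    # Eq. (9)
        K = K + V                                                  # Eq. (10)
        f = np.array([rmse_fitness(k, X_tr, y_tr, X_val, y_val) for k in K])
        better = f < f_loc
        P_loc[better], f_loc[better] = K[better], f[better]
        g = P_loc[f_loc.argmin()].copy()
    return g    # best (C, epsilon) found
```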
Fig. 1. Framework of the proposed NLICA-SVR-PSO prediction model (flowchart: Stage 1, data smoothing and feature extraction by NLICA into ICs; Stage 2, training the SVR on the extracted ICs; Stage 3, the PSO loop of initialization, fitness evaluation and particle updates until the stopping condition is satisfied, yielding the optimal SVR parameters and the prediction results)
4 Experimental Results

For evaluating the performance of the proposed NLICA-SVR-PSO prediction model, the daily SSEC closing cash index is used in this study. In forecasting the SSEC closing index, four technical indicators (the previous day's cash market high, low, and closing prices, and the current day's opening cash index) are used as
forecasting variables since they are the most widely used features in stock price prediction [8]. The daily data of the technical indicators and cash prices of the SSEC index from October 10, 2005 to September 4, 2009 are provided by Bloomberg. The daily SSEC closing cash prices are depicted in Figure 2. There are a total of 1000 data points in the dataset, which was partitioned into training and testing parts, with the training part representing 80% of the dataset. The prediction results of the proposed model are compared to the SVR model without any preprocessing tool or PSO (called the single SVR model), the SVR model with PSO only (called the SVR-PSO model), and the SVR models that use LICA and PCA for feature extraction and PSO to optimize the parameters (called the LICA-SVR-PSO and PCA-SVR-PSO models, respectively). The prediction performance is evaluated using the following measures: the root mean square error (RMSE), mean absolute difference (MAD), mean absolute percentage error (MAPE), root mean square prediction error (RMSPE) and directional accuracy (DA).
Fig. 2. The daily SSEC closing cash prices from 10/10/2005 to 9/4/2009
In the experiments, the parameters of PSO are set as follows: the number of particles is 10; the particle dimension is 2 (i.e., C and ε); the maximum iteration number is 300; the weighting function d equals 0.5 + rand·0.5 (rand being a uniform random number in [0, 1]); and c1 = c2 = 2. For building the SVR forecasting models, the LIBSVM package by Chang and Lin [11] is adopted in this study. For the single SVR model, the analytic parameter selection method proposed by [8] is used to determine the best parameter set of C and ε. The comparison of the forecast results from the five prediction models for the SSEC closing index is shown in Table 1. From the table, it can be seen that the RMSE, MAD, MAPE, RMSPE and DA values for the proposed NLICA-SVR-PSO model are 21.77, 16.84, 0.694%, 0.905% and 74.00%, respectively, significantly better than those of the other four methods. It can be concluded that the proposed model produces lower prediction errors and higher prediction accuracy in the direction of price change, and that it is an effective stock index prediction model. Table 2 shows the comparison of the elapsed times of the five prediction models. It can be observed from Table 2 that the proposed NLICA-SVR-PSO model has the shortest elapsed time for building the prediction model. From Tables 1 and 2, it can be concluded that the proposed stock index prediction method is effective and efficient compared to the four comparison models.
Table 1. The SSEC closing cash index forecasting results using the NLICA-SVR-PSO, LICA-SVR-PSO, PCA-SVR-PSO, SVR-PSO and single SVR models
Prediction Model | RMSE | MAD | MAPE | RMSPE | DA
NLICA-SVR-PSO | 21.77 | 16.84 | 0.694% | 0.905% | 74.00%
LICA-SVR-PSO | 22.92 | 17.90 | 0.746% | 0.967% | 73.50%
PCA-SVR-PSO | 22.01 | 16.95 | 0.708% | 0.934% | 73.00%
SVR-PSO | 22.26 | 17.17 | 0.716% | 0.947% | 73.50%
Single SVR | 23.27 | 18.34 | 0.762% | 0.972% | 73.00%
Table 2. Comparison of elapsed times from five prediction models
Prediction Model | C | ε | Elapsed time (seconds)
NLICA-SVR-PSO | 0.262 | -8.71 | 236.80
LICA-SVR-PSO | -6.106 | -14.99 | 736.34
PCA-SVR-PSO | 7.971 | -9.14 | 1650.34
SVR-PSO | 3.321 | -15.00 | 1006.70
Single SVR | 9.000 | -15.00 | 13139.72
5 Conclusion

This paper has presented a stock index prediction model integrating NLICA, SVR and PSO. The SSEC closing cash index is used for evaluating the performance of the proposed method. This paper compared the forecasting performance of the proposed model with the LICA-SVR-PSO, PCA-SVR-PSO, SVR-PSO and single SVR models in stock index prediction, using prediction error and prediction accuracy as criteria. Moreover, the elapsed times of the five models were also compared. Experimental results showed that the proposed NLICA-SVR-PSO model produces the lowest prediction error and the highest prediction accuracy on the datasets. It also has the shortest elapsed time for building the prediction model and outperformed the comparison methods used in this study. According to the experiments, it can be concluded that the proposed stock index prediction method is effective and efficient compared to the four comparison models.
Acknowledgements This research was partially supported by the National Science Council of the Republic of China under Grant Number NSC 98-2221-E-231-005.
References
1. Kiviluoto, K., Oja, E.: Independent Component Analysis for Parallel Financial Time Series. In: 5th International Conference on Neural Information Processing, Tokyo, Japan, pp. 895–898 (1998)
2. Mok, P.Y., Lam, K.P., Ng, H.S.: An ICA Design of Intraday Stock Prediction Models with Automatic Variable Selection. In: 2004 IEEE International Joint Conference on Neural Networks, pp. 2135–2140 (2004)
3. Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. John Wiley & Sons, New York (2001)
4. Almeida, L.B.: MISEP – Linear and Nonlinear ICA Based on Mutual Information. Journal of Machine Learning Research 4, 1297–1318 (2003)
5. Lu, C.J., Chiu, C.C., Yang, J.L.: Integrating nonlinear independent component analysis and neural network in stock price prediction. In: Chien, B.-C., Hong, T.-P., Chen, S.-M., Ali, M. (eds.) IEA/AIE 2009. LNCS, vol. 5579, pp. 614–623. Springer, Heidelberg (2009)
6. Lu, C.J., Wu, J.Y., Fan, C.R., Chiu, C.C.: Forecasting Stock Price using Nonlinear Independent Component Analysis and Support Vector Regression. In: The IEEE International Conference on Industrial Engineering and Engineering Management (IEEM 2009), Hong Kong, China, December 8-12 (2009)
7. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, New York (2000)
8. Lu, C.J., Lee, T.S., Chiu, C.C.: Financial time series forecasting using independent component analysis and support vector regression. Decision Support Systems 47(2), 115–125 (2009)
9. Cherkassky, V., Ma, Y.: Practical selection of SVM parameters and noise estimation for SVM regression. Neural Networks 17, 113–126 (2004)
10. Kennedy, J., Eberhart, R.C., Shi, Y.: Swarm Intelligence. Morgan Kaufmann Publishers, San Francisco (2001)
11. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines (2010), http://www.csie.ntu.edu.tw/~cjlin/libsvm/index.html
Affective Classification in Video Based on Semi-supervised Learning

Shangfei Wang, Huan Lin, and Yongjie Hu

Key Lab of Computing and Communicating Software of Anhui Province, School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, P.R. China
Abstract. In previous work on affective video analysis, supervised learning methods are frequently used as classifiers. However, labeling abundant examples is time consuming and even impossible, for it needs annotation from human beings, while unlabeled video clips are easy to obtain and are plentiful. In this paper, we present a semi-supervised approach to recognize emotions from videos. Firstly, visual and audio features are extracted. Then bivariate correlation is used to select sensitive features. After that, low density separation, a semi-supervised learning algorithm, is adopted as the classifier. Comparative experiments on our own constructed database showed that the semi-supervised algorithm performs better than the supervised one, demonstrating the effectiveness and feasibility of our approach. Keywords: Affective classification, videos, semi-supervised learning.
1 Introduction
With the enormous quantity of digital videos arising in our life, efficient and powerful video labeling and retrieval methods have become more and more important and essential. Affective content, as a kind of natural rule by which humans classify and retrieve information, has been studied in the video domain [1][2][3][5][4][6] and good progress has been made in the past decades. Most studies extract low-level features from videos and adopt supervised machine learning methods, or combine related domain knowledge, to construct emotion models with emotion categories or dimensions that have been labeled manually beforehand. Intuitively, experiments with plenteous labeled data result in better performance. Nevertheless, labeling abundant examples is time consuming and even impossible since it needs annotation from human beings, while unlabeled video clips are plentiful. The problem of small training sample size has become an obstacle in this research field. Semi-supervised learning may bring light to this field, since it makes use of both labeled and unlabeled data for training, typically a small amount of labeled data with a large amount of unlabeled data. The semi-supervised learning
This paper is supported by National 863 Program (2008AA01Z122), and SRF for ROCS, SEM.
methods have attracted an increasing amount of interest recently, and many feasible approaches and applications have been proposed [7]. However, to the best of our knowledge, there are few studies on semi-supervised learning for affective analysis and classification of visual media. In this paper, we propose an affective video analysis method using semi-supervised learning. Firstly, several visual and audio features are extracted from movie clips. Then bivariate correlation is used to select features sensitive to emotion valence and arousal, respectively. After that, Low Density Separation (LDS) is adopted as the semi-supervised learning method to build the relationship between video features and valence or arousal. Comparative experiments on our own constructed database demonstrate the effectiveness of the proposed approach. The paper is structured as follows. In Section 2, we introduce our proposed affective video classification method using a semi-supervised learning algorithm in detail. Comparative experiments are presented in Section 3. Conclusions are given in Section 4.
2 Methods
Our proposed affective video classification approach using semi-supervised learning is shown in Fig. 1; it consists of feature extraction, feature selection and classification.
Fig. 1. Framework of our approach: visual and audio features are extracted from the video database, arousal-relative and valence-relative features are selected, and LDS semi-supervised learning yields the arousal and valence classifications
2.1 Feature Extraction
Audio information does affect human emotions [11][8]. Audio features have been studied in audio retrieval and audio classification and have obtained pretty good effects. Therefore, features widely used in audio and speech processing are chosen to construct our audio feature vector. The audio feature vector used here contains 69 low-level audio features, which are listed in Table 1. Among these features, Energy, Fundamental Frequency, LPCC and MFCC were all extracted using the PRAAT software package [9]. It should be noted that the audio channels of the video clips were extracted in WAV format with a sampling rate of 44 kHz. From the cinematographic perspective, lighting key and color energy are very important for eliciting emotions [10]. Therefore, we transformed the film frames into
Table 1. Low-level audio features

Feature set | Extracted features
Energy | Average energy of the audio signal
Formants | Average and variance of the first 3 formants (6 features)
Fundamental frequency (FF) | Average and variance of the fundamental frequency (2 features)
LPCC | LPCC (16 features), derivative of LPCC (LPCC Der., 16 features)
MFCC | MFCC (13 features), derivative of MFCC (MFCC Der., 13 features)
Silence ratio | Proportion of silence in a time window
ZCR | Zero crossing rate
the HSV space and then extracted the lighting key. Color energy, which measures the joint valence-arousal quality of a scene arising from the color composition alone, was extracted from the HLS space. Visual Excitement [10], which represents the motion in a video sequence, was also adopted as a visual feature. In total, 3 visual features were extracted for the subsequent processes.

2.2 Feature Selection
Not all the extracted features are relevant to the emotion space. To exploit the effective information and reduce the noise caused by superfluous features in the subsequent classifier, feature selection should be conducted. In this study, we use bivariate correlation (a correlation calculation method in SPSS) to compute the correlations between each feature and emotion valence or arousal, and features are selected according to a correlation threshold. Only when the absolute correlation coefficient surpasses 0.15 with a p-value below 0.05 is a significant correlation between the two vectors assumed to exist.
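As a sketch, the selection rule above can be implemented with Pearson correlation (which is what SPSS's bivariate correlation computes by default; the function below is our own illustration):

```python
import numpy as np
from scipy.stats import pearsonr

def select_features(F, target, r_thresh=0.15, p_thresh=0.05):
    """Bivariate-correlation feature selection as described above:
    F is an (n_clips, n_features) array, target holds the valence or
    arousal ratings; keep features with |r| > r_thresh and p < p_thresh."""
    keep = []
    for j in range(F.shape[1]):
        r, p = pearsonr(F[:, j], target)
        if abs(r) > r_thresh and p < p_thresh:
            keep.append(j)
    return keep
```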
2.3 Introduction to Semi-supervised Classification with LDS
The goal of semi-supervised learning methods is to use unlabeled data to obtain higher accuracy with less annotation effort. In this paper, we use the semi-supervised method called Low Density Separation. The basic idea of LDS is to combine two algorithms: a graph-based algorithm and ∇TSVM (a transductive support vector machine trained by gradient descent). The former first builds a nearest-neighbor graph from all unlabeled and labeled data, then computes the distance matrix and performs a nonlinear transformation to obtain a kernel K, and finally trains an SVM with kernel K and predicts. The latter performs a standard gradient descent. A detailed description of the algorithms can be found in [12]. In our study, LDS is used as the classifier to recognize emotion valence and arousal from low-level visual and audio features.
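The graph-based half of LDS can be sketched as follows (a rough illustration under our own assumptions, not the reference implementation of [12]; in particular, the kernel built this way only approximates the one used by LDS and may not be exactly positive semidefinite):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph
from sklearn.svm import SVC

def graph_kernel_svm(X_lab, y_lab, X_unlab, k=5, gamma=1.0):
    """Build a kNN graph over labeled + unlabeled points, take graph
    (geodesic) distances, turn them into an RBF-style kernel, and train
    an SVM on the labeled rows; predict labels for the unlabeled rows."""
    X = np.vstack([X_lab, X_unlab])
    G = kneighbors_graph(X, n_neighbors=k, mode='distance')
    D = shortest_path(G, method='D', directed=False)   # graph distance matrix
    D[~np.isfinite(D)] = D[np.isfinite(D)].max()       # bridge disconnected parts
    K = np.exp(-gamma * D ** 2)                        # nonlinear transformation -> kernel
    n_lab = len(X_lab)
    clf = SVC(kernel='precomputed').fit(K[:n_lab, :n_lab], y_lab)
    return clf.predict(K[n_lab:, :n_lab])
```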
3 Experiment

3.1 Experimental Condition
Database. Our database consists of 170 video clips collected from the web, stored in .rmvb format, ranging from 1 minute to 7 minutes with an average length of about 3 minutes. The video clips are selected from both classical and recent popular movies, including outdoor, indoor, night and day scenes. Each video represents one of the emotions happiness, sadness and fear. Each video clip was assessed by six testers (three men and three women) for its emotion category, arousal and valence on a five-point scale. We told the testers, who are our students, that they would view some short video clips and then select the one of the three emotions which could best express their feelings toward each clip. In order to prevent the testers from feeling that a clip does not belong to any category, we also offered the option "other emotions". If a clip was labeled with the same emotion category by more than 3 persons, the clip was assigned that emotion; otherwise the clip was abandoned. Ultimately, we have 131 video clips, of which 21 are marked as sadness, 63 as happiness, and 47 as fear. For these clips, if the average evaluation of arousal or valence is larger than 3, they are labeled as positive; otherwise, they are labeled as negative. Thus, 110 clips are labeled as positive and 21 as negative in the arousal category, while 63 are labeled as positive and 68 as negative in the valence category.

Experimental design. Affective classification experiments using both SVM and LDS are conducted to verify the effectiveness of our approach; each consists of two parts, recognition of two discrete arousal classes (positive and negative) and of two discrete valence classes (positive and negative). 10 different splits of labeled and unlabeled data were randomly generated, and the experimental results are averaged over them. We compare the classification results of the LDS semi-supervised algorithm with those of the SVM algorithm while changing the number of labeled data from 15 to 55 out of the 131 movie clips. The parameters of the classifiers [12] are listed in Table 2. For the semi-supervised and supervised classifiers, to make model selection of the parameters feasible, we consider combinations of values on a finite grid. As we are interested in the best possible performance, we select the parameter values minimizing the test error.

Table 2. Parameters used in the classifiers

parameters | values
exponent ρ | 0, 2^0, 2^1, 2^2, 2^3, 2^4, +∞
penalty C | 10^-1, 10^0, 10^1, 10^2
3.2
S. Wang, H. Lin, and Y. Hu
Results
Results of feature selection. For each of the visual features and audio features, the correlation coefficients between them and self-assessed emotions are calculated. After this process, 2 visual features and 3 audio features are selected for arousal, while 2 visual features and 17 audio features are selected for valence. For the large number of the selected features, 2 visual features and 3 audio features for arousal with absolute correlations over 0.15 are selected, 2 visual features and 11 audio features for valence with absolute correlations over 0.20 are selected and shown in Table 3. From the table, we can see that for the arousal and valence, lighting key and color energy are important features because they are with high correlations. In general, the correlations of visual and audio features for the emotion valence are greater than those for emotion arousal. Table 3. Visual and audio features with high correlation with labels Arousal Lighting Key Color Energy 5th MFCC Der. 6th MFCC Der. 7th MFCC Der.
Cor. 0.189 0.250 0.189 0.181 0.245
Valence Lighting Key Color Energy Audio Energy 1st FF 6th formant Silence ratio 5th LPCC Der. 9th LPCC Der. 11th LPCC Der. 12th LPCC Der. 13th LPCC Der. 1st MFCC 4th MFCC Der. 5th MFCC Der. 6th MFCC Der. 7th MFCC Der. 8th MFCC Der. 9th MFCC Der. 10th MFCC Der.
Cor. 0.771 0.689 0.251 -0.188 0.200 0.333 0.172 0.173 0.184 0.212 0.179 0.334 0.298 0.216 0.338 0.227 0.343 0.303 0.303
Results of the classifications. 1) Classification of arousal. Fig. 2 shows the average rate of arousal recognition on the selected visual features, audio features and the combination of visual and audio features respectively, while the quantity of labeled examples ranges from 15 to 55 in two learning algorithm. In this figure, we can see a lot of fluctuations in the experimental results of both algorithms, and both algorithms are ineffective. This is caused by the misproportion of samples. In our database, 110 of 131 video clips show positive arousal, while only 21 of them show negative one. It causes that all unlabeled data are put in the same
Affective Classification in Video Based on Semi-supervised Learning
(1)Classification of Arousal Based On Visual Features
recognition accuracy
0.86
0.84
LDS SVM
0.82
0.8
0.78 15
20
25
30
35
40
45
50
55
(2)Classification of Arousal Based On Audio Features
recognition accuracy
0.86
0.84
0.82
0.8
0.78 15
LDS SVM 20
30
35
40
45
50
55
(3)Classification of Arousal Based On Visual And Audio Features
0.86
recognition accuracy
25
0.84
0.82
0.8
0.78 15
LDS SVM 20
25
30
35
40
45
50
55
Fig. 2. Classification of Arousal (1)Classification of Valence Based On Visual Features 1
Fig. 3. Classification of Valence: recognition accuracy of LDS vs. SVM as the number of labeled examples ranges from 15 to 55, based on (1) visual features, (2) audio features, and (3) visual and audio features
class in the classification by the two algorithms. In this case, the classification rate is meaningless. 2) Classification of valence. From Fig. 3 we can see the experimental results of the valence classification based on the selected visual features, the audio features and the combination of visual and audio features, respectively, as the amount of labeled data increases for the SVM and LDS algorithms. As shown in the figure, the classification by the LDS algorithm is markedly superior to that by the SVM algorithm, similarly to the three-category classification. On the other hand, from Fig. 3 we can see that classification based on the visual features performs much better than that based on the audio features, which is consistent with the considerably higher correlation between visual features and valence.
4 Conclusions
The contribution of this paper is applying a semi-supervised learning approach to affective video classification. We first extracted visual and audio features from videos. Then, we demonstrated the potential and feasibility of applying semi-supervised learning to affective video analysis through comparative experiments. As future work, we should increase the number of negative arousal samples in the database to evaluate the effectiveness of the proposed method for recognizing arousal. Also, it will be helpful to weight the visual and audio features before combining them and to apply a proper function on them to promote their cooperation. Furthermore, affective classification using other semi-supervised approaches, such as an enhanced co-training algorithm, will be included in future work.
References
1. Nack, F., Dorai, C., Venkatesh, S.: Computational media aesthetics: finding meaning beautiful. IEEE Multimedia 8(4), 10–12 (2001)
2. Kang, H.-B.: Affective Contents Retrieval from Video with Relevance Feedback. In: Digital Libraries: Technology and Management of Indigenous Knowledge for Global Access, pp. 243–252 (2003)
3. Hanjalic, A., Xu, L.Q.: Affective video content representation and modeling. IEEE Transactions on Multimedia 7(1), 143–154 (2005)
4. Xu, M., Jin, J.S., Luo, S., Duan, L.: Hierarchical movie affective content analysis based on arousal and valence features. In: Proceedings of the 16th ACM International Conference on Multimedia, Vancouver, British Columbia, Canada, pp. 677–680 (2008)
5. Arifin, S.: Affective Level Video Segmentation by Utilizing the Pleasure-Arousal-Dominance Information. IEEE Transactions on Multimedia 10(7), 1325–1341 (2008)
6. Wang, S.F., Wang, X.F.: Emotional semantic detection from multimedia: a brief overview. In: Kansei Engineering and Soft Computing: Theory and Practice, pp. 126–146. IGI Global, Pennsylvania (2010)
7. Zhu, X.: Semi-Supervised Learning Literature Survey (2008), http://pages.cs.wisc.edu/jerryzhu/research/ssl/semireview.html
8. Schuller, B., Dorfner, J., Rigoll, G.: Determination of Non-Prototypical Valence and Arousal in Popular Music: Features and Performances. EURASIP Journal on Audio, Speech, and Music Processing, Special Issue on Scalable Audio-Content Analysis, vol. 2010, Article ID 735854, 19 pages (2010)
9. Boersma, P., Weenink, D.: Praat: doing phonetics by computer. Computer Program (2008)
10. Wang, H.L., Cheong, L.-F.: Affective Understanding in Film. IEEE Transactions on Circuits and Systems for Video Technology 16(6), 689–704 (2006)
11. Moncrieff, S., Dorai, C., Venkatesh, S.: Affect Computing in Film through Sound Energy Dynamics. In: Proceedings of the Ninth ACM International Conference on Multimedia, Ottawa, Canada, pp. 525–527 (2001)
12. Chapelle, O., Zien, A.: Semi-supervised classification by low density separation. In: Proc. of the Tenth International Workshop on Artificial Intelligence and Statistics (2005)
Incorporating Feature Selection Method into Neural Network Techniques in Sales Forecasting of Computer Products

Chi-Jie Lu1,*, Jui-Yu Wu2, Tian-Shyug Lee3, and Chia-Mei Lian4

1 Department of Industrial Engineering and Management, Ching Yun University, Jung-Li 320, Taoyuan, Taiwan, R.O.C. [email protected]
2 Department of Business Administration, Lunghwa University of Science and Technology, Guishan 333, Taoyuan, Taiwan, R.O.C.
3 Department of Business and Administration, Fu Jen Catholic University, Hsin-Chuang 24205, Taipei, Taiwan, R.O.C.
4 Graduate School of Business Administration, Fu Jen Catholic University, Hsin-Chuang 24205, Taipei, Taiwan, R.O.C.
Abstract. Sales forecasting of computer products is regarded as an important but difficult task since computer products are characterized by product variety, rapid specification changes and rapid price declines. Artificial neural networks (ANNs) have been found to be useful techniques for sales forecasting. However, the inability to identify important forecasting variables is one of the main shortcomings of ANNs. For selecting an appropriate number of forecasting variables to best improve the performance of the neural network prediction model, a commonly discussed data mining technique, multivariate adaptive regression splines (MARS), is adopted in this study. The proposed model first uses MARS to select important forecasting variables. The obtained significant variables then serve as the inputs for two neural network models, support vector regression (SVR) and the cerebellar model articulation controller neural network (CMACNN). Real sales data collected from a Taiwanese computer dealer are used as an illustrative example. Experimental results showed that the important variables obtained from MARS can improve the forecasting performance of the SVR and CMACNN models. The proposed two-stage forecasting models provide good alternatives for sales forecasting of computer products. Keywords: Sales forecasting, computer product, feature selection, multivariate adaptive regression splines, artificial neural networks.
1 Introduction

As sales forecasting can predict consumer demand before the sale begins, it can be used to determine the required inventory level to meet this demand and avoid the *
Corresponding author.
problem of under-/over-stocking. In addition, sales forecasting has implications for corporate financial planning, marketing, client management and other areas. Therefore, improving the accuracy of sales forecasting has become an important agenda in business operations. Much research has been conducted on sales forecasting in various industries such as textiles and clothing, fashion, books and electronics [1,2,3,4,5]. However, there is little research centered on sales forecasting of computer products [6]. With technological advancements and rapid changes in consumer demand, computer products are characterized by product variety, rapid specification changes and rapid price declines. These factors have made sales forecasting of computer products an important but difficult task, and this paper focuses on it. Artificial neural networks (ANNs), such as support vector regression (SVR), have been found to be useful techniques for sales/demand forecasting due to their ability to capture subtle functional relationships among the empirical data even when the underlying relationships are unknown or hard to describe [7]. Unlike traditional time series forecasting models, such as ARIMA and multivariate regression analysis, ANNs are data-driven and non-parametric models: they do not require strong model assumptions and can map any nonlinear function without a priori assumptions about the properties of the data [8]. Coupled with superior performance in constructing nonlinear models, ANNs have been successfully applied to sales/demand forecasting [7]. However, the inability to identify important forecasting variables is one of the main shortcomings of ANNs. Selection of critical forecasting variables is crucial to the construction of sales forecasting models, as the selection results usually affect the accuracy of the model. Important forecasting variables that impact sales forecasting results are often the key focus areas or indicators requiring management attention; discussion and understanding of these important variables therefore leads to improved management and sales efficiency. Hence, this study utilizes a methodology that enables fast processing as well as variable selection, multivariate adaptive regression splines (MARS), to identify important sales forecasting variables. Through the outstanding variable-screening ability of MARS, the more important variables in sales forecasting of computer products can be determined so that companies can make better sales management decisions. In this study, a two-stage forecasting scheme incorporating MARS into neural network techniques is proposed. In the proposed scheme, MARS is first used as a screening tool for the forecasting variables. Thereafter, the important forecasting variables that have been screened out are used as input variables for forecast model construction using SVR, a novel neural network algorithm, and CMACNN, an effective neural network model [9]. In order to evaluate the performance of the proposed hybrid sales forecasting procedure, monthly sales data from a Taiwanese computer dealer are used as the illustrative example. The superior forecasting capability of the proposed scheme can be observed by comparing its results with those of the single SVR and CMACNN approaches. The rest of this paper is organized as follows. Section 2 gives brief overviews of MARS, SVR and CMACNN.
The proposed model is described in Section 3. Section 4 presents the experimental results and this paper is concluded in Section 5.
2 Methodology

2.1 Multivariate Adaptive Regression Splines

Multivariate adaptive regression splines (MARS) is a nonlinear and non-parametric regression methodology [10]. It is a flexible procedure which models relationships that are nearly additive or involve interactions with fewer variables. Compared to ANN models, the major advantage of MARS is that it can identify "important" independent variables through the built basis functions when many potential variables are considered. Moreover, MARS does not need a long training process and hence can save a lot of modeling time when the data set is huge. MARS has been widely used in forecasting and classification problems [11]. However, there are no applications of MARS to sales prediction. The modeling procedure of MARS is basically inspired by the recursive partitioning technique governing classification and regression trees and by generalized additive modeling [10]. MARS excels at finding optimal variable transformations and interactions, as well as the complex data structure that often hides in high-dimensional data. MARS essentially builds flexible models by fitting piecewise linear regressions; that is, the nonlinearity of a model is approximated through the use of separate linear regression slopes in distinct intervals of the independent variable space. Therefore the slope of the regression line is allowed to change from one interval to the other as the two 'knot' points are crossed. The variables to be used and the end points of the intervals for each variable are found through a fast but intensive search procedure. In addition to searching variables one by one, MARS also searches for interactions between variables, allowing any degree of interaction to be considered as long as the built model better fits the data. The general MARS function can be represented by the following equation [10]:

$$f(x) = a_0 + \sum_{m=1}^{M} a_m \prod_{k=1}^{K_m} \left[ s_{km} \left( x_{v(k,m)} - t_{km} \right) \right] \qquad (1)$$
where $a_0$ and $a_m$ are parameters, M is the number of basis functions, $K_m$ is the number of knots, $s_{km}$ takes the value 1 or -1 and indicates the right/left sense of the associated step function, $x_{v(k,m)}$ is the label of the independent variable, and $t_{km}$ indicates the knot location.

The optimal MARS model is selected in a two-stage process. Firstly, MARS constructs a very large number of basis functions which initially over-fit the data, where variables are allowed to enter as continuous, categorical, or ordinal (the formal mechanism by which variable intervals are defined), and they can interact with each other or be restricted to enter only as additive components. In the second stage, basis functions are deleted in order of least contribution using the generalized cross-validation (GCV) criterion. A measure of variable importance can then be assessed by observing the decrease in the calculated GCV when a variable is removed from the model. MARS can also handle missing-value problems using dummy-variable techniques. By allowing arbitrary shapes for the functions and interactions, and by using the above two-stage model building procedure, MARS is capable of tracking the very complex data structures that often hide in high-dimensional data. Please refer to Friedman [10] for more details regarding the complete model building process.
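For reference, the GCV criterion used in this pruning stage is commonly written as follows (after Friedman [10]; the notation here is ours):

$$GCV(M) = \frac{\frac{1}{N}\sum_{i=1}^{N} \left( y_i - \hat{f}_M(\mathbf{x}_i) \right)^2}{\left( 1 - \frac{C(M)}{N} \right)^2}$$

where N is the number of observations, $\hat{f}_M$ is the model built from M basis functions, and C(M) is a complexity penalty that increases with the number of basis functions; the model minimizing GCV is retained.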
2.2 Support Vector Regression

Support vector regression (SVR) is an artificial intelligence forecasting tool based on statistical learning theory and the structural risk minimization principle [8]. It has been successfully applied to different time series prediction problems such as production value forecasting and financial time series forecasting [6, 12]. SVR can be expressed as the following equation:

$$f(\mathbf{x}) = (\mathbf{z} \cdot \phi(\mathbf{x})) + b \qquad (2)$$

where z is a weight vector, b is a bias, and $\phi(\mathbf{x})$ is a kernel-induced nonlinear mapping that transforms the nonlinear input into a linear mode in a high-dimensional feature space. Traditional regression obtains the coefficients by minimizing the squared error, which can be considered as empirical risk based on a loss function. Vapnik [8] introduced the so-called ε-insensitive loss function to SVR. It can be expressed as:

$$L_\varepsilon(f(\mathbf{x}) - q) = \begin{cases} |f(\mathbf{x}) - q| - \varepsilon & \text{if } |f(\mathbf{x}) - q| \ge \varepsilon \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

where q is the target output and ε defines the region of ε-insensitivity: when the predicted value falls inside the band area the loss is zero; conversely, if the predicted value falls outside the band area, the loss equals the difference between the predicted value and the margin. Considering the empirical risk and structural risk synchronously, the SVR model can be constructed to minimize the following program:

$$\min \; \frac{1}{2}\mathbf{z}^T\mathbf{z} + C\sum_i \left(\xi_i + \xi_i^*\right)$$
$$\text{subject to} \; \begin{cases} q_i - \mathbf{z}^T\mathbf{x}_i - b \le \varepsilon + \xi_i \\ \mathbf{z}^T\mathbf{x}_i + b - q_i \le \varepsilon + \xi_i^* \\ \xi_i, \xi_i^* \ge 0 \end{cases} \qquad (4)$$

where i = 1, ..., n indexes the training data; $\sum_i(\xi_i + \xi_i^*)$ is the empirical risk; $\frac{1}{2}\mathbf{z}^T\mathbf{z}$ is the structural risk preventing over-learning and lack of generalization; and C is a modifying coefficient representing the trade-off between empirical risk and structural risk. Equation (4) is a quadratic programming problem. After selecting a proper modifying coefficient (C), band-area width (ε) and kernel function (K), the optimum of each parameter can be resolved through the Lagrange function. Cherkassky and Ma [13] proposed that the radial basis function (RBF) is suited for solving most forecasting problems, so this paper uses the RBF with parameter σ = 0.2 as the kernel function in SVR modeling.
2.3 Cerebellar Model Articulation Controller Neural Network

The CMACNN is a supervised neural network introduced based on the functions of the human cerebellum, which is responsible for muscle control and motor coordination [9]. A cerebellum works as follows: an input signal to the cerebellum activates many mossy fibers, each touching a granule cell, and the output of the cerebellum is the sum of the activated granule cells. A CMACNN performs cerebellum functions by a series of mappings and acts as a clever look-up table. The CMACNN has the advantages of very fast learning, reasonable generalization ability and robust noise resistance [9, 14]. A CMACNN comprises five cells, the input space (X), sensory cell (S), association cell (A), physical memory cell (P) and output cell (Y), and transforms input values into output values using a series of mappings [9]. First, the data in the input space (X) are transformed to the sensory cell (S). A quantization operator is used to transform the components of input vectors into discrete quantization indexes. A sensory cell comprises random tables [9]. The size of each random table is calculated by:

$$R_i = E_i + g - 1, \quad i = 1, 2, \ldots, N \qquad (5)$$

where $R_i$ is the size of the random table of input vector $x_i$, $E_i$ is the quantization resolution of input vector $x_i$, and g is the generalization size. Second, the data in the sensory cell (S) are mapped to random table i in the association cell (A). The segment mappings are created based on the parameter g. The quantization and segment mappings generate natural interpolation and give the CMACNN the ability to generalize [9]. A large g increases generalization ability but reduces approximation accuracy. Finally, a vector a in the association cell corresponds to a table w in the physical memory cell (P). The table w stores many weights. The actual output value in the output cell (Y) can be computed by the matrix operation:

$$y = \mathbf{a}^T \mathbf{w} \qquad (6)$$
where $\mathbf{a} = (a_1, a_2, \ldots, a_k)^T$, $\mathbf{w} = (w_1, w_2, \ldots, w_k)^T$, k is the size of the vectors a and w, and y is the actual output. The CMACNN uses the least mean square (LMS) rule to adjust the weights w; only activated weights need to be modified in the CMACNN algorithm.
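To make the chain of mappings concrete, the following is a minimal one-dimensional CMAC sketch (our own illustrative code, not the authors' implementation): the input is quantized into E levels, g overlapping cells are activated, and the LMS rule updates only the activated weights.

```python
import numpy as np

class SimpleCMAC:
    """Minimal 1-D CMAC: quantize the input, activate g overlapping cells,
    output the sum of activated weights, and train with the LMS rule."""
    def __init__(self, e=30, g=20, x_min=0.0, x_max=1.0, lr=0.1):
        self.e, self.g, self.lr = e, g, lr
        self.x_min, self.x_max = x_min, x_max
        self.w = np.zeros(e + g - 1)          # table size R = E + g - 1, cf. Eq. (5)

    def _active_cells(self, x):
        # quantize x to one of e levels, then activate g consecutive cells
        q = int((x - self.x_min) / (self.x_max - self.x_min) * (self.e - 1))
        q = min(max(q, 0), self.e - 1)
        return np.arange(q, q + self.g)

    def predict(self, x):
        return self.w[self._active_cells(x)].sum()   # y = a^T w, cf. Eq. (6)

    def train(self, x, target):
        idx = self._active_cells(x)
        err = target - self.w[idx].sum()
        self.w[idx] += self.lr * err / self.g         # LMS update, activated weights only

# usage: approximate a sine wave on [0, 1]
cmac = SimpleCMAC(e=50, g=10)
for x in np.random.rand(2000):
    cmac.train(x, np.sin(2 * np.pi * x))
```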
3 The Proposed Two-Stage Forecasting Scheme

This paper presents a two-stage sales forecasting model incorporating MARS into ANN models. A schematic representation of the proposed forecasting model is illustrated in Figure 1. As shown in Figure 1, the first step of the proposed model is to apply MARS to the original forecasting variables to identify important prediction variables through the built basis functions. In the second step, the obtained significant predictor variables serve as the input variables to construct the ANN forecasting models. In this
study, the SVR and CMACNN models are used, and the combined models are called MARS-SVR and MARS-CMACNN, respectively. In building the SVR forecasting model, the first step is to select the kernel function; as mentioned in Section 2.2, the RBF kernel function is adopted in this study. It is well known that the estimation accuracy of SVR depends on the setting of its parameters. Thus, the selection of the two parameters of an SVR model, the regularization constant C and the loss function parameter ε, is important to forecasting accuracy. In this study, the analytical method proposed by Lu et al. [12] is used to select the parameter set of C and ε; for details of the analytic method, please refer to Lu et al. [12]. For constructing the CMACNN model, there are no general rules for the choice of the important parameters of the CMACNN algorithm, i.e., parameters E and g. In this study, the best parameter set for building the CMACNN sales forecasting model is determined by trial and error.
Fig. 1. The proposed two-stage sales forecasting scheme (original prediction variables → MARS model → selected important prediction variables → SVR/CMACNN model building → sales forecasting results)
4 Experimental Results

For evaluating the performance of the proposed two-stage forecasting scheme, the monthly sales data of a computer dealer in Taiwan are used in this study. The data collection period is from January 1996 to February 2009, for a total of 158 data points of monthly sales amount. The sales trend is shown in Figure 2. For the sales data, we use an 80/20 split ratio between training and testing samples. For building the forecasting models, 11 indicators, chosen based on subjective judgment, expert opinion and the more commonly used financial market variables, are used as forecasting variables; they are depicted in Table 1. Note that the relative strength (RSI) and BIAS indicators are considered in this study since they are among the most widely used features in stock price prediction [15]. For details about these technical indicators please refer to Wood [15]. As evaluation criteria for forecast performance, this study utilizes four commonly used indicators, the root mean square error (RMSE), mean absolute percentage error (MAPE), mean absolute difference (MAD) and root mean square percentage error (RMSPE), to evaluate the accuracy of the forecast results generated by the forecast models.
Fig. 2. The monthly sales amount of a Taiwan computer dealer from January 1996 to February 2009

Table 1. List of forecasting variables in building forecasting models
Variable   Description
X1         3-month moving average (MA3)
X2         6-month moving average (MA6)
X3         Previous month's sales amount (T-1)
X4         Previous two months' sales amount (T-2)
X5         Previous three months' sales amount (T-3)
X6         3-month relative strength indicator (RSI3)
X7         6-month relative strength indicator (RSI6)
X8         12-month relative strength indicator (RSI12)
X9         3-month BIAS (BIAS3)
X10        6-month BIAS (BIAS6)
X11        12-month BIAS (BIAS12)
For building the two-stage forecasting schemes, i.e., the MARS-SVR and MARS-CMACNN models, the original eleven forecasting variables are first passed to the MARS model to select important predictor variables. The variable selection results and the basis functions obtained by MARS are summarized in Table 2. It is observed that three variables, X1 (MA3), X4 (T-2: sales amount in period t-2) and X3 (T-1: previous month's sales amount), are identified from the initial 11 variables and play crucial roles in building the MARS sales forecasting model. It can also be seen from Table 2 that the obtained basis functions, important predictor variables and the MARS prediction function can provide important implications for sales management and help managers make appropriate sales decisions or strategies. Then, for the MARS-SVR model, the analytic parameter selection method is used to select the best parameter set. The best parameter set for building the MARS-CMACNN sales forecasting model is determined by the trial-and-error method. After the selection process, the parameter set (ε = 2^-5, C = 2^-1) is the best setup for constructing the MARS-SVR sales forecasting model, and the best parameter set for building the MARS-CMACNN model is (E = 30, g = 20). For comparison, the single SVR and CMACNN models without using MARS as a
feature selection tool are also constructed. For modeling the single SVR and CMACNN models, the eleven predictor variables are directly used as input variables. After using the selection method mentioned above, the parameter set (ε = 2^-3, C = 2^1) is the best parameter set for the single SVR model in sales forecasting. The single CMACNN model obtained the best training and testing results using the parameter set (E, g) = (40, 30). Note that the model selection details of these models are omitted to save space.

Table 2. Basis functions and important forecasting variables using MARS
Variable selection results:

Variable name   Relative importance (%)
X1 (MA3)        100.000
X4 (T-2)        78.939
X3 (T-1)        46.694

Basis functions:

BF2 = max(0, X1(MA3) - 381)
BF4 = max(0, X4(T-2) - 1226)
BF5 = max(0, 1226 - X4(T-2))
BF6 = max(0, X3(T-1) - 282)

MARS prediction function: f(x) = -365.252 + 3.000 * BF2 - 1.000 * BF4 + 1.000 * BF5 - 1.000 * BF6
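The fitted model is a plain sum of hinge terms, so the basis functions and the prediction function above can be evaluated directly; a minimal sketch:

```python
def mars_predict(ma3, t2, t1):
    # hinge (truncated linear) basis functions from Table 2
    bf2 = max(0, ma3 - 381)
    bf4 = max(0, t2 - 1226)
    bf5 = max(0, 1226 - t2)
    bf6 = max(0, t1 - 282)
    return -365.252 + 3.0 * bf2 - 1.0 * bf4 + 1.0 * bf5 - 1.0 * bf6
```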
The comparison of the forecast results from the four sales forecasting models is shown in Table 3. From the table, it can be seen that the MAD, RMSE, MAPE and RMSPE values of the MARS-SVR model are 117.51, 140.72, 0.25% and 0.34%, respectively, significantly better than those of the other three methods; therefore the MARS-SVR model produces the best forecasting results. In addition, it can be observed that the forecasting results of the two-stage models are better than those of the single forecasting models; for example, the forecasting error of the MARS-SVR model is smaller than that of the single SVR model. This shows that using forecasting variables that have first been screened by MARS can indeed enhance the forecast performance of the model. Hence, MARS provides excellent ability in variable screening and interpretation.

Table 3. The comparison of the forecast results from the four forecasting models
Methods        MAD      RMSE     MAPE    RMSPE
SVR            147.56   202.01   0.32%   0.36%
MARS-SVR       117.51   140.72   0.25%   0.34%
CMACNN         189.75   264.06   0.36%   0.44%
MARS-CMACNN    145.83   218.00   0.36%   0.43%
5 Conclusion

Sales forecasting is a crucial aspect of marketing and inventory management for computer products. This paper proposed a two-stage sales forecasting scheme for computer products by incorporating MARS into neural network techniques. The proposed two-stage method first uses MARS to identify significant predictor variables; the selected important variables then serve as the inputs of the neural network forecasting models. In this study, SVR and CMACNN are used to build the MARS-SVR and MARS-CMACNN forecasting models. The experiments evaluated the monthly sales data of a computer dealer in Taiwan, and the proposed method was compared with the single SVR and single CMACNN models using prediction error as the criterion. Experimental results showed that the proposed MARS-SVR model produces the lowest prediction error and outperforms the comparison methods used in this study. Moreover, the experiments also revealed that the forecasting results of the two-stage models are better than those of the single forecasting models. Therefore, it can be concluded that the proposed MARS-SVR model is a good alternative for sales forecasting of computer products, since it generates good forecasting performance, and that MARS can effectively identify the important forecasting variables and thereby enhance the forecast performance of the model. The selected significant predictor variables, i.e., the 3-month moving average (MA3), the sales amount two months earlier (T-2) and the previous month's sales amount (T-1), can provide valuable information for further marketing and inventory decisions/strategies.
Acknowledgements This research was partially supported by the National Science Council of the Republic of China under Grant Number NSC 99-2628-E-231-001.
References 1. Thomassey, S., Fiordaliso, A.: A hybrid sales forecasting system based on clustering and decision trees. Decision Support Systems 42(1), 408–421 (2006) 2. Thomassey, S., Happiette, M.: A neural clustering and classification system for sales forecasting of new apparel items. Applied Soft Computing 7(4), 1177–1187 (2007) 3. Ni, Y., Fan, F.: A two-stage dynamic sales forecasting model for the fashion retail. Expert Systems with Applications 38(3), 1529–1536 (2011) 4. Chang, P.C., Lai, C.Y., Lai, K.R.: A hybrid system by evolving case-based reasoning with genetic algorithm in wholesaler’s returning book forecasting. Decision Support Systems 42(3), 1715–1729 (2006) 5. Chang, P.C., Liu, C.H., Wang, Y.W.: A hybrid model by clustering and evolving fuzzy rules for sales decision supports in printed circuit board industry. Decision Support Systems 42(3), 1254–1269 (2006) 6. Lu, C.J., Wang, Y.W.: Combining independent component analysis and growing hierarchical self-organizing maps with support vector regression in product demand forecasting. International Journal of Production Economics 128(2), 603–613 (2010)
7. Fildes, R., Nikolopoulos, K., Crone, S.F., Syntetos, A.A.: Forecasting and operational research: A review. Journal of the Operational Research Society 59(9), 1150–1172 (2008) 8. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, New York (2000) 9. Lu, C.J., Wu, J.Y.: Forecasting financial time series via an efficient CMAC neural network. Lecture Notes in Electrical Engineering 67, 73–82 (2010) 10. Friedman, J.H.: Multivariate adaptive regression splines (with discussion). The Annals of Statistics 19, 1–141 (1991) 11. Lee, T.S., Chiu, C.C., Chou, Y.C., Lu, C.J.: Mining the customer credit using classification and regression tree and multivariate adaptive regression splines. Computational Statistics and Data Analysis 50(4), 1113–1130 (2006) 12. Lu, C.J., Lee, T.S., Chiu, C.C.: Financial time series forecasting using independent component analysis and support vector regression. Decision Support Systems 47, 115–125 (2009) 13. Cherkassky, V., Ma, Y.: Practical selection of SVM parameters and noise estimation for SVM regression. Neural Networks 17, 113–126 (2004) 14. Wu, J.Y., Lu, C.J.: Applying classification problems via a data mining approach based on a cerebellar model articulation controller. In: 1st Asian Conference on Intelligent Information and Database Systems, Dong Hoi City, Vietnam, pp. 61–66 (2009) 15. Wood, S.: Float Analysis: Powerful Technical Indicators using Price and Volume. John Wiley & Sons, New York (2002)
Design of Information Granulation-Based Fuzzy Models with the Aid of Multi-objective Optimization and Successive Tuning Method Wei Huang1, Sung-Kwun Oh2, and Jeong-Tae Kim3 1
State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, China 2 Department of Electrical Engineering, The University of Suwon, San 2-2, Wau-ri, Bongdam-eup, Hwaseong-si, Gyeonggi-do, 445-743, South Korea [email protected] 3 Department of Electrical Engineering, Daejin University, Gyeonggi-do, South Korea
Abstract. In this paper, we propose a hybrid identification of information granulation-based fuzzy models by means of multi-objective optimization and successive tuning method. The proposed multi-objective algorithm using a nondominated sorting-based multi-objective strategy is associated with an analysis of solution space. The granulation of information is realized by means of the C-Means clustering algorithm. Information granules formed in this way become essential at further stages of the construction of the fuzzy models by forming the centers of the fuzzy sets constituting individual rules of the inference schemes. The overall optimization of fuzzy inference systems comes in the form of two identification mechanisms: structure identification (such as the number of input variables to be used, a specific subset of input variables, the number of membership functions, and polynomial type) and parameter identification (viz. the apexes of membership function). The structure identification as well as parameter identification is simultaneously realized with the aid of successive tuning method. The evaluation of the performance of the proposed model was carried out by using two representative numerical examples such as NOx emission process data and Mackey-Glass time series. The proposed model is also contrasted with the quality of some “conventional” fuzzy models already encountered in the literature. Keywords: Multi-objective Optimization, Successive Tuning Method, Information Granulation (IG), Fuzzy Inference System (FIS).
1 Introduction

There has been a diversity of approaches to fuzzy modeling such as linguistic modeling [1] and so on [2-3]. In the design of fuzzy models, there are two main and conflicting objectives: one is the accuracy and the other is the complexity. Various approaches [4-5] have been proposed to improve the accuracy of fuzzy models using evolutionary algorithms. The complexity of the model increases as a result of the accuracy maximization. Some researchers attempted to simultaneously optimize the
accuracy and the complexity of the fuzzy models [6]. It is, however, impossible to simultaneously optimize these objectives due to the existence of the accuracy-complexity tradeoff. Recently, multi-objective optimization (MOO) techniques have been applied to design fuzzy systems exhibiting high accuracy and significant interpretability [7]. Nevertheless, when dealing with the IG-based fuzzy model, the previous studies lack an optimization vehicle which considers not only the solution space being explored but also the techniques of MOO. In this study, we present a multi-objective space search algorithm (MSSA) and then introduce a fuzzy identification of Information Granule-based fuzzy inference systems (IG-FIS) by means of the MSSA and a successive tuning method. An MSSA-based tuning method is developed to realize the identification of the IG-FIS. The optimization is of multi-objective character so that we deal with the simplicity as well as the accuracy of the model.
2 Design of the IG-Based Fuzzy Inference Systems

Granulation of information is an inherent and omnipresent activity of human beings carried out with the intent of gaining a better insight into a problem under consideration and arriving at its efficient solution. The identification of the conclusion parts of the rules deals with a selection of their structure (type 1, type 2, type 3 and type 4) that is followed by the determination of the respective parameters of the local functions occurring there. The conclusion part of the rule, which is an extended form of a typical fuzzy rule in the TSK (Takagi-Sugeno-Kang) fuzzy model, has the form
$R^j$: If $x_1$ is $A_{1c}$ and $\cdots$ and $x_k$ is $A_{kc}$, then $y_j - M_j = f_j(x_1, \ldots, x_k)$  (1)

Type 1 (Simplified Inference): $f_j = a_{j0}$
Type 2 (Linear Inference): $f_j = a_{j0} + a_{j1}(x_1 - V_{j1}) + \cdots + a_{jk}(x_k - V_{jk})$
Type 3 (Quadratic Inference): $f_j = a_{j0} + a_{j1}(x_1 - V_{1j}) + \cdots + a_{jk}(x_k - V_{kj}) + a_{j(k+1)}(x_1 - V_{1j})^2 + \cdots + a_{j(2k)}(x_k - V_{kj})^2 + a_{j(2k+1)}(x_1 - V_{1j})(x_2 - V_{2j}) + \cdots + a_{j((k+2)(k+1)/2)}(x_{k-1} - V_{(k-1)j})(x_k - V_{kj})$
Type 4 (Modified Quadratic Inference): $f_j = a_{j0} + a_{j1}(x_1 - V_{1j}) + \cdots + a_{jk}(x_k - V_{kj}) + a_{j(k+1)}(x_1 - V_{1j})(x_2 - V_{2j}) + \cdots + a_{j(k(k+1)/2)}(x_{k-1} - V_{(k-1)j})(x_k - V_{kj})$
The calculations of the numeric output of the model, based on the activation (matching) levels of the rules, rely on the following expression:

$y^* = \frac{\sum_{j=1}^{n} w_{ji}\, y_j}{\sum_{j=1}^{n} w_{ji}} = \frac{\sum_{j=1}^{n} w_{ji}\, (f_j(x_1, \ldots, x_k) + M_j)}{\sum_{j=1}^{n} w_{ji}} = \sum_{j=1}^{n} \hat{w}_{ji}\, (f_j(x_1, \ldots, x_k) + M_j)$  (2)
Here, as the normalized value of $w_{ji}$, we use an abbreviated notation to describe the activation level of rule $R^j$ in the form

$\hat{w}_{ji} = \frac{w_{ji}}{\sum_{j=1}^{n} w_{ji}}, \qquad \hat{w}_{ji} = \frac{A_{j1}(x_{1i}) \times \cdots \times A_{jk}(x_{ki})}{\sum_{j=1}^{n} A_{j1}(x_{1i}) \times \cdots \times A_{jk}(x_{ki})}$  (3)
where $R^j$ is the j-th fuzzy rule, $x_k$ represents the input variables, $A_{kc}$ is a membership function of fuzzy sets, $a_{jk}$ is a constant, $V_{jk}$ and $M_j$ are center values of the input and output data, respectively, n is the number of fuzzy rules, $y^*$ is the inferred output value, and $w_{ji}$ is the premise fitness matching $R^j$ (activation level). We consider two performance indexes: the standard root mean squared error (RMSE) and the mean squared error (MSE).
$PI\ (\text{or } E\_PI) = \begin{cases} \sqrt{\tfrac{1}{m} \sum_{i=1}^{m} (y_i - y_i^*)^2}, & (RMSE) \\ \tfrac{1}{m} \sum_{i=1}^{m} (y_i - y_i^*)^2, & (MSE) \end{cases}$  (4)
where $y^*$ is the output of the fuzzy model, m is the total number of data, and i is the data index. The consequent parameters $a_{jk}$ can be determined by the standard least-squares method, which leads to the expression
$\hat{a} = (X^T X)^{-1} X^T Y$  (5)
In the case of Type 2 we have

$\hat{a} = [a_{10} \cdots a_{n0}\;\, a_{11} \cdots a_{n1} \cdots a_{1k} \cdots a_{nk}]^T$, $\quad X = [\mathbf{x}_1\;\, \mathbf{x}_2 \cdots \mathbf{x}_i \cdots \mathbf{x}_m]^T$,

$\mathbf{x}_i^T = [\hat{w}_{1i} \cdots \hat{w}_{ni}\;\; (x_{1i} - V_{11})\hat{w}_{1i} \cdots (x_{1i} - V_{1n})\hat{w}_{ni} \;\cdots\; (x_{ki} - V_{k1})\hat{w}_{1i} \cdots (x_{ki} - V_{kn})\hat{w}_{ni}]$,

$Y = \Bigl[\, y_1 - \sum_{j=1}^{n} M_j w_{j1} \;\;\; y_2 - \sum_{j=1}^{n} M_j w_{j2} \;\cdots\; y_m - \sum_{j=1}^{n} M_j w_{jm} \,\Bigr]^T$
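A compact sketch of this estimation with NumPy (array shapes are illustrative; X is assumed to be assembled from the activation levels and centered inputs as above):

```python
import numpy as np

def estimate_consequents(X, y, M, w):
    """Least-squares estimate of the consequent parameters: the target is
    first offset by the rule output centers, y'_i = y_i - sum_j M_j * w_ji,
    then a = (X^T X)^{-1} X^T y' is computed."""
    y_adj = y - w.T @ M          # w has shape (n_rules, m_samples), M shape (n_rules,)
    a, *_ = np.linalg.lstsq(X, y_adj, rcond=None)  # numerically safer than an explicit inverse
    return a
```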
3 Optimization of IG-Based Fuzzy Inference Systems

3.1 Multi-objective Space Search Algorithm
We first discuss the space search algorithm (SSA). A precondition must be satisfied for evolutionary algorithms to find the optimal solution: in most local areas, a point (solution) and the other points located in its adjacent space have similar values of the objective function. In other words, in most local areas, a solution with a better fitness value is closer to the optimal solution. Moreover, if we take the entire space as the biggest local area into consideration, the
precondition can be satisfied for any target optimization problem. Based on this observation, we can devise a space search mechanism to update the current solutions. The role of space search is to generate new solutions from old ones. The search method is based on the operator of space search, which involves two basic steps: generate a new subspace and search the new space. Searching the new space is realized by randomly generating a new solution (individual) located in this space. Regarding the generation of the new space, we consider two cases: (a) space search based on M selected solutions (denoted here as Case I), and (b) space search based on one selected solution (Case II). In Case I, the core issue is to determine the adjacent space based on the M solutions. For convenience, we use the following representations:
$X^k = (x_1^k, x_2^k, \ldots, x_n^k), \quad k = 1, 2, \ldots, M$,

where n is the index of the dimension. To adjust the size of the newly generated subspace, we use the coefficients $a_k \in [l, u]$ as parameters, where l and u are given numbers. Suppose that V is the newly generated space, $X^{new}$ is a feasible solution in V, and S is the entire feasible solution space. The new space can be determined by the following expression:

$V = \bigl\{ X^{new} \bigm| x_i^{new} = \sum_{k=1}^{M} a_k x_i^k,\; X^{new} \in S,\; \sum_{k=1}^{M} a_k = 1,\; l \le a_k \le u \bigr\}$  (6)
where l < 0 and u > 1. In Case II, the role of this operator is to adjust the best solution by means of searching its adjacent space. We denote the current best solution as $X^{best} = (x_1^{best}, x_2^{best}, \ldots, x_n^{best})$, where $x_i^{best} \in [l_i, u_i]$, $i = 1, 2, \ldots, n$. The new adjacent space $V_2$ can be determined by the following expression:

$V_2 = \{ X^{new} \mid X^{new} = (x_1^{best}, \ldots, x_{i-1}^{best}, x_i^{new}, x_{i+1}^{best}, \ldots, x_n^{best}),\; x_i^{new} \ne x_i^{best},\; x_i^{new} \in [l_i, u_i] \}$  (7)
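A minimal sketch of the two search operators under box constraints (the bounds and the coefficient range are illustrative):

```python
import numpy as np

def search_case1(parents, l=-0.5, u=1.5):
    # Case I: random combination of M parents with coefficients summing to 1,
    # allowed outside [0, 1] since l < 0 and u > 1, as in eq. (6)
    M = len(parents)
    while True:
        a = np.random.uniform(l, u, M)
        a /= a.sum()                      # enforce sum(a) = 1
        if np.all((a >= l) & (a <= u)):   # retry if normalization left the range
            return a @ np.asarray(parents)

def search_case2(best, lower, upper):
    # Case II: perturb one randomly chosen coordinate of the best solution, eq. (7)
    child = np.array(best, dtype=float)
    i = np.random.randint(len(child))
    child[i] = np.random.uniform(lower[i], upper[i])
    return child
```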
Next, let us consider the MSSA. The non-dominated sort is realized with the aid of the crowding distance among solutions. The overall algorithm can be outlined as the following sequence of steps:

Step 1. Initialize the solution set.
Step 2. Search the space based on M solutions (Case I).
Step 3. Search the space based on the current best solution (Case II).
Step 4. Sort (non-domination) and select the current solutions.
Step 5. Repeat Steps 2 to 4 P times, where P is the number of iterations.
Step 6. Output the optimal solutions.

3.2 Successive Tuning of IG-FIS
Traditionally, sequential tuning [2-3] is utilized as the optimization method: the structural and the parametric optimization are carried out sequentially. In this study, we develop an MSSA-based successive tuning method
which simultaneously realizes the structural as well as parametric optimization of the model. The arrangement of the chromosome is the same as in reference [3]. The scheme of the space search operator in pseudo-code is as follows:

While {the termination conditions are not met}
  Select M solutions (parent individuals) from the current solution set, where M is a given number.
  Generate a random variable r1.
  Calculate a variant identification ratio p, a generation-based stochastic variable of the form
    p = (r1 + (1 - gen/maxgen)) / λ
  IF {p > 0.5}
    Search the solution space within the first part of the solutions, for structural optimization.
  Else
    Search the solution space within the second part of the solutions, for parametric optimization.
  End IF
End while
In the MSSA-based successive tuning method, a stochastic variable (the variant identification ratio) within a modified simple space search operator of the MSSA supports an efficient successive tuning embracing both the structural and the parametric optimization of the model. Essential parameters such as gen, maxgen, and λ are given: gen is the index of the current generation, maxgen stands for the maximal number of generations used in the algorithm, and λ serves as an adjustment coefficient whose value determines the variant identification ratio p.

3.3 Objective Functions of IG-FIS
Three objective functions are used to evaluate the accuracy and the complexity of an IG-FIS. The three objective functions are given as performance indexes, entropy of partition and the total number of the coefficients of the polynomials to be estimated. The accuracy criterion E includes both the training data and testing data and comes as a convex combination of the two components.
$E = \theta \times PI + (1 - \theta) \times E\_PI$  (8)
Here, PI and E_PI denote the performance index for the training data and testing data, respectively. θ is a weighting factor that allows us to strike a sound balance between the performance of the model for the training and testing data. As a measure for evaluating the structure complexity of a model we consider the following partition criterion:

$H = \prod_{i=1}^{n} F_i$  (9)
where n is the total number of selected input variables and $F_i$ is the number of membership functions for the ith corresponding input variable. As a simplicity criterion we consider the total number of coefficients of the local models. In a nutshell, we find the Pareto optimal sets and the Pareto front by minimizing
{E, H, N} by means of the MSSA. This leads to easily interpretable, simple, and accurate fuzzy models.
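A minimal sketch of extracting the non-dominated set of solutions for the three minimized objectives:

```python
def pareto_front(points):
    """Return the non-dominated subset of a list of (E, H, N) objective
    tuples, with all three objectives minimized."""
    front = []
    for p in points:
        dominated = any(
            all(q[i] <= p[i] for i in range(3)) and q != p
            for q in points
        )
        if not dominated:
            front.append(p)
    return front
```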
4 Experimental Studies

We consider two well-known data sets. Each data set is divided into two parts of the same size. PI denotes the performance index for the training data while E_PI stands for the testing data. In all considerations, the weighting factor θ is set to 0.5. The parameters of the MSSA are set as follows: we use a population of 200 solutions and run the method for 1,000 generations. In each generation, we first search the space based on 8 randomly generated solutions and then search the space based on the current best solution. In the successive tuning, λ is set to 2.0.

4.1 NOx Emission Process Data of Gas Turbine Power Plant
The NOx emission process [8-12] is modeled using the data of gas turbine power plants. The performance index is the MSE defined by Eq. (4). We consider 260 pairs of the original input-output data; 130 out of the 260 pairs are used as the learning set, and the remaining part serves as a testing set. Figure 1 illustrates the Pareto front generated by means of the MSSA when using the successive tuning method.
Fig. 1. Pareto front obtained by successive tuning using MSSA (NOx)

Table 1. Comparative analysis of selected models (NOx)

Model                          PI      E_PI    No. of rules
Regression model               17.68   19.23   -
Hybrid FS-FNNs [8]             2.806   5.164   30
Hybrid FR-FNNs [9]             0.080   0.190   32
Multi-FNN [10]                 0.720   2.205   30
Hybrid rule-based FNNs [11]    3.725   5.291   30
Choi's model [12]              0.012   0.067   18
Our model                      0.016   0.057   8
Table 1 shows the results of a comparative analysis of the proposed model when contrasted with other models. It is evident that the proposed model outperforms several previous fuzzy models known in the literature.

4.2 Chaotic Mackey-Glass Time Series
The second well-known dataset is the Mackey-Glass time series [13-16]. The standard RMSE is considered as the performance index. The first 500 data pairs were used as the training data set of the FIS, while the remaining 500 pairs were the testing data set for assessing the predictive performance. Figure 2 depicts the Pareto front generated by means of the MSSA when using the successive tuning method. The identification error of the proposed model is compared with the performance of some other models; refer to Table 2. It is easy to see that the performance of the proposed model is better in the sense of its approximation and prediction abilities.
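The paper does not restate its generator settings; the Mackey-Glass series is conventionally produced from the delay differential equation dx/dt = 0.2 x(t-τ)/(1 + x(t-τ)^10) - 0.1 x(t) with τ = 17, and a simple Euler sketch under these common defaults is:

```python
import numpy as np

def mackey_glass(n=1000, tau=17, dt=1.0, x0=1.2):
    # Euler integration of dx/dt = 0.2*x(t-tau)/(1 + x(t-tau)**10) - 0.1*x(t)
    x = np.full(n + tau, x0)
    for t in range(tau, n + tau - 1):
        x[t + 1] = x[t] + dt * (0.2 * x[t - tau] / (1 + x[t - tau] ** 10) - 0.1 * x[t])
    return x[tau:]

series = mackey_glass()
train, test = series[:500], series[500:1000]   # split used in the paper
```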
Fig. 2. Pareto front obtained by successive tuning using MSSA (Mackey)

Table 2. Comparative analysis of selected models (Mackey)

Model                                   PI        E_PI      NDEI     No. of rules
Standard neural networks                0.018     0.411     0.0705   15 nodes
RBF neural networks                     0.015     0.313     0.0172   15 nodes
ANFIS [13]                              0.0016    0.0015    0.007    16
Incremental type multilevel FRS [14]    0.0240    0.0253    -        25
Aggregated type multilevel FRS [14]     0.0267    0.0256    -        36
Hierarchical TS-FS [15]                 0.0120    0.0129    -        28
Our model                               0.00011   0.00014   0.0007   16
5 Concluding Remarks

This paper contributes to the research area of fuzzy modeling in the following two aspects: 1) we proposed a multi-objective space search algorithm; the MSSA is designed using a technique of non-dominated sorting and the crowding distance; 2) we introduced
the fuzzy identification of IG-FIS based on an MSSA-based successive tuning method. Numerical experiments show that the proposed model exhibits better performance in comparison with the fuzzy models reported in the literature. Acknowledgements. This work was supported by the GRRC program of Gyeonggi province [GRRC SUWON2010-B2, Center for U-city Security & Surveillance Technology] and also by the Power Generation & Electricity Delivery program of the Korea Institute of Energy Technology Evaluation and Planning (KETEP) grant funded by the Korean government Ministry of Knowledge Economy (2009T100100563).
References 1. Pedrycz, W.: An identification algorithm in fuzzy relational system. Fuzzy Sets Syst. 13, 153–167 (1984) 2. Oh, S.K., Pedrycz, W.: Identification of Fuzzy Systems by means of an Auto-Tuning Algorithm and Its Application to Nonlinear Systems. Fuzzy Sets and Syst. 115, 205–230 (2000) 3. Park, B.J., Pedrycz, W., Oh, S.K.: Identification of Fuzzy Models with the Aid of Evolutionary Data Granulation. IEEE Proc.-Control Theory and Applications 148, 406– 418 (2001) 4. Lin, F.J., Teng, L.T., Lin, J.W., Chen, S.Y.: Recurrent Functional-Link-Based FuzzyNeural-Network-Controlled Induction-Generator System Using Improved Particle Swarm Optimization. IEEE Trans. Indust. Elect. 56, 1557–1577 (2009) 5. Huang, W., Ding, L., Oh, S.K., Jeong, C.W., Joo, S.C.: Identification of fuzzy inference system based on information granulation. KSII Transactions on Internet and Information Systems 4, 575–594 (2010) 6. Setnes, M., Roubos, H.: GA-based modeling and classification: complexity and performance. IEEE Trans. Fuzzy Syst. 8, 509–522 (2000) 7. Ishibuchi, H., Nojima, Y.: Analysis of interpretability-accuracy tradeoff of fuzzy systems by multiobjective fuzzy genetics-based machine learning. Int. J. Approx. Reaso. 44, 4–31 (2007) 8. Oh, S.K., Pedrycz, W., Prak, H.S.: Hybrid identification in fuzzy-neural networks. Fuzzy Set Syst. 138, 399–426 (2003) 9. Park, H.S., Oh, S.K.: Fuzzy relation-based fuzzy neural-networks using a hybrid identification algorithm. Int. J. Cont., Autom. Syst. 1, 289–300 (2003) 10. Park, H.S., Oh, S.K.: Multi-FNN identification based on HCM clustering and evolutionary fuzzy granulation. Int. J. Cont., Autom., Syst. 1, 194–202 (2003) 11. Oh, S.K., Pedrycz, W., Prak, H.S.: Implicit rule-based fuzzy-neural networks using the identification algorithm of hybrid scheme based on information granulation. Adv. Eng. Inform. 16, 247–263 (2002) 12. Choi, J.N., Oh, S.K., Pedrycz, W.: Identification of fuzzy relation models using hierarchical fair competition-based parallel genetic algorithms and information granulation. Applied Mathematical Modelling 33, 2791–2807 (2009) 13. Jang, J.S.R.: ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans. Syst., Man Cybern. 23, 665–685 (1993) 14. Duan, J.C., Chung, F.L.: Multilevel fuzzy relational systems: structure and identification. Soft. Comput. 6, 71–86 (2002) 15. Chen, Y., Yang, B., Abraham, A.: Automatic design of hierarchical Takagi-Sugeno type fuzzy systems using evolutionary algorithms. IEEE Trans. Fuzzy Syst. 15, 385–397 (2007)
Design of Fuzzy Radial Basis Function Neural Networks with the Aid of Multi-objective Optimization Based on Simultaneous Tuning Wei Huang1, Lixin Ding1, and Sung-Kwun Oh2 1
State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, China 2 Department of Electrical Engineering, The University of Suwon, San 2-2, Wau-ri, Bongdam-eup, Hwaseong-si, Gyeonggi-do, 445-743, South Korea [email protected]
Abstract. In this paper, we are concerned with a design of fuzzy radial basis function neural networks (FRBFNN) by means of multi-objective optimization. A multi-objective algorithm is proposed to optimize the FRBFNN. In the FRBFNN, we exploit fuzzy C-means (FCM) clustering to form the premise part of the rules. As the consequent part of the fuzzy rules of the FRBFNN model, four types of polynomials are considered, namely constant, linear, quadratic, and modified quadratic. The least square method (LSM) is exploited to estimate the values of the coefficients of the polynomials. In fuzzy modeling, complexity, interpretability (or simplicity) as well as accuracy of the obtained model are essential design criteria. Since the performance of the FRBFNN model is directly affected by some parameters, such as the fuzzification coefficient used in the FCM, the number of rules, and the orders of the polynomials in the consequent parts of the rules, we carry out both structural as well as parametric optimization of the network. The proposed multi-objective algorithm is used to optimize the parameters of the model, while the optimization is of multi-objective character as it is aimed at the simultaneous minimization of complexity and maximization of accuracy. Keywords: Multi-objective optimization, fuzzy radial basis function neural network (FRBFNN), Fuzzy C-Means (FCM), least square method (LSM).
1 Introduction

Fuzzy Radial Basis Function Neural Networks (FRBFNNs) are designed by integrating the principles of a Radial Basis Function Neural Network (RBFNN) and exploiting the mechanisms of information granulation provided by Fuzzy C-Means (FCM) [1]. In the design of the FRBFNN, there are two main and conflicting objectives: one is the accuracy and the other is the complexity. Various approaches have been proposed to improve the accuracy of fuzzy models using evolutionary algorithms such as Particle Swarm Optimization (PSO) [2]. Those methods usually help improve the accuracy of the resulting fuzzy model, but the complexity of the model increases as a result of the accuracy maximization. Some researchers attempted to simultaneously
optimize the accuracy and the complexity of the fuzzy models [3]. It is, however, impossible to simultaneously optimize these objectives due to the existence of the accuracy-complexity tradeoff. Recently, a number of evolutionary algorithms (EAs) have been developed to solve multi-objective optimization problems. As a result, multi-objective optimization (MOO) techniques have been applied to design fuzzy systems exhibiting high accuracy and significant interpretability [4]. Nevertheless, when dealing with the FRBFNN model, the previous studies lack an optimization vehicle which considers not only the solution space being explored but also the techniques of MOO. In this study, we present a multi-objective space search algorithm (MSSA) and then introduce a design of FRBFNN with the aid of the MSSA. To alleviate the problems presented earlier, we use the MSSA as a vehicle to maximize the accuracy of the FRBFNN. The optimization is of multi-objective character so that we deal with the simplicity as well as the accuracy of the model. To reflect the multi-objective character of the design, we consider the root mean squared error (RMSE) to quantify accuracy, and the structure complexity as well as the total number of polynomial coefficients in the consequent part of the fuzzy rules to articulate the simplicity of the model. The structure of the paper is reflective of the main objectives of the study. Section 2 introduces the architecture of the FRBFNN. Section 3 presents the MSSA and a multi-objective optimization of the FRBFNN using the MSSA. Section 4 reports the experimental results. Finally, some conclusions are drawn in Section 5.
2 A Design of the FRBFNN

Let us recall that in conventional RBFNNs, the activation function of a hidden node is typically realized in the form of some Gaussian function. The location of any RBF in the input space is uniquely specified by its center and width (spread). The number of the receptive fields has to be decided in advance. The output of the RBFNN is a weighted sum of the activation levels of the individual RBFs. The learning algorithm of the RBFNN is used to adjust the weights of the links from the hidden layer to the output layer. Fuzzy clustering may be used to determine the number of RBFs, as well as the position of their centers and the values of the widths [5]. The gradient method or the least square algorithm is used to realize parametric learning dealing with the conclusions of the rules [6],[7].

2.1 Architecture of the FRBFNN

As pointed out by Jang, even though RBFNNs and fuzzy inference systems (FIS) exhibit different conceptual roots, they share common functional characteristics and exhibit similar learning strategies. RBFNNs and FISs having rules with constant consequents (a zero-order Takagi-Sugeno model) are functionally equivalent [8]. In the architecture of a conventional RBFNN, the number of RBF units is equal to the number of fuzzy "if-then" rules present in the FIS. Each RBF of the RBFNN can be treated as a membership function defined in the multivariable input space of the premise part of the corresponding fuzzy rule. The weights of the links are present in the consequent parts of the fuzzy rules. The RBFs or membership functions divide the
entire input space into the corresponding sub-spaces. In the FRBFNN, the FCM is used to form the receptive fields (RBFs). In this manner, the RBFs do not assume any explicit functional form (such as Gaussian, ellipsoidal, triangular, and others) but are of implicit nature, as the resulting activation level (degree of membership) results from computing the relevant distances between the data and the corresponding prototypes (centers of the receptive fields). The FRBFNN has the advantage of providing a thorough coverage of the entire input space. Given the zero-order polynomial used as a local model, the resulting accuracy of the model becomes limited. The FRBFNN consists of the premise part and the consequent part, as illustrated in Fig. 1.
Fig. 1. Architecture of FRBFNN
2.2 Premise Part of the FRBFNN with the Use of Fuzzy C-Means

Let us consider the set X composed of m vectors located in a certain l-dimensional Euclidean space, that is, $X = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m\}$, $\mathbf{x}_k = (x_{k1}, x_{k2}, \ldots, x_{kl}) \in \Re^l$, where m is the number of input data and l is the number of input variables. Clustering results in the assignment of the input vectors $\mathbf{x}_k \in X$ into n clusters, where the clusters are represented by the prototypes (center values) $\mathbf{v}_i = (v_{i1}, v_{i2}, \ldots, v_{il}) \in \Re^l$, $1 \le i \le n$. Let $\Re^{nl}$ denote the set of real $n \times l$ matrices with entries in [0,1]; then $U = [u_{ik}] \in \Re^{nl}$. The level of assignment of $\mathbf{x}_k \in X$ to the ith cluster is expressed by the membership function $u_{ik} = u_i(\mathbf{x}_k)$. Fuzzy partitions U of X satisfy the conditions
$\sum_{i=1}^{n} u_{ik} = 1, \quad 1 \le k \le m$  (1)

and

$0 < \sum_{k=1}^{m} u_{ik} < m, \quad 1 \le i \le n$  (2)
The FCM algorithm develops the structure in data by minimizing the following objective function
$J_m = \sum_{i=1}^{n} \sum_{k=1}^{m} (u_{ik})^r\, d(\mathbf{x}_k, \mathbf{v}_i), \quad 1 < r < \infty$  (3)
In (3), r is the fuzzification coefficient and $d(\mathbf{x}_k, \mathbf{v}_i)$ is a distance between the input vector $\mathbf{x}_k \in X$ and the prototype $\mathbf{v}_i \in \Re^l$. Quite commonly it comes in the form of the weighted Euclidean distance computed between $\mathbf{x}_k$ and $\mathbf{v}_i$, defined as follows
$d(\mathbf{x}_k, \mathbf{v}_i) = \|\mathbf{x}_k - \mathbf{v}_i\|^2 = \sum_{j=1}^{l} \frac{(x_{kj} - v_{ij})^2}{\sigma_j^2}$  (4)
where $\sigma_j^2$ is the variance of the jth input variable. This type of distance, equipped with the weights, allows us to deal with variables of substantial variability. Under the assumption of the form of the weighted Euclidean distance, the necessary conditions for solutions (U, V) of $\min\{J_m(U, V)\}$ are specified as
$u_{ik} = w_{ik} = \frac{1}{\sum_{j=1}^{n} \bigl( \|\mathbf{x}_k - \mathbf{v}_i\| / \|\mathbf{x}_k - \mathbf{v}_j\| \bigr)^{2/(r-1)}}, \quad 1 \le k \le m, \; 1 \le i \le n$  (5)
and

$\mathbf{v}_i = \frac{\sum_{k=1}^{m} u_{ik}^r\, \mathbf{x}_k}{\sum_{k=1}^{m} u_{ik}^r}, \quad 1 \le i \le n$  (6)
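A compact sketch of these alternating updates (using the plain Euclidean distance rather than the variance-weighted one, for brevity):

```python
import numpy as np

def fcm(X, n_clusters, r=2.0, n_iter=100, seed=0):
    """Fuzzy C-Means: X has shape (m, l); returns memberships U (n, m)
    and prototypes V (n, l), alternating eqs. (5) and (6)."""
    rng = np.random.default_rng(seed)
    m, _ = X.shape
    U = rng.random((n_clusters, m))
    U /= U.sum(axis=0)                                   # columns sum to 1, eq. (1)
    p = 2.0 / (r - 1.0)
    for _ in range(n_iter):
        V = (U ** r) @ X / (U ** r).sum(axis=1, keepdims=True)   # eq. (6)
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12
        U = (d ** -p) / (d ** -p).sum(axis=0)                    # eq. (5)
    return U, V
```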
2.3 Consequence Part of the FRBFNN with the Use of the Least Square Method

The learning of the consequent part is concerned with the estimation of the parameters of the polynomials of the local models. In this study, we consider the least square method (LSM) for the learning of the consequent part of the rules. The LSM is a well-known global learning algorithm that minimizes an overall squared error $J_G$ between the output of the model and the experimental data:

$J_G = \sum_{k=1}^{m} \Bigl( y_k - \sum_{i=1}^{n} w_{ik}\, f_i(\mathbf{x}_k - \mathbf{v}_i) \Bigr)^2$  (7)

where $w_{ik}$ is the normalized firing (activation) level of the i-th rule as expressed by (5). The performance index $J_G$ can be expressed in a concise form as follows:

$J_G = (Y - Xa)^T (Y - Xa)$  (8)
where a is the vector of coefficients of the polynomials, Y is the output vector of real data, and X is a matrix which rearranges the input data, the information granules (centers of each
cluster) and activation level. In case all consequent polynomials are linear (first-order polynomials), X and a can be expressed as follows
$X = \begin{bmatrix} \overbrace{1 \;\; w_{11}(x_{11}-v_{11}) \, \cdots \, w_{11}(x_{l1}-v_{1l})}^{\text{the first fuzzy rule}} & \cdots & \overbrace{1 \;\; w_{n1}(x_{11}-v_{n1}) \, \cdots \, w_{n1}(x_{l1}-v_{nl})}^{\text{the } n\text{th fuzzy rule}} \\ \vdots & \ddots & \vdots \\ 1 \;\; w_{1m}(x_{1m}-v_{11}) \, \cdots \, w_{1m}(x_{lm}-v_{1l}) & \cdots & 1 \;\; w_{nm}(x_{1m}-v_{n1}) \, \cdots \, w_{nm}(x_{lm}-v_{nl}) \end{bmatrix}$

$a = [a_{10} \; a_{11} \, \cdots \, a_{1l} \; \cdots \; a_{n0} \; a_{n1} \, \cdots \, a_{nl}]^T$
The optimal values of the coefficients of the consequents are determined in the form

$a = (X^T X)^{-1} X^T Y$  (9)
3 Optimization of the FRBFNN

3.1 Multi-objective Space Search Algorithm

In a previous study, we first proposed a space search algorithm (SSA) [9] with a single objective. The role of space search is to generate new solutions from old ones. The search method is based on the operator of space search, which involves two basic steps: generate a new subspace (local area) and search the new space. Searching the new space is realized by randomly generating a new solution (individual) located in this space. Regarding the generation of the new space, we consider two cases: (a) space search based on M selected solutions (denoted here as Case I); the role of this operator is to update the current solutions by new solutions approaching the optimum, and the core issue is to determine the adjacent space based on the M solutions. For convenience, we use the following representations:
$X^k = (x_1^k, x_2^k, \ldots, x_n^k), \quad k = 1, 2, \ldots, M$,

where n is the index of the dimension. To adjust the size of the newly generated subspace, we use the coefficients $a_k \in [l, u]$ as parameters, where l and u are given numbers. Suppose that V is the newly generated space, $X^{new}$ is a feasible solution in V, and S is the entire feasible solution space. The new space can be determined by the following expression:

$V = \bigl\{ X^{new} \bigm| x_i^{new} = \sum_{k=1}^{M} a_k x_i^k,\; X^{new} \in S,\; \sum_{k=1}^{M} a_k = 1,\; l \le a_k \le u \bigr\}$  (10)
where l < 0 and u > 1; (b) space search based on one selected solution (Case II). The role of this operator is to adjust the best solution by means of searching its adjacent space.
We denote the current best solution as $X^{best} = (x_1^{best}, x_2^{best}, \ldots, x_n^{best})$, where $x_i^{best} \in [l_i, u_i]$, $i = 1, 2, \ldots, n$. The new adjacent space $V_2$ can be determined by the following expression:

$V_2 = \{ X^{new} \mid X^{new} = (x_1^{best}, \ldots, x_{i-1}^{best}, x_i^{new}, x_{i+1}^{best}, \ldots, x_n^{best}),\; x_i^{new} \ne x_i^{best},\; x_i^{new} \in [l_i, u_i] \}$  (11)
Based on the SSA, we can develop an MSSA using the non-dominated sort, which is realized with the aid of an estimation of the crowding distance among solutions in the current solution set S. The pseudo-code of the MSSA can be briefly outlined as follows:

Initialize solution set (population).
While {the termination conditions are not met}
  Randomly select M solutions (parent individuals) from the current solution set, where M is a given number.
  Search Space (Case I)
  Search Space (Case II)
  Sort the current solutions with the aid of non-domination.
  Select solutions from the current solution set.
End while
Output the optimal solutions.
3.2 Objective Functions of FRBFNN

The arrangement of the chromosome for the FRBFNN is the same as in reference [13]. A solution is represented as a vector involving the fuzzification coefficient, the number of fuzzy rules, and the type of polynomial of the consequent part of the fuzzy rules. Three objective functions are used to evaluate the accuracy, the complexity, and the interpretability of an FRBFNN: the RMSE, the entropy of partition, and the total number of the coefficients of the polynomials to be estimated. The RMSE, being the accuracy criterion of the FRBFNN, is given as
$E = \sqrt{\frac{1}{m} \sum_{k=1}^{m} (y_k - \hat{y}_k)^2}$  (12)
As a measure for evaluating the structure complexity of a model, we consider the entropy of the partition [11]. The entropy of partition reflects a degree of overlap between the regions of the fuzzy sets. Considering all samples of the training dataset, the entropy of the partition reads as

$H = -\sum_{i=1}^{n} \sum_{k=1}^{m} w_{ik} \log(w_{ik})$  (13)
As a simplicity criterion we consider the total number of coefficients of the local models. In a nutshell, we find the Pareto optimal sets and Pareto front by minimizing
{E, H, N} by means of the MSSA. This leads to easily interpretable, simple, and accurate fuzzy models.
4 Experimental Studies

We use two well-known data sets. PI denotes the performance index for the training data and E_PI stands for the testing data. The parameters of the MSSA are set as follows: we use a population of 200 solutions and run the method for 150 generations. In each generation, we first search the space based on 8 randomly generated solutions and then search the space based on the current best solution.

4.1 Automobile Miles Per Gallon (MPG) Data

We consider the automobile MPG data (ftp://ics.uci.edu/pub/machine-learningdatabased/auto-mpg) with the output being the automobile's fuel consumption expressed in miles per gallon. The data set includes 392 input-output pairs (after removing incomplete instances), where the input space involves 8 input variables. The automobile MPG data is partitioned into two separate parts: the first 235 data pairs are used as the training data set for the FRBFNN, while the remaining 157 pairs form the testing data set for assessing the predictive performance. Figure 2 illustrates the Pareto front generated by means of the MSSA. Table 1 shows the results of a comparative analysis of the proposed model when contrasted with other models; to compare with other models, we select the solution with the best accuracy in the Pareto front. It is evident that the proposed model outperforms several previous fuzzy models known in the literature.

4.2 Boston Housing Data
Here we experiment with the Boston housing data set. This data set concerns a description of real estate in the Boston area where houses are characterized by features
Fig. 2. Pareto front obtained by using MSSA (MPG)
Table 1. Results of comparative analysis (MPG)

Model                    PI     E_PI   No. of rules
Linguistic model [12]    2.86   3.24   36
RBFNN [13]               3.24   3.62   36
Functional RBFNN [13]    2.41   2.82   33
Our model                2.74   2.77   5
Fig. 3. Pareto front obtained by using MSSA (Housing)

Table 2. Results of comparative analysis (Housing)

Model         PI     E_PI    No. of rules
NN            3.27   5.14    24
RBFNN [13]    6.63   7.14    20
SVR [14]      1.17   5.84    -
MARS [15]     0.97   5.96    -
FPNN          3.51   16.93   16
Our model     2.78   4.60    6
such as crime rate, size of lots, number of rooms, age of houses, etc., and their median price. The dataset consists of 506 14-dimensional data points. The dataset is split into two separate parts: the construction of the fuzzy model is completed for 253 data points regarded as a training set, while the rest of the data set (i.e., 253 data points) is retained for testing purposes. Fig. 3 shows the Pareto front generated by means of the MSSA. The identification error of the proposed model is compared with the performance of some other models; refer to Table 2. To compare with other models, we select the solution with the best accuracy in the Pareto front. It is easy to see that the performance of the proposed model is better in the sense of its approximation and prediction abilities.
5 Concluding Remarks

This paper contributes to the research area of FRBFNN in the following two aspects: 1) we proposed a multi-objective space search algorithm; the MSSA, using a technique of non-dominated sorting and the crowding distance, is designed on the basis of a space search algorithm; 2) we introduced a design of FRBFNN by means of the MSSA. Numerical experiments show that the proposed model exhibits better performance in comparison with the fuzzy models reported in the literature. Acknowledgements. This work was supported by the National Research Foundation of Korea Grant funded by the Korean Government (NRF-2010-D00065), the GRRC program of Gyeonggi province [GRRC SUWON2010-B2, Center for U-city Security & Surveillance Technology], and the National Natural Science Foundation of China (Grant No. 60975050).
References 1. Mitra, S., Basak, J.: FRBF: A Fuzzy Radial Basis Function Network. Neural Comput. Applic. 10, 244–252 (2001) 2. Lin, F.J., Teng, L.T., Lin, J.W., Chen, S.Y.: Recurrent Functional-Link-Based FuzzyNeural-Network-Controlled Induction-Generator System Using Improved Particle Swarm Optimization. IEEE Trans. Indust. Elect. 56, 1557–1577 (2009) 3. Setnes, M., Roubos, H.: GA-based modeling and classification: complexity and performance. IEEE Trans. Fuzzy Syst. 8, 509–522 (2000) 4. Ishibuchi, H., Nojima, Y.: Analysis of interpretability-accuracy tradeoff of fuzzy systems by multiobjective fuzzy genetics-based machine learning. Int. J. Approx. Reaso. 44, 4–31 (2007) 5. Yen, G.G., Leong, W.F.: Dynamic Multiple Swarms in Multiobjective Particle Swarm Optimization. IEEE Trans. Syst., Man Cybern.-PART A 39, 890–911 (2009) 6. Ishibuchi, H., Nojima, Y.: Analysis of interpretability-accuracy tradeoff of fuzzy systems by multiobjective fuzzy genetics-based machine learning. Int. J. Approx. Reaso. 44, 4–31 (2007) 7. Herrera, L.J., Pomares, H., Rojas, I., Valenzuela, O., Prieto, A.: TaSe, a Taylor seriesbased fuzzy system model that combines interpretability and accuracy. Fuzzy Sets and Syst. 153, 403–427 (2005) 8. Papadakis, S.E., Theocharis, J.B.: A GA-based fuzzy modeling approach for generating TSK models. Fuzzy Sets Syst. 131, 121–152 (2002) 9. Huang, W., Ding, L., Oh, S.K., Jeong, C.W., Joo, S.C.: Identification of Fuzzy Inference System Based on Information Granulation. KSII Transactions on Internet and Information Systems 4, 575–594 (2010) 10. Deb, K., Pratab, A., Agrawal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6, 182–197 (2002) 11. Delgado, M., Ceullar, M.P., Pegalajar, M.C.: Multiobjective Hybrid Optimization and Training of Recurrent Neural Networks. IEEE Trans. Syst., Man Cybern. –Part B 38, 381–403 (2008)
12. Pedrycz, W., Kwak, K.C.: Linguistic models as a framework of user-centric system modeling. IEEE Trans. Syst., Man Cybern. –Part A: Systems and Humans 36, 727–745 (2006) 13. Pedrycz, W., Park, H.S., Oh, S.K.: A granular-oriented development of functional radial basis function neural networks. Neurocomputing 72, 420–435 (2008) 14. Chris, H.D., Burges, J.C., Kaufman, L., Smola, A., Vapnik, V.: Support vector regression machines. Adv. Neural Inform. Process Syst. 9, 155–161 (1997) 15. Friedman, J.H.: Multivariate adaptive regression splines. Ann. Stat. 19, 1–67 (1991)
The Research of Power Battery with Ultra-capacitor for Hybrid Vehicle Jia Wang1,2 , Yingchun Wang3 , and Guotao Hui3, 1
School of Flight Vehicle Engineering, Beijing Institute of Technology, Beijing, China 2 Shang Hai EV-TECH New Energy Technology Co. Ltd, Shanghai, China 3 College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, 110819, P.R. China [email protected], [email protected], [email protected]
Abstract. In this note, a novel power battery with ultra-capacitor method is proposed to drive a hybrid power system. By using this method, the open problem that high power density and high energy density cannot be provided simultaneously by the power source of electric vehicles can be solved. Furthermore, the relevant performance of electric vehicles is improved based on the proposed method. Finally, a large amount of test data from practical operation supplies strong evidence for our design. Keywords: Hybrid vehicle, Hybrid power system, Ultra-capacitor.
1
Introduction
With the advancement of the automation industry, research on electric vehicles has grown in recent years; see, for example, [1]-[5]. Currently, the main problem obstructing the development of electric vehicles is the performance of the battery. Simply speaking, there is no suitable battery which can satisfy the energy requirements of electric vehicles: none of the energy sources on the market can provide high power density and high energy density at the same time. Therefore, a power battery with ultra-capacitor strategy is proposed in this note for solving this open problem. It is well known that the ultra-capacitor has some noticeable advantages, such as a high discharge current over a short time, long cycle life, high efficiency of regenerative braking, and a wide working range [6]. Based on these advantages, a hybrid power system driven by a power battery with an ultra-capacitor can not only take advantage of the load-balancing function of the ultra-capacitor, but also lower the discharge current of the battery pack. Moreover, it can greatly increase the usage efficiency of energy, remarkably extend the life of the battery, and improve the power and driving range of the vehicle. So the power battery with ultra-capacitor strategy can improve the relevant performance of electric vehicles. In this note, we show all the research results according to some performance indexes and test data. The reason for that is the use of a high-tech smart chip, which is
Corresponding author.
designed by O2 Security Ltd. We have signed a confidentiality agreement with this company; therefore, the related neural computation process is solidified into the chip and can only be reflected through the corresponding data supplied by us.
2
Power System Structure
The energy system of the hybrid vehicle driven by the power battery and ultra-capacitor is shown in Fig. 1. The control strategy of the hybrid vehicle is described by the following three cases. Case 1: the power battery provides the electric energy for the motor while the hybrid vehicle runs under normal conditions. To ensure that the ultra-capacitor retains its output capability at any time, the hybrid vehicle must run at constant speed; in this case, the power battery not only provides the electric energy for the motor, but also provides the electric
Fig. 1. The physical structure of the power system
Fig. 2. The vehicle energy distribution on the normal condition
Fig. 3. The vehicle energy distribution on acceleration or climbing condition
Fig. 4. The vehicle energy distribution on brake or downhill condition
energy for the ultra-capacitor. Case 1 is shown in Fig. 2. Case 2: when the hybrid vehicle accelerates or climbs, the power battery and the ultra-capacitor provide electric energy for the motor together. Case 2 is shown in Fig. 3. Case 3: when the hybrid vehicle brakes or goes downhill, the motor works as a generator and the regenerative braking energy is charged into the ultra-capacitor; if the ultra-capacitor is full, the regenerative braking energy is charged into the battery. Case 3 is shown in Fig. 4.
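A schematic of this three-case logic (the threshold and signal names are illustrative, not taken from the paper):

```python
def distribute_power(demand_kw, regen_kw, ucap_soc):
    """Toy energy-distribution rule for the three operating cases.
    demand_kw > 0: traction demand; regen_kw > 0: braking/downhill recovery;
    ucap_soc in [0, 1]: ultra-capacitor state of charge."""
    if regen_kw > 0:                          # Case 3: brake or downhill
        if ucap_soc < 1.0:
            return {"to_ucap_kw": regen_kw}
        return {"to_battery_kw": regen_kw}
    if demand_kw > 15.0:                      # Case 2: acceleration or climbing
        return {"battery_kw": 15.0, "ucap_kw": demand_kw - 15.0}
    # Case 1: normal driving; the battery also tops up the ultra-capacitor
    return {"battery_kw": demand_kw, "battery_charges_ucap": ucap_soc < 1.0}
```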
3
Main Results
In this section, a novel power battery with ultra-capacitor method is proposed to drive the hybrid power system. First, the hybrid power system can be described by the parameters shown in Table 1:

Table 1. Hybrid power system

Name                            Parameter
Maximum speed                   80 km/h
Range                           100 km
Grade ability                   Higher than 0.25
0-60 km/h acceleration time     Lower than 12 s
Gross Vehicle Weight (GVW)      Lower than 1000 kg
3.1
Motor
When the vehicle is driving at a constant speed from 20 to 80 km/h, the matching relation between the motor power and the vehicle speed is given in Table 2:

Table 2. The matching relation between vehicle speed and motor power

Vehicle speed (km/h)   20     30     40    50     60     70     80
Motor power (kW)       2.51   3.42   4.5   5.81   7.41   9.37   12.74
The vehicle power at the maximum speed should be less than the rated power; the motor rated power is set at 15 kW and the peak power at 35 kW. The 0-60 km/h acceleration time is less than 12 s, so the design requirements are satisfied. The designed vehicle has a 0.25 grade ability at the speed of 20 km/h, and at this point the running speed of the motor is 2056 r/min. According to the vehicle design equations, the vehicle must satisfy the following equation during climbing:

$F_t \ge F_f + F_w + F_i$  (1)
The result of (1) is that the motor should provide 2757.6 N of tractive force, which gives a motor torque of 87 N·m. Furthermore, when the electric vehicle climbs a 0.25 slope at 20 km/h, the required power is 26.18 kW, less than the peak power of 35 kW. To sum up, the rated power of the motor is 15 kW, the peak power is 35 kW, the rated speed is 2500 r/min, the maximum torque is 100 N·m, and the rated voltage is 288 V.

3.2
Power Batteries with Ultra-capacitor
Determining the number of batteries requires considering the maximum output power of the battery and the battery output power during driving, which are determined by the control strategy. In this practical case, the power battery is a LiMPO4 (lithium iron phosphate) battery. The battery cell voltage is 3.2 V and the battery cell capacity is 12 Ah. The battery pack is made up of four battery modules in parallel, and each battery module includes ninety-one battery cells in series. The total battery pack energy is 14 kWh and the total battery pack volume is 120 L. The main parameters of the battery cell are shown in Table 3:

Table 3. The information of the LiMPO4 battery, model 28/82/118

1    Rated voltage                          3.2 V
2    Rated capacity                         12 Ah
3    Resistance                             < 5 mΩ
4    Discharge cut-off voltage              2.0 V
5    Charge cut-off voltage                 3.6 V
6    Maximum sustained discharge current    3C (50 A)
7    Transient discharge current            7C (82 A, 10 s)
8    Maximum charge current                 2C (24 A), transient 5C (60 A), 10 s
9    Weight                                 0.5 kg
10   Size                                   28.5 × 82.5 × 118
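These pack figures follow directly from the cell data; a quick arithmetic check:

```python
cell_v, cell_ah = 3.2, 12          # LiMPO4 cell: 3.2 V, 12 Ah
series, parallel = 91, 4           # 91 cells in series, 4 modules in parallel

pack_voltage = series * cell_v                        # 291.2 V (close to the 288 V rated motor voltage)
pack_energy_kwh = pack_voltage * cell_ah * parallel / 1000
print(pack_voltage, round(pack_energy_kwh, 1))        # 291.2, 14.0 kWh
```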
Determining the number of ultra-capacitors is mainly affected by the ultra-capacitor charge time and discharge time, by the vehicle's braking and acceleration
time, and by the output power of the ultra-capacitor designed by the control strategy. They need to satisfy the following formulation:

$0.5\, n_1 n_2 C_c (U_{cmax}^2 - U_{cmin}^2) \ge P_c\, t$  (2)
in which $n_1$ is the number of series-connected ultra-capacitor cells in each group, $n_2$ is the number of ultra-capacitor groups, $C_c$ is the cell capacitance of each ultra-capacitor, $U_{cmax}$ is the maximum working voltage of each capacitor, $U_{cmin}$ is the minimum working voltage of each capacitor, and $P_c$ is the output power of the ultra-capacitor. The voltage of the ultra-capacitor cell is 2.7 V and its capacitance is 650 F. The ultra-capacitor module is made up of 24 cells in series, the ultra-capacitor group is made up of five modules in series, and the whole ultra-capacitor pack is made up of three groups in parallel. The main parameters of the ultra-capacitor cell and module are given in the following two tables.

Table 4. Ultra-capacitor cell parameters

Rated capacity                 650 F
Capacity deviation             +0.2 or -0.5
Rated voltage                  2.7 V
Operating temperature range    -40 to +60.5 degrees
AC resistance                  0.6 mΩ
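Condition (2) can be evaluated numerically from these parameters; a sketch in which the minimum working voltage Ucmin is an assumption, since the paper does not state it:

```python
n1 = 24 * 5        # series cells per group: 24 per module x 5 modules
n2 = 3             # parallel groups
Cc, Ucmax = 650.0, 2.7
Ucmin = 1.35       # assumed: half of the rated cell voltage

usable_joules = 0.5 * n1 * n2 * Cc * (Ucmax**2 - Ucmin**2)   # left side of (2)
Pc = 20e3                                                     # assumed output power, W
print(usable_joules / Pc)   # largest feasible support time t at Pc, in seconds
```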
Table 5. Ultra-capacitor module parameters

Rated voltage                     129.6 V
Total number of ultra-capacitors  48 pieces
Rated capacitance                 13.5 F
DC resistance                     ≤ 50 mΩ
Number in series                  48

4 The Tests in Practical Situation
The electric range test data are shown in Table 6:

Table 6. Range test data

Range (km)  Continuous driving time (min)  Driving speed (km/h)  Ultra-capacitor
110         138                            48.6                  not open
117         145                            49.4                  open
The acceleration time test data are shown in Table 7:
Table 7. Acceleration time test data

Acceleration time (s)  Ultra-capacitor
10.8                   not open
10.4                   open
According to the above test results, the proposed strategy is more effective and powerful: with the ultra-capacitor enabled, both the driving range and the acceleration performance improve.
5 Conclusions

In this note, a power battery with ultra-capacitor strategy has been proposed for hybrid vehicles. Compared with ordinary battery electric vehicles, the vehicle power obtained with the proposed method is significantly strengthened and the driving range is extended.
Stability Analysis of Genetic Regulatory Networks with Mixed Time-Delays

Zhanheng Chen¹,² and Haijun Jiang¹

¹ College of Mathematics and System Sciences, Xinjiang University, Urumqi, 830046, China
² Department of Mathematics, YiLi Normal University, YiNing, 835000, China
[email protected], [email protected]
Abstract. In this paper, a class of genetic regulatory networks with discrete delays and distributed delays is investigated. By applying the Young inequality and introducing a number of real parameters, a series of new and useful criteria on the existence and uniqueness of the equilibrium point and on its global asymptotic stability are established. Finally, an example is given to illustrate the results obtained in this paper.

Keywords: Genetic regulatory networks, Discrete and distributed delays, Global asymptotical stability, Equilibrium point.
1 Introduction
Genetic networks are biochemical dynamical systems, and it is natural to model genetic networks using dynamical systems, which provide a powerful tool for studying gene regulation processes in living organisms. Basically, there are two types of genetic network models: the Boolean model [1, 2] and the differential equation model [3–8]. A general differential equation for genetic networks has the following form:
$$\dot x_i(t) = f_i(x), \quad i = 1, 2, \ldots, n, \qquad (1)$$
where $x = (x_1, x_2, \ldots, x_n)$ denotes the concentrations of gene products, such as proteins, mRNAs or other chemical complexes, $n$ is the total number of chemical species in the cell, and $f_i(x)$ is the synthesis rate of species $i$. A special case of (1) can be written as [9]:
$$\dot M_i(t) = -a_i M_i(t) + \sum_{j=1}^{n} G_{ij}\, g_j(P_j(t)) + I_i, \qquad \dot P_i(t) = -c_i P_i(t) + d_i M_i(t), \quad i = 1, 2, \ldots, n, \qquad (2)$$
where $M_i(t)$, $P_i(t)$ are the concentrations of mRNA and protein of the $i$th node; $a_i$, $c_i$ are the degradation rates of the mRNA and protein, $d_i$ is a constant; $g(P) = (P/\beta)^H/[1 + (P/\beta)^H]$ is the Hill form regulatory function, and
$G = (G_{ij}) \in \mathbb{R}^{n \times n}$ is the coupling matrix of the genetic network, defined as follows: if there is no link from node $j$ to node $i$, then $G_{ij} = 0$; if transcription factor $j$ is an activator of gene $i$, then $G_{ij} = \alpha_{ij}$; if transcription factor $j$ is a repressor of gene $i$, then $G_{ij} = -\alpha_{ij}$. $I_i$ is the transcriptional rate of the repressor of gene $i$.

Generally, a cellular system is characterized by significant time delays in gene regulation, in particular in the transcription, translation, diffusion, and translocation processes [10]. In fact, transportation can be modeled as diffusive or active in nature. The active case can be modeled with a time delay. If we assume that each macromolecule takes a short length of time to translocate from its place of synthesis to the location where it exerts an effect, the time delay can be taken as discrete. However, many large-scale genetic networks usually have a long spatial length, so there will be a distribution of conduction velocities along these paths. Modeling such distributed transportation with discrete delays alone appears somewhat unreasonable; a more satisfactory hypothesis is to incorporate continuously distributed delays. Up to now, genetic regulatory networks with or without discrete delays have received much attention in the biological and biomedical sciences (see [11–19] and references therein), but the study of distributed delays in genetic regulatory networks is lacking. In this paper, we analyze a class of improved genetic regulatory networks with mixed time-delays by a method different from those in the literature:
$$\dot M_i(t) = -a_i M_i(t) + \sum_{j=1}^{n} \omega_{ij}\, g_{ij}(P_j(t-\tau_{ij})) + \sum_{j=1}^{n} b_{ij}\int_{-\infty}^{t} k_j(t-s)\, f_{ij}(P_j(s))\,ds + I_i,$$
$$\dot P_i(t) = -c_i P_i(t) + d_i M_i(t-\delta_{ij}) + e_i\int_{-\infty}^{t} k_i(t-s)\, M_i(s)\,ds, \qquad (3)$$
where $\tau_{ij}$ and $\delta_{ij}$ are the time-varying delays, $W = (\omega_{ij}) \in \mathbb{R}^{n \times n}$ and $B = (b_{ij}) \in \mathbb{R}^{n \times n}$ are the coupling matrices of the genetic network, $k_i$ is the delay kernel, and $g_{ij}(x)$ and $f_{ij}(x)$ are Hill form regulatory functions. The main purpose of this paper is to study the existence and uniqueness of the equilibrium point of system (3) and its global asymptotic stability by applying the Young inequality and introducing a number of real parameters. The rest of this paper is organized as follows. In Section 2, some preliminaries are given. The main results are derived in Sections 3 and 4. In Section 5, an example is given to illustrate the derived results. In Section 6, some conclusions are given.
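To make the model class concrete, the following minimal sketch (added for illustration; it is not part of the original paper) numerically integrates the delay-free special case (2) with a SUM-logic Hill regulation. All parameter values here are illustrative assumptions, not the paper's:

```python
import numpy as np
from scipy.integrate import odeint

# Illustrative 2-gene network for the delay-free model (2); values are assumed.
a = np.array([4.0, 5.0]); c = np.array([2.0, 2.5])
d = np.array([1.0, 1.5]); I = np.array([0.5, 0.5])
G = np.array([[-1.2, 1.5], [-1.7, 2.0]])   # coupling matrix (signs = repression/activation)
beta, H = 1.0, 2                           # Hill parameters

def hill(P):
    return (P / beta) ** H / (1.0 + (P / beta) ** H)

def rhs(state, t):
    M, P = state[:2], state[2:]
    dM = -a * M + G @ hill(P) + I          # mRNA dynamics
    dP = -c * P + d * M                    # protein dynamics
    return np.concatenate([dM, dP])

t = np.linspace(0.0, 20.0, 2001)
traj = odeint(rhs, np.array([0.4, 0.6, 0.8, 0.2]), t)
print("final state (M1, M2, P1, P2):", traj[-1])
```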
2 Some Preliminaries
In this paper, we introduce the following assumptions for system (3).

$(H_1)$ There are constants $\lambda_{ij}, \mu_{ij} > 0$ such that for any $u_1, u_2 \in \mathbb{R}$,
$$|f_{ij}(u_1) - f_{ij}(u_2)| \le \mu_{ij}|u_1 - u_2|, \qquad |g_{ij}(u_1) - g_{ij}(u_2)| \le \lambda_{ij}|u_1 - u_2|.$$
$(H_2)$ The kernels $k_i$ $(i = 1, 2, \ldots, n)$ are real-valued, non-negative, continuous, integrable functions defined on $[0, \infty)$ satisfying
$$\int_0^{\infty} k_i(s)\,ds = 1, \qquad \int_0^{\infty} s\, k_i(s)\,ds < \infty.$$
$(H_3)$ There exist constants $r_{ij}, s_{ij}, p_{ij}, q_{ij}, \nu_{ij}, \vartheta_{ij}, R_{ij}, I_{ij} \in \mathbb{R}$ and $h_j > 0$ $(i, j = 1, 2, \ldots, n)$ such that for $r > 1$,
$$\frac{a_i c_i}{d_i + e_i} - \frac{r-1}{r}\sum_{j=1}^{n}|\omega_{ij}|^{\frac{r-s_{ij}}{r-1}}\lambda_{ij}^{\frac{r-p_{ij}}{r-1}} - \frac{1}{r}\sum_{j=1}^{n}|\omega_{ji}|^{s_{ji}}\lambda_{ji}^{p_{ji}} - \frac{r-1}{r}\sum_{j=1}^{n}|b_{ij}|^{\frac{r-R_{ij}}{r-1}}\mu_{ij}^{\frac{r-I_{ij}}{r-1}} - \frac{1}{r}\sum_{j=1}^{n}|b_{ji}|^{R_{ji}}\mu_{ji}^{I_{ji}} > 0,$$
$$r a_i - \sum_{j=1}^{n} h_j^{q_{ij}} d_i^{r_{ij}} - \sum_{j=1}^{n} h_j^{\nu_{ij}} e_i^{\vartheta_{ij}} - (r-1)\sum_{j=1}^{n}|\omega_{ij}|^{\frac{r-s_{ij}}{r-1}}\lambda_{ij}^{\frac{r-p_{ij}}{r-1}} - (r-1)\sum_{j=1}^{n}|b_{ij}|^{\frac{r-R_{ij}}{r-1}}\mu_{ij}^{\frac{r-I_{ij}}{r-1}} > 0,$$
$$r h_j c_i - \sum_{j=1}^{n}\lambda_{ji}^{p_{ji}}|\omega_{ji}|^{s_{ji}} - \sum_{j=1}^{n}|b_{ji}|^{R_{ji}}\mu_{ji}^{I_{ji}} - (r-1)\sum_{j=1}^{n} h_j^{\frac{r-\nu_{ij}}{r-1}} e_i^{\frac{r-\vartheta_{ij}}{r-1}} - (r-1)\sum_{j=1}^{n} h_j^{\frac{r-q_{ij}}{r-1}} d_i^{\frac{r-r_{ij}}{r-1}} > 0.$$

3 Existence and Uniqueness of Equilibrium Point
Assume that system (3) satisfies the following initial conditions:
$$M_i(\theta) = \phi_i(\theta), \qquad P_i(\theta) = \psi_i(\theta), \qquad \theta \in (-\infty, 0], \qquad (4)$$
where $\phi_i$ and $\psi_i$ are continuous real-valued functions defined on $(-\infty, 0]$.

Theorem 1. Assume that $(H_1)$–$(H_3)$ hold. Then system (3) has a unique equilibrium point $(M^*, P^*)$, where $M^* = (M_1^*, \ldots, M_n^*)$ and $P^* = (P_1^*, \ldots, P_n^*)$.

Proof. If $(M^*, P^*)$ is an equilibrium of system (3), then $(M_i^*, P_i^*)$ satisfies
$$-a_i M_i^* + \sum_{j=1}^{n}\omega_{ij}\, g_{ij}(P_j^*) + \sum_{j=1}^{n} b_{ij}\, f_{ij}(P_j^*) + I_i = 0, \qquad -c_i P_i^* + d_i M_i^* + e_i M_i^* = 0, \quad i = 1, 2, \ldots, n. \qquad (5)$$
From (5), we get
$$M_i^* = \frac{c_i}{d_i + e_i}\, P_i^*, \qquad (6)$$
which shows that if $P_i^*$ exists and is unique, then $M_i^*$ also exists and is unique, so we only need to prove the existence and uniqueness of $P_i^*$. Define $H(P) = (H_1(P), \ldots, H_n(P))^T$, $P = (P_1, \ldots, P_n)^T \in \mathbb{R}^n$, where
$$H_i(P) = -\frac{a_i c_i}{d_i + e_i}\, P_i + \sum_{j=1}^{n}\omega_{ij}\, g_{ij}(P_j) + \sum_{j=1}^{n} b_{ij}\, f_{ij}(P_j) + I_i, \quad i = 1, 2, \ldots, n. \qquad (7)$$
From (7), in order to prove the existence and uniqueness of $P_i^*$, we must prove that $H$ is a homeomorphism of $\mathbb{R}^n$ onto itself. In other words, we will prove the following statements [20]:
(i) If $P \ne P'$, then $H(P) \ne H(P')$.
(ii) $\|H(P)\| \to \infty$ as $\|P\| \to \infty$.
Firstly, we prove (i). Suppose there exist $P, P' \in \mathbb{R}^n$ with $P \ne P'$ such that
$$H_i(P_i) = H_i(P_i'), \quad i = 1, 2, \ldots, n. \qquad (8)$$
Then
$$-\frac{a_i c_i}{d_i + e_i}(P_i - P_i') = \sum_{j=1}^{n}\omega_{ij}\big(g_{ij}(P_j) - g_{ij}(P_j')\big) + \sum_{j=1}^{n} b_{ij}\big(f_{ij}(P_j) - f_{ij}(P_j')\big).$$
According to assumption $(H_1)$, we have
$$\frac{a_i c_i}{d_i + e_i}|P_i - P_i'| \le \sum_{j=1}^{n}|\omega_{ij}|\lambda_{ij}|P_j - P_j'| + \sum_{j=1}^{n}|b_{ij}|\mu_{ij}|P_j - P_j'|. \qquad (9)$$
Multiplying both sides of (9) by $|P_i - P_i'|^{r-1}$, we have
$$\frac{a_i c_i}{d_i + e_i}|P_i - P_i'|^r \le \sum_{j=1}^{n}|\omega_{ij}|\lambda_{ij}|P_j - P_j'||P_i - P_i'|^{r-1} + \sum_{j=1}^{n}|b_{ij}|\mu_{ij}|P_j - P_j'||P_i - P_i'|^{r-1}.$$
By the Young inequality [5, 21, 22], we further have
$$\sum_{j=1}^{n}|\omega_{ij}|\lambda_{ij}|P_j - P_j'||P_i - P_i'|^{r-1} \le \frac{r-1}{r}\sum_{j=1}^{n}|\omega_{ij}|^{\frac{r-s_{ij}}{r-1}}\lambda_{ij}^{\frac{r-p_{ij}}{r-1}}|P_i - P_i'|^r + \frac{1}{r}\sum_{j=1}^{n}|\omega_{ij}|^{s_{ij}}\lambda_{ij}^{p_{ij}}|P_j - P_j'|^r, \qquad (10)$$
$$\sum_{j=1}^{n}|b_{ij}|\mu_{ij}|P_j - P_j'||P_i - P_i'|^{r-1} \le \frac{r-1}{r}\sum_{j=1}^{n}|b_{ij}|^{\frac{r-R_{ij}}{r-1}}\mu_{ij}^{\frac{r-I_{ij}}{r-1}}|P_i - P_i'|^r + \frac{1}{r}\sum_{j=1}^{n}|b_{ij}|^{R_{ij}}\mu_{ij}^{I_{ij}}|P_j - P_j'|^r. \qquad (11)$$
From (9), (10) and (11), we have
$$\sum_{i=1}^{n}\Big[\frac{a_i c_i}{d_i + e_i} - \frac{r-1}{r}\sum_{j=1}^{n}|\omega_{ij}|^{\frac{r-s_{ij}}{r-1}}\lambda_{ij}^{\frac{r-p_{ij}}{r-1}} - \frac{1}{r}\sum_{j=1}^{n}|\omega_{ji}|^{s_{ji}}\lambda_{ji}^{p_{ji}} - \frac{r-1}{r}\sum_{j=1}^{n}|b_{ij}|^{\frac{r-R_{ij}}{r-1}}\mu_{ij}^{\frac{r-I_{ij}}{r-1}} - \frac{1}{r}\sum_{j=1}^{n}|b_{ji}|^{R_{ji}}\mu_{ji}^{I_{ji}}\Big]|P_i - P_i'|^r \le 0. \qquad (12)$$
From $(H_3)$ and (12), $|P_i - P_i'| = 0$, which leads to a contradiction.
Secondly, we prove (ii). From (7), we obtain
$$\sum_{i=1}^{n}\big(H_i(P_i) - H_i(0)\big)\,\mathrm{sign}(P_i)\,|P_i|^{r-1} \le -\sum_{i=1}^{n}\Big[\frac{a_i c_i}{d_i + e_i} - \frac{r-1}{r}\sum_{j=1}^{n}|\omega_{ij}|^{\frac{r-s_{ij}}{r-1}}\lambda_{ij}^{\frac{r-p_{ij}}{r-1}} - \frac{1}{r}\sum_{j=1}^{n}|\omega_{ji}|^{s_{ji}}\lambda_{ji}^{p_{ji}} - \frac{r-1}{r}\sum_{j=1}^{n}|b_{ij}|^{\frac{r-R_{ij}}{r-1}}\mu_{ij}^{\frac{r-I_{ij}}{r-1}} - \frac{1}{r}\sum_{j=1}^{n}|b_{ji}|^{R_{ji}}\mu_{ji}^{I_{ji}}\Big]|P_i|^r. \qquad (13)$$
From $(H_3)$, there exist $\sigma_i > 0$ such that
$$\sum_{i=1}^{n}\sigma_i|P_i|^r \le -\sum_{i=1}^{n}\big(H_i(P_i) - H_i(0)\big)\,\mathrm{sign}(P_i)\,|P_i|^{r-1} \le \sum_{i=1}^{n}\big|H_i(P_i) - H_i(0)\big|\,|P_i|^{r-1}. \qquad (14)$$
Let $\frac{1}{r} + \frac{1}{q} = 1$ and $\sigma = \min_i\{\sigma_i\}$. By the Young (Hölder) inequality,
$$\sigma\|P\|_r^r \le \|H(P) - H(0)\|_r\,\|P\|_r^{r-1},$$
so that $\sigma\|P\|_r \le \|H(P) - H(0)\|_r \le \|H(P)\|_r + \|H(0)\|_r$. Evidently, when $\|P\| \to \infty$, $\|H(P)\| \to \infty$. From the above analysis, $H$ is a homeomorphism of $\mathbb{R}^n$ onto itself, so $H(P) = 0$ has a unique solution $P^*$. Hence, system (3) has a unique equilibrium point $(M^*, P^*)$.
4 Stability Analysis
In this section, we consider the global asymptotic stability of the equilibrium point of system (3). Letting $m(t) = M(t) - M^*$ and $p(t) = P(t) - P^*$, we have
$$\dot m_i(t) = -a_i m_i(t) + \sum_{j=1}^{n}\omega_{ij}\,\bar g_{ij}(p_j(t-\tau_{ij})) + \sum_{j=1}^{n} b_{ij}\int_{-\infty}^{t} k_j(t-s)\,\bar f_{ij}(p_j(s))\,ds,$$
$$\dot p_i(t) = -c_i p_i(t) + d_i m_i(t-\delta_{ij}) + e_i\int_{-\infty}^{t} k_i(t-s)\, m_i(s)\,ds, \qquad (15)$$
where $\bar g_{ij}(p_j(t)) = g_{ij}(P_j(t)) - g_{ij}(P_j^*)$ and $\bar f_{ij}(p_j(t)) = f_{ij}(P_j(t)) - f_{ij}(P_j^*)$.
Theorem 2. Suppose that $(H_1)$–$(H_3)$ hold. Then the equilibrium point $(M^*, P^*)$ of system (3) is globally asymptotically stable.

Proof. We construct the following Lyapunov functional:
$$V(t) = \sum_{i=1}^{n}\int_0^{|m_i(t)|} s^{r-1}\,ds + \sum_{i=1}^{n}\sum_{j=1}^{n}\int_0^{|p_i(t)|} h_j\, s^{r-1}\,ds + \frac{1}{r}\sum_{i=1}^{n}\sum_{j=1}^{n}\lambda_{ij}^{p_{ij}}|\omega_{ij}|^{s_{ij}}\int_{t-\tau_{ij}}^{t}|p_j(s)|^r\,ds + \frac{1}{r}\sum_{i=1}^{n}\sum_{j=1}^{n} h_j^{q_{ij}} d_i^{r_{ij}}\int_{t-\delta_{ij}}^{t}|m_i(s)|^r\,ds + \frac{1}{r}\sum_{i=1}^{n}\sum_{j=1}^{n} h_j^{\nu_{ij}} e_i^{\vartheta_{ij}}\int_0^{\infty} k_i(s)\int_{t-s}^{t}|m_i(z)|^r\,dz\,ds + \frac{1}{r}\sum_{i=1}^{n}\sum_{j=1}^{n}|b_{ij}|^{R_{ij}}\mu_{ij}^{I_{ij}}\int_0^{\infty} k_j(s)\int_{t-s}^{t}|p_j(z)|^r\,dz\,ds. \qquad (16)$$
Calculating the upper right derivative of $V(t)$ along the solutions of (3), we have
$$\begin{aligned} D^+V(t) \le\; & -\sum_{i=1}^{n} a_i|m_i(t)|^r + \sum_{i=1}^{n}\sum_{j=1}^{n}|\omega_{ij}|\lambda_{ij}|p_j(t-\tau_{ij})||m_i(t)|^{r-1} \\ & + \sum_{i=1}^{n}\sum_{j=1}^{n}|b_{ij}|\int_{-\infty}^{t} k_j(t-s)\,\mu_{ij}|p_j(s)||m_i(t)|^{r-1}\,ds \\ & - \sum_{i=1}^{n}\sum_{j=1}^{n} h_j c_i|p_i(t)|^r + \sum_{i=1}^{n}\sum_{j=1}^{n} h_j d_i\, m_i(t-\delta_{ij})|p_i(t)|^{r-1} \\ & + \sum_{i=1}^{n}\sum_{j=1}^{n} h_j e_i\int_{-\infty}^{t} k_i(t-s)\, m_i(s)|p_i(t)|^{r-1}\,ds \\ & + \frac{1}{r}\sum_{i=1}^{n}\sum_{j=1}^{n}\lambda_{ij}^{p_{ij}}|\omega_{ij}|^{s_{ij}}|p_j(t)|^r - \frac{1}{r}\sum_{i=1}^{n}\sum_{j=1}^{n}\lambda_{ij}^{p_{ij}}|\omega_{ij}|^{s_{ij}}|p_j(t-\tau_{ij})|^r \\ & + \frac{1}{r}\sum_{i=1}^{n}\sum_{j=1}^{n} h_j^{q_{ij}} d_i^{r_{ij}}|m_i(t)|^r - \frac{1}{r}\sum_{i=1}^{n}\sum_{j=1}^{n} h_j^{q_{ij}} d_i^{r_{ij}}|m_i(t-\delta_{ij})|^r \\ & + \frac{1}{r}\sum_{i=1}^{n}\sum_{j=1}^{n} h_j^{\nu_{ij}} e_i^{\vartheta_{ij}}\int_0^{\infty} k_i(s)|m_i(t)|^r\,ds - \frac{1}{r}\sum_{i=1}^{n}\sum_{j=1}^{n} h_j^{\nu_{ij}} e_i^{\vartheta_{ij}}\int_0^{\infty} k_i(s)|m_i(t-s)|^r\,ds \\ & + \frac{1}{r}\sum_{i=1}^{n}\sum_{j=1}^{n}|b_{ij}|^{R_{ij}}\mu_{ij}^{I_{ij}}\int_0^{\infty} k_j(s)|p_j(t)|^r\,ds - \frac{1}{r}\sum_{i=1}^{n}\sum_{j=1}^{n}|b_{ij}|^{R_{ij}}\mu_{ij}^{I_{ij}}\int_0^{\infty} k_j(s)|p_j(t-s)|^r\,ds. \end{aligned} \qquad (17)$$
We estimate the right-hand side of (17) using the Young inequality. Since
$$\sum_{i=1}^{n}\sum_{j=1}^{n}|\omega_{ij}|\lambda_{ij}|p_j(t-\tau_{ij})||m_i(t)|^{r-1} \le \frac{r-1}{r}\sum_{i=1}^{n}\sum_{j=1}^{n}|\omega_{ij}|^{\frac{r-s_{ij}}{r-1}}\lambda_{ij}^{\frac{r-p_{ij}}{r-1}}|m_i(t)|^r + \frac{1}{r}\sum_{i=1}^{n}\sum_{j=1}^{n}|\omega_{ij}|^{s_{ij}}\lambda_{ij}^{p_{ij}}|p_j(t-\tau_{ij})|^r, \qquad (18)$$
$$\sum_{i=1}^{n}\sum_{j=1}^{n} h_j d_i\, m_i(t-\delta_{ij})|p_i(t)|^{r-1} \le \frac{r-1}{r}\sum_{i=1}^{n}\sum_{j=1}^{n} h_j^{\frac{r-q_{ij}}{r-1}} d_i^{\frac{r-r_{ij}}{r-1}}|p_i(t)|^r + \frac{1}{r}\sum_{i=1}^{n}\sum_{j=1}^{n} h_j^{q_{ij}} d_i^{r_{ij}}|m_i(t-\delta_{ij})|^r, \qquad (19)$$
$$\sum_{i=1}^{n}\sum_{j=1}^{n} h_j e_i\int_{-\infty}^{t} k_i(t-s)\, m_i(s)|p_i(t)|^{r-1}\,ds \le \frac{r-1}{r}\sum_{i=1}^{n}\sum_{j=1}^{n} h_j^{\frac{r-\nu_{ij}}{r-1}} e_i^{\frac{r-\vartheta_{ij}}{r-1}}|p_i(t)|^r + \frac{1}{r}\sum_{i=1}^{n}\sum_{j=1}^{n}\int_{-\infty}^{t} k_i(t-s)\, h_j^{\nu_{ij}} e_i^{\vartheta_{ij}}|m_i(s)|^r\,ds, \qquad (20)$$
$$\sum_{i=1}^{n}\sum_{j=1}^{n}|b_{ij}|\int_{-\infty}^{t} k_j(t-s)\,\mu_{ij}|p_j(s)||m_i(t)|^{r-1}\,ds \le \frac{r-1}{r}\sum_{i=1}^{n}\sum_{j=1}^{n}|b_{ij}|^{\frac{r-R_{ij}}{r-1}}\mu_{ij}^{\frac{r-I_{ij}}{r-1}}|m_i(t)|^r + \frac{1}{r}\sum_{i=1}^{n}\sum_{j=1}^{n}\int_{-\infty}^{t} k_j(t-s)\,|b_{ij}|^{R_{ij}}\mu_{ij}^{I_{ij}}|p_j(s)|^r\,ds, \qquad (21)$$
we finally have
$$D^+V(t) \le \sum_{i=1}^{n}\Big[-a_i + \frac{1}{r}\sum_{j=1}^{n} h_j^{q_{ij}} d_i^{r_{ij}} + \frac{1}{r}\sum_{j=1}^{n} h_j^{\nu_{ij}} e_i^{\vartheta_{ij}} + \frac{r-1}{r}\sum_{j=1}^{n}|\omega_{ij}|^{\frac{r-s_{ij}}{r-1}}\lambda_{ij}^{\frac{r-p_{ij}}{r-1}} + \frac{r-1}{r}\sum_{j=1}^{n}|b_{ij}|^{\frac{r-R_{ij}}{r-1}}\mu_{ij}^{\frac{r-I_{ij}}{r-1}}\Big]|m_i(t)|^r + \sum_{i=1}^{n}\sum_{j=1}^{n}\Big[-h_j c_i + \frac{1}{r}\sum_{j=1}^{n}\lambda_{ji}^{p_{ji}}|\omega_{ji}|^{s_{ji}} + \frac{1}{r}\sum_{j=1}^{n}|b_{ji}|^{R_{ji}}\mu_{ji}^{I_{ji}} + \frac{r-1}{r}\sum_{j=1}^{n} h_j^{\frac{r-\nu_{ij}}{r-1}} e_i^{\frac{r-\vartheta_{ij}}{r-1}} + \frac{r-1}{r}\sum_{j=1}^{n} h_j^{\frac{r-q_{ij}}{r-1}} d_i^{\frac{r-r_{ij}}{r-1}}\Big]|p_i(t)|^r. \qquad (22)$$
From $(H_3)$ and (22), there exist positive constants $\xi_i$ and $\eta_i$ such that
$$D^+V(t) \le -\sum_{i=1}^{n}\xi_i|m_i(t)|^r - \sum_{i=1}^{n}\eta_i|p_i(t)|^r.$$
Consequently,
$$V(t) + \sum_{i=1}^{n}\xi_i\int_{t_0}^{t}|m_i(s)|^r\,ds + \sum_{i=1}^{n}\eta_i\int_{t_0}^{t}|p_i(s)|^r\,ds \le V(t_0). \qquad (23)$$
It follows from (16) and (23) that $V(t)$ is bounded on $(0, \infty)$, which implies that $m_i$, $p_i$ are bounded on $(0, \infty)$. From (15), $\dot m_i$, $\dot p_i$ are then also bounded on $(0, \infty)$, so $m_i$, $p_i$ are uniformly continuous on $(0, \infty)$. It follows from (23) that
$$\sum_{i=1}^{n}|m_i(t)|^r + \sum_{i=1}^{n}|p_i(t)|^r \in L^1(0, \infty). \qquad (24)$$
The uniform continuity of $\sum_{i=1}^{n}|m_i(t)|^r + \sum_{i=1}^{n}|p_i(t)|^r$ on $(0, \infty)$ together with (24) implies that
$$\sum_{i=1}^{n}|m_i(t)|^r + \sum_{i=1}^{n}|p_i(t)|^r \to 0 \quad \text{as } t \to \infty. \qquad (25)$$
From (23) and (25), $m_i(t) \to 0$, $p_i(t) \to 0$ as $t \to \infty$ $(i = 1, 2, \ldots, n)$.
Remark 1. In $(H_3)$, the first inequality ensures the existence and uniqueness of the equilibrium of system (3), and the last two inequalities ensure the global stability of this equilibrium. Further, a large number of parameters are introduced in order to improve the generality, integrity and elegance of the obtained results. In practice, appropriate parameters can be chosen to simplify these conditions for convenience.
5 Numerical Simulations
In this section, we give an example to illustrate the correctness of our results. In system (3), select $f_{ij}(p_j) = g_{ij}(p_j) = \frac{p_j^2}{1+p_j^2}$, i.e., the Hill coefficient is 2. Obviously, $f_{ij}(p_j)$ and $g_{ij}(p_j)$ are bounded and satisfy the Lipschitz conditions with Lipschitz constants $\lambda_{ij} = \mu_{ij} = 0.65$. Further, take $\omega_{11} = -1.2$, $\omega_{12} = 1.5$, $\omega_{21} = -1.7$, $\omega_{22} = 2$, $b_{11} = -0.8$, $b_{12} = 1.4$, $b_{21} = 1.6$, $b_{22} = -0.7$, $\tau_{ij} = 0.1$, $a_1 = 4$, $a_2 = 5$, $c_1 = 2$, $c_2 = 2.5$, $d_1 = 1$, $d_2 = 1.5$, $e_1 = 1$, $e_2 = 1.5$ and $k_i(s) = e^{-s}$. Evidently, $(H_1)$ and $(H_2)$ are satisfied. In addition, choosing $r_{ij} = s_{ij} = p_{ij} = q_{ij} = \nu_{ij} = \vartheta_{ij} = R_{ij} = I_{ij} = h_j = 1$ and $r = 2$, the conditions reduce to
$$2a_i - \sum_{i=1}^{2} d_i - \sum_{i=1}^{2} e_i - \sum_{j=1}^{2}|\omega_{ij}|\lambda_{ij} - \sum_{j=1}^{2}|b_{ij}|\mu_{ij} > 0,$$
$$2c_i - \sum_{i=1}^{2}\lambda_{ij}|\omega_{ij}| - \sum_{i=1}^{2}|b_{ij}|\mu_{ij} - \sum_{i=1}^{2} e_i - \sum_{i=1}^{2} d_i > 0, \quad i = 1, 2.$$
Therefore, $(H_3)$ also holds, and the unique equilibrium point of system (3) is $M^* = (0.3607, 0.4580)^T$, $P^* = (0.3607, 0.5496)^T$. By Theorem 2, the equilibrium point of system (3) is globally asymptotically stable. Simulation results are depicted in Figures 1–4.
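As a small sanity check (added here for illustration; it is not part of the original paper), relation (6) ties the two components of the reported equilibrium together, and the quoted values are consistent with it:

```python
import numpy as np

# Check relation (6): M_i* = c_i / (d_i + e_i) * P_i*, using the example parameters.
c = np.array([2.0, 2.5]); d = np.array([1.0, 1.5]); e = np.array([1.0, 1.5])
P_star = np.array([0.3607, 0.5496])     # reported protein equilibrium
M_star = c / (d + e) * P_star
print(M_star)                           # -> [0.3607, 0.4580], matching the reported M*
```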
Fig. 1. Trajectories of M1(t)
Fig. 2. Trajectories of M2(t)
Fig. 3. Trajectories of P1(t)
Fig. 4. Trajectories of P2(t)

6 Conclusions
In this paper, by introducing a number of real parameters and applying the Young inequality, we obtained a series of new criteria on the existence and uniqueness of the equilibrium point and the global asymptotic stability of genetic regulatory networks with discrete delays and distributed delays. The results obtained in this paper are new and useful, and they are easy to check and apply in practice.
Acknowledgments This work was supported by the National Natural Science Foundation of P.R. China (60764003, 10961022, 10901130), the Scientific Research Programmes of Colleges in Xinjiang (XJEDU2007G01, XJEDU2008I35) and Youth Project in Yili Normal University (2009-33).
References
1. Smolen, P., Baxter, D., Byrne, J.: Mathematical modeling of gene networks. Neuron 26, 567–580 (2000)
2. Kobayashi, T., Chen, L., Aihara, K.: Modeling genetic switches with positive feedback loops. J. Theor. Biol. 221, 379–399 (2002)
3. Ren, F., Cao, J.: Asymptotic and robust stability of genetic regulatory networks with time-varying delays. Neurocomputing 71, 834–842 (2008)
4. He, W., Cao, J.: Robust stability of genetic regulatory networks with distributed delay. Cogn. Neurodyn. 2, 355–361 (2008)
5. Chen, L., Aihara, K.: Stability of genetic regulatory networks with time delay. IEEE Trans. Circuits Syst. I: Fundam. Theory Appl. 49, 602–608 (2002)
6. Jong, H.: Modelling and simulation of genetic regulatory systems: A literature review. J. Comput. Biol. 9, 67–103 (2002)
7. Bolouri, H., Davidson, E.H.: Modelling transcriptional regulatory networks. BioEssays 24, 1118–1129 (2002)
8. Wang, R., Zhou, T., Jing, Z., Chen, L.: Modelling periodic oscillation of biological systems with multiple time scale networks. Syst. Biol. 1, 71–84 (2004)
9. Li, C., Chen, L., Aihara, K.: Stability of genetic networks with SUM regulatory logic: Lur'e system and LMI approach. IEEE Trans. Circuits Syst. I 53, 2451–2458 (2006)
10. Elowitz, M.B., Levine, A.J., Siggia, E.D.: Stochastic gene expression in a single cell. Science 297, 1183–1186 (2002)
11. Becskei, A., Serrano, L.: Engineering stability in gene networks by autoregulation. Nature 405, 590–593 (2000)
12. Cao, J., Ren, F.: Exponential stability of discrete-time genetic regulatory networks with delays. IEEE Trans. Neural Networks 19, 520–523 (2008)
13. Elowitz, M.B., Leibler, S.: A synthetic oscillatory network of transcriptional regulators. Nature 403, 335–338 (2000)
14. Gardner, T.S., Cantor, C.R., Collins, J.: Construction of a genetic toggle switch in Escherichia coli. Nature 403, 339–342 (2000)
15. Grammaticos, B., Carstea, A.S., Ramani, A.: On the dynamics of a gene regulatory network. J. Phys. A: Math. Gen. 39, 2965–2971 (2006)
16. MacDonald, N.: Biological Delay Systems: Linear Stability Theory. Cambridge University Press, Cambridge (1989)
17. Smolen, P., Baxter, D.A., Byrne, J.H.: Modeling transcriptional control in gene networks—methods, recent results, and future directions. Bull. Math. Biol. 62, 247–292 (2000)
18. Smolen, P., Baxter, D.A., Byrne, J.H.: Mathematical modeling of gene networks. Neuron 26, 567–580 (2000)
19. Wei, G., Wang, Z., Shu, H., Fraser, K., Liu, X.: Robust filtering for gene expression time series data with variance constraints. Int. J. Comput. Math. 84, 619–633 (2007)
20. Hu, C., Jiang, H., Teng, Z.: Globally exponential stability for delayed neural networks under impulsive control. Neural Process. Lett. 31, 105–127 (2010)
21. Cao, J.: New results concerning exponential stability and periodic solutions of delayed cellular neural networks. Phys. Lett. A 307, 136–147 (2003)
22. Hardy, G.H., Littlewood, J.E., Pólya, G.: Inequalities. Cambridge University Press, Cambridge (1934)
Representing Boolean Functions Using Polynomials: More Can Offer Less

Yi Ming Zou

Department of Mathematical Sciences, University of Wisconsin-Milwaukee, Milwaukee, WI 53201, USA
Abstract. Polynomial threshold gates are basic processing units of an artificial neural network. When the input vectors are binary vectors, these gates correspond to Boolean functions and can be analyzed via their polynomial representations. In practical applications, it is desirable to find a polynomial representation with the smallest possible number of terms, in order to use the least possible number of input lines to the unit under consideration. For this purpose, instead of an exact polynomial representation, usually the sign representation of a Boolean function is considered. The non-uniqueness of the sign representation allows the possibility of using a smaller number of monomials by solving a minimization problem. This minimization problem is combinatorial in nature, and so far the best known deterministic algorithm claims the use of at most $0.75 \times 2^n$ of the $2^n$ total possible monomials. In this paper, the basic methods of representing a Boolean function by polynomials are examined, and an alternative approach to this problem is proposed. It is shown that it is possible to use at most $0.5 \times 2^n = 2^{n-1}$ monomials based on the $\{0,1\}$ binary inputs by introducing extra variables, while keeping the degree upper bound at $n$. An algorithm for further reducing the number of terms used in a polynomial representation is provided. Examples show that in certain applications, the improvement achieved by the proposed method over the existing methods is significant.

Keywords: artificial neural networks, Boolean neurons, Boolean functions, polynomial representations.
1 Introduction
In this paper, we consider the problem of using a smaller number of monomial terms in the expression of a polynomial threshold neuron with binary inputs [8]. These neurons correspond to Boolean functions, and it is known that any Boolean function can be represented by a polynomial function. There exist three basic methods for representing a Boolean function by a polynomial in $n$ variables. The first is to consider a Boolean function as a function $\{0,1\}^n \to \{0,1\}$, in which case the representation is unique [14]. The second is to consider a Boolean function as a function $\{-1,1\}^n \to \{-1,1\}$, and in
this case the representation is also unique [15, 18] (see also the discussion later in this section). The third, called sign representation, which is not unique and allows flexibility in the choice of a polynomial representation, has the input set $\{-1,1\}^n$ and the output set given by all nonzero real numbers¹. Recall that a polynomial $p : \{-1,1\}^n \to \mathbb{R}$ is said to sign represent a Boolean function $f : \{-1,1\}^n \to \{-1,1\}$ if $f = \mathrm{sign}(p)$ for all vectors in $\{-1,1\}^n$. Since for the values 0 and 1 we have $x^2 = x$, and for the values $\pm 1$ we have $x^2 = 1$, in all three cases there are $2^n$ linearly independent monomials of the form
$$x_{i_1} x_{i_2} \cdots x_{i_k}, \quad 1 \le i_1 < i_2 < \cdots < i_k, \; 0 \le k \le n, \qquad (1)$$
such that any other polynomial is a linear combination of these $2^n$ monomials (when $k = 0$, the corresponding monomial is 1). If a polynomial function $f : \{-1,1\}^n \to \{-1,1\}$ is given by its values on the $2^n$ input vectors as a column vector $b = (b_1, b_2, \ldots, b_{2^n})^T$, then the coefficients of the linear combination (in terms of the monomials in (1) in the right order, see [16]) are given uniquely by $\frac{1}{2^n} H_{2^n} b$, where $H_{2^n}$ is the Hadamard matrix of order $2^n$. For a sign representation, we can replace the vector $b$ by any column vector $c$ such that the signs of its entries match those of $b$ (we say that $c$ sign represents $b$); that is, if $c = (c_1, c_2, \ldots, c_{2^n})^T$ then $\mathrm{sign}(c_i) = b_i$, $i = 1, 2, \ldots, 2^n$. If $c$ sign represents $b$, then $\frac{1}{2^n} H_{2^n} c$ provides a sign representation of the Boolean function $f$. Consider the system of equations $H_{2^n} x = c$. If we let $D_b$ be the diagonal matrix whose diagonal entries are the entries of $b$, then $\bar c := D_b \cdot c$ has all positive entries. We write $\bar c > 0$ if the entries of $\bar c$ are all positive. The problem of finding a polynomial sign representation with the smallest number of terms can then be formulated as the following optimization problem:
$$\min \|x\|_0 \quad \text{subject to} \quad D_b H_{2^n} x > 0, \qquad (2)$$
where the 0-norm counts the nonzero entries of $x$. This minimization problem offers a rich theory and is related to complexity problems [1, 4, 6, 11, 15]. Yet, in spite of all the attention this problem has received [3, 5, 7, 9, 10, 13, 17], there was no deterministic algorithm for this minimization problem until the recent work [12], where a deterministic algorithm guarantees the use of at most $0.75 \times 2^n$ monomials. However, as pointed out in [12], the algorithm proposed there is computationally costly, particularly in comparison with the simplicity of finding the exact solution. On the contrary, because the expression of a Boolean function considered as a polynomial function $\{0,1\}^n \to \{0,1\}$ is unique, relatively little attention has been paid to this case, though it is known that in some cases these representations actually offer better solutions. Since $\{0,1\}$ inputs and outputs are natural for Boolean functions, Boolean functions of the form $\{0,1\}^n \to \{0,1\}$ are widely used in applications ranging from computer science to the modeling of biological systems. In the next section, we will explain how to gain
more flexibility using $\{0,1\}$ both as inputs and outputs by introducing extra variables, thus making it possible to choose different representations for the same function. We will also provide an algorithm for reducing the number of terms used in a representation while keeping the degree upper bound at $n$.

¹ In some literature, the sign representations are allowed to take 0 as an output. It is easy to see that these two definitions are equivalent.
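The Hadamard-based computation described above is easy to carry out directly; the sketch below (an editor-added illustration, not from the paper) computes the exact coefficients for a small example. The pairing of the rows of the Hadamard matrix with input vectors and monomials is assumed to follow the Sylvester ordering (cf. [16]):

```python
import numpy as np
from scipy.linalg import hadamard

n = 3
H = hadamard(2 ** n)                 # Sylvester-ordered Hadamard matrix of order 2^n

# Value vector of an example Boolean function f: {-1,1}^3 -> {-1,1}.
b = np.array([1, 1, 1, -1, 1, -1, -1, -1])

x = H @ b / 2 ** n                   # coefficients of the exact representation
print(x, np.allclose(H @ x, b))      # H x reproduces b exactly (H is symmetric, H^2 = 2^n I)

# For a sign representation, b may be replaced by any c with matching signs;
# finding a sparser x with sign(H x) = b is exactly minimization problem (2).
```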
2 Main Result
To simplify our writing, let $\mathbb{F}_2 = \{0,1\}$. Since $\mathbb{F}_2$ is the Galois field of 2 elements, we can consider the ring of multivariate polynomials
$$\mathbb{F}_2[\mathbf{x}] := \mathbb{F}_2[x_1, x_2, \ldots, x_n], \quad \mathbf{x} = (x_1, x_2, \ldots, x_n), \qquad (3)$$
with coefficients in $\mathbb{F}_2$. The polynomial ring $\mathbb{F}_2[\mathbf{x}]$ is not what we use to represent polynomial binary functions, since $x_i^2 \ne x_i$ in this polynomial ring. To make $x_i^2 = x_i$, we need to quotient out the ideal² $I$ of $\mathbb{F}_2[\mathbf{x}]$ generated by the binomials (note that over the Galois field $\mathbb{F}_2$, $x_i - x_i^2 = x_i + x_i^2$): $x_i + x_i^2$, $i = 1, \ldots, n$. Explicitly, $I = \{\sum_{i=1}^{n} g_i (x_i + x_i^2) \mid g_i \in \mathbb{F}_2[\mathbf{x}], 1 \le i \le n\}$. The quotient $\mathbb{F}_2[\mathbf{x}]/I$ is the Boolean algebra in which $x_i = x_i^2$, $1 \le i \le n$. We denote this Boolean algebra by $B[\mathbf{x}]$, where $B = \{0,1\}$, to distinguish it from $\mathbb{F}_2[\mathbf{x}]$. We have the following well-known fact [14].

Proposition 1. Every polynomial in $B[\mathbf{x}]$ is a linear combination (with coefficients 0 or 1) of the $2^n$ monomials described in (1), and every function $f : \mathbb{F}_2^n \to \mathbb{F}_2$ can be uniquely represented by such a polynomial.

Following a similar construction, we begin with a polynomial ring in $2n$ variables, $\mathbb{F}_2[\mathbf{x}, \mathbf{y}] := \mathbb{F}_2[x_1, \ldots, x_n, y_1, \ldots, y_n]$, and then form the quotient by the ideal $J$ of $\mathbb{F}_2[\mathbf{x}, \mathbf{y}]$ generated by the following binomials and trinomials: $x_i + x_i^2$, $y_i + x_i + 1$, $1 \le i \le n$. That is, we consider the Boolean algebra $B[\mathbf{x}/\mathbf{y}] = \mathbb{F}_2[\mathbf{x}, \mathbf{y}]/J$. We use the notation "$\mathbf{x}/\mathbf{y}$" instead of "$\mathbf{x}, \mathbf{y}$" to distinguish the algebra constructed here from the usual Boolean algebra in $2n$ variables. We have the following theorem.

Theorem 1. The Boolean algebras $B[\mathbf{x}]$, $B[\mathbf{y}]$, and $B[\mathbf{x}/\mathbf{y}]$ are all isomorphic, i.e., $B[\mathbf{x}] \cong B[\mathbf{y}] \cong B[\mathbf{x}/\mathbf{y}]$.

Proof. Since $\mathbb{F}_2[\mathbf{x}, \mathbf{y}] = \mathbb{F}_2[\mathbf{x}][\mathbf{y}]$ and $B[\mathbf{x}]$ is a quotient of $\mathbb{F}_2[\mathbf{x}]$, by the substitution principle [2] there exists a ring homomorphism $\phi : \mathbb{F}_2[\mathbf{x}, \mathbf{y}] \to B[\mathbf{x}]$ defined by $\phi : x_i \mapsto x_i$, $y_i \mapsto x_i + 1$, $1 \le i \le n$. This homomorphism is clearly onto, so by the first isomorphism theorem [2] we have
$$B[\mathbf{x}] \cong \mathbb{F}_2[\mathbf{x}, \mathbf{y}]/\ker(\phi), \qquad (4)$$
where $\ker(\phi) := \{p \in \mathbb{F}_2[\mathbf{x}, \mathbf{y}] \mid \phi(p) = 0\}$. We need to show that $\ker(\phi) = J$. It is clear that $x_i + x_i^2$, $y_i + x_i + 1$, $i = 1, \ldots, n$, are all in $\ker(\phi)$, so $\ker(\phi) \supseteq J$.

² An ideal of $\mathbb{F}_2[\mathbf{x}]$ is a nonempty subset $I \subseteq \mathbb{F}_2[\mathbf{x}]$ such that (1) $a + b \in I$, $\forall a, b \in I$; and (2) $pa \in I$, $\forall a \in I$, $\forall p \in \mathbb{F}_2[\mathbf{x}]$.
To see that they are actually equal, note that in $B[\mathbf{x}/\mathbf{y}] = \mathbb{F}_2[\mathbf{x}, \mathbf{y}]/J$ we have $y_i = x_i + 1$ $(1 \le i \le n)$, so every element can be expressed as a linear combination of the $2^n$ monomials in the $x_i$ $(1 \le i \le n)$ described in (1). If $\ker(\phi) \supsetneq J$, then we would have the following relation on the cardinalities of the Boolean algebras: $|\mathbb{F}_2[\mathbf{x}, \mathbf{y}]/\ker(\phi)| < |B[\mathbf{x}/\mathbf{y}]| \le |B[\mathbf{x}]|$, which contradicts (4). Thus $\ker(\phi) = J$ and $B[\mathbf{x}/\mathbf{y}] \cong B[\mathbf{x}]$. The proof of $B[\mathbf{x}/\mathbf{y}] \cong B[\mathbf{y}]$ is similar. Q.E.D.

Theorem 1 allows us to identify the Boolean algebra $B[\mathbf{x}]$ with the Boolean algebra $B[\mathbf{x}/\mathbf{y}]$. To identify the elements of these two Boolean algebras as functions, we embed $\{0,1\}^n$ into $\{0,1\}^{2n}$ as follows: each vector $(a_1, a_2, \ldots, a_n)$, where $a_i \in \{0,1\}$, is identified with $(a_1, a_2, \ldots, a_n, a_1 + 1, a_2 + 1, \ldots, a_n + 1)$. Through this embedding, we can identify the elements of these two Boolean algebras as functions by assigning values to the variables by $(x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_n) = (a_1, a_2, \ldots, a_n, a_1 + 1, a_2 + 1, \ldots, a_n + 1)$.
For example, the function $f = x_2 + x_1x_2 + x_2x_3 + x_1x_2x_3 = (x_1+1)x_2(x_3+1)$ can be identified with $f = y_1 x_2 y_3$. The following theorem is the main result.

Theorem 2. With the identification of $B[\mathbf{x}]$ with $B[\mathbf{x}/\mathbf{y}]$ described above, any Boolean function $\{0,1\}^n \to \{0,1\}$ can be represented by a polynomial of degree $\le n$ with at most $2^{n-1}$ terms.

Proof. For each vector $\mathbf{a} = (a_1, a_2, \ldots, a_n) \in \{0,1\}^n$, let $p_{\mathbf{a}} := z_1 z_2 \cdots z_n$, where $z_i = x_i$ if $a_i = 1$, and $z_i = y_i$ if $a_i = 0$. Then $p_{\mathbf{a}}(\mathbf{x}) = 1$ if $\mathbf{x} = \mathbf{a}$, and 0 otherwise. For any function $f : \{0,1\}^n \to \{0,1\}$, we have $f = \sum_{\mathbf{x}} f(\mathbf{x})\, p_{\mathbf{x}}$ (this is known as the disjunctive normal form of $f$). Furthermore, we have
$$\sum_{\mathbf{a} \in \{0,1\}^n} p_{\mathbf{a}} = 1. \qquad (5)$$
Given $f$, we separate $\{0,1\}^n$ into two disjoint subsets
$$S_0 := \{\mathbf{x} \in \{0,1\}^n \mid f(\mathbf{x}) = 0\} \quad \text{and} \quad S_1 := \{\mathbf{x} \in \{0,1\}^n \mid f(\mathbf{x}) = 1\}.$$
Then by (5), we have
$$f = \sum_{\mathbf{x} \in S_1} p_{\mathbf{x}} = 1 + \sum_{\mathbf{x} \in S_0} p_{\mathbf{x}}. \qquad (6)$$
Now if $|S_1| \le 2^{n-1}$, we use the first expression; if $|S_1| > 2^{n-1}$, we take the second expression. Q.E.D.

Though the above theorem guarantees the use of at most $2^{n-1}$ monomial terms of degree $\le n$ in representing a Boolean function, an expression given by (6) is not necessarily satisfactory. In applications, one can perform a simplification process to shorten such an expression.
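The construction in the proof of Theorem 2 is entirely mechanical; the following sketch (added for illustration, not part of the paper) implements it, assuming the same most-significant-bit-first mapping of integers to binary vectors used in Example 1 below:

```python
from itertools import product

def theorem2_representation(f, n):
    """Return (has_constant_1, monomials) per Theorem 2. Each monomial is a
    tuple of literals 'x_i' or 'y_i' encoding p_a for one vector a."""
    S1 = [a for a in product((0, 1), repeat=n) if f(a) == 1]
    S0 = [a for a in product((0, 1), repeat=n) if f(a) == 0]
    def p(a):   # p_a: x_i where a_i = 1, y_i where a_i = 0
        return tuple(('x' if bit else 'y') + str(i + 1) for i, bit in enumerate(a))
    if len(S1) <= 2 ** (n - 1):
        return False, [p(a) for a in S1]     # f = sum of p_a over S1
    return True, [p(a) for a in S0]          # f = 1 + sum of p_a over S0

# Example 1's function: output 1 iff the input encodes a sum of two squares.
sums2 = {0, 1, 2, 4, 5, 8, 9, 10, 13}
f = lambda a: int(int(''.join(map(str, a)), 2) in sums2)
const, monos = theorem2_representation(f, 4)
print(const, len(monos), monos)   # True, 7 monomials -> 8 terms <= 2^{n-1} = 8
```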
Here we describe a simple procedure for reducing the number of monomial terms by combining terms using the relations $x_i + y_i = 1$, $1 \le i \le n$.

Algorithm 1. Boolean Polynomial Function Reduction Algorithm
INPUT: $f : \{0,1\}^n \to \{0,1\}$.
OUTPUT: A polynomial expression for $f$ of degree $\le n$ with at most $2^{n-1}$ monomial terms.
1. Choose a polynomial expression for $f$ according to Theorem 2.
2. Simplify the polynomial obtained in step 1 by combining pairs of terms that differ in only one place, using the relations $x_i + y_i = 1$, $1 \le i \le n$. This step can be repeated.
3. Substitute $x_i + 1$ for $y_i$ $(1 \le i \le n)$ in the polynomial obtained in step 2 and simplify modulo 2. Then apply step 2 again to combine terms if needed.
4. Compare the polynomials obtained in steps 2 and 3, and choose the one desired.

Example 1. Let $f : \{0,1\}^4 \to \{0,1\}$ be the function with output 1 if the input sequence corresponds to an integer which is a sum of two squares (0 is allowed), and 0 otherwise. Then $f$ outputs 1 for the binary inputs corresponding to the integers 0, 1, 2, 4, 5, 8, 9, 10, 13 (these integers are sums of two squares) and outputs 0 for the binary inputs corresponding to the integers 3, 6, 7, 11, 12, 14, 15. The polynomial obtained in step 1 of Algorithm 1 is
$$f = 1 + y_1y_2x_3x_4 + y_1x_2x_3y_4 + y_1x_2x_3x_4 + x_1y_2x_3x_4 + x_1x_2y_3y_4 + x_1x_2x_3y_4 + x_1x_2x_3x_4.$$
In step 2, we combine the second term with the fourth term, $y_1y_2x_3x_4 + y_1x_2x_3x_4 = y_1x_3x_4$; the third with the seventh, $y_1x_2x_3y_4 + x_1x_2x_3y_4 = x_2x_3y_4$; and the fifth with the eighth, $x_1y_2x_3x_4 + x_1x_2x_3x_4 = x_1x_3x_4$, obtaining $f = 1 + y_1x_3x_4 + x_2x_3y_4 + x_1x_3x_4 + x_1x_2y_3y_4$. This can be simplified further by combining the second term with the fourth term:
$$f = 1 + x_3x_4 + x_2x_3y_4 + x_1x_2y_3y_4.$$
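One possible implementation of step 2 is a greedy pass that merges any pair of monomials over the same variable set differing in exactly one literal; the sketch below (added for illustration; greedy pairing is an assumption, and need not always be optimal) reproduces the reduction of Example 1:

```python
def combine_once(monomials):
    """One pass of step 2: merge a pair that differs only in one variable,
    one containing x_i and the other y_i (using x_i + y_i = 1)."""
    monos = [dict(m) for m in monomials]   # each monomial: var index -> 'x' or 'y'
    for i in range(len(monos)):
        for j in range(i + 1, len(monos)):
            a, b = monos[i], monos[j]
            if set(a) != set(b):           # must involve the same variables
                continue
            diff = [v for v in a if a[v] != b[v]]
            if len(diff) == 1:             # differ in exactly one literal
                merged = {v: a[v] for v in a if v != diff[0]}
                rest = [m for k, m in enumerate(monos) if k not in (i, j)]
                return rest + [merged], True
    return monos, False

# Monomials of Example 1, step 1 (the constant term 1 is kept implicitly):
terms = [{1:'y',2:'y',3:'x',4:'x'}, {1:'y',2:'x',3:'x',4:'y'},
         {1:'y',2:'x',3:'x',4:'x'}, {1:'x',2:'y',3:'x',4:'x'},
         {1:'x',2:'x',3:'y',4:'y'}, {1:'x',2:'x',3:'x',4:'y'},
         {1:'x',2:'x',3:'x',4:'x'}]
changed = True
while changed:
    terms, changed = combine_once(terms)
print(len(terms), terms)   # 3 monomials: x3x4, x2x3y4, x1x2y3y4 -> f = 1 + ... as above
```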
3 A Comparison Example

In this section, we compare our method with the result in [12], using the example therein on restricted prime functions. The restricted prime functions $p_4$ and $p_5$ ($p_n$ counts the primes in $\{0, 1, \ldots, 2^n - 1\}$) were explicitly given in [12] using sign representation polynomials with 7 and 17 terms, respectively. Using our algorithm, we can represent these two functions with 4 and 6 terms, respectively (the correctness of these polynomial representations can be checked by substituting the binary sequences of the integers):
$$p_4 = x_1x_2x_4 + x_1x_3x_4 + y_1x_2x_4 + y_1y_2x_3,$$
$$p_5 = y_1y_2y_3x_4 + y_1y_2x_3x_5 + x_2x_3y_4x_5 + x_1y_2y_3x_5 + x_1x_3x_4x_5 + y_1x_2y_3x_4x_5.$$
We give one more formula for these restricted prime functions and report the remaining comparison details in Table 1. The function $p_6$ requires 39 terms using the sign representation in [12]. Using our method, it can be represented by the following polynomial with 11 terms:
$$p_6 = y_1y_2y_3y_4x_5 + y_1y_2y_3x_4x_6 + y_2x_3y_4x_5x_6 + y_1x_3x_4y_5x_6 + y_1x_2y_3y_4x_6 + y_1x_2x_4x_5x_6 + x_1y_3x_4y_5x_6 + x_1y_2x_3x_5x_6 + x_1x_3y_4x_5x_6 + x_1y_2x_3y_4y_5x_6 + x_1x_2x_3x_4y_5x_6.$$
In Table 1, the third column reports the results of running Algorithm 1 once for each case. In some cases, as seen in Example 1, it is possible to simplify the representations further by repeating step 2 and/or step 3 of the algorithm, but we leave them as they are for a uniform presentation. All computations were done on a Dell laptop with a Core Duo processor at 3.06 GHz and 3.5 GB RAM using MAPLE 11.

Table 1. Comparison table. The computing time for $p_{12}$ is 37.34 sec.

Restricted prime functions  # Monomials used in [12] (% of full set)  # Monomials used by our method (% of full set)
p7    82 (64.06%)      23 (17.97%)
p8    147 (57.42%)     38 (14.84%)
p9    315 (61.52%)     66 (12.89%)
p10   633 (61.82%)     115 (11.23%)
p11   1259 (61.47%)    202 (9.86%)
p12   (Not reported)   366 (8.93%)

4 Concluding Remarks
We have proposed a novel method for representing a Boolean function using a polynomial. Our basic idea is to introduce extra variables to gain more flexibility in the representations. In Theorem 1, we showed that the Boolean algebra $B[\mathbf{x}/\mathbf{y}]$ we constructed is isomorphic to the classical Boolean algebra $B[\mathbf{x}]$, so we can identify Boolean functions $\{0,1\}^n \to \{0,1\}$ with polynomials in $B[\mathbf{x}/\mathbf{y}]$. In contrast to the uniqueness of the polynomials in $B[\mathbf{x}]$, a polynomial in $B[\mathbf{x}/\mathbf{y}]$ can be represented in different ways, which makes it possible to reduce the number of terms used. Our main theorem, Theorem 2, states that each Boolean function can be represented by a polynomial in $B[\mathbf{x}/\mathbf{y}]$ with at most $2^{n-1}$ monomial terms of degree at most $n$. In practice, one can apply Algorithm 1, which is straightforward and easy to implement, to further reduce the number of terms used, and thus the number of input lines needed for the corresponding neuron. We compared our method with the best known deterministic algorithm for representing a Boolean function by a sign representation polynomial [12]. Table 1
shows that when applied to the restricted prime functions, our method produces significant improvements. Future work includes further theoretical analysis of the proposed new approach and its relationship with the existing approaches, in particular its relationship with sign representations.
References
[1] Anthony, M.: Classification by polynomial surfaces. Discrete Appl. Math. 61, 91–103 (1995)
[2] Artin, M.: Algebra. Prentice-Hall, Englewood Cliffs (1991)
[3] Aspnes, J., Beigel, R., Furst, M., Rudich, S.: The expressive power of voting polynomials. Combinatorica 14, 1–14 (1994)
[4] Beigel, R.: Perceptrons, PP, and the polynomial hierarchy. Computational Complexity 4, 339–349 (1994)
[5] Bohossian, V., Bruck, J.: Algebraic techniques for constructing minimal weight threshold functions. SIAM J. Discrete Math. 16, 114–126 (2002)
[6] Bruck, J.: Harmonic analysis of polynomial threshold functions. SIAM J. Discrete Math. 3, 168–177 (1990)
[7] Emamy-Khansary, M.R., Ziegler, M.: New bounds for hypercube slicing numbers. In: Discrete Mathematics and Theoretical Computer Science Proceedings AA (DM-CCG), pp. 155–164 (2001)
[8] Hassoun, M.H.: Fundamentals of Artificial Neural Networks. MIT Press, Cambridge (1995)
[9] Matulef, K., O'Donnell, R., Rubinfeld, R., Servedio, R.A.: Testing halfspaces. SIAM J. Comput. 39, 2004–2047 (2010)
[10] Nisan, N., Szegedy, M.: On the degree of Boolean functions as real polynomials. Comput. Complexity 4, 301–313 (1994)
[11] O'Donnell, R., Servedio, R.A.: Extremal properties of polynomial threshold functions. J. Comput. System Sci. 74, 298–312 (2008)
[12] Oztop, E.: Sign-representation of Boolean functions using a small number of monomials. Neural Networks 22, 938–948 (2009)
[13] Parnas, M., Ron, D., Samorodnitsky, A.: Testing basic Boolean formulae. SIAM J. Discrete Math. 16, 20–46 (2002)
[14] Rudeanu, S.: Boolean Functions and Equations. North-Holland, Amsterdam (1974)
[15] Saks, M.E.: Slicing the hypercube. In: Surveys in Combinatorics, London Mathematical Society Lecture Note Series, vol. 187, pp. 211–255 (1993)
[16] Siu, K.Y., Roychowdhury, V., Kailath, T.: Discrete Neural Computation. Prentice Hall PTR, Englewood Cliffs (1995)
[17] Schmitt, M.: On computing Boolean functions by a spiking neuron. Ann. Math. Artificial Intelligence 24, 181–191 (1998)
[18] Wang, C., Williams, A.C.: The threshold order of a Boolean function. Discrete Appl. Math. 31, 51–69 (1991)
The Evaluation of Land Utilization Intensity Based on Artificial Neural Network*: A Case of Zhejiang Province

Jianchun Xu¹, Huan Li¹, and Zhiyuan Xu²

¹ Land Institute, Zhejiang Gongshang University, Hangzhou, 310018, China
² Destination Management, Zhejiang Gongshang University, Hangzhou, 310018, China
Abstract. In order to study new methods for the evaluation of intensive land use in development zones, the author appraised the intensity of development zones in Zhejiang based on an artificial neural network, explored new ways to improve the intensity, and made recommendations. In the study, artificial neural network methods and empirical analysis were employed. The research results indicated that: (1) generally, the level of intensive land use in Zhejiang province was high; both the land use efficiency and the structural condition of land were higher than the national average and took a leading position in the Yangtze River Delta area; (2) there existed spatial differences, which can be described as high in the east, average in the middle, and low in the north and south; (3) the intensity was associated with economic development, with a correlation coefficient as high as 0.903. From the above, the author concluded that the BP artificial neural network is able to overcome the adverse impact of human subjectivity on the evaluation results and has great practical value.

Keywords: artificial neural network; intensive land use; development zone.
1 Introduction

Currently, in China, AHP and Regression Analysis are widely used in the evaluation of intensive land use. In order to establish the evaluation system, these methods must determine the weights, estimate the ideal values, standardize the data, and so on. Although these integrated approaches have been used widely, the processes of weight determination and ideal value selection are affected by subjectivity, so these methods may make the results deviate from the actual situation. Compared with traditional methods, an Artificial Neural Network can learn and store a large number of input-output mapping relations. Through back-propagation, it can continuously adjust the weights and thresholds of the network so as to minimize the error of the model [1].
Project is supported by the technological innovation projects of graduate students (No. 2007R408056).
D. Liu et al. (Eds.): ISNN 2011, Part III, LNCS 6677, pp. 297–305, 2011. © Springer-Verlag Berlin Heidelberg 2011
298
J. Xu, H. Li, and Z. Xu
In this study, after a lot of data collection, Artificial Neural Network is used to build a suitable evaluation model. On the whole, this model gets rid of the negative impact of subjective factors and reflects the scientific of the research.
2 Survey Methods and Sample Structure 2.1 Survey Methods In this survey, multi-stage random survey and intensive sampling survey are used[2]. And 3S technology is also used to investigate the status of land utilization intensity in Zhejiang province. First, make sure the target of evaluation and develop a survey program; Second, make field investigation and collect data and analyze data; At last, sample data should be processed and prepared for the study. 2.2 Sample Structure There are a total of 65 groups of sample data in this research (Tab.1). According to the needs of this research, these sample date can be separated into two parts: The first part, collected in 2010, contains 11 groups. They are selected from 11 cities in Zhejiang province and each sample consists with 14 indicators. The second part is the remaining 54 groups and they are also divided into two parts. The first part contains 45 groups which are used to train the Artificial Neural Network and the other 9 groups are used to test the model.
3 Overview of Artificial Neural Network An Artificial Neural Network consists of a number of data processing elements called neurons or nodes that are grouped inlayers. The input layer neurons receive input data or information and transmit the values to the next layer of processing elements across connections. This process is continued until the output layer is reached. This type of network in which data flows in one direction (forward) is known as a feed-forward network. The application of Artificial Neural Network has been the topic of a large
Fig. 1. The basic structure of ANN
The Evaluation of Land Utilization Intensity Based on Artificial Neural Network
299
number of recent literatures[3]. Figure 1 shows the basic concept of an Artificial Neural Network. The content of Fig.1 can be abstracted by a mathematical formula. Let Xi t as the input information which neuron j get from neuron I and let Kj=Oj t as the output information of neuron j. So the state of neuron j can be expressed as:
()
()
(t+1)= F { [ ∑ w x(t ) ] - T } n
Oj
ij
j
i =1
In the formula wij ———— Coefficient of synaptic connections; Tj ———— Threshold of neuron j F . ———— Transfer function of neurons.
;
()
3.1 Construction of Artificial Neural Network 3.1.1 Design the Layers of Artificial Neural Network Roberto has proved that for any continuous function on closed interval can be approximated with a three-layer Artificial Neural Network (An input layer, one hidden layer, an output layer) and every n-dimensional to m-dimensional mapping can be finished by three-layer Artificial Neural Network too[5]. So in this research we make the Artificial Neural Network has three layers. 3.1.2 Design the Form of Input and Output The indicator system reflects the goal, content and degree of development zones. Index system should have functions of explanation, evaluation and prediction to make the process of appraising to be dynamic and continuous[6]. According to the Evaluation of intensive land use in Zhejiang Province , we established the index system(Tab.1). So we can ensure that the research is scientific and reasonable.
《
》
Table 1. Index system of land intensive use

Name                                                        Code
Land Development Rate                                       u1
Land Supply Rate                                            u2
Land Completion Rate                                        u3
Rate Of Industrial Land                                     u4
Consolidated Volume Rate                                    u5
Building Density                                            u6
Consolidated Volume Rate Of Industrial Land                 u7
Building Density Of Industrial Land                         u8
Intensive Degree Of Investment In Fixed Assets Of Industrial Land  u9
Output Intensive Degree Of Industrial Land                  u10
Disposal Rate Of Land For The Project Due                   u11
Disposal Rate Of Idle Land                                  u12
Compensation Rate Of The Land                               u13
Market Rate Of The Land                                     u14
In addition, the second sample set has a score of intensive degree, so we design only one output node, named $S$.

3.1.3 Design the Hidden Layer

The nodes of the hidden layer in an Artificial Neural Network are very important: they not only influence the performance of the model, but are also a direct cause of over-fitting. In this research, we refer to the empirical formula method created by professor Yan [7]. He considered that the optimal number of hidden-layer nodes is directly related to the output layer, the input layer and the training data. On the basis of experiments, he summarized a simple formula:
$$N_H = \sqrt{N_i \times N_o} + N_p/2, \qquad (1)$$
where $N_H$ is the optimal number of hidden nodes, $N_i$ the number of input-layer nodes, $N_o$ the number of output-layer nodes, and $N_p$ the number of training samples. According to formula (1), $N_H = \sqrt{14 \times 1} + 45/2 \approx 26$. At this point, the topology of the Artificial Neural Network has been constructed (Fig. 2).
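The arithmetic of formula (1) is easy to reproduce (a trivial check added here, not part of the paper):

```python
import math

N_i, N_o, N_p = 14, 1, 45               # input nodes, output nodes, training samples
N_H = math.sqrt(N_i * N_o) + N_p / 2    # formula (1)
print(N_H, round(N_H))                  # ~26.24 -> 26 hidden nodes
```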
Fig. 2. The topology of the Artificial Neural Network: 14 input nodes ($u_1$–$u_{14}$), 26 hidden-layer nodes, and one output node $S$
3.1.4 Data Processing

Some of the evaluation indexes have different dimensions, so they cannot directly form a comprehensive evaluation network and the data must be standardized. Positive effect indexes and negative effect indexes are standardized by formula (2):
$$X'_{ij} = \begin{cases} (X_{ij} - X_{i,\min})/(X_{i,\max} - X_{i,\min}), & \text{for a positive effect index}, \\ (X_{i,\max} - X_{ij})/(X_{i,\max} - X_{i,\min}), & \text{for a negative effect index}. \end{cases} \qquad (2)$$

Let $u_i = [x'_{1,i}\ x'_{2,i}\ x'_{3,i}\ \ldots\ x'_{45,i}]^T$. Then
$$X = \begin{bmatrix} x'_{1,1} & x'_{1,2} & \cdots & x'_{1,14} \\ x'_{2,1} & x'_{2,2} & \cdots & x'_{2,14} \\ \vdots & \vdots & & \vdots \\ x'_{45,1} & x'_{45,2} & \cdots & x'_{45,14} \end{bmatrix} = (u_1\ u_2\ u_3\ \ldots\ u_{14}).$$
On the other hand, the output layer builds a matrix $S = (s_1\ s_2\ s_3\ \ldots\ s_{45})^T$.
3.1.5 Creating Training, Validation and Test Sets

During the learning stage of the network, the weights are iteratively modified to minimize the error between the network output and the desired output. Nevertheless, an excessive number of parameters or weights relative to the problem at hand and to the number of training data may lead to over-fitting. This phenomenon occurs when the model fits irrelevant features present in the training data too closely instead of fitting the underlying function relating inputs and outputs. This process was carried out in the MATLAB software. We import the input matrix X and output matrix S into the established model, first setting the parameters and then starting the training. After 2462 training iterations, the network converged with a final MSE of 9.9962×10⁻⁴. To ensure that the established model meets the requirements of the research, we test the network; the test results are shown in Table 2. The absolute values of the errors are all less than 3%. Thus far, the model is complete.

Table 2. The test results of the BP network

Code  Measured  Simulated  Error (%)
1     0.76      0.78       2.63%
2     0.81      0.8        -1.23%
3     0.94      0.94       0.00%
4     0.93      0.92       -1.08%
5     0.84      0.86       2.38%
6     0.76      0.77       1.32%
7     0.98      0.96       -2.04%
8     0.90      0.92       2.22%
9     0.90      0.91       1.11%
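The paper carried out this training in MATLAB; the sketch below (added for illustration only, with placeholder data in place of the survey samples) shows an equivalent 14-26-1 feed-forward regression network in Python using scikit-learn:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X_train, y_train = rng.random((45, 14)), rng.random(45)  # placeholders for the 45 training samples
X_test, y_test = rng.random((9, 14)), rng.random(9)      # placeholders for the 9 test samples

net = MLPRegressor(hidden_layer_sizes=(26,), activation='logistic',
                   solver='lbfgs', max_iter=5000, tol=1e-6)
net.fit(X_train, y_train)
abs_err = np.abs(net.predict(X_test) - y_test)
print(abs_err.max())   # the paper reports all test errors below 3% on its real data
```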
3.2 Application of Artificial Neural Network

Using the established Artificial Neural Network, we predict the degree of land utilization intensity in 11 cities (Table 3).
Table 3. Prediction of intensive land use

City       Name                                        Score
Hang Zhou  Hangzhou Yuhang Economic Development Zone   0.94
Ning Bo    Ningbo Yinzhou Industrial Park              0.93
Wen Zhou   Wenzhou Binhai Industrial Park              0.91
Jia Xing   Jiaxing Industrial Park                     0.83
Hu Zhou    Huzhou Wuxing Economic Development Zone     0.81
Shao Xing  Shaoxing Economic Development Zone          0.89
Jin Hua    Jinhua Economic Development Zone            0.87
Qu Zhou    Quzhou Economic Development Zone            0.79
Zhou Shan  Zhoushan Industrial Park                    0.77
Tai Zhou   Taizhou Economic Development Zone           0.91
Li Shui    Lishui Economic Development Zone            0.83
4 Result of the Evaluation

4.1 The Land Intensive Degree Is Generally High in Zhejiang

Zhejiang Province is located in the south of the Yangtze River Delta, with the East China Sea to the east, Fujian to the south, Jiangxi and Anhui to the west, and Shanghai and Jiangsu to the north. Thanks to its advantages in location, transportation, economy, technology and talent, industries with high technological content and high economic returns, such as electronics, energy and computers, are developing quickly in Zhejiang; they have formed the core competitiveness of the industrial zones and support an export-oriented economy [9]. After years of development, most development zones have formed effective management systems, which promote more efficient, orderly and rational land use. As shown in Table 4, 82% of the development zones score above 0.8, placing them in the range of intensive use, and 36% score above 0.9, in the range of highly intensive use. Moreover, in Zhejiang province, both land-use efficiency and the structural condition of land show high levels of intensive use. For example, the average volume rate of industrial land is 72%, the land development rate is 78%, the land completion rate is 89%, the average intensive degree of fixed-asset investment is 1,951,200 yuan/mu, the output ratio of industrial land is 2,905,100 yuan/mu, and the output tax ratio of industrial land is 1.5 million yuan/mu; these figures not only lead the Yangtze River Delta but are also higher than the national average.

Table 4. The evaluation result of land intensive utilization in development zones of Zhejiang

Item  Score < 0.8  0.8 < Score < 0.9  Score > 0.9
Rate  18%          46%                36%
4.2 The Land Intensive Degree Exhibits Spatial Differences

Based on Table 3, the spatial distribution of land utilization intensity is mapped. Combining this with the actual situation in Zhejiang Province, we can draw the following characteristics: the land intensive degree is high in the east, average in the middle, and low in the south and the north.
Fig. 3. Directional distribution of the land intensive utilization evaluation in development zone of Zhejiang
As illustrated in Fig. 3, the scores of the eastern coastal cities, such as Ningbo, Wenzhou and Shaoxing, are all above 0.9; they are in the range of highly intensive use. The scores of the cities in the middle of Zhejiang are between 0.85 and 0.9, at an intermediate level, while the scores of the cities in the north and the south of Zhejiang are generally low, all below 0.85.

4.3 Intensive Degree Is Highly Correlated with Economic Development

Optimal and intensive land use is an important measure for handling the relationship between economic development and resource protection. Rapid economic development is a
prerequisite for land intensive utilization, and intensive land use provides a guarantee for the economy to develop sustainably. As illustrated in Fig. 4, as the GDP curve falls from high to low, the intensive degree curve also shows a downward trend. In order to study the relationship between GDP and intensive degree more accurately, we carried out a correlation analysis in SPSS 17.0. The result is shown in Table 5: the correlation coefficient is as high as 0.903. Therefore, the intensive degree and economic development are linked and influence each other. On the one hand, only a developed economy can meet the needs of land intensive utilization; on the other hand, land intensive utilization provides the conditions for inclusive growth.
Fig. 4. Relationship between economic development and land intensive use

Table 5. Correlation analysis

                               GDP     Score
GDP    Pearson Correlation     1       .903**
       Sig. (2-tailed)                 .000
       N                       11      11
Score  Pearson Correlation     .903**  1
       Sig. (2-tailed)         .000
       N                       11      11

** Correlation is significant at the .01 level (2-tailed).
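The Pearson coefficient reported in Table 5 is straightforward to compute; the sketch below (added for illustration only — the GDP figures here are hypothetical placeholders, not the paper's data, while the scores come from Table 3) shows the calculation:

```python
from scipy.stats import pearsonr

# HYPOTHETICAL GDP values for the 11 cities (placeholders; the paper's GDP data
# are not reproduced here), paired with the intensity scores from Table 3.
gdp = [5000, 4600, 3000, 1800, 1500, 2600, 2200, 900, 700, 2400, 800]
score = [0.94, 0.93, 0.91, 0.83, 0.81, 0.89, 0.87, 0.79, 0.77, 0.91, 0.83]

r, p = pearsonr(gdp, score)
print(r, p)   # the paper reports r = 0.903, significant at the .01 level
```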
5 Summary and Conclusions

Artificial Neural Networks can be considered flexible, non-linear, all-purpose statistical tools, capable of learning the complex relations that occur in the social processes associated with appraising land intensive utilization. In this study, an Artificial Neural Network model is used to estimate the intensive degree of development zones in Zhejiang province. 45 samples were used for training the model and 9 samples were used to verify it. The inputs for training the model comprise 14 indexes. Compared
with traditional evaluation methods such as AHP and Regression Analysis, this technique presents a series of advantages. One of the most prominent advantages of the Artificial Neural Network is objectivity: it avoids the influence of subjective factors on the evaluation results. In summary, the Artificial Neural Network with 14 input nodes is able to estimate the intensive degree of development zones. However, it should be pointed out that the accuracy of an Artificial Neural Network model is data sensitive, and the model should not be applied to conditions outside of the data range used in training without verification.
References
1. Yang, J.: Practical Course on Artificial Neural Networks. Zhejiang University Press (2001)
2. Ye, J., Jiang, Y., Roy, P., Zhu, K., Feng, L., Li, P.: The Research on Land Use Rights in Rural China in 2005. World Management (7), 77–84 (2006)
3. Yang, C.T., Marsooli, R., Aalami, M.T.: Evaluation of Total Load Sediment Transport Formulas Using ANN. International Journal of Sediment Research (24), 274–286 (2009)
4. Li, C., Guo, S., Zhang, J.: Modified NLPM-ANN Model and Its Application. Journal of Hydrology 378, 137–141 (2009)
5. Liming, Z.: Artificial Neural Network Model and Its Application. Fudan University Press (1993)
6. Wei, Z., Xiulan, W.: Conservation and intensive land use evaluation index system. Journal of Anhui Agricultural Sciences 35(2), 491–493 (2007)
7. Taishan, Y.: Glass Crack Detection Based on BP Neural Network Model. Sci-Tech Information Development & Economy 15(8), 182–183 (2005)
8. Sun, Z., et al.: Research of Evaluation on Present Status of Urban Land Intensive Utilization—Taking Baoding city as case. Journal of Anhui Agri. Sci. 35(15), 4576–4578 (2007)
9. Yang, S.: Connotation of Intensified Urban Land Use and Its Construction of Evaluation Index. Explore on Economical Problem (1), 27–30 (2007)
10. Nie, X.-z., Zhang, J., Liu, Z.-h., Zhang, J.-h.: Studies on The Spatial Characteristics and The Development of Theme Tourism Clusters in China. Human Geography 20(4), 65–68 (2005)
11. Xia, Z.-c., Xie, C.-s.: Probe into Some Basic Problems of The Tourism Industry Cluster. Journal of Guilin Institute of Tourism 18(4), 479–484 (2007)
12. Wang, Z.-f.: The Research of Games Between The Stakeholder Government and the Tourism Industry Cluster. Journal of Jishou University (Natural Science Edition) 29(4), 93–99 (2008)
13. Ozturk, H.E.: The Role of Cluster Types and Firm Size in Designing the Level of Network Relations: The Experience of the Antalya Tourism Region. Tourism Management 29, 1–9 (2008)
14. Zhang, X., et al.: The Classification of Land Use Potentials in Cities of China Based on Principal Component Analysis. Journal of Hunan Agricultural University 33(1), 113–116 (2007)
15. Lim, C., McAleer: Time Series Forecasts of International Travel Demand for Australia. Tourism Management 23, 389–396 (2002)
16. Kim, J., Wei, S., Ruys: Segmenting the Market of West Australian Senior Tourists Using An Artificial Neural Network. Tourism Management 24, 25–34 (2003)
17. Kolarik, T., Rudorfer: Time Series Forecasting Using Neural Networks. APL Quote Quad. 25(1), 86–94 (1994)
Dimension Reduction of RCE Signal by PCA and LPP for Estimation of the Sleeping
Yohei Tomita1, Yasue Mitsukura1, Toshihisa Tanaka1,3, and Jianting Cao2,3
1 Tokyo University of Agriculture and Technology, 2-24-16, Nakacho, Koganei, Tokyo, Japan
[email protected], {mitsu e,tanakat}@cc.tuat.ac.jp
2 Saitama Institute of Technology, 1690, Okabe, Saitama, Japan
[email protected]
3 RIKEN Brain Science Institute, 2-1, Hirosawa, Wako, Saitama, Japan
Abstract. Irregular hours and stress cause drivers to doze and fall asleep in important situations. Therefore, it is necessary to understand the mechanism of sleep. In this study, we distinguish sleep conditions by rhythmic component extraction (RCE). With this method, a particular EEG component is extracted as the weighted sum of multi-channel signals; this component concentrates its energy in a certain frequency range. Furthermore, when the weight of a specific channel is high, that channel is considered significant for extracting the focused frequency range. Therefore, sleep conditions are analyzed through the power and the weights of the RCE. For the weight analysis, principal component analysis (PCA) and locality preserving projection (LPP) are used to reduce the dimension. In the experiment, we measure the EEG in two conditions (before and during sleep). Comparing these EEGs by RCE, the power of the alpha wave component decreased during sleep while the theta power increased. The weight distributions under the two conditions did not differ significantly; this remains to be resolved in further study. Keywords: EEG, RCE, PCA, LPP.
1 Introduction
Modern people have become more interested in sleep, for two reasons [1], [2]. One is that the importance of sleep has been revealed by advances in brain science. The other is that there are many harmful effects from sacrificing sleep. In modern society, many people keep irregular hours and suffer from considerable stress, so they cannot get enough sleep. This causes drivers to doze and people to fall asleep in important situations. Therefore, it is necessary to reveal the mechanism of sleep and to prevent falling asleep at critical moments.
In order to judge whether a subject is sleeping, action observation or the electroencephalogram (EEG) is used. Generally, a doctor visually assesses the sleep state from the EEG over at least 8–9 hours. In computer analyses, many researchers pay attention to frequency components such as the δ, θ, α, and β waves obtained by frequency analysis [3]. In the Fourier transform, rhythmic components are extracted by decomposing a signal into a set of frequency components. For example, Yokoyama et al. suggest that the transitional EEG wave caused by sleep is important [4]. In recent studies of rhythmic components, very slow EEG fluctuations are also known to be important in addition to these frequency bands [5]. From another viewpoint, if multi-channel signals are available, principal component analysis (PCA) and independent component analysis (ICA) are effective; they were developed for extracting meaningful components in a noisy environment. Furthermore, Tanaka et al. advocate that if the frequency of interest is known in advance, it is more natural to directly estimate a component with respect to that frequency range ([6]–[8]). For this purpose, they proposed rhythmic component extraction (RCE), which outputs a weighted sum of the multi-channel signals; the weights are optimized to maximize the power of a specific frequency component in the linearly summed variable. We assume that the rhythmic component in the EEG reflects the sleep condition, so RCE may be effective for sleep analysis. Furthermore, effective electrodes can be found from the weights of the RCE results. However, there is no established method for analyzing these weights: the variation of the weights is high dimensional, because the RCE signal is based on the multi-channel signal. To analyze the features of the weights, we consider principal component analysis (PCA) and locality preserving projection (LPP) to be effective. PCA is a classical linear technique that projects the data along the directions of maximal variance. LPP is a linear dimensionality reduction algorithm that builds a graph incorporating neighborhood information of the data set; its effectiveness is reported in [9]. Therefore, we analyze the time variation of these weights after dimensionality reduction by PCA and LPP.
2 Rhythmic Component Extraction
The RCE is a method to extract a particular EEG component that concentrates the energy in a certain frequency range. It is expressed as a weighted sum of multi-channel signals. The observed signal $x_i[k]$ $(k = 0, \ldots, N-1)$ comes from channel $i$ $(i = 1, \ldots, M)$. The RCE signal is expressed as

$$\hat{x}[k] = \sum_{i=1}^{M} w_i x_i[k], \qquad (1)$$

where $w_i$ is the weight of signal $i$. Let $\hat{X}(e^{-j\omega})$ be the discrete-time Fourier transform (DTFT) of $\hat{x}[k]$. If we are interested in a specific frequency range, denoted by $\Omega_1 \subset [0, \pi]$, then the power of $\hat{x}[k]$ with respect to this range is described as
$$P = \int_{\Omega_1} \left|\hat{X}(e^{-j\omega})\right|^2 d\omega. \qquad (2)$$
Then, the weights $w_i$ are determined to maximize the power of the specific frequency component, while the power of the other frequency components is minimized. $\Omega_1 \subset [0, \pi]$ is the frequency range we want to extract, and $\Omega_2 \subset [0, \pi]$ is the other frequency range we want to suppress. Therefore, the cost function to be maximized is given as follows:

$$J[w] = \frac{\int_{\Omega_1} \left|\hat{X}(e^{-j\omega})\right|^2 d\omega}{\int_{\Omega_2} \left|\hat{X}(e^{-j\omega})\right|^2 d\omega}. \qquad (3)$$
The maximization of the above cost function reduces to a generalized eigenvalue problem in the following way. Define $X \in \mathbb{R}^{M \times N}$ as $[X]_{ik} = x_i[k]$, and the matrix $W_1$ as

$$[W_1]_{l,m} = \int_{\Omega_1} e^{-j\omega(l-m)}\, d\omega, \qquad (4)$$
where $l, m = 0, \ldots, N-1$. In the same way, $W_2$ is defined by replacing $\Omega_1$ with $\Omega_2$ in (4). Then, $J[w]$ in (3) can be written in matrix-vector form as

$$J[w] = \frac{w^T X W_1 X^T w}{w^T X W_2 X^T w}. \qquad (5)$$
The maximizer of $J[w]$ is given by the eigenvector corresponding to the maximum eigenvalue of the following generalized eigenvalue problem:

$$X W_1 X^T w = \lambda X W_2 X^T w. \qquad (6)$$
The problem can be solved by using a matrix square root of $X W_2 X^T$. Since $X W_2 X^T$ is symmetric, a matrix square root $S$ exists such that $X W_2 X^T = S S^T$; note that $S$ is not uniquely determined. Then, the optimal solution $w^*$ is given by

$$w^* = S^{-T} \hat{w}, \qquad (7)$$

where $\hat{w}$ is the eigenvector corresponding to the largest eigenvalue of the matrix $S^{-1} X W_1 X^T S^{-T}$.
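The construction above maps directly onto a generalized symmetric eigenvalue solver. The following is a minimal sketch, not the authors' implementation: the function names (`band_matrix`, `rce_weights`) and the ridge regularization are our own assumptions, and the band limits are assumed to be given in radians per sample.

```python
import numpy as np
from scipy.linalg import eigh

def band_matrix(n, w_lo, w_hi):
    # [W]_{l,m} = integral of exp(-j*w*(l-m)) dw over [w_lo, w_hi], cf. Eq. (4)
    d = np.subtract.outer(np.arange(n), np.arange(n))
    W = np.empty((n, n), dtype=complex)
    nz = d != 0
    W[nz] = (np.exp(-1j * w_hi * d[nz]) - np.exp(-1j * w_lo * d[nz])) / (-1j * d[nz])
    W[~nz] = w_hi - w_lo
    # W is Hermitian; for real signals only its symmetric real part
    # contributes to the quadratic forms in Eq. (5)
    return W.real

def rce_weights(X, band_pass, band_stop):
    # X: (M channels, N samples); bands: (w_lo, w_hi) in rad/sample
    n = X.shape[1]
    A = X @ band_matrix(n, *band_pass) @ X.T    # numerator of J[w]
    B = X @ band_matrix(n, *band_stop) @ X.T    # denominator of J[w]
    B += 1e-9 * np.eye(X.shape[0])              # small ridge for stability (our addition)
    _, vecs = eigh(A, B)                        # generalized problem of Eq. (6)
    return vecs[:, -1]                          # eigenvector of the largest eigenvalue
```

For example, the theta band at a 1000 Hz sampling rate would be passed as band_pass = (2*np.pi*4/1000, 2*np.pi*7/1000).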
3 Electroencephalograph
The EEG data are obtained via electrodes placed according to the international 10/20 system shown in Fig. 1, a system which describes the locations of electrodes on the human skull. Based on this system, the EEG is measured by an electroencephalograph. In this study, the electroencephalograph shown in Fig. 2 is used as the measurement device. This system, a cap-type electroencephalograph released by NEUROSCAN, enables the measurement of brain activity over the whole head. There are 32 channels (including two reference electrodes on the ears), and the sampling frequency is 1000 Hz. The electrodes' impedance should be set to less than 8 kΩ for measurement.
Fig. 1. The international 10/20 system
Fig. 2. The electroencephalograph
Fig. 3. The experimental procedure (the experimenter drums on the table every 4 min; if the subject does not notice, the sleep measurement begins)
4 Experimental Procedure
In the experiment, we investigate EEG features just after falling asleep. There is one subject, and the EEG is measured in two states: before sleep and during sleep. The EEG data are obtained by a NEUROSCAN system. After attachment of the electroencephalograph, the EEG is measured for one minute; these data are defined as the data before sleep. After that, the subject keeps relaxing and falls asleep. For the definition of falling asleep, the experimenter drums on the table with a forefinger every four minutes (Fig. 3). If the subject notices this small sound, he is regarded as still being awake. Otherwise, if he does not notice, the EEG is measured for one minute and defined as the data during sleep. Approval had been obtained from the ethics committee of Tokyo University of Agriculture and Technology.
5 Results and Discussions
In the experiment, two types of EEG data were obtained (the EEG before sleep and that during sleep). First of all, differences in their frequency features were investigated, as sketched below. After that, the rhythmic component extracted by RCE was investigated.
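As a concrete illustration of the band-power comparison carried out in this section (short-time Fourier analysis with 1-second windows at a 1000 Hz sampling rate, as described in detail below), here is a minimal sketch; the function name and the per-window averaging are our assumptions, not the authors' code.

```python
import numpy as np

def band_power(x, fs, f_lo, f_hi):
    """Mean spectral power of x in [f_lo, f_hi] Hz, one value per 1-s window."""
    win = fs                                       # window size N = fs = 1000
    segs = x[: len(x) // win * win].reshape(-1, win)
    spec = np.abs(np.fft.rfft(segs, axis=1)) ** 2  # per-window power spectrum
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return spec[:, band].mean(axis=1)

# e.g. theta = band_power(fp1, 1000, 4, 7); alpha = band_power(fp1, 1000, 8, 13)
```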
Fig. 4. Spectrum of Fp1 power before the sleeping and that during the sleeping
Fig. 5. Power spectrum of the theta component by RCE
Fig. 6. Power spectrum of the alpha component by RCE
At first, frequency features were analyzed by the short-time Fourier transform. The window size is set to N = 1000, where the sampling frequency is 1000 Hz in this experiment. In Fig. 4, the one-minute averaged power spectrum is shown; the vertical axis represents the power spectrum and the abscissa represents the frequency. Comparing the power spectra of the two conditions, the power of the alpha wave component (8–13 Hz) decreased during sleep, while the power of the theta wave component (4–7 Hz) increased. These features were also confirmed at the other 29 electrodes. These results suggest that the alpha and theta wave components are promising candidates for doze features, so the theta and alpha components were chosen as the parameters of the RCE analysis. In the RCE analysis, the theta and alpha components were extracted: to extract the theta wave component by RCE, Ω1 is set to the range corresponding to 4–7 Hz; to extract the alpha wave component, Ω1 is set to the range corresponding to 8–13 Hz. The RCE signals are extracted from all 30 channels per 1 second. Moreover, to investigate the variation of the frequency content of the RCE signals, the Fourier transform was applied per 1 second. The one-minute spectrum averages of the theta wave (4–7 Hz) and the alpha wave (8–13 Hz) are shown in Figs. 5–6. The abscissa axis represents
Fig. 7. PCA mapping of the weight extracted by the theta component
Fig. 8. PCA mapping of the weight extracted by the alpha component
Fig. 9. LPP mapping of the weight extracted by the theta component
Fig. 10. LPP mapping of the weight extracted by the alpha component
the time range and the vertical axis represents the average of the power spectrum. The time range is set to 60 seconds. From the RCE results, the power of the theta wave component tends to be higher during sleep, although there was no great difference between conditions. On the contrary, the power of the alpha wave is lower during sleep. Finally, the weight distribution is analyzed. The RCE weights can be regarded as the extracted power distribution; therefore, if we want to know the effectiveness of specific electrodes, we should analyze the features of the weights. For mapping the weight features into a 2-dimensional space, there are linear projective mapping methods (PCA and LPP). PCA reduces the dimensionality of a data set consisting of a large number of interrelated variables. LPP is a linear projective map that arises from solving a variational problem that optimally preserves the neighborhood structure of the data set. The variation of the weights over the 30 channels is mapped into a 2-dimensional space by the first two directions of each method. The weight variation among the 30 electrodes by PCA is shown in Figs. 7–8, and the results by LPP are shown in Figs. 9–10. The abscissa and vertical axes are the 1st and 2nd factors, respectively. In
these results, it was difficult to discriminate the two conditions in the locality maps. Therefore, it is difficult to confirm that there are specific important electrodes. This may be caused by the analysis methods (PCA and LPP), or there may be too many clusters in the EEG features.
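The 2-D mappings in Figs. 7–10 can be reproduced in outline as follows. This is a sketch under our own assumptions (a k-nearest-neighbour heat-kernel graph for LPP, following He and Niyogi [9]); Z holds one weight vector per row, and the function names are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh

def pca_2d(Z):
    """Project rows of Z onto the first two principal directions."""
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:2].T

def lpp_2d(Z, k=5, sigma=1.0):
    """Locality preserving projection of rows of Z onto two directions."""
    n = Z.shape[0]
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    W = np.zeros((n, n))
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]              # k nearest neighbours (skip self)
    for i in range(n):
        W[i, idx[i]] = np.exp(-d2[i, idx[i]] / sigma)     # heat-kernel edge weights
    W = np.maximum(W, W.T)                                # symmetrize the graph
    D = np.diag(W.sum(1))
    L = D - W                                             # graph Laplacian
    A = Z.T @ L @ Z
    B = Z.T @ D @ Z + 1e-9 * np.eye(Z.shape[1])           # regularize for stability
    _, vecs = eigh(A, B)
    return Z @ vecs[:, :2]                                # smallest eigenvalues preserve locality
```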
6 Conclusions
In this paper, we investigated the sleeping EEG features by RCE. In the experiment, we obtained EEG data in two conditions (before and during sleep). In order to extract the sleep features, we used the theta and alpha waves in the calculation of the RCE. From the RCE results, the power spectrum of the alpha wave comparatively decreased during sleep and the theta power increased. Furthermore, the weight distribution was analyzed by PCA and LPP to find effective electrodes; however, specific electrodes were not confirmed. In the next step, the weights will be analyzed by pattern recognition simulations to confirm differences between sleep conditions.
References 1. Johns, M.W.: A sleep physiologist’s view of the drowsy driver. Transportation Research Part F 3, 241–249 (2000) 2. Lal, S.K.L., Craig, A.: A critical review of the psychophysiology of driver fatigue. Biological Psychology 55, 173–194 (2001) 3. Sanei, S., Chambers, J.: EEG signal processing. John Wiley & Sons, Hoboken (2007) 4. Yokoyama, S., Shimada, T., Takemura, A., Shiina, T., Saitou, Y.: Detection of characteristic wave of sleep EEG by principal component analysis and neural-network. IEICE Trans. J76-A(8), 1050–1058 (1993) 5. Monto, S., Palva, S., Voipio, J., Palva, J.M.: Very slow EEG fluctuations predict the dynamics of stimulus detection and oscillation amplitudes in humans. The Journal of Neuroscience 28(33), 8272–8286 (2008) 6. Tanaka, T., Saito, Y.: Rhythmic component extraction for multi-channel EEG data analysis. In: Proc. of ICASSP 2008, pp. 425–428 (2008) 7. Saito, Y., Tanaka, T., Higashi, H.: Adaptive rhythmic component extraction with regularization for EEG data analysis. In: Proc. of ICASSP 2009, pp. 353–356 (2009) 8. Higashi, H., Tanaka, T.: Rhythmic component extraction considering phase alignment and the application to motor imagery-based brain computer interfacing. In: Proc. of IEEE WCCI IJCNN, pp. 3508–3513 (2010) 9. He, X., Niyogi, P.: Locality preserving projections. In: Advances in Neural Information Processing Systems 16, Vancouver, Canada (2003)
Prediction of Oxygen Decarburization Efficiency Based on Mutual Information Case-Based Reasoning
Min Han and Liwen Jiang
School of Electronic and Information Engineering, Dalian University of Technology, Dalian 116023, China
[email protected]
Abstract. An oxygen decarburization efficiency prediction model based on mutual-information case-based reasoning is proposed, and the blowing oxygen of the static and dynamic phases is calculated according to the forecasting results. First, a new prediction method for blowing oxygen is proposed which takes the oxygen decarburization efficiency as the solution attribute of case-based reasoning. Then mutual information is introduced into the process of determining the attribute weights, which addresses the problem that the traditional case retrieval method ignores the information between the problem attributes and the solution attribute. The proposed model is applied to actual production data from a 150-ton converter. The results show that the model has high prediction accuracy; on this basis the calculation accuracy of the blowing oxygen in the two phases is ensured and the requirements of actual production are met. Keywords: basic oxygen furnace; case-based reasoning; mutual information; oxygen decarburization efficiency; blowing oxygen.
1 Introduction
Steelmaking is a critical process in the steel industry. It is a complex oxidation process in which oxygen is blown into the pool at high speed, reacting with the molten iron and releasing heat; carbon is thereby reduced, the molten iron is heated, and impurity elements such as phosphorus and sulfur are removed, ultimately yielding steel that meets the technological requirements [1]. The blowing oxygen and the amount of refrigerant are the dominant factors in process control: they directly affect the effectiveness and quality of steel smelting and are the key factors in the control of element oxidation, bath temperature, slagging, the removal of steel inclusions, and the control of splash. They finally affect the endpoint carbon content and the temperature of the molten steel, which plays an important role in strengthening refining and improving the technology indicators. With the improvement of converter technology, the vice-gun measurement has been widely used in industrial steelmaking. Because of the limits of the technological level, there are only two vice-gun measurements in a smelting process. One is
carried out in the late blowing stage and the other at the end of blowing. The boundary is defined by the time of the vice-gun measurement: the first stage is called the main blowing stage and the second is known as the auxiliary blowing stage. The main blowing stage is the basic stage, in which most of the complicated reactions occur; in this stage the prediction of blowing oxygen based on the initial information is referred to as the static prediction. The forecast accuracy of the static blowing oxygen is the foundation of the follow-up process. The auxiliary blowing stage is the final phase, where the prediction of blowing oxygen is mainly based on the vice-gun information and is known as the dynamic prediction; its blowing oxygen directly determines whether the endpoint carbon content and temperature can meet the requirements.
Fig. 1. Schematic diagram of the converter steel production process
The blowing oxygen is currently controlled by operator experience, or through mechanism methods, statistical methods, intelligent methods, and so on, with the blowing oxygen forecast based directly on the initial conditions and information, as in [2,3,4]. Yet intelligent methods do not consider that the oxygen decarburization efficiency changes in different reaction phases. In addition, a "black box" modeling approach that ignores the intermediate process is used, and many unknown parameters are introduced, which lack interpretability with respect to the history database; the use of case experience for adjustment is also not considered. In this paper a new model for computing the blowing oxygen is proposed, in which the oxygen decarburization efficiency is used as the solution of case-based reasoning. Through mechanism analysis, the factors affecting the oxygen decarburization efficiency in the primary and secondary blowing stages are found, and the problem attributes are identified. Case-based reasoning is used to predict the oxygen decarburization efficiency; because the traditional case retrieval method ignores a large amount of information, a case retrieval method based on mutual information is proposed, from which the oxygen decarburization efficiency of the current case can be obtained. Finally, the blowing oxygen of the static and dynamic phases is calculated with the mechanism formula.
2 Oxygen Decarburization Efficiency and Case-Based Reasoning Based on Mutual Information
Case-based reasoning consists of the following steps: case description, case retrieval, case reuse, case correction and preservation [5,6].
Fig. 2. Schematic model of case-based reasoning
2.1 Case Description
Case description is the basis of case-based reasoning and ensures the effectiveness of the subsequent steps.
2.1.1 The Identification of the Problem Attributes
According to the working condition of converter steel production and the process data, the problem attributes can be constructed. The changes of the elements in the molten pool during the converter steelmaking process are usually divided into three stages, namely early, middle, and late blowing. In the early blowing stage the average pool temperature is below 1400 ℃. This period mainly involves the oxidation of silicon and manganese, but due to the high temperature in the reaction zone a small amount of carbon is also oxidized. At the same time, the lower temperature and the rapid formation of alkaline oxidation residue provide the thermodynamic conditions for dephosphorization, so the early slag has a strong dephosphorization ability. In the middle blowing stage, silicon and manganese have mostly been oxidized out, and the bath temperature has risen to more than 1500 ℃; the intense oxidation of carbon begins. In this phase the decarburization rate is so high that most of the oxygen blown into the molten pool is consumed by the decarburization reaction. At the end of blowing, as decarburization proceeds, the carbon content in the molten steel decreases and the decarburization rate also decreases. The basicity of the slag determines the slagging rate. The blowing process lasts only about ten minutes, during which an appropriate alkalinity must be ensured so as to eliminate
effectively phosphorus, sulfur and other elements and to improve the efficiency of oxygen decarburization. The bath temperature and the amount of coolant also affect the carbon-oxygen reaction: a high bath temperature intensifies the reaction and improves the efficiency of oxygen decarburization. Thus, in the main blowing stage the factors that affect the oxygen decarburization efficiency are the carbon content in hot metal, the target carbon content, the temperature of molten iron, the target temperature, the silicon content in hot metal, the manganese content in hot metal, the phosphorus content in hot metal, the dolomite, the lime, the iron ball, the molten iron and the scrap steel. In the secondary blowing stage the carbon-oxygen reaction dominates, so the factors affecting the oxygen decarburization efficiency of the dynamic model are only the carbon content and temperature measured by the vice-gun.
2.1.2 The Determination of the Solution Attribute
The traditional CBR process usually takes the target directly as the solution attribute. However, analysis of the actual data shows that the actual oxygen decarburization efficiency and the blowing oxygen are strongly related. To predict the blowing oxygen, the definition of the oxygen decarburization efficiency is first given: the oxygen decarburization efficiency η of a stage is defined as the ratio of the oxygen consumed by carbon oxidation in the bath to the actual amount of oxygen:
$$\eta = \frac{Q(C)}{Q} \times 100\%, \qquad (1)$$
where $Q(C)$ is the oxygen consumed by decarburization in the melt pool and $Q$ is the actual blowing oxygen.
The traditional formula for calculating the blowing oxygen is [7]

$$Q_{\mathrm{static}} = \frac{\left[\,w(C)_{\mathrm{iron}} \times \beta - w(C)_{\mathrm{aim}}\,\right] \times \mu}{\eta_{\mathrm{static}}} \times W. \qquad (2)$$
According to equation (2), the formula for calculating the dynamic blowing oxygen is

$$Q_{\mathrm{dynamic}} = \frac{\left[\,w(C)_{\mathrm{sub}} - w(C)_{\mathrm{aim}}\,\right] \times \mu}{\eta_{\mathrm{dynamic}}} \times W, \qquad (3)$$

where $Q_{\mathrm{static}}$ is the total blowing oxygen (static model), $Q_{\mathrm{dynamic}}$ is the blowing oxygen of the auxiliary blowing phase (dynamic model), $W$ is the total load, and $\eta_{\mathrm{static}}$ is the oxygen decarburization efficiency of the static model.
$\eta_{\mathrm{dynamic}}$ is the oxygen decarburization efficiency of the dynamic model, $\mu = 22.4/(2 \times 12) = 0.933$, and $\beta$ is the proportion of hot metal:

$$\beta = \frac{W_{\mathrm{iron}}}{W_{\mathrm{iron}} + W_{\mathrm{scrap}}} \times 100\%. \qquad (4)$$
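Once η has been predicted, Eqs. (2)–(4) amount to simple arithmetic. The following is a minimal sketch under our own assumptions (hypothetical function names; carbon contents as mass fractions and loads in consistent units):

```python
MU = 22.4 / (2 * 12)   # = 0.933, oxygen volume per unit of carbon removed

def beta(w_iron, w_scrap):
    # Eq. (4): proportion of hot metal in the total load
    return w_iron / (w_iron + w_scrap)

def q_static(c_iron, c_aim, eta_static, w_iron, w_scrap):
    # Eq. (2): total blowing oxygen predicted by the static model
    total_load = w_iron + w_scrap
    return (c_iron * beta(w_iron, w_scrap) - c_aim) * MU / eta_static * total_load

def q_dynamic(c_sub, c_aim, eta_dynamic, total_load):
    # Eq. (3): blowing oxygen of the auxiliary stage from the vice-gun reading
    return (c_sub - c_aim) * MU / eta_dynamic * total_load
```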
Practical production shows that accurate judgment and estimation of the oxygen decarburization efficiency is necessary to calculate the blowing oxygen and achieve the targets. Generally the oxygen decarburization efficiency is judged from production experience and taken as 0.70 to 0.75. However, analysis shows that the oxygen decarburization efficiency in actual data often falls outside this experience range: it is affected by different factors at each stage, and there are interactions between the various factors. Therefore the oxygen decarburization efficiency is chosen as the solution attribute of case-based reasoning.
2.2 Case Retrieval
Case retrieval is the key step of case-based reasoning, through which several cases similar to the current problem are obtained; the solutions of the retrieved similar cases are then used for case reuse.
2.2.1 The Traditional Method of Case Retrieval
Case retrieval methods include the nearest neighbor method, knowledge seeker, and the induction index method [8]. According to the characteristics of continuous data, the traditional method uses the nearest neighbor method. First the local similarity is calculated:
$$sim_{ij} = 1 - \frac{\left|x_{ij} - x_j'\right|}{\max_i\left(x_{ij} - x_j'\right) - \min_i\left(x_{ij} - x_j'\right)}. \qquad (5)$$
Then the global similarity is calculated based on the local similarities. As the attribute values in all cases are continuous variables, the following formula is used:

$$Sim_i = \sum_{j=1}^{m} \Big( w_j \sum_{i=1}^{n} sim_{ij} \Big). \qquad (6)$$
2.2.2 The Determination of the Weights Based on Mutual Information
Currently the standard deviation of the similarity is used to calculate the weights: the smaller the differences of an attribute across cases, the smaller its role in distinguishing the similarity of the cases, and the smaller the weight it is given; conversely, a larger weight is given. The formula for $w_j$ is as follows:
$$w_j = \frac{\sigma_j}{\sum_{j=1}^{m} \sigma_j}, \qquad (7)$$

$$\sigma_j = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(sim_{ij} - \overline{sim}_j\right)^2}, \qquad (8)$$
where $\overline{sim}_j$ is the mean similarity between the current case and the cases in the case base on attribute $j$. A difficulty in the traditional CBR approach is the choice of feature weights. Usually the choice is based on the standard deviation of the local similarity: the larger the standard deviation, the greater the corresponding feature weight. However, in practical applications the standard deviation of the local similarities of a single attribute cannot determine that attribute's contribution to the overall similarity, and the result can deviate greatly from reality. Because the data are collected in an industrial field and contain much redundant information, the data density decreases as the feature dimension increases; it is therefore not well justified that attributes with a smaller standard deviation of local similarities contribute less to the control object. In the study of the actual data it was found that, although sometimes the standard deviations of the local similarity of each attribute were large, this had no impact on the prediction of the control object; and sometimes, although the standard deviations of the local similarities of individual attributes were very close, the predictions of the control object differed greatly. The amount of information between the input attributes and the control object needs to be considered. Mutual information can be defined based on entropy theory as follows:
$$I(X, Y) = H(X) + H(Y) - H(X, Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X), \qquad (9)$$

where $H(X)$, $H(X \mid Y)$ and $H(X, Y)$ are defined as follows:

$$H(X) = -\sum_i p_x(x_i) \log\left(p_x(x_i)\right), \qquad (10)$$

$$H(X \mid Y) = -\sum_i p_{x,y}(x_i) \log\left(p_{x|y}(x_i \mid y_i)\right), \qquad (11)$$

$$H(X, Y) = -\sum_{i,j} p_{xy}(x_i, y_j) \log\left(p_{xy}(x_i, y_j)\right). \qquad (12)$$
Mutual information is used as the basis for determining the attribute weights, which are calculated as follows:

$$w_j = \frac{I(x_j, y)}{\sum_{j=1}^{m} I(x_j, y)}. \qquad (13)$$
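Putting Eqs. (5), (6), (13) and (14) together, the retrieval-and-reuse step can be sketched as below. The mutual-information estimator is swapped in from scikit-learn (mutual_info_regression); the paper does not specify its estimator, so this choice, like the function names, is an assumption on our part.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def attribute_weights(X, y):
    # Eq. (13): weights proportional to I(x_j, y)
    mi = mutual_info_regression(X, y)
    return mi / mi.sum()

def local_similarity(X, query):
    # Eq. (5): min-max normalized local similarity of each case to the query
    diff = X - query
    span = diff.max(axis=0) - diff.min(axis=0)
    span[span == 0] = 1.0
    return 1.0 - np.abs(diff) / span

def predict_eta(X, y, query, l=5):
    w = attribute_weights(X, y)
    sim = local_similarity(X, query) @ w               # Eq. (6): global similarity
    top = np.argsort(sim)[-l:]                         # the l most similar cases
    return np.dot(sim[top], y[top]) / sim[top].sum()   # Eq. (14): weighted reuse
```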
In this way both the amount of information revealed about the control attribute and the data redundancy contained in the input attributes are fully considered: the smaller the mutual information between the control object and an input attribute, the smaller the weight given, and conversely a greater weight is given. Formula (13) can be substituted into (6) so that the global similarity can be calculated; the results are ordered according to similarity, and the $l$ cases with the greatest similarity to the current case are chosen as similar cases for case reuse in the next step.
2.3 Case Reuse, Modification and Saving
Case reuse proposes the current or final solution for the current problem. With the $l$ most similar cases obtained in the previous section, the solution of the current case can be constructed by the following formula:

$$y = \frac{\sum_{k=1}^{l} Sim_k\, y_k}{\sum_{k=1}^{l} Sim_k}. \qquad (14)$$
The rectification work is handed over to the operators to complete. Case saving is necessary for the normal and effective operation of a case-based reasoning system, since it enriches the cases in the case library. But as production progresses the case library can expand excessively and the reliability of case-based reasoning will decrease, so maintenance is necessary: if the current blowing results meet the technical requirements and the number of stored cases has reached the upper limit, the earliest similar case in the case base is replaced with the current case.
3 Simulations
The predicted oxygen decarburization efficiency of the two stages is shown in Fig. 3.
Fig. 3. Predicted value of oxygen decarburization efficiency compared with the actual value ((a) static phase; (b) dynamic phase)
Fig. 3 shows that the predicted value of the oxygen decarburization efficiency fits the actual value well, which provides a strong guarantee for accurately calculating the blowing oxygen of the two stages. The oxygen decarburization efficiencies of the two stages are substituted into formulas (2) and (3) to calculate the static and dynamic blowing oxygen; the comparison between the actual and forecast blowing oxygen is shown in Fig. 4.
Fig. 4. Deviation between the calculated and actual blowing oxygen ((a) static phase; (b) dynamic phase)
The solid line in Fig. 4 indicates where the calculated value equals the actual value; the points within the dotted lines satisfy the specified absolute-error range (±500 for the static model, ±300 for the dynamic model), and the points within the dashed lines satisfy absolute errors within ±700 and ±400, respectively. Fig. 4 shows the predicted results distributed evenly on both sides of the solid lines, and almost all absolute errors between the calculated and actual values meet the required range. The data before and after the improvement are compared in Table 1.

Table 1. Comparison before and after introducing the oxygen decarburization efficiency

Prediction model           | Static RMSE | Static Accuracy (±500 m³) | Dynamic RMSE | Dynamic Accuracy (±300 m³)
CBR                        | 429.64      | 0.84                      | 195.93       | 0.84
Decarburization efficiency | 362.25      | 0.88                      | 172.41       | 0.92
As seen from Table 1, when the traditional case-based reasoning method is used directly to predict the blowing oxygen, the mean square error and accuracy of the two stages show no obvious improvement over the statistical and intelligent methods (shown in Table 3). However, with the introduction of the blowing oxygen model based on the oxygen decarburization efficiency, the mean square error of both phases decreases significantly, and the accuracy also improves significantly.
Table 2. Comparison before and after introducing the mutual information

Prediction model           | Static RMSE | Static Accuracy (±500 m³) | Dynamic RMSE | Dynamic Accuracy (±300 m³)
Decarburization efficiency | 362.25      | 0.88                      | 172.41       | 0.92
This method                | 328.73      | 0.88                      | 157.70       | 0.94
Further, the case-based reasoning method based on mutual information is introduced to determine the attribute weights, as shown in Table 2. The root mean square error (RMSE) of the static model falls to 328.73, and that of the dynamic model falls to 157.70; the percentage of test sample points whose absolute error is less than 500 m³ in the static model is 88%, while the percentage whose absolute error is less than 300 m³ in the dynamic model is 94%. Compared with traditional methods, this method has an obvious advantage in all indicators. The method is also compared with other methods using the same data, and the results are shown in Table 3.

Table 3. Comparison between this method and the existing methods

Prediction model | Static RMSE | Static Accuracy (±500 m³) | Dynamic RMSE | Dynamic Accuracy (±300 m³)
Regression [2]   | 454.64      | 0.78                      | 242.34       | 0.86
BP [3]           | 423.28      | 0.80                      | 187.70       | 0.90
SVM              | 405.85      | 0.82                      | 190.81       | 0.88
This method      | 328.73      | 0.88                      | 157.70       | 0.94
4 Conclusions
Through mechanism analysis, the factors affecting the blowing oxygen in the two stages of converter steelmaking are determined. Then case-based reasoning with mutual-information-based attribute weights in the case retrieval process is proposed to determine the oxygen decarburization efficiency, and finally the blowing oxygen of the two stages is calculated. Associating the amount of information contained in an attribute with its contribution to case similarity avoids the defect of the traditional case-based reasoning method, which relies solely on the standard deviation of the local similarity to calculate the attribute weights; thus the accuracy of predicting the oxygen decarburization
efficiency is improved. Experimental results show that the accuracy reaches 88% and 94% for the static and dynamic blowing oxygen calculations, respectively, which ensures that molten steel meeting the requirements can be obtained.
Acknowledgements This research was supported by the project (60674073) of the National Nature Science Foundation of China, the project (2006BAB14B05) of the National Key Technology R&D Program of China and the project (2006CB403405) of the National Basic Research Program of China (973 Program). All of these supports are appreciated.
References [1] Valyon, J., Horvath, G.: A sparse robust model for a Linz–Donawitz steel converter. IEEE Transactions on Instrumentation and Measurement 58(8), 2611–2617 (2009) [2] Zhu, G.J., Liang, B.C.: Optimum model of static control on BOF steelmaking process. Steelmaking 15(4), 25–28 (1999) [3] Cox, I.J., Levis, R.W., Ransing, R.S.: Application of neural computing in basic oxygen steelmaking. Journal of Material Processing Technology 120(1), 310–315 (2002) [4] Ullman, S., Basri, R.: Recognition by linear combinations of models. IEEE Transactions on Pattern Analysis and Machine Intelligence 13(10), 992–1006 (1991) [5] Chang, P.C., Liu, C.H., Lai, R.K.: A fuzzy case-based reasoning model for sales forecasting in print circuit board industries. Expert Systems with Applications 34(3), 2049–2058 (2008) [6] Huang, M.J., Chen, M.Y., Lee, S.C.: Integrating data mining with case-based reasoning for chronic diseases prognosis and diagnosis. Expert Systems with Applications 32, 856–867 (2007) [7] Stephane, N., Marc, L.L.J.: Case-based reasoning for chemical engineering design. Chemical Engineering Research and Design 86, 648–658 (2008) [8] Virkki, T., Reners, G.L.L.: A case-based reasoning safety decision-support tool: nextcase/safety. Expert Systems with Applications 36(7), 10374–10380 (2009)
Off-line Signature Verification Based on Multitask Learning
You Ji, Shiliang Sun, and Jian Jin
Department of Computer Science and Technology, East China Normal University
500 Dongchuan Road, Shanghai 200241, P.R. China
[email protected], {slsun,jjin}@cs.ecnu.edu.cn
Abstract. Off-line signature verification is very important to biometric authentication. This paper presents an effective strategy for off-line signature verification based on multitask support vector machines, which achieves a clear separation between skilled forgeries and genuine signatures. First, the modified direction feature is extracted from the signature's boundary. Second, we use Principal Component Analysis to reduce the dimensions. We then add helpful assistant tasks, chosen from the other tasks, to each person's task, and use multitask support vector machines to build the model. The proposed model is evaluated on the GPDS and MCYT data sets. Our experiments demonstrate the effectiveness of the proposed strategy. Keywords: Off-line Signature Verification, Multitask Learning, Support Vector Machines, Machine Learning.
1 Introduction
Handwritten signatures as a behavioral biometric are widely used in the field of human identification [1]. Off-line signature verification differs from on-line signature verification, which can extract dynamic features such as time, pressure, acceleration and stroke order. Because of the lack of dynamic information, off-line signature verification is more difficult than on-line verification; automated off-line signature verification has thereby been a challenging problem for researchers. In an off-line signature verification system, three kinds of forgery are considered: random, simple and skilled forgeries [2]. A random forgery is usually a signature sample that belongs to a writer different from the signature model. A simple forgery is a signature sample with the same shape as the genuine writer's name. A skilled forgery is a suitable imitation of the genuine signature model. Because skilled forgeries are very similar to genuine signatures, it is difficult to distinguish them from genuine signatures [2]. To overcome this difficult problem, global shape features based on the projection of the signature and local grid features are proposed in [3]. Local features such as stroke and sub-stroke are discussed in [4]. Structural features [5,6] are also used to improve the accuracy of off-line signature verification. The
Gradient, Structural and Concavity (GSC) features [7] are also successful in off-line signature verification. Various classifiers, such as Neural Networks [8], Support Vector Machines (SVMs), or threshold decision methods, are employed, and effective models such as fuzzy models [9] and Hidden Markov Models [10] are used to improve the accuracy of forgery detection. So far, off-line signature verification is still an open problem which needs more effort to address [2]. In this paper, we mainly aim at designing a multitask learning strategy that can solve the problem of distinguishing skilled forgeries from genuine signatures. If we take each person's signature verification as a single task, some common information may be shared between tasks; if we take one person's signature verification as the target task, we may add other people's signature verification as assistant tasks to help improve the classification accuracy. Because of the sharing of common information between tasks, the generalization of the classifiers is greatly improved. The rest of the paper is organized as follows. Section 2 discusses feature extraction and dimensionality reduction. Section 3 introduces multitask SVMs. The effective strategy is proposed in Section 4. Besides reporting experimental results, Section 5 also includes their analysis. Finally, Section 6 presents our conclusions and future work.
2 Feature Extraction
Feature extraction plays an important role in a signature verification system; a suitable feature extraction method will improve the classification accuracy. Because of the effectiveness of the Modified Direction Feature (MDF) [6,11,12], we use it as our feature extraction method. Here we only introduce it briefly; details may be found in [6,11,12]. The main steps of feature extraction are shown in Table 1. This strategy is based on two other feature extraction techniques, the Direction Feature (DF) [11,12] and the Transition Feature (TF) [11,12]: DF mainly aims at the extraction of direction features, while TF records the locations of the transitions between foreground and background in the signature's boundary. Reducing the dimensions of the feature vectors extracted by MDF is an essential part of signature verification. We use Principal Component Analysis (PCA) [13] to reduce the dimensions, obtaining a vector with lower dimensionality.

Table 1. MDF Feature Extraction Strategy
Input: Gray image
1. Image preprocessing.
2. Retrieve the boundary of the image.
3. Replace foreground pixels with direction numbers.
4. Extract location and direction features in pairs.
3 Multitask Support Vector Machines
If we take each people’s signature verification as a single task, some common information between tasks may be found by multitask learning [14,15]. Because of the success of regularized multitask learning [14], we use multitask SVMs as our classifiers. In order to apply multitask, we consider the following setup. We assume that there are T persons’ signatures. Everyone have two categories, genuine signatures and skilled forgeries. And we assume all the data for the tasks come from the same distribution X × Y . For simplicity, we take X ⊂ Rd , and Y ⊂ {−1, 1} (-1 stands for skilled forgeries and 1 stands for genuine signatures). For the t person’s signature verification, the n samples {(xt1 , yt1 ) , (xt2 , yt2 ) , . . . , (xtn , ytn )} ⊂ Pt
(1)
are get from the distribution Pt on X × Y . So, all samples are shown below. {{(xt1 , yt1 ) , . . . , (xtn , ytn )} , . . . , {(xT 1 , yT 1 ) , . . . , (xT n , yT n )}}
(2)
For single-task SVMs, one tries to find a hyperplane that maximizes the margin between the two classes, that is, $f_t(x) = w_t \cdot x$ ($t$ indexes the $t$-th person's learning task, and $\cdot$ stands for the dot product); kernel methods can extend this to nonlinear models. Because the $T$ tasks are sampled from the same distribution, we may assume all $w_t$ are close to a mean function $w_0$. Then $w_t$ for all $t \in \{1, \ldots, T\}$ can be written as

$$w_t = w_0 + v_t, \qquad (3)$$

where $w_0$ stands for the information shared among the $T$ signature verification tasks. To this end, we have to solve the following optimization problem [14], which is analogous to single-task SVMs:

$$\min_{w_0, v_t, \varepsilon_{it}} J(w_0, v_t, \varepsilon_{it}) := \sum_{t=1}^{T} \sum_{i=1}^{n} \varepsilon_{it} + \frac{\lambda_1}{T} \sum_{t=1}^{T} \|v_t\|^2 + \lambda_2 \|w_0\|^2$$
$$\text{subject to } y_{it}(w_0 + v_t) \cdot x_{it} \geq 1 - \varepsilon_{it}, \quad \varepsilon_{it} \geq 0, \quad \forall i \in \{1, \ldots, n_t\}, \ \forall t \in \{1, \ldots, T\}. \qquad (4)$$

With the notion of a feature map [16], we may rewrite the problem as follows. Let

$$\Phi(x, t) = \Big(\frac{x_t}{\sqrt{u}}, \underbrace{0, \ldots, 0}_{t-1}, x_t, \underbrace{0, \ldots, 0}_{T-t}\Big), \quad u = \frac{T \lambda_2}{\lambda_1}, \quad w = (\sqrt{u}\, w_0, v_1, \ldots, v_T). \qquad (5)$$

Then we may get

$$w \cdot \Phi(x, t) = (w_0 + v_t) \cdot x_t, \qquad \|w\|^2 = u \|w_0\|^2 + \sum_{t=1}^{T} \|v_t\|^2. \qquad (6)$$
Solving the multitask SVM problem (4) is equivalent to solving a standard SVM that uses the following linear multitask kernel:

$$K\big(\Phi(x, s), \Phi(z, t)\big) = \Big(\frac{1}{u} + \delta_{st}\Big)\, x \cdot z, \qquad \delta_{st} = \begin{cases} 0, & s \neq t \\ 1, & s = t \end{cases} \qquad (7)$$

Thus the multitask SVM can be solved through multitask kernels, which may replace regular kernels in an SVM toolbox such as LibSVM [17]. More details about nonlinear multitask kernels are given in [14].
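The linear multitask kernel of Eq. (7) can be plugged into any SVM toolbox that accepts a precomputed Gram matrix. The following sketch uses scikit-learn rather than LibSVM directly, and the function names are our own assumptions:

```python
import numpy as np
from sklearn.svm import SVC

def multitask_gram(X1, t1, X2, t2, u=1.0):
    # Eq. (7): K((x, s), (z, t)) = (1/u + delta_st) * (x . z)
    delta = (t1[:, None] == t2[None, :]).astype(float)
    return (1.0 / u + delta) * (X1 @ X2.T)

def fit_multitask_svm(X, tasks, y, u=1.0, C=1.0):
    # X: (n, d) features; tasks: (n,) task ids; y: (n,) labels in {-1, +1}
    K = multitask_gram(X, tasks, X, tasks, u)
    return SVC(C=C, kernel="precomputed").fit(K, y)

# Predictions for new samples (Xq, tq):
#   clf.predict(multitask_gram(Xq, tq, X, tasks, u))
```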
4 The Strategy for Signature Verification
For simplicity, we treat simple forgeries as skilled forgeries. Signature verification then splits into two classification problems: one is classification between skilled forgeries and genuine signatures, and the other is classification between random forgeries and genuine signatures. Here we focus on the first problem. Although the experiments in [14] demonstrate the usefulness of the multitask learning strategy, we may get worse results when we put all the tasks together and learn them jointly, because not all tasks are sampled from the same distribution. If we want to get better results than single-task learning, we have

Table 2. Algorithm to Find Helpful Assistant Tasks
Input: integer i, the index of the target task; integer T, the number of tasks.
Output: set A, containing the i-th task and its helpful assistant tasks.
Functions: SVMs denotes multitask SVMs; if the input is a single task, it acts as single-task SVMs.
A = {i}
while (1)
    MaxAccuracy = SVMs(A), HelpTask = 0
    for j = 1 to T
        S = A ∪ {j}
        Accuracy = SVMs(S)
        if (Accuracy > MaxAccuracy)
            MaxAccuracy = Accuracy; HelpTask = j
        end if
    end for
    if (MaxAccuracy > SVMs(A))
        A = A ∪ {HelpTask}
    else
        break
    end if
end while
to add some helpful assistant tasks to the target task, assuming these tasks are sampled from the same distribution. How to find helpful assistant tasks is described by the pseudo code in Table 2. The main idea of this algorithm is to keep adding other persons' signature verification tasks as assistant tasks until the maximum accuracy on the training data set is reached. In the end, each task obtains some helpful assistant tasks, and we then handle these tasks together with multitask SVMs.
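A direct transcription of Table 2 into Python might look as follows; evaluate(A) is a placeholder for training a multitask SVM on the task set A and returning its training accuracy (the paper does not fix this interface, so it is an assumption):

```python
def find_assistant_tasks(i, T, evaluate):
    """Greedy forward selection of helpful assistant tasks for target task i."""
    A = {i}
    while True:
        best_acc, best_j = evaluate(A), None
        for j in range(T):                 # task ids assumed to be 0 .. T-1
            if j in A:
                continue
            acc = evaluate(A | {j})
            if acc > best_acc:
                best_acc, best_j = acc, j
        if best_j is None:                 # no task improves accuracy further
            return A
        A.add(best_j)
```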
5 Experiments
Experiments are carried out on the GPDS [18] and MCYT [19] data sets, respectively.
5.1 The Evaluation of Experiments
Error Rate (ER), False Acceptance Rate (FAR) and False Rejection Rate (FRR) are three parameters used for measuring the performance of any signature verification method. They are calculated as follows:

$$ER = \frac{\text{number of wrongly predicted images}}{\text{number of images}} \times 100\%, \quad FAR = \frac{\text{number of forgeries accepted}}{\text{number of forgeries tested}} \times 100\%, \quad FRR = \frac{\text{number of genuine signatures rejected}}{\text{number of genuine signatures}} \times 100\%. \qquad (8)$$

We compute the Average ER (AER), Average FAR (AFAR) and Average FRR (AFRR) over the $T$ persons by

$$AER = \frac{1}{T}\sum_{t=1}^{T} ER_t, \qquad AFAR = \frac{1}{T}\sum_{t=1}^{T} FAR_t, \qquad AFRR = \frac{1}{T}\sum_{t=1}^{T} FRR_t. \qquad (9)$$
For comparison, the four kernel methods in Table 3 are used. In our experiments, we use cross validation to choose the parameters for each task.
5.2 Experiments on GPDS Database
Details about the GPDS Database. The GPDS database contains images from 300 signers. Each signer has 24 genuine signatures and 30 skilled forgeries. The 24 genuine specimens were produced in a single-day writing session. The skilled forgeries were produced from the static image of a genuine signature chosen randomly from the 24 genuine signatures. Forgers were allowed to practice the signature for as long as they wished. Each forger imitated 3 signatures of 5 signers in a single-day writing session, so for each genuine signature there are 30 skilled forgeries made by 10 forgers from 10 different genuine specimens. After using MDF to extract features, we employ PCA to reduce the dimensions from 2400 to 113. For every person, we randomly choose 12 genuine signatures and 15 skilled forgeries for training. The images
Table 3. Common Kernels

Kernel Function | Formula
Linear          | $(x_i \cdot x_j)$
Polynomial      | $(x_i \cdot x_j + 1)^p$
RBF             | $e^{-\|x_i - x_j\|^2 / 2\sigma^2}$
Sigmoid         | $\tanh(k\, x_i \cdot x_j - \delta)$
Table 4. Comparison between Single-task SVMs and Multitask SVMs

Kernel     | AER(%) SSVMs | AER(%) MSVMs | AFAR(%) SSVMs | AFAR(%) MSVMs | AFRR(%) SSVMs | AFRR(%) MSVMs
Linear     | 20.30        | 6.72         | 15.46         | 4.67          | 26.36         | 9.29
Polynomial | 15.46        | 4.67         | 20.00         | 7.40          | 25.81         | 10.04
RBF        | 26.36        | 9.29         | 25.81         | 10.04         | 29.33         | 13.13
Sigmoid    | 20.00        | 7.40         | 11.92         | 5.56          | 25.89         | 8.53
left (12 genuine signatures and 15 skilled forgeries) are used for testing. We obtain the average evaluation parameters (AER, AFAR, AFRR) by repeating the procedure 10 times. For simplicity, we let $u$ in (7) equal 1, although better results might be obtained if cross validation were used to choose this parameter. Table 4 shows that our strategy is much better than the single-task SVM solution. For instance, with single-task SVMs the smallest AER is 15.46%, but we obtain better results (4.67%) with multitask SVMs after finding and adding helpful assistant tasks to the target task. For all kernels used in our experiments, all evaluation parameters achieve better results than the single-task SVM results.
5.3 Experiments on MCYT Database
Details about the MCYT Database. To evaluate the effectiveness of our method, a sub-corpus of the larger MCYT bimodal database containing 2250 images is used in our experiments. Each person has 15 genuine signatures and 15 forgeries contributed by three different user-specific forgers. The 15 genuine signatures were acquired at different times (between three and five) of the same acquisition session; at each time, between one and five signatures were acquired consecutively [2]. After employing PCA to reduce the dimensions of the features extracted by MDF, the dimensionality is reduced from 2400 to 102. For every person, we randomly choose 7 genuine signatures and 7 skilled forgeries for training; the remaining images (8 genuine signatures and 8 skilled forgeries) are used for testing. We obtain the average evaluation parameters (AER, AFAR, AFRR) by repeating the procedure 10 times. For simplicity, we let $u$ in (7) equal 1.
Table 5. Comparison between Single-task SVMs and Multitask SVMs

Kernel     | AER(%) SSVMs | AER(%) MSVMs | AFAR(%) SSVMs | AFAR(%) MSVMs | AFRR(%) SSVMs | AFRR(%) MSVMs
Linear     | 15.26        | 4.22         | 20.20         | 4.29          | 10.32         | 3.35
Polynomial | 20.20        | 4.29         | 15.50         | 4.75          | 11.17         | 3.87
RBF        | 10.32        | 3.35         | 11.17         | 3.87          | 14.58         | 9.33
Sigmoid    | 15.50        | 4.75         | 16.37         | 5.85          | 13.08         | 10.00
Table 5 shows that the proposed method outperforms the single-task SVM methods. For instance, with single-task SVMs the smallest AER is 10.32%, but we obtain better results (3.35%) with multitask SVMs after finding and adding helpful assistant tasks to the target task. For all kernels used in our experiments on the MCYT corpus, all evaluation parameters achieve better results than the single-task SVM results.
6 Conclusion and Future Work
This paper mainly aims at classification between skilled forgeries and genuine signatures, which is a difficult problem in signature verification. We build our model based on multitask learning and propose an effective algorithm to select helpful assistant tasks for the main task. Experiments on GPDS and MCYT demonstrate that the proposed methods achieve favorable verification accuracy. Verification between random forgeries and genuine signatures will be addressed in future work, which will also focus on classifier combination, feature selection and active learning for signature verification. Acknowledgments. This work is supported in part by the National Natural Science Foundation of China under Project 61075005, 2011 Shanghai Rising-Star Program, and the Fundamental Research Funds for the Central Universities.
References 1. Fierrez, J., Ortega, G.J., Ramos, D., Gonzalez, R.J.: HMM-based on-line Signature Verification: Feature Extraction and Signature Modeling. Pattern Recognition Letters 28, 2325–2334 (2007) 2. Wen, J., Fang, B., Tang, Y., Zhang, T.P.: Model-based Signature Verification with Rotation Invariant Features. Pattern Recognition 42, 1458–1466 (2009) 3. Qi, Y., Hunt, B.R.: Signature Verification Using Global and Grid Features. Pattern Recognition 27, 1621–1629 (1994) 4. Fang, B., Leung, C.H., Tang, Y.Y., Tse, K.W., Kwok, P.C.K., Wong, Y.K.: Offline Signature Verification by the Tracking of Feature and Stroke Positions. Pattern Recognition 36, 91–101 (2003)
5. Huang, K., Yan, H.: Off-line Signature Verification Using Structural Feature Correspondence. Pattern Recognition 35, 2467–2477 (2002) 6. Nguyen, V., Blumenstein, M., Leedham, G.: Global Features for the Off-Line Signature Verification Problem. In: International Conference on Document Analysis and Recognition, pp. 1300–1304 (2009) 7. Kalera, M.K., Srihari, S., Xu, A.: Off-line Signature Verification and Identification Using Distance Statistics. International Journal of Pattern Recognition and Artificial Intelligence 18, 1339–1360 (2004) 8. Armand, S., Blumenstein, M., Muthukkumarasamy, V.: Off-line Signature Verification Using the Enhanced Modified Direction Feature and Neural-based Classification. In: International Joint Conference on Neural Networks, pp. 684–691 (2006) 9. Madasu, H., Yusof, M.H.M., Madasu, V.K.: Off-line Signature Verification and Forgery Detection Using Fuzzy Modeling. Pattern Recognition 38, 341–356 (2005) 10. Coetzer, J., Herbst, B.M., Preez, J.A.: Offline Signature Verification Using the Discrete Radon Transform and a Hidden Markov Model. EURASIP Journal on Applied Signal Processing 2004, 559–571 (2004) 11. Blumenstein, M., Liu, X.Y., Verma, B.: An Investigation of the Modified Direction Feature for Cursive Character Recognition. Pattern Recognition 40, 376–388 (2007) 12. Blumenstein, M., Liu, X.Y., Verma, B.: A Modified Direction Feature for Cursive Character Recognition. In: International Joint Conference on Neural Networks, vol. 4, pp. 2983–2987 (2005) 13. Esbensen, K., Geladi, P., Wold, S.: Principal Component Analysis. Chemometrics and Intelligent Laboratory Systems 2, 37–52 (1987) 14. Evgeniou, T., Pontil, M.: Regularized Multi-task Learning. In: International Conference on Knowledge Discovery and Data Mining, pp. 109–117 (2004) 15. Sun, S.: Multitask Learning for EEG-based Biometrics. In: Proceedings of the 19th International Conference on Pattern Recognition, pp. 1–4 (2008) 16. Vapnik, V.N., Vapnik, V.: Statistical Learning Theory, vol. 2 (1998) 17. Chang, C.C., Lin, C.J.: LIBSVM: a Library for Support Vector Machines (2001) 18. Vargas, J.F., Ferrer, M.A., Travieso, C.M., Alonso, J.B.: Off-line Handwritten Signature GPDS-960 Corpus. In: International Conference on Document Analysis and Recognition, vol. 2, pp. 764–768 (2007) 19. Ortega, G.J., Fierrez, A.J., Simon, D., Gonzalez, J., Faundez, Z.M., Espinosa, V., Satue, A., Hernaez, I., Igarza, J.J., Vivaracho, C.: MCYT Baseline Corpus: a Bimodal Biometric Database. Vision Image and Signal Processing 150, 395–401 (2004)
Modeling and Classification of sEMG Based on Instrumental Variable Identification*
Xiaojing Shang1, Yantao Tian1,2, and Yang Li1
1 School of Communication Engineering, Jilin University, Changchun
{130025,312511605}@qq.com
2 Key Laboratory of Bionic Engineering, Ministry of Education, Jilin University
Abstract. sEMG is a biological signal produced by muscles. According to the characteristics of the myoelectric signal, a single-input multiple-output FIR model is proposed in this paper. Because the input of the model is unknown, instrumental-variable blind identification is used to identify the model's transfer functions. The parameters of the model are used as the input of a neural network to classify six types of forearm motion: extension of the thumb, extension of the wrist, flexion of the wrist, fist grasp, side flexion of the wrist, and extension of the palm. The experimental results demonstrate that this method has better classification accuracy than the classical AR method. Keywords: sEMG; Instrumental variable; Blind identification; Probabilistic neural networks; Pattern recognition.
1 Introduction
sEMG [1] is obtained from neuromuscular activity and reflects the function and status of nerves and muscles. sEMG can be used to identify different movement patterns, because different actions correspond to different sEMG signals. Therefore, how to extract features of sEMG effectively and accurately is the key to controlling an artificial limb with myoelectric signals. At present, analysis and modeling of the sEMG signal is rare at home and abroad. In order to make full use of the correlation between channels, Doerschuk [2] proposed a four-channel multivariate AR model based on the original AR model proposed by Graupe; better recognition was obtained by using the model parameters as the input of a classifier, but owing to its computational complexity this method is not suitable for real-time implementation. Therefore, Zhizhong Wang et al. [3] proposed an IIR model of sEMG, which is computationally simple and easy to implement online, and thus has better prospects.
This paper is supported by the Key Project of Science and Technology Development Plan of Jilin Province (Grant No.20090350), Chinese College Doctor Special Scientific Research Fund (Grant No.20100061110029) and the Jilin University "985 project" Engineering Bionic Sci. & Tech. Innovation Platform.
In this paper, a two-channel single-input multiple-output FIR model of sEMG is proposed after analyzing the myoelectric signals. The whole sEMG pathway has only zeros, so it is simpler to compute than an all-pole IIR model. Because the input of the model is unknown, blind identification is proposed in this paper to identify the model's transfer functions. It is difficult to determine the form of the noise, which would create difficulties for modeling; therefore, a blind identification method based on instrumental variables, which does not need to know the specific form of the noise, can identify the transfer-function parameters easily and quickly. These parameters are used as the input of a neural network to recognize six types of forearm motion: extension of thumb, extension of wrist, flexion of wrist, fist grasp, side flexion of wrist, and extension of palm. The experimental results show that this approach achieves good results.
2 Modeling of Two-Channel Electromyography Signals

EMG is under the control of the central nervous system [4]. In this paper, the whole process generating the EMG is treated as an FIR system. The input is an electric pulse sequence produced by the motor units; conduction along the muscle fibers and the filtering of the skin and electrodes act as the transfer function H of the FIR system; the output is then the EMG that we acquire. Figure 1 describes the FIR model.
Fig. 1. Model of two-channel EMG signal
Here $u$ is the excitation signal produced by the central nervous system; $H^{(1)}$ and $H^{(2)}$ are the transfer functions of the muscle fibers' transmission channels; $x$ is the system output; $N$ is the observation noise; and $y$ is the actual EMG signal.
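As a concrete illustration, the following minimal sketch simulates this two-channel structure; the excitation, the FIR coefficients, and the noise level are hypothetical stand-ins rather than values measured in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(600)            # unknown neural excitation (assumed white here)
h1 = np.array([1.0, -2.7, 3.8, -3.6, 2.4, -1.0, 0.3])   # hypothetical b1(0..6), b(0) = 1
h2 = np.array([1.0, -4.0, 5.1, -5.0, 5.9, -3.3, 0.8])   # hypothetical b2(0..6)

x1 = np.convolve(u, h1)[: len(u)]       # x(1) = H(1)(z) u(t)
x2 = np.convolve(u, h2)[: len(u)]       # x(2) = H(2)(z) u(t)
y1 = x1 + 0.05 * rng.standard_normal(len(u))   # y(1) = x(1) + N(1)
y2 = x2 + 0.05 * rng.standard_normal(len(u))   # y(2) = x(2) + N(2)
```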
3 Instrumental Variable Blind Identification

Blind identification is a basic parameter-identification method that relies only on the system output data, without directly depending on the input signal, to estimate the system parameters. Blind identification is receiving more and more attention in the fields of communication and signal processing, where it has produced a series of techniques and methods, and blind identification theory also has important applications in seismic signal analysis and image processing [5]. In this paper, a two-channel sEMG model is proposed. The input of the model is unknown and unpredictable, and only the output can be measured, so the identification problem is not the same as traditional system identification, in which both the input and output
signals are known. Therefore, the blind channel identification method is used to establish the muscle-signal model and recognize actions. A SIMO FIR system is applied here, where $y(t) = [y_1(t), y_2(t), \ldots, y_m(t)]^T$ are the output signals, $u$ is the input, $N(t) = [N_1(t), N_2(t), \ldots, N_m(t)]^T$ are uncorrelated random noise vectors, and $H$ is the transfer function:

$$H^{(i)}(z) = b_i(0) + b_i(1)z^{-1} + b_i(2)z^{-2} + \cdots + b_i(n_i)z^{-n_i} \qquad (1)$$

where $b_i(l)$ is the system parameter that needs to be identified. From Fig. 1, we can obtain the relation between input and outputs:

$$y^{(i)} = H^{(i)}(z)u(t) + N^{(i)}(t), \quad i = 1, 2 \qquad (2)$$

Because the input $u$ is unknown, for any non-zero constant $\alpha$ we can write

$$y^{(i)} = \left[\frac{1}{\alpha} H^{(i)}(z)\right](\alpha u(t)) + N^{(i)}(t), \quad i = 1, 2 \qquad (3)$$

Equations (2) and (3) have the same output; that is, no method can identify the transfer function $H$ uniquely. In order to recognize $H$ uniquely, following the literature [6], we can first normalize the transfer function or the input signal, letting the norm of the unknown coefficients of $H$ be 1, i.e., $\sum_{l=0}^{n_i} b_i^2(l) = 1$. Alternatively, we can suppose that the transfer functions are monic, i.e., $b_i(0) = 1,\ i = 1, 2$; this assumption is adopted in this paper. From Fig. 1 we obtain:

$$y^{(1)}(t) - N^{(1)}(t) = H^{(1)}(z)u(t), \qquad y^{(2)}(t) - N^{(2)}(t) = H^{(2)}(z)u(t) \qquad (4)$$
Dividing the two equations by each other, we get:

$$\frac{y^{(1)}(t) - N^{(1)}(t)}{y^{(2)}(t) - N^{(2)}(t)} = \frac{H^{(1)}(z)}{H^{(2)}(z)} \qquad (5)$$

Through cross multiplication,

$$H^{(1)}(z)\left(y^{(2)}(t) - N^{(2)}(t)\right) = H^{(2)}(z)\left(y^{(1)}(t) - N^{(1)}(t)\right) \qquad (6)$$
Then, writing the above equation in difference-equation form:

$$y^{(2)}(t) + \sum_{l=1}^{n_1} b_1(l)y^{(2)}(t-l) = y^{(1)}(t) + \sum_{l=1}^{n_2} b_2(l)y^{(1)}(t-l) - N^{(1)}(t) - \sum_{l=1}^{n_2} b_2(l)N^{(1)}(t-l) + N^{(2)}(t) + \sum_{l=1}^{n_1} b_1(l)N^{(2)}(t-l) \qquad (7)$$

Define the vectors

$$\lambda^T = \left[-y^{(2)}(t-1), -y^{(2)}(t-2), \ldots, -y^{(2)}(t-n_1),\; y^{(1)}(t-1), y^{(1)}(t-2), \ldots, y^{(1)}(t-n_2)\right],$$
$$\sigma^T = \left[b_1(1), b_1(2), \ldots, b_1(n_1),\; b_2(1), b_2(2), \ldots, b_2(n_2)\right],$$
$$e(t) = N^{(2)}(t) + \sum_{l=1}^{n_1} b_1(l)N^{(2)}(t-l) - N^{(1)}(t) - \sum_{l=1}^{n_2} b_2(l)N^{(1)}(t-l) \qquad (8)$$

Then equation (7) can be rewritten as:

$$y^{(2)}(t) - y^{(1)}(t) = \lambda^T \sigma + e(t) \qquad (9)$$
It can be seen from the above equations that the noise $e(t)$ is a superposition of the two channels' noise, and its specific form cannot be determined in the course of the analysis. Therefore, the instrumental-variable technique, which does not need to know the specific form of the noise and simplifies the identification process, can be used here. The traditional instrumental-variable method takes the input vector as the instrumental-variable matrix, so the key problem is how to construct this matrix when the input signal is unknown. The choice of instrumental variables should meet certain conditions: the selected instrumental variables should be independent of the noise signals $N^{(1)}(t)$ and $N^{(2)}(t)$, but related to $y^{(1)}(t)$ and $y^{(2)}(t)$. For this system model, the basic idea is as follows: assuming the system orders $n_1, n_2$ are known and the outputs are measured, a linear combination of the two channels' signals is used to construct the instrumental variables and identify the model parameters. We can construct the instrumental-variable matrix according to this idea and derive the instrumental-variable blind identification method. From formula (9):

$$Y = \begin{bmatrix} y^{(2)}(1) - y^{(1)}(1) \\ y^{(2)}(2) - y^{(1)}(2) \\ \vdots \\ y^{(2)}(n) - y^{(1)}(n) \end{bmatrix}, \quad A = \begin{bmatrix} \lambda^T(1) \\ \lambda^T(2) \\ \vdots \\ \lambda^T(n) \end{bmatrix}, \quad E = \begin{bmatrix} e(1) \\ e(2) \\ \vdots \\ e(n) \end{bmatrix}$$

Then formula (9) can be written as

$$Y = A\sigma + E \qquad (10)$$

where $n$ is the length of the data. Assume that there exists a matrix $F$, of the same dimensions as $A$, such that

$$E[F^T E] = \lim_{n\to\infty}[F^T E] = O, \qquad E[F^T A] = \lim_{n\to\infty}[F^T A] = Q \qquad (11)$$

where $Q$ is a non-singular matrix; from this formula we can see that $F$ is uncorrelated with $E$ but correlated with $A$. Multiplying both sides of (10) by $F^T$, the unbiased estimate $\hat\sigma$ of $\sigma$ is obtained:

$$\hat\sigma = (F^T A)^{-1} F^T Y \qquad (12)$$

that is, $E[\hat\sigma] = \sigma$. The traditional instrumental-variable matrix is constructed from the input signal, which is unknown in this paper, so the traditional method cannot be used here. In order to satisfy the conditions of formula (11), signals from either channel can be used to construct the instrumental variables. This paper adopts a linear combination of the two channels' output signals to construct the instrumental-variable matrix; this better satisfies the conditions that $F$ and $A$ are strongly correlated while $F$ and $E$ are independent of each other. Its form is as follows:

$$F = \begin{bmatrix} -y^{(1)}(-n_2) & \cdots & -y^{(1)}(1-n_2-n_1) & y^{(2)}(0) & \cdots & y^{(2)}(1-n_2) \\ -y^{(1)}(1-n_2) & \cdots & -y^{(1)}(2-n_2-n_1) & y^{(2)}(1) & \cdots & y^{(2)}(2-n_2) \\ \vdots & & \vdots & \vdots & & \vdots \\ -y^{(1)}(n-1-n_2) & \cdots & -y^{(1)}(n-n_2-n_1) & y^{(2)}(n-1) & \cdots & y^{(2)}(n-n_2) \end{bmatrix} \qquad (13)$$
With the instrumental-variable matrix $F$ constructed, combining it with formula (12), the unbiased estimate of the model coefficient vector $\hat\sigma$ is easily obtained. From the experiments we found that when $n_1 = n_2 = 6$, the signal waveform produced by the model is similar to the actual one, which means the model coefficients can represent the signal characteristics well. Figure 2 compares actual signals with model output for a randomly selected action and activity period, and Table 1 lists the model coefficients for part of the signals. As Figure 2 shows, the output signals of the model are similar to the actual ones, and experimentally we found that constructing the instrumental variables from a linear combination of the two channels' signals is better than building them from a single channel, which means the instrumental-variable method is feasible for identifying the model coefficients. The data in Table 1 also illustrate that there are large differences among the features of the different motions.
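The estimator of Eq. (12) is straightforward to prototype. The sketch below is a simplified, hypothetical implementation: it stacks the regressor matrix A from the lagged outputs of Eq. (8), and instead of the exact matrix of Eq. (13) it uses p-delayed copies of the same channel combination as instruments, a common practical stand-in that keeps F correlated with A while asymptotically uncorrelated with the noise.

```python
import numpy as np

def iv_blind_identify(y1, y2, n1=6, n2=6):
    """Hypothetical sketch of eq. (12): sigma_hat = (F^T A)^(-1) F^T Y,
    with A stacked from lambda^T(t) of eq. (8) and F built from p-delayed
    outputs as a stand-in for the matrix of eq. (13)."""
    p = max(n1, n2)
    t = np.arange(2 * p, len(y1))                 # leave room for the extra delay in F
    Y = y2[t] - y1[t]                             # left-hand side of eq. (9)
    A = np.column_stack([-y2[t - l] for l in range(1, n1 + 1)]
                        + [y1[t - l] for l in range(1, n2 + 1)])
    F = np.column_stack([-y2[t - p - l] for l in range(1, n1 + 1)]
                        + [y1[t - p - l] for l in range(1, n2 + 1)])
    sigma_hat = np.linalg.solve(F.T @ A, F.T @ Y)
    return sigma_hat[:n1], sigma_hat[n1:]         # estimates of b1(1..n1), b2(1..n2)
```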
Fig. 2. The contrast between actual signal and model signal of any action (signal amplitude in mV against sampled sEMG signal points; actual measured signal vs. model output)

Table 1. Part signal model coefficients

model coefficients | extension of thumb | extension of wrist | fist grasp | side flexion of wrist
b1(1) | -2.7112 | -2.5559 | -2.8729 | -2.9918
b1(2) |  3.8379 |  2.9515 |  4.5583 |  5.0158
b1(3) | -3.6107 | -2.1715 | -4.0603 | -5.1942
b1(4) |  2.3765 |  1.2032 |  3.2487 |  3.7148
b1(5) | -1.0448 | -0.5534 | -1.4935 | -1.7414
b1(6) |  0.2726 |  0.1872 |  0.4253 |  0.4344
b2(1) | -3.9609 |  2.1759 | -2.7531 | -4.1177
b2(2) |  5.1284 |  0.605  |  7.1993 |  6.9812
b2(3) | -5.0144 | -3.5277 | -5.2593 | -6.2875
b2(4) |  5.8978 |  1.6458 |  1.3623 |  3.024
b2(5) | -3.2873 | -1.7778 | -1.0567 | -0.6082
b2(6) |  0.8377 |  0.4889 |  0.2225 |  0.4837
4 Movement Patterns Identification

The probabilistic neural network (PNN) is an important variant of the radial basis function (RBF) network, proposed by Specht in 1990. It has four layers: input layer, mode layer, summation layer, and output layer. The input layer receives a training sample, which can be expressed as $X = (x_1, x_2, \ldots, x_q)^T$; its number of neurons equals the sample vector dimension. The mode layer first calculates the distance between the training sample and the weight vector, $\|X - IW_{ij}\|^2$, and then produces the output vector $M$ through a radial-basis nonlinear mapping, usually a Gaussian function. The output vector $M$ represents the probability that the training sample belongs to each class. The calculation formula is shown in equation (14):

$$M_{ij}(X) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{\|X - IW_{ij}\|^2}{2\sigma^2}\right], \quad i \in \{1, 2, \ldots, n\},\; j \in \{1, 2, \ldots, m\} \qquad (14)$$
where $n$ is the total number of modes, $m$ is the number of neurons in the hidden layer (equal to the number of input samples), and $\sigma$ is the smoothing parameter, whose value determines the width of the bell-shaped curve centered on each sample point. The summation layer computes the weighted sum $S$ of the vectors $M$:

$$S_i(X) = \sum_{j=1}^{N_i} \omega_{ij} M_{ij}(X), \quad \omega_{ij} \in [0,1],\; i \in \{1, 2, \ldots, n\},\; j \in \{1, 2, \ldots, m\} \qquad (15)$$
where $\omega_{ij}$ denotes the mixing weight and satisfies

$$\sum_{j=1}^{N_i} \omega_{ij} = 1, \quad i \in \{1, 2, \ldots, n\},\; j \in \{1, 2, \ldots, m\} \qquad (16)$$

and $N_i$ is the number of neurons of the $i$-th class in the mode layer. The output layer obtains the network output $O$ through the maximum response in $S$:

$$O(X) = \arg\max_i (S_i), \quad i \in \{1, 2, \ldots, n\} \qquad (17)$$

The output of the neuron corresponding to the maximum category is set to 1, and the outputs of the other neurons are set to 0. The parameter of the radial basis function is the spread rate, which has a significant effect on the experiments; it normally needs to be set manually, which complicates the experimental process. Hence, this paper adopts the PSO method to optimize the parameter, which not only simplifies the experimental process but also makes the results more effective [10]. For network training, let $S$ be the objective vector $T$. $T$ is an n-dimensional vector, each component corresponding to a different pattern category, in which one and only one component is 1 and the rest are all 0; this means that the corresponding input vector belongs to the corresponding mode. The structure of the PNN network should be determined first. According to the characteristics of the radial basis function, the number of neurons in the input layer equals the input sample vector dimension, and the number of neurons in the output layer equals the number of classes in the training sample data. The output layer of the network is a competition layer, with each neuron corresponding to one data type.
The designed PNN structure is as follows: the input layer has 4 neurons and the output layer has 6 neurons; the transfer function of the intermediate layer is the Gaussian function, and the transfer function of the output layer is a linear function. This paper collected six kinds of gestures: extension of thumb, extension of wrist, flexion of wrist, fist grasp, side flexion of wrist, and extension of palm, with 10 groups of data for each kind. After establishing the model, the instrumental-variable blind identification method is used to identify the model coefficients. Among the 10 groups, 5 are used for neural network training and the other 5 for testing.
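For reference, a minimal PNN decision rule following Eqs. (14)-(17) can be sketched as below; the equal mixing weights ω_ij = 1/N_i, the value of σ, and the toy data are assumptions for illustration (the paper tunes the spread by PSO).

```python
import numpy as np

def pnn_predict(X_train, y_train, x, sigma=0.1):
    """Mode layer: Gaussian kernel of eq. (14) (the constant factor cancels in
    the arg max); summation layer: class-wise mean, i.e. omega_ij = 1/N_i in
    eq. (15); output layer: arg max of eq. (17)."""
    d2 = np.sum((X_train - x) ** 2, axis=1)        # ||X - IW_ij||^2
    m = np.exp(-d2 / (2.0 * sigma ** 2))           # mode-layer outputs M_ij
    classes = np.unique(y_train)
    s = np.array([m[y_train == c].mean() for c in classes])   # S_i of eq. (15)
    return classes[np.argmax(s)]                   # eq. (17)

# Toy usage: 30 training samples of 12 features (two channels x six coefficients)
rng = np.random.default_rng(1)
X_train = rng.standard_normal((30, 12))
y_train = np.repeat(np.arange(6), 5)               # six gestures, five samples each
print(pnn_predict(X_train, y_train, X_train[0]))
```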
5 Experimental Results

After modeling the two-channel sEMG, the model coefficients are used to classify the movement patterns. First, 5 groups of data for each movement are taken as the training samples of the network; each channel has 6 orders, i.e., 6 characteristic parameters, giving a 12-dimensional feature vector for the two channels. The training data of the 6 movements thus make up a 12 × 30 feature matrix. The other 30 groups of data are then used to test the trained network. Figure 3 shows the feature-space distribution of the 6 gesture actions, with the axes being the second, third, and seventh model coefficients, respectively. Figures 4 and 5 show the network training results and the error charts, and Table 2 compares the two-channel sEMG recognition results.
Fig. 3. Random three feature space distribution (axes: model coefficients b(2), b(3), and b(7))
Fig. 4. Network training rendering and error chart after training network

Fig. 5. Test rendering and error chart after training
Figure 3 indicates that there is good separability between the patterns and that the clustering effect is good; the model coefficients identified by the instrumental-variable method are therefore suitable as multi-channel sEMG features, and the feature extraction is effective. From Figures 4 and 5 we can see that the recognition of the gesture actions by the probabilistic neural network is good: testing the trained network, only 3 of the 30 samples are misclassified, so the recognition accuracy is quite high.

Table 2. The comparison of the double channel sEMG recognition results (correct number / test number)
classification method | extension of thumb | extension of wrist | flexion of wrist | fist grasp | side flexion of wrist | extension of palm
instrumental variable method | 4/5 | 5/5 | 5/5 | 5/5 | 4/5 | 4/5
AR coefficient | 4/5 | 3/5 | 5/5 | 3/5 | 3/5 | 3/5
In order to compare the features obtained by the instrumental-variable method with the traditional AR method, third-order AR coefficients of the two channels were also used as signal features. The recognition results of the two methods are shown in Table 2. It can be seen that the average recognition rate of the instrumental-variable method is 90%, while the average recognition rate with AR coefficients is only 70%. The classification performance of the instrumental-variable method is better than that of the AR method on motions such as extension of wrist and fist grasp. Clearly, using the instrumental-variable method to classify sEMG signals is effective.
6 Conclusion

Based on the characteristics of sEMG, a model for two-channel sEMG signals was built by the instrumental-variable method, and the model coefficients were used as the input of a neural network to classify six gesture movements. Traditional AR coefficients were also used as signal features for comparison. The experimental results show that the features obtained through blind identification modeling outperform the traditional AR coefficients in classification. Instrumental-variable blind identification is often used in the communication and signal-processing fields (multiuser communication systems, multi-sensor radar, sonar systems, microphone arrays, etc.); this paper is an attempt to apply blind identification theory to physiological signals. The experiments show that this method has good prospects in movement pattern recognition.
References

1. Jin, D.W., Wang, R.C.: Artificial Intelligent Prosthesis. Chinese J. Clinical Rehabilitation 5(20), 2994–2995 (2002)
2. Doerschuk, P.C., Gustafson, D.E., Willsky, A.S.: Upper Extremity Limb Function Discrimination Using EMG Signal Analysis. IEEE Transactions on Biomedical Engineering 30(1), 18–29 (1983)
3. Cai, L., Wang, Z., Liu, Y.: Modelling and Classification of Two-Channel Electromyography Signals Based on Blind Channel Identification Theory. Journal of Shanghai Jiao Tong University 34(11), 1468–1471 (2000)
4. Deluca, C.: Physiology and Mathematics of Myoelectric Signals. IEEE Transactions on Biomedical Engineering 26(6), 313–325 (1979)
5. Luo, H., Li, Y.D.: Application of Blind Channel Identification Techniques to Prestack Seismic Deconvolution. Proceedings of the IEEE 86(10), 2082–2089 (1998)
6. Ding, F., Chen, T.: Identification of Hammerstein Nonlinear ARMAX Systems. Automatica 41(9), 1479–1489 (2005)
7. Shang, X., Tian, Y., Li, Y., Wang, L.: The Recognition of Gestures and Movements Based on MPNN. Journal of Jilin University (Information Science Edition) 28(5), 459–466 (2010)
Modeling and Classification of sEMG Based on Blind Identification Theory*
Yang Li 1, Yantao Tian 1,2,**, Xiaojing Shang 1, and Wanzhong Chen 1
1 School of Communication Engineering, Jilin University, Changchun, 130025
2 Key Laboratory of Bionic Engineering, Ministry of Education, Jilin University
[email protected]
Abstract. The surface electromyography signal is non-stationary and susceptible to external interference. For this situation, a cyclostationary input is combined with the inverse nonlinear mapping of the Hammerstein-Wiener model to build a surface electromyography model and to realize blind identification of the discrete nonlinear system. The model parameters are used as the input of an improved BP neural network. The experimental results demonstrate the effectiveness of this approach. Keywords: sEMG, Blind Identification, Hammerstein-Wiener Model.
1 Introduction

sEMG is obtained from neuromuscular activity and reflects the function and status of nerves and muscles. sEMG can be used to identify different movement patterns, because different actions correspond to different sEMG signals. Therefore, sEMG has not only been widely used in clinical medicine and sports medicine, but has also served as the control signal of multi-degree-of-freedom artificial limbs. sEMG is a nonlinear biological signal that is non-stationary and susceptible to external interference [1,2]. Extracting effective features of sEMG to accurately identify different movement patterns is the key to controlling a myoelectric prosthesis. Researchers in sEMG signal processing usually obtain sEMG features with traditional methods, such as time-domain, frequency-domain, and time-frequency-domain methods. The characteristics of sEMG itself have not been studied intensively, so the extracted features are not typical, the computation time is long, and the recognition rates are not high [3,4]. In this paper, a mathematical model of dual-channel forearm sEMG is established based on the characteristics of sEMG. The output obtained by the electrodes is the only measurable signal; the input is unknown and cannot be predicted. Therefore, a Hammerstein-Wiener blind identification method is used to establish the sEMG model. *
This paper is supported by the Key Project of Science and Technology Development Plan for Jilin Province (Grant No. 20090350), the Chinese College Doctor Special Scientific Research Fund (Grant No. 20100061110029), and the Jilin University "985 Project" Engineering Bionic Sci. & Tech. Innovation Platform.
** Corresponding author.
The model parameters are used as the input of an improved BP neural network for multi-pattern recognition. The experiments demonstrate that good results have been achieved.
2 Hammerstein-Wiener Model

The nonlinear discrete Hammerstein-Wiener model is shown in Figure 1.

Fig. 1. Structure of Hammerstein-Wiener model

The linear part is

$$G(z) = \frac{\beta_1 z^{-1} + \beta_2 z^{-2} + \cdots + \beta_n z^{-n}}{1 - \alpha_1 z^{-1} - \alpha_2 z^{-2} - \cdots - \alpha_n z^{-n}} \qquad (1)$$

The linear relationship between input and output can be expressed as

$$x(t) = \sum_{i=1}^{n} \alpha_i x(t-i) + \sum_{i=1}^{n} \beta_i u(t-i) \qquad (2)$$
The nonlinear part is composed of a continuous, invertible mapping, shown as (3):

$$y = f(x) = \sum_{i=1}^{q} l_i s_i(x) \qquad (3)$$

It needs to meet the following conditions:

1) The nonlinearity $f$ is a continuous, invertible mapping whose inverse model $x = f^{-1}(y)$ can be expressed as

$$x = f^{-1}(y) = \sum_{i=1}^{m} \gamma_i p_i(y) \qquad (4)$$

2) If the discrete input is a bounded sequence, the model output is still bounded.

3) The model orders $m, n, q$ are known.

Under these conditions, the blind identification of the nonlinear system can be stated as follows: the input of the nonlinear system, $u(t)$, is cyclostationary and cannot be measured; the model parameters $\alpha_i, \beta_i, l_i$ can be identified from the output $y(t)$ if the sampling period $T$ and the statistical properties of $u(t)$ are known.
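To make the model concrete, the following sketch simulates a Hammerstein-Wiener system with the structure of Eqs. (1)-(3); all orders, coefficients, and the choice of cyclostationary input are hypothetical.

```python
import numpy as np
from scipy.signal import lfilter

alpha = np.array([0.5, -0.2])      # hypothetical alpha_1, alpha_2 (n = 2)
beta = np.array([1.0, 0.4])        # hypothetical beta_1, beta_2
l = np.array([1.0, 0.3, 0.05])     # hypothetical l_i for s(x) = (x, sin x, x^3)

T = 450                            # one period of the cyclostationary input
u = np.tile(np.sin(2 * np.pi * np.arange(T) / T), 3)

# Linear block, eq. (2): x(t) = sum_i alpha_i x(t-i) + sum_i beta_i u(t-i),
# i.e. an IIR filter with numerator [0, beta] and denominator [1, -alpha].
x = lfilter(np.concatenate(([0.0], beta)), np.concatenate(([1.0], -alpha)), u)
y = l[0] * x + l[1] * np.sin(x) + l[2] * x ** 3    # static nonlinearity, eq. (3)
```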
3 Cyclostationary Signals and Statistical Properties

A cyclostationary process is a signal whose statistical properties vary cyclically with time; it can better characterize human biological signals such as sEMG. It has the following features:
1) Its statistical properties vary cyclically with time, which can reflect the characteristics of non-stationary signals; it is a generalized description of the wide-sense stationary signal.

2) Because the statistical properties vary cyclically with time, it is a reasonable simplification of non-stationary signals based on an objective description.

3) It contains the correlation of frequency-shifted signals, which is peculiar to cyclically varying signals.

In this paper, the statistical characteristics of cyclostationary signals are combined with the Hammerstein-Wiener model to solve the blind identification problem for nonlinear systems.
4 Hammerstein-Wiener Model Identification

The statistical characteristics of cyclostationary signals can be used with the nonlinear inverse mapping of the Hammerstein-Wiener model to obtain the parameters and complete the model identification.

4.1 Identification of Parameters $\alpha_i, \gamma_i$

Substituting equation (4) of the model structure into (2), we get

$$\sum_{j=1}^{m} \gamma_j p_j(y(t)) = \sum_{i=1}^{n} \alpha_i \sum_{j=1}^{m} \gamma_j p_j(y(t-i)) + \sum_{i=1}^{n} \beta_i u(t-i) \qquad (5)$$

Assume that $\gamma_1 = 1$; with this assumption the results can be uniquely identified.
Then (5) can be expressed as

$$p_1(y(t)) = \phi^T(t) \cdot \theta + \sum_{i=1}^{n} \beta_i u(t-i) \qquad (6)$$

where $\phi(t)$ and $\theta$ are vectors given by

$$\phi^T(t) = [-p_2(y(t)), -p_3(y(t)), \ldots, -p_m(y(t)),\; p_1(y(t-1)), \ldots, p_1(y(t-n)),\; p_2(y(t-1)), \ldots, p_2(y(t-n)),\; \ldots,\; p_m(y(t-1)), \ldots, p_m(y(t-n))] \qquad (7)$$

$$\theta^T = [\gamma_2, \gamma_3, \ldots, \gamma_m,\; \alpha_1, \alpha_2, \ldots, \alpha_n,\; \alpha_1\gamma_2, \alpha_2\gamma_2, \alpha_3\gamma_2, \ldots, \alpha_n\gamma_2,\; \ldots,\; \alpha_1\gamma_m, \alpha_2\gamma_m, \alpha_3\gamma_m, \ldots, \alpha_n\gamma_m] \qquad (8)$$

Now consider the next period $t + T$; the equation for $p_1(y(t+T))$ is accordingly

$$p_1(y(t+T)) = \phi^T(t+T) \cdot \theta + \sum_{i=1}^{n} \beta_i u(t+T-i) \qquad (9)$$
Subtracting (6) from (9), we obtain

$$p_1(y(t+T)) - p_1(y(t)) = [\phi^T(t+T) - \phi^T(t)] \cdot \theta + \sum_{i=1}^{n} \beta_i [u(t+T-i) - u(t-i)] \qquad (10)$$

Since the statistical characteristics of the cyclostationary input have a cyclic pattern, $E[u(t+T-i)] = E[u(t-i)]$, it follows that

$$E[p_1(y(t+T)) - p_1(y(t))] = E[\phi^T(t+T) - \phi^T(t)] \cdot \theta \qquad (11)$$

From this we can see that the system parameter $\theta$ is related only to the output $y(t)$ and the structure of $p_i(\cdot)$, and is independent of the input $u(t)$. Then $\theta$ can be obtained by the least-squares method, which yields the values of the model parameters $\alpha_i, \gamma_i$.
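A minimal sketch of this least-squares step is given below; the basis functions and orders are illustrative assumptions, and the regressor layout follows the reconstruction of Eqs. (7), (8), (10) and (11) above.

```python
import numpy as np

def estimate_theta(y, T, n=2, m=3,
                   basis=(lambda v: v, np.sin, lambda v: v ** 3)):
    """Hypothetical sketch of the estimator behind eq. (11): regress the
    one-period difference of p1(y(t)) on the difference of the regressor
    phi(t) of eq. (7), so that the unknown-input term of eq. (10) drops out
    in expectation for a cyclostationary u(t)."""
    p = [b(y) for b in basis]                       # p_i(y(t)); p_1 = basis[0]

    def phi(t):
        head = [-p[i][t] for i in range(1, m)]                        # -p_2..-p_m
        lags = [p[i][t - k] for i in range(m) for k in range(1, n + 1)]
        return np.array(head + lags)

    rows = [phi(t + T) - phi(t) for t in range(n, len(y) - T)]
    lhs = [p[0][t + T] - p[0][t] for t in range(n, len(y) - T)]
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(lhs), rcond=None)
    return theta     # [gamma_2..gamma_m, alpha_1..alpha_n, alpha_i*gamma_j terms]
```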
4.2 Identification of Parameters $l_i$

According to equations (6) and (7), $l_i$ can be obtained by the least-squares method:

$$\min L = \min \left\{ \sum_t \left[ y(t) - \sum_{i=1}^{q} l_i \cdot s_i\!\left(\sum_{i=1}^{m} \gamma_i p_i(y)\right) \right]^2 \right\} \qquad (12)$$
4.3 Identification of Parameters $\beta_i$

The nonlinear inverse model can be formed after $\alpha_i, \gamma_i$ are obtained. The estimate is

$$\hat{x}(t) = \sum_{i=1}^{m} \hat{\gamma}_i p_i(y) \qquad (13)$$

Substituting (13) into (2), it follows that

$$\hat{x}(t) - \sum_{i=1}^{n} \hat{\alpha}_i \hat{x}(t-i) = \sum_{i=1}^{n} \beta_i u(t-i) \qquad (14)$$

Due to

$$\sum_{i=1}^{n} \beta_i u(t-i) = \left[\sum_{i=1}^{n} \beta_i \delta(t-i)\right] \cdot u(t) \qquad (15)$$

equation (14) can be expressed as

$$\hat{x}(t) - \sum_{i=1}^{n} \hat{\alpha}_i \hat{x}(t-i) = \left[\sum_{i=1}^{n} \beta_i \delta(t-i)\right] \cdot u(t) \qquad (16)$$

Let

$$Q(t) = \hat{x}(t) - \sum_{i=1}^{n} \hat{\alpha}_i \hat{x}(t-i), \qquad H = \left[\sum_{i=1}^{n} \beta_i \delta(t-i)\right]$$

Then (16) can be rewritten as

$$Q(t) = H \cdot u(t) \qquad (17)$$

In equation (17), $H$ can be calculated using linear blind identification based on second-order statistics, and $\beta_i$ can then be obtained. The calculation steps are as follows:
1) Centralize $Q(t)$ to make it zero-mean.

2) The mixed signal $Q(t)$ has no noise, so the correlation matrix can be estimated as

$$\hat{R}_Q(0) = \frac{1}{N} \sum_{t=1}^{N} \left[Q(t) \cdot Q^T(t)\right]$$

3) Apply singular value decomposition to $\hat{R}_Q(0)$:

$$\hat{R}_Q(0) = V_u \Lambda_u V_u^T$$

4) Pre-whiten:

$$\bar{Q}(t) = \Lambda_u^{-1/2} \cdot V_u^T \cdot Q(t), \qquad \hat{\Lambda}_u = \mathrm{diag}\{(\lambda_1 - \hat{\sigma}_N^2), (\lambda_2 - \hat{\sigma}_N^2), \ldots, (\lambda_n - \hat{\sigma}_N^2)\}$$

Since $Q(t)$ has no noise, $\hat{\Lambda}_u^{-1/2} = \Lambda_u^{-1/2}$.

5) For a time delay $p \neq 0$, apply singular value decomposition to the covariance

$$\hat{R}_Q(p) = \frac{1}{N} \sum_{t=1}^{N} \left[\bar{Q}(t) \cdot \bar{Q}^T(t-p)\right] = U_Q \cdot \Sigma_Q \cdot V_Q^T$$

6) If the singular values of $\Sigma_Q$ are all equal, take a different $p$ and repeat from step 5); otherwise

$$\hat{H} = W^+ \cdot U_Q = (\hat{\Lambda}_u^{-1/2} \cdot V_u^T)^{-1} \cdot U_Q = V_u \cdot \hat{\Lambda}_u^{1/2} \cdot U_Q$$
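A compact sketch of steps 1)-6) for a vector-valued Q(t) might look as follows; it assumes noise-free data and a full-rank R_Q(0), and it glosses over the scaling and permutation ambiguities inherent to blind identification.

```python
import numpy as np

def estimate_H(Q, p=1):
    """Sketch of steps 1)-6) for Q of shape (channels, N);
    assumes noise-free data and a full-rank R_Q(0)."""
    Q = Q - Q.mean(axis=1, keepdims=True)             # 1) centralize
    N = Q.shape[1]
    R0 = (Q @ Q.T) / N                                # 2) correlation matrix R_Q(0)
    Vu, lam, _ = np.linalg.svd(R0)                    # 3) R_Q(0) = Vu diag(lam) Vu^T
    W = np.diag(lam ** -0.5) @ Vu.T                   # 4) pre-whitening matrix
    Qw = W @ Q
    Rp = (Qw[:, p:] @ Qw[:, :-p].T) / (N - p)         # 5) lagged covariance R_Q(p)
    Uq, s, _ = np.linalg.svd(Rp)
    if np.allclose(s, s[0]):                          # 6) delay p was uninformative
        raise ValueError("equal singular values; retry with a different p")
    return Vu @ np.diag(lam ** 0.5) @ Uq              # H_hat = W^+ U_Q
```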
5 Improved BP Neural Network Classifier

A new improved BP neural network algorithm is proposed in this paper based on the original BP algorithm and the simulated annealing (SA) algorithm. The new algorithm uses BP as the main framework and introduces an SA strategy into the learning process. In this way, it not only uses supervised gradient-descent learning to improve local search performance, but also utilizes the ability of SA to probabilistically jump out of local minima to achieve global convergence [5], which greatly improves the learning performance of the network. The new algorithm is used in the neural network classifier to recognize five hand gestures in this paper [6]. Let the step size be $\eta$, with the adjustment rule

$$\Delta E = E(k) - E(k-1) \qquad (18)$$

When $\Delta E < 0$:

$$\eta(k+1) = 1.3 \times \eta(k), \qquad a = a \qquad (19)$$

otherwise:

$$\eta(k+1) = 0.8 \times \eta(k), \qquad a = 0 \qquad (20)$$

Here $\eta(k)$ is the learning step at iteration $k$, $E(k)$ is the network sum-of-squares error at iteration $k$, $a$ is the momentum factor, $W$ is the weight matrix of the network, MiniError is the minimum error of the system, $m$ is the number of iterations, Maxtime is the maximum number of iterations, $N$ is the number of samples in the training set, $E(m)$ is the network sum-of-squares error at iteration $m$, and SA(W) is the function of the SA algorithm.
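The adaptive step-size rule of Eqs. (18)-(20) is simple to express in code; the sketch below is purely illustrative.

```python
def adjust_step(eta, a, delta_e):
    """Adaptive step-size rule of eqs. (18)-(20): enlarge the step and keep the
    momentum factor when the error decreased, otherwise shrink the step and
    reset the momentum (an illustrative sketch, not the authors' code)."""
    if delta_e < 0:                 # error went down
        return 1.3 * eta, a         # eq. (19)
    return 0.8 * eta, 0.0           # eq. (20)
```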
Then the improved BP neural network algorithm can be described as follows:

(1) Initialize the weight matrices, MiniError and Maxtime. Let m=1, Counter=0, η(k)=0.1.
(2) Train the network using the BP algorithm, and calculate E(m).
(3) Determine whether E(m) < MiniError. If true, the algorithm has achieved the convergence requirement; go to (12).
(4) Compare E(m)-E(m-1), then adjust the step size according to the adaptive-learning step-size adjustment rule of equations (18)-(20).
(5) If |E(m)-E(m-1)| < λ, let Counter = Counter+1; otherwise Counter = 0.
(6) Determine whether Counter ≥ Const. If true, the network has been trapped in a local minimum; record the error, let LocalMin = E(m), and go to (9).
(7) Update the weight matrix W according to the BP algorithm.
(8) Let E(m-1) = E(m), m = m+1. Go to (2) for the next BP iteration.
(9) Use the SA algorithm: let E(m) = SA(W).
(10) Determine whether E(m) < MinError. If true, the algorithm has reached the convergence requirement; go to (12).
(11) Determine whether E(m)
6 Experimental Results

The input-layer neurons of the improved BP network are the parameters of the sEMG model. In this paper, let $m = 3$, $n = 2$, $q = 3$, with $P_1(y) = y$, $P_2(y) = \sin(y)$, $P_3(y) = y^3$. The experiments demonstrate that $\gamma$ and $l$ have better classification ability; therefore, $\alpha$ and $\beta$ are abandoned, and $\gamma$ and $l$
with one- or two-channel sEMG are used as the network input, respectively, to identify five different movement patterns: extension of wrist (EXWR), flexion of wrist (FLWR), hook gesture (HOOK), side flexion of wrist (SFWR), and wrist supination (WSPN). In this paper, sEMG data are read 40 times, each time sampling 450 data points; the period of the sEMG is 0.225 s. Parts of the identified γ values are shown in Table 1. The experimental results for one-channel and two-channel sEMG are shown in Figure 2 and Figure 3, in which the red circle is the predicted category and the cross is the actual category.

Table 1. Parts of γ with two-channel sEMG

Movement Patterns | First Channel γ1 | First Channel γ2 | Second Channel γ1 | Second Channel γ2
WSPN |  6.028320 | -6.39844 |   3.697754 |  2.257813
EXWR | -0.72241 | -5.61963 |  -1.875    | 12.29688
FLWR | -1.06055 | -1.22412 |  -3.03918  | -0.45361
HOOK | -3.80273 | -1.24353 |  -7.88306  | -12.0609
SFWR | -8.12109 | -0.40234 | -18.7042   | -6.23315
Fig. 2. The experimental results of one-channel sEMG (a. is the result of classification, b. is network training error)
Fig. 3. The experimental results of two-channel sEMG (a. is the result of classification, b. is network training error)

Table 3. Recognition results for five movement patterns
Method | WSPN | EXWR | FLWR | HOOK | SFWR | Average Rate
Typical BP | 37 (40) | 38 (40) | 36 (40) | 39 (40) | 37 (40) | 93.5%
Improved BP | 37 (40) | 39 (40) | 36 (40) | 40 (40) | 39 (40) | 95.5%
Table 3 compares the recognition results for the five movement patterns between the typical BP network and the improved BP network.
7 Conclusion and Prospects

Blind identification theory is used to model the sEMG of five hand movements in this paper, and the model parameters are used as the input of an improved BP network. It can be seen from the experimental results that the parameters of the sEMG model have the capacity to characterize different motions to a certain extent, and that the improved BP network classifies the different patterns better than the typical BP method. sEMG is weak and susceptible to external influences; in this paper such interference is neglected. In the future, we need to consider a variety of typical disturbances, such as electromagnetic interference and different physiological conditions in the same person, or the same physiological conditions in different persons, and then establish a multi-model description to obtain more precise mathematical expressions of sEMG.
Acknowledgment

The authors are grateful to Professor Chen Xiang at the University of Science and Technology of China for kind assistance in data collection.
References

[1] Kim, J., Bee, N., Wagner, J., André, E.: Emote to Win: Affective Interactions with a Computer Game Agent. Lecture Notes in Informatics 50, 159–164 (2004)
[2] Park, D.G., Kim, H.C.: Muscleman: Wireless Input Device for a Fighting Action Game Based on the EMG Signal and Acceleration of the Human Forearm (December 25, 2009), http://melab.snu.ac.kr/Research/melab/doc/HCI/muscleman_paper.pdf
[3] Saponas, T.S., Tan, D.S., et al.: Demonstrating the Feasibility of Using Forearm Electromyography for Muscle-Computer Interfaces. In: Proceedings of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, Florence, Italy, April 5-10, pp. 515–524 (2008)
[4] Jin, J., Peng, J.: Blind Identification and Digital Compensation of Memory Nonlinearities for Multiband Systems. In: 2010 WRI International Conference on Communications and Mobile Computing, vol. 3, pp. 7–11 (2010)
[5] Li, Y., Tian, Y., Chen, W.: Multi-pattern Recognition of sEMG Based on Improved BP Neural Network Algorithm. In: Proceedings of the 29th Chinese Control Conference, Beijing, China, July 29-31, pp. 2867–2872 (2010)
[6] Li, Y., Tian, Y., Chen, W.: Modeling and Pattern Recognition of sEMG for Intelligent Bionic Artificial Limb. In: IEEE International Conference on Robotics and Biomimetics, Tianjin, China, December 14-18 (2010)
Micro-Blood Vessel Detection Using K-means Clustering and Morphological Thinning
Zhongming Luo, Zhuofu Liu, and Junfu Li
The Higher Educational Key Laboratory for Measuring & Control Technology and Instrumentations of Heilongjiang Province, Harbin University of Science and Technology, Harbin, Heilongjiang, 150040, China
{luozhongming,zhuofu_liu}@hrbust.edu.cn
Abstract. This paper introduces a combined method for blood vessel segmentation based on k-means clustering and morphological thinning. In the first stage, the original image is partitioned into two clusters (foreground and background). As this step is only a coarse classification, a fine detection is then applied to the pre-processed image with the help of the morphological thinning algorithm. Experimental results indicate that blood vessels within an image are detected by the coarse-to-fine segmentation method with an accuracy of more than 90%. Keywords: Blood Vessel, Detection, Segmentation, K-means Clustering, Morphological Thinning.
1 Introduction

Characteristic extraction of blood cells from in vivo video images is a challenging task in the field of biomedical engineering. The extracted characteristics can be employed to analyze the physiological and biological properties of targets, providing useful information for medical diagnosis, disease prevention, and social security. However, manual analysis is time-consuming, and the outcomes depend on the experience of the analysts. To overcome these shortcomings, it is necessary to develop an effective automatic program that releases human beings from heavy labor and improves the accuracy of the results. The most vital step is to locate the blood vessel before carrying out any cell-tracking task. Precise blood vessel segmentation has received much attention in the field of image processing because it can enhance the accuracy of cell tracking and reduce computing time by restricting the tracking regions [1-4]. Edge/region-based methods [5] are commonly used to identify target regions (regions of interest, ROI). To improve the precision of blood vessel segmentation, Ricci and Perfetti [6] employed two orthogonal line detectors along with the grey level of the target pixel for feature extraction, with a support vector machine as the classifier. Mendonça and Campilho [7] proposed an automated means based on directional differential operators and an iterative region-growing method to segment the vascular network in retinal images. A snake model was used to detect blood vessel borders [8]. However, most of the previous studies relied on standard databases (such as the DRIVE
database and the STARE database) whose images were taken or processed by professionals; as a result, most of the proposed methods cannot solve real-world problems. In this paper, a combination of k-means clustering and morphological thinning algorithms is used to segment blood vessels in images captured with the real-time image acquisition system described in Section 2. Section 3 explains both the k-means clustering and morphological thinning methodologies. The combined method is applied to detect blood vessels from a coarse-to-fine point of view in Section 4. Finally, conclusions and future work are briefed in Section 5.
2 System Description

The real-time image acquisition system is based on an MV-VD200SM high-resolution digital video camera equipped with an LED lighting source and an AFT-M1024 lens. Figure 1 illustrates the basic structure of the hardware system. The resolution of the MV-VD200SM reaches 1600×1200 pixels at a capture rate of 10 frames per second. As the blood-flow dynamics are slower than this rate, the experimental images were recorded at the maximum rate (10 frames/second).
Fig. 1. System Diagram
3 Algorithm

A. K-means clustering

The k-means clustering algorithm, developed by J. MacQueen (1967) and then by J. A. Hartigan and M. A. Wong around 1975, is a method of cluster analysis whose goal is to partition n observations into k mutually exclusive clusters based on attributes/features, where k is a positive integer. Each observation is assigned to the cluster whose centroid (the 'cluster center', the point to which the sum of distances from all objects in the cluster is minimized) has the nearest mean. The mathematical expression of k-means clustering is:
$$\arg\min_{S} \sum_{i=1}^{k} \sum_{x_j \in S_i} \left\| x_j - u_i \right\|^2 \qquad (1)$$

where $X = (x_1, x_2, \ldots, x_n)$ is a set of observations, each $x_j$ is a d-dimensional real vector, $S = \{S_1, S_2, \ldots, S_k\}$ $(k < n)$, and $u_i$ is the mean of $S_i$. Given an initial set of k means $\mu_1^{(1)}, \mu_2^{(1)}, \ldots, \mu_k^{(1)}$, which may be specified randomly or heuristically, the algorithm alternates between the following two steps [9]:

Assignment step: Assign each observation to the cluster with the closest mean.

$$S_i^{(t)} = \left\{ x_j : \left\| x_j - \mu_i^{(t)} \right\| \le \left\| x_j - \mu_{i^*}^{(t)} \right\|,\; i^* = 1, \ldots, i-1, i+1, \ldots, k \right\} \qquad (2)$$

Update step: Calculate the new means as the centroids of the observations in the clusters.

$$\mu_i^{(t+1)} = \frac{1}{\left| S_i^{(t)} \right|} \sum_{x_j \in S_i^{(t)}} x_j \qquad (3)$$
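A minimal two-cluster k-means on grey levels, directly following the assignment and update steps above, could be sketched as follows (an illustration, not the authors' implementation):

```python
import numpy as np

def kmeans_segment(gray, iters=20):
    """Two-cluster k-means on pixel grey levels following eqs. (1)-(3):
    alternate the assignment and update steps until (approximate) convergence."""
    x = gray.reshape(-1).astype(float)
    mu = np.array([x.min(), x.max()])          # simple initial means
    for _ in range(iters):
        labels = np.abs(x[:, None] - mu[None, :]).argmin(axis=1)   # assignment, eq. (2)
        mu = np.array([x[labels == k].mean() for k in range(2)])   # update, eq. (3)
    return labels.reshape(gray.shape)          # 0 = darker cluster, 1 = brighter
```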
This algorithm moves objects between clusters until the sum cannot be decreased further. The result is a set of clusters that are as compact and well-separated as possible.

B. Morphological thinning

A derivation of the hit-or-miss transform is called morphological skeletonisation, whose result is the union of skeleton subparts. It gives a skeleton in which all points belong to the medial axis; however, due to pixelation and the finite size of the structuring element, it is not necessarily connected nor maximally thin. Skeletonisation is expressed as [10]:
$$S(A) = \bigcup_{k=0}^{K} S_k(A) \qquad (4)$$

$$S_k(A) = (A \ominus kB) - (A \ominus kB) \circ B \qquad (5)$$

$$K = \max\{k \mid (A \ominus kB) \neq \varnothing\} \qquad (6)$$

where $A$ is the set to be morphologically skeletonised and $B$ is the structuring element. $K$ is the last iterative step before $A$ erodes to an empty set, and $(A \ominus kB)$ denotes $k$ successive erosions of the set $A$:
$$(A \ominus kB) = (\cdots((A \ominus B) \ominus B) \ominus \cdots) \ominus B \quad (k \text{ times}) \qquad (7)$$

The reconstruction of the set $A$ from these subsets is:

$$A = \bigcup_{k=0}^{K} \left( S_k(A) \oplus kB \right) \qquad (8)$$

$$(S_k(A) \oplus kB) = (\cdots((S_k(A) \oplus B) \oplus B) \oplus \cdots) \oplus B \quad (k \text{ times}) \qquad (9)$$

which means the dilation is applied $k$ times.
4 Experiments

The image sequences were captured by the real-time in vivo image acquisition system introduced in Section 2. Before the experiments, a commercially available nude frog was fastened on a metal platform with three-dimensional flexibility.
Fig. 2. Flowchart of the whole process: in vivo blood vessel image → cluster the input image into two parts, background and target (blood vessel) → skeletonise the blood vessel to extract the thinner part and suppress unwanted noisy background → post-process the skeletonised image in order to reduce pruning effects
The process of blood vessel segmentation can be divided into two parts, a coarse categorization step and a fine locating step, as illustrated by the flow chart in Figure 2. In the first part, the original image is coarsely separated into two parts (foreground/vessel and background/skin) with the aid of k-means clustering; the segmented result is illustrated in Figure 4.
Fig. 3. Original image captured
Fig. 4. Coarse segmentation using k-means clustering method
Though the micro-blood vessels were detected by the k-means clustering method, the result obviously contains faulty information caused by the composition of color and texture, for example the square dots in Figure 4. It is therefore necessary to carry out fine recognition, and a morphological image-processing method was applied to the coarsely pre-processed image. In this trial, the morphological skeleton algorithm was used to
Fig. 5. Fine recognition based on morphological method
Fig. 6. Blood vessel detection using a coarse-to-fine means
suppress unwanted noise. The skeletonised blood vessel is shown in Figure 5, but some unexpected tiny blood branches occurred due to the side effects of pruning. In order to reduce the pruning artifacts, a post-processing procedure was applied to clean up the skeletonised image, and the detected micro-blood vessels were superimposed on the original image as shown in Figure 6. It is clear that the micro-blood vessels, which are blurred in the original image, have been effectively detected.
5 Conclusions

In this paper, the micro-blood vessels were segmented using a coarse-to-fine method. For the coarse segmentation, the original image was divided into two parts, target and background, based on k-means clustering. In the fine step, the morphological thinning algorithm was applied to the pre-processed image, and some unwanted disturbances were removed effectively.
In future work, image sequence analysis based on data fusion will be tested to further improve the segmentation quality. If the image quality can be improved by choosing a higher-performance microscopic camera, individual blood cells will become discoverable. Since the location of the blood vessels has been identified, more biomedical information can be extracted by tracking the movement of cells and the pattern of blood flow.
Acknowledgment This work is supported by Scientific Project of Heilongjiang Education Department (Grant No. 11551103).
References

1. Martinez-Perez, M.E., Hughes, D., Stanton, A.V., Thom, S.A., Chapman, N., Bharath, A.A., Parker, K.H.: Retinal Vascular Tree Morphology: A Semi-automatic Quantification. IEEE Trans. Biomed. Eng. 49(8), 912–917 (2002)
2. Eden, E., Waisman, D., Rudzsky, M., Bitterman, H., Brod, V., Rivlin, E.: An Automated Method for Analysis of Flow Characteristics of Circulating Particles from In Vivo Video Microscopy. IEEE Transactions on Medical Imaging 24(8), 1011–1024 (2005)
3. Said, J.N., Cheriet, M., Suen, C.Y.: A Recursive Thresholding Technique for Image Segmentation. IEEE Trans. Image Process. 7(6), 918–921 (1998)
4. Wu, D., Zhang, M., Liu, J.-C., Bauman, W.: On the Adaptive Detection of Blood Vessels in Retinal Images. IEEE Transactions on Biomedical Engineering 53(2), 341–343 (2006)
5. Greenspan, H., Laifenfeld, M., Einav, S., Barnea, O.: Evaluation of Center-line Extraction Algorithms in Quantitative Coronary Angiography. IEEE Trans. Med. Imag. 20(9), 928–941 (2001)
6. Ricci, E., Perfetti, R.: Retinal Blood Vessel Segmentation Using Line Operators and Support Vector Classification. IEEE Transactions on Medical Imaging 26(10), 1357–1365 (2007)
7. Mendonça, A.M., Campilho, A.: Segmentation of Retinal Blood Vessels by Combining the Detection of Centerlines and Morphological Reconstruction. IEEE Transactions on Medical Imaging 25(9), 1200–1212 (2006)
8. Tang, J., Acton, S.T.: Vessel Boundary Tracking for Intravital Microscopy via Multi-scale Gradient Vector Flow Snakes. IEEE Trans. Biomed. Eng. 51(2), 316–324 (2004)
9. MacKay, D.: An Example Inference Task: Clustering. In: Information Theory, Inference and Learning Algorithms, ch. 20, pp. 284–292. Cambridge University Press, Cambridge (2003)
10. Gonzalez, R.C., Woods, R.E.: Digital Image Processing, 2nd edn. Prentice Hall, Englewood Cliffs (2002)
PCA Based Regional Mutual Information for Robust Medical Image Registration
Yen-Wei Chen 1,2 and Chen-Lun Lin 2
1 Electronics & Inf. Eng. School, Central South Univ. of Forestry and Tech., China
2 College of Information Science and Eng., Ritsumeikan University, Shiga, Japan
Abstract. Mutual information (MI) is a widely used entropy-based similarity metric for medical image registration. It can be used for both mono-modality and multi-modality registration. Recently, an improved mutual information metric named regional mutual information (RMI) has been proposed for robust image registration by including regional or spatial information. Though RMI has been demonstrated to be more effective and more robust than traditional MI, it incurs a larger computation cost because of the computation of a high-dimensional joint distribution. To improve on RMI, we propose a novel PCA based regional mutual information (PRMI) to implement a more robust and faster medical image registration.

Keywords: Mutual Information (MI), Regional Mutual Information (RMI), PCA Based Regional Mutual Information (PRMI), Principal Components Analysis (PCA).
1 Introduction

Medical image registration addresses the problem of finding a geometric transformation which is able to align two given medical images (volumes) [1, 2]. Fig. 1 shows the process of medical image registration, which can be represented by four phases: transformation, interpolation, criterion, and optimization. It is crucial to define a criterion metric which can measure the similarity of two images. To date, many image similarity metrics have been proposed and used for medical image registration. Mutual information (MI) is one of the most intensively researched metrics [3-10] in image registration; it was first invented by Shannon [11] as a measure of the statistical independence between two signals and applied as a similarity metric for image alignment by Viola and Wells [6].

Fig. 1. Framework of medical image registration

This information-theoretic metric is fully automatic
and needs no predefined landmarks. In addition, unlike other intensity-based metrics, it is suitable for both mono-modality and multi-modality registration. Many independent studies show that mutual information performs very well [7]. However, MI is not a robust metric for images containing noise, since MI is usually calculated using only the corresponding pixel intensities of the two images and thus lacks spatial information [3]. Several methods have been proposed to improve the robustness and precision of the similarity criterion by the use of additional spatial information [12-14]. One method, proposed by Russakoff [12], is known as regional mutual information (RMI), in which the joint histogram is calculated using corresponding regions (blocks) instead of corresponding pixels. Though RMI is a robust similarity metric for image registration, it requires much computing time because of the high-dimensional joint histogram. In this paper, we propose a novel PCA based regional mutual information (PRMI) to implement a more robust and faster medical image registration.
2 Related Methods

2.1 Mutual Information

The mutual information (MI) [7] between two images is a widely used entropy-based similarity metric for medical image registration. Its calculation is based on the joint histogram of the fixed and transformed moving images, as shown in Fig. 2(a) and 2(b). Fig. 2(c) shows the situation when the two images are registered: because the images are identical, all grey-value correspondences lie on the diagonal. Suppose the intensities of a pixel in the fixed image (F) and the moving image (M) are $f_F$ and $f_M$, and let the transformation parameter be $u$. The joint entropy of the two images can be calculated from their joint probability distribution (joint histogram) $P(m, n \mid u)$ $(m \in F, n \in M)$ by Eq. (1):

$$H(f_F, f_M \mid u) = -\sum_{m \in F,\, n \in M} P(m, n \mid u) \log P(m, n \mid u) \qquad (1)$$

We can calculate the individual entropy of each image from its marginal probability (histogram) $P(m)$ or $P(n)$ by Eq. (2):

$$H(f_F) = -\sum_{m \in F} P(m) \log P(m), \qquad H(f_M \mid u) = -\sum_{n \in M} P(n \mid u) \log P(n \mid u) \qquad (2)$$

Then MI is defined in terms of entropy as Eq. (3):

$$MI\left(F, T(M \mid u)\right) = H(f_F) + H(f_M \mid u) - H(f_F, f_M \mid u) \qquad (3)$$

The MI is used as a similarity metric for image registration: the transformation $T$, which depends on the parameter $u$, can be found by maximizing the MI as in Eq. (4):

$$T' = \arg\max_{T} \; MI\left(F, T(M \mid u)\right) \qquad (4)$$
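For reference, Eqs. (1)-(3) reduce to a few lines of histogram code; the bin count below is an arbitrary choice for illustration:

```python
import numpy as np

def mutual_information(fixed, moving, bins=64):
    """MI of eqs. (1)-(3) from the joint grey-level histogram of the fixed
    image and the already-transformed moving image."""
    joint, _, _ = np.histogram2d(fixed.ravel(), moving.ravel(), bins=bins)
    p = joint / joint.sum()
    pf, pm = p.sum(axis=1), p.sum(axis=0)      # marginal histograms
    nz = p > 0
    h_joint = -np.sum(p[nz] * np.log(p[nz]))                   # eq. (1)
    h_f = -np.sum(pf[pf > 0] * np.log(pf[pf > 0]))             # eq. (2)
    h_m = -np.sum(pm[pm > 0] * np.log(pm[pm > 0]))
    return h_f + h_m - h_joint                                 # eq. (3)
```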
Fig. 2. (a) Corresponding points in two images to be registered, (b) joint distribution of two images, (c) joint distribution when two images are registered
2.2 Regional Mutual Information

MI is calculated only on a pixel-by-pixel basis, taking into account the relationship between corresponding individual pixels but not that of each pixel's neighborhood. This drawback can make registration fail, especially when the images to be registered are noisy. A new similarity metric named regional mutual information (RMI) has been proposed to improve traditional MI by including spatial information [12]. The basic idea of RMI is to calculate the joint distribution using corresponding regions (blocks) instead of corresponding pixels, as shown in Fig. 3. In Fig. 3, a 3 × 3 region is used for the computation of the joint distribution instead of a single pixel; thus 9-D histograms are used for the marginal probabilities and an 18-D histogram for the joint distribution (in traditional mutual information, 1-D histograms and a 2-D histogram are used for the marginal probabilities and the joint distribution, respectively).
Fig. 3. (a) The relationship between an image region and its corresponding multi-dimensional point. (b) Corresponding image neighborhoods and the multi-dimensional point representing them in the joint distribution. (c) The joint distribution matrix of RMI.
As shown in Fig. 3(b), we can make a d-dimensional vector of each pair of corresponding regions and combine the vectors into a d × N joint-distribution matrix. Given a specified square radius $r$, we have $d = 2(2r+1)^2$ and $N = (m - 2r)(n - 2r)$ for an m × n image. Extracting spatial feature information and creating the joint distribution in this way allows us to rotate and translate the high-dimensional distribution into a space where each dimension is independent. Independence in each dimension allows us to decouple the entropy calculation. We can then calculate the marginal and joint entropies as Eq. (5) [12]:
$$H_g(\Sigma_d) = \log\left((2\pi e)^{d/2} \det(\Sigma_d)^{1/2}\right) \qquad (5)$$

Finally, we obtain the value of RMI by Eq. (6):

$$RMI = H_g(C_A) + H_g(C_B) - H_g(C) \qquad (6)$$

Since $\Sigma_d$ is the covariance matrix of the joint-distribution matrix, it takes a large computing time: to calculate the covariance matrix, the sample matrix must be multiplied by its transpose, and this multiplication requires much computing time, especially when the two matrices are large. For two 256 × 256 images, $N$ is more than 30,000. That is why this method executes slowly and is not practical in real applications. Therefore, we should find a way to improve it.
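The following sketch reproduces the RMI recipe under these definitions; the loop-based neighbourhood stacking and the determinant guard are implementation conveniences, not part of the original formulation:

```python
import numpy as np

def regional_mutual_information(img_a, img_b, r=1):
    """RMI sketch [12]: stack corresponding (2r+1)x(2r+1) neighbourhoods of the
    two images into N joint vectors of dimension d = 2(2r+1)^2, then apply the
    Gaussian entropy of eq. (5) to the covariance blocks (eq. (6))."""
    m, n = img_a.shape
    w = 2 * r + 1
    cols = [np.concatenate([img_a[i:i + w, j:j + w].ravel(),
                            img_b[i:i + w, j:j + w].ravel()])
            for i in range(m - 2 * r) for j in range(n - 2 * r)]
    C = np.cov(np.array(cols).T)                  # joint d x d covariance

    def h_gauss(S):                               # eq. (5)
        k = S.shape[0]
        det = max(np.linalg.det(S), 1e-300)       # guard against singular blocks
        return np.log((2 * np.pi * np.e) ** (k / 2) * det ** 0.5)

    d = w * w
    return h_gauss(C[:d, :d]) + h_gauss(C[d:, d:]) - h_gauss(C)   # eq. (6)
```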
3 Our Proposed Method

We propose a method based on the high-dimensional joint distribution combined with PCA-based dimension reduction to improve the computational efficiency of RMI, called PCA based regional mutual information (PRMI).
Fig. 4. The method by which regional information is extracted from the two images and the high-dimensional regional-information matrices are created from this information
As in RMI, we obtain two d × N matrices (Fig. 4), which represent the relationship between each image region and its corresponding multi-dimensional point for the two images individually. The process of our algorithm is shown below; Fig. 5 shows how we implement the proposed method. First, we create the two matrices of regional information as shown in Fig. 4, and then we use principal component analysis (PCA) to reduce the two matrices to two 1-D vectors (only the top principal component of
Fig. 5. Process of our proposed algorithm
each region is selected). The two vectors are then normalized and used for the computation of both the joint entropy and the individual entropies by the use of Eqs. (1) and (2), and the mutual information is calculated by Eq. (3). Fig. 6 indicates the differences and improvements of our proposed method compared with traditional MI and RMI. Compared with traditional MI, PRMI includes regional information to overcome the lack of spatial information, as RMI does. On the other hand, PRMI takes only the corresponding feature points to create the joint distribution and calculate the entropies, so it can be computed quickly, while RMI spends much effort computing a joint distribution that includes all the regional information. That is why our proposed PRMI is much more efficient than RMI.
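Under the same assumptions, the PRMI pipeline can be sketched as below; it reuses the mutual_information helper from the earlier sketch, and the top-principal-component projection is computed per image via SVD:

```python
import numpy as np

def prmi(img_a, img_b, r=1, bins=64):
    """PRMI sketch: project each image's neighbourhood matrix onto its top
    principal component (one score per region), then evaluate plain
    histogram MI (eqs. (1)-(3)) on the two resulting 1-D feature signals."""
    def top_pc_scores(img):
        m, n = img.shape
        w = 2 * r + 1
        cols = np.array([img[i:i + w, j:j + w].ravel()
                         for i in range(m - 2 * r) for j in range(n - 2 * r)])
        cols = cols - cols.mean(axis=0)           # centre before PCA
        _, _, vt = np.linalg.svd(cols, full_matrices=False)
        return cols @ vt[0]                       # top principal component scores
    return mutual_information(top_pc_scores(img_a), top_pc_scores(img_b), bins)
```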
Fig. 6. Comparison of PRMI with previous methods
4 Experimental Results

In order to evaluate the proposed method, we applied PRMI to two 2-D MR brain datasets containing different rates of noise, obtained from the BrainWeb site [16]. One dataset, shown in Fig. 7(a), is T1 to T1 (mono-modality registration), and the other, shown in Fig. 7(b), is T2 to T1 (multi-modality registration).
Fig. 7. Comparison of registration results of MI and PRMI. (a) Dataset 1 (mono-modality image registration), (b) Dataset 2 (multi-modality image registration)

Table 1. Comparison of registration accuracy

Data set | ΔΤ (mm), MI | ΔΤ (mm), PRMI | Δθ (degree), MI | Δθ (degree), PRMI
T1-T1 (mono-modality registration) | 33.6 | 0.91 | 21.0 | 0.52
T2-T1 (multi-modality registration) | 24.4 | 3.80 | 1.02 | 0.43
Fig. 8. Joint distributions of both PRMI (left) and MI (right)
Both datasets contain noise. The registration results of conventional MI and the proposed PRMI are shown in Fig. 7(a) and 7(b), respectively. It can be seen that good registration cannot be achieved with conventional MI because of the noise, while with the proposed PRMI we achieve good registrations for both datasets.
Fig. 9. Comparison of computation time
In order to make a quantitative evaluation of registration accuracy, the root mean squared errors (ΔΤ and Δθ) between the estimated parameters and the ground truth for translation and rotation are shown in Table 1. It can be seen that the proposed PRMI performs much better than conventional MI. To show the advantage of the proposed PRMI, the joint distributions of PRMI and MI for dataset 1 are given in Fig. 8; the joint distribution of PRMI lies much closer to a diagonal line, which also shows that the proposed PRMI is much better than conventional MI. Though RMI can also achieve good registration accuracy, it takes a large computation time; the comparison of computation time is shown in Fig. 9. Our proposed PRMI is more efficient than RMI.
5 Conclusions We proposed a PCA-based regional mutual information (PRMI) measure for robust and fast medical image registration. Experimental results with both mono-modality and multi-modality medical image data show that the proposed PRMI performs much better than conventional MI and is much faster than RMI.
Acknowledgments This work was supported in part by the Grant-in-Aid for Scientific Research from the Japanese Ministry of Education, Culture, Sports, Science and Technology under Grant No. 21300070 and in part by the R-GIRO Research fund from Ritsumeikan University.
References [1] Hajnal, J.V., Hill, D.L.G., Hawkes, D.J.: View of the future. In: Hajnal, J.V., Hill, D.L.G., Hawkes, D.J. (eds.) Medical Image Registration, CRC, Boca Raton (2001) [2] Hill, D.L.G., Batchelor, P.G., Holden, M., Hawkes, D.J.: Medical image registration. Phys. Med. Biol. 46, R1–R45 (2001)
[3] Brown, L.: A survey of image registration techniques. ACM Comput. Surv. 24, 325–376 (1992) [4] Hill, D.L.G., Batchelor, P.G.: Registration methodology: concepts and algorithms. In: Hajnal, J.V., Hill, D.L.G., Hawkes, D.J. (eds.) Medical Image Registration, CRC, Boca Raton (2001) [5] Dowson, N., Bowden, R.: Metric mixtures for mutual information (M3I) tracking. In: Proc. 17th Int. Conf. on Pattern Recognition, pp. 1051–4651 (2004) [6] Viola, P., Wells, W.M.: Alignment by maximization of mutual information. International Journal of Computer Vision 24, 137–154 (1997) [7] Pluim, J.P.W., Maintz, J.B.A., Viergever, M.A.: Mutual-information-based registration of medical images: a survey. IEEE Transactions on Medical Imaging 22, 986–1004 (2003) [8] Maes, F., Collignon, A., Vandermeulen, D., Marchal, G., Suetens, P.: Multimodality image registration by maximization of mutual information. IEEE Transactions on Medical Imaging 16, 187–198 (1997) [9] Hill, D.L.G., Studholme, C., Hawkes, D.J.: Voxel similarity measures for automated image registration. In: Proc. Visualization in Biomedical Computing, Rochester, MN, U.S.A., vol. 2359, pp. 205–216. SPIE Press, Bellingham (1994) [10] Pong, H.K., Cham, T.J.: Alignment of 3D models to images using region-based mutual information and neighborhood extended Gaussian images. In: Proc. Asian Conf. for Computer Vision, pp. 60–69 (2006) [11] Shannon, C.: A mathematical theory of communication. Bell System Tech. J. 27, 379–423 (1948) [12] Russakoff, D., Tomasi, C., Rohlfing, T., Maurer, C.: Image similarity using mutual information of regions. In: Proc. Europ. Conf. on Comp. Vision, pp. 596–607 (2004) [13] Pluim, J.P.W., Maintz, J.B.A., Viergever, M.A.: Image registration by maximization of combined mutual information and gradient information. IEEE Transactions on Medical Imaging 19, 809–814 (2000) [14] Xu, R., Chen, Y.-W.: A Wavelet-based Multiresolution Medical Image Registration Strategy Combining Mutual Information with Spatial Information. International Journal of Innovative Computing, Information and Control 3, 285–296 (2007) [15] Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. John Wiley and Sons Inc., Chichester (2001) [16] http://www.bic.mni.mcgill.ca/brainweb
Discrimination of Thermophilic and Mesophilic Proteins via Artificial Neural Networks Jingru Xu and Yuehui Chen Computational Intelligence Lab, School of Information Science and Engineering, University of Jinan, 106 Jiwei Road, 250022 Jinan, P.R. China [email protected]
Abstract. Discrimination of thermophilic and mesophilic proteins is still a challenge in protein pattern recognition. Discrimination at the level of the amino acid sequence would be helpful for better understanding the folding mechanism and the function of proteins, and it would also be an important aid in re-engineering the stability of proteins. In this paper, a feature extraction method is proposed that fuses amino acids models and chem-composition models. Meanwhile, a feedforward artificial neural network optimized by particle swarm optimization (PSO-NN) is introduced to discriminate protein thermostability. Compared with previous research, the discrimination accuracy is improved to some extent. The results reflect that, on the basis of amino acids models, chem-composition models play an important part in improving the accuracy, and that PSO-NN can be regarded as an effective tool for recognition of mesophilic and thermophilic proteins. Keywords: protein thermostability, amino acids models, chem-composition models, fusion, particle swarm optimization, artificial neural network.
1 Introduction
Protein thermostability has been vigorously studied in the fields of biophysics and biotechnology, mainly because mesophilic proteins are easily denatured at high temperatures. In industrial production, protein denaturation makes it difficult to broaden their fields of application [1]. Therefore, how to improve the thermostability of proteins has attracted attention in molecular biology, bioengineering and the chemical industries. Currently, approaches to discriminating protein stability can be roughly divided into two kinds: one uses protein spatial structure information to predict protein stability [2][3]; the other uses amino acid sequence information [4][5]. Methods based on spatial structure information generally have high prediction accuracy because the spatial structure is known.
Corresponding author.
However, for most proteins only the sequence is known, not the structure. The amino acid sequence contains all the information that determines protein function, so understanding the function of a protein from sequence information is possible, and we can use the amino acid sequence to detect protein thermostability. In this research field, the key questions are how to extract features from the amino acid sequence and how to find an effective tool to recognize mesophilic and thermophilic proteins. Several investigations have been carried out, and researchers have found that the amino acid composition of a protein is correlated with its thermostability [6][7][8]. For example, Kumar [7] summarized that, with increasing temperature, the content of Ile, Glu, Lys and Tyr in proteins increases, while the content of Gln and Cys decreases. Thompson [8] discovered that thermophilic proteins are characteristically increased in Glu, Val, Arg and Gly, but reduced in Gln, Ser, Asp and Lys. Meanwhile, hydrophobic interactions [9], the packing effect [10], solvent accessible area [9] and temperature factor [11] have also been found to influence protein thermostability. These factors are correlated with a protein's chemical properties, so we fuse chem-composition models with amino acids models and expect to gain higher prediction accuracy. In this paper, a feature extraction method is used that fuses amino acids models and chem-composition models. Meanwhile, a feedforward artificial neural network optimized by particle swarm optimization (PSO-NN) is introduced to discriminate protein thermostability. In this way, we can analyze not only the validity of PSO-NN but also the different roles of amino acids models and chem-composition models in discriminating protein thermostability.
2 Materials and Methods
2.1 Dataset
There are two datasets in our experiment, described as follows. Dataset a: This dataset was derived from references [5][12]. The training database came from reference [5]: seventy-six pairs of thermophilic and mesophilic proteins were downloaded from Swiss-Prot for its non-redundancy. In order to check the accuracy of each model, 20 pairs of thermophilic and mesophilic proteins were selected as the testing dataset. The sequences of the testing samples were found in the Protein Data Bank (PDB) via the PDB codes provided by reference [12]; these 20 pairs were different from the proteins used in the training samples. Dataset b: This training database came from reference [4]. It contains 3521 thermophilic and 4895 mesophilic protein sequences derived from Swiss-Prot. The corresponding two testing datasets contained 859 proteins. The first one came from reference [4] and included 382 thermophilic protein sequences of Aquifex aeolicus [15] and 325 mesophilic protein sequences of Xylella fastidiosa [16]. The second one included 76 pairs of thermophilic and mesophilic proteins, also downloaded from Swiss-Prot; these are the training sequences of dataset a.
2.2 Feature Extraction
In this paper, a feature extraction method is used that fuses amino acids models and chem-composition models. The specific steps are as follows. (1) Amino acids models (AA). The occurrence frequency of each of the twenty common amino acid residues is calculated for every protein sequence. Each dimension of the characteristic vector can be expressed as:

X_i = N_i / N, i = 1, 2, ..., 20    (1)
In the formula, N_i indicates the number of occurrences of the i-th residue and N indicates the length of the sequence. This model contains twenty dimensions. (2) Chem-composition models (CC). The 20 native amino acids are divided into three groups according to each of seven types of physicochemical properties: hydrophobicity, normalized van der Waals volume, polarity, polarizability, charge, secondary structure and solvent accessibility. For instance, using the hydrophobicity attribute, all amino acids are divided into three groups: polar, neutral and hydrophobic. A protein sequence is then transformed into a sequence of hydrophobicity attributes, and the composition descriptor consists of three values: the global percent compositions of polar, neutral and hydrophobic residues in the new sequence. For the seven types of attributes, the chem-composition model consists of twenty-one dimensions. (3) Fusion. This is the combination of the features extracted separately by the amino acids models and the chem-composition models; that is, the fused vector contains forty-one dimensions. A sketch of this feature extraction is given below.
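The following sketch computes the AA frequencies of Eq. (1), a composition descriptor for one attribute, and the fused vector. The polar/neutral/hydrophobic partition shown is one commonly used grouping and is an assumption here; the paper does not list its exact partitions for the seven attributes.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# One common polar/neutral/hydrophobic grouping (assumed for illustration).
HYDROPHOBICITY = {"polar": set("RKEDQN"),
                  "neutral": set("GASTPHY"),
                  "hydrophobic": set("CLVIMFW")}

def aa_features(seq):
    """Amino acids model (AA): 20 occurrence frequencies, Eq. (1)."""
    n = len(seq)
    return np.array([seq.count(a) / n for a in AMINO_ACIDS])

def cc_features(seq, groups=HYDROPHOBICITY):
    """Composition descriptor for one physicochemical attribute (3 values)."""
    n = len(seq)
    return np.array([sum(c in g for c in seq) / n for g in groups.values()])

def fused_features(seq, attribute_groups):
    """Fusion: 20 AA frequencies plus 3 values per attribute.
    With seven attribute partitions this yields 20 + 7 * 3 = 41 dimensions."""
    parts = [aa_features(seq)]
    parts += [cc_features(seq, g) for g in attribute_groups]
    return np.concatenate(parts)
```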
2.3 Discrimination Method
Artificial neural networks (ANN) [13] have the capabilities of learning, parallel processing, self-organization, self-adaptation and fault tolerance. Therefore, ANNs are well suited to a wide variety of engineering applications, such as function approximation, pattern recognition, classification, prediction, control and modeling. In this paper, we choose an ANN as our discrimination model to recognize thermophilic and mesophilic proteins, and particle swarm optimization (PSO) is used to optimize the parameters of the ANN. Particle swarm optimization (PSO) is a population-based stochastic optimization technique originally designed by Kennedy and Eberhart [14]. It shares many similarities with evolutionary computation techniques such as GA; however, unlike GA, the PSO algorithm has no evolutionary operators such as crossover and mutation. In the PSO algorithm, each candidate solution is considered a particle in the search space, and each particle moves through the problem space by following the current optimum particles.
Particle i is represented as X_i, a potential solution to the problem. Each particle keeps a memory of its previous best position, Pbest, and a velocity along each dimension, represented as V_i. At each iteration, the position of the particle with the best fitness value in the search space, designated Gbest, and the current particle are combined to adjust the velocity along each dimension, and that velocity is then used to compute a new position for the particle. The updating rules are as follows:

V_{k+1} = W · V_k + c1 · r1 · (Pbest − X_k) + c2 · r2 · (Gbest − X_k)    (2)

X_{k+1} = X_k + V_{k+1}    (3)
where W is an inertia weight that regulates the range of the search in the solution space, c1 and c2 are learning factors that determine the relative influence of the social and cognition components, and r1 and r2 denote two random numbers uniformly distributed in the interval [0,1]. In this paper, we choose W = 0.5 and c1 = c2 = 2. A sketch of this procedure is given below.
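A minimal sketch of the PSO loop of Eqs. (2) and (3) follows; in PSO-NN the position vector of a particle would hold all ANN weights, and the fitness function would return the network's training error for those weights. The function signature, population size, and random seeding are illustrative assumptions.

```python
import numpy as np

def pso(fitness, dim, n_particles=30, iters=200, w=0.5, c1=2.0, c2=2.0):
    """Minimize `fitness` over R^dim using the update rules of Eqs. (2)-(3)."""
    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, (n_particles, dim))   # positions X_k
    v = np.zeros_like(x)                         # velocities V_k
    pbest = x.copy()                             # per-particle best positions
    pbest_f = np.array([fitness(p) for p in x])
    gbest = pbest[pbest_f.argmin()].copy()       # global best position
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, 1))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # Eq. (2)
        x = x + v                                                   # Eq. (3)
        f = np.array([fitness(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest
```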
3 Experimental Results
In this paper, a feature extraction method fusing amino acids models and chem-composition models is used, and the feedforward artificial neural network optimized by particle swarm optimization (PSO-NN) is introduced to predict protein thermostability. Our experiment is divided into two sections: section A, testing the validity of our method; and section B, analyzing the different roles of amino acids models and chem-composition models in discriminating thermophilic and mesophilic proteins. First, we define the following: TP (true positives) represents the number of mesophilic proteins predicted correctly; TN (true negatives) the number of thermophilic proteins predicted correctly; FP (false positives) the number of thermophilic proteins predicted as mesophilic; FN (false negatives) the number of mesophilic proteins predicted as thermophilic; and N the total number of proteins. The following metrics were used to evaluate the prediction results [4][5][17]:

sensitivity of the positive data: sensitivity = TP / (TP + FN)    (4)

specificity of the negative data: specificity = TN / (TN + FP)    (5)
Table 1. The final experimental results of database a

Method                        sensitivity   specificity   accuracy
Our method (PSO-NN+Fusion)    100.0         55.0          77.5
PCA [5]                       75.0          45.0          60.0
PLSR [5]                      65.0          80.0          72.5
PCANN [5]                     60.0          85.0          72.5

Table 2. The final experimental results of database b

Method                        Test sample   sensitivity   specificity   accuracy
Our method (PSO-NN+Fusion)    test1+test2   92.80         87.60         90.20
BPNN+Dipeptide [17]           test1         93.50         85.20         89.70
BPNN+Dipeptide [17]           test2         96.10         75.00         85.50
BPNN+AA [17]                  test1         97.40         63.70         81.90
BPNN+AA [17]                  test2         97.40         46.10         71.70
Adaboost [4]                  test1+test2   82.97         92.27         87.31
Logitboost [4]                test1+test2   87.34         90.77         88.94

Table 3. The final experimental results of database a

Method           sensitivity   specificity   accuracy
PSO-NN+AA        90            45            67.5
PSO-NN+CC        60            60            60.0
PSO-NN+Fusion    100           55            77.5

Table 4. The final experimental results of database b

Method           sensitivity   specificity   accuracy
PSO-NN+AA        93.3          82.4          87.7
PSO-NN+CC        90.3          80.8          85.4
PSO-NN+Fusion    92.8          87.6          90.2
accuracy of prediction: accuracy = (TP + TN) / N    (6)
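A direct implementation of Eqs. (4)-(6) reads as follows; this is a trivial sketch, where the counts come from comparing predictions with labels.

```python
def metrics(tp, tn, fp, fn):
    """Eqs. (4)-(6); N = TP + TN + FP + FN."""
    n = tp + tn + fp + fn
    return {"sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "accuracy": (tp + tn) / n}
```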
Section A: Tables 1 and 2 give the results of our method together with the other methods used in references [4][5][17]. Section B: In order to analyze the roles of the amino acids models (AA), the chem-composition models (CC) and their fusion in discriminating thermophilic and mesophilic proteins, we used the three kinds of features in PSO-NN. The final results are reported in Tables 3 and 4.
4 Conclusion
Thermophilic microorganisms are the main source of thermophilic proteins, but single point mutations and knockouts can transform a mesophilic protein into a thermophilic one when it cannot be obtained from thermophilic microorganisms. Through the experiments, we find that our method performs well in discriminating protein thermostability, so, to save time and labor, it can be used to assess protein thermostability before performing single point mutations or knockouts. Additionally, comparing the results of the amino acids models, the chem-composition models, and their fusion, we find that amino acid composition is the main factor behind protein thermostability, while chem-composition also influences it to some extent but is not the main factor. Therefore, the accuracy is higher in the fusion case but lower when chem-composition models are used alone.
Acknowledgments This research was partially supported by the Natural Science Foundation of China (61070130, 60903176, 60873089), the Program for New Century Excellent Talents in University (NCET-10-0863), the Shandong Distinguished Middle-aged and Young Scientist Encourage and Reward Foundation of China (BS2009SW003), and the Shandong Provincial Key Laboratory of Network Based Intelligent Computing.
References 1. Fagain, C.O.: Understanding and increasing protein stability. Biochim. Biophys. Acta 1252, 1–14 (1995) 2. Seung, P.P., Yoo, Y.J.: Protein thermostability: structure-based difference of amino acid between thermophilic and mesophilic proteins. Journal of Biotechnology 111(3), 269–277 (2004) 3. Gilis, D., Rooman, M.: PoPMuSiC, an algorithm for predicting protein mutant stability changes: application to prion protein. Protein Eng., 849–856 (2000) 4. Zhang, G., Fang, B.: LogitBoost classifier for discriminating thermophilic and mesophilic proteins. Journal of Biotechnology 127(3), 417–424 (2007) 5. Zhang, G., Fang, B.: Discrimination of thermophilic and mesophilic proteins via pattern recognition methods. Process Biochemistry 41(3), 552–556 (2006) 6. Suhre, K., Claverie, J.M.: Genomic correlates of hyperthermostability, an update. J. Biol. Chem. 17198–17202 (2003) 7. Kumar, S., Nussinov, R.: How do thermophilic proteins deal with heat? Cell Mol. Life Sci. 58, 1216–1233 (2001) 8. Thompson, M.J., Eisenberg, D.: Transproteomic evidence of a loop-deletion mechanism for enhancing protein thermostability. J. Mol. Biol. 290, 595–604 (1999) 9. Sadeghi, M., Naderi-Manesh, H., Zarrabi, M., et al.: Effective factors in thermostability of thermophilic proteins. Biophys. Chem. 119(3), 256–270 (2006)
10. Chen, J., Stites, W.E.: Replacement of staphylococcal nuclease hydrophobic core residues with those from thermophilic homologues indicates packing is improved in some thermostable proteins. J. Mol. Biol. 344(1), 271–280 (2004) 11. Parthasarathy, S., Murthy, M.R.: Protein thermal stability: insights from atomic displacement parameters (B values). Protein Eng. 13(1), 9–13 (2000) 12. Seung, P.P., Yoo, Y.J.: Protein thermostability: structure-based difference of amino acid between thermophilic and mesophilic proteins. J. Biotechnol. 111, 269–277 (2004) 13. McCulloch, W.S., Pitts, W.: A Logical Calculus of the Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics 5, 115–133 (1943) 14. Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: Proceedings of the 1995 IEEE International Conference on Neural Networks, Perth, vol. 4, pp. 1942–1948 (1995) 15. Fredj, Y., Bernard, D.: Amino acid composition of genomes, lifestyles of organisms, and evolutionary trends: a global picture with correspondence analysis. Gene 297, 51–60 (2002) 16. David, J.L., Gregory, A.C., Donal, A.H.: Synonymous codon usage is subject to selection in thermophilic bacteria. Nucl. Acids Res. 4272–4277 (2002) 17. Zhang, G., Fang, B.: Application of amino acid distribution along the sequence for discriminating mesophilic and thermophilic proteins. Process Biochemistry 41(8), 1792–1798 (2006)
Using Grey Relation Analysis and TOPSIS to Measure PCB Manufacturing Firms Efficiency in Taiwan Rong-Tsu Wang Associate Professor, Graduate School of Business and Management, College of Management, Vanung University, No.1, Vanung Rd. Chung-Li, Tao-Yuan, 320, Taiwan, ROC [email protected]
Abstract. This paper constructs a performance evaluation procedure for the printed circuit board (PCB) industry that combines productivity and financial ratios. A case study is conducted using 22 PCB companies in Taiwan as an example. A total of 51 indicators are initially considered and fourteen are selected: five for production efficiency, four for marketing efficiency, and five for execution efficiency. The division of total performance helps operators to recognize the performance of a department of a PCB firm and to identify the responsibility of that department. Keywords: performance evaluation, printed circuit board, productivity, financial ratio.
1 Introduction Taiwan's PCB market was valued at approximately US$35.23 billion in 2009, making Taiwan the fourth largest PCB producer in the world. The development of the PCB industry has been intimately linked to the growth of the electronics industry, indirectly influencing the market competitiveness of electronics products. Following the strong growth in information technology (IT), telecommunications, and other consumer electronics in recent years, the demand for flexible PCBs and IC substrates has increased significantly. Taiwan is facing a greater challenge in moving from rigid PCB production, which is highly competitive, to producing flexible PCBs, which involves a rapidly evolving technology. Most studies concerning the PCB industry focus primarily on manufacturing or production technologies, such as facility design [1], [2], [3], [4], [5], manufacturing technology [6], [7], [8], [9], [10], [11], [12], [13], and industry groupings [14], [15], [16]. As for the performance evaluation of the PCB industry, which directly influences the survival and development of a company, there have been few discussions on this issue [17], [18]. The purpose of this paper is to construct a conceptual framework which helps to form performance indicators involving both productivity and financial ratios for the PCB industry. Besides, this framework also provides a powerful method to divide a PCB firm's total performance into three parts: production, marketing, and execution. The division helps operators to recognize the performance of a department of a PCB firm and to identify the responsibility of that department.
2 A Conceptual Framework The objective of most privately owned enterprises is to maximize profits. Whether the activities of an enterprise are efficient or not has a direct influence on profitability, thereby potentially threatening the survival of the enterprise [19], [20]. As shown in figure 1, the operational activities of a PCB company include three parts: factor input, product output, and customer consumption. These activities also constitute the three stages of the business operation cycle: production, marketing and execution. The activities of a PCB company can be viewed as a consecutive and cyclic process: the operators decide on the most suitable factor inputs (e.g. assets, capital, labor, or production capacity) for the current period based on customer consumption during the previous period. At the same time, they pursue greater product output (incurring debts, costs or expenses) in the production stage under a given factor input. Likewise, they seek higher customer consumption (e.g. revenue, income/loss, or sales volume) in the marketing stage given the existing output level. The final result of sales during this period is used in the execution stage to calculate the remuneration of factor input for this period and then to decide the amount of factor input for the next period. According to this functional departmentation structure, the three stages of PCB company operation shown in figure 1 represent the three types of performance categories: production efficiency, marketing efficiency, and execution efficiency, respectively corresponding to the departments of production, marketing, and management.

Fig. 1. Cycle of operation activities of a PCB company
As figure 2 illustrates, the production efficiency between factor input and product output measures a PCB company's degree of resource usage: given the same factor input, efficiency increases as the output level increases. The marketing efficiency between product output and consumer consumption measures a PCB company's marketing planning capability: given the same output level, marketing efficiency increases if consumers are willing and able to buy more products. The execution efficiency between consumer consumption and factor input measures planning and execution capability during the preliminary and interim periods: given the same consumption level, execution efficiency increases as factor input decreases, which means that the operator can better plan objectives in the preliminary period and execute them in the interim period.
Fig. 2. Conceptual framework of the operation performance evaluation for a PCB company

Table 1. Items for performance evaluation
Classification            Evaluation category     Evaluation items
Factor input              Labor                   number of employees
                          Production capacity     production capacity
                          Assets                  current assets*, fixed assets*, machine plant*, total assets*
                          Capital                 capital stock*, stockholder's equity*
Product output            Turnout                 turnout
                          Debts                   current liabilities*, long-term liabilities*, total liabilities*
                          Expense                 operation cost*, operation expense*
Consumer consumption      Sales                   sales volume
                          Income/loss             operation revenue*, gross profit (loss)*, operation income (loss)*, income (loss) before tax*, net income (loss)*

* accounting items in the financial statement.
This paper first classifies the evaluation items into productivity and finance. Labor and production capacity are categorized as the input of production factors, turnout as the output of production factors, and sales volume as the outcome of production factors. Furthermore, taking the five accounting elements into consideration, assets and the capital of the owner's equity are categorized as the input of financial factors, debts and expense as the output of financial factors, and income/loss as the outcome of financial factors. The evaluation items are shown in Table 1.
3 Initial Performance Indicators Set
Two criteria were used to help form the initial indicators set. First, an evaluation indicator should have a preliminary explanatory meaning. For example, the ratios of debts per employee, operation expense per machine plant, and machine plant per current assets are not included in the set because they bear no relevant meaning.

Table 2. Performance indicators set in production efficiency

Classification          Code   Indicator                                                Evaluation formula
Liquidation             P1     Current ratio                                            Current assets/current liabilities
                        P2     Ratio of machine plant to long-term liabilities          Machine plant/long-term liabilities
                        P3     Fixed/long-term ratio                                    Fixed assets/long-term liabilities
Solvency                P4     Ratio of machine plant to debt                           Machine plant/total liabilities
                        P5     Debt ratio                                               Total assets/total liabilities
                        P6     Ratio of capital to debt                                 Capital/total liabilities
                        P7     Debt-equity ratio                                        Equity/total liabilities
Production capacity     P8     Ratio of production capacity to long-term liabilities    Production capacity/long-term liabilities
productivity            P9     Ratio of production capacity to total debt               Production capacity/total liabilities
                        P10    Ratio of production capacity to operation cost           Production capacity/operation cost
                        P11    Ratio of production capacity to operation expense        Production capacity/operation expense
                        P12    Production capacity utilization rate                     Production capacity/turnout
Assets productivity     P13    Ratio of turnout to current assets                       Turnout/current assets
                        P14    Ratio of turnout to fixed assets                         Turnout/fixed assets
                        P15    Ratio of turnout to machine plant                        Turnout/machine plant
                        P16    Ratio of turnout to total assets                         Turnout/total assets
Equity and labor        P17    Ratio of turnout to capital                              Turnout/capital
productivity            P18    Ratio of turnout to equity                               Turnout/equity
                        P19    Ratio of turnout to number of employees                  Turnout/number of employees
Table 3. Performance indicators set in marketing efficiency

Classification          Code   Indicator                                                     Evaluation formula
Debts turnover          M1     Ratio of operation revenue to current liabilities             Operation revenue/current liabilities
                        M2     Ratio of operation revenue to long-term liabilities           Operation revenue/long-term liabilities
                        M3     Ratio of operation revenue to total debt                      Operation revenue/total liabilities
Profitability           M4     Operation cost ratio                                          Operation cost/operation revenue
                        M5     Ratio of operation revenue to operation expense               Operation revenue/operation expense
                        M6     Ratio of net income to (operation cost + operation expense)   Net income/(operation cost + operation expense)
Turnout marketing       M7     Ratio of (operation revenue – operation cost) to turnout      (Operation revenue – operation cost)/turnout
capability              M8     Ratio of net income to turnout                                Net income/turnout
                        M9     Ratio of income before tax to turnout                         Income before tax/turnout
                        M10    Ratio of sales volume to turnout                              Sales volume/turnout
Operation marketing     M11    Ratio of sales volume to operation cost                       Sales volume/operation cost
capability              M12    Ratio of sales volume to operation expense                    Sales volume/operation expense
Second, if a priori knowledge can be employed to judge that evaluation indicators are highly correlated, only one of them is chosen as the performance indicator. For example, for marketing efficiency, operation income/loss is closer to the operation of a PCB company than net income/loss, according to the accounting definition. Based on the above two selection criteria and the ratios of the evaluation items in Table 1, the set contains 51 evaluation indicators, classified into three main categories: production, marketing and execution. Among them, the 19 indicators in the production category are re-categorized into five groups: liquidation, solvency, production capacity productivity, assets productivity, and equity/labor productivity (Table 2). The twelve indicators in the marketing category are re-categorized into four groups: debts turnover, profitability, turnout marketing capability, and operation marketing capability, while the 20 indicators in the execution category are re-categorized into five groups: assets and stockholder's turnover, return of investment, sales volume execution capability, production capacity execution capacity, and labor execution capacity (Tables 3 and 4).
Table 4. Performance indicators set in execution efficiency

Classification          Code   Indicator                                            Evaluation formula
Assets and              C1     Current assets turnover                              Operation revenue/current assets
stockholder's           C2     Fixed assets turnover                                Operation revenue/fixed assets
turnover                C3     Machine plant turnover                               Operation revenue/machine plant
                        C4     Total assets turnover                                Operation revenue/total assets
                        C5     Stockholder's equity turnover                        Operation revenue/average stockholder's equity
Return of investment    C6     Return on current assets                             Income before tax/current assets
                        C7     Return on fixed assets                               Income before tax/fixed assets
                        C8     Return on machine plant                              Income before tax/machine plant
                        C9     Return on total assets                               Income before tax/total assets
                        C10    Return on stockholder's equity                       Income before tax/average stockholder's equity
Sales volume            C11    Ratio of sales volume to current assets              Sales volume/current assets
execution capability    C12    Ratio of sales volume to fixed assets                Sales volume/fixed assets
                        C13    Ratio of sales volume to machine plant               Sales volume/machine plant
                        C14    Ratio of sales volume to total assets                Sales volume/total assets
Production capacity     C15    Ratio of operation revenue to production capacity    Operation revenue/production capacity
execution capacity      C16    Ratio of income before tax to production capacity    Income before tax/production capacity
                        C17    Ratio of sales volume to production capacity         Sales volume/production capacity
Labor execution         C18    Ratio of sales volume to number of employees         Sales volume/number of employees
capacity                C19    Ratio of operation revenue to number of employees    Operation revenue/number of employees
                        C20    Ratio of income before tax to number of employees    Income before tax/number of employees
4 Grey Relation Analysis and TOPSIS The purpose of using GRA in this paper is to select only representative indicators for evaluation. In general, representative indicators can be selected by grouping, which minimizes the differences within a group and maximizes the differences between groups. If the samples are large enough and normally distributed, statistical or econometric methods such as factor analysis, cluster analysis, and discriminant analysis can be used to decide the representative indicators. However, when the sample size is small and the distribution of the samples is unknown, grey relation analysis should be used to select the representative indicators. Moreover, the TOPSIS method is used in conjunction to calculate performance scores and outrankings. Grey system theory was originated by Deng [21]. The analysis method, which measures the relation among elements based on the degree of similarity or difference of their development trends, is called "grey relation analysis".
More precisely, during the process of system development, if the trends of change of two elements are consistent, they enjoy a higher grade of synchronized change and can be considered as having a greater grade of relation; otherwise, the grade of relation is smaller. Grey relation analysis is applied here in the selection of representative indicators. TOPSIS, developed by Hwang and Yoon in 1981 [22], is used as the ranking method in this paper. The advantage of this method is that it is simple and yields an indisputable preference order, although it assumes that each indicator takes a monotonic (increasing or decreasing) utility. TOPSIS is based on the concept that the chosen alternative should have the shortest distance from the ideal solution and the farthest distance from the worst solution. The ideal solution is the one that enjoys the largest benefit indicator values and the smallest cost indicator values among the substitutive PCB companies; the worst solution is the one that enjoys the smallest benefit indicator values and the largest cost indicator values.
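The two methods can be sketched as follows in Python with NumPy. grey_relational_grade implements Deng's grey relational grade with the usual distinguishing coefficient ρ = 0.5, and topsis returns the relative closeness used for the outranking; the function signatures and the ρ default are illustrative assumptions rather than the paper's exact computational settings.

```python
import numpy as np

def grey_relational_grade(reference, series, rho=0.5):
    """Deng's grey relational grade of each row of `series` to `reference`.
    A larger grade indicates a more similar development trend."""
    delta = np.abs(series - reference)            # absolute differences
    dmin, dmax = delta.min(), delta.max()
    coeff = (dmin + rho * dmax) / (delta + rho * dmax)
    return coeff.mean(axis=1)                     # average coefficient per series

def topsis(matrix, benefit):
    """Relative closeness of each alternative (row) to the ideal solution.
    `benefit[j]` is True for a benefit indicator, False for a cost indicator."""
    m = matrix / np.linalg.norm(matrix, axis=0)   # vector-normalize each column
    ideal = np.where(benefit, m.max(axis=0), m.min(axis=0))
    worst = np.where(benefit, m.min(axis=0), m.max(axis=0))
    d_plus = np.linalg.norm(m - ideal, axis=1)    # distance to ideal solution
    d_minus = np.linalg.norm(m - worst, axis=1)   # distance to worst solution
    return d_minus / (d_plus + d_minus)           # higher means closer to ideal
```

The values in parentheses in Table 6 below are relative closeness figures of this kind; ranking the companies by descending closeness yields the preference order.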
5 Application
This paper applies the developed process to 22 listed Taiwanese companies, namely COMPEQ, WUS, Chin-Poon, GCE, UNIMICRON, Tripod, HSB, CADAC, Best Friend, BOARDTEK, FHT, YangAn, VIC, H&T, YUFO, UNITECH, QEC, PLOTECH, PSC, LIN HORN, APCB, and TPT.

Table 5. Classification of indicator groups of production, marketing, and execution

Categories                 Groups   Indicators within each group      Representative indicator of each group
Indicators in              P-I      P1, P2, P3, P8                    P1 (a)  Current ratio
production efficiency      P-II     P14, P15, P17, P18, P19           P15 (c) Ratio of turnout to machine plant
                           P-III    P4, P6, P12                       P12 (b) Production capacity utilization rate
                           P-IV     P5, P7, P9, P10, P11, P13         P9 (c)  Ratio of production capacity to total debt
                           P-V      P16                               P16 (c) Ratio of turnout to total assets
Indicators in              M-I      M1, M8, M10                       M8 (c)  Ratio of net income to turnout
marketing efficiency       M-II     M2, M7, M9                        M2 (a)  Ratio of operation revenue to long-term liabilities
                           M-III    M3, M4, M5, M6                    M5 (a)  Ratio of operation revenue to operation expense
                           M-IV     M11, M12                          M12 (c) Ratio of sales volume to operation expense
Indicators in              C-I      C1, C2, C15                       C15 (c) Ratio of operation revenue to production capacity
execution efficiency       C-II     C3, C13, C19                      C3 (a)  Machine plant turnover
                           C-III    C4, C17                           C17 (b) Ratio of sales volume to production capacity
                           C-IV     C5, C11, C12, C14, C18            C14 (c) Ratio of sales volume to total assets
                           C-V      C6, C7, C8, C9, C10, C16, C20     C20 (c) Ratio of income before tax to number of employees

a refers to financial ratios (4 in total); b refers to productivity indicators (2 in total); c refers to mixed indicators (8 in total).
Table 6. Preference orders for total, production, marketing and execution efficiency

Companies      production   marketing    execution    total
COMPEQ         19 (0.143)   15 (0.052)   14 (0.119)   18 (0.065)
WUS            7 (0.256)    18 (0.019)   15 (0.114)   17 (0.077)
Chin-Poon      6 (0.299)    2 (0.454)    3 (0.440)    4 (0.354)
GCE            8 (0.253)    12 (0.101)   10 (0.167)   11 (0.123)
UNIMICRON      5 (0.327)    6 (0.224)    11 (0.156)   6 (0.191)
Tripod         21 (0.140)   22 (0.000)   5 (0.312)    13 (0.114)
HSB            12 (0.209)   16 (0.046)   6 (0.263)    10 (0.131)
CADAC          3 (0.334)    14 (0.062)   18 (0.103)   12 (0.115)
Best Friend    18 (0.163)   20 (0.004)   19 (0.063)   21 (0.033)
BOARDTEK       20 (0.140)   11 (0.116)   13 (0.124)   16 (0.090)
FHT            9 (0.237)    8 (0.207)    12 (0.125)   9 (0.144)
YangAn         1 (0.805)    4 (0.273)    2 (0.606)    1 (0.593)
VIC            17 (0.182)   20 (0.004)   21 (0.048)   22 (0.031)
H&T            13 (0.206)   17 (0.020)   20 (0.060)   19 (0.046)
YUFO           16 (0.191)   3 (0.317)    4 (0.329)    5 (0.244)
UNITECH        10 (0.232)   10 (0.145)   17 (0.105)   14 (0.112)
QEC            4 (0.330)    13 (0.085)   9 (0.189)    8 (0.156)
PLOTECH        22 (0.119)   9 (0.172)    16 (0.110)   15 (0.102)
PSC            11 (0.225)   19 (0.006)   22 (0.038)   20 (0.036)
LIN HORN       2 (0.409)    1 (0.771)    8 (0.197)    2 (0.438)
APCB           14 (0.196)   5 (0.228)    7 (0.201)    7 (0.165)
TPT            15 (0.196)   7 (0.218)    1 (0.840)    3 (0.437)
The 22 companies are involved in the production, assembly, and sales of rigid PCBs. In 2009, the production of single-sided, double-sided, and multilayer rigid PCBs accounted for over 70% of printed circuit board operating revenue. Performance indicators are grouped into the three categories, i.e., production, marketing, and execution, in accordance with the coefficient of each indicator, as shown in Table 5.
The TOPSIS method was used to calculate the total performance score of each PCB firm. This performance is divided into the three classifications (production, marketing and execution) according to the normalized value of each representative indicator, and the preference orders for total performance and the three efficiency classifications are shown in Table 6. The figures inside the parentheses in Table 6 are the relative closeness to the ideal solution; the higher the figure, the closer the firm is to the ideal.
6 Conclusions This paper develops a performance evaluation model for the PCB industry that combines productivity indicators and financial ratios. The model is then applied in a case study to the performance evaluation of 22 rigid PCB fabrication firms listed on the Taiwan Stock Exchange. The conceptual framework in this paper is more complete than one in which only productivity indicators or financial ratios are considered. Likewise, in order to overcome the limitations of sample size and distribution type, grey relation analysis has been utilized for selecting the representative indicators; it provides a solution for grouping the indicators when the sample size is small and the distribution type is unknown. The total performance of a PCB firm is divided into three categories (production, marketing and execution), based on the cycle of operation and the characteristics of the organization. This division of total performance can successfully be used as a diagnostic tool to provide operators with preliminary insight into a PCB firm. From the case study done here, the total performance of a PCB firm is not fully equal to its performance in production, marketing or execution. For example, although YangAn, LIN HORN, and TPT rank first, second, and third respectively in total performance, the marketing efficiency of YangAn, the execution efficiency of LIN HORN and the production efficiency of TPT still need improvement. This reveals that it is more difficult to discover problems in operation if the focus is only on the evaluation of total performance.
References 1. Harvel, J.T.: Classification of PCB types for cost effective solutions. In: The 15th Design Automation Conference (1978) 2. Dengiz, B., Akbay, K.S.: Computer simulation of a PCB production line: Metamodeling approach. International Journal of Production Economics 63, 195–205 (2000) 3. Grunow, M., Günther, H.O., Föhrenbach, A.: Simulation-based performance analysis and optimization of electronics assembly equipment. International Journal of Production Research 38, 4247–4259 (2000) 4. Chan, F.T.S., Chan, H.K.: Design of a PCB plant with expert system and simulation approach. Expert Systems with Applications 28, 409–423 (2005) 5. Ryan, A., Lewis, H.: Manufacturing an environmentally friendly PCB using existing industrial processes and equipment. Robotics and Computer-Integrated Manufacturing 23, 720–726 (2007)
6. Chang, C.C., Cong, J.: An efficient approach to multi-layer layer assignment with application to via minimization. In: The 34th Design Automation Conference (1997) 7. Altinkemer, K., Kazaz, B., Köksalan, M., Moskowitz, H.: Optimization of printed circuit board manufacturing: Integrated modeling and algorithms. European Journal of Operational Research 124, 409–421 (2000) 8. Doniavi, A., Mileham, A.R., Newnes, L.B.: A systems approach to photolithography process optimization in an electronics manufacturing environment. International Journal of Production Research 38, 2515–2528 (2000) 9. Lohan, J., Tiilikka, P., Rodgers, P., Fager, C.M., Rantala, J.: Using experimental analysis to evaluate the influence of printed circuit board construction on the thermal performance of four package types in both natural and forced convection. In: Inter Society Conference on Thermal Phenomena (2000) 10. Giachetti, R.E., Alvi, M.I.: An object-oriented information model for manufacturability analysis of printed circuit board fabrication. Computers in Industry 45, 177–196 (2001) 11. Chenard, J.S., Chu, C.Y.: Design methodology for wireless nodes with printed antennas. In: The 42nd Design Automation Conference (2005) 12. Tsai, C.Y., Chiu, C.C., Chen, J.S.: A case-based reasoning system for PCB defect prediction. Expert Systems with Applications 28, 813–822 (2005) 13. Tsai, C.Y., Chiu, C.C.: A case-based reasoning system for PCB principal process parameter identification. Expert Systems with Applications 32, 1183–1193 (2007) 14. Cullen, A.: Comparison of measured and predicted environmental PCB concentrations using simple compartmental models. Environmental Science & Technology 36, 2033–2038 (2002) 15. Rossetti, M.D., Stanford, K.J.A.: Group sequencing a PCB assembly system via an expected sequence dependent setup heuristic. Computers and Industrial Engineering 45, 231–254 (2003) 16. Yu, S., Kim, D., Park, S.: Integer programming approach to the printed circuit board grouping problem. International Journal of Production Research 43, 1667–1684 (2005) 17. Liu, S.T., Wang, R.T.: Efficiency measures of PCB manufacturing firms using relational two-stage data envelopment analysis. Expert Systems with Applications 36, 4935–4939 (2009) 18. Wang, R.T., Ho, C.T., Oh, K.: Measuring production and marketing efficiency using grey relation analysis and data envelopment analysis. International Journal of Production Research 48(1), 183–199 (2010) 19. Feng, C.M., Wang, R.T.: The comparative analysis of applying different indicator types to the performance evaluation in both airlines and highway bus companies. Journal of Eastern Asia Society for Transportation Studies 4, 13–25 (2001) 20. Feng, C.M., Wang, R.T.: Considering the financial ratios on the performance evaluation of highway bus industry. Transport Reviews 21, 449–467 (2001) 21. Deng, J.L.: Control problems of grey systems. Systems and Control Letters 5, 288–294 (1982) 22. Hwang, C.L., Yoon, K.: Multiple attribute decision making: methods and applications. Springer, Heidelberg (1981)
Application of Neural Network for the Prediction of Eco-efficiency
Sławomir Golak (1), Dorota Burchart-Korol (2), Krystyna Czaplicka-Kolarz (2), Tadeusz Wieczorek (1)
(1) Silesian University of Technology, Poland; (2) Central Mining Institute, Poland
Abstract. This paper presents the application of neural networks in the design process of new technologies, taking into account factors such as their influence on the environment and the economic effects of their implementation. The use of neural networks allows eco-efficiency assessment of technologies based on a highly reduced number of descriptive design parameters, which are otherwise very difficult to collect at the conceptual design stage. The great diversity of the technologies involved, along with the small number of available examples, made it difficult to construct a neural model and demanded careful data preprocessing and network structure selection. Keywords: eco-efficiency, prediction, feature selection, neural network structure optimization.
1 Introduction A technology life-cycle consists of many stages, from concept design, through the detailed design process, implementation and the use phase, all the way to its liquidation. So far the principal criterion for the evaluation of a technology has been its economic efficiency, i.e. the ratio of expenditures for its development and operation to the profit returned. Recently, the global economy has been paying more attention to the environmental performance of technologies. For these reasons, environmental impact assessment (expressed in eco-indicator points) and combined economic-environmental impact assessment are becoming an inseparable element of the design process. Both aspects together are expressed in eco-efficiency indicator points. Classic analytical methods for eco-efficiency assessment require thorough economic and environmental analyses. The main drawback of this approach, besides being laborious, is the fact that the analyses can be carried out only on the basis of a completed design of the technology being assessed. As a result, the environmental impact assessment can be obtained only after the design phase has finished, and quite frequently not until the implementation process, which, in the case of bad indicators, means a costly necessity of re-design. Since such enterprises usually involve complex and multistage financing systems, a return to the design phase may even prove impossible.
Therefore, it was essential to find tools that allow the assessment of economic and environmental efficiency in the earliest possible design phase and on the basis of a smaller data set than the one required by classic analyses. A way to replace analytical methods is the application of models based on data describing the parameters and impacts of completed, implemented technologies. In the selection of such a model, a few factors must be taken into account, namely: the nonlinear or even discontinuous character of the dependencies between the parameters and the impact indicators of a technology, the high level of abstraction of the eco-efficiency indicator, and the fact that the ultimate measure of technological impact is greatly influenced by the subjective, stochastic decisions of the person who carries out the classic analysis. So far neural networks have been used to predict only the ecological effects of implemented technologies [1-4]. This paper presents research concerning the prediction of both economic and environmental effects expressed in eco-efficiency indicator points.
2 Eco-efficiency Eco-efficiency is a new concept in environmental management which integrates environmental considerations with economic analysis to improve products and technologies. Eco-efficiency is a strategic tool and one of the key factors of sustainable development. Eco-efficiency analysis allows designers to find the most effective solution taking into account the economic aspects and environmental compatibility of products and technologies: the environmental impact should be as low as possible while the economic performance should be the highest achievable. The eco-efficiency tools used in the Central Mining Institute are Life Cycle Assessment (LCA) and Net Present Value (NPV). The main purposes of eco-efficiency analysis are the following: reducing consumption of resources, reducing environmental impact, increasing product value added and increasing the economic efficiency of production while reducing environmental impact. The purpose of eco-efficiency is to maximize value creation while minimizing the use of resources and the emission of pollutants. In the literature, various formulas for eco-efficiency calculation can be found, based on the available data on environmental impact and on the size and cost of production. Eco-efficiency analysis is used for comparing similar products or technologies in order to choose the best solution with the lowest cost and the least environmental impact (eco-effective solutions). The Central Mining Institute in Poland has considerable experience in eco-efficiency analysis and has been conducting this kind of analysis for several years. Its research methodology is based on the eco-efficiency tools of environmental management and on economic tools: environmental assessment is based on LCA analysis, while economic indicators are calculated on the basis of NPV (Net Present Value) [5]. Life Cycle Assessment (LCA) is an environmental assessment method for evaluating the impacts that a product, process or technology has on the environment over the entire period of its life, from the extraction of the raw material through the manufacturing, packaging and marketing processes, and the use, re-use and maintenance of the product or technology, to its eventual recycling or disposal as waste at the end of its useful life.
The second element of the eco-efficiency analysis is the net present value, NPV. The NPV method involves discounting the stream of revenues and expenditures and calculating their present value. It allows designers to achieve comparability of receipts and payments occurring at various times by converting them into their value at a specified base time. Calculating the NPV involves summing all the net benefits (net cash flows) associated with the technology or product throughout its economic life cycle, discounted before aggregation to a single moment of time in order to unify their monetary value. In order to calculate the eco-indicator, specific tools should be used (e.g. SimaPro), and the input and output data of the technology should be defined very precisely, including all life-cycle phases: construction, operation and decommissioning. Carrying out a full LCA analysis is a very difficult task, and limitations are often introduced: obtaining data on the construction and decommissioning phases of a plant is often very difficult and laborious. Since it is the operation stage which has the greatest impact on the eco-efficiency indicator for most of the analysed technologies, it should be considered the most important phase in the technology life cycle from the users' point of view.
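As a sketch, under the pairing of the two tools named above, the computations look as follows; the eco_efficiency ratio shown is only one of the general value-per-impact forms found in the literature, not necessarily the exact formula used by the Institute.

```python
def npv(cash_flows, rate):
    """Net present value: discounted sum of net cash flows CF_0, ..., CF_T."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))

def eco_efficiency(npv_value, eco_indicator_points):
    """A generic value-per-impact ratio: economic value created (NPV)
    per unit of environmental impact (LCA eco-indicator points)."""
    return npv_value / eco_indicator_points

# Example: an outlay of 100 followed by five annual net inflows of 30,
# discounted at 8%, gives npv([-100, 30, 30, 30, 30, 30], 0.08) ≈ 19.8.
```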
3 Data Preprocessing The main problem with artificial neural networks used to predict eco-efficiency is the low availability of data necessary to create the model. It must be remembered that the undertakings in question are usually multi-million-dollar investments, which naturally limits the number of cases. An additional difficulty is the fact that such enterprises usually involve financial secrecy and trade secrets, which prevents the creators of a given technology from sharing all the information they possess with third parties. The resulting situation is highly problematic because of the complexity of the modeled dependency combined with the small number of data. Since in the study presented here data were collected on only a few dozen technologies, it was essential that all cases be considered. Two major problems were taken into consideration:
─ the studied technologies belonged to three different fields: energy, materials, environmental;
─ the scale of the technologies ranged from local implementations to national undertakings, which meant an analogous range of variability of their parameters and environmental impact measures.
In these particular circumstances, the primary principle of feature selection was availability at various stages of the design process and the cost of acquisition. The set of potential features included the following:
─ predictors of resource use, energy consumption, waste and emissions for the stages of construction, operation and decommissioning,
─ environmental impact indicators for the stages of construction, operation and decommissioning, which are functionally dependent on the predictors,
─ indicators of ecological effects, impact on human health, and resource use, which are functionally dependent on the predictors and thereby on the indicators for the stages,
─ the total eco-indicator of the environmental impact of the technology, which is functionally dependent on the eco-indicators for all stages and kinds of impact,
─ social impact,
─ the lifetime of the technology and its annual and total product,
─ the financial effect, NPV.
Since it is the operation phase that has the greatest influence on the eco-efficiency indicator for most of the analysed technologies, and the data concerning this phase are relatively easily available, at the first stage of feature selection only the parameters describing this phase were chosen to predict the total impact of the technologies. The order of magnitude of some of the features under consideration can change as much as ninefold. This is why, among others, the basic set of features was extended by adding the following features derived from the primary features:
─ features transformed to a logarithmic scale:
ξ_{i,j} = ln(|x_{i,j}| + 1) · sign(x_{i,j})    (1)
where x_{i,j} is the original j-th feature of the i-th technology and ξ_{i,j} is the scaled j-th feature of the i-th technology;
─ ratios of original features:
ξ_{i,jk} = x_{i,j} / x_{i,k}    (2)
where x_{i,j} is the original j-th feature of the i-th technology, x_{i,k} is the original k-th feature of the i-th technology, and ξ_{i,jk} is the scaled jk-th feature of the i-th technology. Since eco-efficiency itself has very different orders of magnitude, scaled variants of this value were created using analogous transformations. The extremely low number of items in the data set used to construct the neural model (only 39 examples) has the following consequences:
─ statistical tools [6] cannot be used in feature selection,
─ the construction time of a single neural model (for one combination of features) is very short,
─ it is possible to use the brute-force search method to find the optimal model-oriented set of features.
The diagram in Fig. 1 shows the algorithm for feature selection based on neural models for all subsets of the extended feature set. In the process of generating the subsets for individual models, it was decided that each feature could be taken into account only once, either in its primary form or transformed into one of the scales.
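The two transformations translate directly into code; the absolute value inside the logarithm of Eq. (1) is assumed here so that negative feature values (e.g. losses) remain well defined.

```python
import numpy as np

def signed_log(x):
    """Eq. (1): logarithmic rescaling that preserves the sign and maps 0 to 0."""
    return np.log(np.abs(x) + 1.0) * np.sign(x)

def ratio_feature(x_j, x_k):
    """Eq. (2): ratio of two original features (x_k assumed non-zero)."""
    return x_j / x_k
```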
Fig. 1. An algorithm of the brute-force feature selection
This decision was made on the grounds that considering any feature multiple times would not introduce any new information to the model, while it would dramatically increase the number of iterations of the selection algorithm. Additionally, the feature values and the output value were standardized. Before calculating the prediction error for individual feature subsets and variants of the estimated values, the results provided by the network were transformed back to their original scales so that comparison between the various models was possible. A sketch of the exhaustive subset search is given below.
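The exhaustive search over feature subsets can be sketched as follows; `evaluate` is a hypothetical callback that trains a network on the given feature columns and returns its validation error.

```python
from itertools import combinations

def brute_force_selection(features, evaluate):
    """Score every non-empty subset of candidate features with
    evaluate(subset) -> error, and keep the best-scoring subset."""
    best_subset, best_error = None, float("inf")
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            error = evaluate(subset)
            if error < best_error:
                best_subset, best_error = subset, error
    return best_subset, best_error
```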
4 Creating a Neural Model The classic multi-layer perceptron was chosen to predict eco-efficiency, as this neural model proves efficient when the prediction of continuous values is involved and this type of network structure is also successful with small data sets [7, 8]. For the same reasons as in the case of data preprocessing, the optimal network structure for the input features and the output variant set by the superior algorithm was determined by the brute-force search method. The search limits were determined on the basis of Kolmogorov's theorem, and the only nets considered were those with one or two hidden layers with sigmoidal transfer functions and a linear output layer. The maximum numbers of neurons in particular layers were selected with a view to the time needed to run the iterative network structure selection algorithm executed within the feature selection algorithm. In order to verify that the selected maximum numbers of neurons were justified, the influence of the number of descriptive parameters on the neural network error was plotted. The number of network parameters was calculated from Formula (3) for networks with one hidden layer and from Formula (4) for networks with two hidden layers.
N_P = (N_I + 2) · N_{L1} + 1    (3)

N_P = (N_I + 1) · N_{L1} + (N_{L1} + 2) · N_{L2} + 1    (4)

where: N_P is the number of network parameters, N_I the number of network inputs, N_{L1} the number of neurons in the first hidden layer, and N_{L2} the number of neurons in the second hidden layer.
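Formulas (3) and (4) are the usual weight-and-bias counts for fully connected perceptrons with a single linear output. The following sketch reproduces them and could be used to enumerate candidate structures; the size limits shown are assumptions for illustration only.

```python
def n_params_one_hidden(n_inputs: int, n_l1: int) -> int:
    # (N_I + 1) weights+bias per hidden neuron, plus (N_L1 + 1) for the
    # single linear output neuron, which expands to (N_I + 2)*N_L1 + 1.
    return (n_inputs + 2) * n_l1 + 1

def n_params_two_hidden(n_inputs: int, n_l1: int, n_l2: int) -> int:
    # First hidden layer, second hidden layer, then one output neuron.
    return (n_inputs + 1) * n_l1 + (n_l1 + 2) * n_l2 + 1

# Example: enumerate structures within assumed limits (up to 20 neurons
# per layer) and record each candidate's parameter count.
structures = [((l1,), n_params_one_hidden(3, l1)) for l1 in range(1, 21)]
structures += [((l1, l2), n_params_two_hidden(3, l1, l2))
               for l1 in range(1, 21) for l2 in range(1, 21)]
```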
Fig. 2. Dependence of the neural model error on the number of its parameters
The cloud of 420 points representing the tested network structures in the exemplary graph (for the best set of input features) results from the random initialization of the network weights. Although the multi-start method was applied, so that each point represents the average over 10 networks with the same structure but different initial weights, this was not sufficient to completely eliminate fluctuations of the error value. Nevertheless, despite this scatter, which stems from computational limitations, a clear tendency can be observed for the error to increase for larger network structures, which justifies the assumed limits on the network size.
5 Results
Since the data set used in this study was relatively small, the constructed neural network was validated by leave-one-out cross-validation, which assigns only one example to the validation set at each iteration, so that the largest possible learning set is retained. An additional and very important advantage of this kind of validation is that eco-efficiency prediction errors are obtained individually for each technology considered, which increases the possibilities of substantive assessment of the results by indicating the technologies that stand out from the others. Because the neural network was used to predict the eco-efficiency of many different technologies, whose indicators differ by orders of magnitude, the only way to define a common measure was to base it on relative errors:
E = (1/N) · Σ_{i=1}^{N} |y_i − d_i| / y_i · 100%    (5)
where: E is the error measure, N the number of technologies, y_i the network prediction for the i-th test technology, and d_i the expected value of the indicator for the i-th test technology. The application of the above algorithm for feature and network-structure selection made it possible to achieve a very high accuracy of eco-efficiency prediction (1.54%) on the basis of data relatively easily available at the conceptual stage (Table 1, row 1). Encouraged by this result, we made several attempts to predict eco-efficiency relying exclusively on the input data used in the analytical calculation of the eco-indicator for the operation phase. The lower prediction accuracy (Table 1, row 2) obtained for all predictors of the operation phase indicates that the potential additional information introduced to the model does not compensate for the growth of the network structure when the amount of learning data remains unchanged. The subsequent prediction models for individual predictors are characterized by lower accuracy (rows 3-6); however, considering that preparing their data requires about a third of the labour needed for the eco-indicator, they are an attractive alternative.

Table 1. Accuracy of eco-efficiency prediction
 #  Input features                                                      Structure   Relative error
 1  Eco-indicator for operation phase, social impact, NPV               3-5-17-1    1.54%
 2  Predictors of resources, energy, waste and emissions,               6-12-20-1   2.05%
    social impact, NPV
 3  Predictor of resources, social impact, NPV                          3-11-12-1   3.75%
 4  Predictor of energy, social impact, NPV                             3-17-6-1    6.51%
 5  Predictor of waste, social impact, NPV                              3-5-6-1     8.41%
 6  Predictor of emissions, social impact, NPV                          3-9-18-1    6.51%
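The error measure of equation (5) combined with leave-one-out validation can be sketched as below; this is an illustrative reconstruction, with `train_model` standing in for the MLP training procedure, not the authors' implementation.

```python
import numpy as np

def loo_relative_error(X, y, train_model):
    """Leave-one-out estimate of the mean relative error of Eq. (5).

    X: (n_samples, n_features) inputs; y: (n_samples,) eco-efficiency values.
    train_model(X_train, y_train) must return an object with .predict().
    """
    errors = []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i           # hide the i-th technology
        model = train_model(X[mask], y[mask])
        pred = model.predict(X[i:i + 1])[0]
        errors.append(abs(y[i] - pred) / y[i] * 100.0)
    return np.mean(errors), errors              # overall E and per-technology errors
```

Returning the per-technology errors alongside the mean mirrors the point made above: technologies with unusually large individual errors are flagged as atypical rather than discarded.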
6 Conclusions
The possibility of predicting the economic and ecological effects of new technologies at the conceptual design stage, using a simplified set of descriptive parameters, allows designers to avoid costly mistakes when the first strategic decisions are made. The main difficulty in applying a neural network to this task is the limited amount of learning data; the situation is made even more problematic by the relatively high complexity of the dependencies between the parameters and the technological impacts. On the other hand, a small learning data set, and hence a potentially small neural network, makes it possible to apply brute-force search both for finding the optimal network structure and for selecting the input data set. Owing to this solution, the accuracy of eco-efficiency prediction from easily available parameters is high enough that the predictions can substantially support conceptual decisions about technologies.
It should be noted that a characteristic feature of the presented application of neural networks is that it may be a source of additional knowledge about the technologies whose data were used to create the model. A significant over- or underestimation of a technology's eco-efficiency indicates that it is not typical and needs further analysis, contrary to the classic approach, where such a case is frequently disregarded. The fact that a large proportion of the descriptive parameters can be disregarded proves the existence of strong dependencies between them which are very hard to define. This situation has encouraged environmental protection specialists to undertake thorough studies in this field. Additional information on the meaning and synergy of individual parameters might be obtained by applying one of the methods of knowledge extraction from neural networks [9].
References
1. Seo, K.-K., Min, S.-H., Yoo, H.-W.: Artificial neural network based life cycle assessment model for product concepts using product classification method. In: Gervasi, O., Gavrilova, M.L., Kumar, V., Laganá, A., Lee, H.P., Mun, Y., Taniar, D., Tan, C.J.K. (eds.) ICCSA 2005. LNCS, vol. 3483, pp. 458–466. Springer, Heidelberg (2005)
2. Sousa, I., Wallace, D.: Product classification to support approximate life-cycle assessment of design concepts. Technological Forecasting & Social Change 73, 228–249 (2006)
3. Seo, K.-K., Kim, W.-K.: Approximate Life Cycle Assessment of Product Concepts Using a Hybrid Genetic Algorithm and Neural Network Approach. In: Szczuka, M.S., Howard, D., Ślȩzak, D., Kim, H.-k., Kim, T.-h., Ko, I.-s., Lee, G., Sloot, P.M.A. (eds.) ICHIT 2006. LNCS (LNAI), vol. 4413, pp. 258–268. Springer, Heidelberg (2007)
4. Li, J., Wu, Z.: Application of neural network on environmental impact assessment tools. Int. J. Sustainable Manufacturing 1(1/2), 100–121 (2008)
5. Czaplicka-Kolarz, K., Wachowicz, J., Bojarska-Kraus, M.: A Life Cycle Method for Assessment of a Colliery's Balance. The International Journal of Life Cycle Assessment 9, 247–253 (2004)
6. Duch, W., Wieczorek, T., Biesiada, J., Blachnik, M.: Comparison of feature ranking methods based on information entropy. In: IEEE International Conference on Neural Networks, vol. 2, pp. 1415–1419 (2004)
7. Lanouette, R., Thibault, J., Valade, J.L.: Process modeling with neural networks using small experimental datasets. Computers and Chemical Engineering 23, 1167–1176 (1999)
8. Li, D.C., Yeh, C.W.: A non-parametric learning algorithm for small manufacturing data sets. Expert Systems with Applications 32, 391–398 (2008)
9. Wieczorek, T., Golak, S.: An algorithm of knowledge extraction from trained neural networks. In: IIPWM 2004, Advances in Soft Computing. Springer, Heidelberg (2004)
Using Associative Memories for Image Segmentation

Enrique Guzmán1, Ofelia M.C. Jiménez1, Alejandro D. Pérez1, and Oleksiy Pogrebnyak2

1 Universidad Tecnológica de la Mixteca, Carretera Acatlima Km 2.5, Huajuapan de León, 69000 Oaxaca, México {eguzman,ojimenez,aperez}@mixteco.utm.mx
2 Centro de Investigación en Computación – IPN, Av. Juan de Dios Bátiz s/n, Del. Gustavo A. Madero, 07738, México D.F. [email protected]
Abstract. In this paper, the authors propose an algorithm for segmenting grayscale images using an associative memory approach. The algorithm is divided into three steps. In the first step, a set of regions (classes) is obtained, each of which groups a certain number of pixel values. In the second step, the associative memory training phase is applied to the information obtained in the first step, yielding an associative network that contains the group of centroids of the regions into which the image will be segmented. Finally, using the associative memory classification phase, the centroid to which each pixel belongs is determined and the image segmentation process is completed.

Keywords: Image segmentation, associative memories, clustering techniques.
1 Introduction
The goal of image segmentation is to find and isolate homogeneous regions contained in an image. For this purpose, characteristics or computed properties such as color, intensity or texture are considered. Segmentation techniques can be based on a number of diverse methods: clustering methods, probabilistic methods, region-growing methods, histogram-based methods, graph-partitioning methods and neural network-based methods. The central focus of this work is image segmentation based on clustering methods. This kind of method first divides the image into a small number of regions and then assigns each image pixel to the region that presents the nearest match according to a similarity criterion. Among clustering methods, the k-means algorithm is the most popular technique, and it is used to partition an image into k clusters [1], [2]. In [3], Yan Ling Li and Yi Shen presented an extended version of the robust fuzzy clustering method (RFCM) for the segmentation of noisy images [4]. This new algorithm is based on a kernel-induced distance measure which extends the RFCM algorithm to the corresponding kernelized version, named "Kernel Robust Fuzzy C-Means" (KRFCM). Furthermore, the KRFCM algorithm includes a class of robust non-Euclidean distance measures for the original
data space to derive new objective functions and thus cluster non-Euclidean structures in the data. Shinichi Shirakawa and Tomoharu Nagao proposed a method for evolutionary image segmentation based on multi-objective clustering [5]. First, this algorithm initializes the set of pixels using the minimum spanning tree (MST) technique. Next, by means of multi-objective clustering with automatic k-determination (MOCK), developed by Handl and Knowles [6], [7], two complementary objectives based on cluster compactness and connectedness are optimized using a multi-objective evolutionary algorithm (MOEA) [8], [9]. In this case the objectives are the overall deviation (a measure of the similarity of pixels in the same region) and the edge value (a measure of the difference at the boundary between regions). MOCK returns not just one solution but an entire set of solutions; finally, the Strength Pareto Evolutionary Algorithm 2 (SPEA2) is used to find the optimal one. In [10], Zhang Xue-xi and Yang Yi-min suggest a hybrid intelligent color image segmentation method combining two segmentation approaches: region growing and clustering. Region growing is used to generate the initial segmentation; it captures local variations of an image at high speed. The final segmentation is obtained by the minimum spanning tree (MST) method, which treats every region produced by region growing as a node and is used to extract the global properties of the image; particle swarm optimization is then used to find the best MST threshold and improve the algorithm's speed. In this paper, we propose an efficient grayscale image segmentation algorithm based on Extended Associative Memories (EAM). The proposed algorithm is divided into three phases. First, the suitable features that will form the feature vectors are extracted using a histogram technique; the result of this phase is a set of classes, each of which groups a certain number of pixel values. Second, the EAM training phase is applied to the information obtained from the first phase; the result is an associative network that contains the group of centroids of the regions into which the image will be segmented. Finally, the centroid to which each pixel belongs is obtained in the EAM classification phase, and the image segmentation process is completed. The remaining sections of this paper are organized as follows. In the next section, a brief theoretical background of EAM is given. In Section 3 we describe our proposal, the grayscale image segmentation algorithm based on EAM. Numerical simulation results obtained for a conventional image segmentation technique (k-means) and for the proposed algorithm, using the prom and med operators, are provided and discussed in Section 4. Finally, Section 5 contains the conclusions of this paper.
2 Extended Associative Memories
An associative memory designed for pattern classification is an element whose fundamental purpose is to establish a relation between an input pattern x = [x_i]_n and the index i of a class c_i (see Fig. 1).
Fig. 1. Scheme of an associative memory for pattern classification
Let {(x^1, c_1), (x^2, c_2), ..., (x^k, c_k)} be k couples, each composed of a pattern and its corresponding class index, defined as the fundamental set of couples. The fundamental set of couples is represented by

{(x^μ, c_μ) | μ = 1, 2, ..., k}    (1)

where x^μ ∈ R^n and c_μ ∈ {1, 2, ..., N}. The associative memory is represented by a matrix M generated from the fundamental set of couples. In 2004, H. Sossa et al. proposed an associative memory model for the classification of real-valued patterns [11]. This model, named Extended Associative Memory (EAM), is an extension of the Lernmatrix model proposed by K. Steinbuch [12]. The EAM is based on the general concept of the learning function of an associative memory and presents high performance in the classification of patterns with real-valued components, including altered versions of the patterns [13].

2.1 The EAM Training Phase
The EAM training phase consists of evaluating a function φ for each class. The matrix M can then be structured as:

M = [φ_1; φ_2; ...; φ_N]    (2)
where φ_i is the evaluation of φ for all patterns of class i, i = 1, 2, ..., N. The function φ can be evaluated in various ways. The arithmetic average operator (prom) is frequently used in signal and image processing. In [11], the authors studied the performance of this operator and of the median operator (med) for evaluating the function φ. The goal of the training phase is to establish a relation between an input pattern x = [x_i]_n and the index i of a class c_i. Consider that each class is composed of q patterns x = [x_i]_n and that φ_i = (φ_{i,1}, ..., φ_{i,n}). Then the training phase of the EAM, when the prom operator is used to evaluate the function φ_i, is defined as:
φ_{i,j} = (1/q) Σ_{l=1}^{q} x_{j,l},   j = 1, ..., n    (3)
The training phase of the EAM, when the med operator is used to evaluate the function φ_i, is defined as:

φ_{i,j} = med_{l=1}^{q} x_{j,l},   j = 1, ..., n    (4)
The memory M is obtained after evaluating all functions φ_i. In the case where N classes exist and the vectors to classify are n-dimensional, the resulting memory M = [m_{ij}]_{N×n} is:

M = [φ_{i,j}]_{N×n} = [m_{ij}]_{N×n},   i.e., m_{ij} = φ_{i,j},  i = 1, ..., N,  j = 1, ..., n    (5)
2.2 The EAM Classification Phase
The goal of the classification phase is to generate the index of the class to which an input pattern belongs. Pattern classification by the EAM is performed when a pattern x^μ ∈ R^n is presented to the memory M generated in the training phase. The EAM possesses the ability to classify a pattern that is not necessarily one of those already used to build the memory M. When the prom operator is used, the class to which x belongs is given by

i = arg min_{l=1,...,N} [ max_{j=1,...,n} |m_{lj} − x_j| ]    (6)
In this case, the operators ∨ ≡ max and ∧ ≡ min perform morphological operations on the absolute values of the differences between the elements m_{lj} of M and the components x_j of the pattern x to be classified. When the EAM uses the med operator, the class to which x belongs is given by

i = arg min_{l=1,...,N} | med_{j=1,...,n} m_{lj} − med_{j=1,...,n} x_j |    (7)
In this case, the operator ∧ ≡ min performs a morphological erosion over the absolute difference between the median of the elements m_{lj} of M and the median of the components x_j of the pattern x to be classified. Theorem 1 and Corollaries 1-5 from [11] give the conditions that must be satisfied to obtain perfect pattern classification, whether the pattern belongs to the fundamental set of couples or is an altered version of such a pattern.
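A compact sketch of the EAM training and classification rules of equations (3)-(7) is given below. It is an illustrative NumPy reconstruction, not the authors' implementation; the function names are ours.

```python
import numpy as np

def eam_train(X, labels, n_classes, operator="prom"):
    """Build memory M (eqs. 3-5): one row per class, aggregated per feature."""
    agg = np.mean if operator == "prom" else np.median
    return np.vstack([agg(X[labels == c], axis=0) for c in range(n_classes)])

def eam_classify(M, x, operator="prom"):
    """Return the class index of pattern x (eqs. 6-7)."""
    if operator == "prom":
        # arg-min over classes of the max absolute difference per feature
        return int(np.argmin(np.max(np.abs(M - x), axis=1)))
    # med operator: compare medians of each memory row and of the pattern
    return int(np.argmin(np.abs(np.median(M, axis=1) - np.median(x))))

# Example with a tiny fundamental set of 2-D patterns in two classes
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.8]])
labels = np.array([0, 0, 1, 1])
M = eam_train(X, labels, n_classes=2)
print(eam_classify(M, np.array([0.15, 0.12])))   # -> 0
```

Note how the classifier assigns a class even to patterns that were never part of the fundamental set, which is the property the segmentation algorithm below relies on.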
3 Proposed Algorithm
In this section, we introduce our proposal, the grayscale image segmentation algorithm. The algorithm is divided into three phases. In the first phase, by means of the histogram technique, a uniform distribution of the image pixel values is determined. The result of this step is a set of regions (classes), each of which groups a certain number of pixel values. The histogram of a digital image with gray levels in the range [0, L − 1] may be defined as the discrete function

h(r_k) = n_k    (8)
where L = 2^m, m is the number of bits per pixel, k = 0, 1, 2, ..., L − 1, r_k is the k-th gray level, and n_k is the number of pixels in the image having gray level r_k. Making use of the generated histogram, the fundamental set of couples that will be used to generate the associative memory is obtained. For this purpose, first, the elements r_k with h(r_k) = 0 are excluded from the process. Then, the set of pixel values used to form the regions, denoted X, is defined as follows:

r_k ∈ X → h(r_k) ≠ 0    (9)
The set X is formed by e elements, where e ≤ L. Next, the elements that make up each region are defined. If μ regions are defined in the segmentation process, then the number of elements grouped in each region is e/μ. Thus, the region limits are defined by l_i = (i)(e/μ), and the assignment of a pixel to a region is defined as follows:

r_k ∈ c_1 → r_k ≤ l_1
r_k ∈ c_2 → l_1 < r_k ≤ l_2
  ⋮
r_k ∈ c_μ → l_{μ−1} < r_k ≤ e    (10)
and the fundamental set of couples is defined as
(r_0)_1 ∈ c_1            (r_{l_1})_1 ∈ c_2        …   (r_{l_{μ−1}})_1 ∈ c_μ
(r_1)_2 ∈ c_1            (r_{l_1+1})_2 ∈ c_2      …
   ⋮                        ⋮                     ⋱        ⋮
(r_{l_1−1})_{e/μ} ∈ c_1  (r_{l_2−1})_{e/μ} ∈ c_2  …   (r_e)_{e/μ} ∈ c_μ    (11)
We observe that each feature vector contains only one descriptor, that is, r_j = [r_i]_p with p = 1; for simplicity we denote it r_j, j = 1, 2, ..., e/μ. In the second phase, the region centers are generated by applying the EAM training phase to the fundamental set of couples defined by equation (11). Each element r belonging to a class of the fundamental set of couples includes only one image characteristic, the intensity (gray level); thus the algorithm handles
scalar numbers instead of vectors. Furthermore, each class is composed of e/μ elements r; therefore, the function that evaluates each class is φ^i = (φ^i_1), i = 1, 2, ..., μ. Due to this fact, the generalized learning mechanism of the EAM, denoted by expressions (3) and (4), must be rewritten. When the prom operator is used to generate M, expression (3) is rewritten as:

φ^i = m_i = (μ/e) Σ_{j=1}^{e/μ} r_j    (12)
When the med operator is used to generate M, expression (4) is rewritten as:

φ^i = m_i = med_{j=1}^{e/μ} r_j    (13)
where i = 1, 2, ..., μ. Since there are μ classes (regions) and the elements that form them contain only one descriptor, the associative network that contains the group of centroids of the regions into which the image will be segmented is represented by a column vector M = [m_h]_μ:

M = [φ^1, φ^2, ..., φ^μ]^T = [m_1, m_2, ..., m_μ]^T    (14)
The synaptic weight m_h represents the centroid of region h, h = 1, 2, ..., μ. Finally, in the third phase, by applying the classification phase of the EAM, the region to which each image pixel belongs is obtained and the image segmentation process is completed. In this case, each feature vector contains only one descriptor, denoted r (p = 1), and μ regions have been defined, expression (11). The computation of d(r, m_h) when the prom operator is used to generate the memory M is defined by d(r, m_h) = ∨_{j=1}^{p} |m_{hj} − r_j|; for this case p = 1, so

d(r, m_h) = |m_h − r|    (15)

and, when the med operator is used to generate the memory M, it is defined by d(r, m_h) = |med_{j=1}^{p} m_{hj} − med_{j=1}^{p} r_j|; as p = 1,

d(r, m_h) = |m_h − r|    (16)
One can conclude that for the different operators the degree of belonging of the pixel r to each of the μ regions is defined by the same expression. The result is a μ-dimensional column vector
[d(r, m_1), d(r, m_2), ..., d(r, m_μ)]^T
Finally, the morphological operator min is applied to this vector to establish the class to which the new feature vector belongs. The class is indicated by the index of the row of M that presents the nearest match to the feature vector r:

index = centroid = arg min_{h=1,...,μ} d(r, m_h)    (17)
The segmentation process finishes when all pixels have been replaced by the centroid value of the class (index) to which they belong. Since this algorithm segments grayscale images and uses only one descriptor to form the feature vectors, it represents the simplest case and therefore offers the highest processing speed and the lowest demand on system memory.
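The three phases can be sketched end-to-end as follows. This is an illustrative reconstruction of equations (8)-(17), not the authors' code; when e is not divisible by μ the regions here are only approximately equal in size.

```python
import numpy as np

def segment_grayscale(image, mu, operator="prom"):
    """Segment a 2-D uint8 image into mu regions via the EAM approach."""
    # Phase 1: histogram, keep only occurring gray levels (eqs. 8-9),
    # then split them into mu consecutive regions (eqs. 10-11).
    hist = np.bincount(image.ravel(), minlength=256)
    X = np.flatnonzero(hist)              # gray levels with h(r_k) != 0
    regions = np.array_split(X, mu)       # ~e/mu levels per region

    # Phase 2: one centroid per region (eqs. 12-14).
    agg = np.mean if operator == "prom" else np.median
    M = np.array([agg(region) for region in regions])

    # Phase 3: classify each pixel by its nearest centroid (eqs. 15-17)
    # and replace it with the centroid value of its class.
    dist = np.abs(image[..., None].astype(float) - M)   # d(r, m_h) = |m_h - r|
    index = np.argmin(dist, axis=-1)
    return M[index].astype(np.uint8)

# Example usage on a random image:
# out = segment_grayscale(np.random.randint(0, 256, (64, 64), np.uint8), mu=4)
```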
4 Experimental Results and Discussion
In this section, we present the experimental results obtained when the proposed algorithm is applied to grayscale images. For this purpose, a set of test images with 256 gray levels was used in the simulations. The experiments include a performance comparison between a frequently used clustering-based segmentation method, k-means, and the proposed algorithm with the prom and med operators. In our experiment, the algorithms (k-means and our proposal with the prom and med operators) are applied to the test images while varying the parameter μ. This experiment allows us to observe the behavior of the algorithms for diverse cluster numbers; it also allows a human visual evaluation (subjective analysis) of the obtained results. Figs. 2 and 3 show the visual results of this process for the pigs and ballerina images.
Fig. 2. Results of applying the k-means algorithm and the proposed algorithm, with the prom and med operators, to the pigs image
Fig. 3. Results of applying the k-means algorithm and the proposed algorithm, with the prom and med operators, to the ballerina image
5 Conclusions
This paper introduces a novel clustering-based image segmentation algorithm built on extended associative memories. In the experimental results section we presented a subjective analysis of our proposal and of the traditional clustering segmentation method, k-means. Based on these results, we can conclude that our algorithm performs better in the segmentation process than traditional clustering-based segmentation with k-means. The proposed algorithm, when it uses the prom operator, has low computational complexity, since its operation is based on morphological operations applied over the absolute values of differences; this results in a high processing speed. Furthermore, when the extended associative memories are used as a classifier, they have the property of assigning an input vector to the centroid index with the best similarity, regardless of whether that vector was used in generating the group of centroids. Since the group of centroids defines the regions into which the image will be segmented, this ability of the EAM allows an efficient isolation of the objects of interest contained in the image.
References 1. Salem, A., Samma, B., Abdul, R.: Adaptation of K-Means Algorithm for Image Segmentation. Proceedings of World Academy of Science, Engineering and Technology 38, 58–62 (2009)
2. Dragan, M., Pavlovic, M., Reljin, I.: Image Segmentation Method Based on Self-Organizing Maps and K-Means Algorithm. In: 9th Symposium on Neural Network Applications in Electrical Engineering, Belgrade, Serbia, pp. 27–30 (2008)
3. Yanling, L., Yi, S.: Robust Image Segmentation Algorithm Using Fuzzy Clustering Based on Kernel-Induced Distance Measure. In: International Conference on Computer Science and Software Engineering, Wuhan, China, vol. 1, pp. 1065–1068 (2008)
4. Yang, Z., Chung, F.L., Shitong, W.: Robust fuzzy clustering based image segmentation. Applied Soft Computing Journal 9(1), 80–84 (2008)
5. Shirakawa, S., Nagao, T.: Evolutionary Image Segmentation Based on Multi-objective Clustering. In: IEEE Congress on Evolutionary Computation, Trondheim, Norway, pp. 2466–2473 (2009)
6. Handl, J., Knowles, J.: Multiobjective clustering with automatic determination of the number of clusters. Technical Report TR-COMPSYSBIO-2004-02, Department of Chemistry, UMIST, Manchester (2004)
7. Handl, J., Knowles, J.: An evolutionary approach to multiobjective clustering. IEEE Transactions on Evolutionary Computation 11(1), 56–76 (2007)
8. Deb, K.: Multi-Objective Optimization Using Evolutionary Algorithms. John Wiley & Sons, New York (2001)
9. Coello, C., Lamont, G., Van Veldhuizen, D.: Evolutionary Algorithms for Solving Multi-Objective Problems, 2nd edn. Springer, New York (2007)
10. Xue-xi, Z., Yi-min, Y.: Hybrid Intelligent Algorithms for Color Image Segmentation. In: Chinese Control and Decision Conference, Yantai, China, pp. 264–268 (2008)
11. Sossa, H., Barrón, R., Vázquez, A.: Real-valued Patterns Classification based on Extended Associative Memory. In: Fifth Mexican International Conference on Computer Science, México, pp. 213–219. IEEE Computer Society, Los Alamitos (2004)
12. Steinbuch, K.: Die Lernmatrix. Kybernetik 1(1), 26–45 (1961)
13. Barrón, R.: Associative Memories and Morphological Neural Networks for Pattern Recall (in Spanish). PhD dissertation, Center for Computing Research, National Polytechnic Institute, México (2006)
Web Recommendation Based on Back Propagation Neural Networks

Jiang Zhong, Shitao Deng, and Yifeng Cheng

College of Computer Science, Chongqing University, 400030 Chongqing, China {zhongjiang,dengshitao,yifengcheng}@cqu.edu.cn
Abstract. Back-propagation neural networks are simple, effective, robust and well suited to estimating prediction models, and they can be used to combine the latent and external features of users and web pages for web recommendation. This paper proposes a unified collaborative filtering model to obtain more accurate recommendations. To discover user communities and prototypical interest profiles, the Probabilistic Latent Semantic Analysis model is applied. The experimental results on users' Web access logs and basic user information show that the technique achieves higher accuracy, constant-time prediction, and an explicit and compact model representation.

Keywords: Web recommendation; Collaborative Filtering; Probabilistic Latent Semantic Analysis; Back Propagation Neural Networks.
1 Introduction
Generally, Web recommendation can be viewed as a process that recommends customized Web presentations or predicts tailored Web content for users according to their specific preferences. Because of their ability to help users deal with information overload and provide personalized recommendations, recommender systems have become an important research area since the first paper on collaborative filtering in the mid-1990s [1]. Recommender systems use historical data on user preferences and purchases, and other available data on users and items, to recommend items that might be of interest to a new user. One of the earliest techniques developed for recommendation is nearest-neighbor collaborative filtering, which uses the history of user preferences as input. Nearest-neighbor methods use some notion of similarity among the users for whom predictions are being generated; variations on this notion of similarity and other aspects of memory-based algorithms are discussed by Breese et al. [2] and Deshpande et al. [3]. Although nearest-neighbor user-based collaborative filtering is the most successful technology for building recommender systems to date and is extensively used in many commercial recommender systems, these methods suffer from a scalability problem: their computational complexity grows linearly with the number of customers, which in typical commercial applications can be several million. To address these scalability concerns, model-based recommendation
techniques have been developed. A simple probabilistic approach to collaborative filtering was proposed by Breese et al. [2], where the unknown ratings are calculated as the expectation of the active user's rating for the item given the known rating values. A collaborative filtering method in a machine learning framework was introduced by Billsus [4], where various machine learning techniques coupled with feature extraction techniques can be used. The Probabilistic Latent Semantic Analysis (PLSA) model is a probabilistic graphical model of users' purchase behavior [5]; such models rely on a statistical modeling technique to discover user communities and prototypical interest profiles. Most traditional collaborative filtering models ignore external features and focus only on historical user ratings. To employ external features, Basu et al. [6] presented a method exploiting both user ratings and content features for recommending movies, achieving significant improvements in precision over a pure collaborative filtering approach; however, the selection of collaborative features and content features depends on expert experience. Although it is difficult to obtain users' activities and features in open networking environments, it is easy to collect users' web page access logs in controlled environments such as campus networks, where users must register before they can access the Internet. We can record the access time, usage frequency and duration of web pages for each user through a network logging server. Consequently, it is feasible to obtain some basic user information in those environments, such as gender, age, working location and vocation. Inspired by hybrid recommender methods, we present a unified model based on learning a classification model from both historical ratings and external features for higher prediction accuracy. In the rest of the paper, we introduce the unified CF model in Section 2; the method for mining the latent features is introduced in Section 3; the empirical evaluation of the proposed approach is presented in Section 4; and the conclusions are given in Section 5.
2 Unified Collaborative Filtering Model Based on Classifier
There are two kinds of reasonable assumptions used in traditional CF algorithms: one is based on the users' historical ratings, and the other is based on external user and item features. In our view, the second assumption is more in line with the nature of a recommendation system, because a person rates an item based on his or her satisfaction, which depends on his or her requirements and the corresponding item's characteristics. Hence the user's historical ratings are the results of the interaction between the user and the items, and the hidden ratings should also depend on the features of users and items. The unified model in this paper tries to establish the relationship between the ratings and these features. Let U be the set of all users and let I be the set of all possible items; U_i represents user i, and I_j represents item j. Both users and items have external features; assuming the number of user features is p and the number of item features is q, the user and item can be represented by the feature vectors U_i = (a_{i1}, ..., a_{ip}) and I_j = (b_{j1}, ..., b_{jq}). Formally, the collaborative filtering problem is to find a fitting relationship between two high-dimensional spaces. Once we can estimate ratings for the yet unrated items, we can recommend to the user the item(s) with the highest estimated rating(s). A triple data set H = {<U_i, I_j, r_ij>} can normally be collected, representing users' historical preferences, where r_ij is a user preference, usually represented by a rating, which indicates how much user U_i liked item I_j. The ratings depend on the characteristics of users and items, so the rating function can be denoted φ(M, U_i, I_j), where M is a prediction model learned from the historical rating data H, and U_i and I_j are the features of user i and item j.
Unfortunately, applying the above method directly is very difficult, because selecting suitable features for users and items in a CF problem is almost impossible: in most cases it requires a great deal of prior expert knowledge belonging to the application domain rather than to technology. Even if a feature set is chosen, it may not be feasible to collect the corresponding data, because some data involve people's privacy or some features cannot be described and encoded formally. This leads to low prediction accuracy when relying only on the limited observed features. Fortunately, the users' historical activity data H reflect characteristics of items and users to some extent, so it is feasible to reconstruct some valuable characteristics of items and users by analyzing the historical data. The characteristics obtained from H are called latent features in this paper. Intuitively, prediction models built on the observed external features plus the latent features should have relatively higher prediction accuracy. Suppose the users are grouped by their interests, e.g., users in a group favor the same kind of items, and suppose the items are grouped by rating mode, e.g., items in a group receive similar rating values from the same kind of users. Then the probabilities of belonging to the user groups can be seen as latent features of the users, and the probabilities of belonging to the item groups can be seen as latent features of the items. The crucial problem is to select suitable clustering methods to group the users and items; in this paper, probabilistic latent semantic analysis is chosen to implement the grouping procedure. Let Ũ and Ĩ denote the latent feature spaces for users and items, and let the vectors Ũ_i = (a'_{i1}, ..., a'_{ik}) and Ĩ_j = (b'_{j1}, ..., b'_{jk}) represent the user and item latent attributes, respectively. The prediction function can then be denoted φ(M, U_i, Ũ_i, I_j, Ĩ_j), and the historical rating data can be converted to H' = {<U_i, Ũ_i, I_j, Ĩ_j, r_ij>}. Various methods can be used to estimate the prediction model M from H', such as nearest-neighbor classifiers or decision trees. Because BPNNs (back-propagation neural networks) are simple, effective, robust and good at prediction [5], in our experiments we selected a BPNN as the preference approximation model M to predict the user's unrated items.
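The unified model can be sketched as follows, using scikit-learn's MLPRegressor as the BPNN (one hidden layer of 50 nodes and 200 training epochs, matching the setup in Section 4.2); the feature sizes and toy data are assumptions for illustration, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def build_training_set(H, ext_user, ext_item, lat_user, lat_item):
    """Convert triples <i, j, r_ij> into rows of H':
    [external user | latent user | external item | latent item] features."""
    X = [np.concatenate([ext_user[i], lat_user[i], ext_item[j], lat_item[j]])
         for i, j, _ in H]
    y = [r for _, _, r in H]
    return np.array(X), np.array(y)

# Toy data: 4 users, 3 items, 2 external + 2 latent features each (assumed)
rng = np.random.default_rng(0)
ext_user, ext_item = rng.random((4, 2)), rng.random((3, 2))
lat_user, lat_item = rng.random((4, 2)), rng.random((3, 2))  # e.g., PLSA P(z_k|.)
H = [(0, 0, 4.0), (0, 1, 2.0), (1, 0, 5.0), (2, 2, 3.0), (3, 1, 1.0)]

X, y = build_training_set(H, ext_user, ext_item, lat_user, lat_item)
model = MLPRegressor(hidden_layer_sizes=(50,), max_iter=200).fit(X, y)

# Predict the unknown rating of user 3 for item 2
x_new = np.concatenate([ext_user[3], lat_user[3], ext_item[2], lat_item[2]])
print(model.predict(x_new.reshape(1, -1)))
```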
3 Mining Latent Features for Users and Web Services

3.1 Foundation of Probabilistic Latent Semantic Analysis
PLSA has been largely inspired and influenced by latent semantic analysis (LSA) [7, 8] and can be applied to any type of count data over a discrete dyadic domain. Suppose that there is a collection of text documents D = {d_1, ..., d_N} with terms from a vocabulary W = {w_1, ..., w_M}. By ignoring the sequential order in which words occur in a document, one may summarize the data in a rectangular N × M co-occurrence table of counts C = (n(d_i, w_j)), where n(d_i, w_j) denotes the number of times the term w_j occurred in document d_i. In this particular case, C is also called the term-document matrix, and the rows/columns of C are referred to as document/term vectors, respectively. The starting point for probabilistic latent semantic analysis is a statistical model called the aspect model. The aspect model is a latent variable model for co-occurrence data which associates an unobserved class variable z_k ∈ {z_1, ..., z_K} with each observation, an observation being the occurrence of a word in a particular document. Fig. 1 shows the asymmetric and symmetric graphical models of the latent variable.
Fig. 1. Graphical model of the latent variable in the asymmetric (a) and symmetric (b) parameterization
P(d_i) is used to denote the probability of a particular document d_i; P(w_j | z_k) denotes the class-conditional probability of a specific word conditioned on the unobserved class variable z_k; P(z_k | d_i) denotes a specific document's probability distribution over the latent variable. In the analysis of text documents, the latent variables represent the hidden topics in the document set: P(z_k | d_i) is the probability that document d_i belongs to topic z_k, and P(w_j | z_k) is the probability that topic z_k includes the word w_j. In order to utilize the PLSA method for grouping Web pages and users, a mapping scheme from the Web recommendation problem to the text analysis problem is described in Table 1.
Table 1. Mapping between text analysis and Web Page Recommendation
Term               Text analysis                        Web Page Recommendation
d_i                the i-th document containing         the list of Web pages (or Web paths)
                   a set of words                       accessed by the i-th user
w_j                the j-th word                        the j-th Web page (path)
n(d_i, w_j)        number of times the word j           the number of times user i has accessed
                   occurred in document i               Web page j, or the preference of user i
                                                        for item j
z_k                the k-th topic                       the k-th group (community) of users
                                                        sharing a common preference, or the k-th
                                                        group of Web pages accessed by similar users
P(z_k | d_i)       the probability of document i        the probability of user i belonging
                   belonging to topic k                 to community k
P(w_j | z_k)       the probability of topic k           the probability of community k
                   including the word j                 accessing Web page j
P(z_k | d_i, w_j)  the conditional probability of       the conditional probability of the user
                   topic k when d_i and w_j             community k or the Web page group k
                   were observed
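Under this mapping, the co-occurrence counts n(d_i, w_j) become a user-by-page access matrix. A minimal sketch of building it from raw log records could look like the following; the log format used here is an assumption.

```python
import numpy as np

# Hypothetical log records: (user_id, page_id) pairs, one per access.
log = [("u1", "news.sohu.com/p1"), ("u1", "money.163.com/p2"),
       ("u2", "news.sohu.com/p1"), ("u1", "news.sohu.com/p1")]

users = sorted({u for u, _ in log})
pages = sorted({p for _, p in log})
u_idx = {u: i for i, u in enumerate(users)}
p_idx = {p: j for j, p in enumerate(pages)}

# n[i, j] = number of times user i accessed page j, i.e. n(d_i, w_j)
n = np.zeros((len(users), len(pages)), dtype=int)
for u, p in log:
    n[u_idx[u], p_idx[p]] += 1
```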
3.2 Calculating the Latent Features
The probabilities {P(z_1 | d_i), ..., P(z_K | d_i)} and {P(z_1 | w_j), ..., P(z_K | w_j)} are the latent features of user d_i and item w_j. Since the latent variable z_k cannot be observed, we use the EM method to estimate these conditional probabilities. For the complete data, the joint probability should be maximized; that is, we have to maximize L_c in equation (1):

L_c = Σ_{i=1}^{N} Σ_{j=1}^{M} Σ_{k=1}^{K} n(z_k, d_i, w_j) log P(z_k, d_i, w_j)    (1)
As shown in Fig. 1, we can simply apply Bayes' formula and obtain the following equations:

P(d_i, w_j) = Σ_{k'=1}^{K} P(z_{k'}) P(d_i | z_{k'}) P(w_j | z_{k'})    (2)

P(z_k | d_i, w_j) = P(d_i, w_j | z_k) P(z_k) / P(d_i, w_j)    (3)

P(d_i | z_k) = P(z_k | d_i) P(d_i) / P(z_k)    (4)
P(z_k, d_i, w_j) = P(d_i) P(z_k | d_i) P(w_j | z_k)    (5)
During the E-step, the posterior probabilities of the latent variables are computed based on the current estimates of the parameters. In the maximization (M) step, the parameters are updated based on the expected complete-data log-likelihood, which depends on the posterior probabilities computed in the E-step. For the E-step we simply apply Bayes' formula, via equations (3) and (2), to estimate

P(z_k | d_i, w_j) = P(z_k) P(d_i | z_k) P(w_j | z_k) / Σ_{k'=1}^{K} P(z_{k'}) P(d_i | z_{k'}) P(w_j | z_{k'})
In the M-step we have to maximize the expected complete-data log-likelihood L_c in (1). Since the posterior probabilities are p(z_k | d_i, w_j) = n(z_k, d_i, w_j) / n(d_i, w_j), we should maximize

L_c = Σ_{i=1}^{N} Σ_{j=1}^{M} n(d_i, w_j) Σ_{k=1}^{K} p(z_k | d_i, w_j) log(P(d_i) P(z_k | d_i) P(w_j | z_k))
    = Σ_{i=1}^{N} Σ_{j=1}^{M} n(d_i, w_j) Σ_{k=1}^{K} p(z_k | d_i, w_j) log(P(z_k) P(d_i | z_k) P(w_j | z_k))    (6)
In order to take care of the normalization constraints, (6) has to be augmented by appropriate Lagrange multipliers α_k, β_k, λ:

H = L_c + Σ_{k=1}^{K} α_k (1 − Σ_{j=1}^{M} P(w_j | z_k)) + Σ_{k=1}^{K} β_k (1 − Σ_{i=1}^{N} P(d_i | z_k)) + λ (1 − Σ_{k=1}^{K} P(z_k))    (7)
Maximization of H with respect to the probability mass functions leads to the following set of stationary equations:
Σ_{i=1}^{N} n(d_i, w_j) p(z_k | d_i, w_j) − α_k P(w_j | z_k) = 0,   1 ≤ k ≤ K, 1 ≤ j ≤ M    (8)

Σ_{j=1}^{M} n(d_i, w_j) p(z_k | d_i, w_j) − β_k P(d_i | z_k) = 0,   1 ≤ k ≤ K, 1 ≤ i ≤ N    (9)

Σ_{i=1}^{N} Σ_{j=1}^{M} n(d_i, w_j) p(z_k | d_i, w_j) − λ P(z_k) = 0,   1 ≤ k ≤ K    (10)
After eliminating the Lagrange multipliers one obtains the M-step re-estimation equations
P(w_j | z_k) = Σ_{i=1}^{N} n(d_i, w_j) p(z_k | d_i, w_j) / Σ_{j'=1}^{M} Σ_{i=1}^{N} n(d_i, w_{j'}) p(z_k | d_i, w_{j'})    (11)

P(d_i | z_k) = Σ_{j=1}^{M} n(d_i, w_j) p(z_k | d_i, w_j) / Σ_{i'=1}^{N} Σ_{j=1}^{M} n(d_{i'}, w_j) p(z_k | d_{i'}, w_j)    (12)

P(z_k) = Σ_{i=1}^{N} Σ_{j=1}^{M} n(d_i, w_j) p(z_k | d_i, w_j) / Σ_{i=1}^{N} Σ_{j=1}^{M} Σ_{k'=1}^{K} n(d_i, w_j) p(z_{k'} | d_i, w_j)    (13)
The E-step and the M-step equations are alternated until a termination condition is met. In this paper, the selected termination condition is that the improvement of L_c is no longer noticeable or the number of iterations reaches a predefined limit.
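The EM iteration of equations (2)-(13) can be sketched compactly with NumPy; this is an illustrative implementation of the plain aspect-model update, not the authors' code.

```python
import numpy as np

def plsa(n, K, n_iter=100, tol=1e-6, seed=0):
    """EM for the aspect model on a count matrix n (N users x M pages).

    Returns P(z), P(d|z), P(w|z); the latent features P(z_k|d_i) and
    P(z_k|w_j) then follow via Bayes' formula.
    """
    rng = np.random.default_rng(seed)
    N, M = n.shape
    p_z = np.full(K, 1.0 / K)
    p_d_z = rng.random((N, K)); p_d_z /= p_d_z.sum(axis=0)
    p_w_z = rng.random((M, K)); p_w_z /= p_w_z.sum(axis=0)
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: posterior p(z_k | d_i, w_j), shape (N, M, K)
        joint = p_z * p_d_z[:, None, :] * p_w_z[None, :, :]
        denom = joint.sum(axis=2, keepdims=True)
        post = joint / np.maximum(denom, 1e-12)
        # M-step (eqs. 11-13): weighted, normalized co-occurrence counts
        weighted = n[:, :, None] * post
        p_w_z = weighted.sum(axis=0); p_w_z /= p_w_z.sum(axis=0)
        p_d_z = weighted.sum(axis=1); p_d_z /= p_d_z.sum(axis=0)
        p_z = weighted.sum(axis=(0, 1)); p_z /= p_z.sum()
        # Terminate when the log-likelihood stops improving noticeably
        ll = np.sum(n * np.log(np.maximum(denom[..., 0], 1e-12)))
        if ll - prev < tol:
            break
        prev = ll
    return p_z, p_d_z, p_w_z
```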
4 Experiments

4.1 Data Sets
Our data set was collected from the log server and registration server at the center of information and network of Chongqing University; they contain the users' Web access logs and the users' basic information, respectively. We selected 200 site sections, denoted as web paths, from ten famous web portals, such as "news.sohu.com" and "money.163.com". We selected 1000 users with age, gender and occupation features for our experiments. The rating values are based on each user's access counts during one month; the equal-frequency discretization method was used to map the access counts to rating values. The average number of ratings per user is about 40, the median number of ratings per user is about 28, and the minimal number of observed ratings for one user is 2.

4.2 Evaluation Metric and Experiment Setup
In collaborative filtering research, one is typically interested in two types of accuracy: the accuracy of predicting ratings and the accuracy of making recommendations. The scenario we have investigated assumes that the goal of the system is to predict user ratings. We use the mean absolute error (MAE), the root mean square (RMS) error and the 0/1 loss error. We used the leave-one-out protocol to evaluate the obtained prediction accuracies: one randomly chosen vote for every user in the training set was hidden, and the models were trained on this reduced set. For all three measures, a smaller value means better performance. The parameter M, the minimal number of observed ratings for each test user, was varied in our experiments on the Web log data set. The BPNN had three layers; the hidden layer had 50 nodes; and the number of training epochs was set to 200. Although we experimented with hidden-layer sizes from 40 to 80, we found that the results were consistent throughout these experiments, so we present only results for this parameter setup.

4.3 Results of Prediction for Web Recommendation
Table 2 summarizes the experimental results obtained by the method proposed in this paper (the unified method with 8 latent features for users and items, respectively), by a memory-based method using nearest-neighbor users to predict ratings, and by a baseline simply defined as the overall mean vote for each item. The results of the baseline, the user-based method and the unified method variants were obtained from the same data set, with test votes selected by leave-one-out and the minimal number of observed ratings per user M = 2; the number of users is 1000, the average number of ratings per user is about 40, and the median number of ratings per user is about 28.
Table 2. Prediction Accuracy of Various Methods On EachMovie Data set
Method                                  MAE     RMS     0/1 loss   MAE impr.   RMS impr.   0/1 loss impr.
Baseline                                0.927   1.232   71.5       ±0          ±0          ±0
User based                              0.867   1.174   64.2       6.8%        4.7%        10.2%
Mixture pLSA                            0.848   1.170   63.4       8.5%        5.0%        11.3%
Unified Method (no latent features)     0.906   1.185   66.8       2.5%        3.8%        6.5%
Unified Method (no external features)   0.791   1.067   65.4       14.6%       13.4%       8.5%
Unified Method (both features)          0.771   1.047   62.6       17.2%       15.5%       12.4%
As can be seen, the results in Table 2 indicate that when the model depends only on the external features, its accuracy is even lower than that of the memory-based methods, which means that the external features alone are not sufficient to build an exact prediction function. They also show that the prediction accuracy is not ideal when depending only on the latent features obtained from the historical data. The effect of the parameter M on the prediction accuracy of the unified method variants was also investigated in the experiments; Fig. 2 shows the results in terms of MAE for M = 2, 10, 50, 90, 120 and 150.
Fig. 2. MAE of the method variants for different values of M
4.4 Results of Ranking Scenario on Web Log Data Set
The second scenario we investigated experimentally is the free prediction mode. Since the accuracy of predicting votes is no longer adequate here, the half-life utility metric with half-life constant α set to 4 and the Pearson correlation were used to score the ranked lists. The results are summarized in Table 3, where the parameter M = 20 for the experimental data set and the number of latent features for the unified methods is set to K = 8.
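The half-life utility score of Breese et al. [2], which this evaluation relies on, can be sketched as follows; this is a minimal illustration, where d is the default (neutral) vote and α is the half-life constant, set to 4 here as in the text.

```python
def half_life_utility(ranked_votes, d=0.0, alpha=4):
    """R_a = sum_j max(v_j - d, 0) / 2^((j-1)/(alpha-1)).

    ranked_votes: the user's true votes for items, in recommended order;
    utility decays with rank position, so hits near the top count more.
    """
    return sum(max(v - d, 0.0) / 2 ** ((j - 1) / (alpha - 1))
               for j, v in enumerate(ranked_votes, start=1))

# Example: a ranking with liked items near the top scores higher
print(half_life_utility([5, 3, 0, 1], d=2))   # good ranking
print(half_life_utility([0, 1, 3, 5], d=2))   # poor ranking
```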
Table 3. Prediction Accuracy of Various Methods on the Web Log Data Set

Method                                  MAE     RMS     0/1 loss   MAE impr.   RMS impr.   0/1 loss impr.
Baseline                                0.812   1.048   67.3       ±0          ±0          ±0
User based                              0.784   0.983   64.2       3.45%       6.20%       4.61%
Unified Method (no latent features)     0.793   0.989   65.4       2.34%       5.63%       2.82%
Unified Method (no external features)   0.724   0.934   63.1       10.84%      10.88%      6.24%
Unified Method (both features)          0.714   0.916   62.5       12.07%      12.60%      7.13%
As can be seen, the best ranking scores are obtained by the unified method using all features. The results depending only on the external features show no obvious improvement relative to the baseline method, and the difference between using all features and using only latent features is not significant. It can also be observed that the user-based method obtained relatively high scores on the half-life utility and Pearson correlation measures but has no advantage in the prediction mode (MAE = 0.724). Although the unified method does not improve the ranking performance markedly relative to the user-based method, it can complete the ranking recommendation in constant time.
5 Conclusions
In this paper, collaborative filtering is investigated as a classification problem: the probabilistic latent semantic analysis method is used to construct latent features for both items and users, and a back-propagation neural network is used to combine the latent and external features. This method achieves competitive recommendation and prediction accuracies. Because it does not need any prior expert knowledge about the collaborative filtering problem and can work with limited external features for both users and items, it can easily be applied to the Web recommendation problem.

Acknowledgement. This work is supported by the Natural Science Foundation Project of CQ CSTC (2010BB2046), the Third Stage Building of the "211 Project" (Grant No. S-10218), and the Fundamental Research Funds for the Central Universities (Project No. CDJZR10 18 00 25).
References 1. Adomavicius, G., Tuzhilin, A.: Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Trans. on Knowledge and Data Engineering 17, 734–749 (2005)
2. Breese, J.S., Heckerman, D., Kadie, C.: Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In: 14th Conf. Uncertainty in Artificial Intelligence, pp. 43–52. Morgan Kaufmann, San Francisco (1998) 3. Deshpande, M., Karypis, G.: Item-Based Top-N Recommendation Algorithms. ACM Trans. on Information Systems 22, 143–177 (2004) 4. Billsus, D., Pazzani, M.: Learning Collaborative Information Filters. In: Machine Learning: Proc. of the Fifteenth Intl. Conf., pp. 46–54. Morgan Kaufmann, San Francisco (1998) 5. Battiti, R.: First and second order methods for learning: Between steepest descent and Newton’s method. Neural Computation 4, 141–166 (1992) 6. Basu, C., Hirsh, H., Cohen, W.: Recommendation as Classification: Using Social and Content-Based Information in Recommendation. In: 15th National Conference on Artificial Intelligence, pp. 714–720. AAAI, California (1998) 7. Hofmann, T.: Unsupervised Learning by Probabilistic Latent Semantic Analysis. Machine Learning 42, 177–196 (2001) 8. Hofmann, T.: Latent Semantic Models for Collaborative Filtering. ACM Transactions on Information Systems 22, 89–115 (2004)
A Multi-criteria Target Monitoring Strategy Using MinMax Operator in Formed Virtual Sensor Networks

Xin Song1,2, Cuirong Wang1, and Juan Wang1

1 School of Information Science and Engineering, Northeastern University, 110819, Shenyang, China
2 The Key Laboratory of Complex System and Intelligence Science, Institute of Automation, Chinese Academy of Sciences, 100190, Beijing, China
Abstract. Target monitoring and event detection are critical issues in wireless sensor networks. Sensor nodes forward the sensed data to the base station by relaying through intermediate nodes, from which users collect valuable information. Multi-criteria target monitoring remains important in wireless sensor network application scenarios. In order to support energy-efficient monitoring and to simplify the complexity of dealing with heterogeneous sensor networks, in this paper we propose a multi-criteria target monitoring strategy using the MinMax operator in formed virtual sensor networks. The strategy builds a hierarchical structure on the network by a distributed clustering technique before forming the virtual sensor network, in order to meet the various monitoring requirements of different kinds of application deployments. Then, for the multi-criteria monitoring of the virtual sensor network framework, we present a method of setting and updating hierarchical thresholds using the MinMax operator. Finally, simulation results demonstrate that the strategy can effectively reduce energy consumption and increase the system lifetime.

Keywords: Multi-criteria Target Monitoring, MinMax Operator, virtual sensor network, wireless sensor network.
1 Introduction
Wireless Sensor Network (WSN) design often employs methods such as energy-aware techniques, in-network processing, multi-hop communication, multi-domain cooperation strategies, and density control techniques to extend the network lifetime. Since WSN technology is still developing and has yet to reach its full potential, for the time being it is typical to assume that a single WSN is deployed at a given location, with independent sensor networks each dedicated to a specific task. However, as the processing capability of sensor nodes continues to advance and the technology becomes more mature and standardized, it is possible that two or more sensor networks serving different application scenarios and managed by different authorities are deployed at the same physical location. For networks that overlap partially in physical space, sharing resources should yield a more efficient resource usage rate and better performance attributes. Target monitoring queries are fundamental for WSNs that collect data about the physical environment. With the
implementation of Virtual Sensor Network (VSN) frameworks in hybrid and heterogeneous WSNs, the monitoring task is based on the formed VSNs, where multiple VSNs exist on one physical monitoring domain. Some nodes are engaged in tracking monitored targets while others provide communication support. To support multi-criteria constrained monitoring, the recognition of multi-dimensional physical attributes and the maintenance of nodes' VSN membership are pressing research problems. In this paper, we consider the problem of cooperation among heterogeneous WSNs deployed on one physical monitoring domain but implementing different tracking tasks. We propose a multi-criteria target monitoring strategy using the MinMax operator in formed virtual sensor networks. Several assumptions are made for the strategy: sensor nodes are not mobile, and the nodes in a physical domain can communicate with each other. First, the sensor nodes are organized according to a distributed clustering algorithm before forming the VSN. Then, we form a virtual tree-based VSN by connecting the CH nodes observing the same phenomenon; the cluster tree is rooted at the root node and formed by each CH keeping track of its parent and child CHs. Finally, to implement multi-criteria constrained monitoring, the MinMax operator is introduced in the hierarchical threshold update and maintenance of the virtual tree, which further improves the efficiency of recognition and computation at the monitoring nodes. The remainder of the paper is organized as follows: in Section 2, we discuss related work on energy-efficient monitoring in WSNs and present several possible strategies to implement a VSN. The proposed strategy is detailed in Section 3. The experimental setup and analytical results are discussed in Section 4. Finally, we conclude the paper in Section 5.
2 Related Work
Target monitoring of physical phenomena is one of the main areas of application for WSNs. Owing to their application-oriented characteristics, we classify the applications into two types: entire-network data collection and partial-network data collection. The former applications need to keep in a database the sensed data from all nodes, without aggregation, for later analysis; the latter are only interested in the data from some nodes. For energy-efficient optimization of monitoring queries, the data collection task should be selectivity-aware. Physical phenomena are characterized by their spatial correlation; hence, when monitoring a physical phenomenon, several optimization schemes have been proposed to minimize the use of the non-selected parts of a WSN, achieving overall energy efficiency and extending the lifetime. Muhammad Umer, Lars Kulik, and Egemen Tanin proposed an adaptive location-based data collection algorithm, Pocket Driven Trajectories (PDT), which optimizes monitoring queries by exploiting their key characteristics. The PDT algorithm first discovers the set of pockets of selected nodes for a given query; it then aligns a data collection tree to the spatially optimal path connecting these pockets. This path minimizes the use of non-selected nodes in the data collection tree and consequently optimizes the overall energy efficiency of the data collection process. Furthermore, they extended and improved their previous work by introducing adaptability into the algorithm and analyzing its performance in various dynamic situations [1]. To support
energy-efficient skyline monitoring in sensor networks, the authors of [2] first presented a naive approach as a baseline and then proposed an advanced approach that employs hierarchical thresholds at the nodes. The threshold-based approach focuses on minimizing the transmission traffic in the network to save energy [2]. In order to provide an efficient monitoring service in a large-scale sensor network, the hierarchical organization of the sensors (grouping them and assigning specific tasks within the group before transferring the information to higher levels) is one of the mechanisms proposed to deal with the sensors' limitations and is commonly referred to as clustering. In the last few years, many clustering algorithms have been proposed for WSNs. Most of these algorithms aim at generating the minimum number of clusters that maximizes the network lifetime and data throughput [3-6]. The essential operation in such monitoring applications is data gathering, i.e., collecting the physical information from sensor nodes and transmitting it to the base station and control server for processing [7]. Recently, the Virtual Sensor Network mechanism has been shown to be a promising technology for applications that involve dynamically varying subsets of sensor nodes collaborating tightly to achieve the desired outcomes, while relying on the remaining nodes for connectivity and to overcome deployment and resource constraints [8]. For many monitoring applications with a periodic collection pattern, a tree-based topology is often adopted because of its simplicity and power efficiency. H. M. N. Dilum Bandara et al. proposed a cluster-tree based mechanism to form VSNs, facilitate inter-VSN and intra-VSN communication, and perform other VSN support functions; the cluster-tree based approach is significantly more efficient and reliable compared to random-routing based protocols [9]. In [10], an architecture based on a Multi-Agent System (MAS) is proposed as an alternative to alleviate some of the problems of the layered technique. Utilizing a MAS facilitates the distribution of new VSNs onto a deployed WSN and permits heterogeneous timing regimes that allow optimal performance of the individual VSNs. In order to simplify the complexity of dealing with heterogeneous sensor networks, a novel approach (VIP Bridge) was proposed to integrate different sensor networks into one virtual sensor network [11]. Although some research results exist, they are still immature and many problems remain unexplored. In this paper, our approach to cost reduction is to reduce the amount of data sent by each node, where the accuracy can be pre-configured. The strategy implements multi-criteria target monitoring using the MinMax operator in formed virtual sensor networks. Users can easily query and monitor the relevant data through this virtual sensor network.
3 Implementation of Multi-criteria Target Monitoring Strategy The multi-criteria target monitoring strategy includes three phases: the distributed clustering process, CH-tree-based VSN formation, and MinMax-threshold maintenance at the nodes. 3.1 Distributed Clustering Process Among recent hierarchical clustering protocols, a notable algorithm is EADEEG (an energy-aware data gathering protocol for wireless sensor
networks), which achieves better lifetime performance by minimizing the energy consumed for communications and balancing the energy load among all nodes [12]. For each node in the network, the time t at which the cluster head (CH) request message is sent can be computed from equation (1):
$$t = k \times T \times \frac{E_{average}}{E_{residual}} \qquad (1)$$
where k is a random float uniformly distributed in the interval [0.9, 1], T is a predefined duration parameter of the CH selection mechanism, $E_{average}$ denotes the average residual energy of all neighborhood nodes within the cluster radius of node $V_i$, and $E_{residual}$ denotes the residual energy of node $V_i$. The main parameter of the competition for CH election is $E_{average}/E_{residual}$. While there are advantages to using the EADEEG distributed
cluster formation algorithm, it offers no guarantee against the monitoring blind spots or disconnected CH nodes that arise in some situations. However, using an auxiliary parameter deg (the degree of the sensor node) together with $E_{average}/E_{residual}$ in the competition for CH election may produce better clusters throughout the network. In our scheme, each node $V_i$ computes the average residual energy $E_{average}$ of all its neighborhood nodes $V_j$ from equation (2):
$$E_{average} = \frac{1}{\deg}\sum_{j=1}^{\deg} E_{jr} \qquad (2)$$
where $E_{jr}$ denotes the residual energy of node $V_j$. The time t at which the CH request message is sent is then computed from equation (3), which improves equation (1); here $E_{initial}$ denotes the initial energy of the node. The CH competition phase completes within time T: from 0 to T/2, a majority of the CHs are determined, and in the remaining phase (from T/2 to T) the few remaining nodes become CHs according to the ratio of node residual energy to initial energy.
$$t = \begin{cases} k \times T \times \dfrac{E_{average}}{E_{residual}} \times \dfrac{1}{\deg + 1}, & E_{residual} > E_{average} \\[2mm] \dfrac{T}{2} + k \times \dfrac{T}{2} \times \dfrac{E_{residual}}{E_{initial}}, & E_{residual} < E_{average} \end{cases} \qquad (3)$$
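To make equations (2) and (3) concrete, the following Python sketch computes a node's CH-request timer; the function and variable names are our own, and the neighbor energy list is assumed to be available from the neighbor-discovery phase described below.

```python
import random

def average_residual_energy(neighbor_energies):
    # Eq. (2): mean residual energy over the deg neighbors within the cluster radius.
    return sum(neighbor_energies) / len(neighbor_energies)

def ch_request_time(E_residual, E_average, E_initial, deg, T):
    # Eq. (3): nodes with above-average residual energy fire in [0, T/2];
    # the remaining nodes fire in (T/2, T] according to E_residual / E_initial.
    k = random.uniform(0.9, 1.0)  # k uniformly distributed in [0.9, 1]
    if E_residual > E_average:
        return k * T * (E_average / E_residual) / (deg + 1)
    return T / 2 + k * (T / 2) * (E_residual / E_initial)
```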
The clustering process can be described as follows. Step 1 (neighbor discovery): each node broadcasts its message within its communication radius, receives the messages from its neighbor nodes, initializes $E_{average}$ and deg, and computes the time t according to equation (3). Step 2 (CH discovery): if, before its own time t expires, a node has already received a CH message from a neighbor node, it abandons the CH competition; otherwise it broadcasts a CH message.
Step 3 (node affiliation): the nodes other than the CH nodes broadcast a join message to the CH with the maximum energy and update their parent information; the CH nodes receive the join messages and update deg and the other parameters. The clustering process is then complete. 3.2 Virtual Sensor Network Formation
A VSN can be formed by providing logical connectivity among collaborating CH nodes (CH nodes can communicate with each other directly). CH nodes can be grouped into different VSNs based on the phenomenon they track (e.g., rockslides vs. animal crossings) or the task they perform. The formation process can be divided into two main phases. In the first phase, if there is only a sink node, it sends a virtual-tree formation message to its neighbor CHs and lets all the neighbors join its child CH set. Each of those neighbor CHs then forwards the broadcast to the next set of neighbors, and this process continues until the entire network is covered; the result is a virtual tree backbone. In the second phase, whenever a node detects an event of interest for the first time, it sends a discovery message towards its parent node, indicating that it perceives the event and wants to collaborate with similar nodes. If a cluster member node sent the message, the CH checks its array of VSN entries to determine whether it is already aware of the VSN; if not, the CH marks itself as part of the VSN. The message is forwarded sequentially to each parent CH until it reaches the root node. Note that a CH may be part of multiple VSNs. 3.3 MinMax Operator Based Multi-criteria Monitoring
Multi-criteria target monitoring implies selecting the most interesting alternatives based on a series of monitoring targets with multiple attributes. Imagine the following forest-conservation monitoring scenario, in which a wireless sensor network is used to identify potential conservation areas. Assume two values represent the monitoring data, the temperature and the humidity at the sensed point, and measure how serious the conservation situation is there; areas of higher temperature and low humidity are fire hazards. Forest conservation planning is thus formulated as a multi-criteria monitoring problem. In designing the multi-criteria monitoring strategy, the VSN is formed as target detection proceeds, and every node installs a threshold that is an n-dimensional point (n is the number of monitored attributes). Whenever a sensor node detects a threshold-exceeding event for the first time, it sends a message towards the root node of the virtual CH tree. Note that the monitoring strategy only works when paired with a method for setting and updating the threshold. To further improve the efficiency of threshold computation, we present a threshold maintenance method that uses the MinMax operator on the formed VSN backbone: all the most dangerous sites in terms of high temperature and low humidity can be monitored in real time. For such application scenarios, the setting of the MinMax threshold in the formed VSN reduces the transmission traffic of the entire network; we aim to minimize the energy consumption. The MinMax operator is defined in equation (4). The input is a point set with the same monitoring attribute dimensionality and the
output is a single point. Suppose the data set is $D = \{d_1, d_2, \ldots, d_n\}$; then $\mathrm{MinMax}(D)$ is defined as follows:

$$\begin{aligned} max_i &= \operatorname{MAX}_{k=1}^{j} d_i[k] \qquad (1 \le i \le n)\\ minmax &= \operatorname{MIN}_{i=1}^{n} max_i\\ \operatorname{MinMax}(D) &= \underbrace{\{minmax,\, minmax,\, \ldots,\, minmax\}}_{j} \end{aligned} \qquad (4)$$
where j denotes the monitoring attribute dimensionality [2]. An example of the MinMax operator is shown in Fig. 1. The point set D consists of five 2-dimensional points. We observe $max_i = d_i[y]$ for i = 1, 2 and $max_i = d_i[x]$ for i = 3, 4, 5. As shown in Fig. 1, $d_3[x]$ is the smallest among $d_1[y], d_2[y], d_3[x], d_4[x], d_5[x]$, so $\operatorname{MIN}\{d_1[y], d_2[y], d_3[x], d_4[x], d_5[x]\}$ equals $d_3[x]$, and $\operatorname{MinMax}(D) = (d_3[x], d_3[x])$.
Fig. 1. The MinMax operator
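A minimal Python sketch of the MinMax operator of equation (4) follows; the five example points are hypothetical but reproduce the situation of Fig. 1, where $d_3[x]$ is the smallest of the per-point maxima.

```python
def minmax(D):
    # Eq. (4): for each point take its largest coordinate, then take the
    # smallest of those maxima and replicate it over all j dimensions.
    maxes = [max(d) for d in D]   # max_i = MAX_{k=1..j} d_i[k]
    mm = min(maxes)               # minmax = MIN_{i=1..n} max_i
    return [mm] * len(D[0])

# Hypothetical 2-dimensional point set in the spirit of Fig. 1:
D = [[1.0, 5.0], [2.0, 4.5], [3.0, 1.0], [4.0, 2.0], [5.0, 3.0]]
print(minmax(D))  # -> [3.0, 3.0]; the per-point maxima are 5, 4.5, 3, 4, 5
```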
In the first phase, there is no historical information at any CH node, which gathers data toward the central monitoring node. Once the VSN is formed, the hierarchical threshold is initialized with the corresponding MinMax value: the parent CH node computes MinMax(D) over its own point and the points reported by its child nodes. MinMax(D) has dominating ability equivalent to that of the monitored points that are not dominated by any other point; a point p dominates another point q only if p is better (according to a user-defined preference) than q in at least one dimension and not worse than q in the other dimensions. As a result, this transformation reduces the query to a one-point message and thus significantly reduces the traffic cost.
4 Performance Evaluation This section evaluates the performance of the proposed approaches; two different simulation scenarios are considered. In the first scenario, the application built on the formed VSN
was compared with traditional dedicated sensor networks. Consider two sensor-network-based systems, one for forest-fire warning and the other for monitoring chemical plume pollution, both deployed on a planar rectangular area of 500 m × 500 m in which 600 sensor nodes are randomly distributed with transmission radius r = 100 m. The sink node is placed in the middle of the sensor field, and the initial energy of each node is 50 J. Fig. 2 shows the total energy consumption of the nodes obtained with two non-cooperating WSNs and with the two corresponding VSN subsets. In a traditional dedicated WSN, all the nodes collaborate more or less as equal partners to acquire the sensory data. In contrast, the subset of nodes belonging to a VSN collaborates to carry out a given application, while the VSN depends on the remaining nodes to provide the support functionality needed to create, maintain and operate it. Thus, VSN support simplifies application deployment and reduces energy consumption.
[Figure 2: line plot of the total energy consumption (J) over time (0-960 s), comparing the VSN framework with two dedicated WSNs.]
Fig. 2. Comparison of the total energy consumption of the two frameworks
[Figure 3: bar chart of the average energy consumption per node (J) for dimensionality 2-5, comparing the hierarchical scheme with the MinMax-threshold approach.]
Fig. 3. The average energy consumption per node for varying dimensionality
The second scenario evaluates the performance of multi-criteria monitoring using the MinMax operator. As the dimensionality increases, the average energy consumption per node increases for both the common hierarchical framework and the MinMax-threshold-based approach. Fig. 3 shows the evaluation results when the dimensionality varies between 2 and 5. The MinMax-threshold-based strategy is superior to the common scheme at every dimensionality, reducing the energy consumption. However, the difference shrinks as the dimensionality increases, because the dominance relationship between high-dimensional points is weakened.
5 Conclusion Algorithms for WSNs have specific features such as self-configuration and self-organization, depending on the type of target application. Self-configuration means the capacity of an algorithm to adjust its operational parameters according to the design requirements. Self-organization means the capacity of an algorithm to adapt autonomously to changes resulting from external interventions, such as topological changes (due to failures, mobility, or node inclusion) or reactions to a detected event, without the influence of a centralized entity. The main features of the proposed strategy are: (1) it is fully distributed, and therefore scalable; (2) it is self-organizing, and thus robust and fault-tolerant, and the operations performed at each node are very simple; (3) it can support collaborative, resource-efficient, multipurpose sensor networks by forming VSNs; (4) the update and maintenance of the hierarchical thresholds are more efficient thanks to the computation of the MinMax operator. The strategy is thus well suited to multi-criteria target monitoring in large-scale sensor networks. Acknowledgment. This research work was supported by the open research fund of the Key Laboratory of Complex System and Intelligence Science, Institute of Automation, Chinese Academy of Sciences.
References 1. Umer, M., Kulik, L., Tanin, E.: Optimizing Query Processing Using Selectivity-awareness in Wireless Sensor Networks. Computers, Environment and Urban Systems 33, 79–89 (2009) 2. Chen, H., Zhou, S., Guan, J.: Towards Energy-Efficient Skyline Monitoring in Wireless Sensor Networks. In: Langendoen, K.G., Voigt, T. (eds.) EWSN 2007. LNCS, vol. 4373, pp. 101–116. Springer, Heidelberg (2007) 3. Akkaya, K., Senel, F., McLaughlan, B.: Clustering of Wireless Sensor and Actor Networks Based on Sensor Distribution and Connectivity. J. Parallel Distrib. Comput. 69, 573–587 (2009)
4. Bandyopadhyay, S., Coyle, E.: An Energy-efficient Hierarchical Clustering Algorithm for Wireless Sensor Networks. In: Proceedings of IEEE INFOCOM, pp. 1713–1723. IEEE Press, San Francisco (2003) 5. Min, X., Shi, W.-r., et al.: Energy Efficient Clustering Algorithm for Maximizing Lifetime of Wireless Sensor Networks. Int. J. Electron. Commun. 64, 289–298 (2010) 6. Chamam, A., Pierre, S.: A Distributed Energy-efficient Clustering Protocol for Wireless Sensor Networks. Computers and Electrical Engineering 36, 303–312 (2010) 7. Xu, H., Huang, L., et al.: Energy-efficient Cooperative Data Aggregation for Wireless Sensor Networks. J. Parallel Distrib. Comput. 70, 953–961 (2010) 8. Jayasumana, A.P., Han, Q.: Virtual Sensor Networks - A Resource Efficient Approach for Concurrent Applications. In: Proc. International Conference on Information Technology (ITNG 2007), pp. 111–115. IEEE Press, Las Vegas (2007) 9. Dilum Bandara, H.M.N., Jayasumana, A.P., et al.: Cluster Tree Based Self Organization of Virtual Sensor Networks. In: Globecom Workshops, pp. 1–6. IEEE Press, New Orleans (2008) 10. Tynan, R., O'Hare, G.M.P., et al.: Virtual Sensor Networks: An Embedded Agent Approach. In: 2008 International Symposium on Parallel and Distributed Processing with Applications, Sydney, pp. 926–932 (2008) 11. Lei, S., Xu, H., et al.: VIP Bridge: Integrating Several Sensor Networks into One Virtual Sensor Network. In: International Conference on Internet Surveillance and Protection, pp. 2–9. IEEE Press, Cote (2006) 12. Liu, M., Cao, J., et al.: EADEEG: An Energy-Aware Data Gathering Protocol for Wireless Sensor Networks. Journal of Software 18(5), 1092–1109 (2007) (in Chinese)
Application of a Novel Data Mining Method Based on Wavelet Analysis and Neural Network to Satellite Clock Bias Prediction Chengjun Guo* and Yunlong Teng University of Electronic Science and Technology of China, Research Institute of Electronic Science and Technology, No.2006, Xiyuandadao, 611731 Chengdu, China {gcjcdn-007,tyluestc}@163.com
Abstract. A novel four-stage data mining method for clock bias prediction based on wavelet analysis and FONN (fuzzy optimization neural networks) is proposed. The basic ideas, prediction models and steps of clock bias prediction based on wavelet analysis and FONN are discussed. To validate the feasibility and validity of the proposed method, a careful precision analysis of satellite clock bias prediction is made with the performance parameters of GPS satellite clocks, together with a comparison against the grey system model and a neural network model. The simulation results show that the prediction precision of the novel four-stage model based on wavelet analysis and FONN is better, and that it can provide highly precise satellite clock bias prediction for real-time GPS precise point positioning. Keywords: Data mining; satellite clock bias; wavelet analysis; FONN; coasting window.
1 Introduction The accuracy with which user receivers determine their position and velocity and synchronize to GPS system time depends on a complicated interaction of various factors. In general, GPS accuracy performance depends on the quality of the pseudo-range and carrier-phase measurements as well as on the broadcast navigation data. In addition, the accuracy of the underlying physical model relating these parameters is relevant; for example, the offsets of the satellite and receiver clocks translate directly into pseudo-range and carrier-phase errors. So, in a manner of speaking, for satellite navigation and positioning, accurate position measurement essentially means accurate time measurement; one may even say that without high-precision time-frequency there is no high-precision navigation and positioning [1]. The prediction of satellite clock bias is therefore an important task: it is of far-reaching importance for high-precision navigation and positioning, and it is the key link in realizing real-time GPS precise point positioning [2].
* This research was supported by the Aeronautic Science Foundation of China (No. 20090580013).
The most common clock bias prediction algorithms are the second-order polynomial model [3] and the grey system model [4]. In practice, however, both models have limitations [5]. Moreover, satellite atomic clocks operate at very high frequencies, which makes them extremely sensitive: they are easily affected by external factors as well as by their own behavior, so capturing their complex and subtle evolution is very difficult. Meanwhile, the literature [6] indicates that a non-linear, non-stationary random series with very complex behavior is difficult to predict effectively with a single model. Neural networks can approximate arbitrary non-linear mappings by learning; they break through conventional methods based on parametric models and show clear advantages in non-linear prediction. Furthermore, fuzzy theory can deal with unknown and uncertain problems, and a neural network can learn and adapt to an uncertain system; therefore, the integration of neural networks and fuzzy theory can handle both quantitative and qualitative knowledge and is applicable to complex systems. T. J. He and W. Huang combined fuzzy theory and neural networks and built a fuzzy neural network (FNN) model for pavement performance assessment [7], but such modeling was comparatively difficult due to the slow convergence speed and local convergence of neural networks. Genetic algorithms (GAs) provide global search optimization, so X. G. Hu and B. G. Wang introduced GA and built a GANN model for pavement performance evaluation [8], although that model only applied GA to optimize the connection weights of the BP network and did not fully exploit the advantages of the BP network. M. Y. Liu and W. X. Huang proposed a GANN pavement performance assessment model that made up for these defects [9], but its neural network structure had no specific physical meaning. The fuzzy optimization BP neural network model proposed by S. Y. Chen combines fuzzy theory and neural networks [10]; in particular, the fuzzy optimization model serves as the activation function of the neural network, so the physical meaning of the model is comprehensible. Considering these factors, and targeting the characteristics of clock bias series, this paper presents a mixture model based on wavelet analysis and FONN for satellite clock bias prediction. An example is presented to validate the feasibility and validity of the proposed method, followed by an analysis of the prediction precision. The results of the simulation and analysis show that the presented method achieves higher prediction precision and can provide highly precise satellite clock bias prediction for real-time GPS precise point positioning.
2 Design Scheme of Clock Bias Prediction Model The satellite clock bias prediction model based on wavelet analysis and FONN can be divided into four phases: (1) decompose the clock bias series into different time-frequency layers by wavelet analysis; (2) denoise the coefficients of every layer by wavelet denoising, so as to remove the isolated and leap points; (3) predict the coefficients of every layer by FONN; (4) reconstruct the series by wavelet reconstruction, so as to obtain the predicted clock bias series. The detailed prediction flow is depicted in Figure 1.
Fig. 1. Satellite clock bias prediction model
For this mixture model, the coefficients of every layer of the clock series obtained by wavelet analysis represent the trend and the details; after the isolated and leap points are processed by the wavelet, FONN can capture the complex and subtle evolution rules more accurately through learning, so as to approximate the clock series better and obtain higher prediction precision.
3 Data Processing Based on Wavelet Analysis The wavelet transform analysis method, which developed from the Fourier transform, has good localization characteristics in both the time and frequency domains, and is usually applied to analyze the signal component within a specified frequency band and time window. The wavelet transform of a signal f(τ) is defined as follows:
$$(W_\varphi f)(a,b) = |a|^{-\frac{1}{2}} \int_{-\infty}^{+\infty} f(\tau)\, \bar{\varphi}\!\left(\frac{\tau-b}{a}\right) d\tau \qquad (1)$$
where $(W_\varphi f)(a,b)$ is the wavelet coefficient; a is a parameter characterizing the frequency of f(τ), called the scale; b is a parameter characterizing the position of f(τ) in time τ or space; φ is the basic wavelet, playing the same role as $e^{-j\omega\tau}$ in the Fourier transform; and $\bar{\varphi}$ is the complex conjugate of φ. A multi-resolution representation method was proposed by Mallat [11]. The Mallat algorithm based on multi-resolution representation is a fast discrete wavelet transform algorithm, just as the FFT is to the Fourier transform. It decomposes and reconstructs a signal using the wavelet filters H, G, h, and g. The decomposition procedure can be described as follows [12]:

$$\begin{cases} A_0[f(t)] = f(t) \\ A_j[f(t)] = \sum_k H(2t-k)\, A_{j-1}[f(t)] \\ D_j[f(t)] = \sum_k G(2t-k)\, A_{j-1}[f(t)] \end{cases} \qquad (2)$$
where t is the discrete time sequence, t = 1, 2, …, N; f(t) is the original signal; j is the decomposition level, j = 1, 2, …, J, with J = log₂N; H and G are the quadrature mirror filters for decomposition in the time domain, i.e., H is the low-pass filter and G is the high-pass filter; $A_j$ is the wavelet coefficient of the approximation component, i.e., the low-frequency component of f(t) at level j; and $D_j$ is the wavelet coefficient of the detail component, i.e., the high-frequency component of f(t) at level j. In this manner, an integrated signal is decomposed into a sequence of sub-signals with different frequency bands. The reconstruction algorithm can be described as follows:

$$A_j[f(t)] = 2\sum_k h(t-2k)\, A_{j+1}[f(t)] + 2\sum_k g(t-2k)\, D_{j+1}[f(t)] \qquad (3)$$
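The Mallat decomposition and reconstruction of equations (2)-(3) are implemented in standard wavelet libraries; the following sketch, which assumes the PyWavelets package and uses a universal soft threshold as an illustrative denoising rule (our own choice, not the paper's), outlines the analyze-denoise-reconstruct steps of the four-stage model.

```python
import numpy as np
import pywt

def denoise_series(series, wavelet="db6", level=3):
    # Decompose into approximation and detail coefficients (Eq. 2).
    coeffs = pywt.wavedec(series, wavelet, level=level)
    # Soft-threshold the detail coefficients to suppress isolated and
    # leap points; the universal threshold below is an assumed choice.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(series)))
    coeffs[1:] = [pywt.threshold(d, thr, mode="soft") for d in coeffs[1:]]
    # Reconstruct the cleaned series (Eq. 3).
    return pywt.waverec(coeffs, wavelet)[: len(series)]
```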
4 Prediction Based on FONN Neural Network The principle of a neural network is to carry out a non-linear transform from input space to output space through a linear combination of non-linear basis functions. The coefficients of every layer of the clock series obtained by wavelet analysis form a time series with strong non-linearity; predicting them essentially means finding the non-linear mapping from input space to output space. Neural networks are therefore particularly well suited to predicting non-linear time series such as these coefficients [13]. 4.1 Brief Introduction of Neural Network
The neural network architecture is a massively parallel distributed information processing system with performance characteristics resembling the biological networks of the human brain [14]. A neural network comprises one input layer, one or more hidden layers, and one output layer, connected by link-weights. After training, the neural network can match any mapping relationship between input and output. Fig. 2 shows a 3-layer BP ANN system with fuzzy optimization. There are m input nodes at the input layer, corresponding to m feature values of the objectives. The number of hidden nodes is l, which is the number of unit systems, and the number of output nodes is g. Assume there are n samples (here the observation data span n days, with n = 200). Let $r_{ij}$ be the i-th input of sample j, i = 1, 2, …, m; j = 1, 2, …, n. At the input layer, node i transfers its message directly to the nodes of the hidden layer; therefore, input equals output at node i ($u_{ij} = r_{ij}$). For node k at the hidden layer, the input can be expressed as:
$$I_{kj} = \sum_{i=1}^{m} w_{ik}\, r_{ij} \qquad (5)$$
Here $w_{ik}$ is the link-weight between node i and node k, and satisfies $\sum_{i=1}^{m} w_{ik} = 1$, $w_{ik} \ge 0$. The output is
Fig. 2. A 3-layer ANN BP system with fuzzy optimization
$$u_{kj} = \frac{1}{1 + \left[\left(\sum_{i=1}^{m} w_{ik} r_{ij}\right)^{-1} - 1\right]^{2}} = \frac{1}{1 + \left(I_{kj}^{-1} - 1\right)^{2}} \qquad (6)$$
For node p at the output layer, the input reads:

$$I_{pj} = \sum_{k=1}^{l} w_{kp}\, u_{kj} \qquad (7)$$
Here $w_{kp}$ is the link-weight between hidden-layer and output-layer nodes, equal to the weight of the unit system at the middle layer of the BP model, and satisfies $\sum_{k=1}^{l} w_{kp} = 1$, $w_{kp} \ge 0$. Its output is
$$u_{pj} = \frac{1}{1 + \left[\left(\sum_{k=1}^{l} w_{kp} u_{kj}\right)^{-1} - 1\right]^{2}} = \frac{1}{1 + \left(I_{pj}^{-1} - 1\right)^{2}} \qquad (8)$$
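Equations (5)-(8) amount to a forward pass in which the fuzzy optimization model acts as the activation function. A minimal sketch follows (the names are ours; it assumes the inputs and weighted sums lie in (0, 1] so that the reciprocal in the activation is defined).

```python
import numpy as np

def fuzzy_activation(I):
    # Eqs. (6) and (8): u = 1 / (1 + (I^{-1} - 1)^2), for I in (0, 1].
    return 1.0 / (1.0 + (1.0 / I - 1.0) ** 2)

def fonn_forward(r, w_ih, w_ho):
    # r:    (m,) feature vector of one sample, values in (0, 1]
    # w_ih: (m, l) input-to-hidden weights, each column summing to 1
    # w_ho: (l, g) hidden-to-output weights, each column summing to 1
    u_hidden = fuzzy_activation(r @ w_ih)         # Eqs. (5)-(6)
    u_output = fuzzy_activation(u_hidden @ w_ho)  # Eqs. (7)-(8)
    return u_output
```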
The actual output $u_{pj}$ of the network is the response of the BP ANN model to the input $r_{ij}$. Let $t_{pj}^{(j)}$ be the expected output of sample j; its global error is then

$$E_j = \frac{1}{2}\sum_{p=1}^{g}\left(u_{pj} - t_{pj}^{(j)}\right)^{2} \qquad (9)$$
The average error over the n samples is

$$E = \frac{1}{n}\sum_{j=1}^{n} E_j = \frac{1}{2n}\sum_{j=1}^{n}\sum_{p=1}^{g}\left(u_{pj} - t_{pj}^{(j)}\right)^{2} \qquad (10)$$
Using the steepest-descent (maximum decreasing gradient) method, the adjustment of the weights between hidden-layer and output-layer nodes is as follows:
$$\Delta w_{kp} = \frac{2\eta}{n}\sum_{j=1}^{n} u_{pj}^{2}\, u_{kj} \left[\frac{1-\sum_{k=1}^{l} w_{kp} u_{kj}}{\left(\sum_{k=1}^{l} w_{kp} u_{kj}\right)^{3}}\right] \sum_{p=1}^{g}\left(t_{pj}^{(j)} - u_{pj}\right) \qquad (11)$$
Here η is the learning coefficient. The adjustment of the weights between input-layer and hidden-layer nodes is:

$$\Delta w_{ik} = \frac{2\eta}{n}\sum_{j=1}^{n} \delta_{pj}\, r_{ij}\, w_{kp}\, u_{kj}^{2} \left[\frac{1-\sum_{i=1}^{m} w_{ik} r_{ij}}{\left(\sum_{i=1}^{m} w_{ik} r_{ij}\right)^{3}}\right] \qquad (12)$$
Here $\delta_{pj}$ is determined by

$$\delta_{pj} = 2\, u_{pj}^{2} \left[\frac{1-\sum_{k=1}^{l} w_{kp} u_{kj}}{\left(\sum_{k=1}^{l} w_{kp} u_{kj}\right)^{3}}\right] \sum_{p=1}^{g}\left(t_{pj}^{(j)} - u_{pj}\right) \qquad (13)$$
The weight-adjusted equations are:
$$w_{ik}(t+1) = w_{ik}(t) + \Delta w_{ik}(t+1) + \alpha\, \Delta w_{ik}(t) \qquad (14)$$

$$w_{kp}(t+1) = w_{kp}(t) + \Delta w_{kp}(t+1) + \alpha\, \Delta w_{kp}(t) \qquad (15)$$
Here t is the iteration number and α is the momentum coefficient, which varies between 0 and 1. Based on the customary iterative method of ANNs and the BP model described above, the link-weights of the network can be obtained; that is, the weights between the objectives and the unit system are determined so as to minimize the error between the actual output and the expected output (within 3‰). We then use the iterative method to train the ANN and retrieve the corresponding I(K) from the observation t(K) and the ANN [14,15]. After training is completed and the fixed-weight ANN is selected, the retrieval can be run: given another set of known observation output vectors, the multilayer network is used to retrieve the corresponding parameter input vector. The advantages of iterative retrieval are its high precision and easy control; in particular, it can incorporate constraints based on known knowledge during the retrieval process, which improves retrieval speed and precision. The initial value selected for the iteration has a great effect on the retrieval results.
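As a small illustration, the momentum updates of equations (14)-(15), which share the same form for both weight matrices, can be written as follows (our naming):

```python
def momentum_update(w, dw_new, dw_prev, alpha):
    # Eqs. (14)-(15): w(t+1) = w(t) + dw(t+1) + alpha * dw(t),
    # with momentum coefficient alpha in (0, 1).
    return w + dw_new + alpha * dw_prev
```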
The purpose of predicting the coefficients of every layer is to predict the values of the following M moments of the series from the values of the foregoing N moments. In this paper, we take the data of the foregoing N moments of the series as a coasting window and map them to the values of M moments. These M values denote
the prediction of the following M moments. The architecture of this data-partitioning method for the prediction model is illustrated in Table 1. The data are partitioned into K overlapping segments of length N+M; each segment is regarded as a sample, thereby yielding K = L - (N+M) + 1 samples. The data of the foregoing N moments of every sample are taken as the input of the neural network, and the values of the following M moments as the target output. Through learning, the mapping from input space to output space is realized, thus achieving prediction.

Table 1. Method of data partitioning

Input                        Output
x_1, ..., x_N                x_{N+1}, ..., x_{N+M}
x_2, ..., x_{N+1}            x_{N+2}, ..., x_{N+M+1}
...                          ...
x_K, ..., x_{N+K-1}          x_{N+K}, ..., x_{N+M+K-1}
5 Test Procedure and Results To analyze the precision and other relevant characteristics of the proposed model when applied to real-time satellite clock bias prediction, and after comprehensive consideration, we adopt the Db6 wavelet of the Daubechies family to process the clock bias series, set N = 6 and M = 1 to construct the coasting window, and then select the 4 satellites shown in Figure 3, which have the most typical clock bias variation patterns, for the precision analysis of clock bias prediction.
Fig. 3. Clock bias data of the satellites
Because neural networks offer massively parallel processing, powerful adaptive learning, and fault tolerance, they are an attractive modeling method for dynamic systems. To validate the advantage of the mixture
model presented in this paper, we first study the modeling mechanism and characteristics of the neural network and compare its performance with that of the proposed model. In this section, we carry out a contrastive prediction-precision analysis of the neural network model, the traditional grey model, and the proposed model using IGS precise ephemeris data. Through this comparison among three well-known prediction models, we validate the feasibility and validity of the proposed mixture model for satellite clock bias prediction. Satellite clock bias series are functions of time. When predicting satellite clock bias with the second-order polynomial model, the accumulated error grows as the prediction horizon lengthens; moreover, the polynomial model is easily influenced by human factors such as the order of the polynomial and the amount of data, so the prediction performance of the traditional grey model is better than that of the second-order polynomial model [16]. We then carry out clock bias prediction for the 4 satellites with the three models mentioned above; their prediction errors are shown in Figure 4.
Fig. 4. Prediction errors of these models
From Figure 4 we can see that the prediction precision and stability of the neural network are far better than those of the traditional grey model. The main reason is that clock bias series are not all of one type, and different types of series have very different sensitivities to the exponent coefficient of the grey model; thus, when predicting clock bias with the grey model, the choice of exponent coefficient has a decisive influence on the prediction precision. However, the exponent coefficient of the traditional grey model is usually assumed to be a constant, which brings large errors and even failures. Figure 4 also shows that the prediction precision and stability of the mixture model based on wavelet analysis and the neural network are better than those of the neural network model alone. This is mainly because the neural network model also has limitations for satellite clock bias prediction: for clock bias data containing mutations, although the network tolerates the mutations, it cannot predict them, and it can only reflect their influence, at a certain speed, in the subsequent predictions. One of the most important purposes of introducing wavelet analysis into the mixture model is precisely to process the isolated and leap points, so as to remove their influence on clock bias prediction and to extract the trend terms of the series, which is of great significance for data processing and for model building from long-term measurements. The wavelet analysis thus improves the modeling characteristics of the neural network model and overcomes its weakness in processing clock bias data with mutations.
6 Conclusions and Further Research After discussing the limitations of the neural network model and the traditional grey model for satellite clock bias prediction, we adopt wavelet analysis to pre-process the clock bias data and present a novel mixture model based on wavelet analysis and a neural network for real-time clock bias prediction. We then study its performance with the clock bias data of different satellites and reach several conclusions that should be of reference value for satellite clock bias prediction in real-time GPS precise point positioning. (1) Adopting a "coasting window" to partition the clock bias data when predicting with a neural network not only improves the utilization of the initial data but also improves the prediction precision through the real-time renewal of data. (2) With wavelet-based data pre-processing, the neural network can capture the complex and subtle evolution rules more accurately through learning, approximating the clock series better and achieving higher prediction precision. (3) The mixture model based on wavelet analysis and the neural network effectively overcomes the limitations of the grey model and the neural network model, and thus achieves better prediction precision.
References 1. Huang, G.W., Zhang, Q., Xu, G.C., et al.: IGS Precise Satellite Clock Model Fitting and Its Precision by Using Spectral Analysis Method. Geomatics and Information Science of Wuhan University 33(5), 496–499 (2008) 2. Hao, M., Ou, J.K., Guo, J.F., et al.: A New Method for Improvement of Convergence in Precise Point Positioning. Geomatics and Information Science of Wuhan University 32(10), 902–905 (2007) (in Chinese) 3. Li, X.H., Wu, H.T., Bian, Y.J., et al.: Satellite virtual atomic clock with pseudorange difference function. Sci. China Ser. G-Phys Mech. Astron. 52(3), 353–359 (2009) 4. Cui, X.Q., Jiao, W.H.: Grey System Model for the Satellite Clock Error Predicting. Geomatics and Information Science of Wuhan University 30(5), 447–450 (2005) (in Chinese) 5. Lu, X.F., Yang, Z.Q., Jia, X.L., et al.: Parameter Optimization Method of Gray System Theory for the Satellite Clock Error Predicating. Geomatics and Information Science of Wuhan University 33(5), 492–495 (2008) (in Chinese) 6. Huang, N.E., Wu, M.L., Qu, W.D., et al.: Applications of Hilbert-Huang Transform to Non-Stationary Financial Time Series Analysis. Applied Stochastic Models in Business and Industry 19(3), 245–268 (2003) 7. He, T.J., Huang, W.: Application of fuzzy neural network in bitumen pavement performance assessment. Journal of Highway and Transportation Research and Development 17(4), 15–18 (2000) 8. Hu, X.G., Wang, B.G.: Application of two genetic algorithms based method in pavement performance synthetic evaluation. Journal of Chang’ an University (Natural Science Edition) 22(2), 6–9 (2002) 9. Liu, M.Y., Huang, W.X., Ye, J.J.: Performance evaluation of asphalt pavement based on mixed GANN. Journal of Wuhan University of Technology 25(1), 36–39 (2003) 10. Chen, S.Y.: Engineering Fuzzy Set Theory and Application. National Defense Industry Press, Beijing (1998) 11. Mallat, S.G.: A Theory for Multi-Resolution Signal Decomposition: The Wavelet Representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(7), 674 (1989) 12. Yang, J.G.: Wavelet Analysis and Its Engineering Applications. China Machine Press, Beijing (2005) (in Chinese) 13. Karunasinghea, D., Liongb, S.: Chaotic time series prediction with a global model: Artificial neural network. Journal of Hydrology 323(1~4), 92–105 (2006) 14. Chen, S.Y.: Fuzzy Recognition Theory and Application for the Optimization of Complex Water Resource System. Jiling University Press, Changchun (2002) 15. Chen, S.Y., Ji, H.L.: Fuzzy optimization neural network approach for ice forecast in the Inner Mongolia reach of the Yellow River. Hydrological Sciences Journal 50(2), 319–330 (2005) 16. Zhu, L.F., Tang, B., Li, C.: Analysis on Performances of the Two Methods Satellite Clock Error Prediction. Journal of Spacecraft TT&C Technology 26(3), 39–43 (2007) (in Chinese)
Particle Competition and Cooperation for Uncovering Network Overlap Community Structure Fabricio Breve1, Liang Zhao1, Marcos Quiles2, Witold Pedrycz3,4, and Jiming Liu5
1 Department of Computation, Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, SP, Brazil, 13560-970 {fabricio,zhao}@icmc.usp.br 2 Department of Science and Technology, Federal University of São Paulo, São José dos Campos, SP, Brazil [email protected] 3 Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, T6R 2V4, Canada [email protected] 4 Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland 5 Computer Science Department, Hong Kong Baptist University, Kowloon, Hong Kong [email protected]
Abstract. The identification and classification of overlapping nodes in communities is an important topic in data mining. In this paper, a new graph-based (network-based) semi-supervised learning method is proposed. It is based on competition and cooperation among particles walking in the network to uncover overlapping nodes; i.e., the algorithm produces continuous-valued outputs (soft labels) corresponding to the levels of membership of the nodes in each of the communities. Computer simulations carried out for synthetic and real-world data sets provide a numeric quantification of the performance of the method. Keywords: Graph-based method, community detection, particle competition and cooperation, overlap nodes.
1 Introduction

Community detection in networks is an important data mining problem that has received increasing interest in recent years. Many networks are found to divide naturally into communities or modules, so discovering these community structures has become an important research topic [1, 2, 3, 4, 5]. The problem of community detection is very hard and not yet satisfactorily solved, despite the large amount of effort that has been devoted to it over the past years [6]. The notion of communities in networks is straightforward: each community is defined as a subgraph whose nodes are densely connected within itself but
sparsely connected with the rest of the network. However, in practice there are common cases where some nodes in a network can belong to more than one community at the same time. For example, in a social network of friendship, individuals often belong to several communities: their families, their colleagues, their classmates, etc. These nodes are often called overlap nodes, and most known community detection algorithms cannot detect them. Therefore, uncovering the overlapping community structure of complex networks has become an important topic in data mining [7, 8, 9]. In this paper, we present a new community detection method, which uses competition and cooperation among particles walking in the network. It is inspired by the community detection method proposed in [10], in which only hard labels can be produced by unsupervised learning. That model features walking particles in the network competing with each other in order to possess as many nodes as possible. In the proposed method, besides the particle competition mechanism, we also introduce cooperative behavior through the concept of teams of particles: particles in the same team cooperate with their teammates and compete against the other teams. We also transform the unsupervised learning mechanism into a semi-supervised one, in order to take advantage of the small portion of labeled samples that is usually available in data sets. The proposed model produces a fuzzy output (soft label) for each node of the network. Such continuous-valued output can be treated as the level of membership of each node in each community; therefore, the model is able to uncover the overlapping community structure in networks. The rest of this paper is organized as follows: Section 2 describes the model in detail, Section 3 shows some experimental results from computer simulations, and in Section 4 we draw some conclusions.
2 Model Description
Given a data set $\chi = \{x_1, x_2, \ldots, x_l, x_{l+1}, \ldots, x_n\} \subset \mathbb{R}^m$ and the corresponding label set $L = \{1, 2, \ldots, c\}$, the first l points $x_i$ ($i \le l$) are labeled as $y_i \in L$ and the remaining points $x_u$ ($l < u \le n$) are left unlabeled, i.e., $y_u = \emptyset$. The goal of the algorithm is to provide a vector of membership degrees to each class for each of the unlabeled data points, each of which corresponds to a node in the graph representation. The first step of the algorithm is to build a network from the given data set. Define a graph $G = (V, E)$, where $V = \{v_1, v_2, \ldots, v_n\}$ is the set of nodes, each node $v_i$ corresponding to a sample $x_i$, and E is the set of edges $(v_i, v_j)$, which can also be represented by an adjacency matrix W:

$$W_{ij} = \begin{cases} 1 & \text{if } \|x_i - x_j\|_2 \le \sigma \text{ and } i \neq j \\ 0 & \text{if } \|x_i - x_j\|_2 > \sigma \text{ or } i = j \end{cases} \qquad (1)$$

where $W_{ij}$ specifies whether there is an edge between the pair of nodes $x_i$ and $x_j$, σ is a threshold on the distance between $x_i$ and $x_j$ below which the nodes $v_i$ and $v_j$ are connected, and $\|\cdot\|$ is the Euclidean distance.
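A direct, if naive, Python rendering of this construction is sketched below (the naming is ours; for large n one would vectorize the distance computation rather than loop).

```python
import numpy as np

def build_network(X, sigma):
    # Eq. (1): connect v_i and v_j when the Euclidean distance between
    # x_i and x_j is at most sigma; no self-loops.
    n = len(X)
    W = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            if i != j and np.linalg.norm(X[i] - X[j]) <= sigma:
                W[i, j] = 1
    return W
```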
For each labeled data point $x_i \in \{x_1, x_2, \ldots, x_l\}$, or its corresponding node $v_i \in \{v_1, v_2, \ldots, v_l\}$ in the network, a particle $\rho_i \in \{\rho_1, \rho_2, \ldots, \rho_l\}$ is generated, with its initial position at node $v_i$. Thus, there is a particle for each labeled sample in the data set. If $v_i$ is the initial position of particle $\rho_i$, we call it the home node of particle $\rho_i$. At each iteration, each particle changes its position and registers the distance it is from its home node. Particles generated from samples with the same class label form a team and cooperate with each other to compete with the other teams. Each particle $\rho_j$ carries two variables: $\rho_j^\omega(t)$ and $\rho_j^d(t)$. The first variable, $\rho_j^\omega(t) \in [0,1]$, is the particle strength, which indicates how much the particle can affect a node's levels at time t. The second variable is a distance table, i.e., a vector $\rho_j^d(t) = \{\rho_j^{d_1}(t), \rho_j^{d_2}(t), \ldots, \rho_j^{d_n}(t)\}$, where each element $\rho_j^{d_i}(t) \in [0, n-1]$ corresponds to the distance measured between the particle's home node $v_j$ and node $v_i$. Each node $v_i$ has two variables. The first is a vector $v_i^\omega(t) = \{v_i^{\omega_1}(t), v_i^{\omega_2}(t), \ldots, v_i^{\omega_c}(t)\}$, called the instantaneous domination levels, where each element $v_i^{\omega_\ell}(t) \in [0,1]$ corresponds to the level of domination of team $\ell$ over node $v_i$. At each node, the sum of the domination levels is always constant:
$$\sum_{\ell=1}^{c} v_i^{\omega_\ell} = 1. \qquad (2)$$
This relation holds because particles increase the domination level of their own team at a node and, at the same time, decrease the domination levels of the other teams. The second variable is the long-term domination levels, a vector $v_i^\lambda(t) = \{v_i^{\lambda_1}(t), v_i^{\lambda_2}(t), \ldots, v_i^{\lambda_c}(t)\}$, where each element $v_i^{\lambda_\ell}(t) \in [0, \infty)$ represents the long-term domination level of team $\ell$ over node $v_i$. Long-term domination levels can vary from zero to infinity, and they never decrease. Each node $v_i$ has the initial value of its instantaneous domination vector $v_i^\omega$ set as follows:

$$v_i^{\omega_\ell}(0) = \begin{cases} 1 & \text{if } y_i = \ell \\ 0 & \text{if } y_i \neq \ell \text{ and } y_i \in L \\ \frac{1}{c} & \text{if } y_i = \emptyset \end{cases} \qquad (3)$$

i.e., for each node corresponding to a labeled data item, the domination level of the dominating team is set to the highest value 1, while the domination levels of the other teams are set to the lowest value 0; for each node corresponding to an unlabeled data item, the domination levels of all particle teams are set to the same value 1/c, where c is the number of classes (the number of teams). On the other hand, at all nodes, all long-term domination levels $v_i^{\lambda_\ell}(0)$ are initialized to zero, for all classes, whether the corresponding data item is labeled or not. Each particle has its initial position set to its home node, and its initial strength is set as follows:

$$\rho_j^\omega(0) = 1, \qquad (4)$$

i.e., each particle starts with maximum strength.
Particles have limited knowledge of the network; they only know the distances from their home node to nodes they have already visited. Distances are recalculated dynamically at each particle movement. Thus, the distance table of each particle is initialized as follows:

$$\rho_j^{d_i}(t) = \begin{cases} 0 & \text{if } i = j \\ n-1 & \text{if } i \neq j \end{cases} \qquad (5)$$

i.e., for each particle, the distance from its home node is set to zero, and all the other distances are assumed to be the largest possible value, n−1. At each iteration, each particle selects a neighbor to visit. There are two kinds of movement a particle can use: random movement and greedy movement. During random movement, a particle randomly chooses any neighbor to visit without regard to domination levels or distance from its home node; this movement is used for exploration and acquisition of new nodes. In greedy movement, each particle prefers to visit nodes that are already dominated by its own team and that are closer to its home node; this movement is used for defense of both its own and its team's territory. To achieve an equilibrium between exploratory and defensive behavior, both movements are applied: at each iteration, each particle chooses greedy movement with probability $p_{grd}$ and random movement with probability $1 - p_{grd}$, with $0 \le p_{grd} \le 1$. Once random or greedy movement is determined, the target neighbor node $\rho_j^\tau(t)$ is chosen with the probabilities defined by Eq. (6) or Eq. (7), respectively. In random movement the particle $\rho_j$ moves to a node $v_i$ with probability

$$p(v_i|\rho_j) = \frac{W_{qi}}{\sum_{\mu=1}^{n} W_{q\mu}}, \qquad (6)$$

where q is the index of the current node of particle $\rho_j$, so $W_{qi} = 1$ if there is an edge between the current node and node $v_i$, and $W_{qi} = 0$ otherwise. In greedy movement the particle moves to a neighbor with probability defined according to its team's domination level $v_i^{\omega_\ell}$ on that neighbor and the inverse of the distance $\rho_j^{d_i}$ from that neighbor $v_i$ to its home node $v_j$, as follows:
$$p(v_i|\rho_j) = \frac{W_{qi}\, v_i^{\omega_\ell}\, \dfrac{1}{(1+\rho_j^{d_i})^2}}{\sum_{\mu=1}^{n} W_{q\mu}\, v_\mu^{\omega_\ell}\, \dfrac{1}{(1+\rho_j^{d_\mu})^2}}. \qquad (7)$$
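Taken together, Eqs. (6)-(7) simply normalize a score over the neighbors of the particle's current node. A minimal Python sketch follows (the naming is ours: v_omega is the n×c matrix of instantaneous domination levels, dist is the particle's distance table $\rho_j^d$, ell the index of the particle's team; node q is assumed to have at least one neighbor).

```python
import numpy as np

def movement_probabilities(W, q, v_omega, dist, ell, greedy):
    # Eq. (6): random movement weights all neighbors of node q equally.
    # Eq. (7): greedy movement also weights each neighbor by the team's
    # domination level and by 1 / (1 + distance-to-home)^2.
    if greedy:
        scores = W[q] * v_omega[:, ell] / (1.0 + dist) ** 2
    else:
        scores = W[q].astype(float)
    return scores / scores.sum()
```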
In Eq. (7), q is again the index of the current node of particle $\rho_j$, and $\ell = \rho_j^f$, where $\rho_j^f$ is the class label of particle $\rho_j$. Particles of different teams compete to own the network nodes: when a particle moves to a node, it increases the instantaneous domination level of its team at that node and, at the same time, decreases the instantaneous domination levels of the other teams at the same node. The exception is the labeled nodes, whose instantaneous domination levels are fixed. Thus, for each selected target node $v_i$, the instantaneous domination level $v_i^{\omega_\ell}(t)$ is updated as follows:
$$v_i^{\omega_\ell}(t+1) = \begin{cases} \max\left\{0,\; v_i^{\omega_\ell}(t) - \dfrac{\Delta_v\, \rho_j^\omega(t)}{c-1}\right\} & \text{if } y_i = \emptyset \text{ and } \ell \neq \rho_j^f \\[2mm] v_i^{\omega_\ell}(t) + \sum_{q \neq \ell} \left[v_i^{\omega_q}(t) - v_i^{\omega_q}(t+1)\right] & \text{if } y_i = \emptyset \text{ and } \ell = \rho_j^f \\[2mm] v_i^{\omega_\ell}(t) & \text{if } y_i \in L \end{cases} \qquad (8)$$
where $0 < \Delta_v \le 1$ is a parameter controlling the rate of change of the instantaneous domination levels and $\rho_j^f$ denotes the class label of particle $\rho_j$. If $\Delta_v$ takes a low value, the instantaneous domination levels of the nodes change slowly; if it takes a high value, they change quickly. Each particle $\rho_j$ increases the instantaneous domination level of its team ($v_i^{\omega_\ell}$, $\ell = \rho_j^f$) at node $v_i$ when it moves there, while decreasing the instantaneous domination levels of the other teams ($v_i^{\omega_\ell}$, $\ell \neq \rho_j^f$) at the same node, always respecting the conservation law of Eq. (2). The instantaneous domination levels of all labeled nodes are fixed, as stated by the third case of Eq. (8). Regarding the long-term domination levels, at each iteration, for each node $v_i$ selected during random movement, the long-term domination level $v_i^{\lambda_\ell}(t)$ is updated as follows:

$$v_i^{\lambda_\ell}(t+1) = v_i^{\lambda_\ell}(t) + \rho_j^\omega(t) \qquad (9)$$

where $\ell$ is the class label of particle $\rho_j$. Eq. (9) shows that the update of the long-term domination levels $v_i^{\lambda_\ell}(t+1)$ is proportional to the current particle strength $\rho_j^\omega(t)$. This is a desirable feature, because a particle probably has higher strength when it is arriving from its own neighborhood and lower strength when it is arriving from nodes in other teams' neighborhoods. When greedy movement is selected, the long-term domination levels are not updated. Regarding particle strength, particles get stronger when they move to a node dominated by their own team and weaker when they move to a node dominated by other teams. Thus, at each iteration t, the particle strength $\rho_j^\omega(t)$ is updated as follows:

$$\rho_j^\omega(t+1) = v_i^{\omega_\ell}(t+1), \qquad (10)$$
where $v_i$ is the target node and $\ell = \rho_j^f$, i.e., $\ell$ is the class label of particle $\rho_j$. Therefore, each particle $\rho_j$ has its strength $\rho_j^\omega$ set to the value of its team's instantaneous domination level $v_i^{\omega_\ell}$ at the node $v_i$. It is important to notice that when a particle moves, it may be accepted or rejected at the target node due to the competition mechanism. First, the particle modifies both the instantaneous and the long-term domination levels of the node as explained above; then it updates its own strength; finally, it is accepted at the new node only if the domination level of its team is higher than the others'; otherwise, a shock happens and the particle goes back to its previous node until the next iteration. The purpose of the distance table is to keep the particle aware of how far it is from its home node. This information is used in greedy movement to keep the particle around its own neighborhood most of the time, avoiding leaving it
susceptible to attack by other teams. The instantaneous domination levels, together with the distance information, also prevent situations where a particle would walk into enemy neighborhoods and lose all its strength. Each particle $\rho_j$ updates its distance table $\rho_j^{d_k}(t)$ at each iteration t as follows:

$$\rho_j^{d_k}(t+1) = \begin{cases} \rho_j^{d_i}(t) + 1 & \text{if } \rho_j^{d_i}(t) + 1 < \rho_j^{d_k}(t) \\ \rho_j^{d_k}(t) & \text{otherwise} \end{cases} \qquad (11)$$
where $\rho_j^{d_i}(t)$ and $\rho_j^{d_k}(t)$ are the distances from its home node to the current node and to the target node, respectively. After the last iteration, the membership degrees $f_i^\ell \in [0, 1]$ of each node $v_i$ are calculated from the long-term domination levels as follows:

$$f_i^\ell = \frac{v_i^{\lambda_\ell}(\infty)}{\sum_{q=1}^{c} v_i^{\lambda_q}(\infty)} \qquad (12)$$
where $f_i^\ell$ represents the final membership level of node $v_i$ in community $\ell$. Based on the membership degrees (the fuzzy output), we define an overlap measure in order to conveniently illustrate the application of the algorithm. The overlap index $o_i$ of a node $v_i$ is defined as follows:

$$o_i = \frac{f_i^{\ell^{**}}}{f_i^{\ell^{*}}} \qquad (13)$$
where $\ell^* = \arg\max_\ell f_i^\ell$ and $\ell^{**} = \arg\max_{\ell,\, \ell \neq \ell^*} f_i^\ell$, and $o_i \in [0, 1]$: $o_i = 0$ means complete confidence that the node belongs to a single community, while $o_i = 1$ means the node is completely undefined, being shared among two or more communities.
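Equations (12)-(13) translate directly into a few lines of Python (our naming; v_lambda is the n×c matrix of final long-term domination levels, assumed to have nonzero row sums and c ≥ 2 columns):

```python
import numpy as np

def membership_and_overlap(v_lambda):
    # Eq. (12): normalize the long-term domination levels row-wise.
    f = v_lambda / v_lambda.sum(axis=1, keepdims=True)
    # Eq. (13): overlap index = second-largest membership / largest.
    top2 = np.sort(f, axis=1)[:, -2:]  # [second largest, largest] per node
    o = top2[:, 0] / top2[:, 1]
    return f, o
```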
3 Computer Simulations
In this section, we present simulation results to evaluate the effectiveness of these modifications. The graphs are built from the data sets using Eq. (1), with the parameter σ set empirically for each problem, i.e., a set of simulations is executed with varying σ and the value leading to the best result is chosen. The algorithm parameters in this modified version are less sensitive than in the original version, and they are therefore empirically set to $\Delta_v = 0.1$ and $p_{grd} = 0.5$ for all the experiments in this subsection. Figure 1a shows a data set with 4 Gaussian-distributed classes, generated using the PRTools [11] function gendats, with 1,000 elements (250 per class); 20 samples are labeled (5 per class), represented by the red squares, blue triangles, green lozenges and purple stars. The algorithm is applied to the data set and the detected overlap indexes are shown in Fig. 1b. We see that the nodes in the interior of each class are small and dark blue, i.e., they are clearly non-overlapping nodes. Meanwhile, the nodes on the borders between classes have lighter
[Figure 1: two side-by-side scatter plots, (a) and (b), over roughly [-5, 5] × [-6, 6]; (b) includes a color bar from 0 to 1 encoding the overlap index.]
Fig. 1. Classification of normally distributed classes (Gaussian distribution). (a) toy data set with 1,000 samples divided into four classes; 20 samples are labeled, 5 from each class (red squares, blue triangles, green lozenges and purple stars). (b) node sizes and colors represent the overlap index detected by the proposed method.
[Figure 2: the karate club network with 34 numbered nodes; node sizes and colors (color bar from 0 to 1) encode the overlap index.]
Fig. 2. The karate club network. Node sizes and colors represent the overlap index detected by the proposed method.
tonalities and larger sizes, which represent their different levels of overlap. These results are in agreement with our intuition. The proposed algorithm is also applied to a real-world data set: Zachary's Karate Club Network [12]. The data set is presented to the algorithm with only two labeled nodes, 1 and 34, each representing a different class. The results are shown in Fig. 2, where the overlap index of each node is indicated by its size and color. Visual inspection indicates that this is a good result as well. Notice that although the two labeled nodes exhibit some degree of overlap, the algorithm still produced a good result, even detecting the overlap degrees of the labeled nodes (note their slightly larger size and lighter blue color).
This is also a desirable feature, since we do not need to choose a non-overlapping node to represent a class.
4 Conclusions
This paper presents a new graph-based semi-supervised learning method for uncovering overlapping network community structure. The method combines cooperation and competition among particles in order to generate a fuzzy output (soft label) for each node in the network. The fuzzy outputs correspond to the membership levels of the nodes in each class. An overlap measure is derived from these fuzzy outputs, and it can be considered a confidence level on the output label. Computer simulations with both synthetic and real-world data sets show that the proposed model is a promising method for classifying data sets with overlapping structure, as well as for detecting and quantifying an overlap measure for each node in the network. Acknowledgments. This work is supported by the State of São Paulo Research Foundation (FAPESP) and the Brazilian National Council of Technological and Scientific Development (CNPq).
References
[1] Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Physical Review E 69, 026113 (2004)
[2] Newman, M.: Modularity and community structure in networks. Proceedings of the National Academy of Sciences of the United States of America 103, 8577–8582 (2006)
[3] Duch, J., Arenas, A.: Community detection in complex networks using extremal optimization. Physical Review E 72, 027104 (2005)
[4] Reichardt, J., Bornholdt, S.: Detecting fuzzy community structures in complex networks with a Potts model. Physical Review Letters 93(21), 218701 (2004)
[5] Danon, L., Díaz-Guilera, A., Duch, J., Arenas, A.: Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment 9, P09008 (2005)
[6] Fortunato, S.: Community detection in graphs. Physics Reports 486(3-5), 75–174 (2010)
[7] Zhang, S., Wang, R.S., Zhang, X.S.: Identification of overlapping community structure in complex networks using fuzzy c-means clustering. Physica A: Statistical Mechanics and its Applications (2007)
[8] Palla, G., Derényi, I., Farkas, I., Vicsek, T.: Uncovering the overlapping community structure of complex networks in nature and society. Nature 435(7043), 814–818 (2005)
[9] Zhang, S., Wang, R.S., Zhang, X.S.: Uncovering fuzzy community structure in complex networks. Physical Review E 76(4), 046103 (2007)
[10] Quiles, M.G., Zhao, L., Alonso, R.L., Romero, R.A.F.: Particle competition for complex network community detection. Chaos 18(3), 033107 (2008)
[11] Duin, R., Juszczak, P., Paclik, P., Pekalska, E., de Ridder, D., Tax, D., Verzakov, S.: PRTools4.1, a Matlab toolbox for pattern recognition
[12] Zachary, W.W.: An information flow model for conflict and fission in small groups. Journal of Anthropological Research 33, 452–473 (1977)
Part-Based Transfer Learning

Zhijie Xu and Shiliang Sun

Department of Computer Science and Technology, East China Normal University,
500 Dongchuan Road, Shanghai 200241, P.R. China
[email protected], [email protected]
Abstract. Transfer learning, one of the most popular topics in machine learning, has attracted a lot of attention recently. In this paper, we propose a new learning strategy called part-based transfer learning (pbTL), which is a process of parameter transfer. Unlike many traditional works, we consider how to transfer information from one task to another in the form of parts. We regard every complex task as a collection of constituent parts, so each task can be divided into several parts. This means that transfer learning between two complex tasks can be accomplished by sub-transfer-learning tasks between their parts. By developing the transfer in this hierarchical fashion, we can reach a better outcome. Experiments on synthetic data with support vector machines (SVMs) validate the effectiveness of the proposed learning framework.

Keywords: Machine learning, Transfer learning, Part-based modeling, Support vector machine.
1 Introduction
Standard machine learning relies on the availability of a sufficient amount of labeled data to train an effective model in one feature space. In reality, however, researchers often cannot obtain as much data as they want. For this reason, various machine learning strategies have been proposed, including multi-task learning [1], multi-view learning [2], transfer learning [3,4] and self-taught learning [5]. Unfortunately, in real applications, the source task (auxiliary task) and the target task (the task to be studied) often come from different distributions. For example, Wall Street firms often hire physicists to solve finance problems [6], even though these two domains have superficially nothing in common. This example shows that humans can deal with this kind of problem by applying knowledge learned in one domain to an entirely different one. Moreover, the reason humans can do this is that they have the ability to choose the part of their knowledge that is useful for learning the new task. A computer, however, cannot distinguish whether a part of the source task is good or bad to transfer to the target task. As a result, it is important to teach the computer to judge the importance of every part of a complex task. After dividing the whole into parts, more details belonging to the different parts become visible.
Many collections of data exhibit a common underlying structure: they consist of a number of parts or factors, each with a range of possible states [7]. Through the learning of different parts, we can obtain various kinds of knowledge even when they come from the same task. This theory, known as the part-based model, has been used in many fields, especially image processing and computer vision. In [8,9], the authors use this theory to build models for object recognition in different ways, but all of them apply part-based theory within a single task. Different from these works, we integrate part-based modeling into transfer learning so as to improve its effectiveness by focusing more attention on the parts that contribute the most.

In this paper, for the purpose of distinguishing the importance of different parts and obtaining a better set of parameters for transfer, we establish a framework named part-based transfer learning (pbTL). The function of pbTL can be indicated by the following three points. Firstly, it is necessary to differentiate the effect of the various parts on transfer learning. Secondly, some latent knowledge can be explored in the learning of the parts. Finally, we believe the source task, which contains a larger data set, can yield a better set of parameters than the target task. Consequently, in our framework, for every part divided from the source task and target task, we construct a classifier with the help of support vector machines (SVMs). The most important step is to obtain the optimal parameters in the source task part by part and transfer them to the target task to help train its classifiers. After that, these classifiers are combined into a whole using weights based on their accuracy rates. The effectiveness of our proposed learning framework is clearly supported by experiments on synthetic data designed by us. The remainder of this paper is organized as follows. In Section 2, we introduce the related background knowledge on transfer learning and SVMs. Following this, in Section 3, we describe our framework pbTL in detail. Then, experiments to illustrate the validity of the proposed method are provided in Section 4. Section 5 concludes the whole paper and gives some future work.
2 Background Knowledge

2.1 Transfer Learning
Transfer learning is what happens when someone finds it much easier to learn to play table tennis having already learned to play badminton; or to recognize motors having already learned to recognize bicycles; or to learn Spanish having already learned Italian. Early transfer learning works raised some important issues, such as learning how to learn, learning one more thing, and multi-task learning [10]. Similar to multi-task learning, one significant element of transfer learning is finding the common knowledge shared by multiple tasks [11]. However, in transfer learning, this interesting knowledge usually only belongs to the target task, which differs from multi-task learning. Our only goal is to solve the target task with the help of auxiliary knowledge obtained from the source tasks. This kind of learning is essentially one-way learning, not interdependent learning.

2.2 SVMs
SVM training is essentially the minimization of both the empirical error (training error) and the structural error (generalization error), according to the structural risk minimization principle. The SVM is based on the fact that the generalization error rate is bounded by the sum of the training error rate and a term that depends on the VC dimension, a measure of the capacity of a family of classifiers [12]. Moreover, with the help of the maximum margin attainable between two classes, the support vectors can define the boundaries of the class regions. According to the basic theory of the SVM and a training set (x_i, y_i), i = 1, ..., M, the decision function is

    f(x) = ⟨w, φ(x)⟩ + b.    (1)

In the formula above, b represents the bias parameter, ⟨·,·⟩ is the dot product in the feature space, and φ(·) is the non-linear mapping from the input space to the high-dimensional feature space. In order to obtain an SVM with a soft margin, the function (2), subject to the constraints (3), is minimized. In these formulas, ξ_i is a non-negative slack variable that relaxes the classification boundaries, and C is the regularization coefficient used to establish a balance between model complexity and training error:

    g(w, ξ) = (1/2)‖w‖² + C Σ_{i=1}^{M} ξ_i,    (2)

    y_i(⟨w, φ(x_i)⟩ + b) ≥ 1 − ξ_i, i = 1, ..., M.    (3)

In the next step, the optimization problem defined by (2) and (3) is solved in its dual form, written as (4) with the constraints in (5), where α_i is the Lagrange multiplier corresponding to the example x_i, and k(x_i, x_j) is the kernel function:

    min W(α) = − Σ_{i=1}^{M} α_i + (1/2) Σ_{i=1}^{M} Σ_{j=1}^{M} y_i y_j α_i α_j k(x_i, x_j),    (4)

    Σ_{i=1}^{M} α_i y_i = 0 and 0 ≤ α_i ≤ C, i = 1, ..., M.    (5)
The kernel trick is used to implicitly define a mapping of the training samples. The most common kernel functions for SVMs are listed in the table below:
Table 1. Commonly used kernel functions

Kernel Function   Formula
Linear            (x_i · x_j)
Polynomial        (x_i · x_j + 1)^p
RBF               e^{−‖x_i − x_j‖² / (2σ²)}
Sigmoid           tanh(k x_i · x_j − δ)

3 Our Proposed Method
Having explained the concepts of transfer learning and part-based theory, we now develop our learning framework in this section.

3.1 Part-Based Transfer Learning (pbTL)
In order to fully exploit the benefits of part-based theory in parameter transfer learning, we design a new learning framework combining them into a whole. First, we use SVMs to train a classifier and obtain a set of optimal parameters for every part of the source task. Then, these sets of parameters are transferred to the corresponding parts of the target task; this is the parameter transfer step. Following this, we obtain several better classifiers trained on the basis of these parameters for every part of the target task, and finally combine them into a single classifier with the help of weights. The function of this last step is to determine which parts contribute the most. Experiments in the following section show that this method is effective. Before coming to our learning framework pbTL, some hypotheses need to be stated here. Firstly, because the purpose of this paper is to show the effectiveness of using part-based theory in transfer learning, we suppose that the data carrying the same label in both the source and target tasks come from the same distribution; this means we do not need to consider the problem of similarity here. Secondly, we use the RBF kernel in the SVMs:

    k(x_i, x_j) = e^{−‖x_i − x_j‖² / (2σ²)}.    (6)
Thirdly, following the first assumption, we divide the source task and target task into several corresponding parts in terms of features. Moreover, in this step, we not only divide the features evenly and in order, but also account for the fact that different parts may be related to each other and share some similar characteristics. For example, suppose that dimension = 10 (the dimension of the data set) and N = 3 (the number of parts to be divided); we then split the first nine features into three parts in order and add the last feature to every part, creating three interdependent 4-dimensional parts. The detailed process of pbTL is given below:
Framework of Part-Based Transfer Learning

Input: a sequence of n labeled examples X = (x_1, y_1), ..., (x_n, y_n) in the source task;
       a sequence of m labeled examples U = (u_1, l_1), ..., (u_m, l_m) in the target task
Initialize the number of parts to be divided: N
Initialize the parameter set (C, σ) of the SVM and RBF kernel

1. Divide the source task into N parts by their features: S_1, ..., S_N
2. Train N classifiers, one per source part, f_{S_1}, ..., f_{S_N}, and obtain their optimal parameter vectors (C_{S_1}, σ_{S_1}), ..., (C_{S_N}, σ_{S_N})
3. Divide the target task into N parts corresponding to the source task: T_1, ..., T_N
4. Set the parameter vectors of T_1, ..., T_N from (C_{S_1}, σ_{S_1}), ..., (C_{S_N}, σ_{S_N})
5. Train the N classifiers of the target task: f_{T_1}, ..., f_{T_N}
6. Calculate the accuracy rates Accuracy_{T_1}, ..., Accuracy_{T_N} of f_{T_1}, ..., f_{T_N}
7. Calculate the weights of f_{T_1}, ..., f_{T_N}:
   for i = 1, ..., N
       W_{T_i} = Accuracy_{T_i} / Σ_{j=1}^{N} Accuracy_{T_j}
   end

Output the hypothesis:
   h_f(x) = 1 if Σ_{i=1}^{N} W_{T_i} · f_{T_i}(x) ≥ 0, and −1 otherwise
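As a concrete illustration, the short C fragment below sketches the feature split of step 1 and the weighted combination of step 7 together with the output hypothesis, under the assumptions of this section (dimension = 10, N = 3, last feature shared by all parts). It is a minimal sketch rather than the authors' implementation: the part_classifier function pointers are hypothetical placeholders standing in for SVM decision functions trained with an external library.

    #define DIM 10
    #define N_PARTS 3
    #define PART_DIM 4  /* three ordered features plus the shared last feature */

    /* Hypothetical per-part decision function (a trained SVM would supply it). */
    typedef double (*part_classifier)(const double *x_part, int dim);

    /* Steps 1/3: split a 10-dimensional example into three interdependent
     * 4-dimensional parts: features 3i, 3i+1, 3i+2 plus feature 9. */
    static void split_features(const double x[DIM],
                               double parts[N_PARTS][PART_DIM])
    {
        for (int i = 0; i < N_PARTS; ++i) {
            for (int d = 0; d < 3; ++d)
                parts[i][d] = x[3 * i + d];
            parts[i][3] = x[DIM - 1];          /* shared last feature */
        }
    }

    /* Step 7 and output: W_Ti = Accuracy_Ti / sum_j Accuracy_Tj and
     * h_f(x) = 1 if sum_i W_Ti * f_Ti(x) >= 0, and -1 otherwise. */
    int pbtl_predict(part_classifier f[N_PARTS], const double x[DIM],
                     const double accuracy[N_PARTS])
    {
        double parts[N_PARTS][PART_DIM];
        double acc_sum = 0.0, score = 0.0;

        split_features(x, parts);
        for (int i = 0; i < N_PARTS; ++i)
            acc_sum += accuracy[i];
        for (int i = 0; i < N_PARTS; ++i)
            score += (accuracy[i] / acc_sum) * f[i](parts[i], PART_DIM);
        return (score >= 0.0) ? 1 : -1;
    }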
Two points about the learning framework above should be noticed. On the one hand, every set of parameters obtained in the source task is transferred only to the corresponding part of the target task. On the other hand, the classifiers in both tasks are trained on their own data sets, X and U.

3.2 Explanation about pbTL
Owing to the characteristics of the SVM and the RBF kernel, the parameter set contains two members: C in (2) and σ in (6). They are the core elements of our parameter transfer learning. By transferring them to the target task part by part, the merit of the different parts can be indicated clearly, and the goal of treating the parts differently can be reached as well:

    W_{T_i} = Accuracy_{T_i} / Σ_{j=1}^{N} Accuracy_{T_j}.    (7)
Furthermore, in (7), in order to further distinguish the importance of the different parts of the target task, we calculate the weight of every classifier from the distribution of its accuracy rate on the training data set. These accuracy rates indicate the percentage of samples predicted correctly by f_{T_i}. In the output step, the sum of the products of every classifier and its weight composes the final classifier:

    h_f(x) = 1 if Σ_{i=1}^{N} W_{T_i} · f_{T_i}(x) ≥ 0, and −1 otherwise.    (8)
In addition to the basic description above, some further discussion is warranted here. In this paper we introduce the general framework of pbTL and have not considered the problem of diversity between the parts coming from the two tasks. For example, in step 4, we train the classifiers in the target task directly on the basis of the optimal parameter sets obtained in the source task, without making any change. However, sometimes the problem of diversity cannot be avoided. As a result, to deal with this issue and arrive at parameters more suitable for the target task, we can initialize the parameter vector with the optimal set of parameters obtained in the source task, and then update it through continuous iterations with some other procedure, such as a neural network, until a satisfactory set is obtained.
4 Experiments and Results
In this section, we carry out experiments on a two-class classification problem with synthetic data sets generated from multivariate Gaussian distributions. We create ten-dimensional data sets for both the source task and the target task. In every data set, we randomly draw the mean of class one (Mean_1) and compute the mean of class two (Mean_2) by the formula below:

    Mean_2 = Mean_1 ± 5.    (9)
To examine the influence of dispersion in our experiment, we set the covariance of both classes to three values: Cov = 15, 20, 25. Furthermore, we use support vector machines as the basic classifier and choose the Gaussian RBF as the kernel function. We also set the parameter N = 3, the number of parts to be divided in the source and target tasks. Table 2 summarizes the data sets. According to Table 2, the source task, regarded as the auxiliary task, contains 500 samples in total, all of which are training data used to obtain the optimal parameter set for transfer. The target task contains only half as many samples as the source task, with a training-to-testing ratio of one to one.
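For illustration, the fragment below sketches how such a synthetic sample could be drawn: two Gaussian classes whose means differ by 5 in every dimension, with a common per-dimension variance Cov. It uses Box-Muller sampling and is an assumption-laden sketch, not the authors' actual generator (the paper does not specify one).

    #include <math.h>
    #include <stdlib.h>

    #define DIM 10
    static const double PI = 3.14159265358979323846;

    /* Draw one value from N(mean, var) via the Box-Muller transform. */
    static double gauss(double mean, double var)
    {
        double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
        double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
        return mean + sqrt(var) * sqrt(-2.0 * log(u1)) * cos(2.0 * PI * u2);
    }

    /* Sample a 10-dimensional point of the given class: class 1 is centered
     * at mean1, class 2 at mean1 + 5 in every dimension (Mean2 = Mean1 + 5,
     * taking the "+" branch of equation (9) for simplicity). */
    void sample_point(const double mean1[DIM], double cov, int label,
                      double out[DIM])
    {
        for (int d = 0; d < DIM; ++d) {
            double m = (label == 1) ? mean1[d] : mean1[d] + 5.0;
            out[d] = gauss(m, cov);   /* cov is the per-dimension variance */
        }
    }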
Table 2. Summary of data sets

Synthetic data set
Number of samples in source task                  500
Number of samples in target task                  250 (125:125)
Ratio of training to testing data (target task)   1:1
Dimension                                         10
Number of classes                                 2
Mean of class one (Mean_1)                        Random
Mean of class two (Mean_2)                        Mean_1 ± 5
Covariance of both classes                        15/20/25
Table 3. Experimental results (accuracy rate, %)

Method           Cov=15   Cov=20   Cov=25
SVM              48       47.2     44
Part-based SVM   62.4     51.2     44
Transfer SVM     89.6     69.6     45.6
pbTL-SVM         90.4     83.2     66.4
Based on all the information above, we conduct three groups of experiments in this paper, varying the covariance. The goal is to observe the effect as the overlap between the two classes increases. Table 3 gives the classification results. It indicates that both the transfer learning SVM (TLSVM) and pbTL-SVM outperform the standard SVM and the part-based SVM. Moreover, as the covariance increases and more samples from the two classes intermix, pbTL-SVM shows clearly stronger stability than TLSVM and achieves better results.
5 Conclusion and Future Work
In this paper, we propose a novel learning framework, part-based transfer learning. It embeds part-based theory into transfer learning and makes full use of the contributions of the different parts of a task by bringing out their individual characteristics. Through two-class classification experiments on synthetic data sets, this framework proves to be more useful and effective than traditional transfer learning. In future work, this method needs to be extended to practical applications. At the same time, studying the behavior of the proposed framework as the number of parts divided in one task increases may pose a new challenge.
Acknowledgment. This work is supported in part by the National Natural Science Foundation of China under Project 61075005, the 2011 Shanghai Rising-Star Program, and the Fundamental Research Funds for the Central Universities.
References
1. Jin, F., Sun, S.: A multitask learning approach to face recognition based on neural networks. In: Proceedings of the 9th International Conference on Intelligent Data Engineering and Automated Learning, pp. 24–31 (2008)
2. Chen, Q., Sun, S.: Hierarchical multi-view Fisher discriminant analysis. In: Proceedings of the 16th International Conference on Neural Information Processing, pp. 289–298 (2009)
3. Raina, R., Ng, A.Y., Koller, D.: Constructing informative priors using transfer learning. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 713–720 (2006)
4. Wu, P., Dietterich, T.G.: Improving SVM accuracy by training on auxiliary data sources. In: Proceedings of the 21st International Conference on Machine Learning, p. 110 (2004)
5. Raina, R., Battle, A., Lee, H., Packer, B., Ng, A.Y.: Self-taught learning: Transfer learning from unlabeled data. In: Proceedings of the 24th International Conference on Machine Learning, pp. 759–766 (2007)
6. Davis, J., Domingos, P.: Deep transfer via second-order Markov logic. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 217–224 (2009)
7. Ross, D.A., Zemel, R.S.: Learning parts-based representations of data. Journal of Machine Learning Research 7, 2369–2397 (2006)
8. Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence 32, 1627–1645 (2009)
9. Bar-Hillel, A., Hertz, T., Weinshall, D.: Object class recognition by boosting a part-based model. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 702–709 (2005)
10. Dai, W., Yang, Q., Xue, G.R., Yu, Y.: Boosting for transfer learning. In: Proceedings of the 24th International Conference on Machine Learning, pp. 193–200 (2007)
11. Ben-David, S., Schuller, R.: Exploiting task relatedness for multiple task learning. In: Proceedings of the 16th Annual Conference on Learning Theory, pp. 567–580 (2003)
12. Lima, N.H.C., Neto, A.D.D., Melo, J.D.: Creating an ensemble of diverse support vector machines using Adaboost. In: Proceedings of the International Joint Conference on Neural Networks, pp. 2342–2346 (2009)
A Parallel Wavelet Algorithm Based on Multi-core System and Its Application in the Massive Data Compression

Xiaofan Lu, Zhigang Liu, Zhiwei Han, and Feng Wu

School of Electrical Engineering, Southwest Jiaotong University, Chengdu 610031, China
[email protected]
Abstract. In this paper we discuss issues relevant to the parallel implementation of the wavelet and wavelet packet transforms on a single PC with a multi-core system. Massive data compression has to manage large volumes of data, which makes parallel computing highly advisable. We adopt OpenMP to develop the parallel wavelet and wavelet packet algorithms. By analyzing the potential concurrency of the Mallat algorithm and distributing the loops properly across processors, we propose a new parallel wavelet algorithm and both nested and non-nested parallel wavelet packet algorithms, and apply them to massive data compression. Comparing the running times of the serial and parallel wavelet algorithms, the parallel algorithm is nearly twice as fast as the serial one, a speedup roughly proportional to the number of processors.

Keywords: wavelet; wavelet packet; parallel; OpenMP; single PC with multi-core system.
1 Introduction

Wavelet transformation has good localization properties in the time and frequency domains, which gives it obvious advantages in various applications in the power system [1]. However, with the rapid growth of data recording devices, the real-time demands of mass data compression and storage have been further raised. The traditional serial wavelet transformation adopts convolution operations, but the process is complex, the computational load is large, and the timeliness is poor, which has hindered its adoption in practical engineering. At present, scholars consider utilizing parallel technology to speed up the wavelet transformation to be an effective approach [2-4]. Most studies run their tests in multi-machine network environments, which are inevitably affected by system maintenance and bandwidth [5-7]. With the development of computers, single PCs with multi-core systems are entering deeply into people's production and daily life; therefore, parallel wavelets based on multi-core technology are of great significance. We utilize OpenMP technology to improve the speed of the wavelet calculation and propose both a nested and a non-nested parallel strategy based on the characteristics of wavelet packet decomposition and reconstruction. Then the parallel algorithms
programmed in the C language are tested on a single multi-core system in a mass data compression experiment, and the results are significant.
2 OpenMP

The OpenMP standard emerged as a shared-memory programming standard and API library for writing portable multithreaded applications on multiprocessors [8]; Visual C++ 2008 fully supports it. OpenMP uses guidance (directive) statements, library function calls, and environment variables to give users a way of creating and managing portable parallel programs on shared-memory computers. It adopts the fork-join model and starts running with one main thread. When the main thread meets a parallel region, it forks a team of threads (which includes the main thread itself). At the end of the parallel region, the main thread continues to run while the other threads are suspended. A parallel region can be nested in another parallel region, in which case the original thread becomes the main thread of the thread group it forked [9]. Using the OpenMP standard to develop the potential parallelism of the serial wavelet programs allows the parallel wavelet algorithms to analyze mass data quickly.
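As a minimal illustration of this fork-join model (a generic example, not taken from the authors' program), the loop below is split across a team of threads that fork at the parallel region and join back into the main thread when it ends:

    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        double sum = 0.0;
        /* The directive forks a thread team; iterations are divided among the
         * threads, and reduction(+:sum) avoids a data race on the accumulator. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < 1000000; ++i)
            sum += 0.5 * i;
        /* Threads join here; only the main thread continues. */
        printf("sum = %.1f (up to %d threads)\n", sum, omp_get_max_threads());
        return 0;
    }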
3 Principle of Parallel Wavelet and Wavelet Packet Mallat Algorithms

3.1 Principle of the Parallel Wavelet Mallat Algorithm

According to multi-resolution theory, Mallat proposed the fast algorithm for wavelet decomposition and reconstruction. The decomposition principle is given by equation (1) and illustrated in Figure 1:

    a_{j+1}(k) = Σ_{n=−∞}^{∞} a_j(n) h_0(n − 2k) = a_j(k) * h̄_0(2k)
    d_{j+1}(k) = Σ_{n=−∞}^{∞} a_j(n) h_1(n − 2k) = a_j(k) * h̄_1(2k)    (1)
where h̄(k) = h(−k), a_{j+1} are the approximation coefficients, and d_{j+1} are the detail coefficients. Equation (1) and Figure 1 show that the core of the wavelet decomposition is the convolution between the filter and the original data. From the convolution formulas, we can see that convolving the filter with the original data at different locations gives results that have no correlation with each other. Therefore, the original data can be partitioned and the convolution computed in parallel.
Fig. 1. Wavelet Mallat algorithm principles

Fig. 2. Wavelet packet Mallat algorithm principles

3.2 Principle of the Parallel Wavelet Packet Mallat Algorithm

The wavelet packet is an extension of the wavelet. On the basis of the wavelet transform (V_0 = V_1 ⊕ W_1 = V_2 ⊕ W_2 ⊕ W_1 = ...), it decomposes not only the V scale but also the W_j scale,
so it can refine the spectrum windows that are broadened by increasing j in the orthogonal wavelet transform, and obtain the most suitable time-frequency window, or optimal basis, for the original signal. The decomposition process is shown in Figure 2. It can be seen from Figure 2 that the principle of wavelet packet decomposition is the same as that of the wavelet, so it has the same potential for parallelizing the convolution. In addition, there is no correlation between the decomposition processes of the scale space and the wavelet space, so this part can also be executed as a parallel task.
4 The Implementation of Parallel Mallat Algorithms Based on OpenMP

4.1 The Implementation of the Parallel Wavelet Mallat Algorithm

The OpenMP standard can introduce parallelism directly into the serial program. In order to reduce the computation time, the convolution and the alternate extraction (downsampling) can be performed simultaneously. The for loops are the core of the wavelet decomposition, and a direct #pragma omp for makes them execute simultaneously on different processors. According to the number of processors, the original data are divided into p groups; each group is executed by one thread, and the p threads concurrently complete the convolution and alternate extraction of the p groups of data. To address the boundary issue, the boundary can be extended cyclically and the shared data given to each thread as a private copy. To illustrate the principle simply, the filter length is assumed to be 4 in Figure 3, where T_i (i = 0, 1, 2, ..., p−1) represents the different threads. The m sets of data are assigned to the different threads as private copies. After the decomposition, the scale coefficients and wavelet coefficients are generated from different data and likewise have no correlation, so OpenMP is also used to complete the write-back process, as in Figure 4. The reconstruction is similar to the decomposition. A sketch of one parallelized decomposition level follows.
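The fragment below is a minimal sketch of one decomposition level parallelized in this way, assuming a filter of length 4 as in Figure 3 and cyclic (periodic) boundary extension; the names and the boundary choice are illustrative rather than the authors' exact code. Each output index k depends only on a few input samples, so the iterations are independent and can be distributed across the processors.

    #include <omp.h>

    #define FLEN 4   /* filter length, as assumed in Figure 3 */

    /* One level of the Mallat decomposition:
     * a_{j+1}(k) = sum_n a_j(n) h0(n - 2k) and
     * d_{j+1}(k) = sum_n a_j(n) h1(n - 2k), computed in parallel over k. */
    void dwt_level(const double *a, int n,             /* input of even length n */
                   const double *h0, const double *h1, /* low-/high-pass filters */
                   double *approx, double *detail)     /* outputs of length n/2 */
    {
        #pragma omp parallel for
        for (int k = 0; k < n / 2; ++k) {
            double s = 0.0, d = 0.0;
            for (int i = 0; i < FLEN; ++i) {
                int idx = (2 * k + i) % n;   /* cyclic boundary extension */
                s += a[idx] * h0[i];
                d += a[idx] * h1[i];
            }
            approx[k] = s;   /* scale (approximation) coefficient */
            detail[k] = d;   /* wavelet (detail) coefficient */
        }
    }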
4.2 Implementation of Parallel Wavelet Packet Mallat Algorithms
The core of the wavelet packet decomposition is more complex than that of the wavelet decomposition: it consists of nested loops, which link the decomposition level to the numbers of low-frequency and high-frequency coefficients.
Fig. 3. The parallel convolution process of the wavelet decomposition

Fig. 4. The parallel wavelet flowchart based on OpenMP
Assume the decomposition level is n (n ≥ 1); then the outer loop must be executed 2^{n−1} times. When the outer loop reaches iteration j (j = 0, 1, 2, ..., 2^{n−1}−1), the inner loop starts at j × N/2^{n−1} and ends at N/2^{n−1} × (j+1).

4.2.1 Nesting

In OpenMP, nesting is a Boolean property that is disabled by default. A nested parallel region appears when a thread that is already executing inside one parallel region encounters another parallel region. If nesting is enabled, a new thread team is generated according to the dynamic thread rule. The nesting state is set by the call omp_set_nested(i): i = 0 disables nesting and i = 1 enables it. The wavelet packet decomposition uses nesting to parallelize the nested loops. When the main thread meets the outer loop, it forks a set of threads, and these threads (including the original thread) become the "main threads" of the inner loop, each forking new threads to execute the inner loop in parallel as before. At the end of the inner loop, these threads are suspended progressively and control finally returns to the original main thread. The nested loop executed in parallel is shown in Figure 5; a combined sketch of the nested and non-nested strategies follows Section 4.2.2.
Fig. 5. The threads of the nested loop executing in parallel

Fig. 6. The threads of the non-nested loop executing in parallel
4.2.2 Non-nesting

Another method is similar to the parallel implementation of the wavelet decomposition: only one loop level of the nested loop program is parallelized. Since the outer loops of the wavelet packet decomposition are not numerous while the inner loops are more complicated, the inner loop is the one parallelized. This means the multi-core processors perform the convolution and alternate extraction of the coefficients of the different scales in parallel. The non-nested loop executing its threads in parallel is shown in Figure 6.
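The sketch below contrasts the two strategies on the wavelet packet loop structure described above (m = 2^{n−1} blocks, inner range j·N/m to N/m·(j+1)). It is a schematic illustration under those assumptions, with the filtering step reduced to a placeholder, not the authors' code.

    #include <omp.h>

    /* Nested strategy (Fig. 5): teams are forked at both loop levels. */
    void wp_level_nested(double *data, int N, int m)   /* m = 2^(n-1) blocks */
    {
        omp_set_nested(1);                  /* allow nested parallel regions */
        #pragma omp parallel for            /* outer team over the blocks */
        for (int j = 0; j < m; ++j) {
            #pragma omp parallel for        /* inner team forked per block */
            for (int k = j * N / m; k < N / m * (j + 1); ++k)
                data[k] = data[k];          /* placeholder for the filtering */
        }
    }

    /* Non-nested strategy (Fig. 6): the outer loop stays serial and only the
     * more complicated inner loop is parallelized. */
    void wp_level_non_nested(double *data, int N, int m)
    {
        for (int j = 0; j < m; ++j) {
            #pragma omp parallel for
            for (int k = j * N / m; k < N / m * (j + 1); ++k)
                data[k] = data[k];          /* placeholder for the filtering */
        }
    }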
5 The Compression Data Experiment

To test the performance of parallel wavelet data compression, mass data compression experiments using the parallel wavelet algorithms are carried out in this paper. By changing the amount of raw data, the effect of parallel processing on the data can be analyzed step by step. The data compression method uses a threshold compression algorithm: after thresholding, the non-zero coefficient positions and coefficient values are recorded with the bitmap compression method proposed in reference [11]. The experiment platform used in this paper is: Processor: Intel(R) Core(TM)2 Duo CPU T5750; Memory: 3.00 GB. All timing data are averages over five runs.

5.1 Experiment Plans

(1) Comparison of the calculation performance of the parallel and serial wavelet algorithms
To compare the performance of the algorithms, the experiment applies the parallel and serial wavelet strategies described in this paper to data compression. Five data sizes are processed (2^10, 2^15, 2^16, 2^20, and 2^21 samples), using the DB4 wavelet for the analysis. Table 1 shows the experimental results; the time unit is ms. Figure 7 compares the performance of the parallel and serial wavelet algorithms in terms of speedup, where speedup = time consumed by the serial algorithm / time consumed by the parallel algorithm.

(2) Comparison of the calculation performance of the parallel and serial wavelet packet algorithms

The parallel process uses both the nested and non-nested parallel strategies for comparison with the serial algorithm. The DB4 wavelet basis is again used for the analysis. Figure 8 compares the speedup of the parallel and serial wavelet packet algorithms; the abscissa is the exponent of the power of 2 (data size), and the ordinate is the speedup.

Table 1. Time consumption of DB4 wavelet data compression (ms)

Quantity   Time average (serial)   Time average (parallel)   Speedup
2^10       0.1919632               0.4362786                 0.44
2^15       6.2669284               5.340578                  1.17
2^16       11.4965372              8.8223984                 1.29957
2^20       201.8578334             154.5957968               1.3057
2^21       413.2011082             292.965784                1.4104
Table 2. Time consumption of wavelet packet data compression (ms)

Quantity   Time average (serial)   Time average (nested parallel)   Time average (non-nested parallel)
2^10       0.4918208               13.100061                        0.6591404
2^15       17.2575582              13.2721102                       10.9800384
2^16       36.5099106              25.2376276                       21.1781546
2^20       606.60032               335.491235                       323.994865
2^21       1224.56896              683.411787                       658.941395
Table 3. The speedup of wavelet packet data compression

Quantity   Speedup (nested parallel)   Speedup (non-nested parallel)
2^10       0.0375                      0.746
2^15       1.30028                     1.57
2^16       1.4466                      1.724
2^20       1.808                       1.872
2^21       1.7918                      1.858
448
X. Lu et al.
Fig. 7. The acceleration ratio of the parallel wavelet algorithm
Fig. 8. Comparison of the speedup of the two parallel wavelet packet strategies
5.2 Experiment Conclusions

Experiments (1) and (2) show that the wavelet and wavelet packet algorithms based on OpenMP have a great advantage in calculation speed, and this performance gain grows with the amount of data. Relative to the serial algorithm, the speedup approaches 2, proportional to the number of processors. When the data size is small, the serial computation performs better, because the inevitable overhead of parallel computing exceeds the savings from parallelism; this issue gradually becomes negligible as massive data processing develops. Experiment (2) also shows that the performance of the parallel nested loop is worse than that of the parallel non-nested loop. This is because the inner loop still generates threads just as the outer loop does, but the outer loop has already obtained the maximum number of parallel execution threads based on the number of processors, so the new threads cannot execute in parallel with their main threads. In addition, the process of creating and exiting threads incurs overhead and lowers the performance of the parallel program. If the number of processors is more than two, better performance can be obtained by allocating the threads dynamically and changing the way the inner and outer threads are allocated.
6 Conclusions

With the widespread application of single PCs with multi-core processors, applying OpenMP technology to the parallel wavelet transformation can significantly increase the speed of data processing and further promote the application of the wavelet transform in various fields. For example, it can satisfy the real-time requirements of the mass data processing caused by the rapid development of recording devices in the power system and help ensure the safe, stable, and economic operation of the power system.
Acknowledgment. The authors would like to thank the National Natural Science Foundation of China (51007074), the Program for New Century Excellent Talents in University (NCET-08-0825), the Fok Ying Tung Education Fund (101060), the Sichuan Province Distinguished Scholars Fund (07ZQ026-012), and the Fundamental Research Funds for the Central Universities (SWJTU09ZT10) in China.
References
1. Khanfir, S., Jemni, M., Braiek, E.B.: Parallelization of an image compression and decomposition algorithm based on 1D wavelet transformation. In: First International Symposium on Control, Communications and Signal Processing, pp. 423–426 (2004)
2. You, J.B.: A wavelet-based coarse-to-fine image matching scheme in a parallel virtual machine environment. IEEE Transactions on Image Processing, 1547–1559 (2000)
3. Chaver, D., Prieto, M., Pinuel, L., Tirado, F.: Parallel wavelet transform for large scale image processing. In: Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS 2002), pp. 4–9 (2002)
4. Cheng, Y.L., Li, Y., Zhao, R.C.: A parallel image fusion algorithm based on wavelet packet. In: 8th International Conference on Signal Processing (2006); Shi, Y., Eberhart, R.: Empirical study of particle swarm optimization. In: Proc. of the 1999 Congress on Evolutionary Computation, pp. 1945–1950 (1999)
5. Jie, H.L.: Parallel image compression and decompression based on wavelet principle on a multi-core cluster. Inner Mongolia University (2008)
6. Ling, Y.: Parallel wavelet analysis based on multi-core cluster. Master dissertation, Heilongjiang University (2005)
7. The writing group of the multi-core series of textbooks: Multi-core Programming. Tsinghua University Press, Beijing (2007)
8. Quinn, M.J.: Parallel Programming in C with MPI and OpenMP. Tsinghua University Press, Beijing (2004)
9. Zhiwei, H., Zhigang, L., Xiaofan, L., Dengdeng, Z.: A high-speed parallel wavelet algorithm based on CUDA and its application in power system harmonic analysis. Electric Power Automation Equipment 30(1), 98–101 (2010)
10. Xun, G.: Application research of multi-core parallelism and design patterns on massive transient data processing and analysis in power systems. Doctoral dissertation, Southwest Jiaotong University (2009)
11. Jinfeng, S.: Study and implementation of electric power data compression, transmission and decompression algorithms. Master dissertation, Beijing University of Chemical Technology (2008)
Predicting Carbon Emission in an Environment Management System

Manas Pathak and Xiaozhe Wang

School of Management, La Trobe University, Melbourne, VIC 3086, Australia
Abstract. In this paper, we explore how sustainability management in an organization can be augmented with the use of business intelligence (BI) systems. We emphasize the important role BI has in helping organizations implement and supervise efforts towards sustainable practice. In particular, the information planning phase of a BI project can provide a systematic way of defining the relevant information so that it can be integrated into reporting activities and data analysis. We propose a replicable predictive model that supports the analysis of environmental indicators in the implementation of an organizational strategy for sustainability. A case study based on a manufacturer of packaging material is used to demonstrate our model on an Environment Management System.

Keywords: Business Intelligence, Regression, Environment Management System.
1 Introduction

In this research, we aim to explore how business intelligence (BI) tools, especially data mining methods, can support the management of sustainable business practices in modern-day firms. BI methods and tools have an important but as yet not well studied role to play in helping organizations implement and monitor sustainable and socially responsible business practices [1]. Analytics is based on the availability of relevant data, and we highlight this importance of data availability in modeling and analysis, and how it can further an organization's cause of sustainability. Organizations across business domains have invested heavily in setting up technological infrastructure that supports business processes and significantly increases the overall efficiency of operational activities. BI has successfully established itself as the technology that promises to answer the ever-growing demand for access to relevant information through comprehensive use of information technology (IT) [1]. The role of BI is to create an informational environment in which operational data gathered from transactional systems and external sources can be analyzed, in order to reveal "strategic" business dimensions [2].
2 Background

The concepts of sustainability and corporate social responsibility (CSR) have been among the most important topics to come forward over the last decade at the global
level [3]. The significance of sustainability and CSR assessment is ever growing within the multifaceted context of globalization, deregulation, and privatization, where social, environmental, and economic inequalities continue to increase [4]. Against such a backdrop, it becomes imperative for decision makers to account not only for increased sales and profits and/or decreased costs, but also for the sustainable development of the business itself and of its surrounding context. As a result, a growing number of businesses all over the world have engaged seriously in efforts to integrate sustainability into their business practices [5]. Despite this imperative organizational interest in including sustainability in their activities, effective implementation has faced serious obstacles. Until now, many organizations have kept the issue of sustainability separate from considerations of business strategy and performance evaluation, areas that are often dominated by purely "economic" performance indicators. In addition to the definition of social and environmental indicators, the question of how they can be reported, visualized, and monitored is crucial in terms of the value added to sustainability management, once transparency and accessibility of corporate information become important issues [6]. Furthermore, organizations taking up the reporting requirements of the Global Reporting Initiative (GRI) need to know how to translate corporate sustainability goals to the regional or site level, or indeed onto the floor of the production processes. This calls for empowering the decision maker with better data modeling to enable improved analysis and reporting, with the purpose of achieving a higher degree of insight. Across the globe, including Australia, industries are continually encouraged and aided to achieve higher environmental performance while promoting sustainable competitiveness and realizing better profitability. In Senator Robert Hill's words, the dual aims of excelling economically while protecting the environment can be realized by improving the environmental efficiency of business. What is pivotal today for an organization is being eco-efficient [7]: an organization must strategically work towards producing more goods and services while consuming less energy and fewer natural resources, with controlled process waste and pollution [8]. Moreover, the need for convergence of industrial ecology and the Environment Management System (EMS) is ever increasing, reinforcing a holistic approach to corporate environmental responsibility and eco-sustainability [9].
3 Environment Management System

An EMS is a collection of internal organizational efforts at policy making, assessment, planning, and implementation of a system for managing the environment. EMSs are systems of management processes that enable organizations to continually reduce their impact on the natural environment. An EMS consists of an environmental policy as well as a set of evaluation processes that require organizations to assess their environmental impacts, establish goals, implement environmental goals, monitor goal attainment, and undergo management review [10]. Environmental management as an integral part of business operations promises improved business profitability. To get a better understanding of, and insight into, the impact of its business processes on the environment, an organization needs to capture the right indicators, and their critical analysis is
pivotal. Due to the strategic importance of environmental management for a business, simply calculating and reporting a number for compliance is of limited value. Organizations must go beyond calculation and Greenhouse Gas (GhG) inventories into the world of activity modeling if real understanding, leadership, and goals are to be achieved; one must look beyond simplified compliance reporting. An EMS initiative can empower the organization to answer, effectively and efficiently, generic questions such as: "What is the primary cause of our emissions today? What activities are associated with the creation of those emissions?" This in turn will empower the organization to realize its environmental targets, and in the process these initiatives will give the organization a sustainable competitive advantage.

3.1 Attributes of an EMS

In general, an EMS has a set of desirable attributes:

Improved, quality data capture with uniform data entry: The challenge presented by any environmental system is the interrelation of its various components and processes and the complexity stemming from various sources. Moreover, the spatial and temporal nature of environmental data challenges the data gathering plan and necessitates better data collection practices and procedures.

Data integration and automation: Integrating relevant data from existing sources across the organization and enabling automated data collection at the source is pivotal for effective and efficient analysis.

Multidimensional modeling tools: Good modeling tools should be multidimensional, including the context and analysis variables that will be used to analyze the key drivers of an emissions footprint. For example, one may want to see the scopes of emission sources broken down by different geopolitical entities.

Pre- and post-analysis data visualization: Data visualization capabilities, both before and after the analysis, such as presenting the data in interactive histograms or pie charts to spot trends, are equally important for effective analysis.

Analysis and forecasting: A carbon management model is not complete without the ability to analyze, forecast, and communicate the results. As mentioned earlier, a good model is multidimensional; because multidimensional modeling techniques are used, the information can be "sliced" along any of the dimensions used in the model. Also significant is using the model results, combined with advanced forecasting ability, to predict where emissions will be in the future, along with establishing correlations among various indicators, spotting trends and exceptions, identifying problem areas, and so on.

Integrated, timely reporting: Another important aspect of an EMS solution is its capacity to generate the reports required to meet various local, national, and international reporting regulations. Moreover, it has become important for organizations to publish annual sustainability reports for both internal users and external stakeholders.
Performance management dashboard and environmental scorecard: BI promises the decision maker better and faster decision making by providing snapshots of the organization's performance across different aspects of the business.

Corporate environmental portal: A dedicated portal for sustainability and environmental performance can further the cause of environmental management. Besides data presentation, such a portal can embed various tools and models for analyzing environmental and sustainability indicators.

3.2 Tools and Related Studies

A large number of tools for assessing environmental impacts are available, such as Environmental Impact Assessment, Environmental Auditing, Life-Cycle Assessment (LCA), and Material Flow Analysis. The Australian National Packaging Covenant (ANPC), a self-regulatory voluntary agreement between the commonwealth, state/territory and local governments and all sectors of the packaging supply chain, encourages its signatories to perform an LCA of their products. There are many studies aimed at differentiating the relative contributions of the various factors affecting energy usage or pollutant emission. Rose and Chen studied factors affecting activity levels and energy intensity in freight energy consumption for the period 1972–1985 using a simple average Divisia method [11]. Torvanger selected five factors, including emission coefficient, energy intensity, fuel mix, production structure and international structure, to decompose the industrial CO2 intensity in nine OECD countries [12]. Viguier studied various GhG emissions, dividing the change in emission intensity into four components (emission factors, fuel mix, economic structure and energy intensity), and found that the main cause of high emission intensities is the high energy intensities of industries [13]. Nag and Parikh conducted a study of the evolution patterns of commercial energy consumption [14]; they tested factors such as income effect, electricity use-efficiency and coal quality, and their role in carbon emission intensity, and reported a positive relation between all the factors and the carbon emission intensity. The scope of these studies identifying the deciding factors of CO2 intensity is at the industry-sector or country level; there is little mention of such studies conducted at a micro level aimed at identifying factors influencing GhG emission. Our study aims to perform these analyses at the micro level, that is, at the product and industry level. The LCA data along with the EMS data for the respective product/process and industry will be considered. This will facilitate deeper insight, understanding and strategy formulation among corporate environmental decision makers.
4 Case Study

As outlined in the literature review, organizational sustainability needs to be looked at with a holistic approach. The emphasis on conducting LCAs of products when assessing an organization's environmental impact is growing, and new laws and the signatories of various sustainability bodies mandate such an approach. The proposed case study is based on research involving a signatory organization of the ANPC. The rationale for the choice of organization is that the signatories of this covenant are working towards
the development and maintenance of product LCA data. The LCA data, along with other indicators reported under the National Greenhouse and Energy Reporting scheme, are considered in identifying the important factors for predicting GhG emission. Following an extensive review of the literature, a suitable data modeling methodology is identified for the selected industry sector, and the data mining procedure is demonstrated. A six-phase approach is designed in this case study, as shown in Fig. 1. The phases have been designed in close adherence to Yin's (1994, 2009) recommendations for this type of case-study-based research.
Fig. 1. Case study framework with 6 phases: Case Study Setting, Data Collection, Data Processing, Modeling, Model Analysis, and EMS Strategy
4.1 Case Study Setting

Our case study is based on the Australian packaging industry, which is chosen because it is a signatory of the Australian National Packaging Covenant. The issue of sustainable development is noteworthy in this industry sector, and such organizations are flag bearers in promoting the Global Reporting Initiative for reporting corporate sustainability indicators. The data are collected from one of the leading manufacturing companies of packaging material in Australia. This company has a wide range of product lines covering plastic, fibre, metal and glass packaging products, along with packaging-related services, operating in many countries worldwide and at nearly 100 sites within Australia. The company has set certain goals to cater to its Corporate Environmental Responsibility (CER); for instance, a 10% reduction target for GhG emission and a 30% reduction target for waste to landfill by 2011. Upon identification of the organization for the study, formal communication was initiated with its management to be granted the access to organizational data, documents, and personnel needed to conduct the research successfully. Interviews with the Environment and Sustainability Officer were arranged for a better understanding of the processes involved in the business of the organization. It is very important to perform data contextualization before proceeding to the data collection and analysis phases; this ensures a seamless and uniform understanding of the data throughout all the phases of the research. For instance, it is always wise to define beforehand that the handling of environmental information will be directly comparable to the handling of cost information, which will also be expressed per component, per kilogram, and so forth.

4.2 Data Collection and Processing

Secondary data, the company's environmental data, were collected, i.e., the data captured as part of the existing EMS. These are time-stamped data covering three
financial years at monthly intervals. They include energy type and quantity, waste and water disposal, landfill, and production quantity, among others, from all the production sites of the company in Australia. Initially, about five years of historical data were sought. The final dataset used for this research is the EMS data captured for 68 sites over three financial years: 06-07, 07-08, and 08-09. There are 28 variables in total, including location information, time stamps, and 22 existing environmental variables, for instance coal, electricity, and onsite landfill. Emission is not an entry in the original data set. As the research proposes to identify the important variables in the prediction of GhG with the convergence of EMS data and LCA data, careful steps need to be undertaken to ensure a valid dataset for performing a meaningful analysis. The appropriateness of the characteristics of the dataset is important, as we propose to perform different data mining procedures: the ability of a data mining procedure to predict effectively depends on many aspects of the data set, such as the presence of missing values and the identification and categorization of interval, categorical and ordinal variables. The environmental data set describes the consumption of various resources and other indicators in different states and individual sites within Australia. One of the most concerning facts about the data set is that it has a huge amount of missing data, which points to the absence of an organization-wide data collection plan and standard. There are only two sites which have been capturing all of the indicators over the three financial years; for this reason, data pertaining to these sites were used when developing the various models demonstrating the prediction capabilities. These are site-specific models which will need careful consideration when extended to region-wide or organization-wide data, but replicating the steps demonstrated in this study will ensure the development of a meaningful prediction model for any scope of prediction. In the data collected, many sites only have data for financial year 06-07; in the absence of any other information, it is presumed that these sites had ceased to operate. Using simple procedures such as data characterization and visual exploration, only 40 sites were retained for analysis in this study. We also cleaned the data of erroneous entries and unexpected values. As the analysis data set contains a large number of indicators, special care was taken to avoid the "curse of dimensionality": variables that are redundant or irrelevant were excluded from the model, based on the review of the literature coupled with meetings with the representatives of the organization, who are the process experts. Three main sources of energy, namely coal, diesel, and electricity, are used in the manufacturing activities; many sites also captured data for LP gas and natural gas usage. Some redundancy was observed as well, as in the case of Total Tonnes and Net, which are apparently measures of the same indicator. To gain better insight into the data, variable grouping was carried out, wherein similar attributes were grouped together; for example, project waste, waste and hazardous solid are grouped together as 'waste'. The emission value is calculated in terms of carbon dioxide equivalent.
The formula used for the calculation of emission depends on the energy source and is the one prescribed by the Environment Protection Authority for greenhouse gas equivalence calculation. That is, 1 kilowatt-hour of electricity used equals 0.718 metric tons of carbon dioxide equivalent, 1 gallon of gasoline equals 8.9 metric tons of carbon dioxide equivalent, and 1000 therms of natural gas equal 5 metric tons of carbon dioxide equivalent. Finally, 6 variables are derived as indicators in the EMS in
this study. They are: purchase, waste, recycling amount, net production, low emission product, and landfill. The emission is calculated as an additional variable. Depending on the type of model, the data input can affect its quality to a certain extent. Regression models and most other statistical techniques are built on the assumption of normality, which makes regression models sensitive to extreme or outlying values in the inputs' distributions. In stepwise regression, highly skewed inputs might be selected over other inputs, which biases the model. We have used the transformation log(X/(N − X)), which helps to reduce the skewness of the input variables. The data input to a data mining algorithm is most commonly a single two-dimensional table; to fit the modeling tasks, we segregated the raw data into several tables. Unlike the decision tree, which can handle missing values in either numeric or categorical input fields by simply considering null a possible value with its own branch, regression models are averse to missing values in the dataset. So, the missing values in the dataset were imputed using single value substitution before executing the models.

4.3 Modeling and Analysis

We used regression models to develop predictive models for four different analyses. The predictive model building process was completed using SAS Enterprise Miner.

Model 1 - Prediction of emission (estimate) for the current financial year based on historical emission data. This simple model aims to give a rough estimate of the current financial year's emission based on the previous years' emission values. The model uses two input variables, f1emission and f2emission, from all 40 sites, and an interval target variable, f3emission. The dataset is divided into a training partition (70%) and a validation partition (30%). Least squares regression is used: the equation ŷ = b_0 + b_1 x is the line for which the sum of squared residuals Σ(y_i − ŷ_i)² is a minimum.

Model 2 - Prediction of whether the current financial year's emission target will be achieved. This model predicts the binary target variable "current year emission target", which states whether the current year's emission will be less or more than the target set for the year's emission. For that purpose, a binary target variable was included in the training dataset, calculated by a simple rule: if the average of the emissions for f1 and f2 is greater than f3, the binary target variable is set to 0 (that is, 0 states that the emission will be less than in the previous years); otherwise it is set to 1 (that is, 1 states that the emission will be more than in the previous years). The model uses logistic regression with stepwise model selection based on validation error. Given the various indicators we expect in sustainability data, we can check the model to identify any indicator with significant predictive power. An important aspect here is to adhere to parsimonious model selection, whereby simplicity is chosen over complexity (with insignificant gain, keep the model simple and include a limited number of variables). In this model, we observed that the iteration stopped at step zero, as no additional effect met the 0.05 significance level required to enter the model. The average square error on the validation data for this regression model is 0.25, which must be compared when more than one model is developed.
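For concreteness, the fragment below sketches the closed-form least-squares fit behind Model 1 (ŷ = b_0 + b_1 x minimizing Σ(y_i − ŷ_i)²) in C, using a single predictor such as a prior year's emission. The numbers in the example are illustrative placeholders, not the company's data, and in the study itself the fit was performed in SAS Enterprise Miner.

    #include <stdio.h>

    /* Closed-form simple linear regression: minimizes sum (y_i - yhat_i)^2. */
    static void ols_fit(const double *x, const double *y, int n,
                        double *b0, double *b1)
    {
        double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
        for (int i = 0; i < n; ++i) {
            sx += x[i]; sy += y[i];
            sxx += x[i] * x[i]; sxy += x[i] * y[i];
        }
        *b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx);  /* slope b_1 */
        *b0 = (sy - *b1 * sx) / n;                        /* intercept b_0 */
    }

    int main(void)
    {
        /* Illustrative values only: prior-year vs. current-year emission. */
        double f2emission[] = {10.2, 8.5, 14.1, 9.9};
        double f3emission[] = {9.8, 8.9, 13.2, 9.5};
        double b0, b1;
        ols_fit(f2emission, f3emission, 4, &b0, &b1);
        printf("f3emission = %.3f + %.3f * f2emission\n", b0, b1);
        return 0;
    }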
Model 3 - Identification of the predictive significance of different indicators in emission prediction. As outlined earlier, a parsimonious approach to building a predictive model is more effective. In this model, the target variable is the current year's emission, which is an interval variable. The dataset for this model includes the different indicator variables; upon availability of more indicators, the dataset can always be extended to accommodate more inputs. As shown in Fig. 2, the model remained stable after the introduction of the second variable and stayed the same until the introduction of the fifth variable. The prediction model should therefore avoid including the remaining variables, as they do not improve its predictive capability.
Fig. 2. Average square error plot in model iteration
The model selected is the one trained in the last step (step 4); it included four variables through stepwise regression. As shown in Table 1, we may conclude that the net production level has a direct and most significant impact on the emission output. The model also highlighted landfill as significant. Such results can help decision makers identify the underlying factors. When dealing with a large dataset with many variables, such an approach enables the inclusion of the best indicators in the model.

Table 1. Summary of stepwise selection

Step  Variable              F Value  Pr > F
1st   Net Production        233.02   <.0001
2nd   Landfill              44.31    <.0001
3rd   Low emission product  12.41    0.0006
4th   Waste                 5.44     0.2414
Some of the fit statistics of the model are quite high, which points towards the inferior quality of the model; this is mainly due to the low availability and quality of the dataset at hand. Therefore, it is pivotal to establish a standard mechanism in the organization for capturing quality data in order to facilitate better analysis.
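For readers reproducing the stepwise behaviour outside SAS Enterprise Miner, a rough forward-selection sketch is given below (Python with NumPy and SciPy). The partial F-test entry rule at the 0.05 level mirrors the setting described above; the helper names and data layout are our own:

```python
import numpy as np
from scipy import stats

def rss(X, y):
    """Residual sum of squares of an OLS fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X]) if X.size else np.ones((len(y), 1))
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def forward_stepwise(X, y, names, alpha=0.05):
    """Enter, one variable at a time, the candidate whose partial
    F-test p-value is smallest; stop when no candidate meets alpha."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        rss0 = rss(X[:, selected], y)
        df2 = len(y) - len(selected) - 2   # residual df after entry
        best = None
        for j in remaining:
            rss1 = rss(X[:, selected + [j]], y)
            F = (rss0 - rss1) / (rss1 / df2)
            p = 1.0 - stats.f.cdf(F, 1, df2)
            if best is None or p < best[1]:
                best = (j, p)
        if best[1] >= alpha:
            break
        selected.append(best[0])
        remaining.remove(best[0])
        print("entered", names[best[0]], "p =", round(best[1], 4))
    return [names[j] for j in selected]
```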
Model 4 - Comparison of the predictive significance of different indicators over different financial years, to observe any change in significance. The aim is to get a better insight into the data and to find out whether the significance of the variables identified by the model varied across financial years. Such an investigation may lead to an even better understanding of what might be causing such variability in the significance of the variables for predicting emission. The dataset was divided into three datasets, one for each financial year, each with the same input variables as in Model 3. For example, we investigated whether net production has been significant consistently throughout the different time periods.

Table 2. Most significant variable in three financial years

Financial year  Most significant variable
1st             Landfill
2nd             Net Production
3rd             Net Production
As shown in Table 2, interestingly, landfill was identified as the most significant variable in the first year, although net production is still the top-ranked variable in the second and third years. Such results can provide a deeper insight into the data.
5 Discussions and Conclusion

The importance of BI and BI tools in the formulation of sustainability strategy has been acknowledged and highlighted. The importance of data availability and modeling for gaining better insight, and thereby enabling better decision making, has been established, but there have been challenges that limited the scope of this research work. The availability of the data is subject to the authorization of the management at the respective organization; this can be mitigated by liaising with the respective contact at the organization. Because the domain of environmental management is new, the environmental data available was very limited. A dataset with erroneous data and missing values needs to be approached with suitable methods to mitigate their effects, so a careful pre-processing design is pivotal. Pre-processing of the data via visual exploration and statistical analysis was carried out to identify any significant issues. If LCA data is not available from the organization, it can be sourced from the ecoinvent v2 database, though this can increase the project cost. Since the organization in this study did not provide LCA data, the model was developed without it. This limits the scope of the research work, but, as mentioned before, such indicators can be included in the model for better prediction once they become available. Furthermore, LCA data, which is regarded as an important aspect of assessing the environmental impact of an organization's business, should be included in future research. In the event that access to the same software package is unavailable, suitable open source software can be used. The modeling is a tedious task, but it is a one-time initiative and worth undertaking. The demonstration of predictive model building in the case study provides insight into such
modeling initiatives and the assessment of these models. The models can be replicated and easily extended to accommodate more indicators. This study intended to augment our understanding of how BI initiatives can be directed to enhance sustainability strategy formulation, particularly the use of data mining techniques in the first phase of sustainability strategy formulation: developing a climate strategy, which mainly includes the sub-phases of assessing the emission profile, gauging risks and opportunities, evaluating action options, and setting goals and targets. Translating the overall corporate sustainability goals to the unit or process level is pivotal, and empowering decision makers to gain better insight into the data will make such translation feasible.
Classification of Pulmonary Nodules Using Neural Network Ensemble Hui Chen*, Wenfang Wu, Hong Xia, Jing Du, Miao Yang, and Binrong Ma School of Biomedical Engineering, Capital Medical University, Beijing 100069, China [email protected]
Abstract. A neural network ensemble (NNE) scheme was designed for distinguishing probably benign, uncertain and probably malignant lung nodules on thin-section CT images. To construct the NNE scheme, a multilayer neural network trained with the back-propagation algorithm (BPNN), a radial basis probabilistic neural network (RBPNN) and a learning vector quantization neural network (LVQNN) were employed, and the Bayesian criterion was used as the combination rule to integrate the outputs of the individual neural networks. Experimental results showed that the proposed classification scheme had higher classification accuracy (78.7%) than the best individual classifier (LVQNN: 68.1%) and than another NNE scheme with the same individual neural networks but a majority voting combination rule. Keywords: Neural network ensemble, Probability, Classification, Lung nodule.
1 Introduction

Several investigations have shown that early detection of lung cancer may allow more timely therapeutic intervention and thus a more favorable prognosis for the patient [1, 2]. At present, CT imaging is regarded as the most effective means for the early detection of asymptomatic lung cancer. CT imaging technology has provided increasingly clear images; however, it remains difficult for radiologists to distinguish benign from malignant lesions, and the interpretation of a large number of CT images is a time-consuming task. Therefore, many investigators are developing computer-aided diagnostic schemes to facilitate the differentiation between benign and malignant lung nodules on CT images [3]. More recent studies have investigated the potential of pattern recognition techniques (neural networks, linear discriminators, etc.) for improving the diagnostic accuracy of lung cancer [4, 5]. In this study, the discrimination of probably benign, uncertain and probably malignant lung nodules based on texture analysis of thin-section CT images was realized using our proposed Bayesian criterion-based NNE, which consists of a BPNN, an RBPNN and an LVQNN. The study also aims to provide a machine learning based decision support system to assist doctors in their diagnostic decisions.
* Corresponding author.
2 Bayesian Criterion-Based Neural Network Ensemble

2.1 The Component Neural Networks

For a labeled pattern classification problem, let c be the number of possible patterns (classes) and n the number of features upon which the classifier is built.

Back-propagation Algorithm-based Multilayer Neural Network. A three-layer BPNN consists of one input layer with n neurons, one hidden layer with m neurons, and one output layer with c neurons. The hidden layer neurons and the output layer neurons use nonlinear sigmoid activation functions and linear activation functions, respectively. The network is trained with the back-propagation algorithm. For a two-pattern classification problem, the initial outputs of the two output neurons are transformed into probabilities simply by dividing each output by their sum.

Radial Basis Probabilistic Neural Network. The construction of an RBPNN involves four layers: one input layer, one pattern layer, one summation layer and one output layer. The input layer consists of n neurons representing the input feature vector. The pattern layer (radial basis layer) consists of center vectors, corresponding to the c classes, determined by the input training sample set; it generally employs a Gaussian radial basis function as its activation function. The summation layer, generally having c neurons, selectively sums the outputs of the pattern layer. In general, the weights between the pattern layer and the summation layer are constants; that is, they are set to fixed values and do not require learning. The last layer is the output layer, whose neurons output 1 or 0 after competition. In the summation layer, the posterior probability density is calculated for each class from the outputs of the pattern layer, and the probabilities are summed per class. Therefore, P(x|ωi), the conditional probability of class ωi, is calculated by dividing the output of the ith summation neuron by the number of center vectors of that class. For a two-pattern classification problem, i.e., c = 2, and given equal prior probabilities for the two classes, the outputs of the two neurons in the output layer can be defined in the form of probabilities as in equation (1):

$$p_1=\frac{P(x\mid\omega_1)}{P(x\mid\omega_1)+P(x\mid\omega_2)},\qquad p_2=\frac{P(x\mid\omega_2)}{P(x\mid\omega_1)+P(x\mid\omega_2)} \qquad (1)$$
Neural Network with Learning Vector Quantization Algorithm. The structure of an LVQNN has simply three layers. In the hidden layer (Kohonen layer), each neuron is assigned a reference vector after being trained by Kohonen's competitive learning rules, and all the neurons are partitioned into c groups corresponding to the c classes. In the output layer, each neuron is associated with a certain group of neurons in the Kohonen layer, and its output is 1 if its associated Kohonen neuron group wins.
In order to transform the output of the LVQNN into a probability value, a distance-probability transformation is adopted, as shown in (2):

$$P(x\mid\omega_i)=\begin{cases}1, & \text{if } d_i<d_{\min}^{\,i}\\[4pt] \exp\!\left[-\,\dfrac{2\left(d_i-d_{\min}^{\,i}\right)^{2}}{\left(d_{\max}^{\,i}-d_{\min}^{\,i}\right)^{2}}\right], & \text{otherwise}\end{cases} \qquad (2)$$
where $d_i$ is the minimum of the distances between the input vector x and the reference vectors (of class ωi) in the Kohonen layer, and $d_{\max}^{\,i}$ and $d_{\min}^{\,i}$ are the maximum and minimum distances between the input vectors of the training sample set of class ωi and the reference vectors of class ωi. For a two-pattern classification problem, i.e., c = 2, i is 1 or 2 in (2), and the outputs of the LVQNN in the form of probabilities, p1 and p2, are calculated according to (1).

2.2 Combination of the Outputs of the Individual Neural Networks

In our proposed NNE, the outputs of the individual neural networks are generated in the form of Bayesian probabilities, so the Bayesian criterion can be applied to the weighted sum of these probabilities to make a classification decision. The weight of the probability output by an individual neural network is determined by the accuracy of that network when performing the classification task by itself. In medical diagnosis, Youden's index [6] is a frequently used summary measure of diagnostic accuracy, so we normalized the Youden's index of each individual neural network to obtain the weight w for that network's output. For a two-pattern (ω1 and ω2) classification problem and an unknown pattern x, the ensemble's outputs are the weighted sums of the outputs of the three component neural networks, pj1 and pj2 (j = 1, 2, 3):
$$P(\omega_1\mid x)=\sum_{j=1}^{3} w_j\,p_{j1},\qquad P(\omega_2\mid x)=\sum_{j=1}^{3} w_j\,p_{j2} \qquad (3)$$
If P(ω1|x) > P(ω2|x), then, according to the Bayesian criterion, the unknown pattern x is classified as class ω1; otherwise x is classified as class ω2.
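A minimal sketch of this combination rule in Python/NumPy might look as follows; the component networks are represented only by their probability outputs, and the numeric values are illustrative:

```python
import numpy as np

def youden_weights(indices):
    """Normalize the component networks' Youden's indices to obtain
    the combination weights w_j used in eq. (3)."""
    j = np.asarray(indices, dtype=float)
    return j / j.sum()

def ensemble_posterior(probs, weights):
    """probs has shape (3, 2): row j holds (p_j1, p_j2) from the
    BPNN, RBPNN and LVQNN; returns (P(w1|x), P(w2|x)) of eq. (3)."""
    return weights @ probs

w = youden_weights([0.21, 0.37, 0.42])     # illustrative indices
p = np.array([[0.60, 0.40],                # BPNN  (p11, p12)
              [0.30, 0.70],                # RBPNN (p21, p22)
              [0.45, 0.55]])               # LVQNN (p31, p32)
post = ensemble_posterior(p, w)
label = 1 if post[0] > post[1] else 2      # Bayesian decision rule
print(post, "-> class", label)
```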
3 Experiment and Results

The experiment was a three-category classification task: the discrimination of probably benign, uncertain and probably malignant lung nodules on CT scans.

3.1 Data Source

The sample CT images (with a slice thickness of 2.5 mm or 5 mm) came from the LIDC [7], as shown in Fig. 1. Radiologists subjectively assessed the likelihood of malignancy of each nodule >3 mm as highly unlikely for cancer, moderately unlikely for cancer, indeterminate likelihood, moderately suspicious for cancer, or highly suspicious for cancer. As the numbers (3 and 6, respectively) of nodules in the first two categories
Fig. 1. Two sample thin-section lung CT images with three nodules from LIDC
were small, these 9 nodules were together incorporated into the category of probably benign (PB) nodules. For category symmetry, nodules of the last two categories were incorporated into probably malignant (PM) nodules. Thus, each nodule used in this study belonged to one of the following categories: PB nodules, UC (uncertain for cancer) nodules and PM nodules, with 9, 14 and 24 nodules, respectively.

3.2 Feature Extraction

Before feature extraction, each nodule was segmented from the CT image using Canny's edge detection technique after its location was identified by a radiologist. A series of image features was then determined from the outline and through image analysis of the regions inside and outside the segmented nodule. Fifteen of them, which were statistically different among the three types of nodules according to the nonparametric Kruskal-Wallis test, were selected: five gray-level features (quartile of CT values, skewness of histogram, smoothness, uniformity and entropy), four texture features based on the gray-level co-occurrence matrix (energy, contrast, homogeneity and correlation), and six shape and size features (area, diameter, form factor, solidity, circularity and rectangularity).

3.3 Design of the Classification Scheme

To create a three-class classifier (classifying a nodule as a PB, UC or PM nodule), a two-layer neural network ensemble scheme (shown in Fig. 2) was proposed, with each layer acting as a two-class classifier. In the first layer, Layer1, an NNE (named NNE1 in the figure) was applied to feature subset 1 to differentiate UC nodules from non-UC nodules. In the second layer, Layer2, a second ensemble, NNE2, trained separately from the first, was applied to feature subset 2 to differentiate the non-UC nodules between PB and PM nodules. The training samples for the NNE1 classifier were the 14 UC nodules and 33 non-UC nodules (i.e., 9 PB and 24 PM nodules), and those for the NNE2 classifier were the 9 PB and 24 PM nodules. Both NNE-based classifiers were trained separately and validated using leave-one-out validation.
Fig. 2. Schematic diagram of the two-layer neural network ensemble system for the classification of lung nodules on thin-section CT images
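In pseudo-form, the cascade and its leave-one-out evaluation might be organized as below (Python sketch); nne1, nne2 and train_fn stand for trained ensembles and a training routine with predict methods, which are assumptions of this illustration:

```python
def classify_nodule(feat1, feat2, nne1, nne2):
    """Two-layer cascade of Fig. 2: NNE1 separates UC from non-UC on
    feature subset 1; NNE2 splits non-UC nodules into PB vs. PM."""
    if nne1.predict(feat1) == "UC":
        return "UC"
    return "PB" if nne2.predict(feat2) == "PB" else "PM"

def leave_one_out_accuracy(samples, labels, train_fn):
    """Leave-one-out validation as used for both NNE layers:
    train on all samples but one and test on the held-out sample."""
    hits = 0
    for i in range(len(samples)):
        keep = [j for j in range(len(samples)) if j != i]
        model = train_fn([samples[j] for j in keep],
                         [labels[j] for j in keep])
        hits += model.predict(samples[i]) == labels[i]
    return hits / len(samples)
```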
3.4 Experimental Classification Results

Classification Results for the First Layer. Table 1 depicts the performance of the individual classifiers (first three columns) and of the neural network ensemble (fourth column) in the first layer of the overall NNE scheme. The normalized Youden's indices of the BPNN, RBPNN and LVQNN classifiers used in (3) were 0.21, 0.37 and 0.42, respectively. In order to compare the performance of the probabilistic NNE (prob-NNE) proposed in this study with that of an NNE using the majority voting combination rule (MV-NNE), the classification performance of the MV-NNE is listed in the last column of Table 1 as well.

Classification Results for the Second Layer. Table 2 depicts the performance of the individual classifiers and the two NNEs in the second layer of the NNE scheme.

Table 1. Accuracy of individual classifiers and two NNEs in the first layer of the classification scheme using leave-one-out validation (%)
          BPNN   RBPNN   LVQNN   prob-NNE   MV-NNE
UC        57.1   85.7    85.7    85.7       85.7
non-UC    75.8   72.7    78.8    84.8       78.7
Total     70.2   76.6    80.9    85.1       80.9
Table 2. Accuracy of individual classifiers and two NNEs in the second layer of the classification scheme using leave-one-out validation (%)
          BPNN   RBPNN   LVQNN   prob-NNE   MV-NNE
PM        77.8   44.4    55.6    55.6       66.7
PB        79.2   79.2    83.3    95.8       87.5
Total     78.8   69.7    75.8    84.8       81.8
Table 3. Accuracy of individual neural networks and neural network ensembles for the overall three-class problem (%)
          BPNN   RBPNN   LVQNN   prob-NNE   MV-NNE
PB        44.4   11.1    11.1    22.2       22.2
UC        57.1   85.7    85.7    85.7       85.7
PM        75.0   75.0    79.2    95.8       87.5
Total     63.8   66.0    68.1    78.7       74.5
The normalized Youden's indices of the BPNN, RBPNN and LVQNN classifiers were 0.48, 0.20 and 0.32, respectively.

Classification Results for the Overall NNE Scheme. Table 3 depicts the performance of the individual classifiers and the two NNE schemes for the overall three-class problem. The probabilistic NNE proposed in this study gave a correct classification rate of 78.7% (it classified 2 of 9 PB nodules as PB, 12 of 14 UC nodules as UC, and 23 of 24 PM nodules as PM).
4 Discussion and Conclusion

With respect to classification and pattern recognition tasks, the BPNN, RBPNN and LVQNN are considered to have higher performance than other neural networks [8-10], and theoretical investigations [11] have shown that the larger the differences among the individual neural networks, the better the NNE performs. We therefore combined these three neural networks into an NNE for the classification task in this study. A number of investigations [12-14] focusing on medical decisions made on the basis of NNEs have shown that combining many classifiers into one classification system brings important benefits and results in a significant improvement in accuracy compared with the best single classifier. A similar conclusion was drawn from our study: the NNE classifier was able to classify 78.7% of the sample nodules into the correct classes in the leave-one-out validation, whereas the three individual neural networks classified only 63.8%, 66.0% and 68.1% of the samples correctly, respectively. In this study, we proposed a novel NNE classifier into which the Bayesian criterion was introduced. The outputs of the individual neural networks were transformed into Bayesian probabilities, and the Bayesian criterion was then applied to the weighted sum of these probabilities to make a classification decision. The weights were assigned on the basis of the classification performance of the individual neural networks. For the given task of classifying lung nodules into three classes, it can be seen from Table 3 that the classification performance of the proposed NNE scheme is slightly higher than that of the NNE with the majority voting combination rule. Additionally, in terms of clinical use, the diagnostic performance of our NNE-based classification scheme was similar to that of senior radiologists for classifying lung nodules on CT images [15], so it could act as a 'second reader' assisting radiologists in their diagnosis.
References
1. Flehinger, B.J., Kimmel, M., Melamed, M.R.: The Effect of Surgical Treatment on Survival from Early Lung Cancer: Implication for Screening. Chest 101, 1013–1018 (1992)
2. Miettinen, O.S.: Screening for Lung Cancer. Radiol. Clin. N. Am. 38, 479–496 (2000)
3. Li, Q., Li, F., Suzuki, K., Shiraishi, J., Abe, H., Engelmann, R., Nie, Y.K., MacMahon, H., Doi, K.: Computer-aided Diagnosis in Thoracic CT. Semin. Ultrasound CT 26, 357–363 (2005)
4. Matsuki, Y., Nakamura, K., Watanabe, H., Aoki, T., Nakata, H., Katsuragawa, S., Doi, K.: Usefulness of an Artificial Neural Network for Differentiating Benign from Malignant Pulmonary Nodules on High Resolution CT: Evaluation with Receiver Operating Characteristic Analysis. Am. J. Roentgenol. 178, 657–663 (2002)
5. Suzuki, K., Li, F., Sone, S., Doi, K.: Computer-aided Diagnostic Scheme for Distinction between Benign and Malignant Nodules in Thoracic Low-dose CT by Use of Massive Training Artificial Neural Network. IEEE T. Med. Imaging 24, 1138–1150 (2005)
6. Zhou, X., Obuchowski, N.A., Mcclish, D.K.: Statistical Methods in Diagnostic Medicine. John Wiley & Sons, New York (2002)
7. National Biomedical Imaging Archive, https://imaging.nci.nih.gov/ncia
8. Er, O., Temurtas, F.: A Study on Chronic Obstructive Pulmonary Disease Diagnosis Using Multilayer Neural Networks. J. Med. Sys. 32, 429–432 (2008)
9. Serpen, G., Tekkedil, D.K., Orra, M.: A Knowledge-based Artificial Neural Network Classifier for Pulmonary Embolism Diagnosis. Comput. Biol. Med. 38, 204–220 (2008)
10. Delen, D., Walker, G., Kadam, A.: Predicting Breast Cancer Survivability: a Comparison of Three Data Mining Methods. Artif. Intell. Med. 34, 113–127 (2005)
11. Krogh, A., Vedelsby, J.: Neural Network Ensembles, Cross Validation and Active Learning. In: Tesauro, G., Touretzky, D.S., Leen, T.K. (eds.) Advances in Neural Information Processing Systems, pp. 231–238. MIT Press, Cambridge (1995)
12. Das, R., Turkoglu, I., Sengur, A.: Diagnosis of Valvular Heart Disease Through Neural Networks Ensembles. Comput. Meth. Prog. Bio. 93, 185–191 (2009)
13. Osowski, S., Markiewicz, T., Hoai, L.T.: Recognition and Classification System of Arrhythmia Using Ensemble of Neural Networks. Measurement 41, 610–617 (2008)
14. Zhou, Z.H., Jiang, Y., Yang, Y.B., Chen, S.F.: Lung Cancer Cell Identification Based on Artificial Neural Network Ensembles. Artif. Intell. Med. 24, 25–36 (2002)
15. Chen, H., Xu, Y., Ma, Y.J., Ma, B.R.: Neural Network Ensemble-based Computer-aided Diagnosis for Differentiation of Lung Nodules on CT Images: Clinical Evaluation. Acad. Radiol. 17, 595–602 (2010)
Combined Three Feature Selection Mechanisms with LVQ Neural Network for Colon Cancer Diagnosis Tianlei Zang, Dayun Zou, Fei Huang, and Ning Shen School of Electrical Engineering, Southwest Jiaotong University, 610031, Sichuan Province, China [email protected], {zoudayun1986630,huangfei_87}@163.com, [email protected]
Abstract. In this paper, a novel two-stage classification method for colon cancer diagnosis based on gene expression data is introduced, which combines three feature selection mechanisms with a Learning Vector Quantization Neural Network (LVQNN). The first stage selects informative genes based on the Bhattacharyya distance, pairwise redundancy analysis (PRA) and principal component analysis (PCA) for dimension reduction and feature extraction. In the second stage, the LVQ neural network is employed to construct a cancer data classifier. To show the validity of the presented method, a colon cancer gene expression profile data set was used for classification. The experimental results show that the proposed method can effectively identify colon cancer. Compared with three other neural network methods, the LVQ neural network has the best classification effect. Keywords: cancer diagnosis, pairwise redundancy analysis, principal component analysis, Learning Vector Quantization Neural Network.
1 Introduction

Gene expression profiles from microarray experiments have been used for cancer diagnosis in recent years [1,2]. Usually, the classification of gene expression data requires two steps: feature selection and data classification. To reduce the dimensionality of the data, a large number of feature selection methods have been proposed, including Signal-to-Noise [2], the Bhattacharyya distance [3], Principal Component Analysis (PCA) [4,5], and correlation analysis [6]. Meanwhile, to find more accurate classifiers, many methods have been applied to this problem, such as clustering [1], Artificial Neural Networks (ANN) [7-10], Support Vector Machines (SVM) [11], and K Nearest Neighbors [12]. Although many new methods have appeared to improve the accuracy and robustness of classification, the problem is not fundamentally solved. In this paper, a novel two-stage classification method for colon cancer diagnosis based on gene expression data is proposed. In the first stage, the input feature vector dimension is reduced using the Bhattacharyya distance, the pairwise redundancy analysis (PRA) algorithm and principal component analysis (PCA). In the second stage, a Learning Vector Quantization Neural Network (LVQNN) is used as the classifier. Figure 1
Fig. 1. The structure of the novel two-stage classification method
shows the structure of the two-stage classification method for colon cancer diagnosis and describes the role of each algorithm. This paper is organized as follows: Section 2 describes the feature selection methods based on the Bhattacharyya distance, pairwise redundancy analysis and principal component analysis. Section 3 presents a brief introduction to the LVQNN. Experimental results and analysis are discussed in Section 4. Finally, the conclusion is given in Section 5.
2 The Approach of Feature Gene Selection

2.1 Filtering Out Category-Irrelevant Genes Using the Bhattacharyya Distance

Since only a small number of genes are relevant to a particular phenotype of the samples, most of the remaining genes are unrelated to the phenotype and can be considered "noise genes". In order to select features that are effective for classification, the Bhattacharyya distance [3] was utilized to measure how much class-discriminative information a gene contains for a given tumor type. The Bhattacharyya distance, which takes into account both the mean differences and the variance differences between categories, reflects the difference between the distributions of a gene's expression in the two kinds of samples. The specific model is given in equation (1):
$$B(g)=\frac{\left(\mu_{+}(g)-\mu_{-}(g)\right)^{2}}{4\left(\sigma_{+}^{2}(g)+\sigma_{-}^{2}(g)\right)}+\frac{1}{2}\ln\!\left(\frac{\sigma_{+}^{2}(g)+\sigma_{-}^{2}(g)}{2\,\sigma_{+}(g)\,\sigma_{-}(g)}\right) \qquad (1)$$
where μ+(g) and μ−(g) are the mean expression levels of gene g in the two kinds of samples, and σ+(g) and σ−(g) are the corresponding standard deviations. The larger a gene's Bhattacharyya distance, the greater the difference in its expression levels between the two kinds of samples, and hence the stronger its classification ability. According to equation (1), the Bhattacharyya distance of each gene can be obtained.
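Equation (1) translates directly into code; a minimal Python/NumPy sketch follows, with the toy expression vectors as the only assumption:

```python
import numpy as np

def bhattacharyya_distance(pos, neg):
    """B(g) of eq. (1) for one gene: pos/neg are its expression
    values in the tumor (+) and normal (-) sample groups."""
    mu_p, mu_n = pos.mean(), neg.mean()
    var_p, var_n = pos.var(ddof=1), neg.var(ddof=1)
    term1 = (mu_p - mu_n) ** 2 / (4.0 * (var_p + var_n))
    # sigma_+ * sigma_- equals sqrt(var_+ * var_-)
    term2 = 0.5 * np.log((var_p + var_n) / (2.0 * np.sqrt(var_p * var_n)))
    return term1 + term2

# Toy data: expression of one gene in 5 tumor and 4 normal samples.
tumor = np.array([8.1, 7.9, 8.4, 8.0, 8.2])
normal = np.array([6.9, 7.1, 7.0, 6.8])
print(bhattacharyya_distance(tumor, normal))
```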
2.2 Removing Strongly Correlated Redundant Genes

From the point of view of classification, the genes remaining after the above filtering can be taken as candidate characteristic genes. However, there may be redundant genes among them, whose removal will not affect the sample-classification ability of the whole group of characteristic genes. Therefore, the pairwise redundancy analysis (PRA) algorithm [13] was adopted to exclude the redundant genes. In detail, the correlation coefficient of every pair of genes is first calculated with equation (2); then, if the correlation coefficient is greater than a threshold value, the two genes are considered strongly correlated and the gene with the smaller classification-information index is removed. After these operations, the remaining genes have greater classification-information indices. The specific model of the correlation coefficient is given in equation (2):
Corr _ Coef ( gi , g j ) =
∑ (x k =1
n
∑ (x k =1
gik
gik
− xgi )( xgik − xgj )
− xgi )
.
n
2
∑ (x k =1
gik
− xgj )
(2)
2
where $x_{g_i k}$ and $x_{g_j k}$ are the expression levels of genes $g_i$ and $g_j$ in the kth sample of the training set, and $\bar{x}_{g_i}$ and $\bar{x}_{g_j}$ are the average expression levels of genes $g_i$ and $g_j$ over all samples. The workflow of the pairwise redundancy analysis algorithm is shown in Fig. 2.

2.3 Extracting Principal Components Based on Principal Component Analysis

After the above data processing, the dimension of the gene expression data is still too high, so further dimension reduction is needed in order to replace the previous large number of indexes with fewer comprehensive ones. These comprehensive indexes should contain as much of the information reflected by the previous indexes as possible while being independent of each other. Principal Component Analysis (PCA) [4,5] is a statistical analysis method that extracts a few composite indicators from a large number of variables. In order to extract the principal components from the matrix $M_s$, three steps are taken:
Step 1. After eliminating the impact of dimension, the normalized matrix $M_b$ is obtained; the correlation coefficient matrix of $M_b$ can then be calculated with equation (3):
Fig. 2. The workflow of pairwise redundancy analysis algorithm
$$R(i,j)=\frac{\operatorname{cov}(i,j)}{\sqrt{\operatorname{cov}(i,i)\operatorname{cov}(j,j)}} \qquad (3)$$
Step 2. For the correlation coefficient matrix R, the Jacobi method can be used to solve the characteristic equation $|R-\lambda I|=0$. Non-negative eigenvalues $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_p\ge 0$ and the corresponding eigenvectors $v_i=(v_{i1},v_{i2},\ldots,v_{ip})$ of each $\lambda_i$ are obtained; these eigenvectors satisfy the following relations:

$$v_i v_j^{\top}=\sum_{k=1}^{p} v_{ik}v_{jk}=\begin{cases}1, & i=j\\ 0, & i\neq j\end{cases},\qquad i,j=1,2,\ldots,p \qquad (4)$$
Step 3. Select the principal components such that they preserve as much of the original genes' information as possible.
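Sections 2.2 and 2.3 can be sketched as follows (Python/NumPy). The 0.5 correlation threshold matches the experiment in Section 4, while the scores argument (each gene's Bhattacharyya distance) and the matrix shapes are assumptions of this illustration:

```python
import numpy as np

def remove_redundant(X, scores, threshold=0.5):
    """Pairwise redundancy analysis: of any two genes (columns of X)
    whose Pearson correlation, eq. (2), exceeds the threshold, drop
    the one with the smaller classification-information score."""
    corr = np.corrcoef(X, rowvar=False)
    keep = list(range(X.shape[1]))
    for i in range(X.shape[1]):
        for j in range(i + 1, X.shape[1]):
            if i in keep and j in keep and abs(corr[i, j]) > threshold:
                keep.remove(i if scores[i] < scores[j] else j)
    return X[:, keep], keep

def pca(X, k):
    """Eqs. (3)-(4): eigendecompose the correlation matrix of the
    normalized data and project onto the top-k eigenvectors."""
    Z = (X - X.mean(0)) / X.std(0, ddof=1)   # normalize per gene
    R = np.corrcoef(Z, rowvar=False)
    eigval, eigvec = np.linalg.eigh(R)       # ascending eigenvalues
    order = np.argsort(eigval)[::-1][:k]     # take the largest k
    return Z @ eigvec[:, order]
```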
3 Learning Vector Quantization Neural Network (LVQNN)

The LVQ neural network [14] combines competitive learning with supervised learning, and it can realize nonlinear classification effectively. In this paper, the basic LVQ
Fig. 3. Learning Vector Quantization neural network architecture
neural network classifier (LVQ1) is chosen for the colon cancer data classification. The LVQ neural network is composed of three layers of neurons: an input layer, a hidden unsupervised competitive layer and a supervised linear output layer, as shown in Fig. 3. In order to classify an input vector, it must be compared with all prototypes. The Euclidean distance metric is used to select the prototype closest to the input vector, and the input vector is assigned to the same class as this nearest prototype. In the hidden layer, only the winning neuron has an output of one; the other neurons have outputs of zero. The weight vectors of the hidden layer neurons are the prototypes; their number is defined before training and depends on the complexity of the input-output relationship [15]. The learning phase starts by initializing the weight vectors of the neurons in the hidden layer. The input vectors are then presented to the network in turn. For each input vector $X_j$, the weight vector $W_{i^*}$ of the winning neuron $i^*$ is adjusted. The winning neuron is chosen according to:
$$\|X_j-W_{i^*}\|=\min_i \|X_j-W_i\| \qquad (5)$$
The weight vector $W_{i^*}$ of the winning neuron is updated as follows. If $X_j$ and $W_{i^*}$ belong to the same class, then

$$W_{i^*}(t+1)=W_{i^*}(t)+\eta(t)\left(X_j-W_{i^*}(t)\right) \qquad (6)$$
If $X_j$ and $W_{i^*}$ do not belong to the same class, then

$$W_{i^*}(t+1)=W_{i^*}(t)-\eta(t)\left(X_j-W_{i^*}(t)\right) \qquad (7)$$
The weight vectors of the other neurons remain constant:

$$W_k(t+1)=W_k(t) \qquad (8)$$

where $0\le\eta(t)\le 1$ is the learning rate.
The training algorithm is stopped after reaching a pre-specified error limit. After training, the LVQ neural network is tested: during the test phase, LVQ classifies an input vector by assigning it to the class of the output unit whose weight vector is closest to the input vector.
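A minimal LVQ1 training loop implementing eqs. (5)-(8) is sketched below (Python/NumPy); the linearly decaying learning rate and the assumption that prototypes arrive as a NumPy array are ours, not prescribed by the paper:

```python
import numpy as np

def train_lvq1(X, y, prototypes, proto_labels, eta=0.1, epochs=50):
    """Eqs. (5)-(8): move the winning prototype toward the input if
    the classes agree, and away from it otherwise."""
    W, L = prototypes.copy(), np.asarray(proto_labels)
    for t in range(epochs):
        lr = eta * (1.0 - t / epochs)      # decaying learning rate
        for xj, yj in zip(X, y):
            i = np.argmin(np.linalg.norm(W - xj, axis=1))   # eq. (5)
            step = lr * (xj - W[i])
            W[i] += step if L[i] == yj else -step           # eqs. (6)-(7)
            # all other prototypes are left unchanged, eq. (8)
    return W, L

def predict_lvq1(x, W, L):
    """Assign x the class of the nearest prototype."""
    return L[np.argmin(np.linalg.norm(W - x, axis=1))]
```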
4 Calculation Example Analysis

The experimental data comes from the colon cancer gene expression profile data set published by Alon et al. [1]. The data set contains 40 colon cancer tissue samples and 22 normal tissue samples, and every sample includes expression values for 2,000 genes. The Bhattacharyya distance of every gene was calculated according to equation (1); the histogram of the genes' Bhattacharyya distance distribution is shown in Fig. 4.
Fig. 4. The histogram of genes’ Bhattacharyya Distance distribution
In this paper, the 1,800 genes whose Bhattacharyya distances are smaller than 0.1 are taken as unrelated genes. After removing the unrelated genes, the remaining 200 genes are reserved as the basis for further analysis. Next, with the threshold set to 0.5, 27 candidate genes were obtained after redundancy elimination. Then, using principal component analysis to pick out feature genes, 8 feature genes were reserved. The 8 feature genes can form 2^8 = 256 gene combinations, and every gene combination is called a feature subset. Using a pure traversal (exhaustive) search over the space of feature subsets, the optimal feature subset with the best classification ability can be found. The process is described in detail as follows: four kinds of neural networks are employed to classify the samples using these feature subsets, and the feature subset with the minimum error is chosen as the set of
informative genes. The experiments are based on the Matlab Neural Network Toolbox, with all experimental parameters at their default values. The resulting set of informative genes, which includes 5 genes, has the best classification ability. The labels and function descriptions of the 5 informative genes are shown in Table 1.

Table 1. The labels and function descriptions of the informative genes

No.  Gene label  Function description
1    R72300      INITIATION FACTOR 5A (Homo sapiens)
2    H40560      THIOREDOXIN (HUMAN)
3    T51539      HEPATOCYTE GROWTH FACTOR-LIKE PROTEIN PRECURSOR
4    X54941      H.sapiens ckshs1 mRNA for Cks1 protein homologue
5    X56597      Human humFib mRNA for fibrillarin
With the method of group testing [16], the samples are randomly divided into 5 groups; each group is taken as the testing set in turn, with the other groups serving as the training set. A Learning Vector Quantization Neural Network (LVQNN), a BP Neural Network (BPNN), a Probabilistic Neural Network (PNN) and a Generalized Regression Neural Network (GRNN) are used to train on the training set and test on the testing set. The classification results of the four neural networks are shown in Table 2.

Table 2. The classification correct ratio of the four neural networks
          LVQNN    BPNN     PNN      GRNN
Group 1   0.9000   0.7500   0.7583   0.6667
Group 2   0.8333   0.8333   0.8000   0.7500
Group 3   0.8500   0.7333   0.6777   0.8333
Group 4   0.8667   0.8000   0.8333   0.6777
Group 5   0.8333   0.8167   0.8000   0.7500
Table 2 shows that the proposed method can effectively identify colon cancer. Compared with the three other neural network methods, the LVQ neural network has the best classification effect.
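The group-testing protocol itself can be sketched as below (Python/NumPy); train_and_score stands in for training and scoring any one of the four networks and is an assumption of this illustration:

```python
import numpy as np

def group_testing(X, y, train_and_score, n_groups=5, seed=0):
    """Randomly split the samples into n_groups folds; each fold
    serves once as the testing set, the rest as the training set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_groups)
    scores = []
    for k in range(n_groups):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_groups) if j != k])
        scores.append(train_and_score(X[train], y[train], X[test], y[test]))
    return scores
```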
5 Conclusion

In this paper, a novel method is proposed for cancer data classification, and the colon cancer data are used for the experiments. Gene features are extracted based on three feature selection mechanisms, which markedly reduces the dimensionality and yields the informative features. Finally, the LVQ neural network is used to construct the classifier. The experimental results show that the proposed approach is efficient for colon cancer diagnosis. Compared with the three other neural network methods, the LVQ neural network has the highest recognition accuracy.
References
1. Alon, U., Barkai, N., Notterman, D., et al.: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. USA 96(12), 6745–6750 (1999)
2. Golub, T.R., Slonim, D.K., Tamayo, P., et al.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439), 531–537 (1999)
3. Li, Y.-X., Liu, Q.-J., Ruan, X.-G.: Analysis of Leukemia Gene Expression Profiles and Subtype Informative Genes Identification. Chinese Journal of Biomedical Engineering 24(2), 240–244 (2005)
4. Peterson, L.E.: Partitioning large-sample microarray-based gene expression profiles using principal components analysis. Computer Methods and Programs in Biomedicine 70(2), 107–119 (2003)
5. Wang, S.-L., Wang, J., Chen, H.-W., et al.: Research of a Tumor Diagnosis Algorithm Based on Principal Component Analysis. Computer Engineering & Science 29(9), 84–90 (2007)
6. Wang, M.-Y., Wu, P., Wang, D.-L.: Gene selection algorithm based on correlation analysis. Journal of Zhejiang University (Engineering Science) 38(10), 1289–1292 (2004)
7. Azuaje, F.: A Computational Neural approach to Support the Discovery of Gene Function and Classes of Cancer. IEEE Transactions on Biomedical Engineering 48(3), 332–339 (2001)
8. Narayanan, A., Keedwella, E.C., Gamalielssonb, J., Tatinenia, S.: Single-layer artificial neural networks for gene expression analysis. Neurocomputing 61(1-4), 217–240 (2004)
9. Tan, A.-H., Panb, H.: Predictive neural networks for gene expression data analysis. Neural Networks 18(3), 297–306 (2005)
10. Chiang, J.-H., Ho, S.-H.: A Combination of Rough-Based Feature Selection and RBF Neural Network for Classification Using Gene Expression Data. IEEE Transactions on Nanobioscience 7(1), 91–99 (2008)
11. Guyon, I., Weston, J., Barnhill, S., et al.: Gene selection for cancer classification using support vector machines. Machine Learning 46(1-3), 389–422 (2000)
12. Li, L., Weinberg, C., Darden, T., Pedersen, L.: Gene Selection for Sample Classification Based on Gene Expression Data: Study of Sensitivity to Choice of Parameters of the GA/KNN Method. Bioinformatics 17(12), 1131–1142 (2001)
13. Li, Y.-X., Ruan, X.-G.: Cancer Subtype Recognition and Feature Selection with Gene Expression Profiles. Acta Electronica Sinica 33(4), 651–655 (2005)
14. Kohonen, T.: Learning Vector Quantization. Neural Networks 1(1), 303 (1988)
15. Ma, X., Liu, W., Li, Y.-B., Song, R.: LVQ neural network based target differentiation method for mobile robot. In: International Conference on Advanced Robotics, pp. 680–685 (2005)
16. Tan, Y.-H., He, Y.-G.: Diagnosis of Mammary Carcinomar Based on Quantum Neuron. Computer Systems & Applications 17(5), 111–114 (2008)
Estimation of Groutability of Permeation Grouting with Microfine Cement Grouts Using RBFNN Kuo-Wei Liao1 and Chien-Lin Huang2 1 Department of Construction Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, [email protected] 2 Department of Bioenvironmental Systems Engineering, National Taiwan University, Taipei, Taiwan
Abstract. The use of microfine cements in permeation grouting has been growing as a strategy in geotechnical engineering because it usually provides improved groutability (N). One of the major challenges of using microfine cement grouts is the ability to estimate N within a reasonable level of error. The suitability of traditional groutability prediction formulas, which are mostly based on the grain sizes of the soil and the grout, is questionable for semi-nanometer scale grouts. This study first investigated the accuracy of the current formulas and found that it ranges from 45% to 68%, a level that is not adequate for practical engineering. An alternative approach, based on a Radial Basis Function Neural Network (RBFNN), was therefore developed. After finding a good correlation between the field observations and the RBFNN output, it was concluded that the RBFNN is a suitable and reliable tool for predicting the outcome of permeation grouting when microfine cement grout is used. Keywords: RBFNN, groutability, microfine cement, permeation grouting.
Notation: The following symbols are used in this paper:
n    = soil porosity
D15  = the diameter through which 15% of the total soil mass passes
d85  = the diameter through which 85% of the total grout passes
w/c  = water-to-cement ratio of grout
FC   = the fines content of the total soil mass
N(i) = the groutability of soils
P    = the grouting pressure
Cu   = the uniformity coefficient
Cz   = the coefficient of gradation
M    = the number of centroids
φ    = the radial basis function
cj   = the location of the jth centroid
d(p) = the target output value of the pth data point
y(p) = the network output value of the pth data point
wj   = weighting factor
e    = void ratio
N    = data dimension of the input layer
1 Introduction and the Existing Approaches

Predicting N with cement-based grouts is a complicated task: many parameters affect N and must be evaluated systematically. In the literature, several relationships between the soil grain size and the cement grout particle size have been defined to build groutability prediction formulas. For example, Burwell [1] proposed the following two simple equations:

$$N_1=\frac{(D_{15})_{soil}}{(d_{85})_{grout}} \qquad (1)$$

$$N_2=\frac{(D_{10})_{soil}}{(d_{95})_{grout}} \qquad (2)$$
Burwell stated that grouting may be possible if N1 is greater than 25, but that grouting is not possible if N1 is less than 11; the center region (i.e., 11 < N1 < 25) is undefined. For the first case above (i.e., N1 > 25), grouting only succeeds when N2 is greater than 11. Incecik and Ceren [2] suggested an alternative equation:

$$N=\frac{(D_{10})_{soil}}{(d_{90})_{grout}} \qquad (3)$$
Based on equation (3), grouting succeeds if N is greater than 10. A similar approach was proposed by Krizek et al. [3]; they suggested the same equations as Burwell (i.e., equations (1) and (2)) but with different thresholds: grouting only succeeds if N1 is greater than 15 and N2 is greater than 8. Huang et al. [4] also observed similar groutability rules for soils in field grouting: with N1 > 9 or N2 > 4, they successfully injected microfine cement (MFC-GM8000) into sandy silt soil using 1% superplasticizer and a high-speed mixer for three minutes. Axelsson et al. [5] utilized the concept of a 'fictitious aperture' from the field of rock grouting, where the fictitious aperture (b_fic) is the pathway between sand grains. They stated that b_fic depends only on the median soil grain size and can be determined through the following equation:

$$b_{fic}=0.15\times D_{50} \qquad (4)$$
They defined N as follows:

$$N=\frac{b_{fic}}{(d_{95})_{grout}}=\frac{0.15\,D_{50}}{(d_{95})_{grout}} \qquad (5)$$
The threshold for N depends on the w/c. Taking w/c = 2.0 as an example, soil with N > 4.2 is considered groutable, while soil with N < 3 is not; between these thresholds, grouts cannot be sufficiently injected into the soils. Akbulut and Saglamer [6] conducted a series of laboratory tests and, based on the results, proposed a multi-parameter formula to predict N:

$$N=\frac{(D_{10})_{soil}}{(d_{90})_{grout}}+k_1\,\frac{w/c}{FC}+k_2\,\frac{P}{D_r} \qquad (6)$$
in which P is in kPa and Dr is the relative density of the soil samples; k1 = 0.5 and k2 = 0.01 (in 1/kPa) are constants. In the case of N > 28, the soil is considered groutable; conversely, if N < 28, it is not. Note that equation (6) provides a reasonable prediction only when the following conditions are satisfied: 0 < FC < 6%, 0.8 < w/c < 2 and 50 < P < 200. Tekin and Akbas [7] developed an ANN model to estimate N for cement-based grouts, using a database of 87 laboratory results. In their model, the w/c, the relative density of the soil, the P, and the diameters of the sieves through which 15% of the soil particles and 85% of the grout passed were used as the input parameters; this ANN model provided a very accurate prediction. To investigate the accuracy of each of the prediction equations mentioned above, 240 field grouting data samples were collected in Taiwan, and equations (1) through (5) were applied to obtain prediction outcomes, which were then compared to the observations made in the field. Because the accuracies of the above formulas did not reach the desired target, we propose a new model to predict N; a similar examination procedure was also applied to the proposed model. The multi-parameter formula of Akbulut and Saglamer (i.e., equation (6)) was not used for comparison because the FC and w/c of the collected data were not within its applicable range. Similarly, Tekin and Akbas's ANN model was not evaluated here due to the lack of data on the relative density of the soil and the P.
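For reference, the screening rules of equations (1)-(5) might be coded as below (Python sketch); the grain sizes in the example are hypothetical, and the thresholds are the ones quoted above:

```python
def groutability_checks(D10, D15, D50, d85, d90, d95):
    """Evaluate the empirical groutability rules of eqs. (1)-(5);
    all grain sizes must be in the same length unit (e.g. um)."""
    N1 = D15 / d85                 # eq. (1), Burwell
    N2 = D10 / d95                 # eq. (2), Burwell
    N_ic = D10 / d90               # eq. (3), Incecik and Ceren
    N_ax = 0.15 * D50 / d95        # eq. (5), Axelsson et al.
    return {
        "Burwell":              N1 > 25 and N2 > 11,
        "Krizek":               N1 > 15 and N2 > 8,
        "Huang":                N1 > 9 or N2 > 4,
        "Incecik and Ceren":    N_ic > 10,
        "Axelsson (w/c = 2.0)": N_ax > 4.2,
    }

# Hypothetical soil (D10 = 20, D15 = 32, D50 = 80 um) with the
# microfine grout of Sect. 2.1 (d85 = 0.2, d90 = 6.4, d95 = 7.4 um).
print(groutability_checks(20, 32, 80, 0.2, 6.4, 7.4))
```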
2 Application of RBFNN to Groutability Prediction

2.1 Study Area

A total of 25 grouting holes were chosen to gather soil groutability information. In each grouting hole, between 5 and 11 data samples were recorded from the boring log, giving a total of 240 on-site permeation grouting data samples collected in the cities of Taipei and Kaohsiung, Taiwan. Among these 25 grouting holes, 21 were located in Kaohsiung, yielding 220 grouting data samples; the remaining 20 samples came from 4 grouting holes in Taipei. A mix of microfine cement (MFC-GM8000A) and micro-slag (MFC-GM8000B) in equal proportions was used as the injected grout. The diameters through which 95%, 90% and 85% of the total grout passes are 7.4 μm, 6.4 μm and 0.2 μm, respectively; furthermore, the diameter through which 70% of the total grout passes is less than 1 μm, so the grouts used here can be considered semi-nanometer materials. Two w/c values (4.0 and 4.65) were used in Kaohsiung, and only one (3.34) in Taipei. A total of 195 grouting data samples from 20 grouting holes were used in the RBFNN training procedure; given the distribution of w/c, the numbers of data samples with w/c equal to 3.34, 4.0 and 4.65 were 13, 107 and 75, respectively. The details of the numbers of data samples used in the training and testing procedures are shown in Table 1.

2.2 Influence Parameters of N

Seven input parameters were considered in our RBFNN model; their ranges, mean values and standard deviations are listed in Table 2.
Table 1. The number of grouting data samples used in the training and testing procedures
                                   MRT (Taipei)   No. 1 highway (Kaohsiung)     Total
                                   w/c = 3.34     w/c = 4.0     w/c = 4.65
No. of data (holes) for training   13 (3)         107 (10)      75 (7)          195 (20)
No. of data (holes) for testing    7 (1)          20 (2)        18 (2)          45 (5)
Table 2. Statistics of input factors in RBFNN

Factor   Max. value   Min. value   Mean       Standard deviation
D10      85 μm        0.1 μm       20.07 μm   20.83 μm
D15      125 μm       0.2 μm       32.01 μm   27.54 μm
e        1.04         0.35         0.71       0.13
FC       99.6%        6.9%         41.67%     29.94%
Cz       27.84        0.02         2.44       2.42
Cu       581.82       2.11         26.38      63.12
w/c      4.65         3.34         4.0        0.53
Table 3. Combinations of seven input factors in the input layer

Case     Selected input factors
Case 1   D10
Case 2   D15
Case 3   D10, D15
Case 4   D10, D15, w/c
Case 5   D10, D15, w/c, e
Case 6   D10, D15, w/c, e, FC
Case 7   D10, w/c, e, FC, Cz, Cu
Case 8   D15, w/c, e, FC, Cz, Cu
Case 9   D10, D15, w/c, e, FC, Cz, Cu
Note that, as FC ranged from 6.9% to 99.6%, there is a very high chance for D10 or D15 to be less than 0.074 mm; this is one of the reasons that the traditional empirical formulas fail to predict N. Also, the multi-parameter formula proposed by Akbulut and Saglamer was not applicable to the soils investigated here because the FC and w/c were outside its appropriate range.

2.3 The Proposed RBFNN

The N of the input layer, the M in the hidden layer, the cj, the weights (wj) associated with each centroid and the φ all impact the accuracy of the RBFNN model. Before proceeding further, the original collected data must be normalized to avoid overfitting the weights during the weight adjustment procedure. The original input data are first normalized by the range of the maximum and minimum values so that all input values fall between 0 and 1; the normalized values are then sent to the input layer of the RBFNN model. Below are the details of the proposed RBFNN models.
2.3.1 Determination of the N in the Input Layer

Nine cases with different combinations of the seven parameters (Table 3) were examined to determine which case provides the best prediction outcome. As D10 and D15 appear in most traditional prediction formulas (equations (1), (2), (3) and (6)), these parameters are considered the most important for determining the N of the input layer. Thus, the first three of the nine cases considered these two factors individually and in combination, as indicated in Table 3. The other input factors, such as the w/c and the e, were arranged and combined with D10 and D15 to form cases 4 through 9. The results indicate that cases 7 through 9 all produce promising predictions.

2.3.2 Determination of the M and cj in the Hidden Layer

Two different approaches, the method of Orthogonal Least Squares (OLS, Ham [8]) and the method of K-Means, are used to determine the values of M and cj. In OLS, each data point is initially considered a candidate centroid (cj), and the candidates are evaluated one by one. The evaluation consists of calculating the network output error (d(p) − y(p)) with each data point in turn as the new centroid; the data point that reduces the network error the most becomes the new centroid of the current evaluation. This procedure continues until the network error reaches an acceptable value. That is, OLS is a supervised approach to searching for M and cj. Because OLS needs to evaluate every data point in each evaluation, it can require a long calculation time; however, this was not observed in the current study, since the total number of data points was 240 with an input dimension of seven, which is not a burden for a modern computer. For the K-Means approach, M must be predefined; the M determined using OLS is used here. The input points are partitioned into M clusters by minimizing the sum, over all clusters, of the within-cluster sums of point-to-cluster-centroid distances. It should be noted that M data points must be provided as the centroids for the first run of K-Means; a common technique is to select these points by random sampling, although doing so introduces uncertainties into the resulting centroids.

2.3.3 Determination of the wj Associated with Each Centroid

The method of least squares (LS) is widely used to find the function that best fits a set of data. LS estimates the coefficients of a chosen function (here, the wj of the RBFNN model) by minimizing the model error, which is the sum of the squares of the differences between the measurements (the in-situ observations) and the model output (the y of the RBFNN). However, achieving optimal results with LS relies heavily on the assumption of homoscedasticity: if the model errors are heteroskedastically distributed, LS estimators are no longer the best linear unbiased estimators.
Thus, a common remedy for heteroskedasticity, the method of weighted least squares (WLS), is used here as a second approach to determine the weights (wj). The goal is to assign to each observation a weight that reflects the uncertainty of that particular measurement; the weight wi assigned to the ith observation typically contains the inverse variance of that observation.

2.3.4 Determination of the φ

Several functions (such as linear, cubic, thin-plate-spline, Gaussian and inverse multiquadratic functions) are available for the φ. Because a Gaussian is usually considered a reasonable assumption for the distribution of the data points, it was chosen in this study.
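Under these choices (K-Means centroids, a Gaussian φ, least-squares output weights), one variant of the model might be sketched as follows (Python/NumPy); the width heuristic, the simple Lloyd's-iteration K-Means and the unweighted LS step are assumptions of this illustration, and the WLS variant would differ only in the final solve:

```python
import numpy as np

def kmeans(X, M, iters=100, seed=0):
    """Plain Lloyd's iterations: return M cluster centroids."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), M, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        C = np.array([X[assign == j].mean(0) if np.any(assign == j) else C[j]
                      for j in range(M)])
    return C

def fit_rbfnn(X, y, M=10, seed=0):
    """Gaussian design matrix on K-Means centroids; output weights
    by least squares (the WLS variant would weight the rows)."""
    C = kmeans(X, M, seed=seed)
    sigma = np.mean(np.linalg.norm(X[:, None] - C[None], axis=-1))  # width heuristic
    Phi = np.exp(-((X[:, None] - C[None]) ** 2).sum(-1) / (2 * sigma ** 2))
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return C, sigma, w

def predict_rbfnn(X, C, sigma, w):
    Phi = np.exp(-((X[:, None] - C[None]) ** 2).sum(-1) / (2 * sigma ** 2))
    return Phi @ w
```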
3 Results

A total of 240 data samples collected from two construction fields were used to evaluate the precision of the empirical formulas; the results are summarized in Table 4.

3.1 Results of the Empirical Formulas

The comprehensive prediction outcomes are illustrated by separately calculating the correct percentages for groutable and non-groutable soils in the fields. The correct percentages for non-groutable soils were very high, ranging from 97.75% to 100%. However, the predictions of the empirical formulas for groutable soils were relatively inaccurate, varying from only 1.03% to 49.33% correct. This result indicates that most empirical formulas provide accurate predictions only when the soil is non-groutable; moreover, the probability that a particular soil is in fact groutable ranges from 46% (76/166) to 59% (131/221) even when the formulas predict that it is non-groutable. Overall, the accuracy of the empirical formulas (ranging from 45% to 68%) is unsatisfactory, so there is significant room for improvement.

3.2 Results of RBFNN Models with Different Input Ns

Nine cases were evaluated to search for the "best" N. The results of each case for the training and testing procedures are shown in Table 5, indicating that cases 7 through 9 all produce promising prediction outcomes. Among these three cases, the accuracy percentages ranged from 93.33% to 96.92% across the training and testing procedures. Furthermore, the accuracy percentage ranged from 84.44% to 89.74% even when only one factor (D10 or D15) was considered (cases 1 and 2). The traditional prediction models with similar input parameters (i.e., using only one or two parameters) achieved accuracies of only 45% to 68% (Table 4), indicating that the RBFNN establishes a better prediction model for injecting grouts with high w/c and microfine cement into soils with a high FC. Table 5 shows only the overall accuracy percentages of the RBFNN; more detailed results (for case 9) are included in Table 6.

3.3 Comparison among RBFNN Models

Table 5 presents the prediction results of the four models. From the results of OLS-LS and OLS-WLS, there is no clear trend from which to conclude which methodology
Table 4. Prediction results of each empirical formula

Empirical formula            Field1   Formula √   Formula X   Accuracy (%)   Overall accuracy (%)
Burwell et al. (1958)2         √          1           96          1.03       (90+1)/187  = 48.66
                               X          0           90        100
Incecik and Ceren (1995)       √         19          131         12.67       (19+90)/240 = 45.42
                               X          0           90        100
Krizek et al. (1992)           √         20          130         13.33       (20+90)/240 = 45.83
                               X          0           90        100
Huang et al. (2007), N1 3      √         74           76         49.33       (74+90)/240 = 68.33
                               X          0           90        100
Huang et al. (2007), N2 3      √         54           96         36.00       (54+90)/240 = 60.00
                               X          0           90        100
Axelsson et al. (2009)4        √         19          102         15.70       (19+87)/210 = 50.48
                               X          2           87         97.75

1: X denotes non-groutable, √ denotes groutable; "Field" gives the observed result, and "Formula √/X" give the counts predicted groutable/non-groutable.
2: 53 of 240 data samples were not defined and were therefore excluded from this evaluation.
3: Defined in equation (1) and equation (2).
4: 30 of 240 data samples were not defined and were therefore excluded from this evaluation.
Table 5. Accuracies of the RBFNN models with different cases (%)

       OLS-LS             K-Means*-LS        OLS-WLS            K-Means*-WLS
case   training  testing  training  testing  training  testing  training  testing
1      88.72     84.44    88.72     84.44    88.72     84.44    88.72     84.44
2      89.74     84.44    88.72     84.44    89.74     84.44    88.72     84.44
3      92.82     88.89    90.80     86.88    92.82     88.89    90.96     85.55
4      92.82     80.00    91.15     88.65    94.36     82.22    91.79     86.16
5      91.28     84.44    92.30     90.00    91.79     88.89    92.82     85.39
6      95.38     91.11    94.83     92.72    95.90     88.89    95.53     88.46
7      94.87     93.33    95.21     95.87    94.36     93.33    95.61     95.25
8      96.92     93.33    94.87     94.97    96.41     93.33    95.50     94.71
9      95.90     95.56    93.71     96.02    96.41     93.33    94.56     96.75

* Results of K-Means are the mean of 25 runs.
(WLS or LS) is the better approach for determining the weights between neurons. Also, comparing the results of OLS-LS and K-Means-LS, no clear trend can be observed for determining an optimal clustering approach. The inherent uncertainties associated with the method of K-Means produce variation in the results of the K-Means-LS approach.
Table 6. Prediction results of RBFNN under different conditions

Results from field1   RBFNN √   RBFNN X   Accuracy
√                     149       1         0.993
X                     9         81        0.900
Overall accuracy                          0.958

1: X denotes non-groutable, √ denotes groutable.
Table 7. Accuracy percentage of different models for groups divided by D10

Model                 Criterion           D10 < 6.3 μm   6.3 μm < D10 < 13 μm   D10 > 13 μm
                                          X1             √1          X1         √1
Burwell et al.        N1 > 25 & N2 > 11   100%           0%          100%       0.9%
Incecik and Ceren     N > 10              100%           0%          100%       17.3%
Krizek et al.         N1 > 15 & N2 > 8    100%           0%          100%       17.3%
Huang et al.          N > 9               100%           0%          100%       67.3%
                      Nc > 4              100%           0%          100%       49.1%
Axelsson et al.       N > 5               97%            2.5%        95.8%      16.4%
RBFNN Case 9          training            100%           100%        70%        98.9%
(D10, D15, w/c, e,    testing             90%            100%        50%        96.7%
FC, Cz, Cu)

1: X denotes non-groutable, √ denotes groutable.

Table 8. Accuracies of the SVM model with different cases (%)

case       1      2      3      4      5      6      7      8      9
training   91.2   92.9   92.3   94.1   91.4   90.8   94.9   95.2   95.6
testing    90.8   92.5   90.8   93.8   90.8   88.8   92.1   92.1   94.6
In addition, the differences between the sample means from the two approaches (OLS-LS and K-Means-LS) are slight. These factors explain why no apparent tendency can be observed. For overall performance, one can choose OLS-WLS with case 9 as the final RBFNN model; this final model achieves an overall accuracy of 95.8% (Table 6).
Fig. 1. The RBFNN architecture used in this study
3.4 Comparison with Results from Support Vector Machines (SVM)
For comparison, Table 8 displays the prediction results of the SVM model. The general trend of the accuracies for the nine cases is similar to that of the RBFNN models. The SVM yields better predictions with a smaller number of input neurons, but as the number of inputs increases, the RBFNN models provide more promising results.
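As an indication of how such an SVM baseline can be set up, the sketch below trains a classifier on one input case with scikit-learn. The RBF kernel, the default hyperparameters, and the placeholder arrays are all our assumptions, since the paper does not state its SVM settings in this section.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data shaped like the study (195 training / 45 testing samples,
# seven inputs of case 9); the real inputs would be the 240 field samples.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(195, 7)), rng.integers(0, 2, 195)
X_test, y_test = rng.normal(size=(45, 7)), rng.integers(0, 2, 45)

# y = 1 denotes groutable, y = 0 denotes non-groutable.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)
print("training accuracy:", model.score(X_train, y_train))
print("testing accuracy:", model.score(X_test, y_test))
```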
4 Conclusions and Suggestions
Given these promising results, several conclusions and suggestions can be drawn: (1) With a three-layer perceptron structure, seven parameters for the input layer, the use of OLS to locate the centroids in the hidden layer, and the use of WLS for regression, the proposed RBFNN approach proved to be a reliable and stable model with satisfactory precision. (2) Using only the grain-size ratio of the soil and grout to predict the N cannot completely capture the behavior of the grouting mechanism. Based on one of the promising RBFNN models (case 9), all of the parameters (including the w/c, the e, the FC, the Cz and the Cu) play important roles in determining the N and should be included in the prediction model. (3) Because 70% of the grain sizes of the microfine cements used here were less than 1 μm, the grouts used here can be considered a semi-nanometer material, with physical behaviors different from those of Portland cement. With sizes approaching the semi-nanometer scale, a high w/c and a high speed of the colloidal mixer, microfine-cement grouts possess an improved ability to penetrate relative to traditional Portland cement. Some construction sites that are non-groutable with regular grout therefore become groutable with microfine-cement grout. Thus, many groutable data are judged non-groutable by the empirical formulas; that is, the empirical formulas are not suitable for microfine-cement grouts.
(4) The FC in this study was relatively high, as shown in Table 2; in other words, D10 and D15 were likely less than 0.074 mm. This caused the N values calculated from the empirical formulas to be smaller than expected, so more non-groutable results were predicted than were actually present in the data. This effect was worst in the fuzzy zone containing groutable soils. Therefore, the empirical formulas may not be appropriate for predicting the N of the soils collected in this study.
References
1. Burwell, E.B.: Cement and Clay Grouting of Foundations: Practice of the Corps of Engineers. J. Soil Mech. Found. Div. 84, 1551/1–1551/22 (1958)
2. Incecik, M., Ceren, I.: Cement Grouting Model Tests. Bulletin of the Technical University of Istanbul 48(2), 305–317 (1995)
3. Krizek, R.J., Liao, H.J., Borden, R.H.: Mechanical Properties of Microfine Cement / Sodium Silicate Grouted Sand. ASCE Special Technical Publication on Grouting, Soil Improvement and Geosynthetics 1, 688–699 (1992)
4. Huang, C.L., Fan, J.C., Yang, W.J.: A Study of Applying Microfine Cement Grout to Sandy Silt Soil. Sino-Geotech. 111, 71–82 (2007)
5. Axelsson, M., Gustafson, G., Fransson, A.: Stop Mechanism for Cementitious Grouts at Different Water-to-Cement Ratios. Tunn. Undergr. Sp. Tech. 24, 390–397 (2009)
6. Akbulut, S., Saglamer, A.: Estimating the Groutability of Granular Soils: A New Approach. Tunn. Undergr. Sp. Tech. 17, 371–380 (2002)
7. Tekin, E., Akbas, S.O.: Artificial Neural Networks Approach for Estimating the Groutability of Granular Soils with Cement-Based Grouts. Bull. Eng. Geol. Environ. (May 29, 2010) (published online)
8. Ham, F.M., Kostanic, I.: Principles of Neurocomputing for Science & Engineering. McGraw-Hill, New York (2001)
Improving Text Classification with Concept Index Terms and Expansion Terms XiangHua Fu, LianDong Liu, TianXue Gong, and Lan Tao College of Computer Science and Software Engineering, Shenzhen University, Shenzhen Guangdong, 518060, China [email protected], [email protected]
Abstract. Feature selection methods are widely employed to improve classification accuracy by removing redundant and noisy features. However, removing terms from documents may damage the integrity of content. To bridge the gap between the integrity of documents and the performance of classification, we propose a novel two-step method for classification. First, we select index terms and expansion terms through Maximum-Relevance and Minimum-Redundancy Analysis (MR2A). Then we combine the predictive power of index terms and expansion terms via Concept Similarity Mapping (CSM). Testing experiments on the 20Newsgroups and SOGOU datasets are carried out with different classifiers. The experimental results show that both CSM and MR2A outperform the baseline methods: Information Gain and Chi-square. Keywords: Feature Selection, Text Classification, Concept Index Terms, Expansion Terms.
1 Introduction
The task of text classification is to automatically assign predefined categories to free documents. As the amount and variety of web information increases rapidly, inductive and effective classification techniques are urgently needed. The art of inductive learning starts with the design of appropriate data representations. Indeed, most features are irrelevant to classification or redundant with each other, and their presence may obscure the important intrinsic structure behind the data [1]. Thus, it is necessary to remove as many redundant features from data sets as possible. According to the resulting features, dimension reduction methods can be divided into feature selection and feature extraction. The resulting elements of feature selection are original terms in a corpus. Irrelevant features can be eliminated; however, feature selection methods evaluate features by individual discriminative power and then combine the discriminative terms for classification, which ignores the overall importance of all selected terms. As the top-ranked terms are always correlated with each other, a combination of good features does not necessarily lead to good classification performance [2]. In contrast, feature extraction methods extract features by synthesizing different terms through clustering or transformation, so that similar words get similar representations [3]. However, feature
extraction methods, both cluster-based and transformation-based, are limited by their computational complexity and unsatisfactory statistical foundation. To take advantage of the strengths of the above two methods and avoid their limitations, many approaches have been proposed. Chris Ding [4] proposed a Maximum Relevance and Minimum Redundancy (MRMr) feature selection framework for microarray data analysis; however, obtaining the MRMr features requires intensive calculations of the pairwise correlations between features. A predominant correlation-based approach proposed by [5] can effectively identify relevant features as well as redundancy among relevant features without pairwise correlation analysis. Wang [6] proposed a feature selection method called Conditional Mutual Information Maximin (CMIM), which selects a set of individually discriminating and weakly dependent features with desirable performance. The above methods can select discriminative and maximally independent terms. However, removing terms from documents may damage the integrity of content: terms removed by feature selection methods may be important for texts that express a similar concept with non-index terms, and in such cases the prediction for these texts will be incorrect. Related research suggests that the proper use of expansion terms in text categorization improves classification accuracy [7] [8]. In this paper, we propose a novel text feature selection method for improving the performance of classification by MRMr Analysis (MR2A). First, candidate terms with maximal relevance are selected with the Multi-class Odds Ratio (MOR) method, and index terms with minimal redundancy are obtained by a modified CMIM method. We remove the index terms from the candidate terms and select the remaining high-rank terms as expansion terms. To reduce the information loss caused by feature selection, we combine the information of index terms and expansion terms by Concept Similarity Mapping (CSM), which maps the expansion terms into the concept field of the index terms and represents the documents with the combined vectors. This paper is organized as follows: In Section 2, we introduce MRMr Analysis to select index terms and expansion terms and propose the CSM-based classification method. In Section 3, we present and analyze the experimental results. Finally, the conclusion of the paper is given in Section 4.
2 Classification with Concept Index Terms and Expansion Terms
In general, the combination of highly relevant features results in redundancy and consequently causes poor prediction performance. We can gain better classification performance with fewer terms by reducing the dimension of the features in the following steps. (1) Ranking all the terms by their relevance to the different categories and selecting the terms of maximal relevance as candidate terms.
(2) Selecting from the candidate terms the set of index terms that keeps the most classification information, using Conditional Mutual Information Max-Min (CMIM), and treating the remaining terms as expansion terms. (3) Calculating the confidence of co-occurrences between index terms and expansion terms, and combining the weights of the expansion terms with the index terms. The first two steps select the set of index terms that is maximally relevant to the categories and minimally redundant among terms. This method, the MRMr Analysis method, gains promising classification accuracy with different classifiers. The third step, combining the semantics of index terms and expansion terms, is an advanced method based on MR2A.

2.1 Select Max-Relevance Terms
There are many feature ranking measures for feature selection, such as mutual information, information gain, chi-square and odds ratio [9] [10]. Odds ratio is commonly used in document classification and retrieval and is very suitable for feature weighting [11]. We weight the relevance between term w and category c with a modified odds ratio measure:

Relevance(w, c) = log( p(w|c) / (1 − p(w|c)) ) − log( p(w|c̄) / (1 − p(w|c̄)) )    (1)
where p(w|c) is the probability that word w occurs in texts labeled with class c, and p(w|c̄) is the probability that word w occurs in texts without the c label. By ranking the terms according to their relevance to class c, the terms corresponding to the N largest relevance values are selected as candidate terms.
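A minimal sketch of this candidate-term step follows, under our own assumptions: documents are (word-set, label) pairs, and add-one smoothing (not in the paper) keeps both logarithms finite.

```python
import math

def relevance(word, cls, docs):
    """Modified odds ratio of Eq. (1). docs: iterable of (word_set, label) pairs."""
    in_c  = [w for w, lab in docs if lab == cls]
    out_c = [w for w, lab in docs if lab != cls]
    # add-one smoothing is our assumption, to avoid log(0) on rare terms
    p_in  = (sum(word in w for w in in_c)  + 1) / (len(in_c)  + 2)
    p_out = (sum(word in w for w in out_c) + 1) / (len(out_c) + 2)
    return math.log(p_in / (1 - p_in)) - math.log(p_out / (1 - p_out))

def candidate_terms(vocab, cls, docs, N):
    # the N terms with the largest relevance values form the candidate set
    return sorted(vocab, key=lambda t: relevance(t, cls, docs), reverse=True)[:N]
```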
2.2 Remove Redundant Terms
Traditional feature selection methods are incapable of removing redundant features because redundant features likely have similar rankings. To address this problem, we select the min-redundancy feature subset that maximizes the overall information by iteratively selecting the feature sharing minimal redundancy with the already selected features. The information shared by feature subset F and category set C is evaluated by

I(F, C) = Σ_{f∈F, c∈C} p(f, c) log( p(f, c) / (p(f) × p(c)) )    (2)

where p(f, c) is the probability of feature f presenting in the documents of class c, p(c) denotes the proportion of documents assigned label c, and p(f) denotes the proportion of documents including feature f. I(F, C) is a useful subset evaluation metric that is able to evaluate the overall predictive power of the selected terms.
To search for a minimal subset of features that contains as much category information as possible, we iteratively seek a free term that brings more information for classification and append it to the selected set. Conditional mutual information (CMI) is a good metric for evaluating the amount of information brought by appending a new feature Fk to the original k−1 features F1, …, Fk−1:

CMI(Fk; C | F1, …, Fk−1) = I(F1, …, Fk; C) − I(F1, …, Fk−1; C).    (3)
Calculating CMI directly requires computing a complex joint probability, which is both computationally intractable and not robust. To avoid this difficulty, Wang [6] proposed approximating the CMI of k features with fewer dimensions as

CMI(F*; C | F1, …, Fk) ≈ min_{k−1} { CMI(F*; C | Fi, …, Fj) },    (4)

where the minimum is taken over subsets of k−1 selected features.
The k−1 selected features are the ones most correlated with the feature F*. However, the total information of F* is reduced by this approximation, so it is reasonable to select a more informative term to compensate: if a term is weakly relevant to, or redundant with, the terms already selected, such a term should provide more information. The redundancy shared by two terms is evaluated by

Redundance(t, f) = p(t, f) × log( p(t, f) / (p(t) × p(f)) )    (5)

To maximize the category-relevant information provided by the selected features and minimize the redundancy among features, the max-relevance and min-redundancy terms are selected as index terms. The term selection criterion is

t = arg max_{i=1,…,v} { min_{f∈Kc} Redundance(t, f) },    (6)

where t is the term to be selected, Kc is the set of already selected terms of class c, and v is the number of total terms. The set of index terms is obtained by iteratively applying the above criterion.
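The selection loop can be sketched as follows. This is our illustrative reading of Eqs. (5)-(6) as printed, not the authors' code; seeding the selected set with the top-ranked candidate is an assumption.

```python
import math

def redundance(t, f, docs):
    """Eq. (5): co-occurrence redundancy of terms t and f; docs is a list of term sets."""
    n = len(docs)
    p_t  = sum(t in d for d in docs) / n
    p_f  = sum(f in d for d in docs) / n
    p_tf = sum(t in d and f in d for d in docs) / n
    return 0.0 if p_tf == 0 else p_tf * math.log(p_tf / (p_t * p_f))

def select_index_terms(candidates, docs, k):
    """Iterative selection following Eq. (6): each step takes the free candidate
    whose minimal redundancy with the already selected terms is maximal."""
    selected = [candidates[0]]          # seed with the top-ranked candidate (assumption)
    while len(selected) < k:
        free = [t for t in candidates if t not in selected]
        best = max(free, key=lambda t: min(redundance(t, f, docs) for f in selected))
        selected.append(best)
    return selected
```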
2.3 Map Expansion Terms
Theoretically, a context-sensitive classifier is more powerful than a context-free classifier [7]. Traditional feature selection methods remove redundant or noisy terms to reduce the dimensionality of documents. However, removing words from documents results in the loss of useful information and damages the context integrity of documents. To combine the predictive power of index terms and expansion terms, we propose a method that maps the weights of expansion terms onto index terms; we name this method Concept Similarity Mapping (CSM). The basic idea is to synthesize the weights of index terms and expansion terms by their semantic relevance, as feature extraction methods do; however, we map the
weights of expansion terms onto index terms according to their co-occurrence frequencies. Consequently, this combination boosts the classification accuracy, as we can predict a term's occurrence by observing its similar terms, and make sense of a document by observing expansion terms and predicting the concepts related to the index terms. Concept-prediction confidence is defined as follows.

Definition 1. If word x is present in a document, we can predict that the concept of y will be present in this document (x → ȳ) with confidence

C(x → ȳ) = σ(x ∩ y) / σ(x).    (7)
The support count σ(x) denotes the number of training documents that contain x, and σ(x ∩ y) denotes the number of co-occurrences of x and y. Concept-prediction confidence denotes the frequency with which term y appears in the cases where x is present: if y never co-occurs with x, then the concept of y will not be present with x; if y frequently co-occurs with x, the concept of y will be present with x. According to Definition 1, the concept-similar association between index terms X and expansion terms Y is weighted by the matrix H:

H(x, y) = σ(x ∩ y) / σ(x),  x ∈ X, y ∈ Y.    (8)
Given the term set T = {t1, t2, …, tn} and a free document d = {σ(t/d) | t ∈ T}, where σ(t/d) denotes the frequency with which t occurs in d, we select the index terms X ⊂ T and the expansion terms Y ⊂ T with MR2A. Document d can be represented with index terms as d1 = {σ(t/d) | t ∈ X}, or with expansion terms as d2 = {σ(t/d) | t ∈ Y}. To combine the information of index terms and expansion terms, we map the expansion terms into the concept space according to the concept-prediction confidence; the resulting vector is d* = H × d2, where H is defined by Equation 8. The transformed vector of d is z = d1 + d*. When classifying with concept index terms only, the document is represented by d1; after the expansion terms are mapped, the document is represented by z = d1 + d*. The dimensionalities of z and d1 are equal to the number of index terms, |z| = |d1| = |X|, so the computational complexity of the CSM method approximates that of MR2A. Moreover, CSM preserves more of the useful information of free documents; for this reason, CSM significantly boosts the classification accuracy.
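A small sketch of the CSM mapping under our assumptions (training documents as term sets for counting co-occurrences, and a term-frequency dictionary for the document being represented):

```python
import numpy as np

def confidence_matrix(index_terms, expansion_terms, docs):
    """H(x, y) = sigma(x ∩ y) / sigma(x) of Eq. (8); docs is a list of term sets."""
    H = np.zeros((len(index_terms), len(expansion_terms)))
    for i, x in enumerate(index_terms):
        sigma_x = sum(x in d for d in docs)
        if sigma_x == 0:
            continue
        for j, y in enumerate(expansion_terms):
            H[i, j] = sum(x in d and y in d for d in docs) / sigma_x
    return H

def csm_vector(term_freqs, index_terms, expansion_terms, H):
    """z = d1 + H d2: index-term frequencies plus the mapped expansion-term frequencies."""
    d1 = np.array([term_freqs.get(t, 0.0) for t in index_terms])
    d2 = np.array([term_freqs.get(t, 0.0) for t in expansion_terms])
    return d1 + H @ d2
```

The combined vector z keeps the dimensionality of the index-term representation while recovering evidence carried by the removed expansion terms.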
3 Experiments and Results
In our experimental evaluation, we compare the two proposed methods (MR2A and CSM) with two baseline methods: Information Gain (IG) and Chi-square (CHI) [10]. The text classification experiments are performed with three classifiers: Linear Discriminate Classification (LDC), the Naïve Bayes Classifier [12] and K-Nearest Neighbor (KNN) [13]. Two corpora, the 20Newsgroups dataset and the SOGOU Chinese dataset, are used.
Fig. 1. Performances of CHI, MR2A, CSM and IG testing on the Newsgroups dataset with LDC, NBC and KNN classifiers
Experimental results on the different datasets are depicted in the following figures, which compare the classification accuracy of each feature selection method under the three classifiers; accuracy curves of the different methods are displayed in the figures. The 20Newsgroups dataset1 contains about 20,000 articles, evenly divided among 20 UseNet discussion groups. We randomly take 25% of the articles as testing data and the other 75% as training data. Figure 1 indicates that all feature selection methods perform best using NBC, and the highest accuracy of 85.27% is acquired by CSM with NBC when the number of selected features is 600. MR2A follows CSM, and its highest accuracy is 84.12%. CHI and IG are obviously inferior to CSM and MR2A. The results depicted in Figure 1-LDC are similar to those of Figure 1-NBC. We can observe from Figure 1-KNN that CSM as well as MR2A perform well using the KNN classifier, whereas both IG and CHI perform badly and achieve very low accuracy. The SOGOU Chinese corpus2 provides abundant web data; we take only 1,000 documents for each of ten topics in our experiments, 70% for learning and 30% for testing. All stop words and rare terms (occurring fewer than 2 times) are removed from the vocabulary. The experimental results for this corpus are described in Figure 2. The curves in this figure indicate that the best result (82.51%) is obtained by CSM, and MR2A also performs well (82.12%). The best accuracy of IG is 77.15%, and that of CHI is 76.43%. CSM and MR2A surpass IG and CHI only slightly here; CSM does not boost the accuracy as significantly as we expected. One reason may explain this observation: English authors tend to polish their articles with synonymy, polysemy and homonymy when expressing the same idea, while Chinese authors tend to keep their expression style. As a result, the distribution hypothesis works better on English datasets than on Chinese datasets. The main observations and conclusions we would like to emphasize are: CSM significantly boosts the classification accuracy and reduces the dimensionality
1 http://people.csail.mit.edu/jrennie/20Newsgroups/
2 http://www.sogou.com/labs/dl/t.html
Fig. 2. Performances of CHI, MR2A, CSM and IG testing on the SOGOU dataset with LDC, NBC and KNN classifiers
on different data sets. We select the features (index terms) with MR2A and combine the information of index terms and expansion terms according to the concept-prediction matrix. This reconstruction does not increase the dimensionality of the documents; for this reason, the CSM-based classification method achieves better performance. MR2A is a competitive feature selection method, even better than IG and CHI, which selects the index terms by max-relevance and min-redundancy analysis; its main task is to select the index terms for CSM-based classification. The main limitation of CSM may be that it needs more time for mapping the expansion terms (the computational complexity corresponds to the size of the expansion term set; we suggest setting the size of the expansion term set to 1.5 times the size of the index term set). However, CSM is still better than feature extraction methods (LSI and PCA) in performance and accuracy.
4 Conclusion
In this paper, we have studied the problem of text classification with concept index terms and expansion terms. The first contribution of this paper is the selection of index terms and expansion terms with MR2A. Experimental studies show that MR2A is better than IG and CHI when enough learning samples are provided. To take advantage of the predictive power of expansion terms, we further proposed the CSM-based classification method, which reconstructs the document vector by mapping the expansion terms into the concept field of the index terms. Experiments on different data sets and classifiers show that the classification accuracy is significantly improved by CSM. Acknowledgments. This work is partly supported by the National Natural Science Foundation of China under award numbers 61001185, 61003271, 60903114, 60973100.
References
1. Liu, H., Liu, L., Zhang, H.: Boosting feature selection using information metric for classification. Neurocomputing 73(1-3), 295–303 (2009)
2. Peng, H., Long, F., Ding, C.: Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1226–1238 (2005)
3. Sahlgren, M., Cöster, R.: Using bag-of-concepts to improve the performance of support vector machines in text categorization. In: Proceedings of the 20th International Conference on Computational Linguistics, Association for Computational Linguistics, p. 487 (2004)
4. Ding, C., Peng, H.: Minimum redundancy feature selection from microarray gene expression data. In: Proceedings of the 2003 IEEE Bioinformatics Conference, CSB 2003, pp. 523–528. IEEE, Los Alamitos (2003)
5. Yu, L., Liu, H.: Feature selection for high-dimensional data: A fast correlation-based filter solution. In: Machine Learning-International Workshop Then Conference, vol. 20, p. 856 (2003)
6. Wang, G., Lochovsky, F.: Feature selection with conditional mutual information maximin in text categorization. In: Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, pp. 342–349. ACM, New York (2004)
7. Yang, Y., Chute, C.G.: An example-based mapping method for text categorization and retrieval. ACM Transactions on Information Systems (TOIS) 12(3), 252–277 (1994)
8. Wittek, P., Darányi, S., Tan, C.: Improving text classification by a sense spectrum approach to term expansion. In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning, Association for Computational Linguistics, pp. 183–191 (2009)
9. Mladenic, D., Grobelnik, M.: Feature selection for classification based on text hierarchy. In: Proceedings of the Workshop on Learning from Text and the Web (1998)
10. Rogati, M., Yang, Y.: High-performing feature selection for text classification. In: Proceedings of the Eleventh International Conference on Information and Knowledge Management, pp. 659–661. ACM, New York (2002)
11. Deng, Z.H., Tang, S.W., Yang, D.Q., Li, M.Z.L.Y., Xie, K.Q.: A comparative study on feature weight in text categorization. In: Advanced Web Technologies and Applications, pp. 588–597 (2004)
12. Ko, Y., Seo, J.: Automatic text categorization by unsupervised learning. In: Proceedings of the 18th Conference on Computational Linguistics, Association for Computational Linguistics, vol. 1, pp. 453–459 (2000)
13. Yang, Y., Slattery, S., Ghani, R.: A study of approaches to hypertext categorization. Journal of Intelligent Information Systems 18(2), 219–241 (2002)
Automated Personal Course Scheduling Adaptive Spreading Activation Model Yukio Hori1 , Takashi Nakayama2, and Yoshiro Imai1 1
Information Technology Center, Kagawa University, 2-1 Saiwaicho, Takamatsu, Kagawa, Japan 760-0016 [email protected] http://www.itc.kagawa-u.ac.jp/~horiyuki/ 2 Faculty of Science, Kanagawa University
Abstract. This paper describes a support system approach to the academic course scheduling problem, treated as a constraint satisfaction/optimization problem. In general, course schedules are generated manually, using course information such as class titles, summaries and categories given by teachers. However, it is not easy for students to manually generate a course schedule from a large number of combinations of classes, due to various constraints and/or criteria. We developed a system that generates course schedules automatically, using spreading activation on a course network, where the problem is formalized as a constraint satisfaction/optimization problem. We describe the use of the model for scheduling at our university. The best results in the evaluation test were obtained using the spreading activation model.
1 Introduction
University students have to decide their own course schedules by themselves. To make a course schedule, it is necessary to satisfy the student's interests and to meet course credit restrictions. On the other hand, university curricula change dynamically to catch up with social needs and advances in science and technology. In many universities, an on-line course support system is available that allows students to view information on all subjects [4]. However, it is not easy for students to manually generate a course schedule from a large number of combinations of classes, due to various constraints and/or criteria; this is especially true for university freshmen. This paper develops an automated system for generating a student's course schedule. Most students' course schedules are generated manually based on syllabus information, such as class titles, summaries and categories written by the teachers. However, with a large number of classes, it is very exhausting to generate a course schedule manually, due to the various constraint conditions and criteria. Thus, this paper formalizes this problem as a constraint satisfaction/optimization problem, develops an automated tool, and evaluates the effectiveness of our system. The system enables students to make a course schedule suitable for learning in the field they want.
In the next section, we introduce some related works. In Section 3, we define the feature space and then describe the method of making a course schedule. In Section 4, we describe the results of the preliminary use test. We conclude this paper in Section 5.
2 Related Works
Nozawa et al. proposed a syllabus analysis system [2]. When a higher education institution designs an original curriculum, or when an institution's curriculum is evaluated externally, it is necessary to comprehend the curriculum contents of many institutions in the same field; however, this is very hard to do even for education experts in that field. Their system provides a clustering view built from the syllabus description of each class. Its main aim is not to support students but to help teachers make specialized curricula. The system treats syllabus data that constitute a curriculum and are expressed in a common format, calculates the similarity between syllabi based on the occurrence frequency of technical terms, clusters the syllabi, and thus helps to find distinguishing features of the curriculum by visualizing and comparing the assignments of syllabi to clusters along various classification axes. Visualizing syllabus data is useful for making a student's study strategy, but students do not always find a suitable course plan from it on their own; we have therefore tried to build a model of the student's interests. Ohiwa et al. developed an idea-generation tool with card handling [6]. Card handling is one of the most useful methods for information representation and idea generation: a generated card can be picked and moved by a mouse, cards may be grouped by enclosing them with a curve, and relationships between cards and groups can be marked by special lines. This system can be used for course planning. However, there are many restrictions on course credits in a curriculum, and it is difficult to operate card handling under such constraint conditions and criteria. Thus, we have formalized this problem as a scheduling problem [1]. Spreading-activation-based models have been proposed in previous studies investigating syntactic priming (SP) effects in document understanding [13][14][16]. Kojima proposed a framework of document understanding using text information and context [17]. These past studies focused on the activation of human memory to find topic words; we apply this spreading-activation-based model to well-considered course planning over all courses in a curriculum. Course planning is a multi-objective combinatorial optimization problem. In past studies, application systems have been reported for nurse work-time scheduling [9] and trainman allocation problems [10][11]. We have formalized student course planning as a constraint satisfaction/optimization problem. This approach is particularly effective where there are many constraints. An objective function is first defined to measure the degree of active value of a course plan; we then optimize the problem defined by this objective function and the block of constraints for one student, and using this approach we optimize every student's course plan in a semester. This suggests that the proposed system can
be used to discover useful course schedules; it supports students in easily making their own course schedules and in easily finding new interesting courses.
3 Generating Course Schedules
This section describes how to generate time schedules using spreading activation on a course network. We define the feature space representing the user's interests and learning strategy, and formalize the generation of time schedules as a constraint satisfaction/optimization problem. Feature vectors in the text domain are based on a set of nouns extracted from syllabus text; examples include Web mining [19]. In this case, the differences among the user's interests, learning strategy and courses are well represented by term correspondences. We use a heuristic search technique to solve the constraint satisfaction/optimization problem. The system architecture for generating time schedules is shown in Figure 1.

1. Getting syllabus text data. We consider the syllabus data discovery problem: finding sets of Web pages that share some template, among the many pages collected from the university web site by a Web crawler. A found set is a potential input for information extraction and wrapper generation algorithms. The results are stored in a database for editing and browsing in our system.

2. Curriculum analyzing. We build feature vectors based on the set of nouns extracted from syllabus text. The differences among courses are represented by the similarity between syllabi based on the occurrence frequency of technical terms. We then use the repeated bisection clustering method to analyze the curriculum's structure.

3. User profile. We represent the student's interests and learning strategy as a set of nouns extracted from syllabus text. We have developed three methods for building a user profile.
Fig. 1. The configuration of Active Syllabus
4. Finding a feasible course schedule. The student's course schedule is built by solving a constraint satisfaction/optimization problem from the user profile.

In the following subsections, the details of each step and the system behavior are described.
3.1 Getting Syllabus Text Data
Here we describe how syllabus information is obtained from the syllabus web site. In general, a syllabus description contains some fundamental information and the contents of the subject. The fundamental information contains the subject name, target faculty/grade, credit information (core/required/elective), teacher's name, semester, lecture day and number of credits. The contents of the subject contain the summary, goal, teaching method and course content. Our system extracts all of these as subject data and defines the vector ci. The feature vector of each subject was constructed as follows, with the weight of term wij defined using tf-idf:

ci = (wi1, wi2, …, win)
wi,tj = tf(tj, ci) · idf(tj)
idf(tj) = log( N / df(tj) )

where tf(tj, ci) is the number of occurrences of term tj in subject ci, N is the total number of subjects, and df(tj) is the number of subjects in which term tj appears once or more. The following were excluded as stop words:
1. Words of one letter in hiragana and katakana in Japanese.
2. Low-frequency terms: terms under 70% of the frequency threshold were excluded.
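A minimal sketch of this vector construction (our own code, not the system's; tokenization and the stop-word filtering above are assumed to have been applied to the token lists already):

```python
import math
from collections import Counter

def build_subject_vectors(subject_tokens):
    """tf-idf weights w_ij for each subject vector c_i.
    subject_tokens: list of token lists, one per subject."""
    N = len(subject_tokens)
    df = Counter()
    for tokens in subject_tokens:
        df.update(set(tokens))                 # df(t): number of subjects containing term t
    idf = {t: math.log(N / df[t]) for t in df}
    vectors = []
    for tokens in subject_tokens:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * idf[t] for t in tf})  # w = tf * idf
    return vectors
```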
3.2 Curriculum Analyzing
To facilitate comprehension of the curriculum's features, our system utilizes document clustering of the syllabus data. Repeated bisection clustering is used in order to capture the student's intended course plan. The similarity between two subjects is the cosine of their feature vectors:

Lij = (ci · cj) / (||ci|| ||cj||) = Σ_{k=1}^{N} wik wjk / ( sqrt(Σ_{k=1}^{N} wik²) · sqrt(Σ_{k=1}^{N} wjk²) )    (1)
Our system makes syllabus clusters using the similarity between syllabi based on the occurrence frequency of technical terms. The clusters help to find distinguishing features of the curriculum. Our system uses the Ward method to make the clusters. Moreover, because cluster sizes differ between subjects and curricula, the faculty staff adjusted the clustering results.
3.3 User Profile
There are many possible combinations for a course schedule in a term, and each student has an original intended plan. Our system makes the student profile as follows:

U = {u1, u2, …, um}, ui ∈ S

where ui is a word described in the syllabus and S is the set of all syllabi. However, it is not practical to manually choose various words from the syllabi to make a profile, so our system offers three strategies for making a student profile, which the student can combine.
– Manual select: the student selects words from the syllabus by manual operations.
– Cluster select: by clustering the syllabi, classified fields of subjects are generated; the student selects a cluster, which includes various keywords.
– Browsing behavior: students browse repeatedly, following links from one page to another, and examine the contents of the pages reached by link selection, expecting to acquire useful information from the syllabi. In other words, a student's syllabus browsing behavior can be utilized as an implicit query; our system extracts keywords from the student's syllabus browsing history.
Users can combine these three profile-setting methods. Figure 2 shows the user profile setting menu; the top menu is presented at the upper left of Figure 2. These three profile-setting methods are effective for helping students select keywords that capture their interests.
3.4 Course Scheduling
As discussed previously, making a course schedule is formalized as a scheduling problem. We use the courses containing the keywords in the user profile as the initial solution of the scheduling problem, and solve it using an iterative scheme with best-first search. A solution must satisfy the following conditions.
– An output must cover one semester's schedule.
– An output contains classes from those offered in each semester and grade.
– An output must respect the restrictions and number of credits of each semester.
– An output must include the required classes.
– An output must consider the student's interests: our system sets an objective function and chooses the schedule with the maximum value of the objective function.
– An output must consider the student's personal schedule.
We formalized this as a constraint satisfaction/optimization problem.
Fig. 2. Screenshot of making a user profile
Fig. 3. The spreading activation model for a subject network
Output: course schedule C = {c1, …, co}
Input: user profile U = {u1, …, um}, ui ∈ S
Maximize: T + d, where T = Σi ai · O and ai = Σj aj Lij (i, j = 1, 2, …, o)
Subject to:
  min < credit(C) < max
  M = {m | m is a required course} ⊂ C
  P = {p | p is a prohibited course pattern} ⊄ C
  D = {d | d is a weight value representing the student's requests}    (2)
Here, credit(C) is the number of credits in the course schedule C, m is a required subject in the semester, p is a prohibited course pattern, and d is a weight value representing the student's requests. The objective function T in (2) is the sum over the ci in the course schedule C, with Lij representing the similarity between classes; this function represents the network formed by the vectors and similarities of the classes in the course schedule. ai is the active value of class node ci; therefore, T measures the active value of the course network from the viewpoint of the student's interests. A spreading-activation-based model has been proposed in previous studies investigating syntactic priming (SP) effects in head-initial languages (e.g., English), where cohesiveness is calculated by spreading activation on a semantic network constructed systematically from a word network. The spreading activation model is defined as follows:

A(N) = C(N) + ((1 − γ) I + α R(N)) A(N − 1)    (3)
where γ is the attenuation, α is a propagation parameter, A(N) is the vector of active values at firing count N, C(N) is the vector of active values from the syllabus text, I is the identity matrix, and R(N) is a propagation matrix whose element rij represents a similarity metric between nodes i and j, such as cosine similarity. In equation (2), rij corresponds to Lij, and ai = Σj aj · Lij is an element of the activities; the vector of active values is A = (a1, a2, …, ao)ᵗ if the propagation count N is left out. This corresponds to C(N) = 0, γ = 1 and α = 1 in equation (3), so equation (3) becomes

A(N) = L(N) A(N − 1)

Our system uses only one propagation step over the course network. The sum of the elements of A(N) is the activation-based value of the entire course network. O is a projection operator from the syllabus feature vector to the student interest vector, so the activation-based value of the entire course network represents T in equation (2); this projection gives the activation-based value seen through the student's interests on the course network. We use T together with d, which represents the student's requests, as the objective function. The value of the objective function is larger when a course schedule contains similar courses. Therefore, as can be seen from Fig. 3, we obtain a word-network model that can optimize both the connection weights and the activation functions. For the first trial of making a course schedule, a course schedule matched to the student profile is used. Our system chooses the schedule with the maximum value of the objective function; this becomes the new trial solution for the next iteration, and the iterative process is repeated until a satisfactory and feasible schedule is obtained. The optimization proceeds as follows.
1. Make an initial course schedule matched to the student profile.
2. Choose one subject from the current course schedule, and replace it with the most similar other subject.
3. If the objective function's value improves, the trial solution replaces the current one.
4. Repeat from step 2 until all subjects have been checked or replaced.
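The one-step propagation and the hill-climbing loop above can be sketched as follows. This is our illustrative reading, not the system's code: L is the similarity matrix of Eq. (1), `interest` is the profile-match activation of each course, and `feasible` is a placeholder for the credit and timetable constraints of Eq. (2).

```python
import numpy as np

def network_active_value(schedule, L, interest):
    """One propagation step A(1) = L A(0) over the subjects in a schedule;
    the initial activation a0 is the match between each subject and the profile."""
    idx = list(schedule)
    a0 = np.array([interest[i] for i in idx])
    L_sub = L[np.ix_(idx, idx)]
    return float((L_sub @ a0).sum())           # T: active value of the course network

def improve_schedule(schedule, all_courses, L, interest, feasible):
    best = list(schedule)
    best_val = network_active_value(best, L, interest)
    improved = True
    while improved:
        improved = False
        for k, c in enumerate(best):
            # try the candidate most similar to c first, as in step 2
            for alt in sorted(all_courses, key=lambda j: L[c, j], reverse=True):
                if alt in best:
                    continue
                trial = best[:k] + [alt] + best[k + 1:]
                if feasible(trial):
                    val = network_active_value(trial, L, interest)
                    if val > best_val:          # step 3: accept only improving swaps
                        best, best_val, improved = trial, val, True
                    break                       # only the most similar feasible alternative is tried
            if improved:
                break                           # restart the scan after an accepted replacement
    return best
```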
3.5 Example of Output
Figure 4 (upper side) shows an example course schedule produced by our system, and Figure 4 (lower side) shows an example course schedule made by a real student. The ∗ mark indicates a required subject in the semester. The course schedule in Figure 4 (upper side) was made from a profile that includes words related to computer science; it contains more classes in the fields of computer science and mathematics than the student's own schedule.
4 Experiment
We evaluate the proposed methods using our university's curriculum data.
500
Y. Hori, T. Nakayama, and Y. Imai
1st grade time Mon Tue 1 Fundamental Introduction Mathematics 2 General Introduction Biology C 3 Introduction Mathematics 4
schedule by active syllabus(upper 1st semeter/bottom 2nd semester) Wed Thu Fri to Statistics Computer System∗ Modern History to Economics to
Introduction to Mechanical Eng. Artificial Intelligence Programming I ∗ Programming I ∗
5
1 Fundamental Introductory Career-design I Probability and Statistics∗ Informatics 2 Community and Sociology Mathematics Science II∗ 3 Fundamental Life Science Logical Circuit∗ 4 Programming II∗ 5 Mon 1 Algebra 2 English I
Differential and Integral Calculus I Mathematical Science I ∗
Differential and Integral Calculus Programming II∗ Basic Physics 1st grade time schedule by student (upper 1st semester/bottom 2nd semester) Tue Wed Thu Fri Basic Physics Computer System ∗ German I Biology Life and Human Gender and Sexuality German-Nordic Legend and Opera
3 German I 4
Programming I∗
5
Programming I∗
1
Life and Human
2 English II 3 4
German II
Probability and Statistics ∗ Introduction to Psychology C Mathematical Science II∗ Sociology D Logical Circuit ∗ German II Programming II∗ Differential and Integral Calculus Programming II∗
5
Differential and Integral Calculus I Mathematical English Com. Science I ∗ Human Com.
Cross-cultural Communications
Fig. 4. An example course schedule generated by Active Syllabus
The evaluation test involved a total of 8 course schedules from 8 different students ranging in grade from B4 to M2. This evaluation test is a subjective experiment in which the appropriateness of the classes in a course schedule is judged. Here, the course schedules by the students are their actually registered course schedules.
4.1 Outline of Test
We first describe the curriculum used in this experiment. Table 1 shows the credit requirements in our university; the number in parentheses is the registerable number of classes in the division. This curriculum consists of two basic elements: public lecture courses for acquiring broader knowledge and understanding, and professional education for acquiring expertise in a particular area. The number of courses, the number of keywords, and the target grades are shown in Table 2. As there is a lot of variation in 4th-grade students' course schedules, they have been eliminated from this experiment. We define precision as the ratio of subjects in a course schedule that are suitable for learning a field of the curriculum: p/n is the precision of a course schedule, where n is the number of subjects and p is the number of suitable subjects in the course schedule. This precision is scored by the participants' judgement; we requested each participant to judge the importance of specialization, relatedness and breadth in a course schedule. The participants scored the precision of a course schedule while seeing the list of class titles in the same semester, and if a more suitable class existed in the list than one in the course schedule, that class was judged as not suitable.
Table 1. Experimental curriculum data

General Education Subjects
  Common Subjects (70): Theme subject, 53 classes; General Education Seminar [elective], 52 classes; Health · Sports [elective], 44 classes; General Education for upper-grade [elective], 6 classes
  Foreign Language: Second Foreign Language, 145 classes; First Foreign Language, 71 classes
  Required credits for graduation: 8, 2, 8, 2, 4 and 4 credits (subtotal 24 credits), plus 6 credits; subtotal 30 credits
Faculty of Engineering Classes
  General Engineering Subject: Analytical Thinking Subject, 9 classes; Communication Skill, 4 classes; Fundamental Mathematics/Science Skill, 9 classes
  Specialized Subject: Fundamental Specialized Subject, 25 classes; Advanced Specialized Subject, 29 classes
  Graduation Research; Elective courses
  Required credits for graduation: 8, 6, 10, 30, 32, 6 and 6 credits; subtotal 98 credits
Total: 128 credits
Table 2. Experimental data

Target curriculum:   Dept. of Information Systems Engineering, Kagawa University
Number of courses:   513
Number of keywords:  7,819
Target grade:        1–3, 1st/2nd semester
The participants included six B4 and two M2 students; therefore, they have knowledge of this curriculum. Specifically, the subject names underlined in Figure 4 were judged not suitable in this subjective evaluation test. The students can choose subjects from mathematics, physics and computer science for their course schedules; in this case, the precision is higher when a course schedule has more related subjects.
4.2 Outline of Evaluation Test
The evaluation test procedure is as follows. The test involved a total of 8 students ranging in grade from B4 to M1; 6 were B4 and 2 were M1. First, we obtained their registered course schedules from grades 1 to 3 (human). We made the same number of course schedules with Active Syllabus (as); these course schedules were made in history mode, as shown in Fig. 2, using the subjects they had registered. For comparison, we generated the same number of course schedules by choosing subjects at random (rand) and by filtering with the keywords in their profiles (kw). The course schedule (kw) is the same as the initial solution of the course scheduling problem of (as). We evaluated the following precision in this test: how many appropriate subjects are in the course schedule. The judging participant does not know in which mode a course schedule was made. By comparing the precision of each type, we analyzed the subjective evaluation test.
4.3 Results
Table 3 shows the precision of the subjective evaluation in each grade.

Table 3. The results of the subjective evaluation

        B1-1st  B1-2nd  B2-1st  B2-2nd  B3-1st  B3-2nd  Mean
rand    0.59    0.64    0.62    0.62    0.60    0.63    0.62
kw      0.75    0.74    0.70    0.74    0.76    0.73    0.74
as      0.83    0.80    0.80    0.83    0.85    0.80    0.82
human   0.83    0.83    0.81    0.81    0.85    0.82    0.83
As can be seen from Table 3, the precision is high in the order (human), (as), (kw), (rand), and the results of (human) and (as) are at the same level. Considering these results, the experiment shows that Active Syllabus selects good classes almost as precisely as the students themselves and is better suited to making a course schedule efficiently. In comparison, the precision of (kw) decreases because a random class is selected when no class matches the user profile. Since it is hard to specify all the keywords representing a student's interests, the spreading activation model is effective in the university course scheduling problem. We conducted an analysis of variance over the 4 types of course schedules, and it showed significant differences in precision (p < .01). A multiple comparison between (human) and (as) did not show a significant difference, while the other pairs differed significantly (p < .05). Therefore, our proposed method can make course schedules at the same level as (human).
4.4 Discussion
Considering the results of the subjective evaluation test, we also made an objective evaluation. We calculated the network active value T for all 4 types of course schedules, where T was calculated by expression (2). Table 4 shows the network active value T.

Table 4. The results of the active value of the course schedule

        B1-1st  B1-2nd  B2-1st  B2-2nd  B3-1st  B3-2nd  Mean
rand    18.44   8.26    22.32   21.08   10.15   10.09   15.06
kw      2.75    4.94    39.54   22.48   14.28   30.22   19.04
hm      5.87    9.19    44.83   57.30   25.13   11.17   25.58
as      42.44   18.74   62.52   33.66   30.22   23.84   35.24
As can be seen from Table 4, T is high in the order (as), (human), (kw) and (rand). The (as) score is highest on average, but the (hm) score is higher than (as) in the second semester of the 2nd grade. This means that our search method for the optimization problem could not always find the best solution.
We also conducted an analysis of variance for this objective evaluation. A multiple comparison between (as) and the others showed significant differences (p < .05), while the other pairs did not differ significantly. This objective evaluation means that (as) has higher relatedness within the course schedule than the others.
5 Conclusions and Future Work
In this paper, we have proposed a plan support system, Active Syllabus, aimed at improving the efficiency of course scheduling. The system makes a course schedule using the spreading activation model, taking into consideration the student's interests and the relations between subjects. As a result of the experiment, the precision was about 0.8. More tests in real use cases are needed to confirm these preliminary results. By applying the spreading activation model to the syllabus descriptions, our system can provide not only existing classes matched to a student's interests but also related classes; this differs from much past research, which merely displays class information. Promising directions for future work are the optimization of the period used to make a student profile, and further examination of the syllabus descriptions and the similarity between classes. These problems should be considered for wide coverage.
References
1. Hori, Y., Takimoto, M., Imai, Y.: A support system for course schedule design based on syllabus description. In: 8th International Conference on Information Technology Based Higher Education and Training (ITHET), pp. 58–62 (2007)
2. Nozawa, T., Ida, M., et al.: Construction of Curriculum Analyzing System Based on Document Clustering of Syllabus Data (Education). Transactions of Information Processing Society of Japan 46(1), 289–300 (2005)
3. Miyazaki, K., Ida, M., Yoshikane, F., Nozawa, T., Kita, H.: Development of a Course Classification Support System for the Awarding of Degrees Using Syllabus Data. IPSJ Journal 46(3), 782–791 (2005)
4. Yamada, S., Matsunaga, Y., Itoh, E., Hirokawa, S.: A Study of Design for Intelligent Web Syllabus Crawling Agent. The Transactions of the Institute of Electronics, Information and Communication Engineers J86-D-I(8), 566–574 (2003)
5. Chubachi, Y., Ishii, Y., Ohiwa, H.: enTrance System: Fundamental Concept and Value for a University Environment. Software Engineers Association Software Symposium, pp. 181–184 (2001)
6. Ohiwa, H., Takeda, N., Kawai, K., Shimomi, A.: KJ editor: a card-handling tool for creative work support. Knowledge-Based Systems 10, 43–50 (1997)
7. Ohmi, Y., Kawai, K., Takeda, N., Ohiwa, H.: Pointing Operations in Cooperative Works Using Card-Handling Tool "KJ-Editor". Transactions of Information Processing Society of Japan 36(11), 2720–2727 (1995)
8. Ikegami, A., Niwa, A., Ookura, M.: Nurse Scheduling Problem in Japan. Operations Research as a Management Science Research 41(8), 436–442 (1996)
9. Aickelin, U., Dowsland, K.A.: An indirect genetic algorithm for a nurse-scheduling problem. Comput. Oper. Res. 31(5), 761–778 (2004)
10. Caprara, A., Fischetti, M., Toth, P.: A heuristic method for the set covering problem. Operations Research 47, 730–743 (1999)
11. Easton, K., Nemhauser, G., Trick, M.: The traveling tournament problem: description and benchmarks. In: Proceedings of the Seventh International Conference on Principles and Practice of Constraint Programming (CP 1999), pp. 580–584 (1999)
12. Horio, M., Suzuki, A.: Development of a General-Purpose Scheduling Solver under the Framework of RCPSP/τ with Time Constraints. Journal of Japan Industrial Management Association 54(3), 203–213 (2003)
13. Anderson, J.R.: A spreading activation theory of memory. Journal of Verbal Learning and Verbal Behavior 22, 261–295 (1983)
14. Collins, A.M., Loftus, E.F.: A Spreading Activation Theory of Semantic Processing. Psychological Review 82, 407–428 (1975)
15. Lorch, R.F.: Priming and Searching Processes in Semantic Memory: A test of three models of spreading activation. Journal of Verbal Learning and Verbal Behavior 21, 468–492 (1982)
16. Waltz, D.L., Pollack, J.B.: Massively parallel parsing: A Strongly Interactive Model of Natural Language Interpretation. Cognitive Science 9, 51–74 (1985)
17. Kozima, H., Ito, A.: Context-sensitive word distance by adaptive scaling of a semantic space. In: Mitkov, R., Nicolov, N. (eds.) Recent Advances in Natural Language Processing. Contemporary Issues in Linguistic Theory, vol. 136, pp. 111–124. John Benjamins, Amsterdam (1997)
18. Huberman, B.A., Hogg, T.: Phase Transition in Artificial Intelligence Systems. Artificial Intelligence 33, 155–171 (1987)
19. Perkowitz, M., Etzioni, O.: Towards adaptive Web sites: Conceptual framework and case study. Artificial Intelligence 118, 245–275 (2000)
Transfer Learning through Domain Adaptation Huaxiang Zhang Department of Computer Science, Shandong Normal University, Jinan 250014, Shandong China [email protected]
Abstract. It is interesting and helpful to use the labeled data of related tasks to improve the classification performance on another task. This paper focuses on this issue and proposes an algorithm named SSDT (Synthetic Source Data Transfer). Since the amount of training data greatly influences classification performance, we create synthetic training data using the source data and combine them with the target data to train a classifier. The classifier is applied to the target data, and experimental results show that SSDT improves the performance markedly. Keywords: KNN; data distribution; transfer learning; classification.
1 Introduction

The challenge for transfer learning is to learn and transfer knowledge from other tasks. The proposed approaches to transfer learning are categorized [1] as inductive, transductive and unsupervised transfer learning. They implement knowledge transfer by transferring part of the related-domain data to the target task [2,3,4], learning a good feature representation from the related-domain data for the target domain [5,6,7,8,9], or sharing the parameters or priors of the models learned from the related task [10,11]. In order to select out-of-domain data useful for the classification of in-domain problems, a simple framework was proposed [12] that treats the in-domain data as drawn from a mixture of two distributions: a "truly in-domain" distribution and a "general domain" distribution. Do and Ng [13] proposed an algorithm that leverages data from a variety of related classification problems to obtain a general mapping function and transfers the function to new tasks. Given labeled data in the in-domain and unlabeled data in the out-of-domain, a co-clustering based text classification algorithm (CoCC) was proposed to transfer the knowledge and class structure from the in-domain to the out-of-domain through shared words [14]; additional class-label information given by the in-domain data is extracted and used for labeling the word clusters of the unlabeled out-of-domain data. When the training data are scarce, a learned classifier will have high variance and high error, so incorporating auxiliary data may improve the classifier's performance. Two ways of incorporating the auxiliary data into a classification algorithm have been considered [15]. Some of the auxiliary data make positive contributions but others may not, so we should ignore those bad data or transform them to benefit the primary classifier. Dai et al. [3] utilized same-distribution and different-distribution labeled data to extend the AdaBoost algorithm.
In this paper, we focus on transferring labeled data from a source domain to a related target domain with scarce labeled data, and propose the SSDT algorithm.
2 Some Assumptions for Handling Data Transfer

We assume that part of the labeled target data is governed by a multivariate normal distribution, and we try to transform the labeled source data so that they can be used for the target-domain classification task.

Theorem 1 (Central Limit Theorem). Consider a set of random variables $x_1, \ldots, x_n$ drawn i.i.d., and the sample mean $\bar{x} = (\sum_{i=1}^{n} x_i)/n$; then, as $n$ approaches $\infty$, the distribution governing $\bar{x}$ approaches a normal distribution.

The source data are drawn i.i.d. from a domain governed by some unknown probability $p_s$, and the target data are drawn i.i.d. from another domain governed by a different unknown probability $p_t$. Each source and target sample can be considered as the value of a random variable $x_s$ that obeys $p_s$ or of a random variable $x_t$ that obeys $p_t$, respectively.

Assumption 1 (Feature independence). Each feature of $x_t$ (or $x_s$) is a random variable and is independent of any of the other features of $x_t$ (or $x_s$).

Based on Assumption 1, we have

$$E\{x_{ti} \cdot x_{tj}\} = E\{x_{ti}\}E\{x_{tj}\}, \qquad (1)$$

$$E\{x_{si} \cdot x_{sj}\} = E\{x_{si}\}E\{x_{sj}\}, \qquad (2)$$

where $x_{*i}$ and $x_{*j}$ ($* \in \{t, s\}$) denote two features of a sample from the target or the source domain, and $x_{*i} \cdot x_{*j}$ is the scalar product of the two.

Assumption 2 (Class label). Consider a set of instances $x_1, \ldots, x_n$ with the same class label drawn i.i.d., and define the mean $\bar{x} = (\sum_{i=1}^{n} x_i)/n$; then, as $n$ approaches $\infty$, $\bar{x}$ shares the same class label.
Over-sampling [16] is commonly used in imbalanced data classification to generate synthetic instances for the minority class, assigning the minority class label to the synthetic instances.
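As a quick numerical illustration of Theorem 1 (our own demo, not part of the paper), the sample means of i.i.d. uniform draws concentrate around the true mean, and their spread shrinks as $n$ grows:

```python
import numpy as np

# Means of n uniform draws concentrate around 0.5 with standard
# deviation roughly 1/sqrt(12*n), and their distribution approaches
# a normal shape as n grows.
rng = np.random.default_rng(0)
for n in (1, 10, 100, 1000):
    means = rng.uniform(0.0, 1.0, size=(10_000, n)).mean(axis=1)
    print(f"n={n:5d}  mean={means.mean():.3f}  std={means.std():.4f}")
```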
3 Task Formalism and Domain Adaptation We transform the labeled data from the target and source domains to one common domain, and learn a classifier using all the labeled training data. For a target task, there
exists a training data set $D_t = \{(x_{ti}, y_{ti}) \mid i = 1, \ldots, N_t\}$, where $x_{ti}$ is the $i$th example, $y_{ti}$ is its corresponding class label, and $N_t$ is the number of target training instances. The instances in $D_t$ are drawn i.i.d. from a fixed but unknown distribution $p_t$. There also exists another training data set, drawn i.i.d. from a different but related domain governed by another fixed but unknown distribution $p_s$. We call this domain the source domain and denote its data set $D_s = \{(x_{si}, y_{si}) \mid i = 1, \ldots, N_s\}$, where $x_{si}$ is the $i$th example, $y_{si}$ is its corresponding class label, and $N_s$ is the number of source training instances. We assume the training data from the target domain are scarce, and the task is to transfer the useful information underlying $D_s$ to the target task. Transforming the labeled source-domain data so that they obey a predefined multivariate normal distribution is the key to the transfer. We select a common domain where the distribution governing the underlying random variables is a multivariate normal distribution $p_c$, and transform the data drawn from the domains governed by $p_s$ and $p_t$ into this common domain.
We partition $D_t$ into different subsets such that the data in each subset share the same class label. These data subsets are denoted $D_{t1}, \ldots, D_{tl}$. For a specific subset $D_{ti}$ of $D_t$, where $i$ is the class label of the instances in $D_{ti}$, we randomly choose $m$ instances and calculate the mean of these selected instances. The mean is a synthetic datum and is considered a sample with class label $i$. Given the instance number of $D_{ti}$ (denoted $N_{ti}$), we generate $N_{ti}$ synthetic samples by the same method. After all the partitioned subsets of $D_t$ have been processed, we obtain $l$ synthetic data sets $\bar{D}_{t1}, \ldots, \bar{D}_{tl}$, each $\bar{D}_{ti}$ having $N_{ti}$ mean values as its members and $i$ as each member's class label. As $m$ approaches infinity, these synthetic data are governed by different multivariate normal distributions, and the data of the same class share one multivariate normal distribution when the feature-independence assumption holds. For example, the synthetic data of $\bar{D}_{ti}$ obey a multivariate normal distribution; the mean value is calculated as $\mu_{ti} = (\sum_{x_i \in \bar{D}_{ti}} x_i)/N_{ti}$, and the covariance $\Sigma_{ti} \in \mathbb{R}^{n \times n}$ ($n$ is the dimension of the instances) is a non-diagonal covariance matrix, calculated as

$$\Sigma_{ti} = E\Big(\sum_{x_j \in \bar{D}_{ti}} \sum_{x_k \in \bar{D}_{ti}} (x_j - \mu_{ti})(x_k - \mu_{ti})^{T}\Big).$$

The off-diagonal entries of $\Sigma_{ti}$ capture the dependencies among features. Based on Assumption 1 (which may not hold), i.e., that the features are independent, the off-diagonal entries of the matrix should be zero and only the diagonal entries are considered; we ignore the off-diagonal entries and just keep the diagonal ones.
For each member of a subset drawn from $D_t$ and each member of a subset drawn from $D_s$, we perform similar calculations and generate one corresponding synthetic data set, respectively. Each synthetic data set is characterized by two quantities: the mean value of each class and the covariance matrix. We call the above synthetic data generation approach random-mean, and for all the synthetic data sets we denote the mean values $\mu_{t1}, \ldots, \mu_{tl}, \mu_{s1}, \ldots, \mu_{sl}$ and the covariance matrices $\Sigma_{t1}, \ldots, \Sigma_{tl}, \Sigma_{s1}, \ldots, \Sigma_{sl}$. After these computations, we find that, for a specific class $i$, the distributions governing the synthetic data sets $\bar{D}_{si}$ and $\bar{D}_{ti}$ are different. We therefore perform a transformation on $\bar{D}_{si}$ to make it share a common distribution with $\bar{D}_{ti}$. For a given multivariate normal distribution with a full covariance matrix, the probability density function is

$$p(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\Big\{-\frac{1}{2}(x - \mu)^{T} \Sigma^{-1} (x - \mu)\Big\}, \qquad (3)$$
where $x$ is an $n$-dimensional random variable. If Assumption 1 is satisfied, the covariance matrix reduces to a diagonal matrix:

$$\Sigma = \begin{bmatrix} \sigma_{11}^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_{nn}^2 \end{bmatrix}. \qquad (4)$$
For each synthetic instance $x_{si}$ in $\bar{D}_{si}$, we perform the following transformation:

$$\frac{(x'_{si} - \mu_{ti})_j}{(\Sigma_{ti})_j} = \frac{(x_{si} - \mu_{si})_j}{(\Sigma_{si})_j}, \qquad j \in \{1, \ldots, n\}, \qquad (5)$$

where $(x_{si} - \mu_{si})_j$ denotes the $j$th component of the vector $(x_{si} - \mu_{si})$, and $(\Sigma_{**})_j$ refers to $\sigma_{jj}$ of $\Sigma_{**}$.
Based on transformation (5), we obtain new synthetic data $x'_{si}$, which are governed by a multivariate normal distribution with mean $\mu_{ti}$ and covariance $\Sigma_{ti}$.
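As a concrete illustration, transformation (5) amounts to a componentwise standardize-and-rescale step. The following minimal NumPy sketch is our own, assuming $(\Sigma_{**})_j$ denotes the per-feature standard deviation; the function and variable names are not from the paper.

```python
import numpy as np

def adapt_features(Xs, mu_s, sd_s, mu_t, sd_t):
    """Componentwise transformation of Eq. (5): standardize each feature
    of the synthetic source data with the source statistics, then rescale
    and shift with the target statistics of the same class.

    Xs:          (N, n) synthetic source instances of one class.
    mu_*, sd_*:  per-feature means and standard deviations, i.e. the
                 diagonal statistics of the (diagonal) covariance matrices.
    """
    return (Xs - mu_s) / sd_s * sd_t + mu_t
```

After this step the transformed instances have, by construction, the target-class mean and per-feature variance.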
Conclusion 1: Given that part of the labeled target data obeys a multivariate normal distribution with mean $\mu_{ti}$ and covariance $\Sigma_{ti}$, after the transformation given by (5), the synthetic data $x'_{si}$ and $x'_{ti}$ ($x'_{ti} \in \bar{D}_{ti}$) obey a multivariate normal distribution with mean $\mu_{ti}$ and covariance $\Sigma_{ti}$, and $x'_{si}$ can be used together with the labeled data from the target domain to construct a classifier.

The synthetic data generated using the target data will be quite similar to the original data and make little contribution to widening the decision boundaries among different classes. Inspired by the KNN (k-nearest-neighbor) algorithm, we conjecture that generating each synthetic datum as the average of its k (k = m) nearest neighbors may work better than random-mean. We call this synthetic data generation approach KNN-mean.

Table 1. Description of SSDT
Input:  $m$, the target data set $D_t$ and the source data set $D_s$
Output: a mapping function $f: x \rightarrow y$
Method:
  Partition $D_t$ and $D_s$ into data subsets $D_{ti}$ $(i = 1, \ldots, l)$ and $D_{si}$ $(i = 1, \ldots, l)$ according to the class labels of the data;
  For each $D_{ti}$ and $D_{si}$, calculate $\mu_{ti}$, $\Sigma_{ti}$, $\mu_{si}$ and $\Sigma_{si}$;
  Obtain $\bar{D}_{ti}$ by the following approach: for each instance in $D_{ti}$, generate a synthetic instance by KNN-mean;
  Obtain $\bar{D}_{si}$ by the following approach: for each instance in $D_{si}$, generate a synthetic instance by KNN-mean.
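For readers who prefer executable code, the sketch below gives one plausible reading of the SSDT pipeline, combining KNN-mean generation with the diagonal-case transformation (5). The helper names and the use of scikit-learn's NearestNeighbors are our own assumptions, not part of the original description.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_mean(X, m):
    """One synthetic instance per original instance: the mean of its
    m nearest neighbors (the instance itself included); needs m <= len(X)."""
    nn = NearestNeighbors(n_neighbors=m).fit(X)
    _, idx = nn.kneighbors(X)
    return X[idx].mean(axis=1)

def ssdt(Xt, yt, Xs, ys, m):
    """Sketch of SSDT: returns an augmented training set for the target task."""
    X_out, y_out = [Xt], [yt]
    for c in np.unique(yt):
        St = knn_mean(Xt[yt == c], m)           # synthetic target data
        Ss = knn_mean(Xs[ys == c], m)           # synthetic source data
        mu_t, sd_t = St.mean(axis=0), St.std(axis=0) + 1e-12
        mu_s, sd_s = Ss.mean(axis=0), Ss.std(axis=0) + 1e-12
        Ss = (Ss - mu_s) / sd_s * sd_t + mu_t   # Eq. (5), diagonal case
        X_out += [St, Ss]
        y_out += [np.full(len(St), c), np.full(len(Ss), c)]
    return np.vstack(X_out), np.concatenate(y_out)
```

The augmented set can then be fed to any standard learner (C4.5, naive Bayes, a neural network, etc.), as in Section 4.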
4 Experiments

We use C4.5, NB (naive Bayes classifier) and NN (neural network) to evaluate our domain adaptation approach from several angles.

4.1 Data Sets

We choose 5 UCI data sets and partition each of them into two parts: training data of the target domain and training data of the source domain. The partition method and the number of instances in each part are given in Table 2.
Table 2. Data set partition scheme

Data set        Partition attribute     Source data/num   Target data/num
Diabetes        pres                    =0 / 733          >0 / 35
Tae             Whether_of_not…         =2 / 122          =1 / 29
Heart-statlog   fasting_blood_sugar     =0 / 230          =1 / 40
Balance-scale   left-weight             >1 / 625          =1 / 125
Segment         short-line-density-5    =0 / 2032         >0 / 90
4.2 Results and Evaluation

We first run experiments to evaluate whether synthetic data can be used when constructing classifiers. We compare the accuracies of classifiers constructed from the original data and from the synthetic KNN-averaged data using C4.5, NN and NB. Random-mean data refer to synthetic data generated by calculating the mean of randomly selected data from the original data set. The results in Table 3 show that KNN-mean outperforms random-mean almost all the time, while KNN-mean performs as well as, or slightly worse than, the original data. This indicates that the synthetic data calculated by the KNN-mean approach can be adopted to construct a classifier.

Table 3. Accuracies of the classifiers (expressed in %) using original and synthetic data

                  C4.5                              NB                                NN
Data set          original random-m. KNN-m.        original random-m. KNN-m.        original random-m. KNN-m.
Heart-c           77.56    69.97     77.56         83.49    73.92     80.86         80.86    84.16     81.85
Cmc               46.44    46.44     56.55         51.39    50.78     49.69         52.34    46.91     54.72
Breast-w          94.56    78.97     93.85         95.99    91.56     94.85         95.27    94.56     92.56
Bridge-version1   57.14    68.57     72.38         67.62    63.81     78.09         69.52    79.05     84.76
Diabetes          73.83    69.79     70.96         76.30    51.56     73.83         75.39    73.57     70.44
Heart-statlog     76.67    71.48     70.37         83.70    74.81     82.96         78.15    82.22     73.70
Liver-disorders   68.70    62.61     57.39         55.36    53.62     63.19         71.59    69.86     71.88
Segment           96.92    62.64     79.83         80.25    65.71     74.17         96.06    77.88     89.61
Vehicle           72.46    47.28     59.46         44.79    44.68     50.95         81.685   64.06     74.11
We randomly select 30 instances (30T) from each target domain, and randomly select 200 synthetic instances and combine them with 30T to form a new data set (30T+200S). We train C4.5, NN and NB using 30T, 30T+200S and 30T+nS (n is the number of synthetic data of the corresponding data set: 733 for Diabetes, 122 for Tae, 230 for Heart-statlog, 625 for Balance-scale, and 2032 for Segment), and
obtain the classification accuracy on each data set for each algorithm. The number of source-domain data in Tae is only 122, so randomly selecting 200 instances from that source domain is infeasible. Each accuracy is the average of a 10-fold cross-validation result. The final results are given in Table 4.

Table 4. Accuracies of the classifiers (expressed in %) on different data sets

                  C4.5                          NB                            NN
Data set          30T    30T+200S 30T+nS       30T    30T+200S 30T+nS       30T    30T+200S 30T+nS
Diabetes          77.33  85.71    91.43        68.00  74.29    80.00        68.57  88.57    97.14
Tae               68.97  N/A      76.00        62.07  N/A      76.00        58.62  N/A      75.33
Heart-statlog     75.33  92.50    90.00        81.33  90.00    92.50        85.33  97.50    100
Balance-scale     74.00  83.33    97.14        85.33  93.33    90.95        93.55  97.50    97.14
Segment           80.00  94.78    98.33        77.33  83.33    80.00        87.46  100      100
The experimental results in Table 4 show that the classifiers using synthetic data perform much better than those not using synthetic data. This indicates that the synthetic data generated by SSDT from the source domain are helpful for classifying the target data.
5 Conclusions

Synthetic data from the source domain, combined with the inadequate target training data, widen the class boundaries of the data and make the supervised target classification task easier.

Acknowledgement. This research is partially supported by the Science and Technology Projects of Shandong Province, China (No. 2008B0026, J09LG02, ZR2010FM021, 2010G0020115).
References
1. Pan, S.J., Yang, Q.: A Survey on Transfer Learning (2008), http://www.cs.ust.hk/~qyang/publications.htm
2. Bickel, S., Bruckner, M., Scheffer, T.: Discriminative Learning for Differing Training and Test Distributions. In: 24th International Conference on Machine Learning, New York, pp. 81–88 (2007)
3. Dai, W.Y., Yang, Q., Xue, G.R., Yu, Y.: Boosting for transfer learning. In: 24th International Conference on Machine Learning, Corvallis, pp. 193–200 (2007)
4. Jiang, J., Zhai, C.X.: Instance Weighting for Domain Adaptation in NLP. In: 45th Annual Meeting of the Association of Computational Linguistics, Prague, pp. 264–271 (2007)
5. Raina, R., Battle, A., Lee, H., Packer, B., Ng, A.Y.: Self-Taught Learning: Transfer Learning from Unlabeled Data. In: 24th International Conference on Machine Learning, Corvallis, pp. 759–766 (2007)
6. Lee, S.I., Chatalbashev, V., Vickrey, D., Koller, D.: Learning a Meta-Level Prior for Feature Relevance from Multiple Related Tasks. In: 24th International Conference on Machine Learning, Corvallis, pp. 489–496 (2007)
7. Daume III, H.: Frustratingly Easy Domain Adaptation. In: 45th Annual Meeting of the Association of Computational Linguistics, Prague, pp. 256–263 (2007)
8. Blitzer, J., McDonald, R., Pereira, F.: Domain Adaptation with Structural Correspondence Learning. In: Conference on Empirical Methods in Natural Language Processing, Sydney, pp. 120–128 (2006)
9. Argyriou, A., Evgeniou, T., Pontil, M.: Multi-Task Feature Learning. In: 19th Annual Conference on Neural Information Processing Systems, Vancouver, pp. 41–48 (2007)
10. Schwaighofer, A., Tresp, V., Yu, K.: Learning Gaussian Process Kernels via Hierarchical Bayes. In: 17th Annual Conference on Neural Information Processing Systems, pp. 1209–1216. MIT Press, Cambridge (2005)
11. Bonilla, E., Chai, K.M., Williams, C.: Multi-task Gaussian Process Prediction. In: 20th Annual Conference on Neural Information Processing Systems, pp. 153–160. MIT Press, Cambridge (2008)
12. Daume III, H., Marcu, D.: Domain Adaptation for Statistical Classifiers. Journal of Artificial Intelligence Research 26, 101–126 (2006)
13. Do, C.B., Ng, A.Y.: Transfer Learning for Text Classification. In: Advances in Neural Information Processing Systems 18, pp. 299–306 (2006)
14. Dai, W.Y., Xue, G.R., Yang, Q., Yu, Y.: Co-clustering Based Classification for Out-of-domain Documents. In: KDD, California, pp. 201–219 (2007)
15. Wu, P.C., Dietterich, T.G.: Improving SVM Accuracy by Training on Auxiliary Data Sources. In: 21st International Conference on Machine Learning, Banff (2004)
16. Chawla, N.V., Hall, L.O., Bowyer, K.W., Kegelmeyer, W.P.: SMOTE: Synthetic Minority Over-sampling Technique. J. Artificial Intelligence Research 16, 321–357 (2002)
The Design of Evolutionary Multiple Classifier System for the Classification of Microarray Data Kun-Hong Liu, Qing-Qiang Wu*, and Mei-Hong Wang Software School of Xiamen University, Xiamen, Fujian Province, 361005, China {lkhqz,wuqq,wangmh}@xmu.edu.cn
Abstract. Designing an evolutionary multiple classifier system (MCS) is a relatively new research area. In this paper, we propose a genetic algorithm (GA) based MCS for microarray data classification. In detail, we first construct a feature pool with different feature selection methods, and then a multi-objective GA is applied to implement the ensemble feature selection process so as to generate a set of classifiers. We then construct an ensemble system from the individuals in the last generation in two ways: using the nondominated individuals, or using all the individuals together with a classifier selection process based on another GA. We test the two proposed ensemble methods on two microarray data sets, and the experimental results show that both methods are robust and lead to promising results. Keywords: multiple classifier system, genetic algorithm, microarray data.
1 Introduction

With the development of microarray technology, it is nowadays possible to diagnose and classify some particular cancers directly from DNA microarray data sets. Although it is important to design an efficient and accurate method for classifying cancer samples by analyzing microarray data, it is a challenging task to diagnose the cancer type precisely, because the number of variables p (genes) is far larger than the number of samples, n. Among the methods proposed to tackle this problem, the multiple classifier system (MCS) is a promising solution. It has been found that a single classifier cannot always lead to good prediction results. Instead, an MCS has proved to be more accurate and robust than an excellent single classifier in many fields [1]. Dietterich pointed out that the integration of multiple classifiers is one of the four most important directions in machine learning research [2]. The design of previous MCSs, including neural network (NN) ensembles [3], mixtures of different experts [4], and various bagging and boosting methods [5-6], mainly relied on human experience, for example in choosing the number of classifiers or the division of the training data. However, for many real-world problems it is impractical to design a robust system manually, especially when not enough prior knowledge is available, and tedious trial-and-error processes are usually involved.

* Corresponding author.
Evolutionary approaches have proved efficient in the search for high-quality results. The evolutionary MCS is a relatively new technique, and many MCSs based on evolutionary algorithms have been proposed recently. For example, Liu et al. proposed evolutionary ensembles with negative correlation learning (EENCL) to construct an ensemble system with NNs as base classifiers [7]. Chandra and Yao developed an ensemble learning system, named DIVACE, based on a multi-objective evolutionary system [8]. In these evolutionary systems, the systems are trained with the whole training data sets, and diversity in the system is encouraged by adjusting the weights of the NNs. By randomly selecting feature subsets for each base classifier, it is also possible to generate a robust MCS [9]. It is noted, however, that, similar to the common feature selection problem, the generalization performance of an MCS can be further improved by eliminating the irrelevant and redundant features and assigning different relevant feature subsets to different base classifiers, which is named ensemble feature selection [10]. For example, Kuncheva and Jain designed two methods to run a genetic algorithm (GA) for selecting feature subsets [11], and Oliveira et al. built an MCS by searching for proper feature subsets with multi-objective GAs [12]. Although feature selection is very important for microarray data classification, and finding predictive gene features can make the cancer prediction accuracy much higher, to the best of our knowledge no evolutionary MCS has been designed for the microarray data prediction problem. So we explore the ensemble feature selection problem and build an MCS for microarray data based on GAs. In this paper, a multi-objective GA based classifier ensemble approach, named GA-1, is proposed for generating a set of classifiers. GA-1 is designed to implement ensemble feature selection by selecting valuable outcomes from multiple filter feature selection methods, and to discover proper feature subsets for the ensemble of multiple classifiers. Because of the weak correlation between many of the current diversity measures and generalization performance, as pointed out in [13], no diversity measures are used in the design of GA-1, which differs from the multi-objective GAs proposed in [8, 12]. Instead, GA-1 evolves towards two objectives: maximizing the prediction accuracy of each individual and maximizing the minimum margin of the ensemble system. As pointed out in [14], pruning some classifiers of an MCS may achieve better generalization performance. We build ensemble systems from the individuals in the last generation in two ways: (a) combining the nondominated individuals to form an MCS; (b) combining all the individuals to form an MCS and then using another GA, named GA-2, to select individuals so as to construct a concise but accurate MCS. The paper is organized as follows. In Section 2, the design of the evolutionary MCS is described in detail. In Section 3, the experimental results and the corresponding discussion are given. The paper is concluded in Section 4.
2 The Design of Evolutionary MCS

GAs are inspired by the mechanisms of evolution in nature and have proved successful at tackling optimization problems. This paper proposes a GA based MCS design scheme for microarray data classification.
Because of the properties of microarray data, the design of our MCS differs from previously proposed approaches. Microarray data are usually divided into a training set (denoted by Tr) and a test set. We further divide Tr into two parts: a training subset (denoted by Tr') and a validation subset (denoted by V'), in a 2:1 proportion. The samples are assigned to Tr' and V' randomly but in a stratified way, that is, Tr' and V' contain the same proportion of samples of each class as the original training and test sets. As the number of training samples in Tr is usually around or less than 100, the number of samples in Tr' is even smaller. So a base classifier is trained with a bootstrap sample subset generated from Tr', and the performance of the MCS is then evaluated with the validation subset V'. To fully utilize the information carried by the thousands of genes, we set up a feature pool with the top genes selected by different filter methods.

2.1 The Design of GA-1

(1) The Design of the Chromosome

Microarray data sets typically contain thousands of gene features. Filter feature selection methods are efficient techniques for reducing the dimension of microarray data: with their aid, irrelevant features can be removed and important features can be identified by applying certain selection criteria. However, as all filter methods are motivated by various hypotheses which do not hold in many cases, a method performing well on one data set may be bad on others, and different methods pick quite different feature subsets from the same feature set. What's more, as the filter approach is independent of the prediction rule, it is impossible to decide directly which filter method a classifier works best with. So it is even harder to select a proper filter method for an ensemble system. Based on this consideration, we set up a feature pool with different filter methods to implement the ensemble feature selection. Eight different filter methods are applied in this paper: concave minimization based feature selection [15], Relief [16], signal-to-noise correlation coefficient [17], mutual information based [18], T-test, entropy based, Chernoff bound based and U-statistic (the last four methods are provided in the bioinformatics toolbox of Matlab v.7.1). The top 100 features of each method are kept and form a separate feature set. This is based on the observation that the top features selected by a filter method are correlated with each other, so it is quite possible that a feature subset selected from the same feature set will provide more information. As a result, in our experiments the feature subsets are selected within each set independently, without any influence on the features in other sets. Another important topic in the design of an MCS is how to select proper classifiers to construct the ensemble system. In this paper, we mainly discuss the design of MCSs with several different base classifiers and with multiple NNs; for convenience, the two design schemes are named MCS-1 and MCS-2, respectively. Taking these points into consideration, a multi-objective GA is proposed to select proper feature subsets from the feature pool to implement an ensemble feature selection process. At the same time, when designing MCS-1, the GA also tries to search
for a base classifier that matches a training set with proper feature subsets. The detailed design can be described as follows. A binary coding scheme is adopted, and the structure of the chromosome is illustrated in Fig. 1. Assume there are $N_c$ classifiers and $N_m$ filter feature selection methods, and that the length of each feature set is $N_f$; then a chromosome is constructed from these three parts, and its total length is $\lceil \log_2(N_c) \rceil + \lceil \log_2(N_m) \rceil + N_f$. In detail, the first part of the chromosome is used to choose a proper classifier, and the second is used to assign a filter method to match the classifier. After the filter method is determined, the corresponding feature set is applied to train the classifier with a feature mask, which is represented by the third part of the chromosome. In this paper, when designing MCS-1, five different base classifiers are used and eight different filter methods are applied with 100 features kept for each method, so the total chromosome length is 106 in all corresponding experiments. When designing MCS-2, the first part of the chromosome is removed, and the corresponding chromosome length is 103.
Fig. 1. The design of the chromosome for GA-1
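To make the encoding concrete, the following sketch decodes such a chromosome under the paper's setting ($N_c = 5$, $N_m = 8$, $N_f = 100$). The decoding details, including the modulo wrap for index values outside the valid range, are our own assumptions.

```python
import numpy as np

N_C, N_M, N_F = 5, 8, 100          # classifiers, filter methods, mask length
B_C = int(np.ceil(np.log2(N_C)))   # 3 bits for the classifier index
B_M = int(np.ceil(np.log2(N_M)))   # 3 bits for the filter-method index

def decode(chrom):
    """chrom: binary array of length B_C + B_M + N_F = 106 (MCS-1)."""
    bits = np.asarray(chrom, dtype=int)
    to_int = lambda b: int("".join(map(str, b)), 2)
    clf = to_int(bits[:B_C]) % N_C             # wrap out-of-range indices
    method = to_int(bits[B_C:B_C + B_M]) % N_M
    mask = bits[B_C + B_M:].astype(bool)       # mask over the top 100 features
    return clf, method, mask
```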
What’s more, it should be noted that once a bootstrap sample subset is generated to train the i-th individual, this sample subset will always be assigned to this individual in each generation, no matter the corresponding feature subset or base classifier is changed or not. (2) The design of fitness function Training classifiers with different feature subset can already ensure the diversity in the system, so in the paper [11], only the accuracy of individual classifiers was used as the fitness function. However, we find that in our experiments, when only considering the accuracy, the final MCS would unavoidably suffer overfitting due to the small sample size. And it is necessary to encourage the diversity in the design of GA. And some papers use different diversity measures as a part of the objective function. For example, in [10], the fitness function was calculated as Fitnessi=accuracyi+α × diversityi for i-th individual, which combined these two goals as one with simple linear weighted. Although the author claimed that high prediction accuracy could be achieved with this scheme, some efficient multi-objective solutions are more promising. In [8, 12], the diversity measure and prediction accuracy were regard as two objectives, and the ensemble systems were constructed based on a multi-objective GA. However, as pointed out in [13], many diversity measures are correlated to the average accuracy of the base classifiers and can not always lead to the good ensemble performance. Instead, the authors used the margin, which was introduced in [19], to analyze the success of MCS, and suggested that maximizing of minimum margin can
lead to better generalization performance of an MCS. Given $L$ base classifiers and a weight $w_j$ assigned to the $j$th classifier, where $w_j \geq 0$ and the weights sum to 1, the margin of the system on sample $x_i$ can be defined by

$$m_i = \sum_{j=1}^{L} w_j O_{ij}, \qquad (1)$$
where $O_{ij}$ equals 1 if sample $x_i$ is classified correctly by the $j$th classifier, and equals -1 otherwise. It is obvious that, in order to maximize the minimum margin, we should pay attention to the hardest samples, which are the most likely to be misclassified. In the case of a simple majority vote, the margin can be calculated as

$$m_i = \sum_{j=1}^{L} O_{ij}/L. \qquad (2)$$
So when all the hardest samples are classified correctly by as many classifiers as possible, the minimum margin is maximized. Based on this observation, GA-1, a multi-objective GA, is designed to evolve proper base classifiers. One objective of GA-1 is to maximize the prediction accuracy of each individual; the other is to maximize the minimum margin. To achieve this goal, we use a fitness-sharing function similar to the one used in [7]. In detail, this fitness function is calculated by checking the correct "covering" of the same samples by shared individuals. Let $K_{ij}$ equal 1 when sample $x_i$ is correctly classified by the $j$th individual, and 0 otherwise, and assume there are $N$ samples. If $P_i$ individuals classify $x_i$ correctly, each of these individuals gets $1/P_i$ fitness, while the others receive zero fitness. The fitness of the $j$th individual is calculated by summing the total fitness assigned to it:

$$f_j = \sum_{i=1}^{N} \frac{1}{P_i} K_{ij}. \qquad (3)$$

However, in our experiments we find that this measure is not efficient enough. Its main drawback is that it cannot force the system to classify all the samples correctly, and sometimes one or more samples cannot be correctly classified by any of the individuals in the system; in this case, the minimum margin takes the lowest value. In order to further encourage the covering of the samples, we adapt this fitness function with a penalty parameter. When $m$ samples are not correctly classified by any of the individuals in the system, the fitness values of all the individuals in this generation are divided by $(1+m)$, so formula (3) is adjusted as

$$f_j = \Big(\sum_{i=1}^{N} \frac{1}{P_i} K_{ij}\Big) \Big/ (1 + m). \qquad (4)$$

So when all the samples are correctly recognized in a population, the fitness values of all the individuals can be much higher than in the case where some samples remain misclassified by the whole population. In this way, all samples can usually be correctly classified in the final generation, and the maximized minimum margin can be achieved. GA-1 is implemented based on NSGA-II [20], and the fitness evaluation process follows the non-dominated sorting procedure. When GA-1 stops, we obtain a set of individuals with both high accuracy and maximized minimum margin. But it should be noted that one cannot expect all the individuals in the last generation to be nondominated.
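A compact sketch of these quantities is given below. This is our own illustration of Eqs. (2)-(4), assuming the correctness matrix K has been obtained by evaluating each individual on the validation subset V'.

```python
import numpy as np

def min_margin(K):
    """Minimum margin under simple majority vote, Eq. (2).
    K: (N, P) binary matrix, K[i, j] = 1 iff sample i is correctly
    classified by individual j; O_ij = 2*K_ij - 1 is the +1/-1 vote."""
    O = 2.0 * np.asarray(K, dtype=float) - 1.0
    return O.mean(axis=1).min()

def shared_fitness(K):
    """Fitness sharing with the coverage penalty of Eq. (4)."""
    K = np.asarray(K, dtype=float)
    P = K.sum(axis=1)                    # P_i: individuals covering sample i
    m = int((P == 0).sum())              # samples no individual classifies
    share = np.zeros_like(P)
    share[P > 0] = 1.0 / P[P > 0]        # each covering individual gets 1/P_i
    f = (share[:, None] * K).sum(axis=0)     # Eq. (3)
    return f / (1.0 + m)                 # Eq. (4): penalize uncovered samples
```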
There are two methods to construct an MCS: combining the nondominated individuals, or combining all individuals in the last generation. Obviously, an MCS may not always achieve the best generalization performance by simply combining all individuals in the last generation, as there are usually some dominated individuals. In fact, an unsolved problem is when to stop the evolutionary algorithm so as to guarantee that the best set of classifiers is obtained in the last generation. As shown in [14], a compact and more accurate ensemble system can usually be obtained by pruning some classifiers, and by leaving some individuals out, the questions of when and how to stop the evolutionary process become less important. So we use another GA to prune individuals from the whole population of the last generation. The two possible ensemble methods can thus be summarized as: employing the nondominated individuals, or employing the whole population with a pruning process. These two methods are compared in the next section.

2.2 The Design of GA-2

GA-2 is proposed to further select the classifiers generated by GA-1. The design scheme of GA-2 is the same as the GA proposed in [14]. In detail, GA-2 is a simple standard GA with a real-coded strategy. Each chromosome represents an ensemble system. Each gene takes a value within [0, 1], representing the weight assigned to an individual classifier, and classifiers with weights below a threshold are ignored. The prediction accuracy of the ensemble, calculated from the majority-vote results, is used as the fitness function; further details can be found in [14]. In our experiments, we find that after some generations there are usually many chromosomes achieving 100% prediction accuracy. Obviously, due to the small sample size, using only the single best individual of the last generation may lead to overfitting, and in [21] it was suggested that combining several evolved ensembles can lead to more stable results. We therefore obtain the final decision by combining all of the best individuals in the last generation. Note that each such individual already represents an MCS, so this fusion method uses a number of ensemble systems represented by the best individuals in the last generation. A sketch of the GA-2 fitness evaluation is given below.
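This sketch is our own illustration; labels are assumed to be encoded as 0..C-1, and the 0.02 threshold follows the experimental setting in Section 3.

```python
import numpy as np

THRESHOLD = 0.02  # classifiers whose evolved weight falls below this are pruned

def ga2_fitness(weights, votes, y_true):
    """Fitness of one GA-2 chromosome.

    weights: (P,) real-coded genes in [0, 1], one per classifier.
    votes:   (P, N) predicted labels of the P classifiers on V'
             (labels assumed encoded as 0..C-1 for np.bincount).
    y_true:  (N,) true labels of the validation subset.
    """
    keep = np.asarray(weights) >= THRESHOLD
    if not keep.any():
        return 0.0                      # empty ensemble: worst fitness
    sel = np.asarray(votes)[keep]
    # Unweighted majority vote over the retained classifiers.
    pred = np.array([np.bincount(col).argmax() for col in sel.T])
    return float((pred == y_true).mean())
```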
3 Experimental Results and Discussion

We use two publicly available microarray data sets: a breast cancer data set [22] and a prostate cancer data set [23]. These data sets contain 78 and 102 samples for classifier training, and 19 and 34 samples for testing, respectively. The breast cancer data set introduced in [22] contains missing values; these have been estimated from the 5% of gene expression profiles that have the largest correlation with the gene expression profile of the missing value. No further preprocessing is applied to the prostate data set. When building MCS-1, five classifiers are used: the Fisher classifier (fisher), binary decision tree (tree), nearest mean classifier (nmc), support vector classifier (svc), and the nearest neighbor classifier (1-nn). They are all provided by PRTools [24].
(a) Breast cancer data set        (b) Prostate cancer data set
Fig. 2. The prediction accuracy of the base classifiers. The eight feature sets are shown as different groups, in the sequence given in Section 2. Within each group, the prediction accuracy of the six base classifiers is shown in the sequence: fisher, tree, nmc, svc, 1-nn, NN.
When designing MCS-2, the backpropagation algorithm [25] is applied to train the neural networks; each network has one hidden layer with five hidden units, and the corresponding parameters are set to the default values of Matlab [26]. First, the prediction results based on different classifiers and different feature sets are shown in Fig. 2. From Fig. 2, it can be seen that the performance of each classifier varied greatly with different feature sets. The prediction ability of the same classifier would be quite different for the two data sets even when combined with the same feature selection method. So there is no single best classifier that always leads to the highest prediction accuracy, and as a result there is no way to foresee which classifier and feature selection method should be combined to best predict new samples in a data set. Without enough prior experience, the generalization performance of a single classifier is not good enough, and it is impractical to expect one excellent classifier to classify every microarray data set accurately. The performance of an MCS can be much better. We show the results of our evolutionary MCS with the two ensemble methods in Fig. 3. In our experiments, the population size of GA-1 is set to 50. GA-1 stops at the 20th generation, as we found no benefit from running more generations. The other parameters are set as follows: the mating pool size equals half the population size; single-point crossover and bitwise mutation are used, with crossover probability 0.9 and mutation probability 1/l (l is the string length). For GA-2, each chromosome contains 50 genes representing the 50 classifiers, and the population size is also set to 50. GA-2 stops after 100 generations. This GA is realized with the GAOT toolbox [27], and the genetic operators (selection, crossover and mutation) and all other parameters (crossover probability, mutation probability, selection pressure) are set to the default values of GAOT. The threshold is set to 0.02. In each experiment, GA-1 and GA-2 each run only once. The results are obtained over 10 independent runs, with the first populations of both GAs generated randomly.
(a) Breast cancer data set        (b) Prostate cancer data set
Fig. 3. The comparison of the different MCSs. The eight ensemble methods are shown in the sequence: boosting with NN; boosting with svc; bagging with NN; bagging with svc; MCS-1 based on GA-1; MCS-1 based on GA-2; MCS-2 based on GA-1; MCS-2 based on GA-2.
The results of bagging [5] and adaboost.M1 [6] are also shown in Fig. 3 for comparison. Because our MCSs are constructed based on the ensemble feature selection scheme, it is hard to compare these methods with boosting and bagging directly. For convenience, we applied the boosting and bagging algorithms to the two data sets with the eight feature sets respectively, and all the results are recorded and shown in Fig. 3. It should be noted that in our experiments only NN and svc are used as base learners for the two algorithms, because the corresponding results are better than with other base classifiers. From Fig. 3, we find that neither boosting nor bagging leads to the best prediction results, and only the results of boosting with NN on the prostate data set are close to those of our GA based MCSs. What's more, the performance of these two algorithms varies considerably across data sets. So again, we find that the performance of these two algorithms is greatly affected by the feature selection methods, and different base classifiers also lead to different performance. Our GA based MCSs perform better than boosting and bagging on the two data sets. We find that GA-2 does not always lead to better results, and the results obtained by GA-1 are close to, or even better than, those of GA-2, which is not consistent with the conclusion given in [14]. The reason may mainly lie in the fact that the number of nondominated individuals was usually close to or above 70% in the last generation, even though GA-1 evolved for only 20 generations. As the number of nondominated individuals is large enough, promising results can already be obtained by majority vote. GA-2 selects individuals from the whole population, which comprises both nondominated and dominated individuals, so there is no guarantee that the selected ensembles contain only nondominated individuals. What's more, due to the small sample size, GA-2 may easily suffer from overfitting, which also limits its performance. It can be expected that, with a larger sample size, the generalization performance of the MCS based on GA-2 would be better.
In conclusion, the two proposed ensemble methods are both efficient, and our evolutionary MCS approaches are promising. The evolutionary approaches lead to good prediction results despite the small sample size. At the same time, unlike bagging or boosting, their performance is not affected by the choice of feature selection method, thanks to the ensemble feature selection process. Moreover, from the results of MCS-1, we can see that our GA can efficiently select proper base classifiers, which is another advantage compared with other algorithms. So these evolutionary MCSs are robust.
4 Conclusion

This paper proposes an evolutionary MCS scheme to tackle the cancer classification problem. First, we construct a feature pool with eight different filter selection methods; then a multi-objective GA is proposed to select proper feature subsets for an ensemble system. This GA aims to maximize the individual prediction accuracy and the minimum margin. We construct an MCS from the individuals generated by this GA, and two schemes are tested: using the nondominated individuals of the last generation, or using all individuals of the last generation with a pruning process based on another GA. We find that the two methods both result in robust MCSs with similar performance in our experiments, and compared with other ensemble systems, such as bagging and boosting, our approaches are robust. Although the proposed approach is only applied to two-class prediction problems in this paper, it can be extended to multi-class microarray data by applying multi-class classifiers. In addition, the combination of MCSs and feature extraction techniques, as proposed in [28], is also a promising research field. These are our future work.

Acknowledgments. The project was supported by the Natural Science Foundation of Fujian Province of China (No. 2010J05137) and the Fundamental Research Funds for the Central Universities (No. 2010121038).
References
[1] Kuncheva, L.I.: Combining pattern classifiers: methods and algorithms. Wiley, Chichester (2004)
[2] Dietterich, T.G.: Machine learning research: four current directions. AI Magazine 18(4), 97–136 (1997)
[3] Sharkey, A.J.C.: On combining artificial neural nets. Connect. Sci. 8(3/4), 299–313 (1996)
[4] Jacobs, R.A., Jordan, M.I., Nowlan, S.J., Hinton, G.E.: Adaptive mixtures of local experts. Neural Comput. 3(6), 79–87 (1991)
[5] Breiman, L.: Bagging predictors. Machine Learning 24(2), 123–140 (1996)
[6] Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55(1), 119–139 (1997)
[7] Liu, Y., Yao, X., Higuchi, T.: Evolutionary ensembles with negative correlation learning. IEEE Trans. Evol. Comput. 4(4), 380–387 (2000)
[8] Chandra, A., Yao, X.: Ensemble learning using multi-objective evolutionary algorithms. Journal of Mathematical Modelling and Algorithms 5(4), 417–445 (2006)
[9] Ho, T.K.: The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Machine Intell. 20(8), 832–844 (1998)
[10] Opitz, D.W.: Feature selection for ensembles. In: Proc. 16th National Conf. on Artificial Intelligence, pp. 379–384 (1999)
[11] Kuncheva, L.I., Jain, L.C.: Designing classifier fusion systems by genetic algorithm. IEEE Trans. Evol. Comput. 4(4), 327–336 (2000)
[12] Oliveira, L.S., Morita, M., Sabourin, R.: Feature selection for ensembles using the multi-objective optimization approach. Studies in Computational Intelligence, vol. 16, pp. 49–74 (2006)
[13] Tang, E.K., Suganthan, P.N., Yao, X.: An analysis of diversity measures. Mach. Learn. 65, 247–271 (2006)
[14] Zhou, Z.H., Wu, J.X., Tang, W.: Ensembling neural networks: many could be better than all. Artificial Intelligence 137(1-2), 239–263 (2002)
[15] Bradley, P.S., Mangasarian, O.L.: Feature selection via concave minimization and support vector machines. In: Proceedings of the Fifteenth International Conference on Machine Learning, pp. 82–90 (1998)
[16] Kira, K., Rendell, L.: A practical approach to feature selection. In: International Conference on Machine Learning (1992)
[17] Golub, T.R., et al.: Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science 286, 531–537 (1999)
[18] Zaffalon, M., Hutter, M.: Robust Feature Selection by Mutual Information Distributions. In: Proceedings of the 18th International Conference on Uncertainty in Artificial Intelligence (2002)
[19] Schapire, R.E., Freund, Y., Bartlett, P.L., Lee, W.S.: Boosting the margin: a new explanation for the effectiveness of voting methods. Annals of Statistics 26(5), 1651–1686 (1998)
[20] Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002)
[21] Liu, Y.: How to stop the evolutionary process in evolving neural network ensembles. In: Jiao, L., Wang, L., Gao, X.-b., Liu, J., Wu, F. (eds.) ICNC 2006. LNCS, vol. 4221, pp. 185–194. Springer, Heidelberg (2006)
[22] van't Veer, L.J., Dai, H., Van De Vijver, M.J., He, Y.D., Hart, A.A.M., Mao, M., Peterse, H.L., Van Der Kooy, K., Marton, M.J., Witteveen, A.T., Schreiber, G.J., Kerkhoven, R.M., Roberts, C., Linsley, P.S., Bernards, R., Friend, S.H.: Gene Expression Profiling Predicts Clinical Outcome of Breast Cancer. Nature 415, 530–536 (2002)
[23] Singh, D., Febbo, P.G., Ross, K., Jackson, D.G., Manola, J., Ladd, C., Tamayo, P., Renshaw, A.A., D'Amico, A.V., Richie, J.P., Lander, E.S., Loda, M., Kantoff, P.W., Golub, T.R., Sellers, W.R.: Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1(2), 203–209 (2002)
[24] Duin, R.P.W., Juszczak, P., Paclik, P., Pekalska, E., de Ridder, D., Tax, D.M.J.: PRTools4, A Matlab Toolbox for Pattern Recognition. Delft University of Technology (2004)
[25] Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning internal representations by error propagation. Parallel Distributed Processing: Explorations in the Microstructure of Cognition 1, 152–161 (1986)
[26] Demuth, H., Beale, M.: Neural network toolbox for use with MATLAB. The MathWorks Inc., Natick (1998)
[27] Houck, C.R., Joines, J.A., Kay, M.G.: A genetic algorithm for function optimization: a Matlab implementation. Technical Report NCSU-IE-TR-95-09, North Carolina State University, Raleigh, NC (1995)
[28] Liu, K.H., Li, B., Zhang, J., Du, J.X.: Ensemble component selection for improving ICA based microarray data prediction models. Pattern Recognition 42(7), 1274–1283 (2009)
Semantic Oriented Clustering of Documents Alessio Leoncini, Fabio Sangiacomo, Sergio Decherchi, Paolo Gastaldo, and Rodolfo Zunino
SeaLab, DIBE, University of Genoa, via all’Opera Pia 11, 16135 Genoa, Italy {alessio.leoncini,fabio.sangiacomo,sergio.decherchi, rodolfo.zunino}@unige.it
Abstract. Semantic web-based approaches and computational intelligence can be merged to obtain useful tools for several data mining issues. In this work, a web-based tagging process followed by a validation step is carried out to tag WordNet adjectives with positive, neutral or negative moods. This tagged WordNet is used to define a semantic metric for text document clustering. Experimental results on movie reviews show that the introduced semantically oriented metric is extremely fast and, in terms of accuracy, improves on the classical frequency-based text mining metric. Keywords: Semantic Orientation; EuroWordNet; Clustering.
1 Introduction

This work couples the semantic orientation task [1] with a clustering tool [2] in order to categorize and calibrate [3] documents. When referred to a word, semantic orientation (SO) indicates "the deviation of a word from the norm for its lexical field" [1]; in this work the mood represents the semantic orientation, and it can be positive, neutral or negative. The paper shows that semantic orientation can be used to build a metric [3]. This task requires evaluating an average SO for every document, starting from the SO of each term inside the document. One of the tools used to support SO is EuroWordNet [4]: using adjectives and tagging them with their mood made it possible to obtain a mood-tagged version of EuroWordNet. The tagging process is human-based and was carried out with the generous help of many students of the University of Genoa [5]; the values obtained from the tagging process were evaluated and validated in order to avoid inconsistencies among EuroWordNet synsets, with the goal of discriminating positive from negative or neutral feelings. The paper is organized as follows: Section 2 describes the existing categorization framework on which this work is built and the EuroWordNet semantic network; Section 3 describes how semantic orientation is embedded in the clustering engine; Section 4 discusses the experimental results obtained on a movie review database [6].
2 Text Clustering Engine

The reference text clustering engine is SLAIR [2], a versatile framework whose software components are easily customizable. SLAIR is essentially a clustering
tool based on Kernel K-Means; additionally, the usual pre-processing steps involved in text mining applications, such as stemming, punctuation removal and stop-word removal, are available. SLAIR can also perform queries to the EuroWordNet semantic network; these queries are used to obtain a semantic metric. The text document representation used for clustering in SLAIR is the vector space model (VSM) [7]. SLAIR also couples the classical frequency-based metric with a style-based metric, which improves clustering performance [3]. The text clustering process is based on the following phases:

1. Stemming is obtained using the word morphing functions provided with the original WordNet English package [8].
2. Stop-word removal is performed using a simple list of frequent words with no discriminative effect.
3. A vocabulary is built from all the stemmed words of all the analyzed documents.
4. If enabled, the frequency-based representation is mapped to a semantic one [2] by using EuroWordNet; this feature is used in this work to obtain an SO score.
5. Finally, hierarchical Kernel K-Means [9], with dynamic branch creation on the cluster tree, is run.
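SLAIR itself is not publicly scripted here, so the following Python sketch only mimics phases 1-3 and 5 with off-the-shelf components (NLTK's WordNet morphing and stop-word list, scikit-learn's vectorizer, and plain K-Means standing in for the hierarchical kernel variant); it is an illustration, not the actual engine.

```python
# Requires the NLTK "wordnet" and "stopwords" corpora to be downloaded.
from nltk.corpus import stopwords, wordnet as wn
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

STOP = set(stopwords.words("english"))

def preprocess(doc):
    # Phases 1-2: WordNet's morphy as a stemmer, then stop-word removal.
    words = (wn.morphy(w.lower()) or w.lower() for w in doc.split())
    return " ".join(w for w in words if w not in STOP)

def cluster(docs, k=2):
    # Phase 3: vocabulary and term-frequency vectors (the VSM).
    X = CountVectorizer().fit_transform(preprocess(d) for d in docs)
    # Phase 5: flat K-Means as a stand-in for hierarchical kernel K-Means.
    return KMeans(n_clusters=k, n_init=10).fit_predict(X)
```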
2.1 The EuroWordNet Semantic Network

EuroWordNet [4] is a multilingual lexical database built by researchers from several European countries. The structure of EuroWordNet was inspired by the Princeton WordNet [10] project. Both EuroWordNet and WordNet contain many concepts belonging to all the lexical categories, or parts of speech (POS): nouns, verbs, adjectives and adverbs. The fundamental building block of these databases is the "synset": a set of words which share the same meaning, e.g. {"car", "auto", "automobile", "machine", "motorcar"}. Synsets can be related to each other by semantic relations and can belong to hierarchies of concepts; depending on the lexical category, synsets have different semantic relations. Another very interesting feature of EuroWordNet is the Inter-Lingual Index (ILI), whose purpose is to provide an efficient mapping across languages. The EuroWordNet database for English is a mapping of the original WordNet 1.5 born in Princeton; hence the English database is the most fine-grained one, and for this reason it was also the basis for the creation of the ILI. The EuroWordNet semantic network makes it possible to retrieve, for every term in a document, a synset with words linked in several ways to the queried term.
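EuroWordNet requires a license, but the same synset structure can be explored with the freely available Princeton WordNet through NLTK; the small example below is ours and only illustrates the kind of queries involved.

```python
from nltk.corpus import wordnet as wn  # Princeton WordNet via NLTK

# Synsets group synonymous words; semantic relations link synsets.
for syn in wn.synsets("good", pos=wn.ADJ)[:3]:
    lemmas = [l.name() for l in syn.lemmas()]
    antonyms = [a.name() for l in syn.lemmas() for a in l.antonyms()]
    print(syn.name(), lemmas, "antonyms:", antonyms)
```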
3 Semantic Orientation Tagging and Metric

This work makes two main contributions: the mood tagging of EuroWordNet and a mood-guided metric for clustering. The next subsections elucidate these aspects.
3.1 Tagging EuroWordNet

In order to tag EuroWordNet with positive/neutral/negative sensations, an offline preprocessing phase for the assignment of the semantic orientation labels was required. Starting from the 112,641 synsets of EuroWordNet's English database, only those belonging to the adjective lexical category were chosen for tagging. The synsets containing words with an initial capital letter (e.g. those concerning nationalities) were considered non-taggable. After the selection of all the WordNet adjective synsets, 15,726 synsets were obtained; a human-based online [5] categorization into positive/neutral/negative sensations was made possible by the generous contribution of many students of the Engineering Faculty of the University of Genoa. This approach has pros and cons: on the one hand, web-based tagging allows a large number of users to tag adjectives; on the other hand, the process is not supervised and inconsistencies among synonyms can appear. A first way to address this problem is to repeat the full tagging several times (i.e., three), thus obtaining redundant scores for every synset and minimizing, as much as possible, the subjectivity of the results. A manual check of randomly picked samples confirmed the quality of the votes corrected in this way. Table 1 shows the answers obtained from the volunteers.

Table 1. Semantic orientation categorization (percentages) based on the students' answers

Adjective synsets   Positive sensation   No sensation    Negative sensation
15,726              3045 (19.4%)         8966 (57.0%)    3715 (23.6%)
3.2 Validation of the Tagging Process

This first stage gave a preliminarily tagged EuroWordNet. In order to further increase the robustness of the tagging, a validation stage was adopted. The synsets filled with adjectives are mainly characterized by two semantic relations: "synonymy" (indicating the same concept) and "antonymy" (indicating the opposite concept). Hatzivassiloglou and McKeown [11], in their work on the prediction of SO, claim that "if we know that two words are related to the same property but have different orientations, we can usually infer that they are antonyms"; in the current work, the opposite problem is present: the EuroWordNet databases provide reliable antonym relations, and on the basis of these relations one can validate the tagging process, hence eliminating possible inconsistencies among synsets. The central idea on which validation is based is that redundancy allows possible inconsistencies to be fixed: operationally, when a synset is under validation, all its synonyms and antonyms are analyzed and the consistency of the links among synsets is checked; if an antonym is found, its vote is inverted, that is, a positive SO becomes negative and vice versa, while a neutral vote reasonably remains neutral. If the synset under test is not consistent with the majority of the relations to the other synsets, then its SO is set using the majority vote induced by the other synsets. In this procedure one supposes that all the synsets surrounding the synset under analysis
Fig. 1. Synset validation example: the synset under test is inconsistent with respect to all its neighbors, so its tag is swapped accordingly from negative to positive.
are correctly tagged: this assumption is approximately correct even for a roughly tagged network, because a synset is usually linked to a large number of synsets, and one can reasonably suppose that this high number of links allows a robust validation. Figure 1 illustrates the procedure; in that case a synset was tagged by the humans in a way inconsistent with the other synsets, so its mood is changed accordingly. The full procedure can be summarized in the following pseudo-code:

  For each synset s with a vote v, create a vector V:
    Get from EuroWordNet the synonyms S of s;
    Append the vote v of each element of S to V;
    Get from EuroWordNet the antonyms A of s;
    Invert the vote v of each element of A, obtaining v';
    Append the vote v' of each element of A to V;
  End for.

  For each synset s with an array of votes V:
    Count v+, v-, v0: the number of positive, negative and neutral votes;
    Find the majority vote v* and assign it to s;
    If tie, the human vote wins;
  End for.

After validating the human-tagged adjectives one can further increase the number of tagged adjectives: indeed, all the available synonyms and antonyms linked to the validated adjectives can be safely tagged. The results of the validation phase and of this further tagging are shown in Table 2.

Table 2. Results of the WordNet processing and validation of the students' answers

  Adjective synsets   Positive sensation   No sensation    Negative sensation
  15,842              4,010 (25.3%)        7,371 (46.5%)   4,461 (28.2%)
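As an illustration only, the validation pass above might be rendered in Python as follows (a minimal sketch; the vote map and the EuroWordNet accessor functions synonyms() and antonyms() are hypothetical placeholders):

  def validate_synset(s, vote, synonyms, antonyms):
      # Collect neighbour votes: synonyms keep their vote,
      # antonyms have their vote inverted (neutral stays neutral).
      votes = [vote[t] for t in synonyms(s)]
      votes += [-vote[t] for t in antonyms(s)]
      pos, neu, neg = votes.count(+1), votes.count(0), votes.count(-1)
      best = max(pos, neu, neg)
      if [pos, neu, neg].count(best) > 1:
          return vote[s]          # tie: the human vote wins
      return {pos: +1, neu: 0, neg: -1}[best]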
From the results in Table 2 one sees that the number of concepts expressing a positive sensation has become closer to the number of concepts expressing negative sensations with respect to Table 1; more importantly, the share of neutral adjectives has shrunk by 10.5 percentage points. This can be interpreted as a good result, as it is reasonable to suppose that each positive adjective roughly has a negative counterpart.

3.3 Semantically Oriented Metric
After the validation process each adjective synset carries a single vote, and these votes can be used by the clustering tool to define a metric. In particular, every document $d_i$ is associated with a quantity $m_i$, the mean mood of the adjectives present in the document, where a positive adjective is encoded as +1, a negative one as -1, and a neutral one as 0; thus $m_i$ lies in the range [-1, +1] and represents the mood of the document $d_i$. This mood can be parameterized by adopting a strategy reminiscent of an activation function: $f(m_i)$ is a function that, using a threshold $\tau$, can mitigate or emphasize the positive and negative mood. It is defined as:

$$f(m_i) = \begin{cases} +1, & \text{if } m_i > \tau \\ m_i, & \text{if } m_i \in [-\tau, +\tau] \\ -1, & \text{if } m_i < -\tau \end{cases}$$

In the limit case $\tau = 0$ one obtains hard thresholding: $f(m_i)$ is a binary function whose only available values are +1 and -1; this limit case enforces a very crisp solution in which positively and negatively tagged documents are highly unrelated. The opposite case is $\tau = 1$, for which, due to the range [-1, +1] of $m_i$, $f(m_i) = m_i$ and the document mood varies smoothly. Once the $f(m_i)$ scalars are computed, they induce a simple Euclidean metric: given $d_i$ and $d_j$, the distance is $D(d_i, d_j) = (f(m_i) - f(m_j))^2$. This parameterization makes it possible to define different metrics that induce different clustering results, in which the concentration of the documents varies: in the limit case $\tau = 0$, at least in theory, only two clusters are possible, the positive and the negative one; in the other limit case $\tau = 1$ the clustering tool is free to produce any result. To some extent the role of $\tau$ is that of a regularizer which controls the complexity of the solution, i.e. the number of clusters: the more concentrated the metric (small $\tau$), the smaller the number of clusters.
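As a minimal sketch, the thresholded mood and the induced distance can be written as follows (Python; the per-document adjective votes, encoded as +1/0/-1, are assumed to be already available):

  def mood(votes):
      # Mean mood of the adjectives found in a document, in [-1, +1].
      return sum(votes) / len(votes) if votes else 0.0

  def f(m, tau):
      # Activation-like thresholding parameterized by tau.
      if m > tau:
          return 1.0
      if m < -tau:
          return -1.0
      return m

  def distance(m_i, m_j, tau):
      # Squared (one-dimensional Euclidean) distance between moods.
      return (f(m_i, tau) - f(m_j, tau)) ** 2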
4 Experimental Results
The aim of this section is to compare the developed metric with the default SLAIR metric [3] within a suitable text mining domain. Since the object of this study is the mood, or the affective side of texts, a suitable domain is offered by the movie review database [6]; the database used is the polarity dataset v1.0, freely available at
Table 3. Comparison between the default SLAIR metric and the semantically oriented (SO) metric parameterized by τ

  Metric    τ      Time      Accuracy   Number of clusters
  Default   -      9531 ms   67.71%     149
  SO        0      656 ms    64.86%     3
  SO        0.01   562 ms    65.50%     9
  SO        0.1    562 ms    69.00%     64
  SO        0.2    578 ms    69.57%     101
  SO        0.5    672 ms    69.43%     136
  SO        0.75   610 ms    69.93%     134
  SO        1      562 ms    70.36%     139
http://www.cs.cornell.edu/people/pabo/movie-review-data/. It collects 1400 movie reviews pre-classified into positive and negative reviews; the purpose of the experiments is to understand whether a semantically oriented metric can be used instead of the usual term-frequency metric in a context where the goal is classifying documents into two specific categories, positive and negative. Another aspect that is assessed is the influence of the threshold τ on the clustering results. Table 3 compares the default SLAIR metric with the SO metric for varying threshold values. The comparison is performed in terms of number of clusters, classification accuracy and time needed to perform clustering. The first aspect that emerges is that the new metric is extremely fast: computing a distance value between two documents only requires a difference and a square, as opposed to the classical frequency-based distance, which is much more expensive. In particular, the semantic metric is at least one order of magnitude faster than the usual text mining metric. From the accuracy point of view the newly defined metric is very effective too: with only 64 clusters its accuracy is already higher than that obtained by the usual metric with 149 clusters. The experiments also make clear the role of the threshold τ: when τ = 0 only 3 clusters are obtained, of sizes 648, 750 and 2. This means that the clustering engine has split the data almost exactly into two clusters, giving a highly crisp result. For increasing values of τ both the number of clusters and the accuracy grow accordingly, while the execution time is not affected by the increased number of clusters: this happens because the total clustering time is dominated by distance computation and not by the Kernel K-Means iterations. Finally, when τ = 1 the accuracy is 70.36%, higher than the 67.71% achieved by the SLAIR baseline metric.
5 Conclusions
The proposed work shows how an affective technique called semantic orientation can be embedded into a clustering tool for text mining. A preparation phase involved many students from the University of Genoa to build, in a social, web-based way, a database containing adjective-sensation pairs. To validate the obtained results, a method was
proposed to improve the consistency of the first tagging stage, robustly correcting and validating the initial human-based process. Semantic orientation was then used to define a metric for clustering. The obtained results showed that the metric is extremely fast and gives improved accuracy with respect to the usual term-frequency metric. Future work will study other variants of the metric and other semantic orientations.
References
1. Lehrer, A.: Semantic Fields and Lexical Structure. North Holland, Amsterdam (1974)
2. Sangiacomo, F., Leoncini, A., Decherchi, S., Gastaldo, P., Zunino, R.: SeaLab Advanced Information Retrieval. In: Proc. of IEEE International Conference on Semantic Computing, pp. 444-445 (2010)
3. Decherchi, S., Gastaldo, P., Zunino, R.: K-Means Clustering for Content-Based Document Management in Intelligence. In: Solanas, A., Bellesté, A.M. (eds.) Advances in Artificial Intelligence for Privacy Protection and Security. World Scientific Publishing, Singapore (2009)
4. Vossen, P.: EuroWordNet: A Multilingual Database with Lexical Semantic Networks. Kluwer Academic Publishers, Dordrecht (1998)
5. SeaLab Natural Language Processing Test, http://effimero-esng.dibe.unige.it
6. Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up? Sentiment Classification Using Machine Learning Techniques. In: Conference on Empirical Methods in Natural Language Processing (2002)
7. Salton, G., Wong, A., Yang, L.S.: A Vector Space Model for Information Retrieval. Journal Amer. Soc. Inform. Sci. 18, 613-620 (1975)
8. WordNet morphology functions, http://wordnet.princeton.edu/wordnet/documentation/
9. Zhang, R., Rudnicky, A.: A Large Scale Clustering Scheme for Kernel K-Means. In: ICPR 2002, pp. 289-292 (2002)
10. Miller, G.A.: WordNet: A Lexical Database for English. Communications of the ACM 38(11), 39-41 (1995)
11. Hatzivassiloglou, V., McKeown, K.R.: Predicting the Semantic Orientation of Adjectives. In: Proc. of 35th Annual Meeting of the ACL, pp. 174-181 (1997)
Support Vector Machines versus Back Propagation Algorithm for Oil Price Prediction

Adnan Khashman¹ and Nnamdi I. Nwulu¹,²

¹ Intelligent Systems Research Group (ISRG)
² Department of Electrical and Electronic Engineering
Near East University, Lefkosa, Mersin 10, Turkey
[email protected], [email protected]
Abstract. The importance of crude oil in the world economy makes it imperative that efficient models be designed for predicting future prices. Neural networks can be used as prediction models; thus, in this paper we investigate and compare the use of a support vector machine and a back propagation neural network for the task of predicting oil prices. We also present a novel method of representing the oil price data as input to the neural networks, by defining input economic and seasonal indicators which could affect the oil price. The oil price database is publicly available online: the West Texas Intermediate crude oil price dataset, spanning a period of 24 years. Experimental results suggest that neural networks can be efficiently used to predict future oil prices with minimal computational expense. Keywords: Crude Oil, Price Prediction, Neural Networks, Back Propagation Algorithm, Support Vector Machines, Radial Basis Function.
1 Introduction
Predicting oil prices is by no means an easy task, as the price of oil is highly non-linear and the factors that affect it are difficult to model mathematically. In general, research into predicting the price of crude oil can be classified into two categories: research using econometric models or tools [1], [2], and research using soft computing methods, in particular artificial neural networks [3], [4], [5], [6]. A recent example of using support vector machines (SVM) for the price prediction task is the work in [3], where the authors obtain daily West Texas Intermediate (WTI) oil prices from 2002 to 2008. The data is fed to a Slantlet algorithm which extracts various features as input to a hybrid system consisting of an SVM and an Auto-Regressive Moving Average model. Artificial neural networks were also used to perform this task in another work [5], where the empirical mode decomposition (EMD) technique was used. The EMD uses the Hilbert–Huang transform, which decomposes a time series into intrinsic mode functions (IMFs). A feedforward neural network is then used to model the decomposed IMFs and residual components. The results of this neural network are then fed to an adaptive linear neural network which functions as an integrator. The final model is used to forecast both Brent and WTI crude oil prices. The correct prediction rate in this work, when using the neural network, was 69%.
More recently, in [6] the authors developed an adaptive neuro-fuzzy inference system that predicts one day ahead whether the WTI price of crude oil is going to rise or fall. This hybrid system uses the hybrid learning algorithm and historical data from 2004 to 2007 for training, while testing was done throughout the month of May 2007 and also from May to June 2008. The correct prediction rate in this recent work was reported as 66.67%. Considering the scarcity of neural network models for oil price prediction, and the prediction results obtained by the few existing models described above, there is a need for more efficient neural prediction models, which is what we propose and investigate in this work. The two neural models we propose are an SVM model and a BP model. The paper is organized as follows: in Section 2 we briefly describe SVM models. In Section 3 the crude oil price dataset used in this work is introduced and our novel method to prepare indicators and features from the dataset is described. In Section 4 we describe the oil price prediction system, with a detailed description of the data pre-processing phase and of the structure and topology of the SVM and BP neural models. In Section 5 the experimental results of training and validating both models are discussed. Finally, Section 6 concludes the work.
2 Support Vector Machines
These are supervised learners founded upon the principles of statistical learning theory. An SVM obtains results by nonlinearly mapping the inner product of a feature space onto the original space, using the kernel as a tool. This makes it well suited to handle the problems of overfitting and local minima, because the solution of SVM training is globally unique and depends only on a small subset of training data points referred to as support vectors. SVM also excels at learning in high-dimensional spaces with a small number of training examples. There are four basic kernel types currently in use with SVM: the linear kernel, the polynomial kernel, the radial basis function (RBF) kernel, and the sigmoid kernel. In this work the RBF kernel is used with the LIBSVM software package [7]. The RBF kernel is given by:

$$K(x, y) = \exp(-\gamma \|x - y\|^2), \quad \gamma > 0 \tag{1}$$
Our choice of the RBF kernel is due to its having fewer hyper-parameters than other kernels and its capability to handle cases where the relationship between class labels and attributes is highly non-linear, as in crude oil price prediction. The two parameters of the RBF kernel used to control the generalization capability are gamma (γ) and C (the cost parameter of the error term); both are typically greater than zero. The basic learning procedure using the RBF kernel in SVM is outlined in the following steps:
1. Employ data pre-processing via scaling or normalization (in this work all features were scaled to values between 0 and 1).
2. Utilize cross validation to determine the best parameters (C and γ).
3. Use the obtained parameters (C and γ) to train on the dataset.
4. Test the trained model on the testing dataset.
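The four steps can be reproduced, for instance, with scikit-learn standing in for LIBSVM (a sketch with dummy stand-in data; the grid shown is deliberately smaller than the paper's search range):

  import numpy as np
  from sklearn.model_selection import GridSearchCV
  from sklearn.svm import SVC

  # Dummy stand-ins for the scaled feature matrix and class labels
  # (step 1 of the procedure is assumed already done).
  rng = np.random.default_rng(0)
  X_train = rng.random((100, 8))          # 8 features scaled to [0, 1]
  y_train = rng.integers(0, 5, size=100)  # price-range class codes

  # Step 2: cross-validated search for the best (C, gamma) on a log2 grid.
  search = GridSearchCV(
      SVC(kernel="rbf"),
      param_grid={"C": 2.0 ** np.arange(-5, 16, 2.0),
                  "gamma": 2.0 ** np.arange(-15, 4, 2.0)},
      cv=5,
  )
  search.fit(X_train, y_train)  # steps 3-4: train with best (C, gamma), then test
  print(search.best_params_)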
3 Dataset Information Attributes
The dataset used in this work is the Energy Information Administration dataset maintained by the American government and available online [8]. This dataset, which comprises weekly spot prices of West Texas Intermediate (WTI) crude oil, spans January 3, 1986 to December 25, 2009 (a total of 1252 observations). The performance and efficiency of any soft computing scheme is very dependent on its input features. Part of the novelty of this work is the formulation of input training values, or features, from the oil price dataset; in other words, the input data coding. The following are the input features or attributes which we propose:
1. Years: ranging from 1 to 24. The year 1986 corresponds to 1, 1987 to 2, and so on until 2009, which is represented by 24. This feature accounts for trend.
2. Seasonal demand: there is seasonality in the demand for oil, where demand in colder months is usually higher than in warmer months. We therefore assume two values: '1' to represent the warmer months spanning April to September, and '2' to represent the colder months spanning October to March.
3. Average price of the previous week: the average price of the previous week is an input for the present week. This attribute is aimed at providing continuity of oil prices across the weeks.
4. Total number of weeks: each week throughout the entire dataset is represented numerically by values from 1 to 1252.
5. Yearly number of weeks: each week within each year in the dataset is numerically represented by numbers 1 to 53. Usually a year has 52 weeks, but some years have 53, thus the maximum value for this attribute is 53. This is another feature that accounts for seasonality.
6. World events impact factor (WEIF): this novel attribute is aimed at accounting for the impact of political, economic and other external events on the oil price market. With such effects being unpredictable, we use random numerical values ranging from 0.1 to 0.5, with 0.5 being a major world event that could impact oil prices and 0.1 a minor one.
7. Global demand: oil consumption has increased over the years. This has been partly ascribed to the world's increasing demand for oil, increasing Gross Domestic Product (GDP) and population growth. This attribute has numerical values ranging from 1 to 7.
8. NYMEX future contract prices: it has been suggested in previous works that there is a relationship between spot prices and New York Mercantile Exchange (NYMEX) future prices [9], [10], [11], where future contract prices are often indications of the direction spot prices will take. We therefore include this attribute in our prediction model design.
The above input attributes or features are considered sufficient in this work, as they include both economic and trend indicators that could affect the oil price. These eight attributes are used as input to both the SVM and ANN models during training and testing/validation. In total there are 1252 observations spanning the 24
years indicated previously. Each entry in the dataset contains one set of input features and its corresponding output (the oil price). The dataset is divided equally into 626 observations for training the models and 626 observations for testing or validating the trained models, corresponding to a training-to-testing ratio of 50%:50%. It should be noted that the training-to-testing ratios of prior works ranged from an unbalanced 88.2%:11.8% in [4] to 60%:40% in [3]. We believe an equal training-to-testing ratio ensures more robust learning.
4 Modelling the Oil Price Prediction System
The developed system consists of two key phases: a scaling or normalization phase and a learning phase. The scaled features obtained in the first phase serve as input values to the second phase.

4.1 Data Processing Phase
In this phase we employ feature scaling/normalization and output representation. Firstly, the values of the eight input features are scaled to values between 0 and 1. Secondly, the output values (predicted oil price) for training and testing are coded into suitable output classes. The scaling technique used in this work is based on finding the maximum value of each input feature over all 1252 observations in the dataset, and dividing all the values of that same feature by this maximum. Table 1 shows the maximum values for the eight input features. For the second stage, instead of training the SVM and ANN models with the exact output value (weekly oil price), we defined 20 weekly average price ranges of US$5/barrel each as the output, thus resulting in 20 classes for our models. The motivation behind this output representation is to provide a degree of flexibility to the predicted price and also to improve learning. Table 2 shows the output price intervals and their respective output classes.

Table 1. Maximum values for each input feature over the 1252 observations; these values are used in the normalization of the input data prior to training/testing

  Input attribute   1    2   3        4    5      6     7   8
  Max. value        24   2   142.52   53   1252   0.5   7   142.46
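A sketch of the described max-value scaling (each feature column divided by its own maximum over the 1252 observations):

  import numpy as np

  def max_scale(X):
      # X: (n_observations, n_features) raw attribute matrix.
      col_max = X.max(axis=0)        # the per-feature maxima of Table 1
      return X / col_max, col_max    # keep col_max to scale future inputs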
4.2 SVM Learning Phase
For SVM learning, the C-SVM model with an RBF kernel was used. In order to find suitable parameters (C and γ) for the RBF kernel, we perform a parameter search using v-fold cross validation, a technique used to avoid overfitting. In v-fold cross validation the training set is first divided into v subsets of equal size; sequentially, each subset is tested using the SVM classifier trained on the remaining (v - 1) subsets. The cross-validation correct prediction rate (CPR) is the percentage of data which are correctly predicted.
Table 2. The system output price range coding and representation

  Weekly price range (US$/barrel)   SVM output   ANN output
  0 - 15                            1            10000000000000000000
  16 - 20                           -1           01000000000000000000
  21 - 25                           2            00100000000000000000
  26 - 30                           -2           00010000000000000000
  31 - 35                           3            00001000000000000000
  36 - 40                           -3           00000100000000000000
  41 - 45                           4            00000010000000000000
  46 - 50                           -4           00000001000000000000
  51 - 55                           5            00000000100000000000
  56 - 60                           -5           00000000010000000000
  61 - 65                           6            00000000001000000000
  66 - 70                           -6           00000000000100000000
  71 - 75                           7            00000000000010000000
  76 - 80                           -7           00000000000001000000
  81 - 85                           8            00000000000000100000
  86 - 90                           -8           00000000000000010000
  91 - 100                          9            00000000000000001000
  100 - 110                         -9           00000000000000000100
  111 - 120                         10           00000000000000000010
  >120                              -10          00000000000000000001
The parameters which produce the best cross-validation CPR are saved and then used to train the SVM learner. The saved model is then applied to the out-of-sample data (testing set). In this work v = 5. The parameter search range for C was 2^-50 to 2^50, while for γ it was 2^-15 to 2^15. The best C obtained was 2965820, while γ was 0.00195313. These values of C and γ were then used to train the SVM learner.

4.3 ANN Learning Phase
The ANN is based on the back propagation learning algorithm. The input layer has eight neurons, one per input feature; each input neuron receives a normalized feature value. There is one hidden layer containing 10 neurons, a number determined after several experiments in which the number of hidden neurons was varied from one to 80. The output layer has 20 neurons, one per prediction price range or interval. During the learning phase, the learning coefficient and the momentum rate were adjusted over various experiments, achieving an error value of 0.0053, which was considered sufficient for this application. The final training parameters for this neural model were: learning coefficient 0.008459 and momentum rate 0.613. The initial weights of the neural network were randomly generated with values between -0.35 and +0.35.
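As a rough illustration of the 8-10-20 topology and training parameters described above, an equivalent network could be configured as follows (scikit-learn's MLPClassifier standing in for the authors' MATLAB implementation, so training behaviour will not be identical):

  from sklearn.neural_network import MLPClassifier

  # 8 inputs -> 10 sigmoid hidden neurons -> 20 output classes,
  # gradient descent with momentum, mirroring the paper's BP setup.
  ann = MLPClassifier(
      hidden_layer_sizes=(10,),
      activation="logistic",
      solver="sgd",
      learning_rate_init=0.008459,
      momentum=0.613,
      max_iter=31950,
  )
  # ann.fit(X_train_scaled, y_train)   # y_train: the 20 price-range classes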
5 Experimental Results and Discussion
In this work, the experimental results were obtained using a 2.2 GHz PC with 2 GB of RAM, Windows XP, LIBSVM v2.9.1 for the SVM model and MATLAB v7.9.0 for the ANN model. The dataset used in this paper totals 1252 observations comprising the weekly WTI crude oil prices from January 3, 1986 to December 25, 2009. For both SVM and ANN models, a training-to-validation ratio of 50%:50% was used, where the last 626 observations (period: January 2, 1998 to December 25, 2009) were used for training, while the first 626 observations (period: January 3, 1986 to December 26, 1997) were used for validating the trained models. The reason for using the later observations (the second period) for training rather than validation is explained below with the aid of Fig. 1, which shows the weekly oil prices from January 1986 until December 2009.
Fig. 1. Graph of WTI Spot Prices (January 03, 1986 to December 25, 2009) [8]
• It is obvious from the graph that the second period is more chaotic and the price range is more varied; training a soft computing model with data from that period exposes the model to these variations, making it more robust when used for prediction. The first period prices are more stable and show less range variety.
• The oil price market is ever evolving, so it is more reasonable to train the soft computing model with data from more recent periods in order to achieve a more accurate prediction of current and future prices.
For the trained SVM model the implementation results were as follows: the training dataset (the later 626 observations) yielded a 92.7316% correct prediction rate (CPR), while testing the trained SVM model on the testing dataset (the earlier 626 observations) yielded a CPR of 69.8211%. Combining the training and testing CPRs yields an overall CPR of 81.27635%. The SVM model generalized the training data in 0.95 seconds. Table 3 lists the final parameters and the CPRs of the successfully trained SVM model. The ANN model learnt and converged after 31950 iterations within 522.8 seconds, whereas the running time for the trained ANN, using one forward pass, was
Table 3. Final parameters and CPRs of the SVM prediction model

  Number of features                8
  Number of classes                 20
  C parameter search range          2^-50 - 2^50
  γ parameter search range          2^-15 - 2^15
  C                                 2965820
  γ                                 0.00195313
  v                                 5
  Type of SVM used                  C-SVM
  Kernel                            RBF
  Training optimization time (s)¹   0.95
  Training dataset CPR              92.7316%
  Validation/test dataset CPR       69.8211%
  Overall CPR                       81.27653%

  ¹ Using a 2.2 GHz PC with 2 GB of RAM, Windows XP OS and LIBSVM v2.9.1.
Table 4. Final parameters and CPRs of the ANN prediction model

  Number of input neurons       8
  Number of hidden neurons      10
  Number of output neurons      20
  Learning coefficient          0.008459
  Momentum rate                 0.613
  Obtained error                0.0053
  Performed iterations          31950
  Training time (s)¹            522.8
  Test time (s)¹                1.7×10⁻⁴
  Training dataset CPR          93.18%
  Validation/test dataset CPR   76.2%
  Overall CPR                   84.69%

  ¹ Using a 2.2 GHz PC with 2 GB of RAM, Windows XP OS and MATLAB v7.9.0 (R2009b).
1.7×10⁻⁴ seconds. Table 4 lists the final parameters and the CPRs of the successfully trained ANN. The implementation results of the trained ANN model were as follows: the training dataset yielded a 93.18% CPR, whereas the validation/testing dataset yielded a CPR of 76.2%. Combining the training and validation prediction results yields an overall CPR of 84.69%. Comparing both results shows that the ANN prediction system achieves a better CPR while the SVM learner requires less time. It is our opinion, however, that the ANN prediction system would be more suitable for practical applications of crude oil price prediction. Although the SVM learner produced satisfactory results with shorter
training times, the reported training time does not include the time taken for cross validation (the search for the best RBF parameters C and γ), which is an additional time-consuming procedure. When comparing both models to previously suggested oil price prediction models (69% in [5], 66.67% in [6]), we find that both models suggested in this paper outperform recent previous works. The superiority of our models is evident in the higher CPRs, the use of balanced training-to-validation data, and the utilization of the longest period span (24 years of records) of the WTI crude oil price database. Having said this, if we were to choose the better performer of the two models presented in this work, it would be the ANN model.
6 Conclusion
This paper investigates the application of SVM and ANN models to the task of predicting crude oil prices. This task has been addressed in a few previous works using either econometric models or soft computing methods. In this work, we used WTI average weekly crude oil prices over the past 24 years as the dataset for developing the prediction models. The outputs of both the SVM and ANN prediction models provide the weekly average price for crude oil within US$5/barrel accuracy. The implementation of either prediction model comprises two phases: the data processing phase and the learning phase. During the first phase, the information in the oil price dataset is coded, via scaling or normalization, into numerical values that are fed into the input of the model. The output of the model is also coded, using binary representations of twenty oil price intervals with a US$5/barrel span. The dataset, which contains 1252 observations (oil prices), is divided equally into training and validation/testing sets. This paper also introduces novel features or attributes which, in our hypothesis, affect the change in the oil price. There are eight features reflecting the used dataset, and they include economic and trend indicators such as a global demand factor and a random world event factor, amongst other attributes. The presented SVM prediction model uses the RBF kernel and a v-fold cross validation mechanism, whereas the ANN model is based on the back propagation learning algorithm. Experimental results suggest that both proposed SVM and ANN prediction models are superior to recently suggested oil price prediction models [5], [6]. However, of the two models presented in this paper, we believe that the ANN model outperforms the SVM model when considering the correct prediction rate (CPR), with the ANN achieving a CPR of 84.69% compared to the SVM model's CPR of 81.27635%. Future work will focus on reducing the span of the prediction output and investigating the use of a single continuous output instead of the existing binary coded output.
References
1. Murat, A., Tokat, E.: Forecasting Oil Price Movements with Crack Spread Futures. Energ. Econ. 31, 85-90 (2009)
2. Hassan, M., Lixian, S.: International Evidence on Crude Oil Price Dynamics: Applications of ARIMA-GARCH Models. Energ. Econ. 32, 1001-1008 (2010)
3. He, K., Xie, C., Lai, K.K.: Crude Oil Price Prediction Using Slantlet Denoising Based Hybrid Method. In: IEEE International Joint Conference on Computational Sciences and Optimization, pp. 12-16. IEEE Press, China (2009)
4. Xie, W., Yu, L., Xu, S., Wang, S.: A New Method for Crude Oil Price Forecasting Based on Support Vector Machines. In: Alexandrov, V.N., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds.) ICCS 2006. LNCS, vol. 3994, pp. 444-451. Springer, Heidelberg (2006)
5. Yu, L., Wang, S., Lai, K.K.: Forecasting Crude Oil Price with an EMD-Based Neural Network Ensemble Learning Paradigm. Energ. Econ. 30, 2623-2635 (2008)
6. Ghaffari, A., Zare, S.: A Novel Algorithm for Prediction of Crude Oil Price Variation Based on Soft Computing. Energ. Econ. 31, 531-536 (2009)
7. Chang, C.-C., Lin, C.-J.: LIBSVM: A Library for Support Vector Machines (2001). Software available at http://www.csie.ntu.edu.tw/cjlin/libsvm
8. Energy Information Administration: Weekly Cushing, OK WTI Spot Price FOB, http://www.eia.doe.gov
9. Dées, S., Gasteuil, A., Kaufmann, R.K., Mann, M.: Assessing the Factors Behind Oil Price Changes. European Central Bank Working Paper Series 855 (2008)
10. Bekiros, S.D., Diks, C.G.H.: The Relationship Between Crude Oil Spot and Futures Prices: Cointegration, Linear and Nonlinear Causality. Energ. Econ. 30, 2673-2685 (2008)
11. Huang, B.N., Yang, C.W., Hwang, M.J.: The Dynamics of a Nonlinear Relationship Between Crude Oil Spot and Futures Prices: A Multivariate Threshold Regression Approach. Energ. Econ. 31, 91-98 (2009)
Ultra-Short Term Prediction of Wind Power Based on Multiple Models Extreme Learning Machine

Ting Huang¹, Xin Wang¹,*, Lixue Li¹, Lidan Zhou², and Gang Yao²

¹ Center of Electrical & Electronic Technology, Shanghai Jiao Tong University, Shanghai, P.R. China, 200240
[email protected]
² Department of Electrical Engineering, Shanghai Jiao Tong University, Shanghai, P.R. China, 200240
Abstract. Wind energy is considered one of the most rapidly growing energy resources all over the world. The uncertainty in wind generation is very large due to the stochastic nature of wind, so a good algorithm for forecasting wind speed and power is needed to let grid operators rapidly adjust their management planning. In this paper, a Multiple Models Extreme Learning Machine (MMELM) is proposed to match the characteristics of wind. A suspending criterion is designed to separate the whole set of models into two parts: the suspending models and the updating models. The suspending models, with minor error, need not be updated online; the updating models, with major error, must use the random selection method to update their parameters. Finally, the overall output is the sum of all model outputs multiplied by their corresponding weights. Simulation results show that the fitting accuracy meets the requirement. Keywords: Wind Farm, Power Prediction, Ultra-Short Term, Weighted Multiple Models, Extreme Learning Machine.
1 Introduction
With growing awareness of the limited supply of fossil fuels and of environmental concerns, interest in renewable energy has grown significantly over the last couple of decades; wind energy in particular is the fastest growing source of renewable energy. It is expected that about 12% of total world electricity demand will be supplied from wind energy resources by 2020 [1]. As wind farm power is mainly determined by wind speed, which is stochastic, safe operation of the power grid is challenging. In order to reduce the effect of wind power on the grid, a good approach to forecasting wind speed and power is essential, so that grid operators can adjust their management planning rapidly [2-3]. During the last decades, many techniques have been developed for wind speed forecasting and wind power prediction. In [4], Kalman filters are applied as a post-processing method to enhance wind speed prediction. In [5-6], time-series
* Corresponding author.
models are developed to predict both wind speed and wind power. Grey theory has been adopted for wind speed forecasting and wind turbine output power prediction in [7-8]. Fuzzy logic strategies are used for the prediction of wind speed and the produced electrical power [9]. In addition, a neural network has been introduced to forecast a wind time series [10]. Unfortunately, no matter what kind of neural network is used, its training is an iterative process, so the initial network weights have a great influence on prediction accuracy. For strongly stochastic wind, power prediction is a great challenge. In this paper, a Multiple Models Extreme Learning Machine (MMELM) is proposed to predict the wind power. First, for the MMELM models with different prediction errors, a suspending criterion is designed. It separates the whole set of models into two parts: the suspending models and the updating models. The suspending models, with minor error, need not be updated; the updating models, with major error, must be updated online. Then, all model outputs are multiplied by their corresponding weights to form the final prediction output. Finally, a simulation with actual data from a wind farm in North China is presented to illustrate the effectiveness of the method.
2 Forecasting Model

2.1 Extreme Learning Machine Models
Suppose $x^{(i)} \in R^{n_1}$ is the input data and $y^{(i)} \in R^{n_2}$ is the output data, where $\{(x^{(i)}, y^{(i)})\}$, $i = 1, \ldots, N$, is the sample set at time $N$. $M$ SLFNNs (Single-Layer Feedforward Neural Networks) are built with $n$ ($n \le n_1$) hidden nodes; the structure of an SLFNN is shown in Fig. 1.

Fig. 1. Architecture of the SLFNN model: inputs $x^{(i)} \in R^{n_1}$, $n$ hidden nodes, outputs $y^{(i)} \in R^{n_2}$

Therefore, $SLFNN_j$, the $j$th model, is mathematically modeled at time $i$ as:

$$\hat{y}_j^{(i)} = \sum_{k=1}^{n} \beta_{k_j} \, g(w_{k_j} \cdot x^{(i)} + b_{k_j}), \quad j = 1, \ldots, M, \; i = 1, \ldots, N \tag{1}$$

where $w_{k_j}$ is the weight vector connecting the $k$th hidden node and the input nodes, $\beta_{k_j}$ is the weight vector connecting the $k$th hidden node and the output nodes, $b_{k_j}$ is the threshold of the $k$th hidden node, $g(x)$ is the activation function, and $\hat{y}_j^{(i)}$ is the real output of the model. The above $N$ equations can be written compactly as [11-12]:

$$H_j^{(N)} \beta_j^{(N)} = Y_j^{(N)} \tag{2}$$

where

$$H_j^{(N)} = \begin{bmatrix} g(w_{1_j} \cdot x^{(1)} + b_{1_j}) & \cdots & g(w_{n_j} \cdot x^{(1)} + b_{n_j}) \\ \vdots & & \vdots \\ g(w_{1_j} \cdot x^{(N)} + b_{1_j}) & \cdots & g(w_{n_j} \cdot x^{(N)} + b_{n_j}) \end{bmatrix}_{N \times n}$$

is called the hidden-layer output matrix, whose $k$th column is the output vector of the $k$th hidden node; $\beta_j^{(N)}$ is the weight matrix connecting the hidden nodes and the output nodes, and $Y_j^{(N)}$ is the output matrix.

Given the randomly chosen initial values of $w_{k_j}$, the weight matrix connecting the hidden nodes and the output nodes of the $j$th model is calculated by the generalized inverse algorithm as:

$$\beta_j^{(N)} = \left[ H_j^{(N)} \right]^{+} Y_j^{(N)} = \left[ L_j^{(N)} \right]^{-1} \left[ H_j^{(N)} \right]^{T} Y_j^{(N)} \tag{3}$$

where $[H_j^{(N)}]^{+}$ is the Moore–Penrose generalized inverse of the matrix $H_j^{(N)}$, and $L_j^{(N)} = [H_j^{(N)}]^{T} H_j^{(N)}$. In this way $M$ SLFNN models are acquired.
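A compact sketch of the batch ELM training of Eqs. (1)-(3): random hidden parameters, sigmoid activations, and output weights obtained through the Moore–Penrose pseudo-inverse (Python/NumPy; all names are illustrative):

  import numpy as np

  def elm_train(X, Y, n_hidden, seed=0):
      # Random hidden-layer parameters w_k, b_k (kept fixed afterwards)
      rng = np.random.default_rng(seed)
      W = rng.uniform(-1, 1, (X.shape[1], n_hidden))
      b = rng.uniform(-1, 1, n_hidden)
      H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # hidden-layer output matrix, Eq. (2)
      beta = np.linalg.pinv(H) @ Y             # Eq. (3): generalized inverse
      return W, b, beta

  def elm_predict(X, W, b, beta):
      H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
      return H @ beta                          # Eq. (1) in matrix form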
2.2 Suspending Criterion
For Extreme Learning Machine (ELM) models with the same structure, the initial weights and threshold values of each model are built randomly, which makes the prediction accuracy differ considerably across models. To improve the prediction accuracy, it is essential to preserve the models with minor error and update the models with major error. A suspending criterion is therefore designed to separate the whole set of models into two parts: the suspending models and the updating models. The basic scheme is illustrated in Fig. 2.
Fig. 2. The principle of the forecasting model
The suspending criterion is designed as:

$$\begin{cases} \tilde{y}_j^{(N)} \le L, & \text{suspending model} \\ \tilde{y}_j^{(N)} > L, & \text{updating model} \end{cases} \tag{4}$$

where $\tilde{y}_j^{(N)}$ denotes the output error of model $j$ and $L$ is the threshold used to judge whether each model output is in the permitted range. If the model output satisfies $\tilde{y}_j^{(N)} \le L$, the model is called a suspending model; otherwise it is called an updating model.

2.3 Suspending Model
By applying the suspending criterion above, the suspending models $SLFNN_j$ ($j = 1, \ldots, P$) are obtained. When a new group of data $(x^{(N+1)}, y^{(N+1)})$ arrives, the architecture of the network is kept without rebuilding the output weight matrix; it is simply updated iteratively on the basis of the previous model as:

$$\begin{bmatrix} H_j^{(N)} \\ h_j^{(N+1)} \end{bmatrix} \beta_j^{(N+1)} = \begin{bmatrix} Y_j^{(N)} \\ y_j^{(N+1)} \end{bmatrix} \tag{5}$$

where $h_j^{(N+1)} = \left[ g(w_{1_j} \cdot x^{(N+1)} + b_{1_j}) \;\; \cdots \;\; g(w_{n_j} \cdot x^{(N+1)} + b_{n_j}) \right]_{1 \times n}$ is the hidden-layer output matrix for the new data $(x^{(N+1)}, y^{(N+1)})$ and $y_j^{(N+1)}$ is the model output for the new data, so:

$$\beta_j^{(N+1)} = \begin{bmatrix} H_j^{(N)} \\ h_j^{(N+1)} \end{bmatrix}^{+} \begin{bmatrix} Y_j^{(N)} \\ y_j^{(N+1)} \end{bmatrix} = \left[ L_j^{(N+1)} \right]^{-1} \begin{bmatrix} H_j^{(N)} \\ h_j^{(N+1)} \end{bmatrix}^{T} \begin{bmatrix} Y_j^{(N)} \\ y_j^{(N+1)} \end{bmatrix} \tag{6}$$

Let $L_j^{(N)} = [H_j^{(N)}]^{T} H_j^{(N)}$; then

$$L_j^{(N+1)} = \begin{bmatrix} H_j^{(N)} \\ h_j^{(N+1)} \end{bmatrix}^{T} \begin{bmatrix} H_j^{(N)} \\ h_j^{(N+1)} \end{bmatrix} = \left[ (H_j^{(N)})^{T} \;\; (h_j^{(N+1)})^{T} \right] \begin{bmatrix} H_j^{(N)} \\ h_j^{(N+1)} \end{bmatrix} = L_j^{(N)} + \left[ h_j^{(N+1)} \right]^{T} h_j^{(N+1)} \tag{7}$$

Substituting (7) into (6), it can be concluded that:

$$\beta_j^{(N+1)} = \beta_j^{(N)} + \left[ L_j^{(N+1)} \right]^{-1} \left[ h_j^{(N+1)} \right]^{T} \left[ y_j^{(N+1)} - h_j^{(N+1)} \beta_j^{(N)} \right] \tag{8}$$

By the Woodbury formula in [13]:

$$\left[ L_j^{(N+1)} \right]^{-1} = \left[ L_j^{(N)} + (h_j^{(N+1)})^{T} h_j^{(N+1)} \right]^{-1} = \left[ L_j^{(N)} \right]^{-1} - \left[ L_j^{(N)} \right]^{-1} (h_j^{(N+1)})^{T} \left[ I + h_j^{(N+1)} \left[ L_j^{(N)} \right]^{-1} (h_j^{(N+1)})^{T} \right]^{-1} h_j^{(N+1)} \left[ L_j^{(N)} \right]^{-1} \tag{9}$$

Set $F_j^{(N+1)} = [L_j^{(N+1)}]^{-1}$; then the recursion formulas for $\beta_j^{(N+1)}$ can be written as:

$$\beta_j^{(N+1)} = \beta_j^{(N)} + F_j^{(N+1)} (h_j^{(N+1)})^{T} \left[ y_j^{(N+1)} - h_j^{(N+1)} \beta_j^{(N)} \right] \tag{10}$$

$$F_j^{(N+1)} = F_j^{(N)} - F_j^{(N)} (h_j^{(N+1)})^{T} \left[ I + h_j^{(N+1)} F_j^{(N)} (h_j^{(N+1)})^{T} \right]^{-1} h_j^{(N+1)} F_j^{(N)} \tag{11}$$

Thus, equations (10) and (11) realize the online learning of network $SLFNN_j$. According to equation (10), the output of a suspending model is:

$$\hat{y}_j^{(N+1)} = h_j^{(N+1)} \cdot \beta_j^{(N+1)} \tag{12}$$

2.4 Updating Model
For an updating model $SLFNN_j$ ($j = 1, \ldots, M - P$), the model is randomly rebuilt after getting the new group of data $(x^{(N+1)}, y^{(N+1)})$, with a training method similar to that of the ELM models in Section 2.1. The model output is then:

$$\hat{y}_j^{(N+1)} = h_j^{(N+1)} \cdot \beta_j^{(N)} \tag{13}$$

$$h_j^{(N+1)} = \left[ g(w_{1_j} \cdot x^{(N+1)} + b_{1_j}) \;\; \cdots \;\; g(w_{n_j} \cdot x^{(N+1)} + b_{n_j}) \right]_{1 \times n} \tag{14}$$

2.5 Weighting Output
Suppose the vector $\hat{y}_j^{(N)}$ is the output of the $j$th sub-model and the vector $y^{(N)}$ is the real wind power; then the error of every model is $e_j^{(N)} = \hat{y}_j^{(N)} - y^{(N)}$. In order to improve the accuracy of the wind power prediction, a model with minor error is assigned a high weight. The weight of every model is therefore determined as:

$$\lambda_j = \frac{1}{M-1} \left( 1 - \frac{e_j^{(N)}}{\sum_{j=1}^{M} e_j^{(N)}} \right), \qquad \sum_{j=1}^{M} \lambda_j = 1 \tag{15}$$

Therefore, the final output value is the sum of all model outputs multiplied by their corresponding weights:

$$y_{out}^{(N+1)} = \sum_{j=1}^{M} \lambda_j \hat{y}_j^{(N+1)} \tag{16}$$
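A sketch of the online update of Eqs. (10)-(11) for one suspending model, followed by the weighted combination of Eqs. (15)-(16) (NumPy; shapes follow the notation above, and absolute errors are used in the weighting as a practical choice):

  import numpy as np

  def suspending_update(beta, F, h, y_new):
      # h: (1, n) hidden output for the new sample; F approximates [L]^-1.
      K = np.linalg.inv(np.eye(h.shape[0]) + h @ F @ h.T)
      F_new = F - F @ h.T @ K @ h @ F                      # Eq. (11)
      beta_new = beta + F_new @ h.T @ (y_new - h @ beta)   # Eq. (10)
      return beta_new, F_new

  def weighted_output(y_hats, errors):
      # Eqs. (15)-(16): smaller past error -> larger weight; weights sum to 1.
      e = np.abs(np.asarray(errors, dtype=float))
      lam = (1.0 - e / e.sum()) / (len(e) - 1)
      return float(lam @ np.asarray(y_hats))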
3 Ultra-Short Term Wind Power Prediction Model
The primary factors that affect wind output power are wind speed, wind angle, temperature and so on. In this paper, wind speed, wind angle, temperature and wind power are selected as the inputs of the NN models above. Here 100 models are acquired in the entire neural network, each with the same three-layer network structure. In each model there are 12 neurons in the input layer, 12 neurons in the hidden layer and one neuron in the output layer, and the sigmoid function is utilized as the activation function. The basic structure of the neural network can be seen in Fig. 3.

Fig. 3. Architecture of the prediction model
4 Experiment Results
To illustrate the effectiveness of the algorithm above, an ultra-short term wind power forecast is studied with data recorded in August 2010 at a wind farm in North China. The data is measured at 10-minute intervals. In the experiment, wind turbine data from August 1-4, 2010 is selected as the training samples, and data from August 5 is used for forecasting. The prediction result is shown in Fig. 4 and the forecasting mean error is shown in Table 1. From the prediction result, the prediction curve can quickly follow the real power curve, and the error is considerably smaller than that of the BP network. Comparing the simulation results, the estimated wind power tracks the actual wind power accurately with an average tracking error of about 13.50%, less than the roughly 27.30% error of the BP power prediction. The prediction performance of the multiple models algorithm in this paper is therefore superior to that of BP, which verifies the feasibility and effectiveness of the algorithm for wind power prediction.

Table 1. Forecasting mean error of BP and MMELM on August 5, 2010, 2:00-22:00

                   BP       MMELM
  Mean error (%)   27.16%   13.56%
[Figure: wind power prediction curves for August 5, 2010, 2:00-22:00, comparing Actual Data, BP and MMELM; x-axis: Time (02:00-20:00), y-axis: Power/kW]
Fig. 4. The wind power prediction results using BP and MMELM on August 5, 2010, 2:00-22:00
5 Conclusions
In this paper, a Multiple Models Extreme Learning Machine is proposed to cope with the characteristics of wind. A suspending criterion is designed to separate the whole set of models into two parts: the suspending models and the updating models. The suspending models, with minor error, need not adopt the random method to update online; the updating models, with major error, must utilize the random method to update. The final output value is then the sum of all model outputs multiplied by their corresponding weights. The simulation results show that the proposed algorithm improves the prediction accuracy greatly.
References
1. European Wind Energy Association (2002), Wind Force 12, http://www.ewea.org/doc/WindForce12.pdf
2. Xiuyuan, Y., Yang, X., Shuyong, C.: Wind Speed and Generated Power Forecasting in Wind Farm. In: Proceedings of the CSEE, vol. 11(6), pp. 1-5 (2005)
3. Guoqiang, Z., Hai, B., Shuyong, C.: Amending Algorithm for Wind Farm Penetration Optimization Based on Approximate Linear Programming Method. In: Proceedings of the CSEE, vol. 24(10), pp. 68-71 (2004)
4. Louka, P., Galanis, G., Siebert, N., Kariniotakis, G., Katsafados, P., Pytharoulis, I., Kallos, G.: Improvements in Wind Speed Forecasts for Wind Power Prediction Purposes Using Kalman Filtering. J. Wind Eng. Ind. Aerodyn. 96, 2348-2362 (2008)
5. El-Fouly, T.H.M., El-Saadany, E.F., Salama, M.M.A.: One Day Ahead Prediction of Wind Speed and Direction. IEEE Transactions on Energy Conversion 23(1), 191-201 (2008)
6. Lei, D., Lijie, W., Shuang, G., Xiaozhong, L.: Modeling and Analysis of Prediction of Wind Power Generation in the Large Wind Farm Based on Chaotic Time Series. Transactions of China Electrotechnical Society 23(12), 125-129 (2008)
7. El-Fouly, T.H.M., El-Saadany, E.F., Salama, M.M.A.: Grey Predictor for Wind Energy Conversion Systems Output Power Prediction. IEEE Transactions on Power Systems 21(3), 1450-1452 (2006)
8. El-Fouly, T.H.M., El-Saadany, E.F., Salama, M.M.A.: Improved Grey Predictor Rolling Models for Wind Power Prediction. Transm. Distrib. 1(6), 928-937 (2007)
9. Damousis, I.G., Alexiadis, M.C., Theocharis, J.B., Dokopoulos, P.S.: A Fuzzy Model for Wind Speed Prediction and Power Generation in Wind Parks Using Spatial Correlation. IEEE Transactions on Energy Conversion 2(19), 352-361 (2004)
10. Potter, C.W., Negnevitsky, M., Dokopoulos, P.S.: Very Short-Term Wind Forecasting for Tasmanian Power Generation. IEEE Transactions on Power Systems 21(2), 965-972 (2006)
11. Huang, G.-B., Zhu, Q.-Y., Siew, C.-K.: Extreme Learning Machine: Theory and Applications. Neurocomputing 70(1-3), 489-501 (2006)
12. Huang, G.-B., Zhu, Q.-Y., Siew, C.-K.: Extreme Learning Machine: A New Learning Scheme of Feedforward Neural Networks. In: 2004 International Joint Conference on Neural Networks, Budapest, Hungary, July 25-29, pp. 985-990 (2004)
13. Lee, J., Jung, D.-K., Kim, Y., Lee, Y.-W.: Smart Grid Solutions, Services and Business Model Focused on Telco. In: IEEE Network Operation and Management Symposium Workshop, vol. 6(17), pp. 323-326 (2010)
BursT: A Dynamic Term Weighting Scheme for Mining Microblogging Messages

Chung-Hong Lee, Chih-Hong Wu, and Tzan-Feng Chien

Dept of Electrical Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan
[email protected], [email protected], [email protected]
Abstract. One of the basic human needs is to exchange information and socialize with each other. Online microblogging services such as Twitter allow users to post very short messages related to everything, ranging from mundane daily life routines to breaking news events. A key challenge in mining such social messages is how to analyze the real-time distributed messages and extract their significant features in a dynamic environment. In this work, we propose a novel term weighting method, called BursT, which uses sliding window techniques for weighting message streams. The experimental results show that our weighting technique has an outstanding ability to reflect the shifts of concept drift. This work can be extended to perform periodic feature extraction, and can also be integrated with other sophisticated clustering methods to enhance the efficiency of real-time event mining in social networks. Keywords: information retrieval, text mining, term weighting scheme, social networks, social mining
1 Introduction

1.1 Motivation
Recently, with fast-growing numbers of internet users keeping up with the newest information through text-based microblogging streams like Twitter, searching for messages related to hot news topics has become an important activity for daily information acquisition. One of the key advantages of microblogs is that they enable people to achieve near real-time information awareness. For instance, Twitter has long offered users a list of real-time "trending topics" to explore: essentially all tweets containing the most popular terms, such as "Haiti" or "Google Zeitgeist". This imposes an increasing need for automatic techniques to analyze and present correlated messages to a general reader in an efficient manner. Unfortunately, a key challenge in mining such social messages is how to analyze the real-time distributed messages and extract their significant features in a dynamic environment. With this view in mind, in this work we propose a novel term weighting method, called BursT, which uses sliding window techniques to tackle the challenges mentioned above.
1.2 Problem Statements
In order to gain insight into microblogging operations, the characteristics of microblogging messages are listed as follows:
• Tremendous volume: microblogs have become popular services with explosive numbers of subscribers in recent years. A statistics report¹ from Twitter reveals that the message streams run at over 1,000 TPS (tweets/sec).
• Text un-integrality: in order to shorten the time needed to type messages, most microblogging services limit the total length of a post. This leads to a lack of semantic integrality in messages.
• Time sensitiveness: real-time operation is an essential factor for the use of microblogging messages. A user posts a tweet with a timestamp indicating what someone says has happened (or will happen) at a specific time point.
Fig. 1. A sample list of messages about the event of “Chile’s Rescued Miners” from Twitter
In Figure 1, a sample list of messages regarding the event of "Chile's Rescued Miners" from Twitter is illustrated. In the message list, we found some characteristics of such short, timely, text-based user-generated contents: a rapidly growing amount of messages shared the same topic words, like "rescate" and "mineros", within a short period of time, which can be regarded as a message burst. Our work suggests that the detection of a message burst should consider the factor of accumulated historical data. In our work, the term "burst" is defined as a situation in which a large number of frequently posted messages suddenly happen in a short time. Due to the dynamic characteristics of microblogging messages, several issues are described as follows:
• The design of a weighting scheme for microblogging messages should differ from traditional methods with respect to their topical dynamic characteristics. Under such a circumstance, concepts are often not stable but change with time, which is also known as concept drift [1].
¹ "Twitter's New Search Architecture", http://engineering.twitter.com/2010/10/twitters-new-search-architecture.html
• For the purpose of instantly analyzing messages, an incremental approach to term weighting [2] is required, to avoid re-calculating over the entire message set.
• The maintenance cost of managing microblogging messages is higher than for other, static messages, due to the timely dynamics of microblogging streams.
In our previous work [3], we found that a crucial problem of mining streaming data is how to extract significant features in a dynamic environment. In this paper, the related work on term weighting techniques and burst detection algorithms is discussed in Section 2. Section 3 presents a novel term weighting scheme for mining streaming messages. The experimental results are shown in Section 4. Finally, we conclude this paper in Section 5.
2 Related Work
When considering timing factors in evaluating term significance, the incremental approach is regarded as an effective way to avoid re-calculating over the entire corpus once new messages come into the system. We therefore discuss two popular weighting schemes associated with our method: the incremental TFIDF and the TFPDF (Term Frequency × Proportional Document Frequency) methods for streaming messages. The incremental TFIDF term weighting scheme [4] is an extension of the traditional TFIDF (Term Frequency × Inverse Document Frequency) method. The main idea of this method is to extend the document frequency df (see Eq. 3) and normalize the term frequency tf (see Eq. 2) at each time t so as to enable online calculation. It is now widely used in dynamic text retrieval and text mining systems. It is worth mentioning, however, that almost all terms occur in each message only once, due to the length limitation of messages; therefore the computational overhead of the term frequency tf is strongly affected by the length of messages. In addition, the document frequency df conflicts with the operation of topic mining, since a higher df value implies that the word occurs in many documents, which might to some extent lead to missing topic words in messages.
$$tfidf_t(w, d) = \frac{1}{Z_t(d)} \, f(d, w) \cdot \log\!\left(\frac{N_t}{df_t(w)}\right) \tag{1}$$

$$Z_t(d) = \sum_{w} f(d, w) \cdot \log\!\left(\frac{N_t}{df_t(w)}\right) \tag{2}$$

$$df_t(w) = df_{t-1}(w) + df_{C_t}(w) \tag{3}$$
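A sketch of the incremental bookkeeping behind Eqs. (1)-(3): document frequencies and the document count are accumulated as each new batch C_t arrives, so no pass over the full corpus is needed (Python; names are illustrative):

  import math
  from collections import Counter

  df = Counter()   # running document frequencies df_t(w)
  N = 0            # running document count N_t

  def absorb_batch(batch):
      # Eq. (3): df_t(w) = df_{t-1}(w) + df_{C_t}(w)
      global N
      N += len(batch)
      for doc in batch:
          df.update(set(doc))

  def tfidf(doc):
      # Eqs. (1)-(2): tf * idf scores normalized by Z_t(d)
      tf = Counter(doc)
      raw = {w: tf[w] * math.log(N / df[w]) for w in tf if df[w] > 0}
      z = sum(raw.values()) or 1.0
      return {w: s / z for w, s in raw.items()}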
A similar view is taken by [5]: the TFPDF algorithm tries to give higher weight to terms that occur frequently in many documents from the newswire sources, in order to find emerging topics. As shown in Eq. (4), F_jc is the term frequency of the j-th word in channel c (i.e., newswire source), normalized by the total words in the channel (see Eq. 5). The PDF weighting is an exponential function of the proportion of documents containing the term among all documents in the channel. The TFPDF scheme benefits topic detection tasks, particularly for finding significant words, but it has the slight inconvenience that extracting frequent features may give oral words heavier weights. This will be discussed in more detail later.
$$W_j = \sum_{c=1}^{c=D} \left| F_{jc} \right| \exp\!\left(\frac{n_{jc}}{N_c}\right) \tag{4}$$

$$\left| F_{jc} \right| = \frac{F_{jc}}{\sqrt{\sum_{k=1}^{k=K} F_{kc}^{2}}} \tag{5}$$
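Correspondingly, a sketch of the TFPDF score of Eqs. (4)-(5), assuming per-channel statistics (term frequencies F_jc, document counts n_jc, channel sizes N_c) have been collected; the data structure shown is purely illustrative:

  import math
  from collections import defaultdict

  def tfpdf(channels):
      # channels: list of ({term: (F_jc, n_jc)}, N_c) pairs, one per channel,
      # e.g. [({"quake": (12, 8)}, 100), ...]
      W = defaultdict(float)
      for counts, N_c in channels:
          norm = math.sqrt(sum(f * f for f, _ in counts.values()))  # Eq. (5)
          for term, (f, n_docs) in counts.items():
              W[term] += (f / norm) * math.exp(n_docs / N_c)        # Eq. (4)
      return dict(W)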
To process texts in chronological order, a fundamental problem we are concerned with is how to find the significant features in text streams. In classic text retrieval systems, the most common method for feature extraction is to treat each document as a bag-of-words representation. Such an approach is not suitable for our dynamic system. It is observed that in microblogging text streams, some words are "born" when they appear for the first time, and then their intensity "grows" over a period of time until it reaches a peak. These words are called burst words. As time goes by and the topics are no longer discussed, they "fade away" with a power law and eventually "die" (disappear) or change to a normal state. Such a phenomenon is the so-called lifecycle of the feature. A related state-based method was proposed by Kleinberg [6]; his work considers the arrival rate of messages as the main factor for feature extraction, and the result yields a nested representation of the set of bursts that imposes a hierarchical structure on the overall stream. Alternatively, a trend-based method [7] treats a word as a falling or rising word depending on a measure of absolute or relative change. In this paper, we do not merely compare two event times (the current and the last record), but take a long-term expectation value into consideration. These concepts are described in detail in Sections 3 and 4. To the best of our knowledge, although some research work has been carried out on exploring feature distributions in microblogging [8-10], little effort has been devoted to utilizing sliding window techniques to develop a term weighting scheme for microblogging message streams. As a result, in this work we propose our novel term weighting scheme, BursT, to tackle this problem.
3 The Proposed Term Weighting Method for Microblogging Messages For microblogs, the corpus tend to be dynamic as new items always being added and old items being changed or deleted. Therefore, an ideal term weighting scheme for
mining microblogging messages should subtly reflect these changes over time and quickly assign proper weights in such a dynamic environment. In this section, we first describe the sliding window model on which our weighting method is built, and then explain the weighting mechanism.

3.1 Fundamental Concepts of the Sliding Window Model
Microblogging messages are continuously posted by users around the world, and it is almost impossible to store all of them at once, given the memory limitations and the constant passage of time. We therefore adopt the sliding window model [11] in this work, as shown in Figure 2. Briefly, the sliding window technique comprises: (i) an insertion operation, in which a new index entry is built when a message arrives, (ii) a retention period, during which the message is kept until its lifetime exceeds the fixed length of the time window tw, and (iii) a deletion operation, in which the message is removed from memory. A minimal sketch of these steps follows the figure.
Fig. 2. A continuous message stream using the sliding window model
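The following Python sketch illustrates the three steps above; the class name and timestamp handling are our assumptions, not the authors' implementation.

```python
from collections import deque

class SlidingWindow:
    def __init__(self, tw=3600.0):
        self.tw = tw              # fixed window length tw (seconds)
        self.window = deque()     # (arrival_time, message), oldest first

    def insert(self, message, arrival_time):
        # step (i): build a new index entry when a message comes in
        self.window.append((arrival_time, message))
        # steps (ii)-(iii): evict messages whose lifetime exceeds tw
        while self.window and arrival_time - self.window[0][0] > self.tw:
            self.window.popleft()

    def messages(self):
        return [m for _, m in self.window]
```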
3.2 The BursT Weighting Method
The primary goal of this work is to develop an incremental term weighting scheme for microblogging messages. A word burst is defined as an unusual number of messages posted within a short time. Figure 3 shows the three phases of word categories occurring in microblogging messages. In Figure 3, an uninformative word is one that rarely occurs in the sliding window, such as a colloquial word. Words that occur very frequently but with low burstiness can be recognized as common or social words, such as "haha" and "lol". If a word shows a higher burst than expected within a certain range of document frequency, we highlight its importance in the weight design for the sliding window.
(Figure 3 plots burstiness against document frequency df, with three regions: Phase 1, uninformative words; Phase 2, common words; Phase 3, topic words.)
Fig. 3. Word categories in microblogging texts
Accordingly, our strategy in determining the BursT value is that a word occurring frequently in the window, with higher burstiness, receives a heavier weight. The BursT weighting formula is shown in equation 6:
$$\mathrm{weight}_{w,t} = BS_{w,t} \times TOP_{w,t} \tag{6}$$
where the weight of the word w at time t is the product of two factors: BS (Burst Score) and TOP (Term Occurrence Probability). These weighting factors are detailed in the following subsections.

3.2.1 The BS Weighting Factor

In many streaming data applications, the interval between message arrival times can be transformed into an arrival rate. If a feature keeps arriving in messages at short intervals, it is more likely to represent an important message. Suppose a feature word w occurs in the message sequence {m_{w,1}, m_{w,2}, m_{w,3}, ..., m_{w,t}}, and each message has a specified arrival time at_{w,t}. We define the arrival rate ar_{w,t} for the current message m_{w,t} by the formula shown in equation 7:
$$ar_{w,t} = \frac{1}{at_{w,t} - at_{w,t-1} + 1} \tag{7}$$
In equation 7, if t = 1, the interval is zero because w is a brand-new word in the system. The arrival rate ar_{w,t} is the reciprocal of the arrival gap (at_{w,t} − at_{w,t−1}) plus one, and thus falls between 0 and 1. In addition, our experimental statistics reveal an essential property of microblogging messages: the inter-arrival messages of a feature accumulate a certain arrival rate, particularly for words that appear periodically, daily, weekly, or monthly. To reflect this phenomenon and assign reasonable weights, we adopt an incremental mean method to compute the mean arrival rate of each word.
$$\mu_n = \mu_{n-1} + \frac{1}{n}\,(x_n - \mu_{n-1}) \tag{8}$$

$$E(ar_{w,t}) = \mu_{w,t} = \mu_{w,t-1} + \frac{1}{n_{w,t}}\,(ar_{w,t} - \mu_{w,t-1}) \tag{9}$$
Equation 8 is the mathematical form of an incremental mean. Finch [12] described a numerically stable way to compute it that avoids accumulating excessively large sums. We apply this approach in our weighting scheme to formulate the insertion equation (equation 9), where ar_{w,t} is the new arrival rate of the word. After computing the expected arrival rate, the burst score is calculated as follows:
$$BS_{w,t} = \max\!\left\{\frac{ar_{w,t} - E(ar_{w,t})}{E(ar_{w,t})},\; 0\right\} \tag{10}$$
We regard ar_{w,t} as the current observation and compare it with the expected value E(ar_{w,t}) of the word w at its t-th arrival. Equation 10 computes the residual, i.e., the deviation of the observation from the expectation, relative to the expectation. Note that this residual is negative whenever the observation is less than the expectation; in such a case we define the word as a "falling word" at that time and set the BS factor to zero.

3.2.2 The TOP Weighting Factor

The second factor in the BursT weighting scheme is the TOP (term occurrence probability), defined as the proportion of messages in the sliding window that contain the term. For mining hot news topics from messages, a word that occurs in more messages is more likely to belong to a trending topic. The term occurrence probability of the word w at its t-th arrival is formulated as follows:
$$TOP_{w,t} = P(w_t \mid c_t) = \frac{\left|\{m : w_t \in c_t\}\right|}{\left|c_t\right|} \tag{11}$$
where TOP is the probability of the word occurring in the sliding window, and c_t denotes the collection of messages in the corpus from time t − tw to the current time. This factor makes the weight of a word grow with its occurrence frequency in messages, which helps identify trending topics. A sketch combining both factors into the final BursT weight is given below.
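Putting equations (7)-(11) together, a minimal Python sketch of the BursT weight follows; the class layout and names are ours, timestamps are treated as message sequence positions, and `window_docs` stands for the message collection c_t of the sliding window.

```python
class BursTScorer:
    def __init__(self):
        self.last_at = {}   # at_{w,t-1}: previous arrival time per word
        self.mean_ar = {}   # E(ar_w): incremental mean of the arrival rate
        self.count = {}     # n_w: arrivals seen per word

    def score(self, word, at, window_docs):
        # eq. (7): arrival rate from the gap to the previous arrival
        # (zero gap for a brand-new word, so ar = 1)
        ar = 1.0 / (at - self.last_at.get(word, at) + 1)
        self.last_at[word] = at
        # eq. (9): incremental mean of the arrival rate (Finch [12])
        n = self.count.get(word, 0) + 1
        mu = self.mean_ar.get(word, 0.0)
        mu += (ar - mu) / n
        self.count[word], self.mean_ar[word] = n, mu
        # eq. (10): burst score, clipped at zero for "falling words"
        bs = max((ar - mu) / mu, 0.0) if mu > 0 else 0.0
        # eq. (11): proportion of window messages containing the word
        top = sum(1 for d in window_docs if word in d) / max(len(window_docs), 1)
        return bs * top     # eq. (6): weight_{w,t} = BS_{w,t} * TOP_{w,t}
```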
4 Experiments and Results

4.1 Data Source
The goal of this work is to develop a term weighting method for mining microblogging messages using sliding window techniques. In this section, we evaluate BursT with the sliding window length tw set to one hour. A total of 27,929,301 microblogging posts were collected from Twitter (dating from September 19, 2010 to October 27, 2010) through the Twitter Stream API. After filtering out non-ASCII tweets, 14,009,908 tweets remained as our data source. We then partitioned messages into unigrams and removed the substring "RT @username:". Because the collected tweets contain multilingual texts, our stopword list contains stopwords in seven different languages. Finally, all capital letters in each tweet were converted to lowercase for our experiments.

4.2 Preliminary Experiment
One of the challenges encountered is the limited memory available for processing streaming messages. Furthermore, an elimination process is required to update the weights within this limited memory. It is therefore essential to start by reducing the workload, removing "redundant features" from the index table. To look more closely at these features, we collected tweets for three weeks, as shown in Table 1.
Table 1. Number of term occurrences (TO) and cumulative term occurrences
|         | 2010/08/11-2010/08/17 # terms | percentage | 2010/08/18-2010/08/24 # terms | percentage | 2010/08/25-2010/08/31 # terms | percentage |
|---------|--------|-------------|--------|-------------|--------|-------------|
| TO = 1  | 553210 | 64.553281%  | 546361 | 64.682219%  | 561381 | 64.803643%  |
| TO ≤ 2  | 653148 | 76.214903%  | 644370 | 76.285242%  | 661501 | 76.361107%  |
| TO ≤ 3  | 696744 | 81.302058%  | 686965 | 81.327951%  | 705489 | 81.438911%  |
| TO ≤ 4  | 722651 | 84.325108%  | 712591 | 84.361744%  | 731402 | 84.430207%  |
| TO ≤ 5  | 739801 | 86.326317%  | 729611 | 86.376697%  | 748713 | 86.428522%  |
| TO ≤ 6  | 752510 | 87.809312%  | 742143 | 87.860327%  | 761579 | 87.913723%  |
| TO ≤ 7  | 762210 | 88.941191%  | 751718 | 88.993885%  | 771401 | 89.047537%  |
| TO ≤ 8  | 769904 | 89.838993%  | 759160 | 89.874924%  | 779131 | 89.939858%  |
| TO ≤ 9  | 776251 | 90.579615%  | 765379 | 90.611175%  | 785512 | 90.676456%  |
| TO ≤ 10 | 781414 | 91.182079%  | 770604 | 91.229748%  | 790885 | 91.296694%  |
| total   | 856982 | 100.000000% | 844685 | 100.000000% | 866280 | 100.000000% |
From Table 1 we found that most features occur only a few times within a week, so we remove a term if it appears fewer than three times within the length of the sliding window. According to the experimental results, this elimination removes more than 81% of the features as redundant, which enhances performance.

4.3 Concept Drift with Burst Features and Oral Features
To examine how well the system reflects the concept drift of words, we selected the "Chile's Rescued Miners" event as a case study. Our experiment ran from when the first miner was rescued, at 2010/10/13 11:11 (GMT +8:00), until all miners were rescued, at 2010/10/14 20:56 (GMT +8:00). Figure 4 shows the inter-arrival gap of the feature word "chile", which dropped suddenly on Oct 13 (GMT +8:00) as the event unfolded.
Fig. 4. Inter-arrival gaps using the feature word “chile”
Subsequently, we compared the weighting values produced by BursT, incremental TFIDF, and TFPDF, as shown in Figure 5. We found that incremental TFIDF cannot reflect the actual trends under the sliding window algorithm, whereas TFPDF and our approach performed well on topic words. For the analysis of colloquial words, we take the word "lol", which has both a high collection density and a high arrival rate, as an example; it is obviously not suitable to treat "lol" as a valid feature.
Fig. 5. Evaluation of BursT, incremental TFIDF and TFPDF weighting techniques
It is worth mentioning that popular colloquial words can easily be over-weighted by TFPDF because it places too much emphasis on document frequency. As shown in Figure 5(d) and (e), the weight of "lol" under TFPDF remains higher than under incremental TFIDF even while the event is still unfolding.
5 Conclusion

In this paper, we described a novel algorithm, called BursT, that uses the sliding window technique to weight message streams. BursT facilitates online burst analysis by
adopting a long-term expectation of the arrival rate as a global baseline. The experimental results show that our weighting technique performs outstandingly in reflecting concept drift, especially in diminishing ineffective colloquial phrases. This work can be extended to perform periodic feature extraction, and can be integrated with other sophisticated clustering methods to improve the efficiency of real-time event mining in social networks.
References
1. Tsymbal, A.: The Problem of Concept Drift: Definitions and Related Work. Trinity College, Dublin (2004)
2. Yang, Y., Pierce, T., Carbonell, J.: A Study of Retrospective and On-line Event Detection. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 28–36. ACM, Melbourne (1998)
3. Lee, C.H., Chien, T.F., Yang, H.C.: DBHTE: A Novel Algorithm for Extracting Real-time Microblogging Topics. In: Proceedings of the ISCA 23rd International Conference on Computer Applications in Industry and Engineering (CAINE 2010), Las Vegas, USA (2010)
4. Brants, T., Chen, F., Farahat, A.: A System for New Event Detection. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 330–337. ACM, Toronto (2003)
5. Bun, K.K., Ishizuka, M.: Topic Extraction from News Archive Using TFPDF Algorithm. In: Proceedings of the 3rd International Conference on Web Information Systems Engineering, pp. 73–82. IEEE Computer Society, Los Alamitos (2002)
6. Kleinberg, J.: Bursty and Hierarchical Structure in Streams. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 91–101. ACM, Edmonton (2002)
7. Kleinberg, J.: Temporal Dynamics of On-Line Information Streams. In: Data Stream Management: Processing High-Speed Data Streams (2005)
8. Cataldi, M., Caro, L.D., Schifanella, C.: Emerging Topic Detection on Twitter Based on Temporal and Social Terms Evaluation. In: Proceedings of the Tenth International Workshop on Multimedia Data Mining, pp. 1–10. ACM, Washington, D.C. (2010)
9. Cheong, M., Lee, V.: Integrating Web-based Intelligence Retrieval and Decision-making from the Twitter Trends Knowledge Base. In: Proceedings of the 2nd ACM Workshop on Social Web Search and Mining, pp. 1–8. ACM, Hong Kong (2009)
10. Chen, J., Nairn, R., Nelson, L., Bernstein, M., Chi, E.: Short and Tweet: Experiments on Recommending Content from Information Streams. In: Proceedings of the 28th International Conference on Human Factors in Computing Systems, pp. 1185–1194. ACM, Atlanta (2010)
11. Bifet, A.: Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams. IOS Press, Amsterdam (2010)
12. Finch, T.: Incremental Calculation of Weighted Mean and Variance. University of Cambridge (2009)
Towards an RDF Encoding of ConceptNet

Marco Grassi and Francesco Piazza

Department of Biomedical, Electronic and Telecommunication Engineering, Università Politecnica delle Marche, Ancona, 60131, Italy
{m.grassi,f.piazza}@univpm.it
http://www.semedia.dibet.univpm.it
Abstract. The possibility of relying on a rich background knowledge constitutes a key element for the development of more effective sentiment analysis and SW applications. In this paper, we propose to encode the wide knowledge base collected by the Open Mind Common Sense initiative into ConceptNet in a semantically aware format, to make it directly available for Semantic Web applications. We also discuss how the encoding of ConceptNet into RDF can be beneficial in promoting its connection with other resources, such as WordNet and the HEO ontology, to further extend its knowledge base.

Keywords: ConceptNet, ontology, Semantic Web, RDF/OWL, emotions.
1 Introduction
The World Wide Web represents one of the most revolutionary applications in the history of computing and human communication. In particular, the advent of the so-called Web 2.0 has provided users with a large set of tools and applications for expressing themselves, their beliefs, and their emotions. As a result, the Web contains an ever-growing amount of information about people's opinions, feelings, and moods. The distillation of knowledge from this huge amount of unstructured affective information, also known as opinion mining and sentiment analysis, is a task that has recently attracted more and more interest, for reasons such as marketing and financial market prediction. Nevertheless, the dizzying expansion of the Web is not exempt from problems. As the Web grows at a constantly increasing rate, it becomes more and more difficult to capture the required information among a mass of irrelevant or only mildly relevant information. Furthermore, the demanded information is often spread over different sources, and collecting it requires a highly time-consuming activity of performing several queries and manually merging the partial results. This is because most web content is perfectly suitable for human consumption but remains completely inaccessible to machines. The Semantic Web (SW) initiative by the W3C tackles this problem through an appropriate representation of information in the web page, able to univocally identify resources and encode the meaning of their descriptions in a machine-
processable format. Computational ontologies are used in the SW to provide domain-specific knowledge bases and to give applications inference power. The possibility of relying on a rich background knowledge constitutes a key element for the development of more effective sentiment analysis and SW applications. With this purpose, in this paper we propose to encode the wide knowledge base collected by the Open Mind Common Sense initiative and gathered into ConceptNet in a semantically aware format, to make it directly available for Semantic Web applications. We also discuss how the encoding of ConceptNet into RDF can be beneficial in promoting its connection with other resources, such as WordNet and the HEO ontology, to further extend its knowledge base.
2 Semantic Web
The Semantic Web (SW)¹ is an initiative by the W3C that aims to improve the current Web so that information can be expressed in a machine-understandable format and processed automatically by software agents. The SW enables data interoperability, allowing data to be shared and reused across heterogeneous devices and applications. The SW is mainly based on the Resource Description Framework (RDF), which defines relations among different data, creating semantic networks. In the SW, ontologies are used to organize information and formally describe the concepts of a domain of interest. An ontology is a vocabulary comprising a set of terms and the relations among them. Ontologies can be developed using specific ontology languages, such as the RDF Schema language (RDFS) or the Web Ontology Language (OWL), which support inference and knowledge base modelling. Even if originally created for the Web, SW techniques are suitable for all scenarios that require advanced data integration, linking data from multiple sources without a preexisting schema, and powerful data modeling, representing expressive semantic descriptions of application domains and providing inference power to applications.
3 ConceptNet
When people communicate with each other, they rely on shared background knowledge, e.g., how objects relate to each other in the world, people's goals in their daily lives, and the emotional content of events or situations. This 'taken for granted' information is what is usually called common sense: obvious things people normally know and usually leave unstated. The Open Mind Common Sense (OMCS) project has been collecting this kind of knowledge from volunteers on the Internet since 2000 to provide intuition to AI systems and applications. ConceptNet (CN) [1] represents the information in the Open Mind corpus as a directed graph in which the nodes are concepts
¹ See http://www.w3.org/2001/sw/ for the Semantic Web initiative, related technologies, and standards.
Fig. 1. A sketch of ConceptNet
and the labeled edges are assertions of common sense that interconnect them (Fig. 1). Unlike most existing lexical resources, the nodes of CN are not only simple words but also complex concepts. CN concepts are linked by 24 relations such as IsA, AtLocation, and MadeOf, which respectively answer the questions 'What kind of thing is it?', 'Where would you find it?', and 'What is it made of?'. AnalogySpace [2] is a way of representing the Common Sense knowledge base in a multidimensional vector space. By representing ConceptNet as a sparse matrix and using Singular Value Decomposition (SVD) to reduce its dimensionality, it is possible to extract eigenconcepts that are representative of large-scale patterns in the collected data. These can be used to determine concept similarity and to infer new common sense knowledge. A sketch of this construction is shown below.
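As a hedged illustration of this idea, the following sketch builds a tiny sparse concept-by-feature matrix and extracts its leading components with a truncated SVD; the matrix content is invented for the example and is not real ConceptNet data.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Toy concept-by-feature matrix standing in for ConceptNet assertions.
rows, cols, vals = [0, 0, 1, 2], [0, 1, 1, 2], [1.0, 1.0, 1.0, 1.0]
A = csr_matrix((vals, (rows, cols)), shape=(3, 3))

U, s, Vt = svds(A, k=2)       # k leading "eigenconcepts"
concept_vecs = U * s          # concepts embedded in the reduced space

def similarity(i, j):
    # Cosine similarity between two concepts in AnalogySpace.
    a, b = concept_vecs[i], concept_vecs[j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```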
4 Other Resources
Several efforts have been made to develop wide knowledge bases for general-purpose applications. Cyc [3] was one of the first attempts to assemble a massive knowledge base spanning human common sense knowledge. Initially started by Doug Lenat in 1984, this project employs knowledge engineers who handcraft assertions and place them into a logical framework using CycL, Cyc's proprietary language. Currently under development, the Brandeis Semantic Ontology (BSO) [4] is a large lexical resource based on James Pustejovsky's Generative Lexicon (GL), a theory of semantics that focuses on the distributed nature of compositionality in natural language. Princeton University's WordNet [5] is one of the most widely used natural language processing resources today. WordNet is a large lexical database of English in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are
interlinked by means of conceptual-semantic and lexical relations. WordNet's structure makes it a useful tool for computational linguistics and natural language processing. Strapparava and Valitutti [6] developed WordNet-Affect, an extension of WordNet Domains including a subset of synsets suitable for representing affective concepts correlated with affective words. WordNet-Affect assigns one or more affective labels (a-labels) to a number of WordNet synsets. In particular, the affective concepts representing emotional states are identified by synsets marked with the a-label emotion. There are also other a-labels for concepts representing moods, situations eliciting emotions, or emotional responses. In a 2006 W3C document, Mark van Assem et al. [7] proposed a standard conversion of Princeton WordNet 2.0 to RDF/OWL. Following a similar methodology, WordNet 3.0 has also recently been converted and published in an SW-aware format using an RDF/OWL encoding [8]. This allows WordNet to be used directly by Semantic Web application developers and improves interoperability between SW applications that use WordNet.
5 Common Sense and Ontology Based Applications
ConceptNet is used as background knowledge in a wide number of applications requiring text disambiguation for many different purposes, ranging from the development of intelligent user interfaces to speech recognition improvement, and from automatic story generation to online product recommendation, as reported in [11]. More generally, ontologies are increasingly employed in extracting opinions from text, providing domain knowledge of individual words and defining relationships between terms in order to capture the background knowledge and the meaning of text. For example, Zhang et al. [12] used an ontology for multi-word-based text classification, and Hotho et al. [13] proposed integrating core ontologies as background knowledge into the process of clustering text documents. Yang and Lin [10] use WordNet for word sense disambiguation to automatically discover and extract the semantic information contained in text documents. This semantic information is then annotated on the documents in RDF conforming to Semantic Web standards for future reuse. In a recent work of ours, we defined a novel paradigm, named Sentic Web [14], for the classification and management of social media resources based on their affective content. A novel Artificial Intelligence technique, called Sentic Computing (SC) [15], is used to infer affective information from natural language. SC applies Singular Value Decomposition to ConceptNet to build a vector space in which both concepts and emotions are represented by vectors. Distances between concepts and emotions in this Affective Space allow estimation of the affective valence and polarity of concepts, enabling the inference of emotions implicitly contained in text. Semantic Web techniques are employed for information encoding and management, to make information effectively accessible through an intuitive and engaging user interface, and to enable semantic information retrieval. In particular, a Human Emotion Ontology (HEO) [16] has been developed
Fig. 2. Some of HEO’s emotion categories
specifically to standardize the main existing models for the representation of human emotions into a computational ontology, providing at the same time flexibility and interoperability in emotion description. HEO provides a large, taxonomically organized set of emotion categories, as shown in Fig. 2, which, besides enabling flexible emotion description, can also serve sentiment analysis purposes as a keyword set for affect detection.
6 Encoding ConceptNet in RDF
ConceptNet's nature as a directed graph makes it perfectly suited to representation with the RDF data model. Modeling the concept relationships defined by the 24 core properties introduced in ConceptNet 3.0 with the expressiveness of the OWL ontology language allows the semantics of the data connections to be fully encoded in a univocally machine-interpretable and machine-processable format. In this way, all the knowledge contained in ConceptNet becomes available to be effectively accessed and integrated into Semantic Web applications. The OWL language also makes the ConceptNet knowledge base easily extensible, allowing new properties and statements to be defined, and enhances its interoperability with other existing ontologies. A minimal sketch of such an encoding follows.
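The following rdflib sketch encodes one ConceptNet assertion as an RDF triple; the namespace URI and property naming are illustrative assumptions, not an official ConceptNet RDF scheme.

```python
from rdflib import Graph, Namespace

# Hypothetical namespace for the example; not an official ConceptNet URI.
CN = Namespace("http://conceptnet.example.org/")

g = Graph()
g.bind("cn", CN)
# the assertion "car IsA machine" as a directed labeled edge
g.add((CN["concept/car"], CN["rel/IsA"], CN["concept/machine"]))

print(g.serialize(format="turtle"))
```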
6.1 Linking ConceptNet to WordNet
The idea of merging the knowledge contained in ConceptNet with WordNet is not new and has already been discussed and implemented in the literature, as in [17] and [18], but to the best of our knowledge it has never been tackled from an SW
Fig. 3. Augmenting ConceptNet knowledge base by WordNet connection
perspective, which can be beneficial in the development of SW applications. Once expressed in a Semantic Web compliant format, CN can easily be linked with WordNet RDF into an interconnected knowledge base. In this way, WordNet synsets can be used to find the synonyms of a CN concept and to generate new data connections in CN. For example, as shown in Fig. 3, the word car exists in both ConceptNet and WordNet. CN contains several statements describing the properties of car, among which "car IsA machine". The WordNet car synset supplies other synonyms of car, such as auto, automobile, and motor car. These concepts are also in CN and have their own statements. The whole set of these statements is valid for each of the synonyms, so much new knowledge can be inferred and added to CN. In our example, previously unstated assertions can be introduced stating that an auto, an automobile, and a motor car are machines. A sketch of this propagation is given below.
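The sketch below illustrates this inference step under the same illustrative namespace as before; the function and the synset representation are our assumptions, not the authors' implementation.

```python
from rdflib import Graph

def propagate_over_synonyms(g: Graph, synset):
    """Fig. 3 idea: every statement about one member of a WordNet synset
    is asserted for its synonyms too. `synset` is a list of concept URIs
    assumed to be synonymous (e.g. car, auto, automobile, motor car)."""
    new_triples = []
    for concept in synset:
        for _, p, o in g.triples((concept, None, None)):
            for synonym in synset:
                if synonym != concept and (synonym, p, o) not in g:
                    new_triples.append((synonym, p, o))
    for t in new_triples:
        g.add(t)   # e.g. adds "auto IsA machine" from "car IsA machine"
    return g
```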
6.2 Extending ConceptNet with Affective Information
As remarked above, CN represents a powerful tool for sentiment analysis that allows affective information to be extracted from text by estimating the polarity of concepts and their affective valence. A polarity property can therefore be added to CN to describe the positive or negative valence of a concept, with values ranging from -1 to 1. CN can also be linked to the HEO ontology to provide an accurate description of the affective valence of a concept using the semantic descriptors provided by HEO, as shown in Fig. 4. Once encoded explicitly, such information can be accessed using SPARQL queries and becomes easily available for sentiment analysis applications, as sketched below.
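A sketch of such an annotation and of its retrieval via SPARQL follows; the polarity property name, the example concept, and its value are illustrative.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

CN = Namespace("http://conceptnet.example.org/")

g = Graph()
g.bind("cn", CN)
# an illustrative polarity annotation in [-1, 1]
g.add((CN["concept/celebrate"], CN["polarity"], Literal(0.8, datatype=XSD.float)))

# retrieve concepts with positive polarity via SPARQL
q = """
PREFIX cn: <http://conceptnet.example.org/>
SELECT ?c ?p WHERE { ?c cn:polarity ?p . FILTER(?p > 0) }
"""
for concept, polarity in g.query(q):
    print(concept, polarity)
```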
Fig. 4. Extending ConceptNet with affective information
7 Conclusion and Future Works
In this paper, we showed the feasibility of encoding ConceptNet in a semantically aware format and discussed how this can be beneficial, making such a wide knowledge base directly accessible to Semantic Web applications and allowing its further extension. Future efforts will focus on the actual encoding of the whole ConceptNet knowledge base into RDF and on its exploitation to drive a new generation of sentiment analysis applications based on the semantic analysis of text.
References
1. Havasi, C., Speer, R., Alonso, J.: ConceptNet 3: A Flexible, Multilingual Semantic Network for Common Sense Knowledge. In: Recent Advances in Natural Language Processing, Borovets, Bulgaria (September 2007)
2. Speer, R., Havasi, C., Lieberman, H.: AnalogySpace: Reducing the Dimensionality of Common Sense Knowledge. In: AAAI 2008: Proceedings of the 23rd National Conference on Artificial Intelligence, pp. 548–553. AAAI Press, Menlo Park (2008)
3. Lenat, D.B.: CYC: A Large-Scale Investment in Knowledge Infrastructure. Commun. ACM 38(11), 33–38 (1995)
4. Pustejovsky, J., Havasi, C., Saur, R., Hanks, P., Rumshisky, A.: Towards a Generative Lexical Resource: The Brandeis Semantic Ontology. In: Proceedings of the Fifth Language Resource and Evaluation Conference (2006)
5. Miller, G., Beckwith, R., Fellbaum, C., Gross, D., Miller, K.: Introduction to WordNet: An On-Line Lexical Database. J. Lexicography 3, 235–244 (1990)
6. Strapparava, C., Valitutti, A.: WordNet-Affect: An Affective Extension of WordNet. In: Int. Conf. Language Resources and Evaluation, vol. 4, pp. 1083–1086 (2004)
7. van Assem, M., Gangemi, A., Schreiber, G.: RDF/OWL Representation of WordNet, Editor's Draft (April 23, 2006), http://www.w3.org/2001/sw/BestPractices/WNET/wn-conversion.html
8. Princeton WordNet 3.0 in RDF, http://semanticweb.cs.vu.nl/lod/wn30/
9. Khan, K., Baharudin, B.B., Khan, A., e-Malik, F.: Mining Opinion from Text Documents: A Survey. In: 3rd IEEE International Conference on Digital Ecosystems and Technologies (DEST 2009), June 1-3, pp. 217–222 (2009)
10. Yang, C.-Y., Lin, H.-Y.: An Automated Semantic Annotation Based on WordNet Ontology. In: 2010 Sixth International Conference on Networked Computing and Advanced Information Management (NCM), August 16-18, pp. 682–687 (2010)
11. Liu, H., Singh, P.: ConceptNet: A Practical Commonsense Reasoning Toolkit. BT Technology Journal 22(4), 211–226 (2004)
12. Zhang, W., Yoshida, T., Tang, X.: Text Classification Based on Multi-word with Support Vector Machine. Knowledge-Based Systems 21(8), 879–886 (2008)
13. Hotho, A., Staab, S., Stumme, G.: Ontologies Improve Text Document Clustering. In: Third IEEE International Conference on Data Mining (ICDM 2003) (2003)
14. Grassi, M., Cambria, E., Hussain, A., Piazza, F.: Sentic Web: A New Paradigm for Managing Social Media Affective Information. Cognitive Computation. Springer, Heidelberg (2010, to appear)
15. Cambria, E., Hussain, A., Havasi, C., Eckl, C.: Sentic Computing: Exploitation of Common Sense for the Development of Emotion-Sensitive Systems. In: Esposito, A., Campbell, N., Vogel, C., Hussain, A., Nijholt, A. (eds.) Second COST 2102. LNCS, vol. 5967, pp. 148–156. Springer, Heidelberg (2010)
16. Grassi, M.: Developing HEO Human Emotions Ontology. In: Fierrez, J., Ortega-Garcia, J., Esposito, A., Drygajlo, A., Faundez-Zanuy, M. (eds.) Proceedings of the 2009 Joint COST 2101 and 2102 International Conference on Biometric ID Management and Multimodal Communication (BioID MultiComm 2009), pp. 244–251. Springer, Heidelberg (2009)
17. Altadmri, A., Ahmed, A.: VisualNet: Commonsense Knowledgebase for Video and Image Indexing and Retrieval Application. In: IEEE International Conference on Intelligent Computing and Intelligent Systems (ICIS 2009), November 20-22, vol. 3, pp. 636–641 (2009)
18. Aslam, N., Irfanullah, I., Loo, J., Looms, M., Ullah, R.: A Semantic Query Interpreter Framework by Using Knowledge Bases for Image Search and Retrieval. In: 2010 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), December 15-18, pp. 414–419 (2010)
Towards an RDF Encoding of ConceptNet
565
7. van Assem, M., Gangemi, A., Schreiber, G.: RDF/OWL Representation of WordNet, Editor’s Draft (April 23, 2006), http://www.w3.org/2001/sw/BestPractices/WNET/wn-conversion.html 8. Princeton Wordnet3.0 in RDF, http://semanticweb.cs.vu.nl/lod/wn30/ 9. Khan, K., Baharudin, B.B., Khan, A., e-Malik, F.: Mining opinion from text documents: A survey. In: 3rd IEEE International Conference on Digital Ecosystems and Technologies, DEST 2009, June 1-3, pp. 217–222 (2009) 10. Yang, C.-Y., Lin, H.-Y.: An automated semantic annotation based-on Wordnet ontology. In: 2010 Sixth International Conference on Networked Computing and Advanced Information Management (NCM), August 16-18, pp. 682–687 (2010) 11. ConceptNet: A Practical Commonsense Reasoning Toolkit. BT Technology Journal 22 (to appear) 12. Zhang, W., Yoshida, T., Tang, X.: Text classification based on multi-word with support vector machine. Knowledge-Based Systems 21(8), 879–886 (2008) 13. Hotho, A., Staab, S., Stumme, G.: Ontologies Improve Text Document Clustering. In: Third IEEE International Conference on Data Mining, ICDM 2003 (2003) 14. Grassi, M., Cambria, E., Hussain, A., Piazza, F.: Sentic Web: a New Paradigm for Managing Social Media Affective Information. To appear in: Cognitive Computation, Springer, Heidelberg (2010) 15. Cambria, E., Hussain, A., Havasi, C., Eckl, C.: Sentic computing: Exploitation of common sense for the development of emotion-sensitive systems. In: Esposito, A., Campbell, N., Vogel, C., Hussain, A., Nijholt, A. (eds.) Second COST 2102. LNCS, vol. 5967, pp. 148–156. Springer, Heidelberg (2010) 16. Grassi, M.: Developing HEO human emotions ontology. In: Fierrez, J., OrtegaGarcia, J., Esposito, A., Drygajlo, A., Faundez-Zanuy, M. (eds.) Proceedings of the 2009 joint COST 2101 and 2102 International Conference on Biometric ID Management and Multimodal Communication (BioID MultiComm 2009), pp. 244– 251. Springer, Heidelberg (2009) 17. Altadmri, A., Ahmed, A.: VisualNet: Commonsense knowledgebase for video and image indexing and retrieval application. In: IEEE International Conference on Intelligent Computing and Intelligent Systems, ICIS 2009, November 20-22, vol. 3, pp. 636–641 (2009) 18. Aslam, N., Irfanullah, I., Loo, J., Looms, M., Ullah, R.: A Semantic Query Interpreter framework by Using knowledge bases for image search and retrieval. In: 2010 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), December 15-18, pp. 414–419 (2010)
Modeling of Potential Customers Identification Based on Correlation Analysis and Decision Tree Kai Peng and Daoyun Xu Department of Computer Science Guizhou University, Guizhou, China [email protected]
Abstract. The precise identification of potential customers who will steadily subscribe to mobile data services has become a focus of mobile communications corporations in recent years. Applying data mining techniques, this paper introduces a model that helps mobile communications corporations find potential customers for the Multimedia Messaging Service (MMS, a mobile data service that provides a standard way to send messages with multimedia content to and from mobile phones). The modeling process is presented as follows. First, data preprocessing, an important part of modeling, is described in detail with the use of a histogram technique. Then, correlation analysis and a decision tree are applied to find rules for identifying potential and stable customers and to generate the final model. In the conclusion, a suggestion on future developments to further improve the model is given. The validation results for this model show a high accuracy and meet the standard for practical business application. Based on the targets mined by the model, a mobile communications corporation launched a series of precise marketing campaigns and achieved significant economic benefits in 2010.

Keywords: Decision Tree, Correlation Analysis, Support, Confidence, Lift.
1 Introduction
With the development of mobile communications, the types of mobile data services have multiplied to the point that consumers become confused and often do not know which services are right for them. This situation limits the release of consumers' demand for mobile data services. To solve the problem, mobile communications corporations are trying to introduce data mining methods to find, from the historical records in their databases, the potential consumers who are interested in mobile data services and will maintain sustainable consumption.
Project supported by the Natural Science Foundation of China (NSFC, No. 60863005, No. 61011130038) and the Government Foundation of Guizhou Province of China (No. 200802).
1.1 Published Solutions
Several solutions have been published in the literature:
• To improve marketing effectiveness, Wang Kun [1] introduced clustering and association analysis to mine potential consumption information from an enterprise's database.
• Using the Apriori algorithm to analyze market research data, and taking the insurance industry as an example, Yuyan Li [2] explored the consumer characteristics used for shaping marketing strategies.
• To obtain information about consumers' characteristics and demands that dynamically reflects changes in the market, Wang Zhe [3] adopted the Apriori algorithm to analyze the market research data of communication corporations. Based on the CART algorithm, Kong Ying [4] constructed a single-classification model by mining the characteristics of credit card consumers, and then used the model to predict potential loan customers.
• Nan Zhang [5] proposed an improved CART algorithm for the identification of potential consumers.
1.2 Problems of Published Solutions
However, the above methods remain inadequate for identifying potential customers who are interested in subscribing to mobile data services:
• They are more concerned with improving the performance of algorithms and neglect the complexity of practical application, which increases the corporation's cost of developing and maintaining the model.
• Because the researchers lack practical business experience, the algorithms proposed in the literature do not specify how to choose effective variables before carrying out data mining. Nevertheless, the selection of variables has a major impact on the performance of the model.
• Some solutions focus on finding the major characteristics of potential consumers but neglect to analyze the stability of their consumption. Consequently, a portion of the potential consumers mined by those models is also prone to churn, which is a great loss to the enterprise.
• The main object of those solutions is to find potential consumers in the financial industry or in retail business; they are difficult to transplant to mobile communications corporations because consumption characteristics differ across industries.
1.3 Outline of the Article
In view of this, based on consumers' historical information from the data warehouse of a mobile communications corporation, and using correlation analysis and a decision tree, this paper introduces an effective model to find potential customers with sustainable consumption who are interested in MMS. The method can also be used to build models for mining potential, sustainable customers of other mobile data services.
2 Data Preprocessing
2.1 Variable Selection
By extracting and analyzing the historical consumption records for July, August, and September 2010 from the database of a mobile communications corporation, we formed a set of variables for data mining. These variables mainly cover the following types: sales records of mobile calling services, traffic records of mobile data services, consumers' accounts, consumers' characteristics, etc.
2.2 Data Cleansing
A histogram is introduced to analyze the above variables and divide their value ranges. With this analytical technique, we single out the variables with the greater impact. As an application of the histogram, Fig. 1 shows the analysis results for the variable ARPU (average revenue per user).
Fig. 1. The results of analysis for the variable ARPU with the application of the histogram
From Fig. 1 we can draw the following conclusion: the higher the average revenue a consumer spends on mobile services, especially above 80 RMB per month, the higher the possibility that the consumer will subscribe to MMS. Similarly, histogram analysis of the other variables yields the following conclusions:
• High-value customers who often make roaming calls prefer to choose MMS.
• People who are contacted by a large number of friends subscribing to MMS have a great possibility of using MMS.
• Consumers who have subscribed to mobile email services are in favor of subscribing to MMS.
Histogram analysis of the other variables yields further conclusions, which are not listed here because of space limitations. A sketch of this per-bin analysis is given below.
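The following sketch illustrates the kind of per-bin analysis performed with the histogram; the bin edges, variable names, and function are illustrative, not those used in the study.

```python
import numpy as np

def mms_rate_by_bin(arpu, subscribed, bins=(0, 30, 50, 80, 150, 300, 1000)):
    """Bin a variable (here ARPU, in RMB) and report the MMS subscription
    rate per bin, mirroring the histogram analysis of Section 2.2."""
    arpu, subscribed = np.asarray(arpu), np.asarray(subscribed)
    idx = np.digitize(arpu, bins)            # which bin each customer falls in
    for b in range(1, len(bins)):
        mask = idx == b
        if mask.any():
            rate = subscribed[mask].mean()   # fraction subscribing to MMS
            print(f"ARPU in [{bins[b-1]}, {bins[b]}): {rate:.2%} subscribe MMS")
```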
2.3 Final Variables
After data cleansing, the final 31 variables were generated for data mining; they are listed in the following Table 1:

Table 1. The List of Variables for Data Mining
| Variable | Remark |
|---|---|
| int sms cnt | The traffic of SMS (Short Messaging Service, one of the mobile data services) |
| int msms cnt | The traffic of Monternet SMS |
| int mmms com cnt | The traffic of Monternet MMS |
| f gprs flow | The traffic of GPRS (General Packet Radio Service), a packet-oriented mobile data service on 2G and 3G |
| int main busi cnt | The number of core mobile data services subscribed |
| int busi detail cnt | The number of mobile data services subscribed |
| int reld p2p mms cnt | The cumulative frequency of MMS used by people in contact with a subscriber |
| int cal color num | The cumulative number of Color-Ring used by people in contact with a subscriber |
| f roam cal cnt | The frequency of roaming calls |
| f roam cal dur | The time of roaming calls |
| f roam cal dur pct | The proportion of roaming call time to overall call time per month |
| should bill | The prepayment of the consumer per month |
| fact bill | The average revenue per user |
| f busi amt bill | The average revenue of consumers in mobile data services |
| int cal o cnt | The frequency of local calls (outgoing) |
| int cal i cnt | The frequency of local calls (incoming) |
| int cal o dur | The duration of local calls (outgoing) |
| int cal i dur | The duration of local calls (incoming) |
| f long calt cnt | The frequency of local long-distance calls |
| f long calt dur | The duration of local long-distance calls |
| f long calt dur pct | The proportion of local long-distance call time to overall call time per month |
| sex id | The sex of the consumer |
| age id | The age of the consumer |
| brand id | The brand of mobile services |
| si mmsnews flag | Whether the consumer has used MMS |
| si wap flag | Whether the consumer has used WAP |
| si cring func flag | Whether the consumer has used the Color-Ring function |
| si attr06 flag | Whether the consumer actively uses the Fetion data service |
| si 0121 flag | Whether the consumer uses the weather forecast data service |
| si attr13 flag | Whether the consumer actively uses mobile email |
| si attr09 flag | Whether the consumer actively uses mobile magazine |
3 Data Mining
According to the variables obtained after data preprocessing, we extracted 9 million historical consumption records for modeling, of which 4 million form the training set, 2.5 million the test set, and 2.5 million the checkout set [6]. These historical consumption records have the following characteristics. First, each record has the variables listed in Table 1 with corresponding values. Second, to ensure that all values of those variables are well represented in the analysis and statistics, we extracted data over three consecutive months and converted them into averages. Finally, we discretized the values of some variables, such as the traffic of SMS, the traffic of mobile data services, etc. Next, correlation analysis (Apriori [7]) and a decision tree (C4.5) are applied to mine the training set and generate the final model, which is then used to identify potential customers with sustainable consumption who are willing to use MMS. The detailed steps are as follows:
Step 1: With the Application of Association Analysis Based on Apriori Algorithm, We Will Find Rules by Mining Data in the Training Set for Identifying Potential Customers
Association analysis is used to mine correlation of variables between si mmsnews flag and others listed in table 1. In computer science and data mining, Apriori is a classic algorithm for learning association rules and is a widely used and most influential. Thus, by using association analysis, we mine the 4 million training set and obtain the results shown in the following tables 2 and 3:
Table 2. Two-Variable Association Analysis
| No. | Range of Variable Values | Supp. | Confid. |
|-----|--------------------------|-------|---------|
| 1 | int msms com cnt > 100 | 0.0298 | 0.8655 |
| 2 | 100 > int msms com cnt > 50 | 0.0028 | 0.5573 |
| 3 | 50 > int msms com cnt > 25 | 0.0018 | 0.4589 |
| 4 | int msms cnt > 40 | 0.0101 | 0.1499 |
| 5 | int mmms com cnt > 100 | 0.0306 | 0.8539 |
| 6 | 100 > int mmms com cnt > 50 | 0.0024 | 0.4873 |
| 7 | 50 > int mmms com cnt > 25 | 0.0023 | 0.4296 |
| 8 | int busi detail cnt > 15 | 0.0002 | 0.5486 |
| 9 | 15 > int busi detail cnt > 10 | 0.0008 | 0.4445 |
| 10 | 10 > int busi detail cnt > 5 | 0.0065 | 0.3092 |
| 11 | si attr13 flag = Y | 0.0015 | 0.2022 |
| 12 | f busi amt bill > 100 | 0.0015 | 0.1909 |
| 13 | 100 > f busi amt bill > 50 | 0.0031 | 0.1647 |
| 14 | int main busi cnt > 15 | 0.0015 | 0.3715 |
| 15 | 15 > int main busi cnt > 10 | 0.0031 | 0.2083 |
| 16 | int reld p2p mms cnt > 50 | 0.0026 | 0.2059 |
| 17 | 50 > int reld p2p mms cnt > 30 | 0.006 | 0.1558 |
| 18 | brand id = "GOTON" | 0.0217 | 0.152 |
| 19 | 800 > should bill > 500 | 0.0003 | 0.1754 |
| 20 | 500 > should bill > 300 | 0.0015 | 0.1552 |
| 21 | fact bill > 500 | 0.0004 | 0.1672 |
| 22 | 500 > fact bill > 300 | 0.0015 | 0.1552 |
| 23 | 200 > f roam cal dur > 100 | 0.0046 | 0.1397 |
| 24 | f roam cal dur > 50 | 0.0059 | 0.1361 |
The strength of an association rule can be measured in terms of its support and confidence [7]. Table 2 shows that the greater the support and confidence, the stronger the association between si mmsnews flag and the other variable, meaning a higher probability that the consumer will subscribe to MMS. Through association analysis, we obtain rules for identifying potential customers such as the following:
• Consumers with higher Monternet MMS traffic tend to subscribe to MMS; they are mainly people with greater demand for mobile data services, such as students.
• Business customers prefer to subscribe to MMS because it effectively helps them obtain real-time economic information.
• Consumers whose friends habitually use MMS tend to choose MMS News.
Table 3. Three-Variable Association Analysis
| No. | The First Set | The Second Set | Supp. | Confid. |
|-----|---------------|----------------|-------|---------|
| 1 | int main busi cnt > 15 | 150 > fact bill > 80 | 0.0024 | 0.4165 |
| 2 | int main busi cnt > 15 | 300 > fact bill > 150 | 0.0018 | 0.3988 |
| 3 | int main busi cnt > 15 | fact bill > 500 | 0.0001 | 0.3661 |
| 4 | int main busi cnt > 15 | 500 > fact bill > 300 | 0.0004 | 0.3651 |
| 5 | int main busi cnt > 15 | 80 > fact bill > 50 | 0.0005 | 0.2695 |
| 6 | 15 > int main busi cnt > 10 | 500 > fact bill > 300 | 0.0006 | 0.2507 |
| 7 | 15 > int main busi cnt > 10 | 300 > fact bill > 150 | 0.0036 | 0.2681 |
| 8 | 15 > int main busi cnt > 10 | 150 > fact bill > 80 | 0.0063 | 0.2477 |
| 9 | 15 > int main busi cnt > 10 | fact bill > 500 | 0.0001 | 0.2388 |
| 10 | 15 > int busi detail cnt > 10 | 150 > fact bill > 80 | 0.0003 | 0.442 |
| 11 | 15 > int busi detail cnt > 10 | 300 > fact bill > 150 | 0.0003 | 0.4174 |
| 12 | int reld p2p mms cnt > 50 | f busi amt bill > 100 | 0.0002 | 0.4365 |
| 13 | int reld p2p mms cnt > 50 | 100 > f busi amt bill > 50 | 0.0004 | 0.2993 |
| 14 | int reld p2p mms cnt > 50 | 50 > f busi amt bill > 30 | 0.0006 | 0.2543 |
| 15 | int reld p2p mms cnt > 50 | fact bill > 500 | 0.0002 | 0.2316 |
| 16 | int reld p2p mms cnt > 50 | 150 > fact bill > 80 | 0.0007 | 0.2205 |
| 17 | 50 > int reld p2p mms cnt > 30 | f busi amt bill > 100 | 0.0003 | 0.2923 |
| 18 | 50 > int reld p2p mms cnt > 30 | 100 > f busi amt bill > 50 | 0.0007 | 0.2478 |
| 19 | 50 > f roam cal cnt > 30 | f busi amt bill > 100 | 0.0001 | 0.2702 |
| 20 | f roam cal cnt > 50 | f busi amt bill > 100 | 0.0004 | 0.2555 |
| 21 | 50 > f roam cal cnt > 30 | 100 > f busi amt bill > 50 | 0.0003 | 0.2024 |
| 22 | f roam cal cnt > 50 | 100 > f busi amt bill > 50 | 0.0009 | 0.2222 |
| 23 | brand id = "GOTON" | f busi amt bill > 100 | 0.0011 | 0.2581 |
| 24 | si wap flag = "Y" | fact bill > 500 | 0.0001 | 0.2513 |
Similarly, from Table 3 we can find rules such as:
• Consumers who subscribe to more than 10 categories of data services on the mobile central platform and whose monthly bill exceeds 80 RMB tend to choose MMS.
• Consumers whose friends use MMS more than 50 times and whose average expenditure on Monternet data services exceeds 50 RMB tend to choose MMS.
• Consumers whose average monthly expenditure on mobile data services exceeds 50 RMB tend to choose MMS.
Association analysis yields other conclusions as well, which are not listed here because of space limitations. A sketch of the underlying support/confidence computation is given below.
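A minimal sketch of how the support and confidence of such rules are computed follows; the record representation and item names are our assumptions.

```python
def support_confidence(records, antecedent, consequent="si_mmsnews_flag=Y"):
    """`records` is a list of sets of items (discretized variable=range pairs);
    `antecedent` is a set of items forming the rule's left-hand side."""
    n = len(records)
    both = sum(1 for r in records if antecedent <= r and consequent in r)
    ante = sum(1 for r in records if antecedent <= r)
    support = both / n                       # P(antecedent and consequent)
    confidence = both / ante if ante else 0.0  # P(consequent | antecedent)
    return support, confidence

# e.g. the first rule of Table 3:
# support_confidence(records, {"int_main_busi_cnt>15", "150>fact_bill>80"})
```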
3.2 Step 2: With the Application of a Decision Tree Based on the C4.5 Algorithm, We Find Rules by Mining the Training Set for Identifying Sustainable Consumption Customers
C4.5 is an algorithm developed by Ross Quinlan for generating a decision tree; it is an extension of Quinlan's earlier ID3 algorithm. After several rounds of selection, we chose the following parameters for mining the data: the maximum split number is 300, the maximum number of nodes is 300, the maximum depth is 8, and the decision tree's pruning criterion is the gain ratio. Applying the decision tree, we obtain the following results:
• A customer's consumption is more sustainable when his SMS traffic is between 50 and 100, his GPRS traffic is between 0 and 3 MB, and his average revenue is between 80 and 150 RMB.
• A customer's consumption is more sustainable when his MMS traffic per month is more than 100 and his GPRS traffic per month is more than 3 MB.
• A customer's consumption is more sustainable when the proportion of roaming call time to overall call time per month is between 10% and 25% and the average expenditure on mobile data services is less than 10 RMB.
Decision tree analysis yields other conclusions as well, which are not listed here because of space limitations. A sketch of training such a tree is given below.
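A hedged sketch of training such a tree with scikit-learn follows; note that scikit-learn grows a CART-style tree rather than C4.5, so this only approximates the paper's setting, and the synthetic data merely stands in for the real training set.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy stand-ins for the real variables and the sustainability label.
rng = np.random.default_rng(0)
X = rng.random((1000, 3))                                 # sms, gprs, arpu proxies
y = ((X[:, 0] > 0.5) & (X[:, 1] < 0.3)).astype(int)       # toy sustainability label

# Parameters chosen to echo those listed above (max 300 nodes, depth 8,
# information-gain style splitting).
tree = DecisionTreeClassifier(criterion="entropy", max_depth=8, max_leaf_nodes=300)
tree.fit(X, y)
print(export_text(tree, feature_names=["sms_cnt", "gprs_flow", "arpu"]))
```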
3.3 Integration Rules
By combining the rules generated in Step 1 and Step 2, we obtain integrated rules for identifying potential and sustainable customers. Using these integrated rules, we can identify potential and sustainable customers in the databases of mobile communications corporations and divide them into three categories (as shown in Fig. 2):
• Sociable consumers: consumers whose friends habitually use MMS and Monternet mobile data services tend to choose MMS.
Fig. 2. The categories of potential and sustainable customers
• Business consumers: consumers who are businessmen and travel frequently tend to choose MMS.
• Campus customers: consumers who are young students and prefer to use data services tend to choose MMS.
4 Model Validation

Based on the 2.5 million-record test set and the 2.5 million-record checkout set, we validate the rules, i.e., the model, in terms of accuracy and LIFT (LIFT is a measure of a model's performance at classifying cases, measured against a random-choice model). The results are shown in the following Table 4:

Table 4. The Results of Validation for the Model
|                    | Test set (2.5 million): Accuracy | LIFT | Checkout set (2.5 million): Accuracy | LIFT |
|--------------------|--------|------|--------|------|
| Sociable consumers | 89.84% | 1.94 | 87.66% | 1.89 |
| Business consumers | 90.42% | 1.95 | 89.01% | 1.92 |
| Campus customers   | 80.15% | 1.73 | 78.56% | 1.7  |
| All customers      | 89.19% | 1.93 | 87.50% | 1.89 |
The results in Table 4 show that, on both the test set and the checkout set, the model's accuracy and LIFT values are effective. In 2010, a mobile communications corporation used the model to carry out precise marketing. With the model, marketing effectiveness increased to 16.1%, compared with only 2.1% for the traditional approach, and cost savings of 4 million RMB per month were achieved. A sketch of the LIFT computation is given below.
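For reference, a minimal sketch of the LIFT measure as defined above is given; the prediction and label lists are assumed inputs, and the precision-over-base-rate formulation is our reading of the definition in the text.

```python
def lift(predicted, actual):
    # precision among customers the model flags as potential subscribers
    flagged = [a for p, a in zip(predicted, actual) if p]
    precision = sum(flagged) / len(flagged)
    # base rate a random-choice model would achieve
    base_rate = sum(actual) / len(actual)
    return precision / base_rate

# e.g. lift([1, 1, 0, 0], [1, 0, 0, 1]) == 0.5 / 0.5 == 1.0
```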
5 Conclusion

The application of the histogram technique for variable selection greatly enhances the efficiency of the data mining process. Combining several algorithms for data mining can work better in a given case than a single algorithm: in this paper, compared with using a single algorithm, applying correlation analysis and a decision tree together to find rules for identifying potential and sustainable customers helped us generate a more efficient model. In addition, the model generated by correlation analysis and the decision tree is both highly interpretable and highly efficient. Finally, the validation results show that the model's accuracy and LIFT exceed 80%. However, the following shortcomings remain:
• The potential and sustainable customers identified by the model could be biased in actual commercial applications, because customers' consumption behavior is also affected by incidental factors such as psychological and time factors.
• In the data mining process, the modeling algorithm needs further improvement because of some defects in the algorithm itself.
All in all, a solution should be found to overcome the above problems and improve the model in the future.
References
1. Wang, K.: The Study of Data Mining for the Application of Models in the Field of Cross-selling. Market Modernization (7), 65 (2007)
2. Yu, Y.: Apriori Algorithm with Their Application in Insurance. China New Technologies and Products (1) (2010)
3. Wang, Z., Zheng, Z.: Mining Apriori Algorithm in Data Mining in the Application of Mobile Value-added Business, http://www.paper.edu.cn/index.php/default/releasepaper/content/200902-933
4. Kong, Y.: Classification Algorithm Based on Data Mining in the Identification of Potential Customers. Computer Era (9) (2008)
5. Zhang, N.: The Study of Improved CART Algorithm in the Application of Identification of Potential Customers. Hebei University of Computer Science Technology, Hebei (2008)
6. Xie, B.: The Application of Data Mining in Insurance Industry. Statistics & Information Forum 24(11), 79–91 (2009)
7. Chen, A., Chen, L.: Data Mining Technology and Application. Science Press, Beijing (2007)
An Innovative Feature Selection Using Fuzzy Entropy Hamid Parvin, Behrouz Minaei-Bidgoli, and Hossein Ghaffarian Computer Department, Iran University of Science and Technology (IUST), Tehran, Iran {parvin,b_minaei,ghaffarian}@iust.ac.ir
Abstract. In this paper, a new feature subset selection approach is introduced. The proposed approach consists of two phases. The first phase aims to reduce the running time of the algorithm, which is critical for high-dimensional datasets. In this phase, the entire dataset is first clustered, and the best number of clusters is found according to the silhouette value. Using this value, each feature is then clustered separately with the same number of clusters, and the proposed fuzzy entropy measures are calculated for it. In the second phase, the approach seeks a feature subset that meets the bounds required for a high degree of accuracy. The proposed method is examined on different datasets. The results show that, among different feature subset selection methods, the proposed method tends to find and select the minimum number of features with negligible loss of final classification accuracy.

Keywords: Feature Selection, Fuzzy Entropy, Clustering.
1 Introduction

One of the great challenges in data mining lies in dealing with large datasets and their high dimensionality. Most of the time, a huge amount of data must be taken into account to obtain the final results, and working with such large datasets requires large amounts of memory and powerful computers. However, some of the features are irrelevant or redundant. There are thus two main reasons to keep the dimensionality of the pattern representation as small as possible: measurement cost and classification accuracy. To avoid the necessity of using a large amount of data, there are two common methods, feature subset selection and dimensionality reduction, and it is important to distinguish between them [15]. In general, dimensionality reduction can be performed either by keeping only the most important dimensions, i.e., the ones that hold the most useful information for the task at hand, or by projecting the original data into a lower-dimensional space that is most expressive for the task. Feature selection methods only select a subset of meaningful or useful dimensions from the original set of dimensions. Using feature subset selection techniques, redundant and irrelevant features can be omitted to reduce the amount of data at run time. Many feature subset selection techniques have been proposed. The goal of feature selection is to map a set of observations into a space that preserves the intrinsic structure as much as possible, such that each dimension of the target feature space is one feature of the source
feature space. Chaikla and Qi [3] propose a GA approach for feature subset selection. De et al. [7] and Platt [12] propose neuro-fuzzy approaches for this problem. De et al. [6] propose the OFEI and FQI methods based on neuro-fuzzy feature evaluation. Tsang et al. [16] propose the OFFSS method. Battiti [2] proposes the mutual information-based feature selector. Shie and Chen [14] propose the FEMFSS method. Dimensionality reduction is the transformation of high-dimensional data into a meaningful representation of reduced dimensionality, such that each dimension of the target feature space is (or could be) different from every source dimension. Many approaches have been proposed for dimensionality reduction, such as the well-known methods PCA [10], LDA [1], FLDA [8], ICA [4], and MDS [5]. PCA offers a straightforward approach to computation and is optimal in the statistical sense of preserving a maximal amount of the variability present in the original data. A kernel version of PCA, named Kernel-Based PCA, is presented by Maaten et al. [11]. The purpose of MDS is to provide a visual representation of the pattern of proximities (i.e., similarities or distances) among a set of objects. Two novel methods, Isomap [17] and LLE [13], have been proposed to tackle the nonlinear dimensionality reduction problem. In this paper, a new method for feature subset selection in classification problems, based on FEMFSS [14], is proposed. Although the FEMFSS method performs well, its first phase takes a complex approach that increases its time complexity. Moreover, in its second phase it uses high values as boundary thresholds, so the number of rounds of the algorithm increases on high-dimensional datasets. The proposed method tries to eliminate these problems by replacing them with simpler procedures. The proposed method is also compared with the FOS-MOD algorithm, which is mainly determined by two parts: the orthogonalization procedure and the calculation of the correlation matrix [18].
2 FEMFSS Method

Fig. 1 shows the pseudo-code of the FEMFSS algorithm, which consists of two phases. In the first phase (lines 3-18), it tries to find the best number of clusters and the cluster heads for each feature of the dataset using the K-Means algorithm. The best number of clusters for a feature f is found when the difference between the new fuzzy measure (with K clusters) and the old fuzzy measure (with K-1 clusters) of the feature f is less than the threshold Tc. Therefore, for a dataset with F features, FEMFSS runs F×K independent clusterings, and finding the best K requires several iterations of K-Means. After that, using the discovered cluster heads, the algorithm creates membership functions and assigns a fuzzy measure to each value of the feature. In the second phase (lines 19-28), the algorithm tries to find the best feature subset that satisfies its conditions in a combined extension matrix. To this end, an extension matrix consisting of the membership grades of the feature values is first created. From the fuzzy measures of this matrix, the authors calculate their proposed Class Degree (CD) and a fuzzy entropy measure for each feature. In each round, the feature with the minimum fuzzy entropy is selected, removed from the extension matrix, and inserted into and combined with the combined extension matrix. This process continues until it has selected all the remaining features, or until a negative change occurs in the new fuzzy
1:  FEMFSS(dataset D, thresholds Tc, Tr)
2:  {
3:    While exist feature in dataset D
4:    {
5:      Select feature f;
6:      K = 2;
7:      While true
8:      {
9:        Find K clusters' heads in f using K-Means algorithm;
10:       Create membership functions using clusters' heads;
11:       Calculate fuzzy entropy of feature f;
12:       If (decreasing rate of fuzzy entropy > Tc)
13:         K = K + 1;
14:       else
15:         K = K - 1;
16:         break;
17:     }
18:   }
19:   Create extension matrix EMf;
20:   Calculate fuzzy entropy of each feature;
21:   While true
22:   {
23:     Select feature f with minimum fuzzy entropy value;
24:     Add f into previously selected subset and update combined extension matrix;
25:     Calculate fuzzy entropy of new selected subset according to Tr;
26:     If (new fuzzy entropy value > previous fuzzy entropy value) or (fuzzy entropy = zero) or (there is no additional feature for selection)
27:       break;
28:   }
29: }
Fig. 1. FEMFSS pseudo-code
For calculating the fuzzy entropy of the selected subset, a class degree threshold value Tr ∈ [0,1], given by the user, is defined. This threshold acts as a cross-over point for selecting the final result of the AND operation between the fuzzy measures of the previously selected subset and the newly selected feature f.
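As a concrete illustration, the minimal Python sketch below shows the fuzzy AND combination and the role of Tr as a cross-over point; the function names and the NumPy realization are our own illustrative choices, not part of the original FEMFSS code.

```python
import numpy as np

def fuzzy_and(mu_subset, mu_feature):
    """Combined membership grade: fuzzy AND = element-wise minimum."""
    return np.minimum(mu_subset, mu_feature)

def passes_crossover(class_degrees, tr=0.9):
    """True if the fuzzy set's maximum class degree reaches the
    user-given cross-over threshold Tr."""
    return float(np.max(class_degrees)) >= tr
```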
3 Classifying Features

The FEMFSS algorithm chooses a complicated way to find the best number of clusters for each feature, which is not efficient for high-dimensional datasets. For a dataset D with F features, N samples, K independent clusterings for each feature, and I iterations for each of the K clusterings of every feature, the complexity of FEMFSS for finding the best clusters for all features is:
$$\mathrm{Complexity}_{\mathrm{FEMFSS}} = O\left(\sum_{f=1}^{F}\sum_{i=1}^{I}\sum_{k=2}^{K} N \times k\right) \qquad (1)$$
In each round, the algorithm must calculate membership functions, fuzzy values, CD and fuzzy entropies to decide whether more rounds are needed to find the best number of clusters. This is too time-consuming. To remove this complexity, the proposed approach finds the optimal number of clusters for the dataset as a whole; using this value, it then finds the best clustering for each feature. To find the best possible number of clusters in the dataset, the silhouette value is used. The silhouette value of a point measures how similar that point is to the points in its own cluster compared to the points in the other clusters. This measure ranges from +1, indicating points which are very distant from neighboring clusters, through 0, indicating points which are not distinctly
in one cluster or another, to -1, indicating points which are probably assigned to the wrong cluster [9]. It is defined as:
$$S(i) = \frac{\min_{k} b(i,k) - a(i)}{\max\big(a(i),\, \min_{k} b(i,k)\big)} \qquad (2)$$
where a(i) is the average distance from the i-th point to the other points in its own cluster, and b(i,k) is the average distance from the i-th point to the points in another cluster k. To find the best value of K, it is increased until the new mean silhouette value becomes less than the previous one; at that point, increasing the number of clusters reduces the clustering quality and the search must be stopped.
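A minimal sketch of this stopping rule is shown below. It is written in Python with scikit-learn's KMeans and silhouette_score standing in for the clustering and silhouette computations (the paper itself relies on MATLAB [9]); the function name and the cap k_max are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k_by_silhouette(data, k_max=10, random_state=0):
    """Raise K until the mean silhouette value stops improving,
    then keep the previous K (the stopping rule described above)."""
    prev_score, best_k = -np.inf, 2
    for k in range(2, k_max + 1):
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=random_state).fit_predict(data)
        score = silhouette_score(data, labels)  # mean S(i) over all points
        if score < prev_score:   # adding a cluster hurt the clustering
            break
        prev_score, best_k = score, k
    return best_k
```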
After finding the correct number of clusters in the dataset, say K, the proposed method runs the K-Means clustering algorithm on each feature f to partition its values into K clusters and find the cluster centers. For the rest of the algorithm, an approach similar to FEMFSS is followed. To create triangular membership functions for each feature, the cluster heads are used as the peaks, and the left and right feet of each triangle are determined as follows:

$$CC_{i,\mathrm{Left}} = \begin{cases} U_{\min} - (CC_i - U_{\min}) & \text{if } i = 1 \\ CC_{i-1} & \text{otherwise} \end{cases}, \qquad CC_{i,\mathrm{Right}} = \begin{cases} U_{\max} + (U_{\max} - CC_i) & \text{if } i = K \\ CC_{i+1} & \text{otherwise} \end{cases} \qquad (3)$$
Except for the leftmost and rightmost triangular functions, each triangle uses the center points of its left and right neighbor triangles as its left and right feet; only the left foot of the leftmost triangular function and the right foot of the rightmost triangular function require the calculation above. For each attribute value x of a sample s, the fuzzy degree in fuzzy set $v_i$ can be found using the following formula:

$$\mu_{v_i}(x) = \begin{cases} \max\left(1 - \dfrac{CC_i - x}{CC_i - CC_{i,\mathrm{Left}}},\; 0\right) & \text{if } x < CC_i \\[3mm] \max\left(1 - \dfrac{x - CC_i}{CC_{i,\mathrm{Right}} - CC_i},\; 0\right) & \text{otherwise} \end{cases} \qquad (4)$$

where $CC_i$ is the i-th cluster center.
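The following fragment sketches Eqs. (3)-(4) under stated assumptions: the inputs are the sorted cluster centres of a single feature together with that feature's minimum and maximum values, and all names are our own.

```python
import numpy as np

def triangle_feet(centres, u_min, u_max):
    """Compute the left and right feet of each triangular function (Eq. 3).
    Interior triangles borrow their neighbours' centres as feet; the outer
    two feet are mirrored around U_min and U_max."""
    centres = np.sort(np.asarray(centres, dtype=float))
    left = np.empty_like(centres)
    right = np.empty_like(centres)
    left[0] = u_min - (centres[0] - u_min)      # Eq. (3), i = 1
    left[1:] = centres[:-1]
    right[-1] = u_max + (u_max - centres[-1])   # Eq. (3), i = K
    right[:-1] = centres[1:]
    return left, right

def membership(x, centre, left, right):
    """Fuzzy degree of attribute value x in the set peaked at `centre` (Eq. 4)."""
    if x < centre:
        return max(1.0 - (centre - x) / (centre - left), 0.0)
    return max(1.0 - (x - centre) / (right - centre), 0.0)
```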
Fuzzy entropy has better performance than the common entropy measure [2]. There are different methods for calculating fuzzy entropy [1, 2, 4, 5, 8, 10]. Assume that a dataset with S samples is divided into C classes and K clusters. For a fuzzy set v defined on a feature f, the Similarity Degree is defined as follows:

$$SD_{ck}(v) = \frac{\sum_{s \in S_{kc}} \mu_v(s)}{\sum_{s \in S_k} \mu_v(s)} \qquad (5)$$
where $S_k$ is the subset of the S samples in the k-th cluster and $S_{kc}$ is the subset of $S_k$ belonging to class c, with $c \in C$. The fuzzy entropy of class c of fuzzy set v in the k-th cluster is defined as follows:
$$FE_{ck}(v) = -SD_{ck}(v) \log_2 SD_{ck}(v) \qquad (6)$$
The total fuzzy entropy of class c of fuzzy set v, its summation fuzzy entropy, and the fuzzy entropy of feature f are respectively defined as follows:
$$FE_c(v) = \sum_{k \in K} FE_{ck}(v), \qquad SFE(v) = \sum_{c \in C} FE_c(v), \qquad FEF(f) = \sum_{v \in V} \frac{S_v}{S}\, SFE(v) \qquad (7)$$
where V is the set of fuzzy sets defined on feature f and $v \in V$ is one of these fuzzy sets, $SFE(v)$ is the summation fuzzy entropy of fuzzy set v, S is the summation of the fuzzy degrees of all samples on feature f, and $S_v$ is the summation of the fuzzy degrees of the samples of fuzzy set v on feature f. The overall complexity of this part of the method is:

$$\mathrm{Complexity}_{\mathrm{ProposedMethod}} = O\left(\sum_{i=1}^{I}\sum_{k=2}^{K} N \times k + \sum_{f=1}^{F}\sum_{i=2}^{I} N \times k\right) \qquad (8)$$
As the above equation shows, the proposed method breaks the main loop of the FEMFSS algorithm, so it can run faster. This is especially important for high-dimensional datasets. While the running time of the clustering stage of FEMFSS is of order O(n³), the clustering stage of the proposed algorithm remains O(n²).
4 Finding Features' Subsets

For the rest of the work, to find the features' subsets, a Fuzzy Membership Grade Matrix is first defined, which consists of the K fuzzy membership grades (one per cluster) of feature f for each of the N samples in the dataset:

$$FMGM(f) = \begin{bmatrix} \mu_{v_1}(s_1) & \cdots & \mu_{v_K}(s_1) \\ \vdots & \ddots & \vdots \\ \mu_{v_1}(s_N) & \cdots & \mu_{v_K}(s_N) \end{bmatrix}_{N \times K} \qquad (9)$$
where $s_n$ is the value of feature f in sample n, with 1 ≤ n ≤ N and 1 ≤ k ≤ K. Features with minimum entropy are more important than the other features in the feature subset selection operation, because they yield a better degree of separation between the classes; therefore they are selected first.
$$NSS = SS \cup \{f\}, \qquad f = \arg\min_{f \in F} FEF(f), \qquad F = F - \{f\} \qquad (10)$$
where F is the set of features of dataset D, f is the selected feature with minimum fuzzy entropy, SS is the currently selected subset of features, and NSS is the new selected subset after adding feature f. To add a new feature to the previously selected subset, the new fuzzy degree of each item in the updated subset must be calculated; because of the dependence between them, the
fuzzy minimum operator is needed. Hence, a Combined Fuzzy Membership Grade Matrix is defined, which consists of the result of each round of selecting a feature f and combining it with the selected subset SS:

$$CFMGM(SS,f) = \begin{bmatrix} \mu_{v_1}(S_{1,SS}) \wedge \mu_{v_1}(S_{1,f}) & \cdots & \mu_{v_1}(S_{1,SS}) \wedge \mu_{v_K}(S_{1,f}) & \cdots & \mu_{v_I}(S_{1,SS}) \wedge \mu_{v_1}(S_{1,f}) & \cdots & \mu_{v_I}(S_{1,SS}) \wedge \mu_{v_K}(S_{1,f}) \\ \vdots & & \vdots & & \vdots & & \vdots \\ \mu_{v_1}(S_{N,SS}) \wedge \mu_{v_1}(S_{N,f}) & \cdots & \mu_{v_1}(S_{N,SS}) \wedge \mu_{v_K}(S_{N,f}) & \cdots & \mu_{v_I}(S_{N,SS}) \wedge \mu_{v_1}(S_{N,f}) & \cdots & \mu_{v_I}(S_{N,SS}) \wedge \mu_{v_K}(S_{N,f}) \end{bmatrix}_{N \times I} \qquad (11)$$

where I is the number of columns in the current CFMGM matrix, $\mu_{v_i}(S_{n,SS})$ is the i-th fuzzy membership grade of sample n in subset SS, with 1 ≤ i ≤ I and 1 ≤ n ≤ N, and $\mu_{v_k}(S_{n,f})$ is the k-th fuzzy membership grade of sample n on feature f, with 1 ≤ k ≤ K.
For calculating the fuzzy entropy of the combination of subset SS and feature f, the following formula can be used:

$$CFE(SS,f,T_r) = \begin{cases} \dfrac{S_{SS_L}}{S_{SS}} \displaystyle\sum_{w \in V_{NSS}} \dfrac{S_w}{S_{NSS}}\, SFE(w) + \displaystyle\sum_{v_{SS} \in V_{SS_{GE}}} \dfrac{S_{v_{SS}}}{S_{SS}}\, SFE(v_{SS}) & \text{if } \dfrac{S_{SS_L}}{S_{SS}} < \dfrac{S_{f_L}}{S_f} \\[4mm] \dfrac{S_{f_L}}{S_f} \displaystyle\sum_{w \in V_{NSS}} \dfrac{S_w}{S_{NSS}}\, SFE(w) + \displaystyle\sum_{v_f \in V_{f_{GE}}} \dfrac{S_{v_f}}{S_f}\, SFE(v_f) & \text{otherwise} \end{cases} \qquad (12)$$

where $S_v$ is the summation of the fuzzy membership grades of fuzzy set v, $v_L$ is the subset of v whose maximum class degree is less than the threshold value $T_r \in [0,1]$, $SFE(v)$ is the summation fuzzy entropy of fuzzy set v, $V_{v_{GE}}$ is the set of fuzzy sets of v whose maximum class degree is greater than or equal to the threshold value $T_r$, SS is the selected subset before combination, NSS is the new selected subset after combination, w is a fuzzy set in NSS, and f is the feature selected for the current combination. This continues until $CFE(SS,f,T_r)$ becomes zero or the difference between the new $CFE(SS,f,T_r)$ and the previous one becomes negative. Based on the above, with respect to FEMFSS, the pseudo-code of the proposed feature subset selection method has meaningful changes in the first part of the algorithm, while the second part remains unchanged.
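The overall selection loop can be sketched as follows. Here `feature_entropy` is a hypothetical placeholder that is assumed to return FEF(f) on the first round and the combined measure CFE of Eq. (12) afterwards, so the sketch captures the control flow of the method rather than the entropy computations themselves.

```python
import numpy as np

def select_subset(features, feature_entropy):
    """Greedy forward selection: repeatedly add the feature whose
    combination yields the lowest fuzzy entropy; stop when the entropy
    worsens, reaches zero, or no features remain."""
    remaining = set(features)
    selected, prev_entropy = [], np.inf
    while remaining:
        # candidate whose addition gives the lowest fuzzy entropy
        f = min(remaining, key=lambda g: feature_entropy(g, selected))
        entropy = feature_entropy(f, selected)
        if entropy > prev_entropy:   # the combination got worse: stop
            break
        selected.append(f)
        remaining.remove(f)
        prev_entropy = entropy
        if entropy == 0.0:           # perfect separation reached
            break
    return selected
```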
5 Experimental Results

Every experiment is repeated 5 times and the average results are reported below. The Tr value in all simulations is equal to 0.9, and 10-fold cross-validation is used in the classifications. The experiments are arranged in three parts. In the first part, a comparison between the results of the proposed method, all features, OFFSS, OFEI, FQI, MIFS and FEMFSS is presented. In the second part, an additional comparison of the simulation results of the proposed method with the FEMFSS and Dong-and-Kothari methods on the Monk datasets is presented. In the third part, the proposed method is compared with the FOS-MOD method. The values of the Tc and Tr factors used in the FEMFSS method in the first series of simulations are: Iris {0.2, 0.9}, Breast Cancer {0.1, 0.9}, Pima Diabetes {0.2, 0.75}, MPG {0.03, 0.6}. The accuracy results of the first series of experiments using the subsets of features selected by the different methods are shown in Table 1.
Table 1. Average accuracy results of the first series of simulations
(The LMT, NAIVE BAYES, SMO and C4.5 columns report the average classification accuracy.)

DATA SET        FEATURE SUBSET SELECTION METHOD    LMT        NAIVE BAYES    SMO        C4.5
IRIS            ALL FEATURES                       94         96             96         96
                OFFSS                              94.6667    96             96         96
                OFEI                               94.6667    96             96         96
                FQI                                94.6667    96             96         96
                MIFS                               94.6667    96             96         96
                FEMFSS                             94.6667    96             96         96
                PROPOSED METHOD                    94.6667    96             96         96
BREAST CANCER   ALL FEATURES                       95.754     97.3646        96.1933    93.4114
                OFFSS                              95.3148    96.9253        96.0469    94.8755
                OFEI                               95.3148    96.9253        96.0469    94.8755
                FQI                                96.1933    97.2182        96.4861    90.9224
                MIFS                               95.4612    96.7789        95.6076    95.1684
                FEMFSS                             96.1933    97.511         96.9253    93.4114
                PROPOSED METHOD                    94.7291    95.6076        94.7291    93.265
PIMA DIABETES   ALL FEATURES                       77.474     76.3021        77.3438    73.8281
                OFFSS                              76.8229    73.0469        75.9115    75
                OFEI                               76.0417    76.8229        75.9115    74.349
                FQI                                77.474     76.3021        77.3438    73.8281
                MIFS                               75.5208    76.4323        75.9115    74.6094
                FEMFSS                             77.2135    77.474         76.8229    74.8698
                PROPOSED METHOD                    75.7813    75             75.3906    74.4792
MPG             ALL FEATURES                       77.5       63.75          66.6667    76.25
                OFFSS                              70.8333    65             66.25      75
                OFEI                               70.8333    65             66.25      75
                FQI                                75         62.0833        66.25      78.75
                MIFS                               74.1667    61.6667        66.25      72.5
                FEMFSS                             72.9167    63.75          66.25      79.5833
                PROPOSED METHOD                    77.9167    65.4167        66.25      77.5
The results of the first series of simulations show that the proposed method has good performance. The worst results of the proposed method occur on the Breast Cancer dataset. This may be because there are some negative silhouette values in the first and second clusters of the Breast Cancer dataset, which indicate some incorrect assignments in these clusters. The clustering results of the Pima Diabetes dataset also have some negative silhouette values, but fewer than the Breast Cancer dataset. On the other hand, the proposed method has the best performance on the MPG dataset. This is because this dataset is clustered into four clusters without any negative silhouette values; most of the silhouette values in this clustering are greater than 0.5. In conclusion, the performance of the proposed method depends on the degree of correct clustering in its first phase. K-Means clustering has some limitations; using higher-performance clustering methods, the proposed method could give better results. In the second series of comparisons, the accuracy results of the proposed method are compared with the results of the FEMFSS and Dong-and-Kothari methods. The datasets in this series are the Monk datasets. Table 2 shows the results of classification after mapping the data onto the features selected by the different methods. The values of the Tr factor used in the FEMFSS method in this series of simulations are: Monk-1 {N/A, 0.9}, Monk-2 {N/A, 0.6}, Monk-3 {N/A, 0.95}. Because of the nominal nature of these datasets, the authors did not use any value of Tc. As the accuracy results show, on the Monk-1 dataset the results of the proposed method are less accurate than those of the Dong-and-Kothari and FEMFSS methods, especially with the LMT and C4.5 classifiers; however, they are better than using all features. On Monk-2, using LMT, the proposed method stands in third place and
is better than the FEMFSS method; however, the proposed method gives the worst result using the Naïve Bayes classifier. Using C4.5, tied with the FEMFSS method, it achieves the best accuracy. It is interesting that using the SMO classifier, none of the four approaches could gain any advantage over the others on any of the Monk datasets; the same happens on Monk-3, where all approaches obtain identical results. For the last comparison, the FOS-MOD method is chosen and compared with the proposed method on three high-dimensional datasets. The authors of FOS-MOD did not publish the selected feature subsets of their method, so their reported results are repeated here. Table 3 depicts the accuracy results using the selected subsets. As the results of the third series of simulations show, the accuracy obtained by the proposed method is a little worse than using all features or FOS-MOD, but the main advantage of the proposed method is the size of the selected subset. As can be seen, the size of the selected subset on the WBC dataset is one third of all features (3 features of 9), on the WDBC dataset it is about one in seven and a half (4 features of 30), and on Ionosphere it is less than one in eleven (3 features of 34). FOS-MOD, by contrast, selects about half of the features. This means that, at the cost of a little accuracy, the proposed method saves considerably more memory in practice. This benefit is very important for large datasets that must run within the limited memory of current computers.

Table 2. Average accuracy results of the second series of simulations
DATA SET   FEATURE SUBSET SELECTION METHOD   LMT       NAIVE BAYES   SMO       C4.5
MONK-1     ALL FEATURES                      75        77.4194       83.871    82.2581
           DONG-AND-KOTHARI                  93.5484   75.8065       83.871    95.9677
           FEMFSS                            93.5484   75.8065       83.871    95.9677
           PROPOSED METHOD                   88.7097   74.1935       83.871    89.5161
MONK-2     ALL FEATURES                      63.3136   57.3964       58.5799   56.213
           DONG-AND-KOTHARI                  63.3136   57.3964       58.5799   56.213
           FEMFSS                            60.355    58.5799       58.5799   62.1302
           PROPOSED METHOD                   60.9467   54.4379       58.5799   62.1302
MONK-3     ALL FEATURES                      92.623    93.4426       93.4426   93.4426
           DONG-AND-KOTHARI                  92.623    93.4426       93.4426   93.4426
           FEMFSS                            92.623    93.4426       93.4426   93.4426
           PROPOSED METHOD                   92.623    93.4426       93.4426   93.4426
Table 3. The average accuracy results of the third series of simulations
DATA SET                                    FEATURE SUBSET SELECTION METHOD   NUMBER OF SELECTED FEATURES   K-NN RESULTS
WISCONSIN BREAST CANCER (WBC)               ALL FEATURES                      9                             97.4249 (5-NN)
                                            FOS-MOD                           4                             97.42 (15-NN)
                                            PROPOSED METHOD                   3 {5,2,3}                     95.3148 (3-NN)
WISCONSIN DIAGNOSTIC BREAST CANCER (WDBC)   ALL FEATURES                      30                            97.0123 (5-NN)
                                            FOS-MOD                           13                            97.04 (7-NN)
                                            PROPOSED METHOD                   4 {24,4,14,23}                94.9033 (4-NN)
IONOSPHERE                                  ALL FEATURES                      34                            86.755 (3-NN)
                                            FOS-MOD                           19                            86.39 (3-NN)
                                            PROPOSED METHOD                   3 {15,17,13}                  85.755 (5-NN)
(FOS-MOD: forward orthogonal search maximizing the overall dependency)
In most of the experiments performed, the proposed method clearly selects significantly fewer features, and the classification accuracy using these features is almost equal or close to the accuracy obtained on the features selected by the competing methods.
6 Conclusion

Feature subset selection is one of the most important aspects of data mining. The goal of this operation is to omit redundant and irrelevant features of a dataset, since such features cause a huge overhead in processing the dataset. In this paper, a new feature subset selection method is proposed. It removes much of the overhead of feature subset selection so as to find subsets quickly and easily, using the concept of fuzzy entropy. The simulation results show good performance against previously proposed methods, both in time and in accuracy. The experimental results show that the proposed method in most cases selects (even much) fewer features than its rivals while simultaneously retaining most of the information available in the dataset structure. Even in the cases where the accuracy over the features selected by the proposed method is somewhat lower than that of the other methods, the number of features it selects is much smaller.
References
1. Anne-Laure, I.B.: Dimension Reduction and Classification with High-Dimensional Microarray Data. PhD thesis, University of Munich (2004)
2. Battiti, R.: Using mutual information for selecting features in supervised neural network learning. IEEE Trans. Neural Networks, 537–550 (1994)
3. Chaikla, N., Qi, Y.: Genetic algorithms in feature selection. In: IEEE Inter. Conf. on Systems, Man, and Cybernetics, pp. 538–540 (1999)
4. Comon, P.: Independent component analysis: a new concept? Signal Processing, 287–314 (1994)
5. Cox, T., Cox, M.: Multidimensional Scaling. Chapman & Hall, Boca Raton (1994)
6. De, R.K., Basak, J., Pal, S.K.: Neuro-fuzzy feature evaluation with theoretical analysis. Neural Networks, 1429–1455 (1999)
7. De, R.K., Pal, N.R., Pal, S.K.: Feature analysis: neural network and fuzzy set theoretic approaches. Pattern Recognition, 1579–1590 (1997)
8. Gareth, M.J., Trevor, J.H.: Functional Linear Discriminant Analysis for Irregularly Sampled Curves. Journal of the Royal Statistical Society Series B, 533–550 (2001)
9. Help of MATLAB (2008a) software, http://www.mathworks.com
10. Jolliffe, T.: Principal Component Analysis. Springer, New York (1986)
11. Maaten, L.V.D., Postma, E.O., Herik, H.V.D.: Dimensionality reduction: a comparative review. University Maastricht (2007)
12. Platt, J.C.: Using analytic QP and sparseness to speed training of support vector machines. Neural Information Processing Systems, 557–563 (1999)
13. Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by local linear embedding. Science, 2323–2326 (2000)
14. Shie, J.D., Chen, S.M.: Feature subset selection based on fuzzy entropy measures for handling classification problems. Appl. Intell., 69–82 (2008)
15. Steve, D.B.: Unsupervised Pattern Recognition. PhD thesis, University of Antwerpen (2002)
16. Tsang, E.C.C., Yeung, D.S., Wang, X.Z.: OFFSS: optimal fuzzy valued feature subset selection. IEEE Trans. Fuzzy Systems, 202–213 (2003)
17. Tenenbaum, J.B., de Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science, 2319–2323 (2000)
18. Wei, H.L., Billings, S.A.: Feature Subset Selection and Ranking for Data Dimensionality Reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 162–166 (2007)
Study on the Law of Short Fatigue Crack Using Genetic Algorithm-BP Neural Networks

Zheng Wang, Zihao Zhao, Lu Wang, and Kui Wang

School of Energy and Power Engineering, Dalian University of Technology, Dalian, Liaoning 116023, People's Republic of China
Abstract. The initiation and propagation of short cracks is a complex nonlinear dynamic process. The materials field has long been concerned with the law of short cracks and has put forward a number of models and predictive equations, but most of them suffer from problems such as unclear physical meaning of parameters and a narrow range of applications. That is why there is as yet no widely recognized quantitative model of crack evolution. The introduction of genetic algorithm-BP neural networks can solve these problems by avoiding the construction of an explicit model equation and directly extracting the underlying rules from the data. In this paper, short cracks under low-cycle, complex stress at high temperature are studied. The material adopted in the experiment is Q245R steel. The initiation, propagation and coalescence of short cracks are observed. Comparisons between the experimental results and the neural network simulation results show that the genetic algorithm-BP neural network simulation method can predict the law of short cracks with high prediction accuracy.

Keywords: genetic algorithms, neural network, fatigue short crack.
1 Introduction

In recent years, research on fatigue cracks has become a focus of the field of fatigue research. In engineering, many components working at high temperature and high pressure under complex loading may undergo fatigue failures resulting in equipment damage, and the life of an entire component often depends on the high-temperature fatigue properties of its materials [1]. Tests confirm that fatigue processes depend directly on the initiation and propagation of short cracks; the short crack stage accounts for as much as 80% of fatigue life [2]. The propagation laws of short and long cracks are very different, so the study of short cracks under low-cycle, complex stress at high temperature is significant. The Paris formula [3] has successfully described crack growth under conditions meeting LEFM. Miller [4], Hobson [5] and Polak [6] did plenty of work on short fatigue crack growth and put forward their own models. However, these models are hard to use in practice because some parameters have no precise physical meaning; they are only suitable for specific material characteristics, experimental conditions or processes, their range of application is narrow, and they do not extrapolate well. Moreover, short crack propagation is strongly influenced by the microstructure, such as grain size, the material's physical
shape, the direction of grain boundaries, and anisotropy [7]. The law of short cracks is difficult to explain with an explicit equation. Short crack initiation and propagation is the result of the evolution of complex nonlinear dynamic processes, such as slip bands, crack path and fracture surface fractal properties, damage localization, and the density and dispersion of the fatigue crack propagation rate [8, 9]. Neural networks, as complicated nonlinear systems, have real-time, parallel, collective computing ability. They can represent truly complicated nonlinear systems with fairly good prediction results, and their forecast trends are especially reasonable. Multilayer feedforward neural networks are fit to handle large amounts of scattered, noisy or incomplete data, and can extract laws from data while avoiding the construction of an explicit expression. Therefore, multilayer feedforward neural networks are well suited to handling fatigue test data at reduced cost and, in particular, to extracting the crack evolution law. At present, it is common to study short fatigue cracks using neural network methods, but methods combining neural networks and genetic algorithms are seldom used.
2 Fusion between BP Neural Network and Genetic Algorithm In accordance with the flow of information within the network, the neural network can be divided into two types: feedforward networks and feedback networks. The neural networks commonly used consist of BP network, hopfield networks, ART networks, Kohonen networks. BP (Back Propagation) network, which is a kind of Multiple-layer feedforward network based on error back propagation algorithm, is one of the most widely used and the most successful networks. 80% ~ 90% artificial neural network models use BP network in practice[10]. In theory, it has been demonstrated that three layers BP network can approximate in nonlinear continuous function with any arbitrary accuracy. 2.1 BP Neural Network BP neural network is one type of multi-layer networks using Widrow-Hoff learning algorithm and nonlinear differentiable transfer function. BP neural network is made up of one input layer, one output layer and one or more hidden layers, and each layer consists of numbers of neurons nodes. As shown in Figure 1, the principle of BP neural network is that the input data (x1, x2, ... , xn) received by the input layer are conducted through the middle layer and eventually cause responses (y1, y2, ..., ym) in the output layer. For reducing the output error, link weights and thresholds of each layer are amended, and then it is back to the input layer to continue to conduct the same step. With the improvement of link weights and thresholds, the network’s response is close to output data gradually. The process can’t be terminated until the output error is less than the target error. This algorithm is called "BP algorithm". 2.2 Genetic Algorithm(GA) Genetic Algorithm is the stochastic global search and optimization method mimicking natural evolution mechanism, which draws inspiration from Darwinian Theory and
Mendel's theory of evolution. Its essence is an efficient, parallel, global search method. It can automatically acquire and accumulate knowledge about the search space and adaptively control the search process in order to reach the optimal solution.

2.3 Introducing the Genetic Algorithm to Improve the BP Neural Network

The BP algorithm has accurate local optimization ability but also some drawbacks, the biggest being the following: falling into local optima, slow convergence, and oscillation effects. Thus the prediction accuracy may be affected. The trouble of falling into a local optimum can be mitigated by adjusting the initial values, while slow convergence and oscillation are caused by network training falling into a local minimum. Therefore, finding a way to search for approximately optimal weights to use as the initial weights of the BP algorithm can avoid the above problems. GA has strong macro-search capability and a high probability of finding the global optimum, so introducing GA to determine the initial weights of the neural network can overcome the disadvantages of the BP algorithm. The resulting genetic algorithm-BP neural network inherits the genetic algorithm's strong search capability and good macro-level global optimization, while retaining the neural network's strong nonlinear approximation ability [11]. It is obvious that the genetic algorithm improves the BP neural network: without it, the BP network takes a long time to converge and retains a larger error; after the introduction of the genetic algorithm, the BP neural network reaches the target error in a very short time. The comparison of Figure 3 and Figure 4 shows that combining the genetic algorithm with the BP neural network greatly improves convergence and accuracy.
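To make the hybrid scheme concrete, the sketch below evolves initial weights for a small 'tansig'/'purelin' network with a genetic algorithm and returns the fittest weight vector as the starting point for subsequent BP training. It is written in NumPy rather than the MATLAB toolbox used in the paper, and the specific operators (elitism, tournament selection, uniform crossover, Gaussian mutation) and their rates are illustrative assumptions; the 4-19-1 architecture, population size 50 and 100 generations mirror the settings reported later in Section 4.1.

```python
import numpy as np

rng = np.random.default_rng(0)
N_IN, N_HID, N_OUT = 4, 19, 1                       # 4-19-1 architecture
N_W = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT  # weights plus biases

def forward(w, X):
    """Run the network encoded in the flat vector w on inputs X."""
    i = 0
    W1 = w[i:i + N_IN * N_HID].reshape(N_IN, N_HID); i += N_IN * N_HID
    b1 = w[i:i + N_HID]; i += N_HID
    W2 = w[i:i + N_HID * N_OUT].reshape(N_HID, N_OUT); i += N_HID * N_OUT
    b2 = w[i:i + N_OUT]
    h = np.tanh(X @ W1 + b1)   # 'tansig' hidden layer
    return h @ W2 + b2         # 'purelin' output layer

def mse(w, X, y):
    """Mean squared error, used as the (inverse) fitness of an individual."""
    return float(np.mean((forward(w, X).ravel() - y) ** 2))

def ga_init_weights(X, y, pop_size=50, generations=100):
    """Evolve a population of weight vectors; the fittest vector is
    returned and then handed to ordinary BP training as its start point."""
    pop = rng.normal(0.0, 0.5, size=(pop_size, N_W))
    for _ in range(generations):
        fit = np.array([mse(w, X, y) for w in pop])
        children = [pop[fit.argmin()].copy()]              # elitism
        while len(children) < pop_size:
            idx = rng.choice(pop_size, size=4, replace=False)
            p1 = pop[idx[:2][np.argmin(fit[idx[:2]])]]     # tournament
            p2 = pop[idx[2:][np.argmin(fit[idx[2:]])]]
            mask = rng.random(N_W) < 0.5                   # uniform crossover
            child = np.where(mask, p1, p2)
            # Gaussian mutation on roughly 10% of the genes
            child = child + rng.normal(0.0, 0.1, N_W) * (rng.random(N_W) < 0.1)
            children.append(child)
        pop = np.asarray(children)
    fit = np.array([mse(w, X, y) for w in pop])
    return pop[fit.argmin()]
```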
3 Low Cycle High-Temperature Fatigue Crack Experiments

First, the network structure is constructed according to the practical problem. In this paper, a three-layer network structure is used, so the numbers of neurons of the input layer, output layer and hidden layer must be determined. The BP network is trained by the
experimental data until convergence. Once the model is established, the program steps into the verification stage. Hot-rolled steel Q245R (20#) is investigated in the test; the mechanical properties of the material are shown in Table 1.

Table 1. Mechanical properties of hot-rolled Q245R
Temperature (°C)   Yield limit σ0.2 (MPa)   Ultimate tensile strength σb (MPa)   Elongation δ5 (%)   Modulus of elasticity E (MPa)   Poisson's ratio μ
room               330.5                    488.98                               32                  2.28×10^5                       0.303
400                327.5                    437.25                               25.4                2.05×10^5                       0.296
The size and shape of the specimens are shown in Fig. 2. Axial loading on the notched specimen produces a three-dimensional stress state with a non-uniform distribution of the different stress components [10]. The notch is machined in three sizes: R3, R4 and R5. Specimens with these three notch sizes are studied to obtain the effects of different triaxial stress states on the evolution of short fatigue cracks. The experiments were carried out on an MTS Landmark 100 kN fatigue testing system under strain control at a frequency of 0.5 Hz, with cycle characteristic R = −1, a triangular wave, and a test temperature of 400 ℃. Crack initiation and propagation are tracked by direct observation and CCD imaging.
Fig. 2. Specimen size and shape

Table 2. Specimen type used in the experiment

Number   Notch size   Strain amplitude   Temperature (℃)   Load frequency (Hz)
1        R3           0.2                400                0.5
2        R4           0.16               400                0.5
3        R4           0.2                400                0.5
4        R4           0.24               400                0.5
5        R5           0.2                400                0.5
4 The GA-BP Neural Network Model of the Evolution of Short Fatigue Cracks

4.1 Genetic Algorithm-BP Neural Network Model of the Short Fatigue Crack Growth Rate

The general formula of the short fatigue crack growth rate is:
$$\frac{da}{dN} = f_a(\Delta K_c, d, \Delta K, a, R, C, \ldots) \qquad (1)$$
where $\Delta K_c$ is the critical value, d the grain size, ΔK the stress intensity factor range, a the crack length, R the stress ratio, and C a material constant. The factors with the greatest impact on the crack growth rate (number of cycles, maximum applied force, minimum applied force, strain rate) are selected as inputs, and the crack propagation rate is selected as the output. The BP neural network used in the model has the three-layer structure: an input layer, a middle layer and an output layer. The number of input-layer neurons, equal to the number of columns of the input data, is 4; the number of output-layer neurons, equal to the number of columns of the output data, is 1. There is no specific formula for selecting an appropriate number of middle-layer neurons, but one should use as few neurons as possible while ensuring convergence of the network, which helps prevent over-fitting. The number of middle-layer neurons is taken as 19 according to the experience of several experiments. BP neural network settings: learning rate 0.05, momentum factor 0.95, target error 1E-6. Genetic algorithm parameters: population size 50, 100 generations. Of a total of 133 sets of data, 122 sets are selected randomly as training data to train the neural network, and the remaining 11 sets are used as forecast data to verify the simulated network performance.

4.2 Establish and Verify the Genetic Algorithm-BP Neural Network

Step 1: Normalize the training samples: the data are scaled into the range 0-1. Normalization aims to speed up the convergence of the network;
Step 2: Create a three-layer BP network: net = newff(minmax(p), [S1, 1], {'tansig', 'purelin'}, 'trainlm');
Step 3: Use the genetic algorithm to initialize the network's weights and threshold values;
Step 4: Apply the obtained weights and threshold values to the neural network;
Step 5: Set the training parameters;
Step 6: Train the network with the training data and normalize the forecast data;
Step 7: Enter the forecast data into the network to obtain output data; the output data are de-normalized and compared with the actual data.

The left part of Fig. 3 compares the simulation predicted data 1 from the plain BP neural network with the actual data. The right part of Fig. 3 compares the genetic algorithm-BP neural network simulation predicted data 2 with the actual data.
Fig. 3. The comparison of the two neural network models' crack growth rate output
Dots represent the neural network output data points, while asterisks represent the actual data. The same 11 groups of data are used in both models. The forecast data are compared with the actual data as follows:

Table 3. Testing data for the two neural network models of crack growth rate
Data                                1     2     3      4     5     6     7      8      9     10    11
Predicted data 1 (×10^-3 μm/cycle)  1.5   4.2   8.7    0.5   3.3   2.4   10.2   -1.2   0.1   0.3   3.8
Predicted data 2 (×10^-3 μm/cycle)  4.1   4.3   17.3   0.3   2.1   0.4   11.1   0.1    1.6   0.4   2.5
Actual data (×10^-3 μm/cycle)       3.3   2.8   17.6   0.4   2.0   0.7   10.9   0.12   1.6   0.8   2.4
From the comparison of the data and the figures, it can be seen that the prediction error is large with the plain BP neural network; the GA-BP network significantly improves the situation. Most of the GA-BP network outputs agree well with the raw data, while the error of a few individual points is slightly large. The relatively large errors, e.g., for data 1 and data 2, are
mainly due to the lack of training data, the measurement error of the experiment, and the error of the neural network simulation itself.

4.3 Genetic Algorithm-BP Neural Network Model of Short Fatigue Crack Density

The BP neural network uses the same three-layer structure as the short crack growth rate model. The number of middle-layer neurons is taken as 17; the other parameters are the same as in the crack growth rate model. There are 115 groups of data: 106 groups are randomly selected as training data to train the neural network, and the remaining 9 groups are used as forecast data to verify network performance.
Fig. 4. The comparison of the two neural network models' crack density output

Table 4. Test data for the two neural network models of crack density
Data                       1     2      3      4     5       6      7     8      9
Predicted data 1 (×10^-3)  12    63.9   19.0   9.0   281     15.4   122   16.0   10.2
Predicted data 2 (×10^-3)  9.0   30.1   18.8   6.2   231.8   15.6   174   7.0    13.7
Actual data (×10^-3)       8.7   39.4   10.1   4.7   234.9   16.1   154   7.7    13.7
Most of the forecast data agree well with the measured data, but data 3 and data 7 have relatively large errors. As with the crack growth rate, the errors are mainly due to the network not having adequate training data; the other factors are the measurement error of the experimental system and the error of the neural network itself.
5 Conclusion

These experiments prove that introducing the genetic algorithm can effectively overcome the convergence troubles of the BP neural network and its tendency to fall into local minima. The genetic algorithm-BP neural network simulations of the short fatigue crack growth rate and crack density show that the method is a good tool for processing the data and describing the law of fatigue crack evolution with high accuracy, and they confirm that using the genetic algorithm-BP network in research on the evolution of short fatigue cracks is feasible. Although the overall prediction accuracy is good, the method's prediction accuracy is not high for some individual points owing to the lack of a large amount of training data. In addition, there is no explicit formula for the training parameters of the neural network and the genetic algorithm, so much work remains to be done.

Acknowledgments. I want to acknowledge the financial support of the National Natural Science Foundation of China (No. 50771024). I would like to thank all the people who contributed to the content and reviewed this article. In particular, thanks go to Lu Wang, Jing Zheng, Kui Wang and Professor Wang.
References
1. Yu, M.: Study on Initiation and Propagation of Short Crack for Low Cycle Fatigue at High Temperature and Complex Stress States. Dalian University of Technology (2006)
2. Hong, Y.S., Fang, B.: Theory and meso-scopic process of initiation and propagation of short fatigue crack. Advances in Mechanics 23(4), 468–486 (1993)
3. Paris, P.C., Erdogan, F.: A critical analysis of crack propagation laws. Trans. ASME, Ser. D 85, 528–534 (1963)
4. Miller, K.J., Van Elst, H.C., Bakker, A.: Proceedings Fracture Control of Engineering Structures ECF6, Amsterdam, 149–167 (1986)
5. Hobson, P.D., Brown, M.W., de los Rios, E.R., Miller, K.J.: The Behavior of Short Fatigue Cracks, pp. 441–459. EGF Publication 1, Mechanical Engineering Publications, London (1986)
6. Obrtlik, K., Polak, J., Hajek, M.: Short fatigue crack behavior in 316L stainless steel. Int. J. Fatigue 19, 471–475 (1997)
7. Yin, G.Z., Dai, G.F., Wan, L.: Characteristics of bifurcation, chaos and self-organization of evolution of micro-cracks in rock. Chinese Journal of Rock Mechanics and Engineering 21(5), 635–639 (2002)
8. Sun, D.H., Sun, X.F., Liu, X.B.: Analysis on the parameter domains of short fatigue cracks system. Journal of Xiamen University (Natural Science) 39(2), 175–179 (2002)
9. Ni, K.: Advances in stochastic theory of fatigue damage accumulation. Advances in Mechanics 29(1), 45–46 (1999)
10. Li, P.Y., Feng, Y.L., Zhang, W.M.: System of obstacle avoidance used in deep-seabed vehicle based on BP neural network and genetic algorithm. Journal of Central South University (Science and Technology) 41(4), 1374–1378 (2010)
11. Wang, L., Wang, Z., Yu, M.: Experimental study and numerical simulation on coalescence and interference of short cracks for low cycle fatigue at high temperature. Journal of Mechanical Strength 138(04), 642–646 (2008)
Large Vocabulary Continuous Speech Recognition of Uyghur: Basic Research of Decoder

Muhetaer Shadike1,2, Xiao Li1, and Buheliqiguli Wasili3

1 The Xinjiang Technical Institute of Physics & Chemistry, Urumchi, China
2 Graduate School, Chinese Academy of Sciences, Beijing, China
3 Xinjiang Education Institute, Urumchi, China
[email protected]
Abstract. In this paper, we introduce the decoder, the central component of a speech recognition system, and develop a computer program for dynamic time warping. We use this program to compute the time-warping costs between the sequence "essalam," spoken twice by the same speaker, and "hörmetlik," also spoken by the same speaker, using the former as the reference sequence.

Keywords: Uyghur; Speech recognition; Decoder; Bayes' rule; Dynamic time warping.
1 Introduction Today’s state-of-the-art speech recognizers are based on statistical techniques, with Hidden Markov Models being the dominant approach. The main components of a speech recognition system are the front-end processor, the decoder, the acoustic models, and the language models [1]. The central component of the recognizer is the decoder, which is based on a communication theory view of the recognition problem; it tries to extract the most likely sequence of words W=[w1,w2,…,wN] given the set of acoustic vectors X. This can be done using Bayes’ rule:
$$\hat{W} = \arg\max_{W} \frac{P(W)\, p(X \mid W)}{p(X)} \qquad (1.1)$$
The discrete probability P(W) of the word sequence W is obtained from the language model, while the acoustic model determines the likelihood of the acoustic vectors for a given word sequence, p(X|W). Speech can be viewed as a discrete message source with a hierarchical structure: phonemes are joined to form syllables, then words, phrases, and finally continuous discourse. In modern speech recognition systems, stochastic models are postulated for certain units of speech, such as words or phonemes. The basic acoustic models are then combined to form models for larger units with the aid of dictionaries and/or probabilistic grammars. In designing a speech recognizer, an important decision concerns the set of events that will be modeled [4]. The question "what is a good unit of speech for
recognition?”, can be translated under the hierarchic structure perspective into the question “what level – or levels – of the hierarchy should we model?”. If unlimited amounts of training data (The set of speech utterances that will be estimation of the models’ parameters) were available, we would benefit by using models for words, or even groups of words; by modeling larger speech units, it is possible to eliminate the variability due to context. However, given practical limitations, we are forced to use units from lower levels of the hierarchy (e.g., phonemes) that are easier to train. The phoneme is natural and popular choice as the basic modeling unit for Uyghur. In this paper, we shall not look at full-blown speech recognition systems. We want to concentrate more on the theories for the decoder than on particular programs.
2 Bayes Decision Theory

We will initially focus on the two-class case. Let w1, w2 be the two classes to which our patterns belong. In the sequel, we assume that the a priori probabilities (the language model) [3] P(w1), P(w2) are known. This is a very reasonable assumption, because even if they are not known, they can easily be estimated from the available training feature vectors. Indeed, if N is the total number of available training patterns, and N1, N2 of them belong to w1 and w2, respectively, then P(w1) ≈ N1/N and P(w2) ≈ N2/N. The other statistical quantities assumed to be known are the class-conditional probability density functions p(x|wi), i = 1, 2, describing the distribution of the feature vectors in each of the classes (the acoustic model). The Probability Density Function (PDF) p(x|wi) is sometimes referred to as the likelihood function of wi with respect to x [2]. Here we should stress the fact that an implicit assumption has been made: that the feature vectors can take any value in the l-dimensional feature space. In the case that feature vectors can take only discrete values, the density functions p(x|wi) become probabilities and will be denoted by P(x|wi). We now have all the ingredients to compute our conditional probabilities, as stated in the introduction. To this end, let us recall from basic probability the Bayes rule:
$$P(w_i \mid x) = \frac{p(x \mid w_i)\, P(w_i)}{p(x)} \qquad (2.1)$$
where p(x) is the PDF of x, for which we have

$$p(x) = \sum_{i=1}^{2} p(x \mid w_i)\, P(w_i) \qquad (2.2)$$
The Bayes classification rule can now be stated as:

If P(w1|x) > P(w2|x), x is classified to w1;
If P(w2|x) > P(w1|x), x is classified to w2.    (2.3)
The case of equality is detrimental and the pattern can be assigned to either of the two classes. Using (2.1), the decision can equivalently be based on the inequalities
$$p(x \mid w_1)\, P(w_1) \gtrless p(x \mid w_2)\, P(w_2) \qquad (2.4)$$
p(x) is not taken into account, because it is the same for all classes and it does not affect the decision. Furthermore, if the a priori probabilities (language model) are equal, that is, P(w1)=P(w2)=1/2, (2.4) becomes
$$p(x \mid w_1) \gtrless p(x \mid w_2) \qquad (2.5)$$
Thus, the search for the maximum now rests on the values of the conditional PDFs evaluated at x.
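As a small numerical illustration of rule (2.4), the sketch below classifies a one-dimensional observation; the Gaussian class-conditional densities and all parameter values are assumptions standing in for whatever likelihood model p(x|w_i) the recognizer actually uses.

```python
from scipy.stats import norm

def bayes_classify(x, p1, p2, mu1, s1, mu2, s2):
    """Two-class Bayes rule (2.4): compare P(w_i) p(x|w_i); p(x) cancels."""
    g1 = p1 * norm.pdf(x, mu1, s1)   # P(w1) p(x|w1)
    g2 = p2 * norm.pdf(x, mu2, s2)   # P(w2) p(x|w2)
    return 1 if g1 > g2 else 2

# Priors estimated from class counts, P(w_i) ~ N_i / N, as in the text.
N1, N2 = 600, 400
print(bayes_classify(0.3, N1 / (N1 + N2), N2 / (N1 + N2),
                     0.0, 1.0, 1.0, 1.0))
```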
3 Dynamic Time Warping

In this section, we highlight the application of dynamic programming techniques in speech recognition. We will focus on the simpler form of the task, known as discrete or isolated word recognition. That is, we will assume that the spoken text consists of discrete words, well isolated with sufficient silent periods between them. In tasks of this type, it is fairly straightforward to decide where, in time, one word finishes and another one starts. This is not, however, the case in the more complex continuous speech recognition systems, where the speaker speaks in a natural way and temporal boundaries between words are not well defined; in the latter case, more elaborate schemes are required. When words are spoken by a single speaker and the purpose of the recognition system is to recognize words spoken by this person, the recognition task is known as speaker-dependent recognition. A more complex task is speaker-independent recognition, in which the system must be trained using a number of speakers and must be able to generalize and recognize words spoken by people outside the training population. At the heart of any isolated word recognition system are a set of known reference patterns and a distance measure. Recognition of an unknown test pattern is achieved by searching for the best match between the test pattern and each of the reference patterns, on the basis of the adopted measure. Figures 1(a) and 2(a) show the plots of two time sequences resulting from the sampling of the word "essalam," spoken twice by the same speaker. The samples were taken at the output of a microphone at a sampling rate of 44,100 Hz. Although it is difficult to describe the differences, they are readily noticeable. Moreover, the two spoken words are of different duration. The arrows indicate the intervals in which the spoken segments lie; the intervals outside the arrows correspond to silent periods. Specifically, the sequence in figure 1(a) is 0.340 seconds long, and the sequence in figure 2(a) is 0.356 seconds long. Furthermore, it is important to note that this is not the result of a simple linear time scaling; on the contrary, a highly nonlinear mapping is required to obtain a match between these two "same" words spoken by the same person. For comparison, figure 2(b) shows the plot of the time sequence corresponding to another word, "hörmetlik," spoken by the same speaker.
Fig. 1. Plots of (a) the time sequence corresponding to the word “essalam” and (b) the magnitude of the DFT for one of its frames
We will resort to dynamic programming techniques to unravel the nonlinear mapping (warping) required to achieve the optimal match between a test and a reference pattern. To this end, we must first express the spoken words as sequences (strings) of appropriate feature vectors, r(i), i = 1,…,I, for the reference pattern and t(j), j = 1,…,J, for the test one. Obviously, there is more than one way to choose the feature vectors; we will focus on Fourier transform features. Each of the time sequences involved is divided into successive overlapping time frames. In our case, each frame is chosen to be tf = 512 samples long and the overlap between successive frames is t0 = 100 samples, as shown in figure 3. The resulting number of frames for the speech segment shown in figure 1(a) is I = 29, and for the other two they are J = 31 (figure 2(a)) and J = 35 (figure 2(b)), respectively. We assume that the former is the reference pattern and the latter two are the test patterns. Let xi(n), n = 0,…,511, be the samples of the ith frame of the reference pattern, with i = 1,…,I. The corresponding Discrete Fourier Transform (DFT) is given as
$$X_i(m) = \frac{1}{512} \sum_{n=0}^{511} x_i(n) \exp\left(-j \frac{2\pi}{512} mn\right), \quad m = 0, \ldots, 511 \qquad (3.1)$$
Figure 1(b) shows the magnitude of the DFT coefficients for one of the I frames of the reference pattern [9]. The plot is a typical one for speech segments. The magnitude of the higher DFT coefficients is very small, with little contribution to the signal. This
justifies the use of the first l DFT coefficients as features, where usually l ≪ 512:

$$r(i) = \begin{bmatrix} X_i(0) \\ X_i(1) \\ \vdots \\ X_i(l-1) \end{bmatrix}, \quad i = 1, \ldots, I \qquad (3.2)$$
Fig. 2. Plots of the time sequences resulting from the words (a) "essalam" and (b) "hörmetlik," spoken by the same speaker
Fig. 3. Successive overlapping frames for computation of the DFT feature vectors
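The framing and feature extraction just described can be sketched as follows; the frame length (512) and overlap (100) follow the text, while the value l = 64 and the function name are illustrative choices. The division by the frame length reproduces the 1/512 scaling of Eq. (3.1).

```python
import numpy as np

def dft_features(x, frame_len=512, overlap=100, l=64):
    """Split signal x into overlapping frames and keep the first l
    DFT coefficients of each frame as its feature vector (Eq. 3.2)."""
    step = frame_len - overlap
    n_frames = 1 + (len(x) - frame_len) // step
    feats = np.empty((n_frames, l), dtype=complex)
    for i in range(n_frames):
        frame = x[i * step : i * step + frame_len]
        feats[i] = np.fft.fft(frame)[:l] / frame_len   # Eq. (3.1) scaling
    return feats   # row i is the feature vector r(i) or t(j)
```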
The feature vectors t(j) for each of the test patterns are formed in a similar way. The choice of DFT coefficients as features is just one of various possibilities that have been suggested and used over the years. Other popular alternatives include the parameters
from an AR modeling of the speech segment, the cepstral coefficients (the inverse DFT of the logarithm of the magnitude of the DFT coefficients), and so on. Having completed the preprocessing and feature selection, the reference and test patterns are expressed as (ordered) sequences of feature vectors r(i) and t(j). Our goal becomes to compute the best match among the frames of the test and reference patterns. That is, the test pattern will be stretched in time (one test frame corresponds to more than one frame of the reference pattern) or compressed in time (more than one test frame corresponds to one frame of the reference pattern). This optimal alignment of the vectors in the two string patterns will take place via the dynamic programming procedure. To this end, we first locate the vectors of the reference string along the abscissa and those of the test pattern along the ordinate. Then, the following need to be determined: global constraints, local constraints, end-point constraints, and the cost d for the transitions. Various assumptions can be adopted for each of these, leading to different results with relative merits.

3.1 End-Point Constraints

In our example, we will look for the optimal complete path that starts at (0,0) and ends at (I,J) and whose first transition is to the node (1,1). Thus, it is implicitly assumed that the end points of the speech segments (i.e., r(1), t(1) and r(I), t(J)) match to a fair degree. These can be the vectors resulting from the silent periods just before and just after the speech segments, respectively. A simple variation of the complete-path constraint results if the end points of the path are not specified a priori and are assumed to be located within a distance ϵ from the points (1,1) and (I,J); it is then left to the optimizing algorithm to locate them.

3.2 Global Constraints

The global constraints define the region of nodes that are searched for the optimal path; nodes outside this region are not searched. Basically, the global constraints define the overall stretching or compression allowed for the matching procedure. If I ≈ J, it is not difficult to show that the number of grid points to be searched is reduced by approximately one-third.

3.3 Local Constraints

These constraints define the set of predecessors and the allowable transitions to a given node of the grid. Basically, they impose limits on the maximum expansion or compression rates that successive transitions can achieve. A property that any local constraints must satisfy is monotonicity, that is, i_{k-1} ≤ i_k and j_{k-1} ≤ j_k. In other words, all predecessors of a node are located to its left and south. This guarantees that the matching operation follows the natural time evolution and avoids confusing, for example, the word "menzil" with the word "menzir."

3.4 The Cost

A commonly used cost, which we will also adopt here, is the Euclidean distance between r(i_k) and t(j_k), corresponding to node (i_k, j_k), that is, d(i_k, j_k | i_{k-1}, j_{k-1}) = ||r(i_k) − t(j_k)|| ≡ d(i_k, j_k). In this way, we assume that no cost is associated with the transition to a
specific node, and the cost depends entirely on the feature vectors corresponding to the respective node. Other costs have also been suggested and used; more recently, a cost that accounts for the most commonly encountered errors (e.g., different playing styles) has been used in the context of music recognition. Finally, it must be stated that a normalization of the overall cost D is often carried out. This compensates for the difference in path lengths, offering "equal" opportunity to all of them. A logical choice is to divide the overall cost D by the length of each path.
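Combining the above choices, a compact dynamic time warping routine is sketched below. It uses the Euclidean node cost d(i_k, j_k) = ||r(i_k) − t(j_k)||, monotonic local transitions from (i−1, j), (i, j−1) and (i−1, j−1), fixed end points, and a simple length normalization; the particular local-constraint set and normalizing factor are illustrative choices among the variants discussed above.

```python
import numpy as np

def dtw_cost(r, t):
    """Normalized DTW cost between a reference string r (I x l) and a
    test string t (J x l) of feature vectors."""
    I, J = len(r), len(t)
    # node costs d(i, j) = ||r(i) - t(j)||  (works for complex features too)
    d = np.array([[np.linalg.norm(r[i] - t[j]) for j in range(J)]
                  for i in range(I)])
    D = np.full((I, J), np.inf)
    D[0, 0] = d[0, 0]                      # fixed starting end point
    for i in range(I):
        for j in range(J):
            if i == 0 and j == 0:
                continue
            preds = []                     # monotonic local transitions
            if i > 0:
                preds.append(D[i - 1, j])
            if j > 0:
                preds.append(D[i, j - 1])
            if i > 0 and j > 0:
                preds.append(D[i - 1, j - 1])
            D[i, j] = d[i, j] + min(preds)
    return D[-1, -1] / (I + J)             # simple length normalization
```

Matching an unknown utterance then amounts to computing this cost against each stored reference string and keeping the smallest value, which is how the comparison between "essalam" and "hörmetlik" described in the abstract can be carried out.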
References
1. Rayner, M., Carter, D., Bouillon, P., Digalakis, V., Wiren, M.: The Spoken Language Translator. Peking University Press (2010)
2. Theodoridis, S., Koutroumbas, K.: Pattern Recognition, 4th edn. China Machine Press, Beijing (2009)
3. Martinez, W.L., Martinez, A.R.: Computational Statistics Handbook with Matlab. Chapman & Hall/CRC (2002)
4. Rabiner, L., Juang, B.-H.: Fundamentals of Speech Recognition. Tsinghua University Press, Beijing (1999)
5. Ross, S.M.: A First Course in Probability. Posts & Telecom Press (2009)
6. Shadike, M., Xiao, L., Wasili, B.: Large Vocabulary Continuous Speech Recognition: Basic Research of Trigram Language Model. In: ICMCE (2010)
7. Shadike, M., Xiao, L., Wasili, B.: Large Vocabulary Continuous Speech Recognition: Basic Research of Acoustic Model. In: CSIE (2011)
8. Shadike, M., Xiao, L., Wasili, B.: Large Vocabulary Continuous Speech Recognition: Basic Research of Front-end Processor. In: NCIS 2011 (2011)
9. Mandal, M., Asif, A.: Continuous and Discrete Time Signals and Systems. Posts & Telecom Press (2010)
10. Hahn, B., Valentine, D.: Essential Matlab for Engineers and Scientists, 4th edn. Academic Press, London (2010)
Sentic Medoids: Organizing Affective Common Sense Knowledge in a Multi-Dimensional Vector Space

Erik Cambria1, Thomas Mazzocco1, Amir Hussain1, and Chris Eckl2

1 Cognitive Signal and Image Processing Research Laboratory, University of Stirling, Stirling, UK
2 Sitekit Labs, Sitekit Solutions Ltd, Portree, UK
{eca,tma,ahu}@cs.stir.ac.uk, [email protected]
http://cs.stir.ac.uk/~eca/sentics
Abstract. Existing approaches to opinion mining and sentiment analysis mainly rely on parts of text in which opinions and sentiments are explicitly expressed such as polarity terms and affect words. However, opinions and sentiments are often conveyed implicitly through context and domain dependent concepts, which make purely syntactical approaches ineffective. To overcome this problem, we have recently proposed Sentic Computing, a multi-disciplinary approach to opinion mining and sentiment analysis that exploits both computer and social sciences to better recognize and process opinions and sentiments over the Web. Among other tools, Sentic Computing includes AffectiveSpace, a language visualization system that transforms natural language from a linguistic form into a multi-dimensional space. In this work, we present a new technique to better cluster this vector space and, hence, better organize and reason on the affective common sense knowledge in it contained. Keywords: Sentic Computing, AI, Semantic Web, NLP, Clustering, Opinion Mining and Sentiment Analysis.
1
Introduction
The ways people express their opinions and sentiments have radically changed in the past few years thanks to the advent of social networks, online communities, blogs, wikis and other online collaborative media. But, although these online social data are perfectly suitable for human consumption, they remain hardly accessible to machines. To bridge the cognitive and affective gap between word-level natural language data and the concept-level opinions and sentiments conveyed by them, we recently developed Sentic Computing [1], a multi-disciplinary approach to opinion mining and sentiment analysis that exploits both computer and social sciences to better recognize, interpret and process opinions and sentiments over the Web. The main tool used, within Sentic Computing, to perform emotive reasoning on natural language is AffectiveSpace [2], a multi-dimensional representation of D. Liu et al. (Eds.): ISNN 2011, Part III, LNCS 6677, pp. 601–610, 2011. c Springer-Verlag Berlin Heidelberg 2011
affective common sense knowledge. So far, a k-nearest neighbor (k-NN) approach has been used to divide this vector space into emotional clusters but resulting classification has tended to be imprecise for some less common affective concepts, which has often reflected on the overall analysis of texts, both for sentiment inference and polarity detection tasks. To this end we developed a new technique, which we call sentic medoids, which more conveniently organizes AffectiveSpace by iteratively selecting appropriate concepts in order to minimize the sum of dissimilarities between them and the other concepts within the same cluster. The structure of the paper is the following: Section 2 presents the main existing approaches to opinion mining and sentiment analysis, Section 3 explains the AffectiveSpace process, Section 4 illustrates the emotion categorization model adopted, Section 5 explains in detail the new technique developed for better clustering AffectiveSpace, Section 6 presents an evaluation of the technique and Section 7 comprises concluding remarks and a description of future work.
2
Opinion Mining and Sentiment Analysis
Existing approaches to the automatic identification and extraction of opinions and sentiments from text can be grouped into four main categories: keyword spotting, lexical affinity, statistical methods and Sentic Computing. In keyword spotting [3][4][5], text is classified into categories based on the presence of fairly unambiguous affect words. Lexical affinity [6][7] assigns arbitrary words a probabilistic affinity for a particular opinion or emotion. Statistical methods [8][9] calculate the valence of keywords, punctuation and word co-occurrence frequencies on the basis of a large training corpus. In Sentic Computing, whose term derives from the Latin 'sentire' (the root of words such as sentiment and sensation) and 'sensus' (intended as common sense), the analysis of text is not based on statistical learning models but rather on common sense reasoning tools [10] and domain-specific ontologies [11]. While the first three approaches mainly rely on parts of text in which opinions and sentiments are explicitly expressed, such as polarity terms (e.g. good, bad, nice, nasty, excellent, poor) and affect words (e.g. happy, sad, calm, angry, interested, bored), Sentic Computing analyzes text at the semantic level, aiming to infer also the opinions and sentiments implicitly conveyed. Moreover, unlike statistical classification (which generally requires large inputs and thus cannot appraise texts with satisfactory granularity), Sentic Computing enables the analysis of documents not only at the page or paragraph level but also at the sentence level.
3
AffectiveSpace
AffectiveSpace is built by blending ConceptNet [12], a directed graph representation of common sense knowledge, with WordNet-Affect (WNA) [13], a linguistic
Fig. 1. AffectiveSpace
resource for the lexical representation of affective knowledge (Fig. 1). This alignment operation yields AffectNet: a new dataset in which common sense and affective knowledge coexist, i.e., a 14,301 × 117,365 matrix whose rows are concepts (e.g. 'dog' or 'bake cake'), whose columns are either common sense or affective features (e.g. 'isA-pet' or 'hasEmotion-joy'), and whose values indicate the truth values of assertions. Therefore, in AffectNet, each concept is represented by a vector in the space of possible features whose values are positive for features that produce an assertion of positive valence (e.g. 'a penguin is a bird'), negative for features that produce an assertion of negative valence (e.g. 'a penguin cannot fly') and zero when nothing is known about the assertion. The degree of similarity between two concepts, then, is the dot product between their rows in AffectNet. The value of such a dot product increases whenever two concepts are described with the same feature and decreases when they are described by features that are negations of each other. When performed on AffectNet, however, these dot products involve vectors of very high dimensionality and are difficult to work with. In order to approximate them in a useful way, we project all of the concepts from the space of features into a space with many fewer dimensions, i.e., we reduce the dimensionality of AffectNet. In particular, we perform truncated singular value decomposition (TSVD) [14] on AffectNet and obtain a new matrix, AffectNet*, which forms a low-rank approximation of the original data. This estimation is based on minimizing the Frobenius norm of the difference between AffectNet and AffectNet* under the constraint rank(AffectNet*) = k, and it represents the best approximation of
AffectNet in the least-squares sense (by the Eckart–Young theorem [15]). In particular, we choose to discard all but the first 100 principal components and hence obtain AffectiveSpace, a 100-dimensional space in which different vectors represent different ways of making binary distinctions among concepts and emotions. In AffectiveSpace, common sense and affective knowledge are in fact combined, not just concomitant, i.e., everyday life concepts like 'have breakfast', 'meet people' or 'watch tv' are linked to a hierarchy of affective domain labels. By exploiting the information-sharing property of TSVD, concepts with the same affective valence are likely to have similar features, i.e., concepts concerning the same opinion tend to fall near each other in the vector space. Concepts and emotions are represented by vectors of 100 coordinates: these coordinates can be seen as describing concepts in terms of 'eigenmoods' that form the axes of AffectiveSpace, i.e., the basis e0, ..., e99 of the vector space. For example, the most significant eigenmood, e0, represents concepts with positive affective valence. That is, the larger a concept's component in the e0 direction is, the more affectively positive it is likely to be. Consequently, concepts with negative e0 components have negative affective valence.
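A minimal sketch may make the projection step concrete: a truncated SVD of a sparse concept-by-feature matrix, with row dot products in the reduced space approximating concept similarity. The toy matrix, the variable names and the use of SciPy's svds are our assumptions, not the authors' code or data.

```python
# Truncated SVD of a sparse concept-by-feature matrix (toy stand-in for
# AffectNet). Rows = concepts, columns = features, values in {-1, 0, 1}
# encoding the valence of assertions (1 positive, -1 negative, 0 unknown).
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

affectnet = csr_matrix(np.array([
    [1.0, 1.0, 0.0, -1.0],   # e.g. 'dog'
    [1.0, 0.0, 1.0,  0.0],   # e.g. 'bake cake'
    [0.0, 1.0, 1.0,  1.0],   # e.g. 'penguin'
]))

k = 2  # the paper keeps the first 100 principal components
u, s, vt = svds(affectnet, k=k)   # rank-k factorization (low-rank approximation)
embedding = u * s                 # concept coordinates in the reduced space

# Low-dimensional dot product approximating the original row dot product.
print(float(embedding[0] @ embedding[1]))
```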
4
The Hourglass of Emotions
To reason on the disposition of concepts in AffectiveSpace, we use the Hourglass of Emotions [16], a novel affective categorization model in which sentiments are organized around four independent – but concomitant – dimensions, whose different levels of activation make up the total emotional state of the mind. The Hourglass of Emotions, in fact, is based on the idea that the mind is made of different independent resources and that emotional states result from turning some set of these resources on and turning another set of them off [17]. Each such selection changes how we think by changing our brain's activities: the state of 'anger', for example, appears to select a set of resources that help us react with more speed and strength while also suppressing some other resources that usually make us act prudently. The primary quantity we can measure about an emotion we feel is its strength. But when we feel a strong emotion it is because we feel a very specific emotion. And, conversely, we cannot feel a specific emotion like 'fear' or 'amazement' without that emotion being reasonably strong. Mapping this space of possible emotions leads to an hourglass shape (Fig. 2). In the Hourglass of Emotions, affective states are not classified, as often happens in the field of emotion analysis, into basic emotional categories, but rather into four independent and concomitant dimensions – Pleasantness, Attention, Sensitivity and Aptitude – in order to understand how much, respectively:

1. the user is amused by interaction modalities (Pleasantness)
2. the user is interested in interaction contents (Attention)
3. the user is comfortable with interaction dynamics (Sensitivity)
4. the user is confident in interaction benefits (Aptitude)
Fig. 2. The Hourglass of Emotions
Each affective dimension is characterized by six levels of activation, called ‘sentic levels’, which determine the intensity of the expressed/perceived emotion as an int ∈ [−3,3]. These levels are also labeled as a set of 24 basic emotions (six for each of the affective dimensions) in a way that allows the model to specify the affective information associated with text both in a dimensional and in a discrete form. The dimensional form, in particular, is called ‘sentic vector’ and it is a four-dimensional f loat vector that can potentially express any human emotion in terms of Pleasantness, Attention, Sensitivity and Aptitude. Some particular sets of sentic vectors have special names as they specify wellknown compound emotions. For example the set of sentic vectors with a level of Pleasantness ∈ (1,2] (‘joy’), a null Attention, a null Sensitivity and a level of Aptitude ∈ (1,2] (‘trust’) are called ‘love sentic vectors’ since they specify the compound emotion of ‘love’.
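As an illustration of the dimensional form just described, the following sketch encodes a sentic vector as four floats and tests the 'love' condition quoted above; the class and function names are ours, not published Hourglass tooling.

```python
# A sentic vector as four floats in [-3, 3] plus the 'love' compound test.
from dataclasses import dataclass

@dataclass
class SenticVector:
    pleasantness: float
    attention: float
    sensitivity: float
    aptitude: float

def is_love(v: SenticVector) -> bool:
    """Pleasantness in (1, 2] ('joy'), null Attention and Sensitivity,
    Aptitude in (1, 2] ('trust')."""
    return (1 < v.pleasantness <= 2 and v.attention == 0
            and v.sensitivity == 0 and 1 < v.aptitude <= 2)

print(is_love(SenticVector(1.5, 0.0, 0.0, 1.8)))   # True
```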
5
Sentic Medoids
So far, a k-NN approach has been used to divide AffectiveSpace into clusters according to the Hourglass model. In particular, the centroid of each cluster was originally taken as the concept corresponding to each sentic level of the
described model. This choice, because of its simplicity, can pose a few limitations on the effectiveness of the division of AffectiveSpace into clusters. For example, at present, there is no evidence that the concept corresponding to a specific sentic level is necessarily the best choice for clustering; moreover, it is not proved that a centroid must coincide with an actual concept instead of being any point in the space; further, this division of the space inevitably leads to poor performance in the classification of concepts with low affective valence. Within this study we address the first of these points, trying to provide centroids which may be regarded as better (in some sense) than the currently used ones. K-means [18] and k-medoids [19] are two widely used clustering techniques whose aim is to partition the given observations into k clusters around as many centroids, trying to minimize a given cost function. While the k-means algorithm does not pose constraints on centroids, k-medoids does assume that centroids must coincide with k observed points. At this stage, a k-medoids approach has been chosen to investigate whether the concepts that are currently used as centroids need to be replaced with different ones. The most commonly used algorithm for finding the k medoids is the partitioning around medoids (PAM) algorithm. The PAM algorithm determines a medoid for each cluster by selecting the most centrally located point within the cluster. After the selection of medoids, clusters are rearranged so that each point is grouped with the closest medoid. Since k-medoids clustering is an NP-hard problem [20], different approaches based on alternative optimization algorithms have been developed, though at the risk of being trapped in local minima. For the purpose of this work, we use a modified version of the algorithm recently proposed by Park and Jun [21], which runs similarly to the k-means clustering algorithm. This has been shown to have similar performance to the PAM algorithm while requiring significantly less computational time. In particular, we have N concepts (N = 14,301) encoded as points x ∈ Rp (p = 50). We want to group them into k clusters and, in our case, we can fix k = 24 as we are looking for one cluster for each sentic level s of the Hourglass model. Generally, the initialization of clusters for clustering algorithms is a problematic task, as the process often risks getting stuck in local optima, depending on the initial choice of centroids [22]. However, for this study, we decided to use as initial centroids the concepts which are currently used as centroids for the clusters, as they specify the emotional categories we want to organize AffectiveSpace into. For this reason, what is usually seen as a limitation of the algorithm can be seen as an advantage for this study, since we are not looking for the 24 centroids leading to the best 24 clusters but rather for the 24 centroids identifying the required 24 sentic levels (i.e. the centroids should not be 'too far' from the ones currently used). In particular, as the Hourglass affective dimensions are independent but concomitant, we need to cluster AffectiveSpace four times, once for each dimension. According to the Hourglass categorization model, in fact, each concept can convey, at the same time, more than one emotion (which is why we get compound emotions) and this information can be expressed via a sentic vector specifying the
concept's affective valence in terms of Pleasantness, Attention, Sensitivity and Aptitude. Therefore, given that the distance between two points in AffectiveSpace is defined as $D(a, b) = \sqrt{\sum_{i=1}^{p} (a_i - b_i)^2}$ (note that the choice of the Euclidean distance is arbitrary), the algorithm used, applied for each of the four affective dimensions, can be summarized as follows:

1. Each centroid $C_n \in \mathbb{R}^{50}$ ($n = 1, 2, ..., k$) is set as the concept corresponding to each sentic level $s$ in the current affective dimension.
2. Assign each record $x$ to a cluster $\Xi$ so that $x_i \in \Xi_n$ if $D(x_i, C_n) \le D(x_i, C_m)$ for $m = 1, 2, ..., k$.
3. Find a new centroid $C$ for each cluster $\Xi$ so that $C_j = x_i$ if $\sum_{x_m \in \Xi_j} D(x_i, x_m) \le \sum_{x_m \in \Xi_j} D(x_h, x_m)$ $\forall x_h \in \Xi_j$.
4. Repeat steps 2 and 3 until no changes in the centroids are observed.

Note that the conditions posed in steps 2 and 3 may occasionally lead to more than one solution. Should this happen, our model randomly chooses one of them. This clusterization of AffectiveSpace allows us to calculate, for each common sense concept $x$, a four-dimensional sentic vector that defines its affective valence in terms of a degree of fitness $f(x)$, where $f_a(x) = D(x, C_j)$ with $C_j$ such that $D(x, C_j) \le D(x, C_k)$, for $a = 1, 2, 3, 4$ and $k = 6a-5, 6a-4, ..., 6a$.
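A compact rendering of steps 1–4 as code may help fix the procedure; the sketch below assumes the concepts are rows of a NumPy array and that the 24 sentic-level concepts supply the initial medoids. It is our reading of the algorithm, not the authors' implementation.

```python
# Modified k-medoids loop with fixed seed medoids (steps 1-4 above).
import numpy as np

def sentic_medoids(points, init_medoid_idx, max_iter=100):
    """points: (N, p) array; init_medoid_idx: indices of the k seed concepts.
    Returns the final medoid indices and a cluster label for each point."""
    medoids = np.asarray(init_medoid_idx)
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest medoid (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - points[medoids][None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: in each cluster, pick the member that minimizes the summed
        # distance to all other members of that cluster.
        new_medoids = medoids.copy()
        for j in range(len(medoids)):
            members = np.where(labels == j)[0]
            if members.size:
                intra = np.linalg.norm(
                    points[members][:, None, :] - points[members][None, :, :], axis=2)
                new_medoids[j] = members[intra.sum(axis=1).argmin()]
        # Step 4: stop when no medoid changes.
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, labels

# Usage: 24 seeds (one per sentic level) in a 50-dimensional toy space.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))
seeds = rng.choice(500, size=24, replace=False)
medoids, labels = sentic_medoids(X, seeds)
```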
6
Evaluation
In order to evaluate the proposed technique, we exploited a corpus of 5,000 mood-tagged blogs from LiveJournal (LJ) [23], a virtual community that allows users to keep a blog, journal or diary and to label their posts with a mood label, by choosing from more than 130 predefined moods or by creating custom mood themes. In particular, we both embedded sentic medoids in the sentics extraction process (Fig. 3), to test the technique on natural language data, and built a benchmark for affective common sense knowledge (BACK), in order to more specifically calculate statistical classification measures such as precision and recall. We compared the outputs of the sentics extraction process, first using a k-NN approach for clustering AffectiveSpace and then employing sentic medoids. Results showed the latter technique to be up to 16% more accurate than the former. The classification of 'happy' and 'sad' posts, for example, was performed with 89% and 81% precision respectively and recall rates of 77% and 73% using sentic medoids, while k-NN identified the same posts with 75% and 62% precision and 70% and 58% recall respectively. The F-measure values obtained, hence, were significantly better: 82% and 76% for 'happy' and 'sad' posts using sentic medoids versus 72% and 60% respectively using k-NN, for an average accuracy improvement of 13%. To build BACK we applied CF-IOF (concept frequency – inverse opinion frequency) [24] to the LJ corpus. CF-IOF is a technique that identifies common domain-dependent semantics in order to evaluate how important a concept is to
Fig. 3. Evaluation of sentics extraction process
a set of opinions concerning the same topic. Firstly, the frequency of a concept c for a given domain d is calculated by counting the occurrences of the concept c in the set of available d-tagged opinions and dividing the result by the total number of occurrences of all concepts in the set of opinions concerning d. This frequency is then multiplied by the logarithm of the inverse frequency of the concept in the whole collection of opinions, that is:

$$CF\text{-}IOF_{c,d} = \frac{n_{c,d}}{\sum_k n_{k,d}} \log \frac{\sum_k n_k}{n_c}$$
where $n_{c,d}$ is the number of occurrences of concept c in the set of opinions tagged as d, $n_k$ is the total number of concept occurrences and $n_c$ is the number of occurrences of c in the whole set of opinions. A high CF-IOF weight is reached by a high concept frequency in a given domain and a low frequency of the concept in the whole collection of opinions. We exploited CF-IOF weighting to filter out common concepts in the LJ corpus and detect relevant mood-dependent semantics for each of the Hourglass sentic levels. The result was a benchmark of 2,000 affective concepts that we compared with the classification results obtained by applying k-NN and sentic medoids to AffectiveSpace. Also in this case, the latter technique outperformed the former. In particular, the average precision gap between sentic medoids and k-NN was +9% and the mean recall difference +15%.
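Two of the quantities used in this evaluation can be made concrete in a few lines. The F-measures quoted above follow, up to the paper's rounding, from the stated precision and recall via F1 = 2PR/(P+R), and the CF-IOF weight is a direct reading of the formula just given; the corpus layout assumed below is purely illustrative.

```python
# F-measure check against the figures reported in this section.
import math
from collections import Counter

def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

for label, p, r in [("happy, sentic medoids", 0.89, 0.77),
                    ("sad, sentic medoids", 0.81, 0.73),
                    ("happy, k-NN", 0.75, 0.70),
                    ("sad, k-NN", 0.62, 0.58)]:
    print(f"{label}: F1 = {100 * f1(p, r):.1f}%")   # ~82.6, 76.8, 72.4, 59.9

# CF-IOF, assuming opinions are given as lists of concepts grouped by mood tag.
def cf_iof(concept, domain, opinions_by_domain):
    domain_counts = Counter(opinions_by_domain[domain])
    all_counts = Counter(c for cs in opinions_by_domain.values() for c in cs)
    cf = domain_counts[concept] / sum(domain_counts.values())        # n_{c,d} / sum_k n_{k,d}
    iof = math.log(sum(all_counts.values()) / all_counts[concept])   # log(sum_k n_k / n_c)
    return cf * iof

corpus = {"happy": ["birthday", "cake", "friend", "cake"],
          "sad": ["rain", "friend", "alone"]}
print(cf_iof("cake", "happy", corpus))   # high: frequent in 'happy', rare overall
```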
7
Conclusion and Future Efforts
In a world in which millions of people express their opinions and sentiments in blogs, wikis, fora, chats and social networks, the distillation of knowledge from this huge amount of unstructured information is a very arduous task. While existing approaches mainly work at the syntactic level, we use Sentic Computing techniques and tools to analyze natural language text at the semantic level. In particular, we infer opinions and sentiments from the Web using AffectiveSpace, a multi-dimensional vector space of affective common sense knowledge. In this work, we proposed a new technique to cluster this vector space according to a novel emotion categorization model based on the idea that the mind is made of different independent resources and that emotional states result from turning some set of these resources on and turning another set of them off. In particular, sentic medoids organize the affective common sense knowledge contained in AffectiveSpace by iteratively selecting appropriate concepts in order to minimize the sum of dissimilarities between them and the other concepts within the same cluster and, hence, lead to more accurate results for sentiment inference and polarity detection tasks. Whilst this study has shown encouraging results, further research is now planned to investigate if and how the centroid selection could be further improved. In particular, we will investigate whether modifications of the dissimilarity function and/or of the centroids' constraints could lead to an improvement of the model performance, with special regard to concepts with low affective valence. Soon, we will extend BACK using a bigger LJ corpus and make it publicly available for comparing Sentic Computing with other opinion mining and sentiment analysis techniques. Moreover, we plan to exploit BACK to make a comprehensive comparison between the proposed method and other popular classification and clustering techniques such as maximum entropy classifiers, naïve Bayes classifiers, support vector machines (SVMs) and neural networks, in order to select the best method for organizing affective common sense knowledge and, ultimately, improve Sentic Computing.
Acknowledgements. This research has been funded by the UK Engineering and Physical Sciences Research Council (EPSRC Grant Reference No. EP/G501750/1) and Sitekit Labs. This work was undertaken during the first author's research visit to the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CAS) in Beijing (China), which was jointly funded by the Royal Society of Edinburgh (UK) and CAS.
References 1. Cambria, E., Hussain, A., Havasi, C., Eckl, C.: Sentic Computing: Exploitation of Common Sense for the Development of Emotion-Sensitive Systems. In: Esposito, A., Campbell, N., Vogel, C., Hussain, A., Nijholt, A. (eds.) Second COST 2102. LNCS, vol. 5967, pp. 153–161. Springer, Heidelberg (2010)
2. Cambria, E., Hussain, A., Havasi, C., Eckl, C.: AffectiveSpace: Blending Common Sense and Affective Knowledge to Perform Emotive Reasoning. In: WOMSA at CAEPIA, Seville (2009)
3. Elliott, C.: The Affective Reasoner: A Process Model of Emotions in a Multi-agent System. The Institute for the Learning Sciences, Technical Report No. 32 (1992)
4. Ortony, A., Clore, G., Collins, A.: The Cognitive Structure of Emotions. Cambridge University Press, New York (1988)
5. Wiebe, J., Wilson, T., Claire, C.: Annotating Expressions of Opinions and Emotions in Language. Language Resources and Evaluation 39(3), 165–210 (2005)
6. Somasundaran, S., Wiebe, J., Ruppenhofer, J.: Discourse Level Opinion Interpretation. In: COLING, Manchester (2008)
7. Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. In: HLT-EMNLP, Vancouver (2005)
8. Hu, M., Liu, B.: Mining Opinion Features in Customer Reviews. In: AAAI, San Jose (2004)
9. Goertzel, B., Silverman, K., Hartley, C., Bugaj, S., Ross, M.: The Baby Webmind Project. In: AISB, Birmingham (2000)
10. Cambria, E., Hussain, A., Havasi, C., Eckl, C.: Common Sense Computing: From the Society of Mind to Digital Intuition and Beyond. In: Fierrez, J., Ortega-Garcia, J., Esposito, A., Drygajlo, A., Faundez-Zanuy, M. (eds.) BioID MultiComm 2009. LNCS, vol. 5707, pp. 252–259. Springer, Heidelberg (2009)
11. Cambria, E., Grassi, M., Hussain, A., Havasi, C.: Sentic Computing for Social Media Marketing. To appear in: Multimedia Tools and Applications. Springer, Heidelberg (2010)
12. Havasi, C., Speer, R., Alonso, J.: ConceptNet 3: A Flexible, Multilingual Semantic Network for Common Sense Knowledge. In: RANLP, Borovets (2007)
13. Strapparava, C., Valitutti, A.: WordNet-Affect: An Affective Extension of WordNet. In: LREC, Lisbon (2004)
14. Wall, M., Rechtsteiner, A., Rocha, L.: Singular Value Decomposition and Principal Component Analysis. In: Berrar, D., et al. (eds.) A Practical Approach to Microarray Data Analysis, pp. 91–109. Kluwer, Norwell (2003)
15. Eckart, C., Young, G.: The Approximation of One Matrix by Another of Lower Rank. Psychometrika 1(3), 211–218 (1936)
16. Cambria, E., Hussain, A., Havasi, C., Eckl, C.: SenticSpace: Visualizing Opinions and Sentiments in a Multi-Dimensional Vector Space. In: Setchi, R., Jordanov, I., Howlett, R.J., Jain, L.C. (eds.) KES 2010. LNCS (LNAI), vol. 6279, pp. 385–393. Springer, Heidelberg (2010)
17. Minsky, M.: The Emotion Machine. Simon and Schuster, New York (2006)
18. Hartigan, J., Wong, M.: Algorithm AS 136: A K-Means Clustering Algorithm. Journal of the Royal Statistical Society 28(1), 100–108 (1979)
19. Kaufman, L., Rousseeuw, P.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York (1990)
20. Garey, M., Johnson, D.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman, New York (1979)
21. Park, H., Jun, C.: A Simple and Fast Algorithm for K-Medoids Clustering. Expert Systems with Applications 36, 3336–3341 (2009)
22. Duda, R., Hart, P.: Pattern Classification and Scene Analysis. Wiley, New York (1973)
23. LiveJournal, http://www.livejournal.com
24. Cambria, E., Hussain, A., Durrani, T., Havasi, C., Eckl, C., Munro, J.: Sentic Computing for Patient Centered Applications. In: IEEE ICSP 2010, Beijing (2010)
Detecting Emotions in Social Affective Situations Using the EmotiNet Knowledge Base Alexandra Balahur, Jesús M. Hermida, and Andrés Montoyo Department of Software and Computing Systems, University of Alicante Aptdo. de Correos 99, 03080 Alicante, Spain {abalahur,jhermida,montoyo}@dlsi.ua.es
Abstract. The task of automatically detecting emotion in text is challenging. This is due to the fact that, most of the time, textual expressions of affect are not direct – using emotion words – but result from the interpretation and assessment of the meaning of the concepts and their interaction, described in the chains of actions presented. This article presents the core of EmotiNet, a knowledge base (KB) for representing and storing affective reactions to real-life contexts and action chains described in text, and the methodology employed in designing, populating, extending and evaluating it. The basis of the design process is given by a set of self-reported affective situations in the International Survey on Emotion Antecedents and Reactions corpus. From the evaluation performed, we conclude that our final model represents a semantic resource appropriate for capturing and storing the semantics of real actions and predicting the emotional responses triggered by chains of actions. Keywords: EmotiNet, emotion detection, knowledge base, appraisal theories, self-reported affect, action chain.
1 Introduction
Understanding the manner in which humans express, sense and react to emotion has always been a challenging topic. Modern theories state that emotions are given by a compound of biological, neuronal transformations and are the root of all human reactions [1, 2] and decisions [3]. In Natural Language Processing (NLP), although different approaches to tackle the issue of emotion detection in text have been proposed, the complexity of the emotional phenomena has led to a low performance of the systems implementing this task – e.g. the systems participating in SemEval 2007 Task No. 14 [4]. The main explanation, supported by theories in psychology, is the fact that expressions of emotion are most of the time not direct, through the use of specific words. In fact, according to a linguistic study by [5], only 4% of words carry affective content. Most of the time, the affect expressed in text results from the appraisal of the actions presented, in a given context. In the light of these issues, the aim of this research is to propose and evaluate a model of affective reaction to real-life situations described in text as action chains (changes produced by or occurring to an agent related with the state of a physical or
emotional object) in a given context, based on an ontological representation and commonsense knowledge. To this aim, we model situations and their corresponding emotions as they are reported in the International Survey on Emotion Antecedents and Reactions (ISEAR) [6]. The theoretical underpinning of this approach is given by the Appraisal Theory (see [7]), according to which emotions are extracted from our evaluations (appraisals) of events that cause specific reactions, depending on the context (cognitive and affective) and our knowledge of the concepts involved in these events. By modelling chains of actions and their corresponding emotional responses, we can extract general patterns of reaction to situations, interpreted through the use of commonsense knowledge. Subsequently, these patterns can be employed to detect, extract or generate emotion in text.
2 State of the Art
In Artificial Intelligence, the term affective computing was first introduced by [8]. Although there were previous approaches in the 1980s and 1990s, in the field of NLP the task of emotion detection has grown in importance together with the exponential increase in the volume of subjective data on the Web in blogs, forums, reviews, etc. Previous approaches to spot affect in text include the use of models simulating human reactions according to their needs and desires [9], fuzzy logic [10], lexical affinity based on similarity of contexts – the basis for the construction of WordNet-Affect [11] or SentiWordNet [12] – detection of affective keywords [13] and machine learning using term frequency [13, 14]. The latter two approaches are the most widely used in emotion detection systems implemented for NLP, because they are easily adaptable across domains and languages. Other proposed methods include the creation of syntactic patterns and rules for cause-effect modelling [15]. Significantly different proposals for emotion detection in text are given in the work by [16] and the recently proposed framework of sentic computing [17], whose scope is to model affective reaction based on commonsense knowledge. In linguistics, the Appraisal framework [18] is a development of work in Systemic Functional Linguistics [19] and is concerned with interpersonal meaning in text – the negotiation of social relationships by communicating emotion, judgement and appreciation. As far as knowledge bases are concerned, many NLP tools have been built using manually created knowledge repositories such as WordNet [20], CYC (http://cyc.com/cyc/opencyc/overview), ConceptNet (http://web.media.mit.edu/~hugo/conceptnet/) or SUMO (http://www.ontologyportal.org/index.html). Some authors tried to learn ontologies and relations automatically, using sources that evolve in time (e.g. Yago [21], which employs Wikipedia to extract concepts, using rules and heuristics based on the Wikipedia categories). Other approaches addressed knowledge base population [22] and relation learning [23]. DIPRE [24] and Snowball [25] label a small set of instances and create hand-crafted patterns to extract ontology concepts.
3 Motivation and Contribution
Even though lexical approaches might solve the task at a satisfactory level for examples where emotion is explicitly stated in text, they fail to do so when emotion is
expressed indirectly, by including factual information or through the description of a situation. Moreover, as mentioned in the introductory section, only a small percentage of words have an explicit affective content. Similarly, while the fuzzy models of emotion perform well for a series of cases that fit the described patterns, they remain weak when acquiring, combining and using new information. Finally, the most widely used methods of affect detection in NLP are based on machine learning models built from corpora. While possibly strong in determining lexical or syntactic patterns of emotion, such models are nonetheless limited as far as the extraction of the text meaning is concerned, even when deep text understanding features are used. The basic idea of sentic computing [17] is acquiring knowledge on the emotional effect of different concepts. However, this approach only spots the emotion contained in separate concepts and does not integrate their interaction or the context (cognitive or affective) in which they appear. As we have seen in the examples, this is not enough when detecting emotion in text. The theoretical underpinning of our approach is given by the Appraisal Theory, which claims that emotions are elicited and differentiated on the basis of the subjective evaluation of the personal significance of a situation, object or event [7]. Thus, the nature of the emotional reaction can be best predicted on the basis of the individual's appraisal of an antecedent situation, object or event. This theory best explains the need to contextualize emotional response, due to which the same situation can lead to different affective reactions and similar reactions can be obtained through different stimuli. The general idea behind our approach is to model chains of actions and their corresponding emotional effect using an ontological representation. In this manner, we can model the interaction of different events in the context in which they take place and add inference mechanisms to extract knowledge that is not explicitly present in the text. We can also include knowledge on the affect of different concepts found in other ontologies and knowledge bases (e.g. "The man killed the mosquito." does not produce the same emotional effect as "The man killed his wife." or "The man killed the burglar in self-defence."). At the same time, we can define the properties of emotions and how they combine.
4 Modelling Affective Reactions Using Appraisal Models
The approach we propose defines a new knowledge base to store action chains, called EmotiNet, which aims to be a resource for detecting emotions in text, together with a (semi)automatic, iterative process to build it, based on existing knowledge from different sources. This process principally aims at extracting the action chains from a document and adding them to the knowledge base. From a more practical viewpoint, our approach defines an action chain as a sequence of action links, or simply actions, that trigger an emotion on an actor. Specifically, the process proposed consists of several steps, which we present, in order, in the next subsections.
4.1 ISEAR – Self-Reported Affect
In ISEAR [6] (http://www.unige.ch/fapse/emotion/databanks/isear.html), the student respondents were asked to report situations in which they had experienced all of the 7
major emotions (joy, fear, anger, sadness, disgust, shame, and guilt). In each case, the questions covered the way they had appraised the situation and how they reacted. An example of an entry in the ISEAR databank is: "I felt anger when I had been obviously unjustly treated and had no possibility to prove they were wrong.". Each example is attached to one single emotion. In order to have a homogeneous starting base, we selected from the 7,667 examples in the ISEAR database only the ones that contained descriptions of situations between family members. This resulted in a total of 174 examples of situations where anger was the emotion felt, 87 examples for disgust, 110 examples for fear, 223 for guilt, 76 for joy, 292 for sadness and 119 for shame. Subsequently, the examples were POS-tagged using TreeTagger. Within each category, we then computed the similarity of the examples with one another, using Ted Pedersen's Similarity Package. This score was used to split the examples in each emotion class into six clusters, using the Simple K-Means implementation in Weka. The idea behind this approach is to group examples that are similar in vocabulary and structure. This was confirmed by the output of the clusters.
4.2 Semantic Role Modelling of Situations
The next step performed was extracting, from each of the examples, the actions that are described. In order to do this, we employ the system introduced in [27]. From the output of this system, we extract the agent, the verb and the patient (the surface object of the verb). For example, if we use the input sentence "I'm going to a family party because my mother obliges me to", the system extracts two triples with the main actors of the sentences: (I, go, family party) and (mother, oblige, me), related by the causal adverb "because". Moreover, the subjects of these sentences might include anaphoric expressions or even self-references (references to the speaker, e.g. 'I', 'me', 'myself') that have to be resolved before creating the instances. The resolution of anaphoric expressions (not self-references) was accomplished automatically, using a heuristic selection of the family member mentioned in the text that is closest to the anaphoric reference and whose properties (gender, number) are compatible with the ones of the reference. Self-references were also resolved by taking into consideration the entities mentioned in the sentence and deducing the possible relations. Following the last example, the subject of the action would be assigned to the daughter of the family: (daughter, go, family_party) and (mother, oblige, daughter).
4.3 Models of Emotion
In order to describe the emotions and the way they relate and compose, we employ Robert Plutchik's wheel of emotion [28] and Parrott's tree-structured list of emotions [29]. These models are the ones that best overlap with the emotions comprised in the ISEAR databank. We combine the two by adding the primary emotions missing in the first model and the secondary and tertiary emotions as combinations of the basic ones.
4.4 Ontology Design
The process of building the core of the EmotiNet knowledge base (KB) of action chains started with the design of the core of knowledge, in our case, an ontology. Specifically, the design process was divided into four stages:
4.4.1 Establishing the Scope and Purpose of the Ontology
The ontology has to be capable of defining the required concepts in a general manner, which will allow us to expand and specialize it with external knowledge sources. The EmotiNet ontology needs to capture and manage knowledge from three domains: kinship membership, emotions (and their relations) and actions (their features and the relations between them).
4.4.2 Reusing Knowledge from Existing Ontologies
In a second stage, we searched for other ontologies on the Web that contained concepts from our knowledge cores. At the end of the process, we located two ontologies that would be the basis of our ontological representation: the ReiAction ontology (http://www.cs.umbc.edu/~lkagal1/rei/ontologies/ReiAction.owl), which represents actions between entities, and the family relations ontology (http://www.dlsi.ua.es/~jesusmhc/emotinet/family.owl), which contains knowledge about family members and kinship relations.
4.4.3 Building Our Own Knowledge Core from the Imported Ontologies
This third stage involved the design of the last remaining core, i.e. emotion, and the combination of the different knowledge sources into a single ontology. In this case, we designed a new knowledge core from scratch based on a combination of the models of emotion presented in Section 4.3. This knowledge core includes different types of relations between emotions and a collection of specific instances of emotion (e.g. anger, fear or joy). In the last step, these three cores were combined using new classes and relations between the existing members of these ontologies (see Fig. 1, subclasses of DomainAction).
Fig. 1. Main concepts of EmotiNet

Fig. 2. RDF graph of an action chain
4.5 Ontology Extension and Population
The last process extended EmotiNet with new types of actions and instances of action chains, using real examples from the ISEAR corpus. We began the process by manually selecting a subset of 175 documents from the collection obtained in Section 4.1, with expressions related to all the emotions: anger (25), disgust (25), guilt (25),
fear (25), sadness (25), joy (25) and shame (25). Once we carried out the processes described in Section 4.2 on the chosen documents, we obtained 175 action chains (ordered lists of tuples). In order to be included in the EmotiNet knowledge base, all their elements needed to be mapped to existing concepts or instances within the KB. When these did not exist, they were added to it. In EmotiNet, each action link was modelled as a triple of ontology instances (actor, action, patient) that could be affected by a set of modifiers (e.g. opposite, degree of intensity) that directly affect each action. We would like to highlight that each tuple extracted from the process of semantic role labelling and emotion assignation has its own representation in EmotiNet as an instance of the subclasses of Action. Each instance of Action is related to an instance of the class Feel, which represents the emotion felt in this action. Subsequently, these instances (action links) were grouped in sequences of actions (class Sequence) ended by an instance of the class Feel, which, as just mentioned, determines the final emotion felt by the main actor(s) of the chain. In our example, we created two new classes, Go and Oblige (subclasses of DomainAction), and two new instances from them: instance "act1" (class "Go", actor "daughter", target "family_party") and instance "act2" (class "Oblige", actor "mother", target "daughter"). The last action link already existed within EmotiNet from another chain, so we reused it: instance "act3" (class Feel, actor "daughter", emotionFelt "anger"). The next step consisted of sorting and grouping these instances into sequences by means of instances of the class Sequence. This is a subclass of Action that can establish the temporal order between two actions (which one occurred first). Considering a set of default rules that model the semantics of temporal adverbial expressions, we group and order the instances of action in sequences, which must end with an instance of Feel. Fig. 2 shows an example of an RDF graph (simplified) with the action chain of our example.
4.6 Ontology Expansion
In order to extend the coverage of the resource and include certain types of interactions between actions, we expanded the ontology with the actions and relations from VerbOcean [31]. In particular, 299 new actions were automatically included as subclasses of DomainAction, which were directly related to the actions of our ontology through three new relations: can-result-in, happens-before and similar. This knowledge extracted from VerbOcean will be the basis of inferences when the information extracted from new texts does not appear in our initial set of instances. Thus, this knowledge reduces the degree of dependency between the resource and the initial set of examples. The more external sources of general knowledge we add, the more flexible EmotiNet will be, thus increasing the possibilities to process unseen action chains.
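To make the representation concrete, the sketch below rebuilds the action chain of Fig. 2 as RDF triples with rdflib. The instance and property names echo the figure; the namespace URI and the exact graph shape are our assumptions, not the published ontology files.

```python
# The family-party action chain as RDF triples (illustrative sketch).
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

EMO = Namespace("http://example.org/emotinet#")   # placeholder namespace
g = Graph()

# Action links: (mother, oblige, daughter) and (daughter, go, family party).
g.add((EMO.oblige_1, RDF.type, EMO.Oblige))
g.add((EMO.oblige_1, EMO.actor, EMO.mother_f1))
g.add((EMO.oblige_1, EMO.target, EMO.daughter_f1))
g.add((EMO.go_1, RDF.type, EMO.Go))
g.add((EMO.go_1, EMO.actor, EMO.daughter_f1))
g.add((EMO.go_1, EMO.target, EMO.party_1))

# The chain ends with a Feel instance fixing the emotion of the main actor.
g.add((EMO.feel_anger_1, RDF.type, EMO.Feel))
g.add((EMO.feel_anger_1, EMO.actor, EMO.daughter_f1))
g.add((EMO.feel_anger_1, EMO.emotionFelt, EMO.anger))

# Sequence instances impose the temporal order and close the chain.
g.add((EMO.sequence_1, RDF.type, EMO.Sequence))
g.add((EMO.sequence_1, EMO.first, EMO.oblige_1))
g.add((EMO.sequence_1, EMO.second, EMO.go_1))
g.add((EMO.sequence_2, RDF.type, EMO.Sequence))
g.add((EMO.sequence_2, EMO.first, EMO.sequence_1))
g.add((EMO.sequence_2, EMO.second, EMO.feel_anger_1))

print(g.serialize(format="turtle"))
```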
5 Experiments and Evaluation
5.1 Experimental Setup
The evaluation of our approach consists in testing whether, by employing the model we built, we are able to detect the emotion expressed in other examples pertaining to the
categories in ISEAR. In order to assess the new examples from ISEAR, we followed the same process we used for building the core of EmotiNet, with the exception that the manual modelling of examples into tuples was replaced with the automatic extraction of (actor, verb, patient) triples from the output given by the SRL system in [27]. Subsequently, we eliminated the stopwords in the phrases contained in these three roles and performed a simple coreference resolution. Next, we ordered the actions presented in the phrase using the adverbs that connect the sentences, through the use of patterns (temporal, causal etc.). The resulting action chains represent the test set which will be used in carrying out the different experiments. The test set is evaluated against the final version of the EmotiNet core knowledge base, which contains all 7 emotions in ISEAR, with 25 examples for each. We performed the following series of experiments:
(1) In the first experiment, for each of the situations in the test set (represented as action chains), we search the EmotiNet KB for the sequences in which the actions in the chains are involved and their corresponding subjects. As a result of the search process, we obtain the emotion label corresponding to the new situation and the subject of the emotion, based on a weighting function. This function takes into consideration the number of actions and the position in which they appear in the sequence contained in EmotiNet. The issue in this first approach is that many of the examples cannot be classified, as the knowledge they contain is not present in the ontology. The corresponding results are marked with C1.
(2) A subsequent approach aimed at overcoming the issues raised by the missing knowledge in EmotiNet. In a first approximation, we aimed at introducing extra knowledge from VerbOcean, by adding the verbs that are similar to the ones in the core examples (represented in VerbOcean through the "similar" relation). Subsequently, each of the actions in the examples to be classified that was not already contained in EmotiNet was sought in VerbOcean. In case one of the similar actions was already contained in the KB, the actions were considered equivalent. Further on, each action was associated with an emotion, using ConceptNet relations and concepts. Action chains were represented as chains of actions with their associated emotions. Finally, new examples were matched against chains of actions containing the same emotions, in the same order. While more complete than the first approximation, this approach was also affected by the lack of knowledge about the emotional content of actions. To overcome this issue, we proposed two heuristics:
(2a) In the first one, actions for which no affect information was available were sought within the examples already in the KB. The results are marked with C2a.
(2b) In the second approximation, we used the most frequent emotion associated with the known links of a chain, whose individual emotions were obtained from SentiWordNet. In this case, the core of action chains is not involved in the process. The results are marked with C2b.
5.2 Empirical Results
We performed the steps described above on the 895 examples (the ISEAR phrases corresponding to the seven emotions modelled, from which the examples used as the core of EmotiNet were removed). For the first approach, the queries led to a result in the case of only 571 examples; for the second approach, the approximations led to a result for 617 (C2a) and 625 (C2b) examples. For the
remaining ones, the knowledge stored in the KB is not sufficient for the appropriate action chain to be extracted. Table 1 presents the results of the evaluations on the subset of examples whose corresponding query returned a result. Table 2 reports the recall obtained when testing on all examples. The baseline is random, computed as the average of 10 random generations of classes for all classified examples.

Table 1. Results of the emotion detection using EmotiNet on the classified examples in the test set

Emotion  | Correct (C1/C2a/C2b) | Total (C1/C2a/C2b) | Accuracy % (C1/C2a/C2b)
disgust  | 16 / 16 / 21         | 44 / 42 / 40       | 36.36 / 38.09 / 52.50
shame    | 25 / 25 / 26         | 70 / 78 / 73       | 35.71 / 32.05 / 35.62
anger    | 31 / 47 / 57         | 105 / 115 / 121    | 29.52 / 40.86 / 47.11
fear     | 35 / 34 / 37         | 58 / 65 / 60       | 60.34 / 52.30 / 61.67
sadness  | 46 / 45 / 41         | 111 / 123 / 125    | 41.44 / 36.58 / 32.80
joy      | 13 / 16 / 18         | 25 / 29 / 35       | 52.00 / 55.17 / 51.43
guilt    | 59 / 68 / 64         | 158 / 165 / 171    | 37.34 / 41.21 / 37.43
Total    | 225 / 251 / 264      | 571 / 617 / 625    | 39.40 / 40.68 / 42.24
Table 2. Results of the emotion detection using EmotiNet on all test examples in the test set

Emotion  | Correct (C1/C2a/C2b) | Total | Recall % (C1/C2a/C2b)
disgust  | 16 / 16 / 21         | 59    | 27.11 / 27.11 / 35.59
shame    | 25 / 25 / 26         | 91    | 27.47 / 27.47 / 28.57
anger    | 31 / 47 / 57         | 145   | 21.37 / 32.41 / 39.31
fear     | 35 / 34 / 37         | 85    | 60.34 / 52.30 / 61.67
sadness  | 46 / 45 / 41         | 267   | 17.22 / 16.85 / 15.36
joy      | 13 / 16 / 18         | 50    | 26.00 / 32.00 / 36.00
guilt    | 59 / 68 / 64         | 198   | 29.79 / 34.34 / 32.32
Total    | 225 / 251 / 264      | 895   | 25.13 / 28.04 / 29.50
Baseline | 126 / 126 / 126      | 895   | 14.07 / 14.07 / 14.07
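Experiment (1) above scores stored sequences by how many of a new chain's actions they contain and where those actions sit in the sequence. The weighting function itself is not spelled out in the paper, so the following sketch is only one plausible instantiation, with all names ours.

```python
# Position-weighted matching of a new action chain against the KB (sketch).
def chain_score(new_chain, stored_chain):
    """Chains are lists of (actor, action, patient) tuples; matches closer
    to the final Feel link (later positions) weigh more."""
    n = len(stored_chain)
    matched = [(pos + 1) / n for pos, link in enumerate(stored_chain)
               if link in new_chain]
    return sum(matched) / n

def classify(new_chain, kb):
    """kb: list of (stored_chain, emotion_label) pairs."""
    best_chain, best_emotion = max(kb, key=lambda e: chain_score(new_chain, e[0]))
    return best_emotion

kb = [([("mother", "oblige", "daughter"), ("daughter", "go", "party")], "anger")]
print(classify([("mother", "oblige", "daughter")], kb))   # -> 'anger'
```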
5.3 Discussion
From the results in Tables 1 and 2, we can conclude that the approach is valid for emotion detection in social situations. We have proven that introducing new information can easily be done from existing common-sense knowledge bases and that the approach is robust in the face of the noise introduced. From the error analysis performed, the first important finding is that extracting only the actor, verb and patient semantic roles is not sufficient. There are other roles, such as the modifiers, which change the overall emotion in the text (e.g. "I had a fight with my sister" – sadness, versus "I had a fight with my stupid sister" – anger). Therefore, such modifiers should be included as attributes of the concepts identified in the roles and, additionally, added to the tuples, as they can account for other appraisal criteria. This can also be a method to account for negation. Given that just 3 roles were extracted, there were also many examples that did not make sense when input into the system. Further on, we tried to assign an emotion to all the actions contained in the chains. However, some actions have no emotional effect. Therefore, an accurate source of knowledge on the affect associated with concepts has to be added. Another issue we
detected was that certain emotions tend to be classified most of the time as another emotion (e.g. fear is mostly classified as anger). A further source of errors was the lack of knowledge on specific actions. As we have seen, this knowledge can be imported from external knowledge bases and integrated in the core. VerbOcean extended the knowledge, in the sense that more examples could be classified. However, the ambiguity of the resource and the fact that it is not perfectly accurate also introduced many errors. Finally, other errors were produced by the NLP processes and propagated at various steps of the processing chain. Some of these errors could be partially solved by using alternative NLP tools.
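One way to realize the fix suggested by this error analysis – carrying role modifiers and negation into the tuples – is sketched below; it is purely illustrative and not part of EmotiNet itself.

```python
# An action link extended with per-role modifiers and a negation flag.
from dataclasses import dataclass, field

@dataclass
class ActionLink:
    actor: str
    action: str
    patient: str
    modifiers: dict = field(default_factory=dict)   # role -> list of modifiers
    negated: bool = False

# 'stupid' attached to the patient role is the kind of attribute that can
# shift the appraisal from sadness towards anger in the example above.
link = ActionLink("I", "have_fight", "sister", modifiers={"patient": ["stupid"]})
```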
6 Conclusions and Future Work
This article presented our contribution concerning three major topics. The first one was the proposal of a method to model real-life situations described in text, based on the model of the appraisal theory. Based on the proposed model, the second contribution was the design and population of EmotiNet, a knowledge base of action chains representing and storing affective reactions to real-life contexts and situations described in text. Finally, the third contribution lay in proposing and evaluating a method to detect emotion in text based on EmotiNet, using other examples in the ISEAR corpus. We conclude that our approach is appropriate for detecting emotion in text, although additional elements should be included in the model and extra knowledge is required. Moreover, we found that the process of automatic evaluation was influenced by the low performance of the NLP tools used. Thus, alternative tools must be tested in order to improve the output. We must also test our approach on corpora where more than one emotion is assigned per context (such as the one in SemEval 2007 Task 14). Future work aims at extending the model by adding affective properties to the concepts included, testing new methods to assign affective values to the concepts and adding new knowledge from sources such as ConceptNet and CYC.
Acknowledgements. This paper has been partially supported by the Spanish Ministry of Science and Innovation (grant no. TIN2009-13391-C04-01), the Spanish Ministry of Education under the Doctoral Fellowship Program (FPU, grant no. AP2007-03076) and the Valencian Ministry of Education (grant no. PROMETEO/2009/119 and ACOMP/2010/288).
References 1. Scherer, K.: Emotional expression is subject to social and technological change: Extrapolating to the future. Social Science Information 2(40) (1987) 2. Scherer, K.: Toward a dynamic theory of emotion. The component process of affective states. Cognition and Emotion 1(1) (2001) 3. Goleman, D.: Emotional Intelligence. Bantam Books (1995) 4. Strapparava, C., Mihalcea, R.: Semeval 2007 task 14: Affective text. In: Proceedings of SemEval 2007, satellite workshop to ACL 2007 (2007) 5. Pennebaker, J.W., Mehl, M.R., Niederhoffer, K.: Psychological aspects of natural language use: Our words, our selves. Annual Review of Psychology 54, 547–577 (2003)
6. Scherer, K., Wallbott, H.: The ISEAR Questionnaire and Codebook. Geneva Emotion Research Group (1997)
7. Scherer, K.: Appraisal Theory. Handbook of Cognition and Emotion. John Wiley & Sons, Chichester (1999)
8. Picard, R.: Affective computing. Technical report, MIT Media Laboratory (1995)
9. Dyer, M.: Emotions and their computations: three computer models. Cognition and Emotion 1, 323–347 (1987)
10. Subasic, P., Huettner, A.: Affect Analysis of Text Using Fuzzy Semantic Typing. IEEE Transactions on Fuzzy Systems 9, 483–496 (2000)
11. Strapparava, C., Valitutti, A.: WordNet-Affect: An Affective Extension of WordNet. In: Proceedings of LREC 2004 (2004)
12. Esuli, A., Sebastiani, F.: Determining the semantic orientation of terms through gloss analysis. In: Proceedings of CIKM 2005 (2005)
13. Riloff, E., Wiebe, J.: Learning extraction patterns for subjective expressions. In: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing (2003)
14. Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up? Sentiment classification using machine learning techniques. In: Proceedings of EMNLP 2002 (2002)
15. Mei Lee, S.Y., Chen, Y., Huang, C.-R.: Cause Event Representations of Happiness and Surprise. In: Proceedings of PACLIC 2009 (2009)
16. Liu, H., Lieberman, H., Selker, T.: A Model of Textual Affect Sensing Using Real-World Knowledge. In: Proceedings of IUI 2003 (2003)
17. Cambria, E., Hussain, A., Havasi, C., Eckl, C.: AffectiveSpace: Blending Common Sense and Affective Knowledge to Perform Emotive Reasoning. In: Proceedings of the 1st Workshop on Opinion Mining and Sentiment Analysis, WOMSA 2009 (2009)
18. Martin, J.R., White, P.R.R.: Language of Evaluation: Appraisal in English. Palgrave Macmillan, Oxford (2005)
19. Halliday, M.A.K.: An Introduction to Functional Grammar. Edward Arnold, London (1994)
20. Fellbaum, C.: WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
21. Suchanek, F., Kasneci, G., Weikum, G.: YAGO: A Core of Semantic Knowledge Unifying WordNet and Wikipedia. In: Proceedings of WWW 2007 (2007)
22. Pantel, P., Ravichandran, D.: Automatically Labeling Semantic Classes. In: Proceedings of HLT/NAACL 2004, Boston, MA, pp. 321–328 (2004)
23. Berland, M., Charniak, E.: Finding parts in very large corpora. In: Proceedings of ACL (1999)
24. Brin, S.: Extracting patterns and relations from the world wide web. In: Atzeni, P., Mendelzon, A.O., Mecca, G. (eds.) WebDB 1998. LNCS, vol. 1590, pp. 172–183. Springer, Heidelberg (1999)
25. Agichtein, E., Gravano, L.: Snowball: Extracting Relations from Large Plain-Text Collections. In: Proceedings of the 5th ACM International Conference on Digital Libraries (2000)
26. Studer, R., Benjamins, R.V., Fensel, D.: Knowledge engineering: Principles and methods. Data & Knowledge Engineering 25(1-2), 161–197 (1998)
27. Moreda, P., Navarro, B., Palomar, M.: Corpus-based semantic role approach in information retrieval. Data & Knowledge Engineering (DKE) 61(3), 467–483 (2007)
28. Plutchik, R.: The Nature of Emotions. American Scientist 89, 344 (2001)
29. Parrott, W.: Emotions in Social Psychology. Psychology Press, Philadelphia (2001)
30. Grüninger, M., Fox, M.S.: Formal ontology in information systems. In: Proceedings of the IJCAI Workshop on Basic Ontological Issues in Knowledge Sharing, Montreal (1995)
31. Chklovski, T., Pantel, P.: VerbOcean: Mining the Web for Fine-Grained Semantic Verb Relations. In: Proceedings of EMNLP 2004 (2004)
3-layer Ontology Based Query Expansion for Searching Li Liu and Fangfang Li School of Computer Science and Technology, Beijing Institute of Technology, Beijing, P.R. China, 100081 [email protected], [email protected]
Abstract. In this paper we propose a schema for a 3-layer domain ontology. Entity relations, entity-attribute pairs and textual descriptions are distributed across the respective layers. Based on the 3-layer ontology, we propose 4 query expansion methods: top-down expansion, synonyms expansion, bottom-up expansion and backtracking expansion. A set of experiments1 was carried out to validate the expansion methods. The results show that the expansion methods are effective as a whole, and we analyze the shortcomings of our methods. Keywords: Ontology, Top-down Expansion, Synonyms Expansion, Bottom-up Expansion, Backtracking Expansion.
1 Introduction
This work proposes the use of a multilayer ontology scheme in order to deal with the issues mentioned below in a video description search engine that provides access to video content. The target is to introduce a multilayer-ontology-based scheme that provides a scalable description of the knowledge domains of video contents and structure. The advantage of the multilayer ontology lies in the fact that query expansion allows search engines to navigate easily from a keyword-based query to a more detailed description of the video content [1]. Traditional ontology-based search faces three distinct difficulties. First, the attributes of an entity are not easy to extend. Second, every keyword has to be searched for within all ontology nodes. Third, there is only one textual description per node, which is neither complete nor easy to extend [2-4].
2 The Schema of 3-layer Ontology
2.1 The Main Schema
Our approach introduces a three-layer semantic description of the domain ontology: the 1st Entity Relation Ontology layer, which describes the basic relations of the domain
1 Our method focuses on Chinese ontology construction and query expansion. The experimental ontology and question corpus are both in Chinese. To conform to publication requirements, we translate the Chinese words into English in this paper.
entities; and a set of Domain Description Ontology layers that represent a more detailed description of each attribute of an entity, which is needed by the question analysis module. In order to search in the ontology, the relationships between the three layers provide the important information. Also, the model utilizes a lexical ontology that comprises a set of lexical and notational synonyms of the various ontology instances, supplying the search entry points.
2.2 Structure of the 3-layer Ontology
The 1st layer of the ontology is comprised of a single ontology describing the worldly domain knowledge, representing the relations between domain entities. This layer is primarily used as a "relationship" layer for the types of contents and the domains of knowledge that are integrated. All classified relationships, such as "is-a", and non-classified relationships are described in this layer. There are also a number of properties denoting the relationships between classes. The 2nd layer contains a set of ontologies for the detailed description of each attribute of the entities included in the 1st layer. The Textual Metadata Layer (3rd layer) is comprised of all the variant semantically annotated metadata schemes that are supported by the domain dictionary. These schemes may follow widely known schemes such as a synonym dictionary, WordNet or any other custom ontology scheme used by each of the search mechanisms. Fig. 1 presents a part of the ontology describing the knowledge domain of food. The main class is "stir-fried eggplant, potato, and peppers", which consists of three subclasses: "potato", "eggplant" and "pepper". Each of these subclasses is characterized by object attributes denoting relationships between classes (1st layer), a set of simple data attributes (2nd layer) and a set of textual metadata containing synonyms and other textual descriptions (3rd layer).
[Fig. 1 shows the three layers: 1st-layer "is-part-of" relations linking the identifiers of potato, eggplant, and pepper to the identifier of "stir-fried eggplant, potato, and peppers"; a 2nd-layer "producing area" attribute linking potato to the identifier of Shandong Province; and 3rd-layer synonym links such as "potato"/"murphy" and "Shandong Province".]

Fig. 1. A part of the ontology of food domain
2.3 Construction of the 3-Layer Ontology

Modeling of domain-specific terminology is accomplished using conventional thesauri methods. Equivalent terms, or synonyms, are represented in the textual ontology. Hierarchical relations, whether "is-a" or "part-of", are represented in the relation ontology. For each term, the query rule maintains a coefficient indicating its influence on the interpretation of an ontology relationship; this coefficient is derived from document density studies. The main purpose of constructing the 3-layer ontology is to obtain all kinds of elements on demand, so that they can be used for query expansion. The query expansion techniques presented in this paper are based on both the domain ontology and syntactic rules. Unlike term-based query expansion techniques, the proposed techniques expand a query by deriving its related entities and attributes, and they are specifically designed to resolve domain-specific queries. Various factors are taken into account to support the expansion of a question: the types of domain terms as encoded in the domain ontology, the types of expansion rules as encoded in the question query module, the textual descriptions of the domain elements, and their context of use. The proposed techniques support the intelligent, flexible treatment of a domain question when a fuzzy ontology relationship is involved. Experiments have been carried out to evaluate the performance of the proposed techniques on realistic sample ontologies.
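To make the layered structure concrete, the following Java sketch shows one possible in-memory shape for the three layers. It is an illustration only, under the assumption of a simple object model; the names OntologyNode, relate, and the demo wiring are hypothetical and not part of our system.

import java.util.*;

// A minimal, hypothetical sketch of a 3-layer ontology node (not the actual implementation).
class OntologyNode {
    String identifier;                                            // e.g. "identifier of potato"
    Map<String, List<OntologyNode>> relations = new HashMap<>();  // 1st layer: "is-a", "is-part-of", ...
    Map<String, List<String>> attributes = new HashMap<>();       // 2nd layer: attribute name -> values
    Set<String> synonyms = new HashSet<>();                       // 3rd layer: textual descriptions

    void relate(String relation, OntologyNode target) {
        relations.computeIfAbsent(relation, k -> new ArrayList<>()).add(target);
    }
}

class OntologyDemo {
    public static void main(String[] args) {
        OntologyNode dish = new OntologyNode();
        dish.identifier = "stir-fried eggplant, potato, and peppers";
        OntologyNode potato = new OntologyNode();
        potato.identifier = "potato";
        potato.relate("is-part-of", dish);                                      // 1st layer (cf. Fig. 1)
        potato.attributes.put("producing area", List.of("Shandong Province"));  // 2nd layer
        potato.synonyms.add("murphy");                                          // 3rd layer
    }
}

Keeping the three layers in separate structures is what lets the expansion methods of Sect. 3 walk relations, attributes, and synonyms independently.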
3 Query Expansion

3.1 OKERS

This section describes how query expansion is performed using the domain ontologies. The proposed method mainly handles food-domain queries containing domain terminology, and it focuses on questions with at least two domain keywords. For every domain keyword there is a textual description in the 3rd layer of the ontology. If we find two or more keywords in a question, an expansion rule can be established according to the relation between these keywords. For example, from the question "How to cook potato?" we extract "potato" and "cook" as domain keywords, and an expansion rule becomes available: acquire all kinds of cooking methods for potato. If the expansion set is over-large, we keep statistics recording how often users choose each expansion result; in practice this is the measure used to choose the most suitable expansion. We use the Ontology Keywords to Expansion Rule System (OKERS) to denote the query expansion rules generated from the keyword sets. We first describe how the initial expansion rule is generated, and then how the expansion rule changes when the initial search result is not adequate. The type of expansion operation is determined by the relationship within the keyword set of the question [5]. For example, if there is an entity-attribute relation in the keyword set, a top-down operation is performed. The expansion method generated from the relation of the keyword pair (KSet) is given by the following function:
ExpOp = f(KSet)

The function f maps relationships in keyword sets to expansion rules; different KSets correspond to different expansion rules. We propose three kinds of expansion method: top-down expansion, synonyms expansion, and bottom-up expansion. The following sections present each in detail.

3.2 Top-Down Expansion

We take the basic keyword set as an example. In the domain ontology, the basic relationship is a triple: {entity, attribute name, attribute value}. Define an instance of the triple as follows. Entity: E. The attribute set of E: {A_1, A_2, ..., A_n}. The attribute value set of each attribute: {{V_1^1, V_2^1, ..., V_i^1}, ..., {V_1^n, V_2^n, ..., V_j^n}}. Here n is the number of attributes, and i, j are the numbers of values of each attribute (i does not always equal j).
Table 1 shows some conditions of top-down expansion.

Table 1. Top-down expansion

Keyword1   Keyword2   Keyword3   Expansion rule
E          A_1        Blank      Get the set {{E, A_1, V_1^1}, ..., {E, A_1, V_i^1}}
E          A_2        V_2^2      No expansion
E          Blank      V_2^n      Get the set {E, A_n, V_2^n}
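As an illustration of how such a rule table can be dispatched in code, the sketch below maps the filled slots of the triple to an expansion operation. It assumes only the three rows of Table 1; the names ExpansionOp and f are hypothetical.

// Hypothetical sketch of ExpOp = f(KSet) restricted to the Table 1 cases.
enum ExpansionOp { TOP_DOWN, NO_EXPANSION, COMPLETE_TRIPLE }

class ExpansionRules {
    // A slot is null when the corresponding keyword is "Blank" in the query.
    static ExpansionOp f(String entity, String attributeName, String attributeValue) {
        if (entity != null && attributeName != null && attributeValue == null)
            return ExpansionOp.TOP_DOWN;        // row 1: enumerate all values of the attribute
        if (entity != null && attributeName != null && attributeValue != null)
            return ExpansionOp.NO_EXPANSION;    // row 2: the triple is already complete
        if (entity != null && attributeName == null && attributeValue != null)
            return ExpansionOp.COMPLETE_TRIPLE; // row 3: recover the attribute name from the value
        return ExpansionOp.NO_EXPANSION;        // other keyword sets are outside Table 1
    }
}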
According to Table 1, the main idea of the top-down expansion method is to complete the relationship triple. It fits the ontology structure, but increasing the number of keywords leads to lower recall, as our experiment will show.

3.3 Synonyms Expansion

Synonyms expansion [6] is a simple method aimed at failed searches. In the 3-layer ontology, every ontology element has its synonyms. The basic synonyms expansion method is as follows. Let Q = {w_1, w_2, ..., w_n} be the initial query. Let S(w_k) = {S_j^k | S_j^k ∈ Synset_3rdOnto(w_k), w_k ∈ Q} be the synonym set (synset) of w_k from the 3rd layer of the ontology, k = 1, ..., n. Let further C_x = (S_{x_1}^1, S_{x_2}^2, ..., S_{x_n}^n) be a possible configuration of senses for Q (x_k is a sense index between 1 and the number of possible senses for w_k). For each configuration C_x, do the following:

1. Perform the search action;
2. Assign a score to the configuration.

Finally, select C_best = argmax_x(Score(C_x)).
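A small recursive sketch of this procedure is given below, assuming the synsets have already been fetched from the 3rd layer; the class name and the placeholder score function are hypothetical, since the real score comes from running the search action.

import java.util.*;

// Hypothetical sketch of synonyms expansion: enumerate every configuration C_x
// of senses and keep the best-scoring one, C_best = argmax Score(C_x).
class SynonymsExpansion {
    List<String> best = null;
    double bestScore = Double.NEGATIVE_INFINITY;

    void search(List<List<String>> synsets, List<String> chosen) {
        if (chosen.size() == synsets.size()) {          // one full configuration chosen
            double s = score(chosen);                   // run the search action and score it
            if (s > bestScore) { bestScore = s; best = new ArrayList<>(chosen); }
            return;
        }
        for (String sense : synsets.get(chosen.size())) {
            chosen.add(sense);
            search(synsets, chosen);
            chosen.remove(chosen.size() - 1);
        }
    }

    double score(List<String> configuration) {
        return 0.0;  // placeholder: a real system would score the retrieved results
    }
}

Note that the number of configurations grows multiplicatively with the synset sizes, which is consistent with the large time cost of synonyms expansion reported in Sect. 4.3.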
3.4 Bottom-Up Expansion

The third expansion method is bottom-up expansion. If a search action returns a disappointing result, a possible reason is that the search range is too limited; under this condition it is worthwhile to extend the search range. The 3-layer ontology has an increasing number of elements from entity to attribute value (top-down), and bottom-up expansion essentially extends the search along the relationships between entities. For example, for the question "What's the taste of Fuji?" we get no result; if we expand this question into "What's the taste of apple?", we get some results that cover the original question. The bottom-up expansion method is roughly described in Fig. 2:
Fig. 2. An example of bottom-up expansion
Bottom-up expansion is used under the following condition (a code sketch of the test appears after the list): the entity in the question has a parent node in the 1st ontology layer; moreover,

1. The entity has a low frequency; in other words, it is rarely used in common questions;
2. The parent node has a higher frequency than the child entity;
3. The attribute mentioned in the query is also contained in the parent node.
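A minimal sketch of this test, with an assumed frequency threshold (the constant below is hypothetical), might look as follows.

// Hypothetical sketch of the applicability test for bottom-up expansion.
class BottomUpExpansion {
    static final int COMMON_USE_THRESHOLD = 5;  // assumed cut-off for "low frequency"

    static boolean applicable(boolean entityHasParent, int entityFreq, int parentFreq,
                              boolean parentHasQueriedAttribute) {
        return entityHasParent                     // the entity has a parent in the 1st layer
            && entityFreq < COMMON_USE_THRESHOLD   // condition 1: the entity is rarely queried
            && parentFreq > entityFreq             // condition 2: the parent is used more often
            && parentHasQueriedAttribute;          // condition 3: the parent carries the attribute
    }
}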
3.5 Backtracking Expansion

When the initial search results fail to satisfy the user, our method regenerates a different query expansion to cover regions beyond those of the original expansion; this
is called backtracking expansion. We use backtracking expansion when the following conditions hold (a sketch of this trigger follows the list):

1. The original query has not obtained a satisfactory search result;
2. The expanded query has not obtained a satisfactory search result;
3. The expansion set is not overlarge; if an expansion set has a large number of elements, backtracking expansion will lead to a large time cost.
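The trigger can be sketched as below; the hit counts and the size threshold are hypothetical stand-ins for whatever satisfaction measure the search component reports.

// Hypothetical sketch of when backtracking expansion is triggered.
class BacktrackingExpansion {
    static final int MAX_EXPANSION_SET = 100;  // assumed bound on "not overlarge"

    static boolean shouldBacktrack(int originalHits, int expandedHits, int expansionSetSize) {
        return originalHits == 0                      // condition 1: original query unsatisfactory
            && expandedHits == 0                      // condition 2: expanded query unsatisfactory
            && expansionSetSize < MAX_EXPANSION_SET;  // condition 3: expansion set not overlarge
    }
}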
4 Experiment

4.1 Experiment Environment

We have proposed four kinds of expansion method. To verify these methods we carried out several experiments; this section demonstrates how the query expansion techniques improve search results. The 3-layer ontology was constructed manually by several domain experts. In this experiment it contains 316 video descriptions from the food domain, covering entity names, attribute names, and attribute values. All the elements are associated in the 2nd ontology layer and have no text description there; each is represented by a code, with the detailed text descriptions held in the 3rd layer. The query expansion techniques are implemented in Java. Once a query expansion is generated, it is fed to the video search component to retrieve the relevant video descriptions. All experiments were carried out on a DELL Precision T5500 workstation with a 16-core 2.40 GHz processor and 12 GB of memory, running Microsoft Windows Server 2008 R2. The ontology adopts a database architecture, and the query expansion service talks to the other components of the system through Apache SOAP.

4.2 Experiment Process

The effectiveness of query expansion is assessed by comparing the search results with the query expansion switch on and off. When the switch is on, the search system performs both the original and the expanded query; when it is off, the search system performs only the original textual search. We carried out the experiment using a set of queries in the food domain (shown in Table 2). The questions cover all four conditions our expansion methods address; additionally, query 4 is complex in that it has no search result even when using the first three kinds of expansion method. The results produced by running these queries were analyzed manually.

Table 2. A set of queries in the food domain
Query 1: potato, cook
Query 2: murphy, cook
Query 3: Fuji, taste
Query 4: orange, make into, meat stuffing
The four example queries are classified as in Table 2. In our experiment, queries 1-3 are for top-down expansion, synonyms expansion, and bottom-up expansion respectively; query 4 is for backtracking expansion. Table 3 shows the experimental results; for each query, the Off and On columns display the number of relevant results retrieved.
Table 3. Experiment result (number of relevant results retrieved)

Expansion       Query 1       Query 2       Query 3       Query 4
                Off    On     Off    On     Off    On     Off    On
top-down        12     10     9      8      2      0      -      -
synonyms        12     35     9      35     2      7      -      -
bottom-up       12     12     9      3      2      13     -      -
backtracking    -      -      -      -      -      -      0      15
4.3 Result Analysis

The experimental results show that synonyms expansion always obtains a satisfactory result: every query using synonyms expansion at least doubles its number of results, although at a large time cost [7]. The other expansion methods are restricted by their expansion rules. The first query was designed for top-down expansion but did not obtain a satisfactory result; top-down expansion complements a query, and in a search action more keywords mean fewer results. In practice, top-down expansion serves high precision rather than high recall. Query 3 obtains a high recall using bottom-up expansion, which matches the conditions we assumed for that expansion. Query 4 obtains a dramatic improvement, which shows that the combination of our expansion methods is effective. The backtracking process should be well planned, however, otherwise it will lead to a large time cost.
5 Conclusion

In this paper we proposed four methods for query expansion. Various factors are taken into consideration to support the intelligent expansion of each kind of query, and the backtracking method also supports iterative query expansion when initial searches are not satisfactory. Our experiments show that appropriate expansion rules lead to satisfactory results. In practical use we choose the proper expansion method according to the features of the query. The combination of expansion methods during backtracking is our future research.
References

1. Dumontier, M., Villanueva-Rosales, N.: Three-layer OWL ontology design. Citeseer (2007)
2. Bhogal, J., Macfarlane, A., Smith, P.: A review of ontology based query expansion. Inform. Process. Manag. 43, 866-886 (2007)
3. Tuominen, J., Kauppinen, T., Viljanen, K., Hyvönen, E.: Ontology-based query expansion widget for information retrieval. Citeseer (2009)
4. Buscaldi, D., Rosso, P., Sanchis, E.: A WordNet-based indexing technique for geographical information retrieval. In: Evaluation of Multilingual and Multi-modal Information Retrieval, pp. 954-957 (2010)
5. Fu, G., Jones, C.B., Abdelmoty, A.I.: Ontology-based spatial query expansion in information retrieval. In: On the Move to Meaningful Internet Systems 2005: CoopIS, DOA, and ODBASE, pp. 1466-1482 (2005)
6. Navigli, R., Velardi, P.: An analysis of ontology-based query expansion strategies. Citeseer, pp. 42-49 (2003)
7. Müller, H.M., Kenny, E.E., Sternberg, P.W.: Textpresso: an ontology-based information retrieval and extraction system for biological literature. PLoS Biol. 2, e309 (2004)
Chinese Speech Recognition Based on a Hybrid SVM and HMM Architecture

Xingxian Luo

Computer Center, China West Normal University, Nanchong 637009, P.R. China
[email protected]
Abstract. The Hidden Markov Model (HMM), widely used in acoustic modeling, has powerful dynamic time-series modeling capability, while the Support Vector Machine (SVM) retains strong classification ability even when training samples are limited. This paper proposes an improved speech recognition algorithm based on a hybrid SVM/HMM architecture. We use the algorithm to extract speech features and feed them to the Speech Recognition (SR) interface of the Microsoft Speech SDK (SAPI), so as to improve the data supplied to the interface. The experimental results show that the recognition rate increases greatly.

Keywords: Support Vector Machine; Hidden Markov Model; Chinese Speech Recognition; Recognition Rate.
1 Introduction

With the development of speech recognition technologies, the demand for human-computer interaction is growing. People wish to operate computers using natural language instead of a keyboard or mouse; therefore, translating natural language into text has become a primary research problem. Translating speech into text is an important research direction in the speech recognition field, and its key question is how to recognize whole utterances from a speaker's continuous, unrestricted speech. Nowadays, continuous speech recognition commonly adopts continuous-density HMMs or semi-continuous HMMs. The HMM, as the currently most successful speech recognition method, has strong time-series modeling capability and is widely used in acoustic modeling, but the traditional Maximum Likelihood (ML) based HMM training method has poor discriminative ability because it only considers one class of data. The SVM, well known for its powerful classification performance, is essentially a discriminant model: it finds an optimal separating hyper-plane among classes in the feature space to construct the classifier, and it has good generalization and classification ability even when the training samples are limited. This paper combines the two models for Chinese speech classification and recognition [1].
2 Speech Recognition Algorithm

SVM has good classification performance, and HMM, widely used in continuous speech recognition, is a proven classification method. In this section we analyze the two methods in turn.

2.1 Support Vector Machine

SVM first maps the input vectors into a much higher dimensional space. It then builds a hyper-plane with the largest margin in that space and establishes two parallel hyper-planes on either side of the separating hyper-plane. Finally, it seeks a separating hyper-plane oriented so that the distance between the two parallel hyper-planes is maximized. Given a set of samples {(x_1, y_1), (x_2, y_2), ..., (x_k, y_k)}, where x_i ∈ R^n and y_i ∈ {-1, +1}, the decision function is:

f(ω, x) = sgn(ω · x + b),   y_i(ω · x_i + b) ≥ 1,  1 ≤ i ≤ k    (1)
In this paper we actually face a multi-class problem: the sequence must be classified into several classes according to the actual need. To solve this problem we adopt a non-linear classifier. SVM can map the input vector into a much higher dimensional feature space by choosing a proper non-linear transformation φ(x), which helps to build a linear hyper-plane in the target higher dimensional space. In general this non-linear transformation may be very complex and hard to implement, but both the optimized objective function and the classification function only use dot-product operations of the form φ(x_i) · φ(x_j) [2]. The optimized objective function, Eq. (2), and the classification function, Eq. (3), are as follows:
∑_i α_i − (1/2) ∑_{i,j} α_i α_j y_i y_j (x_i · x_j)    (2)

class(x) = sgn(ω · x + b) = sgn( ∑_{i=1}^{m} α_i y_i (x_i · x) + b )    (3)
If there exists a kernel function K satisfying the condition K(x_i, x_j) = φ(x_i) · φ(x_j), it is not necessary to know the exact form of φ(x). According to functional theory, as long as the kernel function K(x, y) satisfies Mercer's condition, it corresponds to a dot product in some transformed space; that is, there exists a mapping φ(x) making the following classification function hold [3]:

class(x) = sgn( ∑_{i=1}^{m} α_i y_i K(x_i, x) + b )    (4)
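As an illustration of Eq. (4), the following Java sketch evaluates the kernel decision function with an RBF kernel; it is a generic textbook form, not this paper's implementation, and the gamma value is an arbitrary assumption.

// Hypothetical sketch of Eq. (4): class(x) = sgn( sum_i alpha_i * y_i * K(x_i, x) + b ).
class KernelSvm {
    double[][] supportVectors;  // x_i
    double[] alpha;             // Lagrange multipliers alpha_i
    int[] y;                    // labels in {-1, +1}
    double b;

    static double rbf(double[] u, double[] v, double gamma) {  // one kernel satisfying Mercer's condition
        double d = 0.0;
        for (int k = 0; k < u.length; k++) d += (u[k] - v[k]) * (u[k] - v[k]);
        return Math.exp(-gamma * d);
    }

    int classify(double[] x) {
        double s = b;
        for (int i = 0; i < alpha.length; i++)
            s += alpha[i] * y[i] * rbf(supportVectors[i], x, 0.5);  // gamma = 0.5 is an assumption
        return s >= 0 ? +1 : -1;
    }
}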
2.2 Hidden Markov Model
HMM is a model commonly used for time-series recognition; it can model a given sample set through training. The HMM output probability is used to
describe how well the samples fit the model in the recognition application. An HMM can be expressed as λ = (π, A, B), defined over finite states {S_1, S_2, ..., S_n}. The initial probability is π_i = P(q_1 = S_i), the transition probability from state S_i to state S_j is a_ij = P(q_{t+1} = S_j | q_t = S_i), and each state is described by the observation probability b_i(O_t) = P(O_t | q_t = S_i). Given a state sequence {q_1, ..., q_T} and an observation sequence O = {O_1, O_2, ..., O_T}, we can calculate the state posterior for each observation value O_t and obtain the maximum-probability state according to Bayes' theory. A very important algorithm for HMMs is the forward-backward algorithm. At time t, the probability of the first t observations O_1, O_2, ..., O_t ending in state S_i can be expressed as α_t(i) = P(O_1, ..., O_t, q_t = S_i | λ); this is the forward algorithm. The probability of the observations {O_{t+1}, O_{t+2}, ..., O_T} from time t+1 onward, given state S_i at time t, can be expressed as β_t(i) = P(O_{t+1}, ..., O_T | q_t = S_i, λ); this is the backward algorithm. We can then calculate the probability of being in state S_i at time t given the observation sequence O [4]:
γ_t(i) = α_t(i) β_t(i) / P(O | λ)    (5)
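For illustration, the standard forward and backward recursions behind Eq. (5) can be sketched as follows; this is the textbook algorithm, not this paper's code, and it omits the scaling usually needed for long sequences.

// Hypothetical sketch of the forward-backward computation of gamma_t(i) in Eq. (5).
class ForwardBackward {
    // pi[i]: initial probs; a[i][j]: transition probs; bProb[j][o]: emission probs; obs: observation indices.
    static double[][] forward(double[] pi, double[][] a, double[][] bProb, int[] obs) {
        int n = pi.length, T = obs.length;
        double[][] alpha = new double[T][n];
        for (int i = 0; i < n; i++) alpha[0][i] = pi[i] * bProb[i][obs[0]];
        for (int t = 1; t < T; t++)
            for (int j = 0; j < n; j++) {
                double s = 0.0;
                for (int i = 0; i < n; i++) s += alpha[t - 1][i] * a[i][j];
                alpha[t][j] = s * bProb[j][obs[t]];
            }
        return alpha;
    }

    static double[][] backward(double[][] a, double[][] bProb, int[] obs) {
        int n = a.length, T = obs.length;
        double[][] beta = new double[T][n];
        for (int i = 0; i < n; i++) beta[T - 1][i] = 1.0;
        for (int t = T - 2; t >= 0; t--)
            for (int i = 0; i < n; i++) {
                double s = 0.0;
                for (int j = 0; j < n; j++) s += a[i][j] * bProb[j][obs[t + 1]] * beta[t + 1][j];
                beta[t][i] = s;
            }
        return beta;
    }

    static double[][] gamma(double[][] alpha, double[][] beta) {
        int T = alpha.length, n = alpha[0].length;
        double pO = 0.0;
        for (int i = 0; i < n; i++) pO += alpha[T - 1][i];  // P(O | lambda) from the forward pass
        double[][] g = new double[T][n];
        for (int t = 0; t < T; t++)
            for (int i = 0; i < n; i++) g[t][i] = alpha[t][i] * beta[t][i] / pO;  // Eq. (5)
        return g;
    }
}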
3 Improved Algorithm for Chinese Speech Recognition Based on SVM/HMM

The Chinese speech recognition process can be divided into three parts (as shown in Fig. 1). First, collect the speech through a microphone and extract the features using the HMM algorithm. Second, process the extracted speech features with SVM. Finally, pass the processed data to the SR interface of SAPI to translate the speech into text.
[Pipeline: Speech Data → Feature Extraction (HMM) → SVM → SAPI SR → Text]

Fig. 1. Chinese speech recognition process
3.1 Improved Algorithm Combining HMM and SVM
This section describes in detail the algorithm combining SVM and HMM. In this algorithm, we use HMMs to extract the speech features, which lie in the same dimension and the same vector space; SVM helps reduce the dimension to raise the classification speed and accuracy. Suppose there are k classes, expressed as l = {l_1, ..., l_k}, and each class l_i corresponds to an HMM set λ^i = {λ_1^i, ..., λ_{N_i}^i}. Calculate the number of HMMs, N_i, of each HMM set; then the total HMM number is N = ∑_{i=1}^{k} N_i. For an HMM set λ^i, calculate the probabilities P(O | λ_j^i); the maximum probability of a class l_i in this set is P_max^{l_i} = max_{1≤j≤N_i} P(O | λ_j^i). We adopt the HMM method introduced
before to extract the feature vector g = {w_1, w_2, ..., w_m}, combine l_i and g to obtain a new feature vector g' = {l_i, g}, and normalize the new vector as g' = g' / ||g'||_2, which can then be used for SVM preprocessing [5]. We now know how to extract features from the existing HMMs, but how to use an HMM to extract the parameters from the existing sequences O = {O^1, O^2, ..., O^K} still remains to be described. In this paper we use the Baum-Welch method to calculate the parameters ā_ij and b̄_j(m):

ā_ij = ( ∑_{k=1}^{K} ∑_{t=1}^{T_k−1} ε_t^k(i, j) ) / ( ∑_{k=1}^{K} ∑_{t=1}^{T_k−1} γ_t^k(i) )    (6)

b̄_j(m) = ( ∑_{k=1}^{K} ∑_{t=1, o_t^k = v_m}^{T_k} γ_t^k(j) ) / ( ∑_{k=1}^{K} ∑_{t=1}^{T_k} γ_t^k(j) )    (7)

Here, ε_t^k(i, j) represents the joint probability (of being in state i at time t and state j at time t+1 in the k-th sequence), and γ_t^k(i) represents the corresponding state probability.
,
f_ij(g') = ∑_{n=1}^{Λ} y_n^{l_i l_j} α_n^{l_i l_j} φ(g'_n^{l_i l_j}, g') + b^{l_i l_j}    (8)
Here, Λ is the number of data items of classes i and j in the training set. Given a sample, if the decision function predicts that it belongs to class i, class i gets a vote; otherwise class j gets the vote. The sample is assigned to the class with the most votes. Fig. 2 shows the classification results using SVM alone and using SVM/HMM. As can be seen from Fig. 2, the algorithm combining SVM and HMM achieves a more accurate classification result and a higher classification rate than the one using SVM only.
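The one-vs-one vote can be sketched as below; the Decision interface stands in for the trained f_ij of Eq. (8) and is hypothetical.

// Hypothetical sketch of one-vs-one voting over the pairwise decision functions f_ij.
class PairwiseVoting {
    interface Decision { double apply(double[] g); }  // a trained f_ij from Eq. (8)

    static int predict(Decision[][] f, int k, double[] g) {
        int[] votes = new int[k];
        for (int i = 0; i < k; i++)
            for (int j = i + 1; j < k; j++)
                if (f[i][j].apply(g) >= 0) votes[i]++;  // '+' side: vote for class i
                else votes[j]++;                        // '-' side: vote for class j
        int best = 0;
        for (int c = 1; c < k; c++) if (votes[c] > votes[best]) best = c;
        return best;                                    // class with the most votes
    }
}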
3.2 Improvement of the SR Interface

Microsoft provides developers of speech applications on the Windows platform with a speech engine software development kit, which mainly includes the Speech Application Programming Interface (SAPI) [6] compatible with Win32, the Microsoft Continuous Speech Recognition engine (CSR), the Microsoft Text To Speech engine (TTS), tools used to compile and debug speech applications, samples, and help documents. Fig. 3 shows the structure of this software kit. SAPI sits between the speech application and the speech engine; the Microsoft Speech SDK provides the interface in the form of COM interface calls.
Fig. 2. (a) The classification result only using SVM and (b) the classification result using SVM and HMM
[Structure: applications (Speech Recognition, TTS) ↔ API ↔ SAPI Runtime ↔ DDI ↔ engines (Recognition, TTS)]

Fig. 3. The structure chart of Microsoft Speech SDK
The application interfaces provided by SAPI can mainly be divided into two classes: one is used for Speech Recognition, the other for Text To Speech. We package the improved algorithm as a Dynamic Link Library and apply it to the SR interface to improve it and solve the problem of a low recognition rate.

3.3 Experimental Results Comparison
Experimental results show that the recognition rate increased greatly after the SR interface was improved by the SVM/HMM-based algorithm. Fig. 4 compares the recognition rates of one-word, two-word, three-word, and four-word phrases, and of phrases of more than four words, before and after the improvement. As can be seen from Fig. 4, recognition is more accurate using the improved algorithm. In Fig. 4, the numbers 1, 2, 3, 4, and 5 on the abscissa represent phrases of one, two, three, and four Chinese characters and phrases of more than four Chinese characters, respectively; the ordinate is the recognition rate corresponding to each word number.
[Plot: recognition rate (ordinate, 0.80-1.00) versus the word number of the phrase to be recognised (abscissa, 1-5), for the system before and after improvement]

Fig. 4. The recognition rate before and after improvement
The curves decline because the recognition rate decreases as the word number increases. In Fig. 4, the curve with rectangles is the recognition rate before improvement and the curve with triangles is the recognition rate after improvement. Before improvement, the downward trend of the recognition rate with increasing word number is very pronounced; after improvement, the downward trend is slow. The experimental results thus show that the recognition rate for longer phrases increases significantly.
4 Conclusions

This paper adopts an improved algorithm based on SVM and HMM to enhance the SR interface of the Microsoft Speech SDK, achieving the transformation from Chinese speech to text. Compared with the original SR interface, the classification accuracy and speed improve greatly. The algorithm can be applied to many other dictation devices, such as repeaters and Chinese-English dictation machines. However, problems remain with this model, such as the effects of dialect, emotion, and mood on the recognition rate; we will carry out further research and improvements in the future.

Acknowledgments. This paper was supported by the Foundation Project of China West Normal University under Grant 10A016.
References

[1] Lin, Q., Ou, J., Cai, J.: The Design and Implementation of a Speech Keywords Retrieving Application Based on Microsoft Speech SDK. J. Mind and Computation 1, 433-441 (2007)
[2] Wang, J., Yao, Y., Liu, Z.: A New Text Classification Method Based on HMM-SVM. In: 2007 International Symposium on Communications and Information Technologies, pp. 1516-1519 (2007)
[3] Doumpos, M., Zopounidis, C., Golfinopoulou, V.: Additive Support Vector Machines for Pattern Classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 37, 540-550 (2007)
[4] Scheffer, T., Decomain, C., Wrobel, S.: Active Hidden Markov Models for Information Extraction, pp. 309-318. Springer, Heidelberg (2001)
[5] Xue, W., Bao, H., Huang, W., et al.: Web Page Classification Based on SVM. In: The 6th World Congress on Intelligent Control and Automation, pp. 6111-6114 (2006)
[6] Microsoft Speech API 5.3, http://msdn.microsoft.com/en-us/library/ms723627(v=vs.85).aspx
Author Index
Abdikeev, Niyaz I-1 Ahmed, Sultan Uddin II-260 Ajina, Sonia III-132 Alejo, R. II-19 Alizadeh, Hosein II-144 Arifin, Imam II-525 Balahur, Alexandra III-611 Beigi, Akram II-144 Bo, Yingchun I-52 Bornschein, Joerg II-232 Breve, Fabricio III-426 Burchart-Korol, Dorota III-380 Cai, Kuijie I-93 Cambria, Erik III-601 Cao, Fengjin III-50 Cao, Jianting III-306 Cao, Jianzhi II-506 Cao, Jinde I-321 Chai, Wei III-122 Chang, Chao-Liang II-278 Chen, Bing II-552 Chen, Chun-Chih III-21 Chen, C.L. Philip II-535 Chen, Cuixian II-251 Chen, Hongwei II-356 Chen, Hui III-460 Chen, Jing II-583 Chen, Lingling III-58, III-68 Chen, Ping II-159 Chen, Qiang II-57 Chen, Qili III-122 Chen, Qin I-139 Chen, Wanzhong I-505, III-340 Chen, Xiaofeng I-260 Chen, Xiaoping I-297 Chen, Yen-Wei III-355 Chen, Yuehui III-363 Chen, Yuhuan I-385 Chen, Zhanheng III-280 Cheng, Yifeng III-397 Cheng, Zunshui I-125 Chien, Tzan-Feng III-548
Chiu, Chih-Chou III-228 Chu, Hongyu I-587 Chu, Zhongyi III-41 Chuan, Ming-Chuen III-21 Cui, Jianguo I-139 Cui, Jing III-41 Cui, Li II-225 Czaplicka-Kolarz, Krystyna III-380 Dai, Lizhen II-583 Dai, Xiaojuan I-60 Dang, Xuanju III-50 Dang, Zheng II-388, II-395 Decherchi, Sergio III-523 Deng, Shitao III-397 Ding, Gang I-484 Ding, Heng-fei I-158 Ding, Lixin III-264 Ding, Yi II-199 Ding, Yongsheng III-112 Dong, Beibei III-58, III-68 Dong, Wenyong II-1 Dong, Yongsheng II-9 Dou, Binglin II-591 Dou, Yiwen III-112 Doungpaisan, Pafan II-486 Du, Jing III-460 Du, Ying I-109 Eckl, Chris III-601 Er, Meng Joo II-350, II-525 Essoukri Ben Amara, Najoua III-132 Fan, LiPing II-130 Fang, Chonglun II-136 Fang, Guangzhan I-139 Fu, Chaojin I-348 Fu, Jian III-1 Fu, Siyao II-305, II-381 Fu, Xian II-199 Fu, XiangHua III-485 Fujita, Hiroshi II-121 Gao, Meng II-207 Gao, Xinbo II-395
Gao, Ya II-151 Gao, Yun I-565 Gasca, E. II-19 Gastaldo, Paolo III-523 Ghaffarian, Hossein III-576 Golak, Slawomir III-380 Gong, TianXue III-485 Grassi, Marco III-558 Guo, Chengan II-296 Guo, Chengjun III-416 Guo, Chongbin I-10, III-112 Guo, Daqing I-176 Guo, Dongsheng I-393 Guo, Jia I-437 Guo, Ping II-420 Guo, XiaoPing II-130 Guzmán, Enrique III-388 Han, Fang I-109 Han, Min III-313 Han, Peng II-563 Han, Zhiwei III-442 Hao, Kuangrong III-112 Haque, Mohammad A. II-447 He, Haibo III-1 He, Jingwu I-587 He, Wan-sheng I-158 He, Yunfeng I-572 Hermida, Jesús M. III-611 Hilas, Constantinos S. I-529 Hori, Yukio III-493 Hu, Peng I-194 Hu, Shigeng I-194 Hu, Wenfeng I-339 Hu, Xiaolin I-547 Hu, Yongjie III-238 Huang, Chien-Lin III-475 Huang, Fei III-467 Huang, He I-297 Huang, Qingbao I-455 Huang, Rongqing II-76 Huang, Ting III-539 Huang, Wei III-256, III-264 Huang, Yan III-11 Hui, Guotao I-329, III-274 Hussain, Amir III-601 Ichiki, Akihisa I-251, I-287 Imai, Yoshiro III-493
Imran, Nomica III-94 Ishikawa, Tetsuo I-76 Ji, Feng II-395 Ji, You III-323 Jia, Yunde I-514 Jiang, Haijun II-506, III-280 Jiang, Huiyan II-121 Jiang, Liwen III-313 Jiang, Minghui I-117 Jiang, Shanshan II-159 Jiang, Yuxiang II-215 Jiménez, Ofelia M.C. III-388 Jin, Feng I-595 Jin, Jian III-323 Jitsev, Jenia II-232 Kalti, Karim III-168 Kang, Bei II-601 Ke, Yunquan I-241, I-375 Ke, Zhende I-393 Keck, Christian II-232 Khan, Asad III-94 Khan, Md. Fazle Elahi II-260 Khanum, Aasia I-578 Khashman, Adnan III-530 Kim, Cheol-Hong II-447 Kim, H.J. I-602 Kim, Hyun-Ki I-464 Kim, Jeong-Tae III-256 Kim, Jong-Myon II-373, II-447 Kim, Wook-Dong I-464 Kiselev, Andrey I-1 Kuai, Xinkai II-305, II-381 Lee, Chung-Hong III-548 Lee, Tian-Shyug III-246 Lee, Yang Weon II-412 Lei, Jingye III-104 Lei, Miao I-437 Lei, Ting I-68 Leoncini, Alessio III-523 Li, Bing I-411 Li, Bo II-1 Li, Chuandong I-339 Li, Dahu I-348 Li, Fangfang III-621 Li, Hong-zhou II-364 Li, Huan III-297
Li, Juekun II-350 Li, Junfu III-348 Li, Lixue III-539 Li, Shusong II-591 Li, Wenfan I-429 Li, Xiang II-525 Li, Xiao III-594 Li, Yang I-505, III-331, III-340 Li, Yangmin II-85 Li, Yongwei I-474 Li, Zhan I-101, I-393 Li, Zhaobin II-611 Li, Zhongxu II-66 Lian, Chia-Mei III-246 Lian, Chuanqiang III-11 Lian, Zhichao II-350 Liao, Bilian I-420 Liao, Kuo-Wei III-475 Liao, Xiaofeng I-339 Lin, Chen-Lun III-355 Lin, Huan III-238 Lin, Lin I-484 Lin, Xiaofeng I-420, I-455, III-143 Lin, Xiaozhu I-60 Lin, Yuzhang III-188 Liu, Bo I-280 Liu, Derong II-620, II-630 Liu, Guohai III-58, III-68 Liu, Huaping II-207 Liu, Jian-ming II-364 Liu, Jiming III-426 Liu, Jin II-1 Liu, Jingfa III-209 Liu, Jiqian I-514 Liu, Kefu II-552 Liu, Kun-Hong III-513 Liu, Li III-621 Liu, LianDong III-485 Liu, Qing III-1 Liu, Qingzhong II-466 Liu, Shaoman II-542 Liu, Wenjie III-209 Liu, Xiangjie III-77 Liu, Xiangying II-121 Liu, Xiaobing II-313 Liu, Xiao-ling I-68 Liu, Xiaoping II-552
Liu, Yan II-76 Liu, Yan-Jun II-535 Liu, Yankui II-182 Liu, Yeling I-358 Liu, Zhenbing I-620 Liu, Zhigang I-429, III-442 Liu, Zhuofu III-348 Long, Guangzheng I-557 Lopes, Leo II-192 Lu, Chao III-188 Lu, Chi-Jie II-278, III-228, III-246 Lu, Jun-an I-166 Lu, Junxiang I-185 Lu, Qishao I-109 Lu, Wenlian I-280, I-305 Lu, Xiaofan III-442 Lu, Youlin III-219 Luo, Wei-lin II-574 Luo, Xingxian III-629 Luo, Zhongming III-348 Lv, Yupei I-203 Ma, Binrong III-460 Ma, Cenrui I-557 Ma, Jinwen II-9, II-47, II-136, II-165 Ma, Libo II-429 Ma, Tinghuai III-209 Ma, Wei-Min II-103 Ma, Xirong II-477 Mahjoub, Mohamed Ali III-168 Majewski, Maciej I-83 Malsburg, Christoph von der II-232 Marchi, Erik II-496 Mastorocostas, Paris A. I-529 Mazzocco, Thomas III-601 Meng, Xuejing I-194 Miao, Chunfang I-241, I-375 Minaei, Behrouz II-144 Minaei-Bidgoli, Behrouz III-576 Mitsukura, Yasue III-306 Mogi, Ken I-76 Montoyo, Andrés III-611 Mu, Chaoxu II-515 Nakama, Takehiko I-270 Nakayama, Takashi III-493 Nawaz, Asif I-578 Neruda, Roman I-538, III-31 Nguyen, Ngoc Thi Thu II-447 Ni, Zhen III-1
Nishida, Toyoaki I-1 Nwulu, Nnamdi I. III-530 Oh, Sung-Kwun I-464, III-256, III-264 Okumura, Keiji I-287
Pamplona, Daniela II-232 Parvin, Hamid II-144, III-576 Pathak, Manas III-450 Patternson, Eric II-288 Pedrycz, Witold III-426 Pei, Weidong II-477 Peng, Kai III-566 Peng, Kui I-420, I-455 Peng, Xiyuan I-437, I-445 Peng, Yu I-437, I-445 Peng, Yueping I-42 Peng, Zhi-yong II-364 Pérez, Alejandro D. III-388 Piazza, Francesco I-403, II-437, III-558 Pogrebnyak, Oleksiy III-388 Pratama, Mahardhika II-525 Principi, Emanuele II-437 Qian, Dianwei III-77 Qiao, Junfei I-52, I-495, III-122 Qiao, Mengyu II-466 Qin, Hui III-219 Quiles, Marcos III-426 Rasch, Malte J. II-429 Raza, Fahad II-388 Ren, Dongxiao II-457 Ricanek, Karl II-251, II-288 Richard, J.O. II-525 Rotili, Rudy II-437 Ruan, Xiaogang II-583 Salari, Ezzatollah III-200 San, Lin II-525 Sangiacomo, Fabio III-523 Sato, Yasuomi D. I-251, I-287, II-232 Savitha, R. I-602 Schuller, Björn II-496 Sethuram, Amrutha II-288 Shadike, Muhetaer III-594 Shahjahan, Md. II-260 Shang, Xiaojing III-331, III-340 Shao, Yuehjen E. II-278 Shen, Jifeng II-342
Shen, Jihong I-93 Shen, Ning III-467 Shi, Dongyu II-151 Shi, Haibo I-10, I-27 Shiino, Masatoshi I-287 Slušný, Stanislav III-31 Song, Chunning III-143 Song, Gaoshun I-572 Song, Jing I-139 Song, Q. II-563 Song, Qiankun I-203, I-213, I-231, I-260, I-411 Song, Sanming I-17 Song, Shaojian I-420, I-455, III-143 Song, Shun-cheng I-68 Song, Xiaoying II-331 Song, Xin III-407 Song, Y.D. II-563 Sotoca, J.M. II-19 Squartini, Stefano I-403, II-437, II-496 Stuart, Keith Douglas I-83 Sun, Changyin II-251, II-270, II-288, II-342, II-515 Sun, Fuchun II-207 Sun, Qiuye II-66 Sun, Shiliang I-595, II-76, II-151, III-323, III-434 Sun, Wanlu I-429 Sun, Zhongxi II-342 Sundararajan, N. I-602 Sung, Andrew H. II-466 Suresh, S. I-602 Tan, Yue II-542 Tanaka, Toshihisa III-306 Tang, Yezhong I-139 Tao, Lan III-485 Teng, Yunlong III-416 Thuy, Nguyen Thi Thanh II-94 Tian, Bo II-103 Tian, Maosheng I-194 Tian, Yantao I-505, II-356, III-331, III-340 Tiňo, Peter II-37 Tomita, Yohei III-306
Trelis, Ana Botella I-83 Truong, Tung Xuan II-373 Tsai, Yi-Jun III-228 Tu, Wenting I-595 Valdovinos, R.M. II-19 Vidnerová, Petra I-538 Vien, Ngo Anh II-94 Vinh, Nguyen Thi Ngoc II-94 Wan, Chuan II-356 Wang, Changming I-572 Wang, Cong III-160 Wang, Cuirong III-152, III-160, III-407 Wang, Dan II-542 Wang, Ding II-620, II-630 Wang, Ge II-388 Wang, Hongyan II-47 Wang, Huajin I-385 Wang, Huiwei I-231 Wang, Jia III-274 Wang, Jian II-611 Wang, Jianmin I-445 Wang, Jinkuan III-152 Wang, Juan III-407 Wang, Jun I-166, I-547 Wang, Kui III-586 Wang, Lu III-586 Wang, Mei-Hong III-513 Wang, M.R. II-563 Wang, Ning II-542 Wang, Qiang II-402 Wang, Rong-Tsu III-370 Wang, San-fu I-158 Wang, Shangfei III-238 Wang, ShanShan I-185 Wang, Shuwei I-52 Wang, Xiaoyan I-315, II-175 Wang, Xiaozhe II-192, III-450 Wang, Xin III-539 Wang, Xinzhu II-356 Wang, Yanmei I-315, II-175 Wang, Yingchun I-329, III-274 Wang, Yuekai II-331 Wang, Yun I-35 Wang, Zhanjun III-50 Wang, Zhanshan I-148 Wang, Zheng III-586 Wang, Zhengxin I-321 Wang, Zhijie I-10, I-27
Wang, Zhiliang I-329 Wasili, Buheliqiguli III-594 Wei, Hui I-35, II-215 Wei, Qinglai II-620 Wei, Yongtao III-152 Wen, Guo-Xing II-535 Weng, Juyang II-331 Wieczorek, Tadeusz III-380 Wöllmer, Martin II-496 Won, Chang-Hee II-601 Wong, Pak-Kin II-85 Wu, Chih-Hong III-548 Wu, Chunxue I-521 Wu, Feng III-442 Wu, Jui-Yu III-228, III-246 Wu, Jun II-611, III-11 Wu, Qing-Qiang III-513 Wu, Shaoxiong II-113 Wu, Si I-93, II-429 Wu, Sichao I-339 Wu, Wenfang III-460 Wu, Xiaofeng II-331 Wu, Xiaoqun I-166 Xia, Hong III-460 Xia, Shenglai I-587 Xiang, Ting I-117 Xiao, Hao III-209 Xiao, Lingfei III-84 Xiao, Min I-132 Xie, Jinli I-10, I-27 Xie, SongYun II-225, II-388, II-395 Xing, Yun II-296 Xiong, Ming II-270 Xu, Daoyun III-566 Xu, Jianchun III-297 Xu, Jingru III-363 Xu, Qingsong II-85 Xu, Xianyun I-565 Xu, Xiaohui I-368 Xu, Xin II-611, III-11 Xu, Yuli II-241 Xu, Zhijie III-434 Xu, Zhiyuan III-297 Yampolskiy, Roman V. III-132 Yang, Dong II-420 Yang, Gang I-495, II-583 Yang, Guosheng II-305, II-381 Yang, Kai II-182
Yang, Lei II-103 Yang, Miao III-460 Yang, Ping I-139 Yang, Qingshan II-296 Yang, Wankou II-251, II-270, II-288, II-342 Yang, Yan II-165 Yang, Yongqing I-565 Yang, Yun I-557 Yao, Gang III-539 Yao, Hongxun I-17 Ye, Mao II-457 Ye, Xiaoming I-60 Ye, Yibin I-403 Yi, Chenfu I-385 Yu, Wenwu III-178 Yu, Xinghuo II-515 Yuan, Mingzhe I-495 Yuan, Tao I-474 Yuan, Ying III-160 Yun, Kuo I-148 Zang, Tianlei III-467 Zhai, L.-Y. II-525 Zhang, Bo II-313, II-323 Zhang, Boya III-77 Zhang, Chengyi I-185 Zhang, Enlin I-148 Zhang, Fan II-241 Zhang, Huaguang I-148, I-329 Zhang, Huaxiang III-505 Zhang, Huifeng III-219 Zhang, Jiye I-368 Zhang, Liqing II-402 Zhang, Nian I-610, II-27 Zhang, Rui III-219 Zhang, Shiyong II-591 Zhang, Shuangteng III-200
Zhang, Weihua I-368 Zhang, Wengqiang II-331 Zhang, Xin II-182 Zhang, Xinhong II-241 Zhang, Yanzhu I-315, II-175 Zhang, Ying I-474 Zhang, Yunong I-101, I-393 Zhang, Yu-xin I-158 Zhang, Zhonghua I-358 Zhao, Leina I-222 Zhao, Liang III-426 Zhao, Min-quan II-159 Zhao, Wenxiang III-58, III-68 Zhao, Zihao III-586 Zhao, Ziping II-477 Zheng, Yu III-209 Zhong, Jia I-474 Zhong, Jiang III-397 Zhou, Bo I-203 Zhou, Changsong I-166 Zhou, Jianguo II-66 Zhou, Jianzhong III-219 Zhou, Lidan III-539 Zhou, Lingbo II-121 Zhou, Liqiong I-305 Zhou, Naibiao II-552 Zhou, Xiangrong II-121 Zhou, Xiaohua III-143 Zhu, Wenjun II-402 Zhu, Yuanxiang II-457 Zhu, Yue III-84 Zou, Dayun III-467 Zou, Yi Ming III-290 Zou, Zao-jian II-574 Zunino, Rodolfo III-523 Zuo, Lei II-611 Zuo, Yuanyuan II-323