Advances in Swarm Intelligence: First International Conference, ICSI 2010, Beijing, China, June 12-15, 2010, Proceedings, Part II (Lecture Notes in ... Computer Science and General Issues)
Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Germany
Madhu Sudan, Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany
6146
Ying Tan Yuhui Shi Kay Chen Tan (Eds.)
Advances in Swarm Intelligence First International Conference, ICSI 2010 Beijing, China, June 12-15, 2010 Proceedings, Part II
Volume Editors

Ying Tan
Peking University, Key Laboratory of Machine Perception (MOE)
Department of Machine Intelligence
Beijing 100871, China
E-mail: [email protected]

Yuhui Shi
Xi’an Jiaotong-Liverpool University, Research and Postgraduate Office
Suzhou 215123, China
E-mail: [email protected]

Kay Chen Tan
National University of Singapore
Department of Electrical and Computer Engineering
4 Engineering Drive 3, 117576 Singapore
E-mail: [email protected]
Library of Congress Control Number: 2010927598
CR Subject Classification (1998): F.1, H.3, I.2, H.4, H.2.8, I.4
LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues
ISSN 0302-9743
ISBN-10 3-642-13497-1 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-13497-5 Springer Berlin Heidelberg New York
This book and its companion volume, LNCS vols. 6145 and 6146, constitute the proceedings of the International Conference on Swarm Intelligence (ICSI 2010) held in Beijing, the capital of China, during June 12-15, 2010. ICSI 2010 was the first gathering in the world for researchers working on all aspects of swarm intelligence, and provided an academic forum for the participants to disseminate their new research findings and discuss emerging areas of research. It also created a stimulating environment for the participants to interact and exchange information on future challenges and opportunities of swarm intelligence research.

ICSI 2010 received 394 submissions from about 1241 authors in 22 countries and regions (Australia, Belgium, Brazil, Canada, China, Cyprus, Hong Kong, Hungary, India, Islamic Republic of Iran, Japan, Jordan, Republic of Korea, Malaysia, Mexico, Norway, Pakistan, South Africa, Chinese Taiwan, UK, USA, Vietnam) across six continents (Asia, Europe, North America, South America, Africa, and Oceania). Each submission was reviewed by at least three reviewers. Based on rigorous reviews by the Program Committee members and reviewers, 185 high-quality papers were selected for publication in the proceedings, with an acceptance rate of 46.9%. The papers are organized in 25 cohesive sections covering all major topics of swarm intelligence research and development.

In addition to the contributed papers, the ICSI 2010 technical program included four plenary speeches by Russell C. Eberhart (Indiana University Purdue University Indianapolis, IUPUI, USA), Gary G. Yen (President of the IEEE Computational Intelligence Society, CIS, Oklahoma State University, USA), Erol Gelenbe (Imperial College London, UK), and Nikola Kasabov (President of the International Neural Network Society, INNS, Auckland University of Technology, New Zealand). Besides the regular parallel oral sessions, ICSI 2010 also had several poster sessions covering a wide range of areas.
As organizers of ICSI 2010, we would like to express sincere thanks to Peking University and Xi’an Jiaotong-Liverpool University for their sponsorship, and to the IEEE Beijing Section, International Neural Network Society, World Federation on Soft Computing, Chinese Association for Artificial Intelligence, and National Natural Science Foundation of China for their technical co-sponsorship. We appreciate the National Natural Science Foundation of China and the K.C. Wong Education Foundation, Hong Kong, for their financial and logistic support. We would also like to thank the members of the Advisory Committee for their guidance, the members of the International Program Committee and additional reviewers for reviewing the papers, and the members of the Publications Committee for checking the accepted papers in a short period of time. Particularly, we are grateful to the proceedings publisher, Springer, for publishing the proceedings in the prestigious series of Lecture Notes in Computer Science. Moreover, we wish to express our heartfelt appreciation to the plenary speakers, session chairs, and
student helpers. In addition, there are still many more colleagues, associates, friends, and supporters who helped us in immeasurable ways; we express our sincere gratitude to them all. Last but not least, we would like to thank all the speakers, authors, and participants for their great contributions that made ICSI 2010 successful and all the hard work worthwhile.
June 2010
Ying Tan Yuhui Shi Tan Kay Chen
Organization
Honorary Chairs
Qidi Wu, China
Russell C. Eberhart, USA

General Chair
Ying Tan, China

Advisory Committee Chairs
Zhenya He, China
Xingui He, China
Xin Yao, UK
Yixin Zhong, China

Program Committee Chairs
Yuhui Shi, China
Tan Kay Chen, Singapore

Technical Committee Chairs
Gary G. Yen, USA
Jong-Hwan Kim, South Korea
Xiaodong Li, Australia
Xuelong Li, UK
Frans van den Bergh, South Africa

Plenary Sessions Chairs
Robert G. Reynolds, USA
Qingfu Zhang, UK

Special Sessions Chairs
Martin Middendorf, Germany
Jun Zhang, China
Haibo He, USA
Tutorial Chair
Carlos Coello Coello, Mexico

Publications Chair
Zhishun Wang, USA

Publicity Chairs
Ponnuthurai N. Suganthan, Singapore
Lei Wang, China
Maurice Clerc, France

Finance Chair
Chao Deng, China

Registration Chairs
Huiyun Guo, China
Yuanchun Zhu, China
Program Committee Members
Peter Andras, UK
Bruno Apolloni, Italy
Payman Arabshahi, USA
Sabri Arik, Turkey
Frans van den Bergh, South Africa
Christian Blum, Spain
Salim Bouzerdoum, Australia
Martin Brown, UK
Jinde Cao, China
Liang Chen, Canada
Zheru Chi, Hong Kong, China
Leandro dos Santos Coelho, Brazil
Carlos A. Coello Coello, Mexico
Emilio Corchado, Spain
Oscar Cordon, Spain
Jose Alfredo Ferreira Costa, Brazil
Xiaohui Cui, USA
Arindam Das, USA
Prithviraj Dasgupta, USA
Kusum Deep, India
Mingcong Deng, Japan
Yongsheng Ding, China
Haibin Duan, China
Mark Embrechts, USA
Andries Engelbrecht, South Africa
Meng Joo Er, Singapore
Peter Erdi, USA
Yoshikazu Fukuyama, Japan
Wai Keung Fung, Canada
Ping Guo, China
Luca Maria Gambardella, Switzerland
Erol Gelenbe, UK
Maoguo Gong, China
Jivesh Govil, USA
Suicheng Gu, USA
Qing-Long Han, Australia
Haibo He, USA
Zhengguang Hou, China
Huosheng Hu, UK
Xiaohui Hu, USA
Guangbin Huang, Singapore
Amir Hussain, UK
Zhen Ji, China
Colin Johnson, UK
Nikola Kasabov, New Zealand
Arun Khosla, India
Franziska Klugl, Germany
Lixiang Li, China
Yangmin Li, Macao, China
Kang Li, UK
Xiaoli Li, UK
Xuelong Li, UK
Guoping Liu, UK
Ju Liu, China
Fernando Lobo, Portugal
Chris Lokan, Australia
Wenlian Lu, China
Hongtao Lu, China
Wenjian Luo, China
Xiujun Ma, China
Jinwen Ma, China
Bernd Meyer, Australia
Martin Middendorf, Germany
Hongwei Mo, China
Francesco Mondada, Switzerland
Ben Niu, China
Erkki Oja, Finland
Mahamed Omran, Kuwait
Paul S. Pang, New Zealand
Bijaya Ketan Panigrahi, India
Thomas E. Potok, USA
Jose Principe, USA
Ruhul A. Sarker, Australia
Gerald Schaefer, UK
Giovanni Sebastiani, Italy
Michael Small, Hong Kong, China
Ponnuthurai Nagaratnam Suganthan, Singapore
Norikazu Takahashi, Japan
Ying Tan, China
Ran Tao, China
Peter Tino, UK
Christos Tjortjis, Greece
G.K. Venayagamoorthy, USA
Ling Wang, China
Guoyin Wang, China
Bing Wang, UK
Lei Wang, China
Cheng Xiang, Singapore
Shengli Xie, China
Simon X. Yang, Canada
Yingjie Yang, UK
Dingli Yu, UK
Zhigang Zeng, China
Yanqing Zhang, USA
Qingfu Zhang, UK
Jie Zhang, UK
Lifeng Zhang, China
Liangpei Zhang, China
Junqi Zhang, China
Yi Zhang, China
Jun Zhang, China
Jinhua Zheng, China
Aimin Zhou, China
Zhi-Hua Zhou, China
Reviewers
Ajiboye Saheeb Osunleke
Akira Yanou
Antonin Ponsich
Bingzhao Li
Bo Liu
Carson K. Leung
Changan Jiang
Chen Guici
Ching-Hung Lee
Chonglun Fang
Cong Zheng
Dawei Zhang
Daoqiang Zhang
Dong Li
Fei Ge
Feng Jiang
Gan Huang
Gang Chen
Haibo Bao
Hongyan Wang
Hugo Hernández
I-Tung Yang
Ibañez Panizo
Jackson Gomes
Janyl Jumadinova
Jin Hu
Jin Xu
Jing Deng
Juan Zhao
Julio Barrera
Jun Guo
Jun Shen
Jun Wang
Ke Cheng
Ke Ding
Kenya Jinno
Liangpei Zhang
Lihua Jiang
Lili Wang
Lin Wang
Liu Lei
Lixiang Li
Lorenzo Valerio
Naoki Ono
Ni Bu
Orlando Coelho
Oscar Ibáñez
Pengtao Zhang
Prakash Shelokar
Qiang Lu
Qiang Song
Qiao Cai
Qingshan Liu
Qun Niu
Renato Sassi
Satvir Singh
Sergio P. Santos
Sheng Chen
Shuhui Bi
Simone Bassis
Song Zhu
Spiros Denaxas
Stefano Benedettini
Stelios Timotheou
Takashi Tanizaki
Usman Adeel
Valerio Arnaboldi
Wangli He
Wei Wang
Wen Shengjun
Wenwu Yu
X.M. Zhang
Xi Huang
Xiaolin Li
Xin Geng
Xiwei Liu
Yan Yang
Yanqiao Zhu
Yongqing Yang
Yongsheng Dong
Yulong Wang
Yuan Cao
Table of Contents – Part II (excerpt)

Using AOBP for Definitional Question Answering . . . 85
  Junkuo Cao, Weihua Wang, and Yuanzhong Shu

Radial Basis Function Neural Network Based on PSO with Mutation Operation to Solve Function Approximation Problem
  Xiaoyong Liu

Distributed Hierarchical Control for Railway Passenger-Dedicated Line Intelligent Transportation System Based on Multi-Agent . . . 252
  Jingdong Sun, Yao Wang, and Shan Wang

GA-Based Integral Sliding Mode Control for AGC . . . 260
  Dianwei Qian, Xiangjie Liu, Miaomiao Ma, and Chang Xu

Stable Swarm Formation Control Using Onboard Sensor Information . . . 268
  Viet-Hong Tran and Suk-Gyu Lee

A Distributed Energy-aware Trust Topology Control Algorithm for Service-Oriented Wireless Mesh Networks
  Chuanchuan You, Tong Wang, BingYu Zhou, Hui Dai, and Baolin Sun

A Quay Crane Scheduling Model in Container Terminals
  Qi Tang

Leader-Follower Formation Control of Multi-robots by Using a Stable Tracking Control Method
  Yanyan Dai, Viet-Hong Tran, Zhiguang Xu, and Suk-Gyu Lee

Research on the Coordination Control of Vehicle EPS and ABS
  Weihua Qin, Qidong Wang, Wuwei Chen, and Shenghui Pan

A Discrete-Time Recurrent Neural Network for Solving Systems of Complex-Valued Linear Equations . . . 315
  Wudai Liao, Jiangfeng Wang, and Junyan Wang

A Recurrent Neural Network for Solving Complex-Valued Quadratic Programming Problems with Equality Constraints . . . 321
  Wudai Liao, Jiangfeng Wang, and Junyan Wang

Computer-Aided Detection and Classification of Masses in Digitized Mammograms Using Artificial Neural Network . . . 327
  Mohammed J. Islam, Majid Ahmadi, and Maher A. Sid-Ahmed

Gene Selection and PSO-BP Classifier Encoding a Prior Information . . . 335
  Yu Cui, Fei Han, and Shiguang Ju

A Modified D-S Decision-Making Algorithm for Multi-sensor Target Identification . . . 343
  Xiaolong Liang, Jinfu Feng, and An Liu

Machine Learning Methods

Intelligent Decision Support System for Breast Cancer . . . 351
  R.R. Janghel, Anupam Shukla, Ritu Tiwari, and Rahul Kala

An Automatic Index Validity for Clustering . . . 359
  Zizhu Fan, Xiangang Jiang, Baogen Xu, and Zhaofeng Jiang

Exemplar Based Laplacian Discriminant Projection . . . 367
  X.G. Tu and Z.L. Zheng

A Novel Fast Non-negative Matrix Factorization Algorithm and Its Application in Text Clustering
  Fang Li and Qunxiong Zhu

A Computerized Approach of the Knowledge Representation of Digital Evolution Machines in an Artificial World
  Istvan Elek
On the Correlations between Fuzzy Variables

Yankui Liu and Xin Zhang

College of Mathematics and Computer Science, Hebei University,
Baoding 071002, Hebei, China
[email protected], [email protected]
Abstract. The expected value and variance of a fuzzy variable have been well studied in the literature, and they provide important characterizations of the possibility distribution of the fuzzy variable. In this paper, we seek a similar characterization of the joint possibility distribution of a pair of fuzzy variables. In view of the success of introducing the expected value and variance as fuzzy integrals of appropriate functions of a single fuzzy variable, it is natural to look to fuzzy integrals of appropriate functions of a pair of fuzzy variables. We consider one such function to obtain the covariance of the pair of fuzzy variables and focus on its computation for common possibility distributions. Under mild assumptions, we derive several useful covariance formulas for triangular and trapezoidal fuzzy variables, which have potential applications in quantitative finance problems when we consider the correlations among fuzzy returns. Keywords: Fuzzy variable, Expected value, Covariance, Quantitative finance problem.
1 Introduction
In probability theory, the mean value of a random variable locates the center of the induced probability distribution, which provides important information about the distribution. Since quite different probability distributions may share the same mean value, we can distinguish them via the variance. Therefore, both the mean value and the variance provide useful characterizations of the probability distribution of a single random variable. To show the probabilistic ties between a pair of random variables, the covariance is a practical tool and has been widely studied in the literature. Chen et al. [1] proposed a simulation algorithm to estimate the mean, variance, and covariance for a set of order statistics from the inverse-Gaussian distribution; Cuadras [2] gave the covariance between functions of two random variables in terms of the cumulative distribution functions; Hirschberger et al. [3] developed a procedure for the random generation of covariance matrices in portfolio selection. For more applications of the covariance, the interested reader may refer to [4,5]. Since the pioneering work of Zadeh [6], possibility theory has been well developed and extended in the literature such as [7,8,9,10]. Among them, Liu and Liu [7] presented the concept of credibility measure based on possibility distribution, Liu [8] developed credibility theory, and Liu and Liu [9] proposed an

Y. Tan, Y. Shi, and K.C. Tan (Eds.): ICSI 2010, Part II, LNCS 6146, pp. 1–8, 2010.
© Springer-Verlag Berlin Heidelberg 2010
axiomatic framework from which fuzzy possibility theory was developed. Credibility theory provides the theoretical foundation for optimization under possibilistic uncertainty [11,12,13,14,15,16]. In addition, Gao [17] and Hua [18] discussed the properties of the covariance of fuzzy variables. The objective of this paper is also to study the correlations of fuzzy variables. Since the covariance of a pair of fuzzy variables is defined by a nonlinear fuzzy integral, its computation for general fuzzy variables is a challenging research issue, and very often relies on approximation schemes and intelligent computing. To avoid this difficulty, we consider the case when the joint possibility distribution of the fuzzy variables is the minimum of its marginal possibility distributions, and derive several useful covariance formulas for common triangular and trapezoidal fuzzy variables. The obtained results have potential applications in portfolio optimization problems when the correlations among fuzzy returns are considered. Our paper proceeds as follows. In Section 2, we review several required fundamental concepts. Under mild assumptions, Section 3 derives several useful covariance formulas for triangular fuzzy variables. An extension to trapezoidal fuzzy variables is reported in Section 4. Section 5 concludes the paper and points out our future research in this field.
2 Fuzzy Variables
Let ξ be a fuzzy variable with a possibility distribution μξ: ℝ → [0, 1]. Then for any r ∈ ℝ, the possibility and credibility of an event {ξ ≥ r} were defined by

  Pos{ξ ≥ r} = sup_{t ≥ r} μξ(t),  Cr{ξ ≥ r} = (1/2)(1 + sup_{t ≥ r} μξ(t) − sup_{t < r} μξ(t)),  (1)

and the expected value of ξ was defined as [7]

  E[ξ] = ∫_0^∞ Cr{ξ ≥ r} dr − ∫_{−∞}^0 Cr{ξ ≤ r} dr,  (2)

provided that at least one of the two integrals is finite. Formula (2) is a kind of nonlinear fuzzy integral, but under the independence assumption [19], the expected value operator has the linearity property [20]. Furthermore, if ξ has a finite expected value, then (2) can be represented in the following equivalent form [20]:

  E[ξ] = (1/2) ∫_0^1 (ξ_sup(α) + ξ_inf(α)) dα,  (3)

where, for any α ∈ (0, 1], ξ_sup(α) and ξ_inf(α) are the α-optimistic and α-pessimistic values of ξ with respect to possibility, in the sense

  ξ_sup(α) = sup{r | Pos{ξ ≥ r} ≥ α},  ξ_inf(α) = inf{r | Pos{ξ ≤ r} ≥ α}.

Let ξ and η be fuzzy variables with finite expected values E[ξ] and E[η]. Then the covariance of ξ and η was defined by [18]

  Cov[ξ, η] = E[(ξ − E[ξ])(η − E[η])].  (4)
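As a quick numerical illustration of formula (3), which is our own sketch and not part of the paper, the following code approximates E[ξ] for a triangular fuzzy variable ξ = (l, m, r) by discretizing the α-integral; the α-cut of such a variable is [l + α(m − l), r − α(r − m)], and the integral has the closed form (l + 2m + r)/4, consistent with the expected values used in the proofs below.

```python
def expected_value_triangular(l, m, r, n=10_000):
    """Approximate E[xi] for a triangular fuzzy variable xi = (l, m, r)
    via formula (3): E[xi] = (1/2) * integral_0^1 (xi_sup(a) + xi_inf(a)) da,
    using the midpoint rule on (0, 1)."""
    total = 0.0
    for k in range(n):
        a = (k + 0.5) / n
        lo = l + a * (m - l)   # alpha-pessimistic value xi_inf(a)
        hi = r - a * (r - m)   # alpha-optimistic value xi_sup(a)
        total += 0.5 * (lo + hi)
    return total / n

# The closed form gives (1 + 2*2 + 4)/4 = 2.25 for xi = (1, 2, 4):
print(expected_value_triangular(1.0, 2.0, 4.0))  # ≈ 2.25
```

Since the integrand is linear in α, the midpoint rule here is exact up to floating-point rounding.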
3 Correlations between Triangular Fuzzy Variables
Since the covariance of fuzzy variables is defined by a nonlinear fuzzy integral, its computation for general fuzzy variables is a challenging research issue, and usually relies on intelligent computing. In this section, let ξ and η be two fuzzy variables with possibility distributions μξ and μη, respectively. In order to compute their covariance, we assume that the joint possibility distribution μξ,η and the marginal possibility distributions μξ and μη satisfy the relationship

  μξ,η(s, t) = μξ(s) ∧ μη(t) for any (s, t) ∈ ℝ².

This property is called the independence between ξ and η, and has been studied in [19]. In this section, we limit ourselves to triangular fuzzy variables.

In the case when one of the triangular fuzzy variables is symmetric, we have the following result:

Theorem 1. If ξ = (r0 − a, r0, r0 + a) and η = (r0 − b, r0, r0 + c) are triangular fuzzy variables such that μξ is symmetric with a > 0, b > 0, and c > 0, then Cov[ξ, η] = 0.

Proof. Denote ξ′ = ξ − E[ξ] and η′ = η − E[η]. Since E[ξ] = r0, we have ξ′ = ξ − E[ξ] = (−a, 0, a). Therefore, the α-cut of ξ′ is ξ′α = [ξ′αL, ξ′αR], where ξ′αL = a(α − 1) and ξ′αR = a(1 − α) for any 0 < α ≤ 1. On the other hand, according to

  E[η] = (4r0 − b + c)/4,  η′ = η − E[η] = ((−3b − c)/4, (b − c)/4, (b + 3c)/4),

we know the α-cut of η′ is η′α = [η′αL, η′αR], where η′αL = (4bα − 3b − c)/4 and η′αR = (b + 3c − 4cα)/4 for any 0 < α ≤ 1. Using the notation above, we get the α-cut of the fuzzy variable ξ′η′ as follows:

  (ξ′η′)α = [(ξ′η′)inf(α), (ξ′η′)sup(α)] = [(ξ′η′)αL, (ξ′η′)αR]
          = [min{ξ′αLη′αL, ξ′αLη′αR, ξ′αRη′αL, ξ′αRη′αR}, max{ξ′αLη′αL, ξ′αLη′αR, ξ′αRη′αL, ξ′αRη′αR}].

Case I. If 0 < α ≤ 1/2, then ξ′αL < 0, ξ′αR > 0, |ξ′αL| = ξ′αR, η′αL < 0, η′αR > 0, |η′αL| ≤ η′αR. Therefore, we have (ξ′η′)αL = ξ′αLη′αR and (ξ′η′)αR = ξ′αRη′αR.

Case II. If 1/2 < α ≤ 1, then ξ′αL < 0, ξ′αR > 0, |ξ′αL| = ξ′αR, η′αL < 0, η′αR > 0, |η′αL| > η′αR. Thus, we have (ξ′η′)αL = ξ′αLη′αL and (ξ′η′)αR = ξ′αRη′αL.

Combining the above gives

  (ξ′η′)inf(α) = (ξ′η′)αL = ξ′αLη′αR, if 0 < α ≤ 1/2;  ξ′αLη′αL, if 1/2 < α ≤ 1,

and

  (ξ′η′)sup(α) = (ξ′η′)αR = ξ′αRη′αR, if 0 < α ≤ 1/2;  ξ′αRη′αL, if 1/2 < α ≤ 1.

Since ξ′αL = −ξ′αR for every α, the sum (ξ′η′)inf(α) + (ξ′η′)sup(α) vanishes on both ranges, and it follows from (3) that Cov[ξ, η] = 0, which completes the proof of the theorem.
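Theorem 1 can be spot-checked numerically. The sketch below (our own illustration, not from the paper) follows the proof's method: under the minimum-based joint distribution, the α-cut of ξ′η′ is obtained by taking the minimum and maximum of the four endpoint products of the α-cuts, and formula (3) applied to those endpoints yields the covariance.

```python
def tri_alpha_cut(l, m, r, a):
    """alpha-cut of a triangular fuzzy variable (l, m, r)."""
    return l + a * (m - l), r - a * (r - m)

def covariance(tri_xi, tri_eta, n=20_000):
    """Numerically evaluate Cov[xi, eta] = E[(xi - E[xi])(eta - E[eta])]
    via formula (3), assuming the joint distribution is the minimum of
    the marginals so alpha-cuts of the product multiply as intervals."""
    def mean(t):                      # E[(l, m, r)] = (l + 2m + r)/4
        l, m, r = t
        return (l + 2 * m + r) / 4
    ex, ey = mean(tri_xi), mean(tri_eta)
    xc = tuple(v - ex for v in tri_xi)   # centered variable xi'
    yc = tuple(v - ey for v in tri_eta)  # centered variable eta'
    total = 0.0
    for k in range(n):
        a = (k + 0.5) / n
        xl, xr = tri_alpha_cut(*xc, a)
        yl, yr = tri_alpha_cut(*yc, a)
        prods = (xl * yl, xl * yr, xr * yl, xr * yr)
        total += 0.5 * (min(prods) + max(prods))
    return total / n

# Theorem 1: symmetric xi = (1, 3, 5) gives zero covariance with eta = (2, 3, 7).
print(covariance((1.0, 3.0, 5.0), (2.0, 3.0, 7.0)))  # ≈ 0
```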
In the case when the left spreads are greater than the right spreads for both triangular fuzzy variables, we have:

Theorem 2. Let ξ = (r0 − a, r0, r0 + b) and η = (r0 − c, r0, r0 + d) be triangular fuzzy variables such that the left spreads of μξ and μη are greater than their respective right spreads, in the sense a > b > 0 and c > d > 0. (i) If bc ≥ ad, a ≠ c, and b ≠ d, then Cov[ξ, η] =

Proof. We only prove assertion (i); assertion (ii) can be proved similarly. Denote ξ′ = ξ − E[ξ] and η′ = η − E[η]. Then we have

  E[ξ] = (4r0 − a + b)/4,  ξ′ = ξ − E[ξ] = ((−3a − b)/4, (a − b)/4, (a + 3b)/4).

By the possibility distribution of ξ′, the α-cut of ξ′ is ξ′α = [ξ′αL, ξ′αR], where ξ′αL = (4aα − 3a − b)/4 and ξ′αR = (a + 3b − 4bα)/4 for 0 < α ≤ 1. On the other hand, from

  E[η] = (4r0 − c + d)/4,  η′ = η − E[η] = ((−3c − d)/4, (c − d)/4, (c + 3d)/4),

we get the α-cut of η′ is η′α = [η′αL, η′αR], where η′αL = (4cα − 3c − d)/4 and η′αR = (c + 3d − 4dα)/4 for 0 < α ≤ 1. As a consequence, the α-cut of ξ′η′ can be represented as

  (ξ′η′)α = [(ξ′η′)inf(α), (ξ′η′)sup(α)] = [(ξ′η′)αL, (ξ′η′)αR]
          = [min{ξ′αLη′αL, ξ′αLη′αR, ξ′αRη′αL, ξ′αRη′αR}, max{ξ′αLη′αL, ξ′αLη′αR, ξ′αRη′αL, ξ′αRη′αR}].

By the supposition bc ≥ ad, a ≠ c, and b ≠ d, we have

  (3c + d)/4c ≤ (3a + b)/4a < (3b + a)/4b ≤ (3d + c)/4d.

Case I. If 0 < α ≤ 1/2, then ξ′αL < 0, ξ′αR > 0, |ξ′αL| ≥ ξ′αR, and η′αL < 0, η′αR > 0, |η′αL| ≥ η′αR. Therefore, we have (ξ′η′)αL = min{ξ′αLη′αR, ξ′αRη′αL} and (ξ′η′)αR = ξ′αLη′αL. According to the inequality

  ξ′αRη′αL − ξ′αLη′αR = ((a + 3b − 4bα)/4)((4cα − 3c − d)/4) − ((4aα − 3a − b)/4)((c + 3d − 4dα)/4)
                      = (ad − bc)(α − 1/2)(α − 1) < 0,

we know (ξ′η′)αL = ξ′αRη′αL.
Case II. If 1/2 < α < (3c + d)/4c, then ξ′αL < 0, ξ′αR > 0, |ξ′αL| < ξ′αR, and η′αL < 0, η′αR > 0, |η′αL| < η′αR. Therefore, we have (ξ′η′)αL = min{ξ′αLη′αR, ξ′αRη′αL} and (ξ′η′)αR = ξ′αRη′αR. Since ξ′αRη′αL − ξ′αLη′αR = (ad − bc)(α − 1/2)(α − 1) > 0 on this range, we get (ξ′η′)αL = ξ′αLη′αR.

Case III. If α = (3c + d)/4c, then ξ′αL < 0, ξ′αR > 0, |ξ′αL| < ξ′αR, and η′αL = 0, η′αR > 0, η′αL < η′αR. In this case, we have (ξ′η′)αL = ξ′αLη′αR and (ξ′η′)αR = ξ′αRη′αR.

Case IV. If (3c + d)/4c < α ≤ (3a + b)/4a, then ξ′αL ≤ 0, ξ′αR > 0, |ξ′αL| < ξ′αR, and η′αL > 0, η′αR > 0, η′αL < η′αR, which leads to (ξ′η′)αL = ξ′αLη′αR and (ξ′η′)αR = ξ′αRη′αR.

Combining Cases II, III and IV gives (ξ′η′)αL = ξ′αLη′αR and (ξ′η′)αR = ξ′αRη′αR whenever 1/2 < α ≤ (3a + b)/4a.

Case V. If (3a + b)/4a < α ≤ 1, then 0 < ξ′αL < ξ′αR and 0 < η′αL < η′αR. It follows that (ξ′η′)αL = ξ′αLη′αL and (ξ′η′)αR = ξ′αRη′αR.

Finally, from the above computational results, we have

  (ξ′η′)inf(α) = (ξ′η′)αL = ξ′αRη′αL, if 0 < α ≤ 1/2;  ξ′αLη′αR, if 1/2 < α ≤ (3a + b)/4a;  ξ′αLη′αL, if (3a + b)/4a < α ≤ 1,

and

  (ξ′η′)sup(α) = (ξ′η′)αR = ξ′αLη′αL, if 0 < α ≤ 1/2;  ξ′αRη′αR, if 1/2 < α ≤ 1.

As a consequence, by (3), we have the desired result. The proof of assertion (i) is complete.

For triangular fuzzy variables, in the case when their right spreads are greater than their left spreads, we have:

Theorem 3. Let ξ = (r0 − a, r0, r0 + b) and η = (r0 − c, r0, r0 + d) be triangular fuzzy variables such that the right spreads of μξ and μη are greater than their respective left spreads, in the sense b > a > 0 and d > c > 0. (i) If ad ≥ bc, b ≠ d, and a ≠ c, then Cov[ξ, η] =
The next theorem deals with the case when the left spread of one fuzzy variable is greater than its right spread, while the left spread of the other fuzzy variable is smaller than its right spread.

Theorem 4. Let ξ = (r0 − a, r0, r0 + b) and η = (r0 − c, r0, r0 + d) be triangular fuzzy variables such that a > b > 0 and d > c > 0. (i) If bd < ac, then Cov[ξ, η] =
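The case analysis in the proof of Theorem 2 can be checked numerically. The script below uses parameters of our own choosing that satisfy the hypotheses a > b > 0, c > d > 0, bc ≥ ad, a ≠ c, b ≠ d (they are not taken from the paper), and reports which endpoint product attains (ξ′η′)inf(α) on each α-range.

```python
# Illustrative parameters: a > b > 0, c > d > 0, bc = 3 >= ad = 2.
a, b, c, d = 4.0, 1.0, 3.0, 0.5

def cut(alpha):
    """alpha-cut endpoints of the centered variables xi' and eta',
    using the formulas derived in the proof of Theorem 2."""
    xl = (4 * a * alpha - 3 * a - b) / 4
    xr = (a + 3 * b - 4 * b * alpha) / 4
    yl = (4 * c * alpha - 3 * c - d) / 4
    yr = (c + 3 * d - 4 * d * alpha) / 4
    return xl, xr, yl, yr

def argmin_product(alpha):
    """Which of the four endpoint products attains (xi' eta')_inf(alpha)."""
    xl, xr, yl, yr = cut(alpha)
    prods = {"LL": xl * yl, "LR": xl * yr, "RL": xr * yl, "RR": xr * yr}
    return min(prods, key=prods.get)

# The three alpha-ranges of the proof; here (3a + b)/4a = 0.8125.
print(argmin_product(0.3))   # RL  (Case I: 0 < alpha <= 1/2)
print(argmin_product(0.6))   # LR  (Cases II-IV: 1/2 < alpha <= (3a+b)/4a)
print(argmin_product(0.9))   # LL  (Case V: alpha > (3a+b)/4a)
```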
The correlation between fuzzy variables is an important issue in the fuzzy community. Due to the limitation of nonlinear fuzzy integrals, the covariance can very often only be obtained numerically for general fuzzy variables with known possibility distributions. In this paper, we focused on the computation of the covariance for triangular and trapezoidal fuzzy variables. Under the assumption that the joint possibility distribution is the minimum of its marginal possibility distributions, we derived several useful covariance formulas. The obtained results have potential applications in portfolio optimization when we consider the correlations among fuzzy returns, which will be addressed in our future research.

Acknowledgments. This work was supported by the National Natural Science Foundation of China (NSFC) under Grant No. 60974134.
References

1. Chen, H., Chang, K., Cheng, L.: Estimation of Means and Covariances of Inverse-Gaussian Order Statistics. European Journal of Operational Research 155, 154–169 (2004)
2. Cuadras, C.M.: On the Covariance between Functions. Journal of Multivariate Analysis 81, 19–27 (2002)
3. Hirschberger, M., Qi, Y., Steuer, R.E.: Randomly Generating Portfolio Selection Covariance Matrices with Specified Distributional Characteristics. European Journal of Operational Research 177, 1610–1625 (2007)
4. Koppelman, F., Sethi, V.: Incorporating Variance and Covariance Heterogeneity in the Generalized Nested Logit Model: An Application to Modeling Long Distance Travel Choice Behavior. Transportation Research Part B 39, 825–853 (2005)
5. Popescu, I.: Robust Mean Covariance Solutions for Stochastic Optimization. Operations Research 55, 98–112 (2007)
6. Zadeh, L.A.: Fuzzy Sets as a Basis for a Theory of Possibility. Fuzzy Sets and Systems 1, 3–28 (1978)
7. Liu, B., Liu, Y.K.: Expected Value of Fuzzy Variable and Fuzzy Expected Value Models. IEEE Transactions on Fuzzy Systems 10, 445–450 (2002)
8. Liu, B.: Uncertainty Theory. Springer, Berlin (2004)
9. Liu, Z., Liu, Y.: Type-2 Fuzzy Variables and Their Arithmetic. Soft Computing 14, 729–747 (2010)
10. Qin, R., Hao, F.: Computing the Mean Chance Distributions of Fuzzy Random Variables. Journal of Uncertain Systems 2, 299–312 (2008)
11. Liu, Y.K.: The Convergent Results about Approximating Fuzzy Random Minimum Risk Problems. Applied Mathematics and Computation 205, 608–621 (2008)
12. Liu, Y., Tian, M.: Convergence of Optimal Solutions about Approximation Scheme for Fuzzy Programming with Minimum-Risk Criteria. Computers & Mathematics with Applications 57, 867–884 (2009)
13. Liu, Y., Liu, Z., Gao, J.: The Modes of Convergence in the Approximation of Fuzzy Random Optimization Problems. Soft Computing 13, 117–125 (2009)
14. Lan, Y., Liu, Y., Sun, G.: Modeling Fuzzy Multi-Period Production Planning and Sourcing Problem with Credibility Service Levels. Journal of Computational and Applied Mathematics 231, 208–221 (2009)
15. Sun, G., Liu, Y., Lan, Y.: Optimizing Material Procurement Planning Problem by Two-Stage Fuzzy Programming. Computers & Industrial Engineering 58, 97–107 (2010)
16. Qin, R., Liu, Y.: Modeling Data Envelopment Analysis by Chance Method in Hybrid Uncertain Environments. Mathematics and Computers in Simulation 80, 922–950 (2010)
17. Gao, X.: Some Properties of Covariance of Fuzzy Variables. In: 3rd International Conference on Information and Management Sciences, vol. 3, pp. 304–307. California Polytechnic State University, USA (2004)
18. Hua, N.: Properties of Moment and Covariance of Fuzzy Variables. Bachelor Thesis, Department of Mathematical Science, Tsinghua University (2003)
19. Liu, Y.K., Gao, J.: The Independence of Fuzzy Variables with Applications to Fuzzy Random Optimization. International Journal of Uncertainty, Fuzziness & Knowledge-Based Systems 15, 1–20 (2007)
20. Liu, Y.K., Liu, B.: Expected Value Operator of Random Fuzzy Variable and Random Fuzzy Expected Value Models. International Journal of Uncertainty, Fuzziness & Knowledge-Based Systems 11, 195–215 (2003)
Modeling Fuzzy Data Envelopment Analysis with Expectation Criterion

Xiaodong Dai, Ying Liu, and Rui Qin

College of Mathematics & Computer Science, Hebei University,
Baoding 071002, Hebei, China
[email protected], [email protected], [email protected]
Abstract. This paper presents a new class of fuzzy expectation data envelopment analysis (FEDEA) models with credibility constraints. Since the proposed model contains the credibility of fuzzy events in the constraints and the expected value of a fuzzy variable in the objective, the solution process is very complex. Thus, in the case when the inputs and outputs are mutually independent trapezoidal fuzzy variables, we discuss the equivalent nonlinear forms of the programming model, which can be solved by standard optimization software. At the end of this paper, one numerical example is also provided to illustrate the efficiency of decision-making units (DMUs) in the proposed model. Keywords: Data envelopment analysis, Credibility constraint, Fuzzy variable, Expected value, Efficiency.
1 Introduction
Data envelopment analysis (DEA) was initially proposed by Charnes, Cooper and Rhodes [1]. It is an evaluation method for measuring the relative efficiency of a set of homogeneous DMUs with multiple inputs and multiple outputs. Since the first DEA model, CCR [1], DEA has been studied by a number of researchers in many fields [2,3,4]. The advantage of the DEA method is that it requires neither a priori weights nor the explicit specification of functional relations between the multiple inputs and outputs. However, when evaluating the efficiency, the data in traditional DEA models must be crisp, and the efficiency is very sensitive to data variations. To deal with stochastic data variations, some researchers proposed several DEA models. For example, Cooper, Huang and Li [5] developed a satisficing DEA model with chance constrained programming; Olesen and Petersen [6] developed a probabilistic constrained DEA model. For more stochastic DEA approaches, the interested reader may refer to [7,8,9]. On the other hand, to deal with fuzziness in real-world problems, Zadeh [10] proposed the concept of fuzzy set. Recently, credibility theory [11], mean chance theory, and fuzzy possibility theory [12] have also been proposed to treat
Corresponding author.
Y. Tan, Y. Shi, and K.C. Tan (Eds.): ICSI 2010, Part II, LNCS 6146, pp. 9–16, 2010.
© Springer-Verlag Berlin Heidelberg 2010
fuzzy phenomena existing in real-life problems. For more theories and applications of credibility theory and mean chance theory, the interested readers may refer to [13,14,15,16]. In fuzzy environments, some researchers extended the traditional DEA and proposed several fuzzy DEA models. For example, Entani, Maeda and Tanaka [17] developed a new pair of interval DEA models; Saen [18] proposed a new pair of assurance region-nondiscretionary factors-imprecise DEA models, and Triantis and Girod [19] proposed a mathematical programming approach to transforming fuzzy input and output data into crisp data. This paper attempts to establish a new class of fuzzy DEA models based on credibility theory [11], and discuss its equivalent nonlinear forms when the inputs and outputs are mutually independent trapezoidal fuzzy variables. The rest of this paper is organized as follows. In Section 2, we present the fuzzy expectation DEA models with fuzzy inputs and fuzzy outputs. Section 3 discusses the equivalents of credibility constraints and the expectation objective in some special cases. In Section 4, we provide a numerical example to illustrate the relative efficiency in the proposed model and the effectiveness of our solution method. Section 5 draws our conclusions.
2 Fuzzy DEA Formulation
The traditional DEA model, which was proposed by Charnes, Cooper and Rhodes (CCR) [1], is built as

  max_{u,v}   v^T y_0 / (u^T x_0)
  subject to  v^T y_i / (u^T x_i) ≤ 1,  i = 1, ..., n        (1)
              u ≥ 0, u ≠ 0,
              v ≥ 0, v ≠ 0,

where x_i and y_i represent the input and output column vectors of DMU_i, x_0 and y_0 represent the input and output column vectors of DMU_0 (the unit under evaluation), and u ∈ R^m and v ∈ R^s are the weights of the input and output column vectors. Model (1) is used to evaluate the relative efficiency of DMUs with crisp inputs and outputs. In many cases, however, we can only obtain the possibility distributions of the inputs and outputs. Thus in this paper, we assume that the inputs and outputs are characterized by fuzzy variables with known possibility distributions. Based on the fuzzy expected value operator and the credibility measure [20], we can establish the following fuzzy expectation DEA model

  max_{u,v}   V_EDEA = E[v^T η_0 / (u^T ξ_0)]
  subject to  Cr{u^T ξ_i − v^T η_i ≥ 0} ≥ α_i,  i = 1, 2, ..., n        (2)
              u ≥ 0, u ≠ 0,
              v ≥ 0, v ≠ 0,

where the notations are listed in Table 1.
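As a concrete illustration of the crisp CCR model (1): the fractional program is routinely converted into a linear program by the Charnes-Cooper transformation (impose the normalization u^T x_0 = 1 and maximize v^T y_0). A minimal sketch in Python, assuming SciPy is available; the toy data and the helper name `ccr_efficiency` are our illustration, not from the paper:

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, j0, eps=1e-6):
    """CCR efficiency of DMU j0 via the Charnes-Cooper transformation.

    X: (m, n) input matrix, Y: (s, n) output matrix; columns index the DMUs.
    Solves: max v^T y0  s.t.  u^T x0 = 1,  v^T y_i - u^T x_i <= 0,  u, v >= eps.
    """
    m, n = X.shape
    s = Y.shape[0]
    c = np.concatenate([np.zeros(m), -Y[:, j0]])             # minimize -v^T y0
    A_ub = np.hstack([-X.T, Y.T])                            # v^T y_i - u^T x_i <= 0
    b_ub = np.zeros(n)
    A_eq = np.concatenate([X[:, j0], np.zeros(s)])[None, :]  # u^T x0 = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(eps, None)] * (m + s))
    return -res.fun

# Toy data: one input, one output, two DMUs with output/input ratios 1 and 0.5.
X = np.array([[2.0, 4.0]])
Y = np.array([[2.0, 2.0]])
print(ccr_efficiency(X, Y, 0), ccr_efficiency(X, Y, 1))
```

An efficient DMU attains the value 1; the other scores the ratio of its productivity to the best one.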
Modeling Fuzzy Data Envelopment Analysis with Expectation Criterion
Table 1. List of notations for model (2)

  Notation       Definition
  ξ_0            the fuzzy input column vector consumed by DMU_0
  ξ_i            the fuzzy input column vector consumed by DMU_i, i = 1, ..., n
  η_0            the fuzzy output column vector produced by DMU_0
  η_i            the fuzzy output column vector produced by DMU_i, i = 1, ..., n
  u ∈ R^m        the weights of the fuzzy input column vector
  v ∈ R^s        the weights of the fuzzy output column vector
  α_i ∈ (0, 1]   the predetermined credibility level of the ith constraint
In model (2), our purpose is to seek a decision (u, v) that maximizes E[v^T η_0 / (u^T ξ_0)] while the fuzzy event {u^T ξ_i − v^T η_i ≥ 0} holds with credibility level at least α_i. Accordingly, we adopt the concept of the expectation efficient value to describe the efficiency of DMU_0: the optimal value of model (2) is referred to as the expectation efficient value of DMU_0, and a larger value indicates a more efficient DMU_0. Model (2) is very difficult to solve directly. Therefore, in the next section, we discuss its equivalent forms in some special cases.
3 Deterministic Equivalent Programming of Model (2)
In the following, we first handle the constraint functions.

3.1 Handling Credibility Constraints
When the inputs and outputs are mutually independent trapezoidal fuzzy vectors, the constraints of model (2) can be transformed into equivalent linear forms according to the following theorem.

Theorem 1. Let ξ_i = (X_i − a_i, X_i, X_i + b_i, X_i + c_i) and η_i = (Y_i − ā_i, Y_i, Y_i + b̄_i, Y_i + c̄_i) be independent trapezoidal fuzzy vectors with a_i, b_i, c_i, ā_i, b̄_i, c̄_i positive numbers. Then Cr{u^T ξ_i − v^T η_i ≥ 0} ≥ α_i in model (2) is equivalent to

  g_i(u, v) ≥ 0,        (3)

where g_i(u, v) = u^T (X_i − (2α_i − 1) a_i) − v^T (Y_i + (2α_i − 1) c̄_i + 2(1 − α_i) b̄_i).

Proof. It is obvious that u^T ξ_i − v^T η_i is the trapezoidal fuzzy variable (u^T (X_i − a_i) − v^T (Y_i + c̄_i), u^T X_i − v^T (Y_i + b̄_i), u^T (X_i + b_i) − v^T Y_i, u^T (X_i + c_i) − v^T (Y_i − ā_i)). When 0.5 < α_i < 1 (i = 1, ..., n), according to the distribution of u^T ξ_i − v^T η_i and the definition of the credibility measure, we have

  u^T (X_i − a_i) − v^T (Y_i + c̄_i) < 0 < u^T X_i − v^T (Y_i + b̄_i).
Thus, Cr{u^T ξ_i − v^T η_i ≥ 0} ≥ α_i is equivalent to

  u^T (X_i − (2α_i − 1) a_i) − v^T (Y_i + (2α_i − 1) c̄_i + 2(1 − α_i) b̄_i) ≥ 0.

The proof of the theorem is complete.

By the transformation process proposed above, we have turned the constraint functions of model (2) into equivalent linear forms. In the following, we discuss the equivalent form of the objective.
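The equivalence in Theorem 1 is easy to check numerically in the scalar case m = s = 1: compute the credibility Cr{uξ − vη ≥ 0} of the trapezoidal variable directly from the definition Cr = (Pos + Nec)/2 and compare the test "Cr ≥ α" against the sign of g(u, v). A small self-check in Python; all parameter values are arbitrary test data of ours, not from the paper:

```python
import numpy as np

def cr_geq_zero(r1, r2, r3, r4):
    """Credibility Cr{T >= 0} of a trapezoidal fuzzy variable T = (r1, r2, r3, r4),
    computed as (Pos{T >= 0} + Nec{T >= 0}) / 2."""
    pos = 1.0 if r3 >= 0 else (r4 / (r4 - r3) if r4 > 0 else 0.0)
    pos_lt = 1.0 if r2 <= 0 else (-r1 / (r2 - r1) if r1 < 0 else 0.0)
    return 0.5 * (pos + (1.0 - pos_lt))

rng = np.random.default_rng(1)
for _ in range(2000):
    X, Y = rng.uniform(1, 5, 2)
    a, ab = rng.uniform(0.1, 1, 2)              # a and a-bar
    b, c = np.sort(rng.uniform(0.1, 1, 2))      # b < c
    bb, cb = np.sort(rng.uniform(0.1, 1, 2))    # b-bar < c-bar
    u, v = rng.uniform(0.1, 2, 2)
    alpha = rng.uniform(0.51, 0.99)
    # trapezoid of u*xi - v*eta
    r1 = u * (X - a) - v * (Y + cb)
    r2 = u * X - v * (Y + bb)
    r3 = u * (X + b) - v * Y
    r4 = u * (X + c) - v * (Y - ab)
    g = u * (X - (2 * alpha - 1) * a) \
        - v * (Y + (2 * alpha - 1) * cb + 2 * (1 - alpha) * bb)
    if abs(g) > 1e-6:   # skip boundary ties where Cr is exactly alpha
        assert (cr_geq_zero(r1, r2, r3, r4) >= alpha) == (g >= 0)
print("Theorem 1 equivalence verified on random instances")
```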
3.2 Equivalent Representation of the Expectation Objective
In this section, we first deduce a closed-form formula for the expected value of the quotient of two independent trapezoidal fuzzy variables.

Theorem 2. Suppose ξ = (X − a, X, X + b, X + c) and η = (Y − ā, Y, Y + b̄, Y + c̄) are two mutually independent trapezoidal fuzzy variables, where a, b, c, ā, b̄, c̄ are positive numbers, b < c, b̄ < c̄, and X > a or X < −c. Then we have

  E[η/ξ] = − ā/(2(c − b)) + (b̄ − c̄)/(2a)
           + (1/(2a)) (Y + b̄ + ((c̄ − b̄)/a) X) ln(X/(X − a))
           + (1/(2(c − b))) (Y + (ā/(c − b))(X + b)) ln((X + c)/(X + b)).        (4)

Proof. We only prove the case X > a; when X < −c, the proof is similar. When X > a, ξ is a positive fuzzy variable. Thus we have

  E[η/ξ] = ∫_0^{+∞} Cr{η/ξ ≥ r} dr − ∫_{−∞}^0 Cr{η/ξ ≤ r} dr
         = ∫_0^{+∞} Cr{η − rξ ≥ 0} dr − ∫_{−∞}^0 Cr{η − rξ ≤ 0} dr.        (5)

Since η − rξ = (Y − rX − (rc + ā), Y − rX − rb, Y − rX + b̄, Y − rX + (c̄ + ra)) for r ≥ 0, according to the credibility measure of a fuzzy event, we have

  Cr{η − rξ ≥ 0} =
    1,                                        if r < (Y − ā)/(X + c),
    1/2 + (Y − r(X + b))/(2(ā + r(c − b))),   if (Y − ā)/(X + c) ≤ r < Y/(X + b),
    1/2,                                      if Y/(X + b) ≤ r < (Y + b̄)/X,
    1/2 + (Y − rX + b̄)/(2(ra + c̄ − b̄)),      if (Y + b̄)/X ≤ r < (Y + c̄)/(X − a),
    0,                                        if r ≥ (Y + c̄)/(X − a),

  Cr{η − rξ ≤ 0} =
    0,                                        if r < (Y − ā)/(X + c),
    1/2 − (Y − r(X + b))/(2(ā + r(c − b))),   if (Y − ā)/(X + c) ≤ r < Y/(X + b),
    1/2,                                      if Y/(X + b) ≤ r < (Y + b̄)/X,
    1/2 − (Y − rX + b̄)/(2(ra + c̄ − b̄)),      if (Y + b̄)/X ≤ r < (Y + c̄)/(X − a),
    1,                                        if r ≥ (Y + c̄)/(X − a).

In the following, we calculate E[η/ξ] according to the position of 0 relative to the four breakpoints, which gives five cases.

(i) If (Y − ā)/(X + c) > 0, i.e. Y > ā, then (5) becomes

  E[η/ξ] = ∫_0^{(Y−ā)/(X+c)} 1 dr
         + ∫_{(Y−ā)/(X+c)}^{Y/(X+b)} [1/2 + (Y − r(X + b))/(2(ā + r(c − b)))] dr
         + ∫_{Y/(X+b)}^{(Y+b̄)/X} (1/2) dr
         + ∫_{(Y+b̄)/X}^{(Y+c̄)/(X−a)} [1/2 + (Y − rX + b̄)/(2(ra + c̄ − b̄))] dr
         = (Y − ā)/(X + c) + M_1 + (1/2)((Y + b̄)/X − Y/(X + b)) + M_2,

where

  M_1 = (1/2)(1 − (X + b)/(c − b)) (Y/(X + b) − (Y − ā)/(X + c))
        + (1/(2(c − b))) ((ā/(c − b))(X + b) + Y) ln((X + c)/(X + b)),
  M_2 = (1/2)(1 − X/a) ((Y + c̄)/(X − a) − (Y + b̄)/X)
        + (1/(2a)) (Y + b̄ + ((c̄ − b̄)/a) X) ln(X/(X − a)).

Collecting terms shows that formula (4) is valid in this case.

(ii) If (Y − ā)/(X + c) ≤ 0 < Y/(X + b), i.e. 0 < Y ≤ ā, (iii) if Y/(X + b) ≤ 0 < (Y + b̄)/X, i.e. −b̄ < Y ≤ 0, (iv) if (Y + b̄)/X ≤ 0 < (Y + c̄)/(X − a), i.e. −c̄ < Y ≤ −b̄, and (v) if (Y + c̄)/(X − a) ≤ 0, i.e. Y ≤ −c̄, the two integrals of (5) are split at the breakpoints lying on either side of 0, and evaluating the resulting integrals in closed form in the same way as in case (i) again yields formula (4) in each case. The proof of the theorem is complete.
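Formula (4) can also be checked numerically against the defining integral (5): for a concrete parameter choice with Y > ā (case (i)), Cr{η − rξ ≥ 0} is evaluated from the trapezoid of η − rξ and integrated over r on a fine grid. A sketch in Python; the parameter values are arbitrary test data of ours:

```python
import numpy as np

# Trapezoidal parameters with X > a, b < c, b-bar < c-bar and Y > a-bar (case (i)).
X, a, b, c = 3.0, 1.0, 0.5, 1.5
Y, ab, bb, cb = 2.0, 0.4, 0.3, 0.8   # ab, bb, cb stand for a-bar, b-bar, c-bar

def cr_geq_zero(r1, r2, r3, r4):
    # Credibility Cr{T >= 0} of a trapezoidal fuzzy variable T = (r1, r2, r3, r4).
    pos = 1.0 if r3 >= 0 else (r4 / (r4 - r3) if r4 > 0 else 0.0)
    pos_lt = 1.0 if r2 <= 0 else (-r1 / (r2 - r1) if r1 < 0 else 0.0)
    return 0.5 * (pos + (1.0 - pos_lt))

# Numerical evaluation of (5): here all breakpoints are positive, so the second
# integral vanishes and E[eta/xi] = int_0^{(Y+cb)/(X-a)} Cr{eta - r*xi >= 0} dr.
rs = np.linspace(0.0, (Y + cb) / (X - a), 200001)
cr = np.array([cr_geq_zero(Y - ab - r * (X + c), Y - r * (X + b),
                           Y + bb - r * X, Y + cb - r * (X - a)) for r in rs])
numeric = np.sum((cr[1:] + cr[:-1]) * np.diff(rs)) / 2.0   # trapezoid rule

# Closed form (4).
closed = (-ab / (2 * (c - b)) + (bb - cb) / (2 * a)
          + (Y + bb + (cb - bb) / a * X) / (2 * a) * np.log(X / (X - a))
          + (Y + ab / (c - b) * (X + b)) / (2 * (c - b)) * np.log((X + c) / (X + b)))
print(numeric, closed)
```

The two values agree to within the discretization error of the grid.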
3.3 Deterministic Equivalent Programming
Denote the inputs and outputs of DMU_0 as ξ_0 = (ξ_{1,0}, ..., ξ_{m,0})^T and η_0 = (η_{1,0}, ..., η_{s,0})^T. Suppose that ξ_{j,0} = (X_{j,0} − a_{j,0}, X_{j,0}, X_{j,0} + b_{j,0}, X_{j,0} + c_{j,0}) and η_{k,0} = (Y_{k,0} − ā_{k,0}, Y_{k,0}, Y_{k,0} + b̄_{k,0}, Y_{k,0} + c̄_{k,0}) are mutually independent trapezoidal fuzzy variables, where a_{j,0}, b_{j,0}, c_{j,0}, ā_{k,0}, b̄_{k,0}, c̄_{k,0} are positive numbers, and X_{j,0} > a_{j,0}, Y_{k,0} > ā_{k,0} for j = 1, ..., m, k = 1, ..., s. Then, according to Theorem 2, we have

  f_0(u, v) = E[v^T η_0 / (u^T ξ_0)]
            = − ā/(2(c − b)) + (b̄ − c̄)/(2a)
              + (1/(2a)) (Y + b̄ + ((c̄ − b̄)/a) X) ln(X/(X − a))
              + (1/(2(c − b))) (Y + (ā/(c − b))(X + b)) ln((X + c)/(X + b)),        (6)

where

  a = Σ_{j=1}^m u_j a_{j,0},   b = Σ_{j=1}^m u_j b_{j,0},   c = Σ_{j=1}^m u_j c_{j,0},
  ā = Σ_{k=1}^s v_k ā_{k,0},   b̄ = Σ_{k=1}^s v_k b̄_{k,0},   c̄ = Σ_{k=1}^s v_k c̄_{k,0},
  X = Σ_{j=1}^m u_j X_{j,0},   Y = Σ_{k=1}^s v_k Y_{k,0}.

As a consequence, when the inputs and outputs are mutually independent trapezoidal fuzzy variables, model (2) can be transformed into the following equivalent nonlinear programming

  max_{u,v}   f_0(u, v)
  subject to  g_i(u, v) ≥ 0,  i = 1, 2, ..., n        (7)
              u ≥ 0, u ≠ 0,
              v ≥ 0, v ≠ 0,

where f_0(u, v) and g_i(u, v) are defined by (6) and (3), respectively. Model (7) is a nonlinear problem with linear constraints, which can be solved by standard optimization solvers.
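Since model (7) has a smooth objective and linear constraints, any NLP solver applies; the paper uses Lingo, but a sketch with SciPy's SLSQP conveys the idea. The two-DMU data, the helper names and the level α = 0.9 below are illustrative assumptions of ours, not the paper's example:

```python
import numpy as np
from scipy.optimize import minimize

# One fuzzy input xi_i = (X_i - a_i, X_i, X_i + b_i, X_i + c_i) and one fuzzy
# output eta_i = (Y_i - ab_i, Y_i, Y_i + bb_i, Y_i + cb_i) per DMU (i = 0, 1).
Xd = np.array([3.0, 4.0]); ad = np.array([1.0, 1.5])
bd = np.array([0.5, 0.4]); cd = np.array([1.5, 1.0])
Yd = np.array([2.0, 2.5]); abd = np.array([0.4, 0.5])
bbd = np.array([0.3, 0.2]); cbd = np.array([0.8, 0.6])
alpha = 0.9

def f0(w, j0=0):
    # Expectation objective (6) for DMU j0 in the scalar m = s = 1 case.
    u, v = w
    a, b, c = u * ad[j0], u * bd[j0], u * cd[j0]
    ab, bb, cb = v * abd[j0], v * bbd[j0], v * cbd[j0]
    X, Y = u * Xd[j0], v * Yd[j0]
    return (-ab / (2 * (c - b)) + (bb - cb) / (2 * a)
            + (Y + bb + (cb - bb) / a * X) / (2 * a) * np.log(X / (X - a))
            + (Y + ab / (c - b) * (X + b)) / (2 * (c - b))
              * np.log((X + c) / (X + b)))

def g(w, i):
    # Linear credibility constraint (3) for DMU i.
    u, v = w
    return (u * (Xd[i] - (2 * alpha - 1) * ad[i])
            - v * (Yd[i] + (2 * alpha - 1) * cbd[i] + 2 * (1 - alpha) * bbd[i]))

cons = [{'type': 'ineq', 'fun': g, 'args': (i,)} for i in range(2)]
res = minimize(lambda w: -f0(w), x0=[1.0, 0.5], method='SLSQP',
               bounds=[(1e-6, None)] * 2, constraints=cons)
print(res.x, -res.fun)
```

The solver pushes the output weight v upward until one of the linear constraints g_i becomes active.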
4 Numerical Example
To illustrate the solution method for the proposed fuzzy DEA model, we provide a numerical example with five DMUs, where each DMU has four fuzzy inputs and four fuzzy outputs. For each DMU, the inputs and outputs are characterized by mutually independent trapezoidal fuzzy variables, as shown in Table 2. For simplicity, we assume that α_1 = α_2 = ... = α_5 = α. Solving model (7) with the Lingo software [21] at credibility level α = 0.95 yields the results shown in Table 3. From the
Table 2. Four fuzzy inputs and outputs for five DMUs
results, we know that DMU_4 is the most efficient, with expectation efficient value 0.9149018, followed by DMU_2 and DMU_1, which implies that DMU_4 holds the best competitive position. If the DMUs with smaller expectation efficient values want to improve their competitive position, they should decrease their inputs. Therefore, with the expectation efficient values, decision makers obtain more information and can thus make better competitive decisions.
5 Conclusions
This paper proposed a new class of fuzzy DEA models with credibility constraints and an expectation objective. To solve the proposed model with trapezoidal fuzzy inputs and outputs, we derived equivalent representations of the constraints and the objective. With these transformations, the proposed DEA model can be turned into an equivalent nonlinear programming problem, which can be solved by standard optimization software. Finally, a numerical example was provided to illustrate the efficiency of DMUs in the proposed DEA model.

Acknowledgments. This work was supported by the National Natural Science Foundation of China (NSFC) under Grant No. 60974134.
References

1. Charnes, A., Cooper, W.W., Rhodes, E.: Measuring the Efficiency of Decision Making Units. European Journal of Operational Research 2, 429–444 (1978)
2. Cook, W.D., Seiford, L.M.: Data Envelopment Analysis (DEA) - Thirty Years on. European Journal of Operational Research 192, 1–17 (2009)
3. Cooper, W.W., Seiford, L.M., Tone, K.: Data Envelopment Analysis. Springer Science and Business Media, New York (2007)
4. Emrouznejad, A., Parker, B.R., Tavares, G.: Evaluation of Research in Efficiency and Productivity: A Survey and Analysis of the First 30 Years of Scholarly Literature in DEA. Socio-Economic Planning Sciences 42, 151–157 (2008)
5. Cooper, W.W., Huang, Z.M., Li, S.X.: Satisficing DEA Models under Chance Constraints. Annals of Operations Research 66, 279–295 (1996)
6. Olesen, O.B., Petersen, N.C.: Chance Constrained Efficiency Evaluation. Management Science 41, 442–457 (1995)
7. Desai, A., Ratick, S.J., Schinnar, A.P.: Data Envelopment Analysis with Stochastic Variations in Data. Socio-Economic Planning Sciences 3, 147–164 (2005)
8. Gong, L., Sun, B.: Efficiency Measurement of Production Operations under Uncertainty. International Journal of Production Economics 39, 55–66 (1995)
9. Retzlaff-Roberts, D.L., Morey, R.C.: A Goal Programming Method of Stochastic Allocative Data Envelopment Analysis. European Journal of Operational Research 71, 379–397 (1993)
10. Zadeh, L.A.: Fuzzy Sets. Information and Control 8, 338–353 (1965)
11. Liu, B.: Uncertainty Theory: An Introduction to its Axiomatic Foundations. Springer, Berlin (2004)
12. Liu, Z.Q., Liu, Y.K.: Type-2 Fuzzy Variables and their Arithmetic. Soft Computing 14(7), 729–747 (2010)
13. Lan, Y., Liu, Y.K., Sun, G.: Modeling Fuzzy Multi-period Production Planning and Sourcing Problem with Credibility Service Levels. Journal of Computational and Applied Mathematics 231, 208–221 (2009)
14. Qin, R., Hao, F.F.: Computing the Mean Chance Distributions of Fuzzy Random Variables. Journal of Uncertain Systems 2, 299–312 (2008)
15. Qin, R., Liu, Y.K.: A New Data Envelopment Analysis Model with Fuzzy Random Inputs and Outputs. Journal of Applied Mathematics and Computing (2009), doi:10.1007/s12190-009-0289-7
16. Qin, R., Liu, Y.K.: Modeling Data Envelopment Analysis by Chance Method in Hybrid Uncertain Environments. Mathematics and Computers in Simulation 80, 922–950 (2010)
17. Entani, T., Maeda, Y., Tanaka, H.: Dual Models of Interval DEA and its Extension to Interval Data. European Journal of Operational Research 136, 32–45 (2002)
18. Saen, R.F.: Technology Selection in the Presence of Imprecise Data, Weight Restrictions, and Nondiscretionary Factors. The International Journal of Advanced Manufacturing Technology 41, 827–838 (2009)
19. Triantis, K., Girod, O.: A Mathematical Programming Approach for Measuring Technical Efficiency in a Fuzzy Environment. Journal of Productivity Analysis 10, 85–102 (1998)
20. Liu, B., Liu, Y.K.: Expected Value of Fuzzy Variable and Fuzzy Expected Value Models. IEEE Transactions on Fuzzy Systems 10, 445–450 (2002)
21. Mahmoud, M.E.: Appendix II: Lingo Software. Process Systems Engineering 7, 389–394 (2006)
Finding and Evaluating Fuzzy Clusters in Networks Jian Liu LMAM and School of Mathematical Sciences, Peking University, Beijing 100871, P.R. China [email protected]
Abstract. A fuzzy cluster validity criterion evaluates the quality of the fuzzy partitions produced by fuzzy clustering algorithms. In this paper, an effective validity index for fuzzy clustering of networks is proposed, which combines compactness and separation measures for each cluster. A simulated annealing strategy is used to minimize this validity index, in association with a dissimilarity-index-based fuzzy c-means iterative procedure, under the framework of a random-walk Markovian dynamics on the network. The proposed algorithm (SADIF) can efficiently identify the probabilities of each node belonging to different clusters during the cooling process. An appropriate number of clusters can be determined automatically without any prior knowledge about the network structure. Computational results on several artificial and real-world networks confirm the capability of the algorithm.

Keywords: Fuzzy clustering, Validity index, Dissimilarity index, Fuzzy c-means, Simulated annealing.
1 Introduction
Recently, the structure and dynamics of networks have attracted wide attention in physics and other fields as a foundation for the mathematical representation of various complex systems [1,2,3]. Network models have also become popular tools in social science, economics, the design of transportation and communication systems, banking systems, etc., owing to our increased capability of analyzing these models [4,5]. Modular organization of networks, closely related to the ideas of graph partitioning, has attracted considerable attention, since many real-world networks appear to be organized into clusters that are densely connected within themselves but sparsely connected with the rest of the network. A huge variety of cluster detection techniques have been developed for partitioning a network into a small number of clusters [6,7,8,9,10,11], based variously on centrality measures, flow models, random walks, optimization and many other approaches. On a related but different front, recent advances in computer vision and data mining have also relied heavily on the idea of viewing a data set or an image as a graph or a network, in order to extract information about the important features of the images or, more generally, the data sets [12,13].

Y. Tan, Y. Shi, and K.C. Tan (Eds.): ICSI 2010, Part II, LNCS 6146, pp. 17–26, 2010. © Springer-Verlag Berlin Heidelberg 2010
The dissimilarity index for each pair of nodes and a corresponding hierarchical algorithm for partitioning networks were proposed in [9]. The basic idea is to associate the network with a random-walk Markovian dynamics [14]. In the traditional clustering literature, a function called a validity index [15] is often used to evaluate the quality of clustering results, with smaller values indicating stronger cluster structure. This motivates us to solve the fuzzy clustering problem by an analogy to the fuzzy c-means algorithm [16] and to construct an extended formulation of the Xie-Beni index under this measure. A simulated annealing strategy [17,18] is then utilized to obtain the minimum value of this index, in association with a dissimilarity-index-based fuzzy c-means iterative procedure. The fuzzy clustering contains more detailed information and has more predictive power than hard network partitioning. We construct our algorithm, simulated annealing with a dissimilarity-index-based fuzzy c-means (SADIF), for the fuzzy partition of networks. From the numerical performance on three model problems (the ad hoc network with 128 nodes, the karate club network and a sample network generated from a Gaussian mixture model), we can see that our algorithm efficiently and automatically determines the optimal number of clusters and identifies the probabilities of each node belonging to different clusters during the cooling process. The rest of the paper is organized as follows. In Section 2, we briefly introduce the dissimilarity index [9], which signifies to what extent two nodes would like to be in the same cluster, and then propose the extended fuzzy c-means and validity index for network partition. After reviewing the idea of simulated annealing, we describe our algorithm (SADIF) and the corresponding strategies in Section 3. In Section 4, we apply the algorithm to the three representative examples mentioned above. Finally, we conclude in Section 5.
2 The Framework for Fuzzy Clustering of Networks
In [9], a dissimilarity index between pairs of nodes is defined, which measures the extent of proximity between nodes of a network. Let G(S, E) be a network with n nodes and m edges, where S is the node set, E = {e(x, y)}_{x,y∈S} is the weight matrix and e(x, y) is the weight of the edge connecting nodes x and y. We can relate this network to a discrete-time Markov chain with stochastic matrix P = (p(x, y)) whose entries are given by

  p(x, y) = e(x, y)/d(x),    d(x) = Σ_{z∈S} e(x, z),        (1)
where d(x) is the degree of node x [10,11,14]. Suppose the random walker is located at node x. The mean first passage time t(x, y) is the average number of steps it takes before it reaches node y for the first time, which is given by

  t(x, y) = p(x, y) + Σ_{j=1}^{+∞} (j + 1) Σ_{z_1,...,z_j ≠ y} p(x, z_1) p(z_1, z_2) ··· p(z_j, y).        (2)
It has been shown that t(x, y) is the solution of the linear equation

  [I − B(y)] (t(1, y), ..., t(n, y))^T = (1, ..., 1)^T,        (3)
where B(y) is the matrix formed by replacing the y-th column of matrix P with a column of zeros [9]. The difference between the perspectives of nodes x and y about the network can then be measured quantitatively. The dissimilarity index is defined by the expression

  Λ(x, y) = [ (1/(n − 2)) Σ_{z∈S, z≠x,y} (t(x, z) − t(y, z))^2 ]^{1/2}.        (4)

We take a partition of S as S = ∪_{k=1}^N S_k with S_k ∩ S_l = Ø if k ≠ l. If two nodes x and y belong to the same cluster, then the average distance t(x, z) will be quite similar to t(y, z); that is, the two nodes' perspectives of the network will be quite similar. Consequently, Λ(x, y) will be small if x and y belong to the same cluster and large if they belong to different clusters. However, a hard partition is often too restrictive, because nodes at the boundary between clusters share commonalities with more than one cluster and play a transitional role in many diffusive networks. This motivates the extension to the fuzzy clustering concept, where each node may belong to different clusters with nonzero probabilities. Let ρ_k(x) represent the probability of node x belonging to the k-th cluster. An extended form of fuzzy c-means is considered to address the optimization issue

  min_{ρ_k(x), m(S_k)}  J_DI(ρ, m) = Σ_{k=1}^N Σ_{x∈S} ρ_k^2(x) Λ^2(m(S_k), x),        (5)

which guarantees convergence towards a local minimum [16]. The Euler-Lagrange equations for (5) with the constraints Σ_{k=1}^N ρ_k(x) = 1 are given by

  ρ_k(x) = (1/Λ^2(m(S_k), x)) / (Σ_{l=1}^N 1/Λ^2(m(S_l), x)),   x ∈ S,  k = 1, ..., N,        (6a)
  m(S_k) = arg min_{x∈S_k} (1/|S_k|) Σ_{y∈S_k, y≠x} Λ(x, y),     k = 1, ..., N,        (6b)

where |S_k| is the number of nodes in S_k and we set x ∈ S_k if k = arg max_l ρ_l(x). A well-known validity index for fuzzy clustering, the Xie-Beni index [15], is widely used for overlapping samples in Euclidean space and is based on the fuzzy c-means algorithm [16]. We extend its idea of considering both compactness and separation to our formulation, and propose a new dissimilarity-index-based validity index for network partition as follows:

  V_DI = J_DI / K(m) = [ Σ_{k=1}^N Σ_{x∈S} ρ_k^2(x) Λ^2(m(S_k), x) ] / [ min_{k≠l} Λ^2(m(S_k), m(S_l)) ],        (7)

where J_DI is the objective function of the dissimilarity-index-based c-means, which reflects the compactness of the data set S, and K(m) plays the role of separation. The more separate the clusters, the larger K(m) and the smaller V_DI. An ideal partition corresponds to a more stable state in the space S = {S_1, ..., S_N}, with smaller J_DI and larger K(m). Thus, an optimal partition can be found by solving

  min_N min_{S_1,...,S_N} V_DI.        (8)
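Equations (1)-(4) are straightforward to evaluate on a small example: build P, solve the linear system (3) once per target node to get all mean first passage times, and form Λ. A sketch in Python on a toy network of two triangles joined by one edge; this example network is our illustration, not from the paper:

```python
import numpy as np

# Two triangles {0,1,2} and {3,4,5} joined by the edge (2,3), unit weights.
E = np.zeros((6, 6))
for x, y in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    E[x, y] = E[y, x] = 1.0
P = E / E.sum(axis=1, keepdims=True)             # stochastic matrix, eq. (1)

n = len(P)
t = np.zeros((n, n))                             # t[x, y] = mean first passage time
for y in range(n):
    B = P.copy()
    B[:, y] = 0.0                                # B(y): y-th column zeroed
    t[:, y] = np.linalg.solve(np.eye(n) - B, np.ones(n))   # eq. (3)

def dissimilarity(x, y):
    # Lambda(x, y) of eq. (4): RMS difference of the two nodes' perspectives.
    z = [k for k in range(n) if k not in (x, y)]
    return np.sqrt(np.mean((t[x, z] - t[y, z]) ** 2))

print(dissimilarity(0, 1), dissimilarity(0, 4))
```

Nodes 0 and 1 sit in the same triangle (indeed they are symmetric twins, so their Λ is essentially zero), while nodes 0 and 4 lie in different triangles and have a much larger Λ.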
3 The Algorithm
The simulated annealing strategy, motivated by simulating the physical process of annealing solids [17], is utilized here to address (8). A solid is heated to a high temperature and then cooled slowly, so that the system at any time is approximately in thermodynamic equilibrium. At equilibrium there may be many configurations, each corresponding to a specific energy level, and the chance of accepting a change from the current configuration to a new one is related to the energy difference between the two states. Simulated annealing is widely applied to optimization problems [18].

Let E = V_DI, and let E^(n) and E^(n+1) denote the current and new energy, respectively. The new state is always accepted if E^(n+1) < E^(n); if E^(n+1) > E^(n), it is accepted only with probability exp(−ΔE^(n)/T), where ΔE^(n) = E^(n+1) − E^(n) is the energy difference and T is the current temperature. The initial state is generated by randomly choosing N clusters with N ∈ [N_min, N_max], and the initial temperature T is set to a high value T_max. A neighbor of the current state is produced by the proposal described below, and the energy of the new state is calculated; the new state is kept if the acceptance requirement is satisfied. This process is repeated R times at each temperature. A cooling rate 0 < α < 1 decreases the current temperature until it reaches the lower bound T_min. The whole procedure of the simulated annealing with a dissimilarity-index-based fuzzy c-means algorithm (SADIF) is summarized as follows.

(1) Set the parameters T_max, T_min, N_min, N_max, α and R. Choose N randomly within the range [N_min, N_max] and initialize the memberships {ρ_k^(0)}_{k=1}^N randomly; set the current temperature T = T_max.
(2) Compute the centers {m(S_k^(0))}_{k=1}^N according to (6b), then calculate the initial energy E^(0) using the definition (7) of V_DI; set n* = 0.
(3) For n = 0, 1, ..., R, do the following:
  (3.1) Generate a new set of centers {m(S'_k)}_{k=1}^{N'} according to the proposal below and set N = N';
  (3.2) Update the memberships {ρ_k^(n+1)}_{k=1}^N and the corresponding centers {m(S_k^(n+1))}_{k=1}^N according to (6a) and (6b), respectively, then calculate the new energy E^(n+1) using (7);
  (3.3) Accept or reject the new state: if E^(n+1) < E^(n), or if E^(n+1) > E^(n) and u < exp(−ΔE^(n)/T) with u ~ U[0, 1], accept the new solution by setting n = n + 1; otherwise, reject it;
  (3.4) Update the optimal state: if E^(n) < E^(n*), set n* = n.
(4) Cool the temperature: T = α · T. If T < T_min, go to Step (5); otherwise, set n = n* and repeat Step (3).
(5) Output the optimal solution {ρ_k^(n*)}_{k=1}^N and the minimum energy E^(n*) of the whole procedure. Classifying the nodes according to the majority rule, i.e. x ∈ S_k if k = arg max_l ρ_l(x), gives the deterministic partition.

Our proposal for generating a new set of centers in Step (3.1) comprises three functions: deleting a current center, splitting a current center and retaining the current centers. At each iteration, one of the three functions is chosen at random, and the size of a cluster,

  M(S_k) = Σ_{x∈S_k} ρ_k(x),   k = 1, ..., N,        (9)

is used to select a center; obviously, a larger cluster size indicates a stronger cluster structure. The three functions are described below.

(i) Delete Center. The cluster S_d with the minimal size is identified using (9) and its center is deleted from {m(S_k)}_{k=1}^N.
(ii) Split Center. The cluster S_s with the maximal size is identified using (9) and is split into two clusters. The new center m(S_{N+1}) is randomly chosen in S_s with m(S_{N+1}) ≠ m(S_s).
(iii) Remain Center. The center set {m(S_k)}_{k=1}^N is kept unchanged.

The number of iterative steps depends on the initial and terminal temperatures, the cooling rate and the number of repetitions at each temperature. The advantages of our algorithm are that the initial memberships {ρ_k^(0)} can be chosen randomly and that, for suitable model parameters, the whole annealing process does not cost as much as in traditional cases. The global minimum of (8) could also be obtained by searching over all possible N using the fuzzy c-means iterations (6), but this is extremely costly, since for each fixed N the fuzzy c-means procedure has to be run for 1000 to 5000 trials because of its local minima. The simulated annealing strategy avoids such ineffective repetition and leads to a high degree of efficiency and accuracy.
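The acceptance rule of Step (3.3) and the cooling loop of Step (4) can be sketched compactly as a generic annealing driver, with the fuzzy c-means update abstracted behind energy and proposal callbacks; the function names and the toy objective are ours, not the paper's:

```python
import math
import random

def accept(e_old, e_new, T, rng=random):
    """Metropolis-style acceptance of Step (3.3): always take downhill moves,
    take uphill moves with probability exp(-(e_new - e_old)/T)."""
    if e_new < e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / T)

def anneal(energy, propose, state, T_max=3.0, T_min=1e-2, alpha=0.9, R=20,
           rng=random):
    """Generic SADIF-style driver: cool T from T_max to T_min by factor alpha,
    trying R proposals per temperature and tracking the best state seen."""
    best, e_best = state, energy(state)
    T = T_max
    while T >= T_min:
        state, e_cur = best, e_best      # Step (4): restart each sweep from the best state
        for _ in range(R):
            cand = propose(state)
            e_new = energy(cand)
            if accept(e_cur, e_new, T, rng):
                state, e_cur = cand, e_new
                if e_cur < e_best:
                    best, e_best = state, e_cur
        T *= alpha
    return best, e_best

random.seed(0)
# Toy energy: minimize (x - 2)^2 over the integers with +-1 proposals.
best, e = anneal(lambda x: (x - 2) ** 2,
                 lambda x: x + random.choice([-1, 1]), 10)
print(best, e)
```

In SADIF itself, `energy` evaluates V_DI via (6a), (6b) and (7), and `propose` applies one of the delete/split/remain moves above.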
4 Experimental Results

4.1 Ad Hoc Network with 128 Nodes
We first apply our method to the ad hoc network with 128 nodes, a typical benchmark problem considered in many papers [7,9,10,11]. The n = 128 nodes are split into four clusters containing 32 nodes each.
Fig. 1. (a) The fraction of nodes of the ad hoc network classified correctly by SADIF, compared with the shortest-path and random-walk based methods used in [7]. (b) V_DI and J_DI versus N for the karate club network; the optimal V_DI is reached at N = 3 with the value V_DI = 3.0776. (c) V_DI and J_DI versus N for the 3-Gaussian mixture network; the optimal V_DI is reached at N = 3 with the value V_DI = 7.1404.
Assume pairs of nodes belonging to the same cluster are linked with probability p_in, and pairs belonging to different clusters with probability p_out. These values are chosen so that the average node degree d is fixed at d = 16; in other words, p_in and p_out are related by

  31 p_in + 96 p_out = 16.        (10)
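The benchmark network can be generated directly from (10), with z_out = 96 p_out the expected number of inter-cluster links per node. A sketch in Python; the helper name `ad_hoc_network` is ours:

```python
import numpy as np

def ad_hoc_network(z_out, n=128, k=4, d=16, seed=0):
    """Planted 4-cluster benchmark: 31*p_in + 96*p_out = d keeps mean degree d."""
    rng = np.random.default_rng(seed)
    p_out = z_out / 96.0
    p_in = (d - 96.0 * p_out) / 31.0
    labels = np.repeat(np.arange(k), n // k)
    same = labels[:, None] == labels[None, :]
    prob = np.where(same, p_in, p_out)
    upper = np.triu(rng.random((n, n)) < prob, 1)   # independent edges, no self-loops
    A = (upper | upper.T).astype(int)
    return A, labels

A, labels = ad_hoc_network(z_out=2.0)
deg = A.sum(axis=1)
ext = (A * (labels[:, None] != labels[None, :])).sum(axis=1)
print(deg.mean(), ext.mean())
```

The mean degree stays close to 16 and the mean external degree close to the chosen z_out.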
Here we naturally choose the node groups S_1 = {1 : 32}, S_2 = {33 : 64}, S_3 = {65 : 96}, S_4 = {97 : 128}. We change z_out, the average number of inter-cluster links per node (z_out = 96 p_out), from 0.5 to 8 and look at the fraction of nodes that are correctly classified. The model parameters are set to T_max = 3, T_min = 10^-2, α = 0.9 and R = 50. The fraction of correctly identified nodes is shown in Figure 1(a), compared with the two methods described in [7]. Our algorithm performs noticeably better than the two previous methods, especially in the more diffusive cases where z_out is large. This verifies the accuracy of our method; moreover, our method gives more detailed information for each node.

4.2 The Karate Club Network
This network was constructed by Wayne Zachary after he observed social interactions between the members of a karate club at an American university [19]. Soon afterwards, a dispute arose between the club's administrator and its main teacher, and the club split into two smaller clubs. It has been used in several papers to test algorithms for finding clusters in networks [6,7,8,9,10,11]. The validity index V_DI as a function of N, computed using (6), is shown in Figure 1(b). Our method is run with the model parameters T_max = 3, T_min = 10^-2, α = 0.9, R = 20, and the numerical and partitioning results are shown in Table 1 and Figure 2. We can see clearly that the nodes in the transition region have diffusive weights of belonging to the different clusters.
Finding and Evaluating Fuzzy Clusters in Networks 16
16
10
19 15
9
31
21
3
27
14
7
2
29
30
28
24
32
5 22
25
6
17
8
3
27
14
11 1
7
2
29
30
28
24
18
12
4
20
33 34
23
11 1
9
31
21
8
20
33 34
23
13
15
12
4
10
19
13
32
5 22
25
26
23
6
17
18
26
(b)
(a)
Fig. 2. (a)The fuzzy partition for the karate club network. The optimal validity index achieved is VDI = 3.0776 and corresponds to the 3 clusters represented by the weighted colors which is done as in [11]. (b)The hard partition for the karate club network obtained by the majority rule.
Table 1. The probabilities of each node belonging to different clusters of the karate club network. ρ_R, ρ_Y or ρ_G means the probability of belonging to the red, yellow or green colored cluster, respectively.
4.3 Sample Network Generated from Gaussian Mixture Model
To further test the validity of the algorithm, we apply it to a sample network generated from a Gaussian mixture model. This model is closely related to the concept of a random geometric graph [20]. We generate n sample points {x_i} in two-dimensional Euclidean space subject to a K-Gaussian mixture distribution Σ_{k=1}^K q_k G(µ_k, Σ_k), where {q_k} are mixture proportions satisfying 0 < q_k < 1 and Σ_{k=1}^K q_k = 1, and µ_k and Σ_k are the mean positions and covariance matrices of the components, respectively. Then we generate the network as follows: if |x_i − x_j| ≤ dist, we set an edge between the i-th and j-th nodes; otherwise they
Fig. 3. (a) 300 sample points generated from the given 3-Gaussian mixture distribution. The star symbols represent the centers of the Gaussian components; the circle, square and diamond shaped symbols represent the positions of the sample points in each component, respectively. (b) The network generated from the sample points in Figure 3(a) with the parameter dist = 0.7.

Table 2. The probabilities of the nodes with intermediate weights belonging to different clusters of the Gaussian mixture network. For the other nodes, though they do not have 0-1 weights, one dominant component has a weight of more than 0.85.
are not connected. We take n = 300 and K = 3, then generate the sample points with the means µ_1 = (2.0, 5.0)^T, µ_2 = (3.5, 6.5)^T, µ_3 = (1.5, 7.0)^T and equal covariance matrices Σ_1 = Σ_2 = Σ_3.
Fig. 4. (a) The fuzzy partition for the Gaussian mixture network. The optimal validity index achieved is VDI = 7.1404 and corresponds to the 3 clusters represented by the weighted colors. The positions of m = {49, 141, 204}, which are colored pink, have a mean L2-error of 0.06 with respect to μ. (b) The hard partition obtained by the majority rule.
Here we pick nodes 1:100 in group 1, nodes 101:200 in group 2 and nodes 201:300 in group 3 for simplicity. With this choice, approximately q1 = q2 = q3 = 100/300. The threshold is chosen as dist = 0.7 in this example. The sample points are shown in Figure 3(a) and the corresponding network is shown in Figure 3(b). The change of the validity index VDI with N, computed using (6), is shown in Figure 1(c). Our method is run with the model parameters Tmax = 3, Tmin = 10^(-2), α = 0.9, R = 20, and the numerical and partitioning results are shown in Table 2 and Figure 4. The results indicate that our algorithm runs smoothly on networks with several hundred nodes.
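The generation procedure above (draw points from a Gaussian mixture, then connect any pair within distance dist) can be sketched as follows. This is an illustrative reconstruction: the function name is ours, and since the covariance values are truncated in the text, the test below uses a placeholder covariance 0.2·I as an assumption.

```python
import numpy as np

def gaussian_mixture_network(n, means, covs, props, dist, seed=0):
    """Sample n points from a K-component Gaussian mixture, then connect
    any two points whose Euclidean distance is at most `dist`.
    Returns the sample points and the boolean adjacency matrix."""
    rng = np.random.default_rng(seed)
    # draw each point's component according to the mixture proportions q_k
    comps = rng.choice(len(props), size=n, p=props)
    pts = np.array([rng.multivariate_normal(means[k], covs[k]) for k in comps])
    # edge iff |x_i - x_j| <= dist; no self-loops
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    adj = (d <= dist) & ~np.eye(n, dtype=bool)
    return pts, adj
```

With the paper's means and dist = 0.7, the resulting graph is symmetric and loop-free by construction.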
5 Conclusions
In this paper, we have proposed an effective validity index for fuzzy clustering in networks and used a simulated annealing strategy to minimize this index in conjunction with a dissimilarity-index-based fuzzy c-means procedure. The algorithm (SADIF) can not only identify the probabilities of each node belonging to different clusters but also determine the optimal number of clusters automatically, without any prior knowledge about the network structure. Successful applications to three representative examples, including the ad hoc network, the karate club network and the sample network generated from a Gaussian mixture model, indicate that our method consistently achieves a high degree of efficiency and accuracy.
J. Liu
Acknowledgements. This work is supported by the National Natural Science Foundation of China under Grant 10871010 and the National Basic Research Program of China under Grant 2005CB321704.
References

1. Albert, R., Barabási, A.L.: Statistical Mechanics of Complex Networks. Rev. Mod. Phys. 74(1), 47–97 (2002)
2. Newman, M.: The Structure and Function of Networks. Comput. Phys. Commun. 147(1), 40–45 (2002)
3. Newman, M., Barabási, A.L., Watts, D.J.: The Structure and Dynamics of Networks. Princeton University Press, Princeton (2005)
4. Barabási, A., Jeong, H., Neda, Z., Ravasz, E., Schubert, A., Vicsek, T.: Evolution of the Social Network of Scientific Collaborations. Physica A 311, 590–614 (2002)
5. Ravasz, E., Somera, A., Mongru, D., Oltvai, Z., Barabási, A.: Hierarchical Organization of Modularity in Metabolic Networks. Science 297(5586), 1551–1555 (2002)
6. Girvan, M., Newman, M.: Community Structure in Social and Biological Networks. Proc. Natl. Acad. Sci. USA 99(12), 7821–7826 (2002)
7. Newman, M., Girvan, M.: Finding and Evaluating Community Structure in Networks. Phys. Rev. E 69(2), 026113 (2004)
8. Newman, M.: Modularity and Community Structure in Networks. Proc. Natl. Acad. Sci. USA 103(23), 8577–8582 (2006)
9. Zhou, H.: Distance, Dissimilarity Index and Network Community Structure. Phys. Rev. E 67(6), 061901 (2003)
10. Weinan, E., Li, T., Vanden-Eijnden, E.: Optimal Partition and Effective Dynamics of Complex Networks. Proc. Natl. Acad. Sci. USA 105(23), 7907–7912 (2008)
11. Li, T., Liu, J., Weinan, E.: Probabilistic Framework for Network Partition. Phys. Rev. E 80, 026106 (2009)
12. Shi, J., Malik, J.: Normalized Cuts and Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)
13. Meilă, M., Shi, J.: A Random Walks View of Spectral Segmentation. In: Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics, pp. 92–97 (2001)
14. Lovász, L.: Random Walks on Graphs: A Survey. Combinatorics, Paul Erdős is Eighty 2, 1–46 (1993)
15. Xie, X.L., Beni, G.: A Validity Measure for Fuzzy Clustering. IEEE Trans. Pattern Anal. Mach. Intell. 13(8), 841–847 (1991)
16. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York (2001)
17. Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equation of State Calculations by Fast Computing Machines. J. Chem. Phys. 21(6), 1087 (1953)
18. Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P.: Optimization by Simulated Annealing. Science 220(4598), 671–680 (1983)
19. Zachary, W.: An Information Flow Model for Conflict and Fission in Small Groups. J. Anthrop. Res. 33(4), 452–473 (1977)
20. Penrose, M.: Random Geometric Graphs. Oxford University Press, Oxford (2003)
On Fuzzy Diagnosis Model of Plane's Revolution Swing Fault and Simulation Researches

Dongcai Qu1, Jihong Cheng2, Wanli Dong1, and Ruizhi Zhang1
1 Department of Control Engineering, Naval Aeronautical and Astronautical University, Yantai, 264001, P.R. China, [email protected]
2 Department of Scientific Research, Naval Aeronautical and Astronautical University, Yantai, 264001, P.R. China, [email protected]
Abstract. Considering the fact that traditional fault diagnosis cannot absorb human experience well, this paper simulates the procedure of expert inference with fuzzy inference to build a fault diagnosis model, and uses a fuzzy neural network to improve the model. The simulation results prove that this model can absorb human experience and make accurate judgments, and that the trained fuzzy neural network has the same function and meets the self-learning demand.

Keywords: Fault Diagnosis; Fuzzy Inference; Revolution Swing; Fuzzy Neural Network.
2 Build of Revolution Swing Fault Diagnosis Model for Avion's Engine

From the technical regulation, we know that when the engine's rotate speed is under 88%, the revolution swing must not exceed ±0.5%; when the rotate speed is higher than 88%, the revolution swing must not exceed ±0.3%. A too-big revolution swing means that the value of the revolution swing exceeds the regulated range. The cause of this fault is either a fault of the rotate speed indication system or a fault of the engine's rotate speed control system. The rotate speed meter's swing caused by the first cause is called "false swing" [3], while the swing caused by the second is called "real swing". Based on the fault diagnosis inference rules drawn from experience, when the rotate speed meter indicates that the revolution swing is too big and the release temperature meter also swings at the same time, we call it "real swing"; when the rotate speed meter indicates that the swing is too big while the release temperature meter indicates normal, we call it "false swing". Obviously, in both its preconditions and its conclusions, actual revolution swing fault diagnosis of the engine has fuzzy characteristics to some degree, because of the differences among maintenance people's inferences and conclusions, and handling this kind of problem belongs to the category of uncertainty. Therefore, fuzzy judgment theory [4] was used to deal with it. Following this, the fuzzy diagnosis model for the case when the rotate speed of this kind of engine is lower than 88% was founded, based on the inference with the uncertain words above.

2.1 Build of Fuzzy Fault Diagnosis Model Based on Fuzzy Inference

Let A be the fuzzy set indicating a big swing shown by the rotate speed meter, and B1, B2 the fuzzy sets indicating, respectively, swing and stability shown by the release temperature meter; C1, C2 are the fuzzy sets indicating big revolution swing and revolution stability; D1, D2 are the fuzzy sets indicating that the rotate speed meter has a fault and works normally, respectively.
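The technical-regulation limits quoted above can be captured in a small helper (a sketch; the function and argument names are ours, and treating exactly 88% as the stricter regime is our reading of the regulation):

```python
def swing_exceeds_limit(speed_pct, swing_pct):
    """Section 2 regulation: below 88% rotate speed the revolution swing
    may not exceed +/-0.5%; at 88% or above it may not exceed +/-0.3%
    (the boundary case is an assumption)."""
    limit = 0.5 if speed_pct < 88.0 else 0.3
    return abs(swing_pct) > limit
```

A swing of 0.4% is therefore within limits at 80% rotate speed but counts as a too-big swing at 92%.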
The way of fuzzy diagnosis is to obtain the membership degrees of all kinds of fault causes from the membership degrees of some fault signs [5], so the first step in constructing the model is to determine the membership functions relating fault signs and fault causes. Fuzzy statistics, pairwise contrast ordering and other methods can be used to determine membership degrees [6]. Here, the membership degrees of input and output were determined from maintenance persons' experience combined with common fuzzy distributions. Suppose that the membership function of heavy swing shown by the engine's rotate speed meter is

    μ_A = 1 / (1 + exp(−20(n_i′ − 0.5)))    (1)
Here, n_i′ is the value of the indicator's swing. The indication of the release temperature meter sometimes swings for a short time because of the body's shake, which should not be considered a swing of the exhaust temperature; thus, suppose that the membership functions of swing and stability are, respectively,
    μ_B1 = 1 / (1 + exp(−8(Δt − 3)))    (2)

    μ_B2 = 1 / (1 + exp(8(Δt − 3.1)))    (3)
Here, Δt is the duration of the swing shown by the release temperature meter. Suppose that the membership degrees of heavy revolution swing and normal revolution swing of the engine are
    μ_C1 = { 0,               n_r′ < 0.2
           { 10(n_r′ − 0.4),  0.2 ≤ n_r′ ≤ 0.8    (4)
           { 1,               n_r′ > 0.8

    μ_C2 = { −2(n_r′ − 0.5),  n_r′ ≤ 0.5          (5)
           { 0,               n_r′ > 0.5
Here, n_r′ is the actual value of the engine's rotate speed swing. The membership functions of a heavy-swing fault occurring and of normal operation are
    μ_D1 = 1 / (1 + exp(−10(n_i′ − 0.5)))    (6)

    μ_D2 = 1 / (1 + exp(20(n_i′ − 0.4)))    (7)
There are two main fuzzy implication inference modes in fuzzy inference: generalized affirmative inference (modus ponens) and generalized negative inference (modus tollens) [7]. Judging from the maintenance people's inference mode in this fault diagnosis, affirmative inference is adopted here. The fuzzy inference rules R for fault diagnosis of the rotate speed control system and the rules L for fault diagnosis of the rotate speed indication system are constructed as:
    R1: if n_i′ is A and Δt is B1, then n_r′ is C1
    R2: if n_i′ is A and Δt is B2, then n_r′ is C2
    L1: if n_i′ is A and Δt is B2, then n_i′ is D1    (8)
    L2: if n_i′ is A and Δt is B1, then n_i′ is D2
After constructing the linguistic variables of input and output, the membership functions and the fuzzy rules, the composition rule of fuzzy inference needs to be determined. Considering the affirmative inference, the "max-min" composition rule is used, the RC implication is adopted for the fuzzy implication, and the connective "also" is used to combine the outputs. Taking the fuzzy rules R as an example, the composition operation finally determined is
    μ_Ci′ = ∨[(μ_A′ ∧ μ_B′) ∧ μ_Ri] = ∨{[(μ_A′ ∧ μ_Bi′) ∧ (μ_Ai ∧ μ_Bi)] ∧ μ_Ci}    (9)

    μ_C′ = μ_C1′ ∨ μ_C2′    (10)
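A minimal sketch of max-min composition for singleton inputs, in the spirit of Eqs. (9)-(10); the rule representation as (antecedent degrees, consequent degree) pairs is a simplification we introduce for illustration:

```python
def max_min_infer(rules):
    """Max-min inference with RC implication and the connective 'also' = max.
    rules: list of (antecedent_degrees, consequent_degree); for a singleton
    input, the firing strength of a rule is the min of its matched degrees."""
    out = 0.0
    for antecedents, mu_c in rules:
        strength = min(antecedents)            # 'and' = min
        out = max(out, min(strength, mu_c))    # implication = min, 'also' = max
    return out

# two fired rules: strengths min(0.9, 0.8) = 0.8 and min(0.2, 0.4) = 0.2
result = max_min_infer([([0.9, 0.8], 0.7), ([0.2, 0.4], 0.9)])
```

Here the first rule dominates: its output is clipped to min(0.8, 0.7) = 0.7, which exceeds the second rule's min(0.2, 0.9) = 0.2, so the aggregated degree is 0.7.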
μ_A′ and μ_B′ denote the membership functions of the inputs, μ_Ci′ denotes the output membership function of each rule, and μ_C′ denotes the final output membership function. The composition operation of the fuzzy rules L is similar. In this example, fuzzy singleton input is used, the method of choosing the minimum value among the maximum membership degrees is adopted in the defuzzification of the outputs, and 0.5 is chosen as the threshold value of fault judgment.

2.2 Build of Fuzzy Neural Network Fault Diagnosis Model

From the build process of the fuzzy fault diagnosis model above, it is obvious that much diagnosis knowledge about the swing faults of the engine must be known. In order to give the model self-learning ability, a fuzzy neural network based on the known fuzzy rules is built hereinafter. The model has two inputs and two outputs, so a four-layer network with two inputs and two outputs is built, as shown in Fig. 1.
Fig. 1. Fault diagnosis model of fuzzy neural network
Considering that there are four rules in the built model, four rules are used in this fuzzy neural network: its second layer, the membership function layer, uses eight neurons, and its third layer, the rule layer, uses four neurons. Four rules can be realized in this network: if the revolution swing is too big and the oil pressure swings, the fault cause is a fault of the engine's control system; if the rotate speed and oil pressure are normal, the system works normally; if the revolution swing is too big and the oil pressure is normal, the fault cause is a fault of the indication system; if the rotate speed is normal and the oil pressure swings, the rotate speed does not swing. The membership functions μ_ij indicate whether the rotate speed and oil pressure swing, while π_ij indicates the output membership degree. The Rp implication relation, the "max-min" composition rule and Gaussian membership functions are adopted in the network, so the final output is
    y_k = Σ_{j=1}^{4} ω_kj Π_{i=1}^{2} μ_ij    (11)

    μ_ij = exp(−(x_i − m_ij)² / σ_ij²)    (12)
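The network's forward pass under our reading of Eqs. (11)-(12), together with the squared-error training objective, can be sketched as follows (array shapes and function names are our assumptions):

```python
import numpy as np

def fnn_forward(x, m, s, w):
    """Forward pass of the four-layer fuzzy neural network.
    x: (2,) inputs; m, s: (2, 4) Gaussian centers/widths per input and rule;
    w: (2, 4) weights from the rule layer to the two outputs."""
    mu = np.exp(-((x[:, None] - m) ** 2) / (s ** 2))  # (2, 4), Eq. (12)
    fire = mu.prod(axis=0)                            # rule firing strengths
    return w @ fire                                   # (2,), Eq. (11)

def fnn_error(y, target):
    """Squared-error objective used to train the weights by back-propagation."""
    return 0.5 * np.sum((np.asarray(y) - np.asarray(target)) ** 2)
```

If the input sits exactly on the centers of one rule whose consequent weight is 1, that rule fires with strength 1 and the corresponding output is 1, as in the "system fault = 1" coding of the training samples.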
Here, x1 and x2 denote the swing value of the rotate speed meter and the swing time of the oil pressure meter, y1 and y2 denote faults of the rotate speed indication system and of the control system, μ_ij denotes the membership functions of the second layer, and ω_kj denotes the weights from the rule layer to the output layer. The error back-propagation algorithm is used to adjust the network parameters ω_kj, σ_ij and m_ij. The error of the network output is

    e = 0.5[(y1 − Y1)² + (y2 − Y2)²]    (13)

Here, Y1 and Y2 denote the expected outputs. Taking the set of fault samples in Table 1 as training samples, the fuzzy neural network was trained; 1 indicates a system fault and 0 indicates normal operation of the system. The learning step length of the network training is 0.01, and the error convergence curve of the network is shown in Fig. 2.

Fig. 2. Simulation error curve of fuzzy neural network
The diagnosis results for the samples, computed with the trained network, are shown in Table 1.
3 Simulation Researches

Choosing the swing range of the rotate speed meter from 0% to 2% and the swing time range of the release temperature meter from 0 seconds to 10 seconds, the fuzzy fault diagnosis model
Table 1. Training samples and training results
Fault signs          Fault causes                                       Network output
n_i′ (%)   Δt (s)    Control system fault   Indication system fault     y1      y2
0.2        0         0                      0                           0.01    0.22
0.9        5         1                      0                           1.02    0.05
0          4         0                      0                           0.01    0.01
0.7        8         1                      0                           0.98    0.17
0          6         0                      0                           0.02    0.04
0.8        0         0                      1                           0.00    0.98
Table 2. Diagnosis results of models

Fault signs         Actual fault causes          Fuzzy neural network results   Fuzzy diagnosis model results
n_i′ (%)   Δt (s)   Control / Indication fault   Control / Indication fault     Control / Indication fault
0.88       10       1 / 0                        1.08 / 0.01                    0.88 / 0
1          2        0 / 1                        0.001 / 0.74                   0 / 1.38
0.6        5        1 / 0                        0.95 / 0.01                    0.74 / 0
0.7        0        0 / 1                        0.01 / 0.87                    0 / 0.9
Fig. 3. Fault diagnosis simulation diagram of rotate speed indication system using fuzzy model
Fig. 4. Fault diagnosis simulation diagram of rotate speed control system using fuzzy model
was simulated for engine rotate speeds below 88%, and the results are shown in Fig. 3 and Fig. 4. In the figures, 0.5 is the threshold value of the output; the system is judged faulty when the output is greater than or equal to 0.5. From the results, it is obvious that the conditions of the fuzzy rules, as well as inputs such as normal indication of the rotate speed meter and normal swing of the release temperature meter, can be diagnosed using this model. The output of the model accords with the judgment of the faults' character and degree. Results obtained by using a set of actual samples to test the fuzzy fault diagnosis model and the fuzzy neural network diagnosis model are shown in Table 2.
4 Conclusions

A fuzzy fault diagnosis model and a fuzzy neural network diagnosis model for the engine of a certain airplane model were built. The simulation results indicate that, using the fuzzy fault diagnosis system built from the inference means of actual revolution swing fault diagnosis, existing experience can be exploited well and fast, accurate judgments of fault causes can be made; and the functions of the fuzzy diagnosis model together with self-learning ability can be realized well using the built fuzzy neural network.

Acknowledgement. Our earnest gratitude goes to the National Natural Science Foundation of China (60774016) for the support of this work.
References

1. Xin, X.H., Yang, X.F.: Develop Summary of Fault Diagnosis Means in Modern Simulation Circuit. Aviation Compute Technology (2004)
2. Wang, S.T.: Fuzzy System, Fuzzy Neural Network and Application Program Design. Technology Literature Publishing Company of Shanghai (1998)
3. Tang, Y.C., Wang, Z.Y.: Revolution Swing Fault Analysis of Aeroengine and Prevention Measures. Aviation Engine and Maintenance (2002)
4. Jia, J.D., Jiang, S.P.: Fault Diagnosis Expert System of Engine Based on Fuzzy Relation Matrix Inference. Engineering of Gas Engine (1999)
5. Yang, X.C.H., Xie, Q.H.: Fuzzy Fault Diagnosis Means Based on Fault Tree. Transaction of Tong Ji University (2001)
6. Hu, B.Q.: Foundation of Fuzzy Theory. Publishing Company of Wuhan University, Wuhan (2004)
7. Sun, Z.J.: Brainpower Control Theory and Technology. Publishing Company of Tsinghua University, Beijing (1997)
Fuzzy Cluster Centers Separation Clustering Using Possibilistic Approach

Xiaohong Wu1,2,∗, Bin Wu3, Jun Sun2, Haijun Fu2, and Jiewen Zhao1
1 School of Food and Biological Engineering, Jiangsu University, Zhenjiang 212013, P.R. China, Tel.: +86 51188791245, [email protected]
2 School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, P.R. China
3 Department of Information Engineering, ChuZhou Vocational Technology College, ChuZhou 239000, P.R. China
Abstract. Fuzzy c-means (FCM) clustering is based on minimizing the fuzzy within-cluster scatter matrix trace, but FCM neglects the between-cluster scatter matrix trace that controls the distances between the class centroids. Based on the principle of cluster centers separation, fuzzy cluster centers separation (FCCS) clustering is an extended FCM clustering algorithm. FCCS attaches importance to both the fuzzy within-cluster scatter matrix trace and the between-cluster scatter matrix trace. However, FCCS has the same probabilistic constraints as FCM, and FCCS is sensitive to noise. To solve this problem, possibilistic cluster centers separation (PCCS) clustering is proposed based on possibilistic c-means (PCM) clustering and FCCS. Experimental results show that PCCS deals with noisy data better than FCCS and has better clustering accuracy than FCM and FCCS. Keywords: Fuzzy c-means; Possibilistic c-means; Noise sensitivity; Cluster centers separation; Cluster scatter matrix.
1 Introduction

Since Zadeh introduced the concept of the fuzzy set [1], fuzzy set theory based on membership functions has advanced in many disciplines, such as control theory, optimization, pattern recognition, image processing and data mining, in which information is incomplete or imprecise. Fuzzy clustering performs data clustering based on fuzzy set theory, while K-means clustering clusters data based on classical sets. The best-known fuzzy clustering method is the fuzzy c-means (FCM) algorithm [2]. The FCM algorithm makes the memberships of a data point across classes sum to one through probabilistic constraints, so FCM is appropriate when memberships are interpreted as probabilities of sharing. However, FCM and its derived algorithms are mostly based on minimizing the fuzzy within-cluster scatter matrix trace [3]. The fuzzy within-cluster scatter matrix trace
can be interpreted as a compactness measure with a within-cluster variation. FCM attaches importance to the fuzzy within-cluster scatter matrix trace, which keeps the class centroids close to the data points, but neglects the between-cluster scatter matrix trace, which considers the distances between the class centroids. From the viewpoint of data classification, both the within-cluster scatter matrix and the between-cluster scatter matrix are important. The concept of involving the between-cluster scatter matrix is used in cluster validity indices such as the FS index proposed by Fukuyama and Sugeno [4], and in clustering algorithms such as the fuzzy compactness and separation (FCS) algorithm proposed by Wu, Yu and Yang [5] and fuzzy cluster centers separation (FCCS) clustering proposed by Wu and Zhou [6]. Because the between-cluster scatter matrix trace can be interpreted as a separation measure with a between-cluster variation, maximizing the between-cluster scatter matrix trace induces a result with well-separated clusters [5]. On the other hand, FCM is sensitive to noise [7]. To overcome these disadvantages, Krishnapuram and Keller presented the possibilistic c-means (PCM) algorithm [7] by abandoning the constraints of FCM and constructing a novel objective function. PCM deals with noisy data better than FCM. In this paper, we find that FCCS is also sensitive to noise because of its probabilistic constraints. To solve this problem, we propose possibilistic cluster centers separation (PCCS) clustering to extend FCCS to its possibilistic model.
2 Fuzzy Cluster Centers Separation

Given an unlabeled data set X = {x1, x2, …, xn} ⊂ ℜ^p, FCCS finds a partition of X into c clusters by minimizing

    J_FCCS(U, V) = Σ_{i=1}^{c} Σ_{k=1}^{n} u_ik^m D_ik² − λ Σ_{i=1}^{c} ||v_i − x̄||²    (1)

subject to the constraints 0 ≤ u_ik ≤ 1, D_ik = ||x_k − ν_i|| and Σ_{i=1}^{c} u_ik = 1 for all k. Here c is the number of clusters, n is the number of data points, u_ik is the membership of x_k in class i, m ∈ (1, ∞) is the weighting exponent, and λ > 0 represents the extent of cluster centers separation. V = (ν1, ν2, …, νc) ∈ R^{cp}, where ν_i ∈ R^p is the cluster center of class i, 1 ≤ i ≤ c. U ∈ M_fc, where M_fc is the fuzzy c-partition space of X:

    M_fc = { U ∈ R^{cn} | u_ik ∈ [0, 1], ∀i, k; Σ_{i=1}^{c} u_ik = 1; 0 < Σ_{k=1}^{n} u_ik < n, ∀i }    (2)

In equation (1), the first term is the fuzzy within-cluster scatter matrix trace tr(S_fw), where S_fw is the fuzzy within-cluster scatter matrix:

    S_fw = Σ_{i=1}^{c} Σ_{k=1}^{n} u_ik^m (x_k − v_i)(x_k − v_i)^T    (3)

The second term includes the between-cluster scatter matrix trace tr(S_B), where S_B is the between-cluster scatter matrix [6]:

    S_B = Σ_{i=1}^{c} (v_i − x̄)(v_i − x̄)^T;    x̄ = (1/n) Σ_{j=1}^{n} x_j    (4)

Then minimizing J_FCCS(U, V) over (U, V) amounts to minimizing the fuzzy within-cluster scatter matrix trace while maximizing the between-cluster scatter matrix trace. Optimizing equation (1) under the constraints by Lagrange multipliers yields the following update equations:

    u_ik = [ Σ_{j=1}^{c} (D_ik / D_jk)^{2/(m−1)} ]^{−1},  ∀i, k    (5a)

    ν_i = ( Σ_{k=1}^{n} u_ik^m x_k − λ x̄ ) / ( Σ_{k=1}^{n} u_ik^m − λ ),  ∀i    (5b)

where ν_i is the cluster center, or prototype, of class i.
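One alternating-optimization step of FCCS under Eqs. (5a)-(5b) can be sketched as follows; the vectorized NumPy form and the small floor on the distances (guarding the ratio in (5a) when a point coincides with a center) are our choices:

```python
import numpy as np

def fccs_step(X, V, m=2.0, lam=0.1):
    """One FCCS alternating-optimization step.
    X: (n, p) data; V: (c, p) current centers; lam is the separation
    weight lambda > 0. Returns memberships U (c, n) and updated centers."""
    D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)  # (c, n)
    D = np.maximum(D, 1e-12)
    # Eq. (5a): u_ik = [ sum_j (D_ik / D_jk)^(2/(m-1)) ]^(-1)
    U = 1.0 / np.sum((D[:, None, :] / D[None, :, :]) ** (2.0 / (m - 1.0)), axis=1)
    # Eq. (5b): v_i = (sum_k u_ik^m x_k - lam * xbar) / (sum_k u_ik^m - lam)
    xbar = X.mean(axis=0)
    Um = U ** m
    V_new = (Um @ X - lam * xbar) / (Um.sum(axis=1) - lam)[:, None]
    return U, V_new
```

The memberships produced by (5a) still obey the probabilistic constraint: each column of U sums to one, which is exactly the property PCCS later abandons.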
3 Possibilistic c-Means Clustering

The possibilistic c-means clustering is obtained by minimizing the following objective function [7]:

    J_PCM(T, V) = Σ_{i=1}^{c} Σ_{k=1}^{n} t_ik D_ik² + Σ_{i=1}^{c} η_i Σ_{k=1}^{n} (t_ik log t_ik − t_ik)    (6)

Here 0 ≤ t_ik ≤ 1, m > 1 and D_ik = ||x_k − ν_i||. As before, c is the number of clusters and n is the number of data points, and t_ik is the typicality of x_k in class i; unlike an FCM membership, the typicality of x_k in class i does not depend on the other cluster centers. Krishnapuram and Keller suggest choosing the positive parameters η_i by computing [7]:

    η_i = K ( Σ_{k=1}^{n} u_ik,FCM^m D_ik² ) / ( Σ_{k=1}^{n} u_ik,FCM^m ),  K > 0    (7)

Here K is usually chosen to be 1, and u_ik,FCM are the terminal membership values of FCM. If D_ik = ||x_k − ν_i|| > 0 for all i and k, and X contains at least c distinct data points, then min_(T,V) J_PCM(T, V) is attained at [7]:

    t_ik = exp(−D_ik² / η_i),  ∀i, k    (8a)

    ν_i = ( Σ_{k=1}^{n} t_ik x_k ) / ( Σ_{k=1}^{n} t_ik ),  ∀i    (8b)

where ν_i is the cluster center, or prototype, of class i.
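A sketch of one PCM update under Eqs. (8a)-(8b); the η_i values are passed in (Eq. (7) suggests estimating them from a terminal FCM partition), and the function name is ours:

```python
import numpy as np

def pcm_step(X, V, eta):
    """One PCM alternating step.
    X: (n, p) data; V: (c, p) centers; eta: (c,) positive scale parameters.
    Returns typicalities T (c, n) and updated centers."""
    D2 = np.sum((X[None, :, :] - V[:, None, :]) ** 2, axis=2)  # (c, n)
    T = np.exp(-D2 / eta[:, None])            # Eq. (8a): no sum-to-one constraint
    V_new = (T @ X) / T.sum(axis=1)[:, None]  # Eq. (8b)
    return T, V_new
```

The key behavioral difference from FCM shows up for an outlier equidistant from both clusters: its typicality is tiny in every cluster, rather than being forced to split as 0.50/0.50.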
4 Possibilistic Cluster Centers Separation Clustering

In this section, we propose a novel fuzzy clustering objective function that generalizes the FCCS objective function by introducing the PCM formulation. The proposed algorithm is called possibilistic cluster centers separation (PCCS) clustering. The objective function of PCCS is defined as:

    J_PCCS(T, V) = Σ_{i=1}^{c} Σ_{k=1}^{n} t_ik D_ik² − λ Σ_{i=1}^{c} ||v_i − x̄||² + Σ_{i=1}^{c} η_i Σ_{k=1}^{n} (t_ik log t_ik − t_ik)    (9)

Here we use the technique from the possibilistic clustering algorithm (PCA) [8] to compute the parameters η_i, and the objective function of PCCS is rewritten as:

    J_PCCS(T, V) = Σ_{i=1}^{c} Σ_{k=1}^{n} t_ik D_ik² − λ Σ_{i=1}^{c} ||v_i − x̄||² + (σ² / (m² c)) Σ_{i=1}^{c} Σ_{k=1}^{n} (t_ik log t_ik − t_ik)    (10)

Here the parameter σ² is a normalization term that measures the degree of separation of the data set, and it is reasonable to define σ² as the sample covariance. That is:

    σ² = (1/n) Σ_{k=1}^{n} ||x_k − x̄||²,  with  x̄ = (1/n) Σ_{j=1}^{n} x_j    (11)

Minimizing equation (10) subject to the constraints m > 1 and 0 ≤ t_ik ≤ 1, we obtain the following equations:

    t_ik = exp(−m² c D_ik² / σ²),  ∀i, k    (12a)

    ν_i = ( Σ_{k=1}^{n} t_ik x_k − λ x̄ ) / ( Σ_{k=1}^{n} t_ik − λ ),  ∀i    (12b)

If D_ik > 0 for all i and k, and X contains c < n distinct data points, then the algorithm described below, which alternates between (12a) and (12b), is called the PCCS-AO algorithm:

Initialization: 1) Run FCM until termination to obtain the class centers used as V(0) by PCCS, and use Eq. (11) to calculate the parameter σ²; 2) fix c, 1 < c < n; then repeat the updates (12a) and (12b) for r = 1, 2, … until (||V(r) − V(r−1)|| < ε) or (r > r_max).
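The PCCS-AO loop can be sketched as follows; for brevity we pass in initial centers V0 instead of running a full FCM initialization (step 1 of the algorithm), so treat this as an illustrative reconstruction rather than the paper's exact implementation:

```python
import numpy as np

def pccs_ao(X, V0, lam=0.1, m=2.0, eps=1e-5, r_max=100):
    """PCCS-AO: alternate the typicality update (12a) and the center
    update (12b) until the centers move less than eps or r_max is hit.
    V0 stands in for the FCM-initialized centers V(0)."""
    n = len(X)
    xbar = X.mean(axis=0)
    sigma2 = np.sum((X - xbar) ** 2) / n               # Eq. (11)
    V = np.asarray(V0, dtype=float)
    c = len(V)
    for _ in range(r_max):
        D2 = np.sum((X[None, :, :] - V[:, None, :]) ** 2, axis=2)  # (c, n)
        T = np.exp(-m * m * c * D2 / sigma2)           # Eq. (12a)
        V_new = (T @ X - lam * xbar) / (T.sum(axis=1) - lam)[:, None]  # Eq. (12b)
        if np.linalg.norm(V_new - V) < eps:
            return T, V_new
        V = V_new
    return T, V
```

Note that the λ terms in (12b) push each center away from the global mean x̄, which is the cluster-centers-separation effect carried over from FCCS.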
5 Experiments

We first experiment on X12, a two-dimensional data set with 12 data points whose coordinates are given in [9]. The data set X12 comes from Pal [9], and Figure 1 shows its distribution in coordinates. Ten points (all except x6 and x12) form two diamond-shaped clusters, with five points each on the left and right sides of the y axis. We can regard x6 and x12 as noisy points; each has the same distance from the two clusters. The cluster centers are initialized as in [9].
Table 1 shows the terminal membership values of FCM obtained by applying FCM-AO, and Table 2 shows the terminal membership values and typicality values of FCCS and PCCS obtained by applying FCCS-AO and PCCS-AO to X12, respectively. The membership values of x6 and x12 in both FCM and FCCS are 0.50 in each cluster. Because of the probabilistic constraints used by FCM and FCCS, the membership values of x6 and x12 reflect the sharing of these two data points between the two clusters, so their values are 0.50. But in fact x6 and x12 are noise points, and their membership values should be very small; FCM and FCCS cannot reflect this fact and are therefore sensitive to noise. From Table 1 and Table 2, both FCM and FCCS provide membership information, whereas PCCS provides typicality information. PCCS abandons the probabilistic constraints of FCCS, and the typicality values of x6 and x12 are small under PCCS. Because x12 is farther away from the cluster centers than x6, PCCS assigns smaller typicality values to x12 than to x6. This reflects the real situation: x6 and x12 are more atypical than the other 10 data points, so PCCS can distinguish the noise points in the data set while FCM and FCCS cannot. In conclusion, FCM and FCCS are more sensitive to noise than PCCS.

The other example is the IRIS data set [10], which is widely used in experiments. It is a four-dimensional data set that includes three classes, Setosa, Versicolor and Virginica, each with 50 data points. The computational conditions are ε = 0.00001, maximum number of iterations r_max = 100, and m = 2.0. The clustering accuracies of FCM, FCCS and PCCS on the IRIS data set are given in Table 3. The PCCS algorithm has better clustering accuracy than the other two algorithms, except when λ is bigger than 0.40.

Table 2. Terminal U and T from FCCS and PCCS on X12
6 Conclusions

Based on fuzzy cluster centers separation clustering and possibilistic c-means clustering, we propose possibilistic cluster centers separation clustering as an extension of the fuzzy cluster centers separation clustering algorithm. PCCS abandons the probabilistic constraints of FCCS and constructs a novel objective function. Furthermore, the possibilistic clustering algorithm is used to compute the parameters η_i in the PCCS objective function. Our experiments on the data set X12 show that both FCM and FCCS are sensitive to noise, but PCCS can deal with noisy data better. We also experiment on the IRIS data set by running FCM-AO, FCCS-AO and PCCS-AO, respectively; the experimental results show that PCCS has better clustering accuracies than FCM and FCCS when λ is smaller than 0.50. In this paper, we select the values of λ for the experiments to evaluate the FCM, FCCS and PCCS algorithms. λ controls the extent of cluster centers separation: if λ > 0 the cluster centers separate from each other, and if λ < 0 the cluster centers move close to each other. A further interesting study is to optimize the parameter λ to improve FCCS and PCCS.

Acknowledgments. The authors would like to thank the China Postdoctoral Science Foundation funded project (No. 20090460078) for financially supporting this research.
References

[1] Zadeh, L.A.: Fuzzy Sets. Information and Control 8, 338–353 (1965)
[2] Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York (1981)
[3] Bezdek, J.C., et al.: Fuzzy Models and Algorithms for Pattern Recognition and Image Processing. Kluwer Academic, Dordrecht (1999)
[4] Fukuyama, Y., Sugeno, M.: A New Method of Choosing the Number of Clusters for the Fuzzy C-Means Method. In: Proceedings of the 5th Fuzzy System Symposium, pp. 247–250 (1989)
[5] Wu, K.L., Yu, J., Yang, M.S.: A Novel Fuzzy Clustering Algorithm Based on a Fuzzy Scatter Matrix with Optimality Tests. Pattern Recognition Letters 26, 639–652 (2005)
[6] Wu, X.H., Zhou, J.J.: Fuzzy Clustering Models Based on Cluster Centers Separation. Journal of South China University of Technology (Natural Science Edition) 36(4), 110–114 (2008)
[7] Krishnapuram, R., Keller, J.: The Possibilistic C-Means Algorithm: Insights and Recommendations. IEEE Trans. Fuzzy Systems 4(3), 385–393 (1996)
[8] Yang, M.S., Wu, K.L.: Unsupervised Possibilistic Clustering. Pattern Recognition 39(1), 5–21 (2006)
[9] Pal, N.R., Pal, K., Bezdek, J.C.: A Possibilistic Fuzzy C-Means Clustering Algorithm. IEEE Trans. Fuzzy Systems 13(4), 517–530 (2005)
[10] Bezdek, J.C., Keller, J.M., Krishnapuram, R., et al.: Will the Real Iris Data Please Stand Up? IEEE Trans. Fuzzy Systems 7(3), 368–369 (1999)
A Class of Fuzzy Portfolio Optimization Problems: E-S Models

Yankui Liu and Xiaoli Wu

College of Mathematics and Computer Science, Hebei University, Baoding 071002, Hebei, China
[email protected], [email protected]
Abstract. This paper adopts the spread of a fuzzy variable as a new criterion in practical risk management problems, and develops a novel fuzzy expectation-spread (E-S) model for the portfolio optimization problem. Since the spread is defined by a Lebesgue-Stieltjes (L-S) integral, its computation for general fuzzy variables is a challenging research issue and usually depends on approximation schemes and soft computing. But for the frequently used trapezoidal and triangular fuzzy variables, the spread can be represented as a quadratic function of the fuzzy parameters. These new representations allow us to turn the proposed E-S model into an equivalent parametric programming problem. As a consequence, given the fuzzy parameters, the E-S model becomes a quadratic programming problem that can be solved by general-purpose software or conventional optimization algorithms. Finally, we demonstrate the developed modeling idea via two numerical examples.

Keywords: Portfolio, Fuzzy variable, Spread, E-S model, Parametric programming.
1 Introduction
The mean-variance (M-V) model of Markowitz [1] is a cornerstone of modern portfolio theory. The M-V model has received widespread acceptance as a practical tool for portfolio optimization, and Markowitz's seminal work has been widely extended in the literature, including the mean-semivariance model [2], the mean-absolute-deviation model [3], and the mean-VaR model [4]. All the models mentioned above are bi-criteria optimization problems concerned with a reasonable trade-off between return and risk: either minimizing risk for a given level of expected return, or maximizing expected return for a given level of risk. In fuzzy environments, with the development of fuzzy set and possibility theories [5,6], more and more researchers have realized the importance of handling possibilistic uncertainty in decision systems and have applied fuzzy theory to portfolio optimization problems. Abiyev and Menekay [7] presented a fuzzy portfolio selection model for investment, in which fuzzy logic is utilized in the estimation of expected return and risk; Parra et al. [8] discussed the optimum

Y. Tan, Y. Shi, and K.C. Tan (Eds.): ICSI 2010, Part II, LNCS 6146, pp. 43–50, 2010.
© Springer-Verlag Berlin Heidelberg 2010
Y. Liu and X. Wu
portfolio for a private investor, taking into account three criteria: return, risk and liquidity; and Qin et al. [9] established a novel data envelopment analysis model with type-2 fuzzy inputs and outputs. Recently, based on the concepts of credibility, expected value and variance defined in [10], an axiomatic approach called credibility theory has been developed [11]. Some interesting applications of credibility in fuzzy optimization problems can be found in the literature, such as [12,13,14,15,16,17,18]. The purpose of this paper is to develop the E-S model for fuzzy portfolio optimization problems, in which we adopt the spread of a fuzzy variable as a new risk index to measure the variation of fuzzy returns. In [12], the investment return is quantified by the expected value of a portfolio and the investment risk by the variance. However, since the variance of a fuzzy variable is defined by a nonlinear fuzzy integral, the resulting portfolio selection problem is neither linear nor convex, and has to be solved by heuristic algorithms [12]. In the current development, the spread is defined by a Lebesgue-Stieltjes (L-S) integral [19]; although its computation for general fuzzy variables is a challenging research issue, for the frequently used trapezoidal and triangular fuzzy variables the spreads can be represented as quadratic functions with respect to the fuzzy parameters. These new representations enable us to turn the proposed E-S model into its equivalent parametric programming problem. As a consequence, given the fuzzy parameters, the E-S model becomes a convex quadratic programming problem that can be solved by general-purpose software or conventional optimization algorithms [20]. The plan of this paper is as follows. Section 2 gives the parametric representation for the spreads of trapezoidal and triangular fuzzy variables. In Section 3, we first formulate a new E-S model for the portfolio selection problem with fuzzy returns.
Then, according to the parametric representation of the spread, we turn the proposed E-S model into its equivalent parametric programming problem. Given the parameters, the equivalent parametric program becomes a convex quadratic programming problem. Since the K-T conditions of the quadratic program can be written as a linear complementarity problem, we can solve it by conventional optimization algorithms. In Section 4, we demonstrate the developed modeling idea via two numerical examples; one is solved by Lemke's complementary pivoting algorithm, the other by Lingo software. Section 5 concludes the paper.
2 Parametric Representation for Spread
If ξ is a fuzzy variable with a possibility distribution function μξ : R → [0, 1], then its spread is defined by the following L-S integral:

  Sp[ξ] = ∫(−∞,+∞) (r − E[ξ])² dΦ(r),    (1)
where E[ξ] is the expected value of fuzzy variable ξ (see [10]), and Φ(r) is the credibility distribution of the fuzzy variable (see [11]).
A Class of Fuzzy Portfolio Optimization Problems: E-S Models
For a trapezoidal fuzzy variable, the spread can be represented as a quadratic function with respect to the fuzzy parameters.

Theorem 1. If ξ is a trapezoidal fuzzy variable (r1, r2, r3, r4), then Sp[ξ] is a quadratic form rᵀBr in the parameter vector r = (r1, r2, r3, r4)ᵀ.

For a triangular fuzzy variable, the spread can also be represented as a quadratic function of the fuzzy parameters, which is stated as

Theorem 2. If ξ is a triangular fuzzy variable (r1, r2, r3), then

  Sp[ξ] = (1/48) rᵀBr,    (4)

where r = (r1, r2, r3)ᵀ, and

  B = ⎡  5  −2  −3 ⎤
      ⎢ −2   4  −2 ⎥    (5)
      ⎣ −3  −2   5 ⎦ .
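Theorem 2 can be cross-checked numerically. The sketch below is our own illustration, not part of the paper: it evaluates the L-S integral (1) directly, using the fact that the credibility distribution Φ of a triangular fuzzy variable (r1, r2, r3) with r1 < r2 < r3 is piecewise linear with density 1/(2(r2 − r1)) on (r1, r2) and 1/(2(r3 − r2)) on (r2, r3), and compares the result with the quadratic form of Theorem 2.

```python
def expected_value(r1, r2, r3):
    # E[xi] = (r1 + 2 r2 + r3) / 4 for a triangular fuzzy variable
    return (r1 + 2 * r2 + r3) / 4.0

def spread_integral(r1, r2, r3, steps=20000):
    # direct midpoint-rule evaluation of the L-S integral (1)
    e = expected_value(r1, r2, r3)
    total = 0.0
    for a, b in ((r1, r2), (r2, r3)):
        dens = 1.0 / (2.0 * (b - a))      # density of the credibility distribution
        h = (b - a) / steps
        for k in range(steps):
            r = a + (k + 0.5) * h
            total += (r - e) ** 2 * dens * h
    return total

def spread_quadratic(r1, r2, r3):
    # Theorem 2: Sp[xi] = (1/48) r^T B r
    B = [[5, -2, -3], [-2, 4, -2], [-3, -2, 5]]
    r = [r1, r2, r3]
    return sum(r[i] * B[i][j] * r[j] for i in range(3) for j in range(3)) / 48.0

print(round(spread_quadratic(0, 1, 2), 4))   # → 0.3333
```

Both routes agree to high accuracy, which also reproduces the spread values quoted later in the numerical examples.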
3 Fuzzy Portfolio Optimization

3.1 Model Formulation
Every investor must decide on an appropriate mix of assets to include in his investment portfolio. Given a collection of potential investments indexed from 1 to n, let ξi denote the fuzzy return in the next time period on investment i, i = 1, ..., n. A portfolio is determined by specifying what fraction of one's assets to put into each investment; that is, a portfolio is a collection of nonnegative numbers xi, i = 1, ..., n, that sum to one. The return the investor would obtain by using a given portfolio is Σ_{i=1}^n xiξi, which is also a fuzzy variable. Thus the reward associated with such a portfolio is defined as the expected return E[Σ_{i=1}^n xiξi]. If reward were the only concern, it would be simple for the investor to put all his assets in the investment with the highest expected return. However, it is known that investments with high reward usually carry a high level of risk. Therefore, there is a need to define a risk measure for the reward Σ_{i=1}^n xiξi. In this section, we define the risk associated with an investment to be the spread of the fuzzy return Σ_{i=1}^n xiξi, which is the quadratic deviation from the expected value. The investor would like to minimize the risk while at the same time not incurring too small a reward. In
our portfolio selection problem, we formally build the following E-S model by a linear combination of the risk and the reward:

  min  Sp[Σ_{i=1}^n xiξi] − μ E[Σ_{i=1}^n xiξi]
  s.t.  Σ_{i=1}^n xi = 1,    (6)
        xi ≥ 0, i = 1, 2, ..., n,

where μ is a positive parameter. The parameter μ describes the importance of reward relative to risk: low values of μ attempt to minimize risk, while high values of μ tend to maximize reward regardless of risk.

3.2 Equivalent Parametric Programming
In this section, we discuss the equivalent parametric programming problem of the E-S model (6). For this purpose, suppose ξi = (ri1, ri2, ri3), i = 1, 2, ..., n, are mutually independent triangular fuzzy variables. In this case, the fuzzy variable Σ_{i=1}^n xiξi can be represented as

  Σ_{i=1}^n xiξi = (Σ_{i=1}^n xi ri1, Σ_{i=1}^n xi ri2, Σ_{i=1}^n xi ri3) = (y1, y2, y3),

which gives the relationships between yk, k = 1, 2, 3, and xi, i = 1, ..., n. If we denote y = (y1, y2, y3)ᵀ, x = (x1, x2, ..., xn)ᵀ, and

  S = ⎡ r11 r21 r31 ··· rn1 ⎤
      ⎢ r12 r22 r32 ··· rn2 ⎥    (7)
      ⎣ r13 r23 r33 ··· rn3 ⎦ ,

then y = Sx. Therefore, it follows from Theorem 2 that

  Sp[Σ_{i=1}^n xiξi] = (1/48) yᵀBy = (1/48) xᵀSᵀBSx.

On the other hand, by the independence of the fuzzy variables, we have

  E[Σ_{i=1}^n xiξi] = Σ_{i=1}^n xi E[ξi] = bᵀx,

where b = (E[ξ1], E[ξ2], ..., E[ξn])ᵀ.
As a consequence, the E-S model (6) can be turned into the following equivalent parametric programming problem:

  min  (1/48) xᵀSᵀBSx − μbᵀx
  s.t.  Σ_{i=1}^n xi = 1,    (8)
        xi ≥ 0, i = 1, 2, ..., n.

Furthermore, in problem (8), if we denote c = −μb, H = SᵀBS/24, e = (1, 1, ..., 1)ᵀ, and A = −I (the n×n matrix with −1 on the diagonal and 0 elsewhere), then problem (8) can be rewritten as the following equivalent parametric programming model:

  min  cᵀx + (1/2) xᵀHx
  s.t.  eᵀx = 1,    (9)
        Ax ≤ 0.

In problem (9), H is a parametric matrix with respect to (ri1, ri2, ri3)ᵀ, i = 1, ..., n. Given the parameters, H is a deterministic matrix. Thus, in the case when H is positive semidefinite for some parameters, problem (9) is a convex quadratic program; in this case, a Kuhn-Tucker point x of problem (9) is a global optimal solution to the problem. Therefore, to solve problem (9), it suffices to find the K-T points, which satisfy the following conditions:

  −Hx − λe + v = c,
  vi xi = 0, i = 1, 2, ..., n,    (10)
  vi ≥ 0, i = 1, 2, ..., n,

with λ being a positive real number and v = (v1, v2, ..., vn)ᵀ ∈ Rⁿ₊. If we denote

  M = ⎡ 0  −eᵀ ⎤ ,   q = ⎡ 1 ⎤ ,   w = ⎡ 0 ⎤ ,   and   z = ⎡ λ ⎤ ,
      ⎣ e   H  ⎦        ⎣ c ⎦        ⎣ v ⎦               ⎣ x ⎦

then the Kuhn-Tucker conditions (10) can be rewritten as the following linear complementarity problem:

  w − Mz = q,  wᵀz = 0,  w, z ≥ 0,

which can be solved by Lemke's complementary pivoting algorithm (see [20]). In the next section, we provide two numerical examples to demonstrate the modeling idea developed in this section.
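The assembly of the LCP data (M, q) from (H, c, e) can be sketched as follows. This is our own illustration, not the authors' code; the tiny test instance H = I, c = (−1, −1) is chosen so that the symmetric point x = (1/2, 1/2) with λ = 1/2 and v = 0 is the K-T point.

```python
def build_lcp(H, c):
    # M = [[0, -e^T], [e, H]], q = (1, c), following the block matrices above
    n = len(c)
    M = [[0.0] + [-1.0] * n] + [[1.0] + list(map(float, H[i])) for i in range(n)]
    q = [1.0] + list(map(float, c))
    return M, q

def lcp_residual(M, q, w, z):
    # residual of w - M z = q together with the complementarity term w^T z
    n = len(q)
    lin = max(abs(w[i] - sum(M[i][j] * z[j] for j in range(n)) - q[i])
              for i in range(n))
    comp = abs(sum(w[i] * z[i] for i in range(n)))
    return max(lin, comp)

# H = I, c = (-1, -1): the K-T point is x = (1/2, 1/2), v = (0, 0), lambda = 1/2,
# i.e. w = (0, 0, 0) and z = (1/2, 1/2, 1/2)
M, q = build_lcp([[1.0, 0.0], [0.0, 1.0]], [-1.0, -1.0])
print(lcp_residual(M, q, [0.0, 0.0, 0.0], [0.5, 0.5, 0.5]))   # → 0.0
```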
4 Numerical Examples
In this section, we illustrate the modeling idea developed in this paper by two numerical examples. The first one is stated as follows.

Example 1. Suppose an investor intends to invest his fund in two securities. Let xi denote the investment proportion in security i, and ξi the fuzzy return of security i, i = 1, 2. Assume that ξi, i = 1, 2, are mutually independent triangular fuzzy variables; their expected values and spreads are calculated and collected in Table 1.

Table 1. The Values of E[ξi] and Sp[ξi]

Security   Return        E[ξi]   Sp[ξi]
1          (0, 1, 1.5)   0.875   0.1927
2          (0, 1, 2)     1       0.3333
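Example 1 can also be reproduced numerically. The sketch below is our own cross-check (a brute-force grid search over the simplex stands in for Lemke's algorithm): it builds the objective of model (6) from the Table 1 data and recovers the optimal allocation.

```python
B = [[5, -2, -3], [-2, 4, -2], [-3, -2, 5]]
S = [[0.0, 0.0], [1.0, 1.0], [1.5, 2.0]]   # columns hold (r_i1, r_i2, r_i3)
b = [0.875, 1.0]                            # expected returns E[xi_i] from Table 1
mu = 1.0

def objective(x):
    # Sp[sum x_i xi_i] - mu E[sum x_i xi_i] = (1/48)(Sx)^T B (Sx) - mu b^T x
    y = [sum(S[k][i] * x[i] for i in range(2)) for k in range(3)]
    quad = sum(y[k] * B[k][l] * y[l] for k in range(3) for l in range(3)) / 48.0
    return quad - mu * sum(b[i] * x[i] for i in range(2))

# search over the simplex x1 + x2 = 1, x >= 0
best = min((objective([t / 10000.0, 1 - t / 10000.0]), t / 10000.0)
           for t in range(10001))
print(best[1])   # → 0.8  (x = (4/5, 1/5), matching the Kuhn-Tucker point)
```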
In this example, the Kuhn-Tucker conditions (10) become

  −Hx − λe + v = c,
  vi xi = 0, i = 1, 2,    (11)
  vi ≥ 0, i = 1, 2,

where λ is a positive real number and v = (v1, v2)ᵀ ∈ R²₊. In the case when μ = 1, the Kuhn-Tucker conditions (11) reduce to finding a solution of the following linear complementarity system: w − Mz = q, wᵀz = 0, w, z ≥ 0, where c = (−0.875, −1)ᵀ,

  M = ⎡ 0   −1     −1  ⎤         ⎡  1     ⎤         ⎡ 0  ⎤         ⎡ λ  ⎤
      ⎢ 1   37/96  1/2 ⎥ ,  q =  ⎢ −0.875 ⎥ ,  w =  ⎢ v1 ⎥ ,  z =  ⎢ x1 ⎥ .
      ⎣ 1   1/2    2/3 ⎦         ⎣ −1     ⎦         ⎣ v2 ⎦         ⎣ x2 ⎦

By using Lemke's complementary pivoting algorithm [20], we obtain the solution (w1, w2, w3, z1, z2, z3) = (0, 0, 0, 7/15, 4/5, 1/5), which implies that the Kuhn-Tucker point is (x1, x2) = (z2, z3) = (4/5, 1/5). That is, the investor should allocate four fifths of his fund to the first security and one fifth to the second security in order to obtain the maximum benefit.

Example 2. Suppose an investor intends to invest his fund in five securities. Let xi denote the investment proportion in security i, and ξi the fuzzy return of security i, i = 1, ..., 5. In addition, the fuzzy returns ξi, i = 1, ..., 5 are supposed to be mutually independent triangular fuzzy variables, and their expected values and spreads are computed and provided in Table 2.
Table 2. The Values of E[ξi] and Sp[ξi] for Securities 1–5
To solve the portfolio selection problem, we first turn it into its equivalent quadratic parametric programming problem. Given the parameter μ, we employ the Lingo software to solve the corresponding convex quadratic programming problem. To illustrate the parameter's influence on the efficiency, we also compare the solutions under different values of the parameter μ; the computational results are reported in Table 3.

Table 3. Comparison of Solutions with Different Values of μ

μ       x1        x2        x3        x4        x5
0.00    1.00000   0.00000   0.00000   0.00000   0.00000
0.99    0.16160   0.36715   0.00000   0.47125   0.00000
1.00    0.08354   0.44219   0.00000   0.47427   0.00000
1.50    0.00000   0.42010   0.00000   0.57990   0.00000
20.00   0.00000   0.00000   0.00000   0.00000   1.00000

5 Concluding Remarks
To illustrate the use of the spread as a new risk measure in practice, we presented the E-S model for the portfolio optimization problem, which extends Markowitz's mean-variance framework. Our main results are as follows. First, we gave the parametric representations for the spreads of the frequently used triangular and trapezoidal fuzzy variables. Second, we developed a new E-S model for portfolio optimization problems and discussed its equivalent parametric programming problem; under mild assumptions, the convexity of the equivalent parametric programming problem was also analyzed. Third, two numerical examples were provided to demonstrate the developed modeling idea: one example is solved by Lemke's complementary pivoting algorithm, the other by Lingo software.

Acknowledgments. This work was supported by the National Natural Science Foundation of China (NSFC) under Grant No. 60974134.
References

1. Markowitz, H.M.: Portfolio Selection. Journal of Finance 7, 77–91 (1952)
2. Mao, J.C.: Models of Capital Budgeting, E-V vs. E-S. Journal of Financial and Quantitative Analysis 5, 657–675 (1970)
3. Simaan, Y.: Estimation Risk in Portfolio Selection: the Mean Variance Model Versus the Mean Absolute Deviation Model. Management Science 4, 1437–1446 (1997)
4. Jorion, P.H.: Value at Risk: a New Benchmark for Measuring Derivatives Risk. Irwin Professional Publishers (1996)
5. Zadeh, L.A.: Fuzzy Sets as a Basis for a Theory of Possibility. Fuzzy Sets and Systems 1, 3–28 (1978)
6. Liu, Z., Liu, Y.K.: Type-2 Fuzzy Variables and Their Arithmetic. Soft Computing 14, 729–747 (2010)
7. Abiyev, R., Menekay, M.: Fuzzy Portfolio Selection Using Genetic Algorithm. Soft Computing 11, 1157–1163 (2007)
8. Parra, M., Terol, A., Uria, M.: A Fuzzy Goal Programming Approach to Portfolio Selection. European Journal of Operational Research 133, 287–297 (2001)
9. Qin, R., Liu, Y., Liu, Z., Wang, G.: Modeling Fuzzy DEA with Type-2 Fuzzy Variable Coefficients. In: Yu, W., He, H., Zhang, N. (eds.) ISNN 2009, Part II. LNCS, vol. 5552, pp. 25–34. Springer, Heidelberg (2009)
10. Liu, B., Liu, Y.K.: Expected Value of Fuzzy Variable and Fuzzy Expected Value Models. IEEE Transactions on Fuzzy Systems 10, 445–450 (2002)
11. Liu, B.: Uncertainty Theory. Springer, Berlin (2004)
12. Chen, Y., Liu, Y., Chen, J.: Fuzzy Portfolio Selection Problems Based on Credibility Theory. In: Yeung, D.S., Liu, Z.-Q., Wang, X.-Z., Yan, H. (eds.) ICMLC 2005. LNCS (LNAI), vol. 3930, pp. 377–386. Springer, Heidelberg (2006)
13. Liu, Y., Zhu, X.: Capacitated Fuzzy Two-Stage Location-Allocation Problem. International Journal of Innovative Computing, Information and Control 3, 987–999 (2007)
14. Liu, Y.K.: The Convergent Results about Approximating Fuzzy Random Minimum Risk Problems. Applied Mathematics and Computation 205, 608–621 (2008)
15. Liu, Y., Tian, M.: Convergence of Optimal Solutions about Approximation Scheme for Fuzzy Programming with Minimum-Risk Criteria. Computers & Mathematics with Applications 57, 867–884 (2009)
16. Hao, F., Qin, R.: Variance Formulas for Trapezoidal Fuzzy Random Variables. Journal of Uncertain Systems 3, 145–160 (2009)
17. Sun, G., Liu, Y., Lan, Y.: Optimizing Material Procurement Planning Problem by Two-Stage Fuzzy Programming. Computers & Industrial Engineering 58, 97–107 (2010)
18. Qin, R., Liu, Y.: Modeling Data Envelopment Analysis by Chance Method in Hybrid Uncertain Environments. Mathematics and Computers in Simulation 80, 922–950 (2010)
19. Carter, M., Brunt, B.: The Lebesgue-Stieltjes Integral. Springer, Berlin (2000)
20. Bazaraa, M.S., Shetty, C.M.: Nonlinear Programming: Theory and Algorithms. Wiley, New York (1979)
Application of PSO-Adaptive Neural-fuzzy Inference System (ANFIS) in Analog Circuit Fault Diagnosis Lei Zuo∗, Ligang Hou, Wang Zhang, Shuqin Geng, and Wucheng Wu VLSI & System Lab, Beijing University of Technology, Beijing 100124, China [email protected]
Abstract. To address the problem of fault diagnosis for analog ICs, a method based on the Adaptive Neural-fuzzy Inference System (ANFIS) is proposed. Subtractive clustering and a Particle Swarm Optimization (PSO) hybrid algorithm are used as tools for building the fault diagnosis model; the resulting fault diagnosis model is then applied to circuit fault diagnosis. Simulation results show that the method is effective. Keywords: ANFIS; analog circuit; fault diagnosis; PSO; hybrid algorithm.
1 Introduction

Fault diagnosis technology is key to ensuring the normal operation of complex electronic systems, and it is also a hotspot of current research. With the application of large-scale and very-large-scale integrated circuits, the previous analog circuit fault diagnosis methods can no longer be applied [1]. With the vigorous development of artificial intelligence since the 1990s, fuzzy theory and neural network technology have been applied to the fault diagnosis of analog circuits [2-6]. However, the learning process of a neural network is essentially a black box: the relationship it establishes between input and output is difficult to express in an intuitive way, and the learning process suffers from drawbacks such as local minima. The "IF ... THEN ..." expressions of fuzzy theory solve the problems of traditional two-valued logic expressions, but in the actual process the membership functions and fuzzy rule expressions need to be determined manually, which is complicated to achieve in complex systems. In recent years, the combination of fuzzy theory and neural network technology has gradually become a reality: neural network learning is used to automatically create the fuzzy reasoning process and to define the expressions of the fuzzy rules and membership functions. Organically integrating fuzzy theory and neural networks builds a fault diagnosis method, and ANFIS (Adaptive Neural-fuzzy Inference System) is used to achieve the above procedure. At present this method has ∗
This work was supported in part by the National Natural Science Foundation of China under Grant 60976028.
been applied to fault diagnosis in chemical processes, mechanical systems and so on, and has achieved good results [7-9]. Therefore, this paper uses ANFIS techniques to build an analog circuit fault diagnosis model; the method uses subtractive clustering to determine the structure of the fault diagnosis model, and uses a hybrid learning algorithm consisting of particle swarm optimization (PSO) and the least squares method for the optimization. The actual fault diagnosis results show that the ANFIS method has higher accuracy than traditional diagnosis methods.
2 ANFIS for Analog Circuit Fault Diagnosis Model

In analog circuit fault diagnosis, the voltage response at the circuit's test points can be measured and processed with wavelet packet technology, and the resulting feature information is divided into training samples and test samples. The training samples are fed into the ANFIS diagnostic model; to improve the diagnostic accuracy of the model, the subtractive clustering algorithm is used to determine the initial structure of the ANFIS model, while a hybrid algorithm composed of PSO and the least squares method calculates the relevant parameters of the model. After training is completed, the test samples are fed into the ANFIS.

2.1 The Basic Principle of ANFIS [10]

Traditional fuzzy theory requires manual identification of the fuzzy rules and membership function expressions, while the ANFIS model is built on the basis of input-output learning samples, eliminating the tedious modeling process; this makes it suitable for modeling complex systems whose fuzzy characteristics cannot be fully understood. Jang proposed ANFIS as the network form of a first-order Sugeno-type fuzzy inference system; a first-order Sugeno model with two rules reads:

  if x is A1 and y is B1 then f1 = p1 x + q1 y + r1
  if x is A2 and y is B2 then f2 = p2 x + q2 y + r2    (1)

where x and y are the inputs, A and B are linguistic variables (usually fuzzy numbers), and f is a polynomial in x and y representing a fuzzy rule; the consequent parameters p, q, r can usually be calculated using a hybrid algorithm. The equivalent ANFIS model structure is shown in Fig. 1. As can be seen, the model has a five-layer structure. The first layer fuzzifies the input signals; node i is a function, denoted as follows:
  o¹i = μAi(x), i = 1, 2;   o¹i = μB(i−2)(y), i = 3, 4,    (2)
where x, y are the inputs and o denotes the membership degree in the sets A and B. The membership function is chosen as a Gaussian function, with the expression o¹i = μA = exp[−(x − di)²/σj], where d and σ are, respectively, the center and the width of the Gaussian function, and A and B are the linguistic variables associated with the membership functions.
Fig. 1. ANFIS Structure
The second layer takes the product of the incoming membership values from the first layer; its nodes are marked Π:

  o²i = ωi = μAi(x) · μBi(y), i = 1, 2.    (3)
The third layer normalizes the rule outputs: the i-th node calculates the ratio of the i-th rule's firing strength to the sum of all rules' firing strengths. Marked N, it is expressed as:

  o³i = ϖi = ωi / (ω1 + ω2), i = 1, 2.    (4)
The fourth layer applies the consequent parameters to the normalized firing strengths; the output is:

  o⁴i = ϖi fi = ϖi (pi x + qi y + ri), i = 1, 2.    (5)
Therefore, each node of this layer can also be referred to as an adaptive node. The fifth layer has a single fixed node, which computes the total output from all its inputs:

  o⁵ = Σi ϖi fi = (Σi ωi fi) / (Σi ωi), i = 1, 2.    (6)
2.2 Parameter Optimization of ANFIS

Given the premise parameters, the output of ANFIS is:

  f = (ϖ1 x) p1 + (ϖ1 y) q1 + ϖ1 r1 + (ϖ2 x) p2 + (ϖ2 y) q2 + ϖ2 r2.    (7)

The hybrid algorithm defined in this article is divided into two parts. In the first part, least squares optimizes the consequent parameters of the ANFIS network. To this end, equation (7) is rewritten in matrix form, denoted f = D·X, where X = [p1, q1, r1, p2, q2, r2]. If there are n sets of input data, the dimensions of the matrices D and f are n×6 and n×1, respectively, and the error function is J = (1/2)‖f − DX‖. Minimizing this objective function by the method of least squares gives:

  X = (DᵀD)⁻¹Dᵀf.    (8)
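A compact sketch of the forward pass (2)-(6) and the least-squares step (8) follows. The Gaussian membership parameters, the sampling range, and the "true" consequent vector are illustrative assumptions of ours, not values from the paper.

```python
import math, random

def gauss(x, d, sigma):
    # Gaussian membership function from layer 1
    return math.exp(-((x - d) / sigma) ** 2)

# illustrative premise parameters (center d, width sigma) for A1, A2, B1, B2
PREM = ((0.0, 1.0), (2.0, 1.0), (0.0, 1.0), (2.0, 1.0))

def norm_weights(x, y):
    # layers 1-3: memberships, rule firing strengths (3), normalization (4)
    w1 = gauss(x, *PREM[0]) * gauss(y, *PREM[2])
    w2 = gauss(x, *PREM[1]) * gauss(y, *PREM[3])
    s = w1 + w2
    return w1 / s, w2 / s

def anfis_output(x, y, X):
    # layers 4-5 with consequent parameters X = [p1, q1, r1, p2, q2, r2]
    wn1, wn2 = norm_weights(x, y)
    return (wn1 * (X[0] * x + X[1] * y + X[2])
            + wn2 * (X[3] * x + X[4] * y + X[5]))

def solve(A, rhs):
    # small dense linear solver (Gauss-Jordan with partial pivoting)
    n = len(A)
    M = [A[i][:] + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(n):
            if i != col:
                fct = M[i][col] / M[col][col]
                M[i] = [a - fct * b for a, b in zip(M[i], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_consequents(samples, targets):
    # least-squares step (8): X = (D^T D)^{-1} D^T f via the normal equations
    D = []
    for x, y in samples:
        wn1, wn2 = norm_weights(x, y)
        D.append([wn1 * x, wn1 * y, wn1, wn2 * x, wn2 * y, wn2])
    DtD = [[sum(row[i] * row[j] for row in D) for j in range(6)] for i in range(6)]
    Dtf = [sum(row[i] * t for row, t in zip(D, targets)) for i in range(6)]
    return solve(DtD, Dtf)

random.seed(0)
X_true = [1.0, 2.0, 3.0, -1.0, 0.5, 2.0]
samples = [(random.uniform(0, 2), random.uniform(0, 2)) for _ in range(50)]
targets = [anfis_output(x, y, X_true) for x, y in samples]
X_hat = fit_consequents(samples, targets)
print(max(abs(a - b) for a, b in zip(X_hat, X_true)))
```

On noiseless data generated by the same model, the least-squares step recovers the consequent parameters up to numerical round-off.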
With the consequent parameters calculated, the error of the previous step is fed back into the algorithm to compute the premise parameters of the ANFIS network. The BP algorithm is commonly used here, but since its gradient descent easily falls into local minima, this paper adopts particle swarm optimization for the premise parameters. Particle swarm optimization evolved from the way a flock of birds in flight always maintains the integrity of its formation. The algorithm views the optimization problem as a particle space: suppose there are m particles; each particle dynamically adjusts its flight according to its own position as well as the positions of its companions, so as to determine the current best location. Accordingly, the algorithm defines for each particle a position vector xi = (xi1, xi2, ..., xid) and a velocity vector vi = (vi1, vi2, ..., vid). During the adjustment process, the best position found so far by a single particle is recorded as pbest, and the best position found so far by all particles as gbest. In each iteration of the algorithm, the particle velocity and position vectors are adjusted according to equations (9) and (10):

  v_iD^(k+1) = w · v_iD^k + c1 · rand(·) · (pbest − x_iD^k) + c2 · rand(·) · (gbest − x_iD^k)    (9)
  x_iD^(k+1) = x_iD^k + v_iD^(k+1)    (10)
Therefore, the particle swarm algorithm is used to calculate the premise parameters of ANFIS as follows.

First, when a fault occurs in the analog circuit, the circuit point voltages can be measured, so wavelet technology is used to extract the fault features from the point voltage. The training sample set is T = {(xi, y) | i = 1, 2, ..., n} and the test sample set is T′ = {(xi′, y′) | i = 1, 2, ..., m}, where y denotes the fault type. The ANFIS premise parameters and the particle swarm velocity vectors v are initialized with random numbers in (0, 1). The PSO algorithm defines the cut-off number of iterations N_max = 100, an initial population size of 50, and weight factors c1 = c2 = 1.995; the weight function w of the particle swarm algorithm is reduced linearly from 0.8 to 0.3 according to equation (11):

  w = w_max − ((w_max − w_min) / T_max) · T.    (11)
Second, after initialization, based on the first part of the hybrid algorithm, the least-squares error function is used to evaluate the fitness of each particle. During the PSO computation, if a particle's fitness value in the current generation is better than its fitness value in the previous generation, pbest is set equal to that particle's position; if a particle's fitness value is better than that of all the remaining particles of the whole group, gbest is set equal to that particle's position.

Third, the updated pbest and gbest values are substituted into equations (9) and (10) to calculate the particle velocity and position vectors.

Fourth, at the end of this round of the learning algorithm, if the current number of iterations has reached the preset maximum or the target error has reached the preset minimum, the final values of the premise parameters are obtained; otherwise, go back to the first step.
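The steps above can be sketched as follows. The sphere function stands in for the ANFIS least-squares error J and is our own illustrative choice; the population size, c1 = c2 = 1.995, N_max = 100, and the 0.8 → 0.3 inertia schedule follow the values stated in the text.

```python
import random

random.seed(1)
DIM, SWARM, N_MAX = 4, 50, 100
C1 = C2 = 1.995
W_MAX, W_MIN = 0.8, 0.3

def fitness(x):
    # stand-in for the least-squares error function J of the first part
    return sum(xi * xi for xi in x)

xs = [[random.random() for _ in range(DIM)] for _ in range(SWARM)]
vs = [[random.random() for _ in range(DIM)] for _ in range(SWARM)]
pbest = [x[:] for x in xs]
gbest = min(pbest, key=fitness)[:]
init_best = fitness(gbest)

for t in range(N_MAX):
    w = W_MAX - (W_MAX - W_MIN) * t / N_MAX          # inertia weight, eq (11)
    for i in range(SWARM):
        for d in range(DIM):
            vs[i][d] = (w * vs[i][d]
                        + C1 * random.random() * (pbest[i][d] - xs[i][d])
                        + C2 * random.random() * (gbest[d] - xs[i][d]))  # eq (9)
            xs[i][d] += vs[i][d]                                         # eq (10)
        if fitness(xs[i]) < fitness(pbest[i]):
            pbest[i] = xs[i][:]
            if fitness(pbest[i]) < fitness(gbest):
                gbest = pbest[i][:]

print(fitness(gbest) <= init_best)   # → True (gbest never gets worse)
```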
The second part of the hybrid algorithm uses the PSO algorithm to update the premise parameters, which changes the shape of the membership functions.

2.3 Fuzzy Clustering Process

The initial structure of the ANFIS model has a great influence on the diagnostic accuracy of the fault diagnosis model. Fuzzy clustering collects the inherent features of the training samples and regroups the data; treating each cluster as a data pair, the grouped clustering information can be used to generate a Sugeno fuzzy inference system. The commonly used methods are fuzzy C-means clustering and subtractive clustering. In this paper, subtractive clustering [11] is used: all data points are regarded as possible cluster centers, and the density of data points around each candidate is calculated; a data point with high density is selected as a cluster center, and the data points adjacent to that cluster center are eliminated as candidates; the remaining data points are treated in the same way to choose the other cluster centers, until the density of the remaining data points is less than the threshold set by the algorithm as the judging standard. Thus, the algorithm can automatically determine the initial structure of the Sugeno fuzzy inference system.
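The subtractive clustering procedure just described can be sketched as follows; the radii ra and rb and the stopping threshold are illustrative assumptions of ours, not values from the paper.

```python
import math, random

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def subtractive_clustering(points, ra=1.0, stop_ratio=0.15):
    # density of each point as a candidate cluster center
    dens = [sum(math.exp(-4.0 * dist2(p, q) / ra ** 2) for q in points)
            for p in points]
    rb = 1.5 * ra           # wider radius when subtracting a chosen center
    first = max(dens)
    centers = []
    while True:
        i = max(range(len(points)), key=lambda k: dens[k])
        if dens[i] < stop_ratio * first:
            break            # remaining densities fall below the threshold
        c, di = points[i], dens[i]
        centers.append(c)
        # subtract the selected center's influence from the remaining densities
        dens = [d - di * math.exp(-4.0 * dist2(p, c) / rb ** 2)
                for p, d in zip(points, dens)]
    return centers

random.seed(0)
pts = ([(random.gauss(0, 0.05), random.gauss(0, 0.05)) for _ in range(30)]
       + [(random.gauss(5, 0.05), random.gauss(5, 0.05)) for _ in range(30)])
print(len(subtractive_clustering(pts)))   # → 2 (one center per tight cluster)
```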
3 Simulation Results

The circuit under diagnosis is the half-wave rectifier filter circuit shown in Fig. 2. For this circuit, consider short circuits of D1, D2, R1 and R2, together with the normal state, for a total of five modes, described by the figures 1 to 5. With the failure modes defined, the PSPICE circuit simulation software is used: the test signal V1 = 15 sin(100πt) is imposed, and the wavelet transform of the measuring point voltage value Y1 using the db3 wavelet yields four sets of wavelet decomposition coefficients (d1, d2, d3, c3). The network input feature vector is defined as (D1, D2, D3, C3), where C3 = Σc3 and Dj = Σdj (j = 1, 2, 3). The fault features are analyzed with the Monte Carlo method, giving 50 samples for each failure mode; 30 samples of each group are used for training and 20 samples for testing. ANFIS has four input vectors and there are five failure modes as outputs, but the ANFIS network is a single-output network, so for this multi-input multi-output system some improvements to ANFIS are needed. The fault diagnosis model establishes one sub-model per failure mode: n ANFIS sub-models are trained; when training is completed, the test data are passed through each of the n ANFIS subsystems to calculate the actual output values yi (i = 1, 2, ..., n) of all sub-models and the errors from the ideal output value, ei = |1 − yi|; if en = min(e1, e2, ..., ei), the n-th kind of failure is diagnosed. The circuit was diagnosed with both the ANFIS and the PSO-ANFIS approach of this article; part of the diagnostic results is extracted for analysis, and the results are shown in Table 1.
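The sub-model decision rule just described — each single-output ANFIS ideally answers 1 for "its" fault, and the fault with the smallest deviation ei = |1 − yi| wins — can be sketched as:

```python
def diagnose(outputs):
    # outputs: actual output values y_i of the n ANFIS sub-models;
    # returns the 0-based index of the fault with minimal e_i = |1 - y_i|
    errors = [abs(1.0 - y) for y in outputs]
    return min(range(len(errors)), key=lambda i: errors[i])

# e.g. sub-model 3 responds closest to 1, so fault mode 3 is reported
print(diagnose([0.2, 0.4, 0.1, 0.97, 0.3]))   # → 3
```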
As can be seen, the diagnostic precision of the PSO-ANFIS method is significantly higher than that of the BP network. With the introduction of particle swarm optimization of the ANFIS network structure parameters, the system's minimum error is 0.001, while the plain ANFIS method can only reach 0.1; moreover, the PSO-ANFIS method markedly improves the convergence rate, as shown in Fig. 3 and Fig. 4.
Fig. 2. Half-Wave Rectifier Filter Circuit

Table 1. Part of the diagnostic results

Failure   Expected   Diagnostic value of ANFIS   Diagnostic value of PSO-ANFIS
R1        1          0.9692                      0.9975
R2        2          1.9709                      2.0036
D1        3          3.0638                      3.0058
D2        4          3.9728                      3.9986
Normal    5          4.9803                      4.9989
Fig. 3. ANFIS fitness function curve
Fig. 4. PSO-ANFIS fitness function curve
4 Conclusion

This paper presents a new analog circuit fault diagnosis method based on PSO-ANFIS, using wavelet techniques to extract the fault feature information of the circuit. Because the traditional hybrid learning algorithm has some problems in calculating the premise parameters of the model, the PSO algorithm is introduced to constitute a new hybrid learning algorithm. Fault diagnosis of the half-wave rectifier filter circuit verified the validity and usefulness of the method in improving the diagnostic performance of the system.
References

1. Aminianm, S., Aminian, F.: Neural-network based analog-circuit fault diagnosis using wavelet transform as preprocessor. IEEE Transactions on Circuits and Systems 47(2), 151–156 (2000)
2. Martin, H.T.: Back propagation neural network design. China Machinery Press, Beijing (2002)
3. Stopjakova, V., Micusik, D., Benuskova, L., et al.: Neural networks-based parametric testing of analog IC. In: Proc. of IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, pp. 408–416 (2002)
4. He, Y., Ding, Y., Sun, Y.: Fault diagnosis of analog circuits with tolerances using artificial neural networks. In: Proc. IEEE APCCAS, Tianjin, pp. 292–295 (2000)
5. Zhang, C., He, G., Liang, S., Chen, G.: Fault diagnosis of analog circuits in control system based on information fusion. Journal of Harbin Engineering University 28(12), 1312–1315 (2007)
6. Zhu, D., Yu, S., Tian, Y.: Circuit Fault Diagnosis Based on Fuzzy Data Fusion. Mini-Micro Systems 23(5), 633–635 (2002)
7. Song, X., Huang, D.: Application of improved ANFIS in chemical process fault diagnosis. Journal of East China University of Technology 32(8), 985–988 (2006)
8. Liu, Y., Zhang, T., Wen, B., Cao, W.: Fault diagnosis of diesel engine based on ANFIS. Journal of System Simulation 20(21), 5836–5839 (2008)
9. Xiao, Z., Li, Z., Zhang, Z.: Application of ANFIS neural fuzzy inference engine to fault diagnosis for flood gate integrated automation. Engineering Journal of Wuhan University 37(2), 41–44 (2004)
10. Jang, J.S.: ANFIS: Adaptive-network-based Fuzzy Inference System. IEEE Transactions on Systems, Man and Cybernetics 23(3), 665–685 (1993)
11. Yager, R.R., Filev, D.P.: Approximate clustering via the mountain method. IEEE Transactions on Systems, Man and Cybernetics 24(8), 1279–1284 (1994)
Chaos Optimization SVR Algorithm with Application in Prediction of Regional Logistics Demand Haiyan Yang, Yongquan Zhou, and Hongxia Liu College of Mathematics and Computer Science Guangxi University for Nationalities Nanning, Guangxi 530006, China [email protected]
Abstract. In this paper we explore using support vector regression (SVR), based on the structural risk minimization principle of statistical learning theory, for regional logistics demand forecasting. Aiming at the blindness of the manual choice of the parameters and kernel function of SVR, we apply a chaos optimization method to select the parameters of SVR. The proposed approach is used for forecasting the logistics demand of Shanghai. The experimental results show that the method obtains smaller training and testing relative errors. Keywords: Support vector regression (SVR); Chaos optimization; Regional logistics demand.
To address this shortcoming, chaos optimization theory is applied to SVM parameter selection in this paper, yielding a chaotic support vector regression (SVR) algorithm. The algorithm is applied to regional logistics demand forecasting problems. Taking the logistics demand forecasting of Shanghai as an example, the simulation results show that the method has smaller training and testing relative errors and achieves fairly good results in regional logistics demand forecasting.
2 Basic SVR Theory

The support vector machine [4][5] is a machine learning algorithm based on statistical learning theory. It adopts the SRM criterion, which minimizes the sample-point error while reducing the upper bound of the model's generalization error, thereby improving the model's generalization ability. In the function approximation problem, for the data set {xi, yi}, i = 1, 2, ..., n, xi ∈ R^d, yi ∈ R, a linear function f(x) = ωx + b is used for the regression, where ε is the fitting accuracy and ξi ≥ 0, ξi* ≥ 0 are relaxation (slack) factors. According to the SRM criterion, the fitting function f(x) should minimize

  (1/2) |ω|² + C Σ_{i=1}^n (ξi + ξi*),

where C > 0 is the penalty factor, subject to the conditions:

  yi − ω·xi − b ≤ ε + ξi,    (1)
  ω·xi + b − yi ≤ ε + ξi*,    (2)
  ξi ≥ 0 and ξi* ≥ 0, i = 1, 2, ..., n.    (3)
By introducing Lagrange multipliers a_i, a_i*, the dual of this optimization problem is obtained: maximize the objective function

W(a, a*) = −ε Σ_{i=1}^{n} (a_i + a_i*) + Σ_{i=1}^{n} y_i (a_i − a_i*) − (1/2) Σ_{i,j=1}^{n} (a_i − a_i*)(a_j − a_j*)(x_i · x_j)  (4)

subject to the constraints

Σ_{i=1}^{n} (a_i − a_i*) = 0, 0 ≤ a_i, a_i* ≤ C, i = 1, 2, ..., n.

So the regression function is then

f(x) = (ω · x) + b = Σ_{i=1}^{n} (a_i* − a_i)(x_i · x) + b*  (5)
For non-linear function fitting, using an appropriate inner-product kernel function k(x_i, x_j) achieves a linear approximation after the nonlinear transformation, so the fitting function is then

f(x) = (ω · x) + b = Σ_{i=1}^{n} (a_i* − a_i)(φ(x_i) · φ(x)) + b = Σ_{i=1}^{n} (a_i* − a_i) k(x_i, x) + b  (6)

Different inner-product kernel functions yield different SVR algorithms. Common kernel functions include the polynomial kernel, the Gaussian radial basis function (RBF) kernel, the K-kernel function [6], and the sigmoid kernel. A number of experiments show that the K-kernel function has good generalization ability, accuracy, and learning efficiency, better than the RBF kernel, so we adopt the K-kernel function in this paper.
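As a concrete illustration of the prediction rule (6), the sketch below evaluates f(x) = Σ(a_i* − a_i)k(x_i, x) + b for given dual coefficients. The Gaussian RBF kernel is used here only as a stand-in for the K-kernel of [6] (whose exact form is not reproduced above), and the support vectors, coefficients, and bias are hypothetical, e.g. as produced by a dual solver:

```python
import math

def rbf_kernel(xi, x, gamma=0.5):
    """Gaussian RBF kernel k(xi, x) = exp(-gamma * ||xi - x||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, x))
    return math.exp(-gamma * sq_dist)

def svr_predict(support_vectors, dual_coefs, b, x, kernel=rbf_kernel):
    """Evaluate Eq. (6): f(x) = sum_i (a_i* - a_i) k(x_i, x) + b."""
    return sum(c * kernel(xi, x) for xi, c in zip(support_vectors, dual_coefs)) + b

# Hypothetical support vectors, dual coefficients (a_i* - a_i), and bias:
svs = [(0.0, 0.0), (1.0, 1.0)]
coefs = [0.5, -0.3]
b = 0.1
f0 = svr_predict(svs, coefs, b, (0.0, 0.0))
```

Swapping in a different `kernel` callable (e.g. the K-kernel) changes only the similarity measure; the prediction rule itself is unchanged.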
3 Optimizing Parameters of SVR

Chaos has the characteristics of randomness, regularity, and periodicity. The basic idea of chaos optimization is to transform the variables from the chaos space to the solution space, and then use these characteristics of chaotic motion to search [7][8]. In view of the blindness and randomness shown by the manual selection of support vector machine parameters, the chaos optimization method is used in this paper to select the parameters. The specific steps are as follows:

Step 1: Set k = 1 and fix the initial optimizing vector x^1 (the parameters and kernel parameter), with T the maximum allowed number of iterations. Compute t_i^1 = (x_i^1 − a_i)/(b_i − a_i) and f(x^1), where t_i^1 is the chaos variable, f(x^1) is the forecast output accuracy (percentage) obtained under parameters x^1, and a_i, b_i are the range bounds of x_i^1. Let x* = x^1, f* = f(x^1).

Step 2: Generate the chaos variables t_i^{k+1} = 4 t_i^k (1 − t_i^k) by the Logistic map, set x_i^{k+1} = a_i + (b_i − a_i)(t_i^{k+1})^{2.5}, and work out f(x^{k+1}).

Step 3: Compare f(x^{k+1}) with f(x^k): if f(x^{k+1}) > f(x*), then set x* = x^{k+1}, f* = f(x^{k+1}) and go to Step 4; else set k = k + 1 and go to Step 2.

Step 4: If k < T, set k = k + 1 and go to Step 2; else output x* as the current optimized parameters of the SVR.
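Steps 1 to 4 can be sketched as a small search loop. The objective below is a toy stand-in for the SVR forecast accuracy on a validation set; the seed value and the details of the carrier mapping are assumptions:

```python
def chaos_optimize(f, bounds, T=200, seed=0.345):
    """Chaos optimization via the Logistic map t_{k+1} = 4 t_k (1 - t_k) (Steps 1-4).

    f      -- objective to maximize (here standing in for SVR forecast accuracy)
    bounds -- list of (a_i, b_i) parameter ranges
    T      -- maximum number of iterations
    """
    # Step 1: initial chaos variables; avoid values (0, 0.25, 0.5, 0.75)
    # that collapse onto fixed points of the Logistic map.
    t = [(seed + 0.1 * i) % 1.0 for i in range(len(bounds))]
    x_best = [a + (b - a) * ti for (a, b), ti in zip(bounds, t)]
    f_best = f(x_best)
    for _ in range(T):
        # Step 2: iterate the Logistic map and map back into the parameter
        # ranges (the paper's carrier applies the exponent 2.5).
        t = [4.0 * ti * (1.0 - ti) for ti in t]
        x = [a + (b - a) * ti ** 2.5 for (a, b), ti in zip(bounds, t)]
        fx = f(x)
        # Step 3: keep the best point found so far.
        if fx > f_best:
            x_best, f_best = x, fx
    # Step 4: after T iterations, return the best parameters.
    return x_best, f_best

# Toy objective with its maximum at (2, -1), standing in for validation accuracy:
obj = lambda p: -(p[0] - 2.0) ** 2 - (p[1] + 1.0) ** 2
params, score = chaos_optimize(obj, [(0.0, 5.0), (-3.0, 1.0)])
```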
4 Regional Logistics Demand Forecasting Experiments

4.1 Experimental Data

In order to verify the effectiveness of the algorithm, we take the logistics demand forecasting of Shanghai as an example, based on the factors influencing regional logistics
demand factors. Taking into account the impact of the various factors, and in line with the principle of operability, the economic indicators we select to forecast the scale of logistics demand are as follows: output value of the primary industry, output value of the secondary industry, output value of the tertiary industry, regional retail sales, the region's total foreign trade, and per capita consumption level. The size of Shanghai's logistics demand and the statistical data on these economic indicators (1978-2003) are shown in Table 1; see reference [9].

Table 1. Shanghai logistics demand and statistical data on economic indicators (1978-2003) [table omitted]
In order to better optimize the various parameters of the SVR and to reduce the computational complexity, the collected data are normalized using x'_ij = x_ij / x_max,i. In
this formula, x'_ij is the normalized value of the j-th data item of the i-th indicator; x_ij is the actual value of the j-th data item of the i-th indicator; and x_max,i is the maximum value of all the data of the i-th indicator. So that the SVR obtains better convergence, the maximum normalized result is taken as 0.9999 rather than 1. We take the samples of 1978-1995 as the training samples and those of 1996-2003 as the test samples.

Table 2. The size of Shanghai logistics demand forecasting results [table omitted]
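The per-indicator normalization described above can be sketched as follows; the row-per-indicator data layout and the use of 0.9999 as a multiplicative ceiling are assumptions:

```python
def normalize(data, ceiling=0.9999):
    """x'_ij = x_ij / x_max,i per indicator (row), scaled so that the
    largest normalized value is `ceiling` rather than exactly 1, which,
    as noted in the text, gives the SVR better convergence."""
    out = []
    for row in data:
        row_max = max(row)
        out.append([ceiling * x / row_max for x in row])
    return out

# Hypothetical indicator rows (values of one economic indicator over several years):
sample = [[346.0, 410.0, 385.0],
          [114.6, 132.4, 103.5]]
norm = normalize(sample)
```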
Fig. 1. The predictive value of non-optimized SVR and Chaos optimized SVR
4.3 Results
In the MATLAB 6.5 environment, training and testing experiments are simulated for the SVR model of regional logistics demand forecasting. Using the chaos optimization method to select the SVR parameters gives the learning factor C = 367.6216 and the kernel parameter P1 = -0.0208. The simulation results on the test samples are shown in Table 2. Generally speaking, the prediction results of the chaos-optimized SVR are closer to the true values than those of the non-optimized SVR, with smaller relative errors and a higher degree of fit. The model, to some extent, reflects the complex mapping relationship between regional logistics demand and its influencing factors. Figure 1 is a visual representation of the fitting curves, comparing the non-optimized SVR predictions and the chaos-optimized SVR predictions with the actual results. The chaos-optimized SVR can be seen to be basically in line with the actual values, whereas the non-optimized SVR has larger relative errors and lower fitting precision. Figure 2 shows the errors between the predicted values and the real values: * marks the error between the non-optimized SVR prediction and the actual value (the error of N-O SVR), and o marks the error between the chaos-optimized SVR prediction and the true value (the error of CO-SVR).
Fig. 2. The error curves of predicted value and true value
5 Conclusions

In this paper, the introduction of the chaos optimization algorithm overcomes the blindness and randomness of the manual selection of support vector machine parameters, ensuring the accuracy of the fitting results. Through the use of the highly nonlinear function of
the support vector machine, a regional logistics SVR prediction model is established from the intrinsic relationship between the regional economy (and other factors) and regional logistics demand, revealing the inherent non-linear mapping relationship between the regional economy and regional logistics demand. Taking the logistics demand of Shanghai as an example, the prediction model is confirmed to be highly adaptable, strong in learning, fast in convergence, and high in accuracy. The results show that the chaos-optimized support vector regression machine makes more accurate forecasts of regional logistics demand with a more reasonable choice of parameters.

Acknowledgements. This work is supported by Grant 60461001 from the NSF of China and by Grants 0832082 and 0991086 from the Guangxi Science Foundation.
References

1. Wang, Y.H.: Logistics Demand Forecast of a Linear Regression Analysis. Shopping Center Modernization 34, 136 (2006) (in Chinese)
2. Chen, S., Zhou, F.: Based on Gray System Theory Logistics Demand Forecasting Model. Statistics and Decision 3, 59-60 (2006) (in Chinese)
3. Chu, L.Y., Tian, Z.G., Xie, X.L.: Combination Forecasting Model in Logistics Demand Forecasting. Journal of Dalian Maritime University 30(4), 43-46 (2004) (in Chinese)
4. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995)
5. Deng, N.Y., Tian, Y.J.: A New Method of Data Mining: SVM. Science Press, Beijing (2004) (in Chinese)
6. Sun, C.J.: Based on K-kernel Function Support Vector Machine. Journal of Huaihai Institute of Technology 12, 1672-1685 (2006) (in Chinese)
7. Wang, L., Zheng, D.Z., Li, Q.S.: Chaos Optimization Algorithm Research. Computing Technology and Automation 20(1), 1-5 (2001) (in Chinese)
8. May, R.: Simple Mathematical Models with Very Complicated Dynamics. Nature 261, 459-467 (1976)
9. Hou, R., Zhang, B.X.: MLP Neural Network Based Regional Logistics Demand Forecasting Method and Its Application. Systems Engineering Theory and Practice 25(12), 43-47 (2005) (in Chinese)
Cooperation Partners Selection for Multiple-Core-Type MPN

Shuili Yang, Taofen Li, and Yu Dong

Faculty of Economics and Management, Xi'an University of Technology, Xi'an, China
[email protected], [email protected], [email protected]
Abstract. The key to cooperation partner selection in a multiple-core-type modular production network (MPN) is the choice of core cooperation partners. Building on an analysis of partner selection in multiple-core-type modular production networks, this paper suggests a comprehensive evaluation method that chooses the cooperation partners according to the principles of the lowest total cost, the shortest reaction time, and the minimum risk, through two stages: the initial selection of core partners and the comprehensive evaluation of core partner combinations.

Keywords: Modularization, Modular Production Network, Cooperation Partner Selection.
The above-mentioned evaluation methods are workable when there are not many partner candidates. Once the number of candidates reaches 100 or more, the workload becomes too great. Although selection and optimization based on genetic algorithms can search for partners satisfying the conditions among many candidate enterprises, genetic algorithms lack the ability to guarantee the optimum individual, so the search slows down as it nears the optimal solution and may even fall into a local solution, which prevents practical use. In past studies of partner selection in network organizations, some scholars held that the best partner should meet three principles, i.e. lowest total cost, agility, and lowest risk; this is the so-called union of the strong. However, uniting the strong does not mean that the whole will be the best. This article aims to optimize the selection of core partner combinations by comprehensively judging core partner combinations. The cooperation partners of a multiple-core modular production network are divided into two categories: key partners and ordinary partners. For ordinary partners, the selection method is the same as for single-core network cooperators. The key production enterprises, together with the system integrator, constitute the core enterprises of the multiple-core network, so core partner selection is crucial to partner selection in a multiple-core modular production network. The following sections discuss core partner selection in two phases: primary selection of core partners and comprehensive evaluation of core partner combinations.
2 Primary Selection of Core Partners

When forming a multiple-core modular production network, the evaluation standard for the primary selection of core partners is the module customization capability of the modular production enterprises. The formation of the network begins with the system integrator's recognition of a market opportunity; according to the modular decomposition scheme of the product structure, the integrator determines the core modules and their types and quantities. Primary selection consists of two parts: bidding and filtering. Bidding is the process in which the integrator invites public bids and seeks production enterprises. The module customization capacity of the tendering enterprises must be investigated, and a high customization standard should be set for each module's bidding enterprises to check whether they have the core production capability. Next, the enterprises are asked to estimate their time and cost consumption for the subsequent filtering. The bidding enterprises are first grouped by module production category. The integrator then sets the same time and cost standards for each module; an enterprise whose time or cost consumption exceeds the standards is eliminated. The number of enterprises passing for each module should be controlled; four to six is a good range. If the number is too large, it greatly increases the workload of the subsequent comprehensive evaluation, while too few passers means losing potential partners. Accordingly, in practice, if there are too many primary-selection passers we should raise the standards, and vice versa (always subject to the cost and time requirements).
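The filtering step can be sketched as follows; the candidate data, the uniform per-module time and cost standards, and the dictionary layout are all hypothetical:

```python
def filter_candidates(candidates, time_std, cost_std):
    """Keep only bidders whose estimated time and cost are within the
    per-module standards set by the system integrator.

    candidates -- {module: [(enterprise, time, cost), ...]}
    """
    passers = {}
    for module, bids in candidates.items():
        passers[module] = [name for name, t, c in bids
                           if t <= time_std[module] and c <= cost_std[module]]
    return passers

# Hypothetical bids for two core modules:
bids = {
    "M1": [("A11", 10, 100), ("A12", 14, 90), ("A13", 9, 130)],
    "M2": [("A21", 8, 70), ("A22", 12, 95)],
}
kept = filter_candidates(bids,
                         time_std={"M1": 12, "M2": 10},
                         cost_std={"M1": 110, "M2": 80})
```

In practice the standards would be tightened or relaxed until each module keeps roughly four to six passers, as the text recommends.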
3 Comprehensive Evaluation of Core Partner Combinations

The comprehensive evaluation of core partner combinations follows the primary selection and emphasizes total cost, agility, and risk [7]. This article suggests a comprehensive judging method to select, from all core partner combinations, the combination with the lowest total cost, total reaction time, and total risk. Suppose a system integrator intends to construct a modular production network to respond to a market opportunity, decomposes the product according to the modularization scheme, finds the core modules to be M_1, M_2, ..., M_n, and thereupon calls for bids through the media. After the first screening of all the bidding enterprises, suppose that for core module M_i (i = 1, 2, ..., n) the manufacturing enterprises A_i1, A_i2, ..., A_iji pass; the number of possible core partner combinations is then Π j_i. The general expression of a core partner combination composed of the n module-type manufacturing enterprises is:

Γ_x = (A_1j1, A_2j2, ..., A_njn)  (1)
Every core partner combination is analyzed according to the principles of the lowest total cost, agility, and lowest risk. The lowest total cost is considered first. The cost of operating a modular production network consists of production cost, operation management cost, and coordination cost. Coordination cost is the expenditure on coordinating and controlling among the partners during the life cycle of the production network; operation management cost is the everyday management cost of the modular enterprise; and production cost is made up of the basic production cost and the customized production cost. If C_ijk^T denotes the production cost of module M_i in manufacturer A_jk^Mi (jk = 1, 2, ..., j_i) and C_ijk^B denotes the basic production cost of the base-type module, then the total production cost of the module satisfies:

C_ijk^T = C_ijk^B + ΔC_jk^Mi  (2)

where ΔC_jk^Mi is the customization cost of module M_i in that manufacturing enterprise. Since ΔC_jk^Mi increases with the degree of customization difficulty of the module, its value shows a trend; for simplicity, the average customization cost ΔC_i over a certain time section can be used to measure it. Therefore, the total production cost of module M_i in production enterprise A_jk^Mi is C_ijk^T = C_ijk^B + ΔC_i.

If C_ijk^Y expresses the operation and management cost of module M_i's production enterprise A_jk^Mi and C_ijk^H is its coordination cost, then the total cost of the core partner combination constituted by the n core modules equals the sum of the production cost, the operation and management cost, and the coordination cost of each core module production enterprise:
C_x = Σ_{i=1}^{n} C_ijk^Y + Σ_{i=1}^{n} C_ijk^H + Σ_{i=1}^{n} [C_ijk^B + ΔC_jk^Mi]  (3)
The second consideration is agility. Agility refers to the speed with which each candidate core partner responds to the production tasks assigned by the system integrator, by rapidly integrating its resources through its management mechanisms, methods, and measures (including communication means, information management, resource integration, and management functions). The judging standard of a partner's agility is the time used for filling orders, so the total agility of a core partner combination is judged by its response time to a customer's order. Assuming that the production network receives a customer order, the order passes through four key links: order analysis, module design, module fabrication, and module assembly. Order analysis and module assembly are completed by the system integrator, while module design and module fabrication are completed by each module manufacturing enterprise. Letting T_D and T_Z be the system integrator's order analysis and module assembly times, T_ijk^S the design time of module i by enterprise jk, and T_ijk^M the fabrication time of module i by enterprise jk, the total agility of core partner combination x formed by the n core modules is the aggregate of the four kinds of times:

T_x = T_D + T_Z + T_ijk^S + T_ijk^M  (4)
The last consideration is the lowest risk. The risk in selecting modular network partners refers to the cooperation risk between the system integrator and each core partner. This risk has two aspects, i.e. the probability of the risk occurring and the loss it is likely to cause, neither of which can be calculated precisely, so an expert group's evaluation is adopted. For the risk probability, the candidate core partners for each module can be compared pairwise to obtain the risk factor of each candidate, and the analytic hierarchy process is then used to work out the importance degrees, which serve as the probability evaluation values of each candidate partner causing risk. Accordingly, the risk of candidate partner jk for module i can be measured through the hidden risk factor r_ijk of the cooperation partner, as the product of the expected value of the total loss it is likely to cause and the probability of its occurrence:

RE_ijk = E(r_ijk) × P(r_ijk)  (5)

The risk of the whole core partner combination x is then:

RK_x = Σ_{i=1}^{n} RE_ijk  (6)

To know the expected loss and its occurrence probability caused by the potential hidden risk factor of a candidate partner, the system integrator resorts to professional institutions or an expert group for assessment. Generally speaking, the higher the controllability of a risk, the lower the expected total loss and its occurrence probability.
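Eqs. (5)-(6) reduce to a probability-weighted sum of expected losses over the chosen core partners; a minimal sketch with hypothetical expert estimates:

```python
def combination_risk(expected_loss, probability):
    """RK_x = sum_i E(r_i) * P(r_i), per Eqs. (5)-(6): each chosen partner's
    expected total loss weighted by its estimated risk probability."""
    return sum(e * p for e, p in zip(expected_loss, probability))

# Hypothetical expert estimates for a three-module combination:
losses = [120.0, 80.0, 200.0]   # expected total loss per partner
probs = [0.05, 0.10, 0.02]      # probability of the risk occurring
rk = combination_risk(losses, probs)
```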
According to the above analysis, among all the core partner combinations A_y (y = 1, 2, ..., Π j_i), there must exist combinations attaining C* = min C_y, T* = min T_y, and RK* = min RK_y. Since C*, T*, and RK* are respectively the lowest cost, the shortest response time, and the minimum risk over all core partner combinations, C* can serve as the cost measurement norm of each combination, whose total cost coefficient is δ_C = C_y / C*. Similarly, the agility coefficient δ_T = T_y / T* and the risk coefficient δ_R = RK_y / RK* of each core partner combination can be obtained.

If the importance degrees of total cost, total response time, and total risk given by the experts in the selection of core partners are λ_C, λ_T, and λ_R respectively, then for every candidate core partner combination the comprehensive evaluation coefficient is:

φ_Gi = δ_C · λ_C + δ_T · λ_T + δ_R · λ_R  (7)

Among the combinations of core partners there must exist φ* = min φ_Gi; the combination G* corresponding to φ* is optimal with respect to total cost, total response time, and total risk, i.e. the most ideal core cooperation partner combination.
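The whole evaluation of Eq. (7) over the candidate combinations can be sketched as follows; the cost, time, and risk figures and the expert weights λ are hypothetical:

```python
def best_combination(combos, weights):
    """Pick the combination minimizing Eq. (7):
    phi = (C/C*)*lambda_C + (T/T*)*lambda_T + (RK/RK*)*lambda_R."""
    c_star = min(c for c, t, r in combos.values())
    t_star = min(t for c, t, r in combos.values())
    r_star = min(r for c, t, r in combos.values())
    lc, lt, lr = weights
    phi = {name: (c / c_star) * lc + (t / t_star) * lt + (r / r_star) * lr
           for name, (c, t, r) in combos.items()}
    best = min(phi, key=phi.get)
    return best, phi

# Hypothetical (total cost, total time, total risk) per core partner combination:
candidates = {
    "G1": (1000.0, 30.0, 18.0),
    "G2": (1100.0, 25.0, 20.0),
    "G3": (950.0, 40.0, 25.0),
}
choice, scores = best_combination(candidates, weights=(0.5, 0.3, 0.2))
```

Note that because each δ is a ratio against the best value, every φ_Gi is at least λ_C + λ_T + λ_R, and only a combination that is best on all three criteria could attain that bound.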
4 Conclusion

Based on a summary and analysis of partner selection research for multiple-core modular production networks, this article suggests a new method, the comprehensive judging of core partner combinations, to select partners in such networks. The method is not based on evaluating the capability of a single candidate; instead it comprehensively judges all core partner combinations according to the principles of the lowest total cost, the shortest response time, and the lowest total risk, and chooses the optimum combination. The method is easy to operate, and the C language can be used to convert it into corresponding calculation programs, enormously reducing the calculation workload, so as to provide enterprises a feasible way of finding cooperation partners and responding agilely to market opportunities and customers' orders.

Acknowledgements. This project is financed by: the Soft Sciences Research Fund of Xi'an in 2005 (HJ05002-2); the Sciences Research Fund of Shaanxi Province (2006KR100, 2008KR09); the Sciences Research Fund of Xi'an University of Technology in 2007 (107-210705); the Special Scientific Research Fund of the Shaanxi Province Education Committee in 2009 (09JK160); and the Soft Sciences Research Fund of Xi'an in 2010.
References

1. Yong-hui, M., He-ao, C., Shu, Z.: Selection Method for Design Partners in Network Extended Enterprises. Chinese Journal of Mechanical Engineering 36 (2000)
2. Zheng, Q., Bing-heng, L.: The Integration Decision of Agile Manufacturing. China Mechanical Engineering 14 (1997)
3. Chen, J., Feng, W.-d.: Structure and Management of the Virtual Enterprise. Tsinghua University Press (2001)
4. Yong-ling, Y., Ya-qing, Z.: The Exploration of Optimal Selection Method of Cooperation Partners of Virtual Enterprise. Soft Science 2 (2004)
5. Xian-hua, W., Lie-ping, Z.: Decision Making Method on Partner Selection of Virtual Enterprise and the Establishment of Strategic Analysis Model. Systems Engineering 16 (1998)
6. Kang, J., Zhen-hua, Y., Guo-xing, H.: Study of Partner Selection in Manufacturing Extended Enterprise. Modular Machine Tool & Automatic Manufacturing Technique 10 (2007)
7. Shui-li, Y.: Synthetic Judgement Method for Cooperative Partner Selection of the Virtual Enterprise. Operations Research and Management Science 5 (2003)
A New Technique for Forecast of Surface Runoff

Lihua Feng¹ and Juhua Zheng²

¹ Zhejiang Normal University, Zhejiang Jinhua 321004, China
² Hangzhou Vocational & Technical College, Zhejiang Jiande 311604, China
[email protected]
Abstract. Wet-and-low water changes of surface runoff have always been the focus of various researches. Regional water bodies are already being challenged by economic development. Artificial Neural Networks (ANN) deal with information through interactions among neurons (or nodes), approximating the mapping between inputs and outputs based on non-linear functional composition. They have the advantages of self-learning, self-organizing, and self-adapting. It is practical to use ANN technology to carry out forecast of surface runoff. Keywords: ANN; water resources; surface runoff.
1 Introduction

Wet-and-low water changes of surface runoff have always been the focus of various researches [1]. As economies rapidly develop, many nations have faced shortages in water resources, especially those in areas prone to droughts and in medium to large metropolises. This has made the coordination of economic development with the use of water resources a significant problem [2]. Regional water bodies are already being challenged by economic development. In the conventional forecast of surface runoff, it is common to set up mathematical models or draw related graphs based on existing data; hence, it involves issues of pattern recognition [3]. Since Artificial Neural Network (ANN) technology has the advantages of self-learning, self-organizing, and self-adapting, it has many successful applications in pattern recognition [4]. Therefore, based on the principle and method of ANN, we study some related issues in the forecast of surface runoff in this note.
ANN algorithms include Hebbian, Delta, Kohonen, and BP. The BP (Error Back Propagation) algorithm was presented in 1985 by Rumelhart and his PDP group, realizing Minsky's idea of multilayer neural networks. A typical multilayer feed-forward neural network consists of a number of neurons connected together, usually arranged in layers. Its first layer is the input layer and its final layer is the output layer; all other layers are hidden layers, which contain the neurons that do the real work. A neural network that uses the error back propagation algorithm is called a BP network, and its learning process consists of a feed-forward pass and a feed-backward pass. In the feed-forward pass, each sample signal is applied to the Sigmoid function f(x) = 1/(1 + e^(−x)) before it is passed to the next layer, and the neurons of one layer can only affect the neurons of the next layer. If the output layer does not produce the desired value, the errors are fed back from the outputs to the inputs through the network, and the weights of the nodes in each layer are changed along the way. The algorithm repeats in this way until the error values are sufficiently small.

Let m be the number of layers, y_j^m denote the output of node j in layer m, y_j^0 = x_j denote the input at node j, W_ij^m be the weight of the connection between node i and node j, and θ_j^m be the threshold of node j in layer m. The BP network is trained as follows:

(1) Initialize each weight and threshold to a random value in (−1, 1).

(2) Select a pair of data (x^k, T^k) from the training data and present the inputs to the input layer, so that

y_i^0 = x_i^k (for all i),  (1)

where k denotes the number of iterations.

(3) Pass the signal forward using

y_j^m = F(s_j^m) = F(Σ_i W_ij^m y_i^{m−1} + θ_j^m),  (2)

processing the output at each node j from the first layer through the last layer, where F(s) is the Sigmoid function.

(4) Calculate the error for each node j in the output layer:

δ_j^m = y_j^m (1 − y_j^m)(T_j^k − y_j^m),  (3)

where the error is obtained from the difference between the actual output value and the desired target value.

(5) Calculate the error for each node j in each hidden layer:

δ_j^{m−1} = F′(s_j^{m−1}) Σ_i W_ij^m δ_i^m,  (4)

where the errors are fed back layer by layer, for m = m, m − 1, ..., 1.

(6) Change the weights and thresholds backward layer by layer:

W_ij^m(t + 1) = W_ij^m(t) + η δ_j^m y_i^{m−1} + α [W_ij^m(t) − W_ij^m(t − 1)],  (5)

θ_j^m(t + 1) = θ_j^m(t) + η δ_j^m + α [θ_j^m(t) − θ_j^m(t − 1)],  (6)

where t is the number of iterations, η ∈ (0, 1) is the learning rate, and α ∈ (0, 1) is the momentum value.

(7) Go to step (2) and start the next iteration; repeat (2) through (7) until the network error

E = Σ_k Σ_j (T_j^k − y_j^m)² / 2  (7)

is sufficiently small as expected. Once the network completes its training, its weights and thresholds are determined. Thus, we can start a calculation analysis.
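A compact, runnable sketch of steps (1)-(7) for a single-hidden-layer, single-output BP network follows. The training data, layer sizes, and epoch count are illustrative only, not the paper's Yamadu data:

```python
import math
import random

def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))

def train_bp(samples, n_hidden=5, eta=0.85, alpha=0.60, epochs=2000, seed=1):
    """One-hidden-layer BP network following steps (1)-(7): sigmoid
    feed-forward, error back-propagation, and updates with learning rate
    eta and momentum alpha. The last entry of each weight row is the
    node's threshold."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # (1) initialize weights and thresholds randomly in (-1, 1)
    w1 = [[rng.uniform(-1.0, 1.0) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden + 1)]
    dw1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]  # previous changes (momentum)
    dw2 = [0.0] * (n_hidden + 1)

    def forward(x):  # (3) feed-forward with the sigmoid at every node, Eq. (2)
        h = [sigmoid(sum(w * xi for w, xi in zip(row[:-1], x)) + row[-1]) for row in w1]
        y = sigmoid(sum(w * hi for w, hi in zip(w2[:-1], h)) + w2[-1])
        return h, y

    for _ in range(epochs):
        for x, target in samples:                       # (2) present a training pair
            h, y = forward(x)
            d_out = y * (1.0 - y) * (target - y)        # (4) output error, Eq. (3)
            d_hid = [hi * (1.0 - hi) * w2[j] * d_out    # (5) hidden error, Eq. (4)
                     for j, hi in enumerate(h)]
            for j in range(n_hidden):                   # (6) updates, Eqs. (5)-(6)
                dw2[j] = eta * d_out * h[j] + alpha * dw2[j]
                w2[j] += dw2[j]
            dw2[-1] = eta * d_out + alpha * dw2[-1]     # output-node threshold
            w2[-1] += dw2[-1]
            for j in range(n_hidden):
                for i in range(n_in):
                    dw1[j][i] = eta * d_hid[j] * x[i] + alpha * dw1[j][i]
                    w1[j][i] += dw1[j][i]
                dw1[j][-1] = eta * d_hid[j] + alpha * dw1[j][-1]
                w1[j][-1] += dw1[j][-1]
    return lambda x: forward(x)[1]

# Illustrative normalized samples (inputs and target in [0, 1]):
data = [((0.1, 0.9), 0.2), ((0.9, 0.1), 0.8), ((0.5, 0.5), 0.5)]
net = train_bp(data)
```

The momentum term stores the previous weight change, so `eta * delta * y + alpha * dw_prev` is exactly the update W(t+1) = W(t) + ηδy + α[W(t) − W(t−1)] of Eq. (5).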
3 Case and Testing

We demonstrate an application of ANN technology to the forecast of surface runoff in this section by examining the yearly average discharge at the Yamadu station on the Yili River, China, as shown in Table 1. The Yili River, in the northwest of Xinjiang, is a typical river of the drought area. By using period analysis with stepwise regression, we obtain three correlated factors: x1, the precipitation from November of the previous year to March of the current year at the Yili station; x2, the average 500 hPa latitudinal circulation index over Asia-Europe in August of the previous year; and x3, the average 500 hPa longitudinal circulation index over Asia-Europe in May of the previous year. Since the three factors are the inputs and the yearly average discharge Q of the Yamadu station is the output, there are three nodes in the input layer and one node in the output layer. The number of hidden-layer nodes is determined following Kolmogorov's law; five are used here. Hence, the ANN for forecast of surface runoff has the topological structure (3, 5, 1). In order to speed up convergence, the original data x_i are normalized as

x_i′ = (x_i − x_min)/(x_max − x_min),  (8)

where x_max and x_min denote the maximal and minimal values of the series, respectively; thus each x_i′ ∈ [0, 1]. We input x_i′ into the input layer of the BP algorithm and select training data to start the training and learning process, choosing the learning rate η = 0.85 and the momentum value α = 0.60. To test the BP algorithm after training and learning, we take the yearly average discharge Q of 1953-1970 as the training samples and that of 1971-1975 as the testing samples. After ten thousand rounds of training and learning on the training samples, the network error is E = 0.07, which is less than the expected error; thus the BP algorithm converges.
It is clear from Table 1 that the imitation is very good, since the average error e of the series is only 4.02% and the maximal error emax of the series is only -10.61%.
Since the trained network has imitated and memorized the functional relationship between the inputs and the output, it can be used for the forecast of surface runoff. It is obvious from Table 1 that the test results for the yearly average discharge Q of 1971-1975 are good, since the average error e of the series is only 11.50% and the maximal error emax of the series is only 19.60%.

Table 1. The yearly average discharge of Yamadu Station and its calculating results

         Year    x1     x2    x3     Q    Fit value  Error (%)
Training 1953   114.6  1.10  0.71   346   362.53       4.78
sample   1954   132.4  0.97  0.54   410   425.44       3.77
         1955   103.5  0.96  0.66   385   385.97       0.25
         1956   179.3  0.88  0.59   446   480.67       7.77
         1957    92.7  1.15  0.44   300   327.70       9.23
         1958   115.0  0.74  0.65   453   468.60       3.44
         1959   163.6  0.85  0.58   495   475.31      -3.98
         1960   139.5  0.70  0.59   478   481.92       0.82
         1961    76.7  0.95  0.51   341   337.03      -1.16
         1962    42.1  1.08  0.47   326   304.53      -6.58
         1963    77.8  1.19  0.57   364   325.39     -10.61
         1964   100.6  0.82  0.59   456   428.28      -6.08
         1965    55.3  0.96  0.40   300   297.92      -0.69
         1966   152.1  1.04  0.49   433   419.93      -3.02
         1967    81.0  1.08  0.54   336   345.38       2.79
         1968    29.8  0.83  0.49   289   304.98       5.53
         1969   248.6  0.79  0.50   483   491.50       1.76
         1970    64.9  0.59  0.50   402   401.87      -0.03
Testing  1971    95.7  1.02  0.48   384   359.84      -6.29
sample   1972    89.8  0.96  0.39   314   348.52      11.00
         1973    21.8  0.83  0.60   401   460.74      14.90
         1974    78.5  0.89  0.44   280   334.89      19.60
         1975    90.0  0.95  0.43   301   318.19       5.71

To roughly estimate the future trend of surface runoff in the Yili River, the years can be divided into three types: 1, low-water year (Q < 320 m3/s); 2, mid-water year (320 ≤ Q < 400 m3/s); 3, wet-water year (Q ≥ 400 m3/s), which are encoded as (1, 0, 0), (0, 1, 0), and (0, 0, 1), respectively (Table 2). Therefore the three factors are the inputs while low-water,
mid-water and wet-water years are the outputs. Hence, the ANN in forecast of surface runoff has the topological structure (3, 5, 3).

Table 2. Three types of the yearly average discharge of Yamadu Station and its calculating results

         Year    Q    Real  Expected  Network output            Fit   Accord
                      type  pattern   pattern                   type
Training 1953   346    2    0 1 0     0.0000 1.0000 0.0017       2      √
sample   1954   410    3    0 0 1     0.0000 0.0525 0.9709       3      √
         1955   385    2    0 1 0     0.0000 0.9999 0.0322       2      √
         1956   446    3    0 0 1     0.0000 0.0020 0.9988       3      √
         1957   300    1    1 0 0     0.9624 0.0820 0.0007       1      √
         1958   453    3    0 0 1     0.0000 0.0260 0.9949       3      √
         1959   495    3    0 0 1     0.0000 0.0017 0.9989       3      √
         1960   478    3    0 0 1     0.0000 0.0013 0.9991       3      √
         1961   341    2    0 1 0     0.0017 0.8661 0.0254       2      √
         1962   326    2    0 1 0     0.0660 0.9790 0.0000       2      √
         1963   364    2    0 1 0     0.0000 1.0000 0.0001       2      √
         1964   456    3    0 0 1     0.0000 0.0626 0.9854       3      √
         1965   300    1    1 0 0     1.0000 0.0005 0.0037       1      √
         1966   433    3    0 0 1     0.0017 0.0168 0.9970       3      √
         1967   336    2    0 1 0     0.0000 1.0000 0.0002       2      √
         1968   289    1    1 0 0     0.9354 0.0706 0.0015       1      √
         1969   483    3    0 0 1     0.0179 0.0001 0.9999       3      √
         1970   402    3    0 0 1     0.0399 0.0001 0.9997       3      √
Testing  1971   384    2    0 1 0     0.0519 0.4113 0.0434       2      √
sample   1972   314    1    1 0 0     0.9992 0.0000 0.5822       1      √
         1973   401    3    0 0 1     0.0014 0.0112 0.9965       3      √
         1974   280    1    1 0 0     0.9944 0.0000 0.0577       1      √
         1975   301    1    1 0 0     0.9966 0.0004 0.0041       1      √

Based on the same calculation (η = 0.85, α = 0.60, E = 0.03), after ten thousand rounds of training and learning on the training samples, the fit types and forecast types of the yearly average discharge Q of Yamadu Station are obtained (Table 2). The table makes clear that the fit types of 1953-1970 and the forecast types of 1971-1975 accord with the real types.
L. Feng and J. Zheng
4 Conclusion

ANN deals with information through interactions among neurons (nodes) and approximates the mapping between inputs and outputs by composing non-linear functions. It has the advantages of self-learning, self-organization and self-adaptation. It is therefore practical to use ANN technology for the forecast of surface runoff, as our calculation results have confirmed. Aiming at the issues in the forecast of surface runoff, this note has preliminarily set up a system of calculation and analysis based on ANN technology, and we have developed applied software along with our research. This is a new attempt in the forecast of surface runoff; combined with other algorithms, it should further improve the accuracy and level of surface runoff forecasts.
Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 40771044).
Computational Intelligence Algorithms Analysis for Smart Grid Cyber Security

Yong Wang 1,2, Da Ruan 2, Jianping Xu 1, Mi Wen 1, and Liwen Deng 3
1 Department of Computer Science and Technology, Shanghai University of Electric Power, 20090 Shanghai, China, [email protected]
2 Belgian Nuclear Research Centre, Boeretang 200, 2400 Mol, Belgium, [email protected]
3 Shanghai Changjiang Computer Group Corporation, 200001, China
Abstract. Cyber attack risks threaten smart grid security: in a simulated attack, a malicious worm was able to spread from meter to meter and take out power. The North American Electric Reliability Corporation (NERC) has thus developed several iterations of cyber security standards. Following the requirements of the NERC cyber standard CIP-002-2, this paper presents a cyber security risk analysis using computational intelligence methods and reviews the core methods: the risk assessment algorithms HHM, IIM and RFRM; the fault analysis algorithms FTA, ETA, FMEA and FMECA; fuzzy sets; intrusion detection systems; artificial neural networks; and artificial immune systems. Through the analysis of the core computational intelligence algorithms used for smart grid cyber security in our power system network security lab, we clearly define existing smart grid research challenges.

Keywords: Smart Grid, Cyber Security, Computational Intelligence.
made demands of the utility companies and in one case caused a power failure that affected multiple cities [4]. The smart grid is exposed to cyber attack risks and challenges. We have analyzed the potential web Trojan attack risks to the EPS of a web-based SCADA system in our power system network security lab. Through this cyber attack risk analysis of the smart grid, we summarize the cyber security standards requirements and compare the core computational intelligence algorithms.
2 Cyber Attack Risk in Smart Grid Statements

Malicious attackers can attack the power grid through the different transmission system operators (TSOs) in an interconnected power system. The smart grid needs analytic tools for assessing the information impacts of handling on-line security after a malicious attack; the main research areas describe the physical, cyber and decision-making aspects of the problem and their interrelations [5]. Malware with distributed features, such as a botnet, can cause even greater damage to the smart grid. The Distributed Network Protocol (DNP3) is used between components in process automation systems, mainly in the electric grid. It plays a crucial role in SCADA systems, where it is used by SCADA master stations, Remote Terminal Units (RTUs), and Intelligent Electronic Devices (IEDs) [6]. Non-utility stakeholder data access is limited through DNP3 data sets rather than granting direct access to the data points within an outstation [7]. The most numerous component devices in the US smart grid are smart meters, which carry their own cyber attack risks; the data from smart meters can be transported to the back office through the smart grid network by Ethernet, CDMA or GSM. At the Black Hat USA conference in 2009, Mike Davis et al. developed a malicious worm which, in a simulated attack, was able to spread from meter to meter and take out power in more than 15,000 homes in 24 hours [13]. In our simulated power system network lab, we have analyzed the security architecture of power systems using a firewall, an intrusion detection system and an intrusion prevention system. We found several possible attack methods of Trojans and botnets in a virtual power system environment comprising a web-based SCADA system, VMware stations, a honeynet, anti-Trojan software and operating system security [42].
3 Cyber Security Standards Relationship

3.1 Cyber Security Standards of Smart Grid

The IEEE defines electronic intrusions as: entry into the substation via telephone lines or other electronic-based media for the manipulation or disturbance of electronic devices. These devices include digital relays, fault recorders, equipment diagnostic packages, automation equipment, computers, programmable logic controllers, and communication interfaces [8]. The standard titles include a draft guide for smart grid interoperability of energy technology and information technology operation with the
Electric Power System (EPS), and End-Use Applications and Loads [9]. IEEE has also begun work on a cyber security trial-use standard, which defines the cryptographic protocol for cyber security of serial SCADA links and engineering access points that implement the requirements of IEEE P1689. To increase the reliability and security of the power grid, the North American Electric Reliability Corporation (NERC) has developed several iterations of cyber security standards for the critical cyber assets used to control the smart grid [10]. All standards in Table 1 have been approved by the NERC Board of Trustees [11].

Table 1. Cyber security standards in Critical Infrastructure Protection (CIP)
3.2 Reliability Standards Relations of Critical Infrastructure Protection

The Critical Infrastructure Protection (CIP) standards have nine parts. Most standards, namely CIP-002, CIP-003, CIP-004, CIP-005, CIP-008 and CIP-009, include three sub-standards; for instance, CIP-002 includes the sub-standards CIP-002-1, CIP-002-2 and CIP-002-3. CIP-001 has only the sub-standard CIP-001-1, while CIP-006 has five sub-standards (CIP-006-1, CIP-006-1a, CIP-006-1b, CIP-006-2 and CIP-006-3) and CIP-007 has four (CIP-007-1, CIP-007-2, CIP-007-2a and CIP-007-3). The relation between the standards and sub-standards is given in formula (1):
{ CIP-00n-m | n ∈ [3, 9] ∧ m ∈ [1, 3] ∧ n ∈ N ∧ m ∈ N } ⊂ CIP-002-m
CIP-002 → CIP-009                                                  (1)
The first line in formula (1) means that standards CIP-003-1, CIP-003-2 and CIP-003-3 (and likewise CIP-004 through CIP-009) should be read as part of the group of standards numbered CIP-002-1, CIP-002-2 and CIP-002-3. The last line in formula (1) means that responsible entities should interpret and apply standards CIP-002 through CIP-009 using reasonable business judgment [11].
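The grouping in formula (1) can be expressed as a small helper; the function below is purely illustrative (not part of any NERC tooling) and simply maps a sub-standard CIP-00n-m to the CIP-002 sub-standard it is read with:

```python
def cip_group(n, m):
    """Return the CIP-002 sub-standard that CIP-00n-m is grouped with,
    per formula (1): n in [3, 9], m in [1, 3]."""
    if not (3 <= n <= 9 and 1 <= m <= 3):
        raise ValueError("formula (1) covers n in [3, 9] and m in [1, 3]")
    return f"CIP-002-{m}"
```

For example, `cip_group(3, 1)` yields "CIP-002-1", so CIP-003-1 is read together with CIP-002-1.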
4 Computational Intelligence Algorithms Analysis

4.1 Cyber Security Risk Assessment Methods for the Smart Grid

According to the smart grid cyber security standard CIP-002-2, NERC standards CIP-002-2 through CIP-009-2 provide a cyber security framework for the identification and protection of critical cyber assets to support reliable operation, as shown in Table 2 [14].

Table 2. Critical cyber asset identification risk assessment requirements and algorithms

Num.  Requirements                           Algorithms
R1.   critical asset identification methods  HHM [16], IIM [17], RFRM [18]
R2.   critical asset identification          FTA [19], ETA [20], FMEA [21], FMECA [21]
R3.   critical cyber asset identification    Fuzzy sets [22], IDS [29], ANN [37], AIS [41]
R4.   annual approval                        approval of risk assessments R1, R2, R3
Distributed Control Systems (DCSs), which are dynamic manufacturing systems, in smart grids fall under requirement R1.2.1, while Supervisory Control and Data Acquisition (SCADA) systems fall under R1.2.2 [15]. Hierarchical Holographic Modeling (HHM) provides a methodology for capturing and dealing with fundamental, but heretofore neglected, characteristics of large-scale systems: their multifarious nature. HHM has been applied to energy systems [16]. HHM can identify the sources of risk of SCADA systems, but to quantify the efficacy of risk management, inoperability input-output modeling (IIM) is needed [17]. The Inoperability Input-Output Model (IIM) is an analytical framework for quantifying and addressing the risks arising from the intra- and inter-connectedness of infrastructure sectors. Risk Filtering, Ranking, and Management (RFRM) is an eight-phase process that begins with HHM for risk identification and progresses through various phases of filtering risk scenarios with quantitative ranking to the final phases of management and feedback [15,18]. These risk assessment methods (HHM, IIM and RFRM) have been applied successfully to SCADA systems with many interdependencies, and they have highlighted the need for quantifiable metrics [15]. Risk analysis methods include FTA, ETA, FMEA and FMECA, which have also been applied successfully to SCADA systems in the smart grid. During the R1.2.4 system restoration procedure, the Fault Tree Analysis (FTA) method is used to model and analyze the failure processes of engineering systems. A fault tree is composed of logic diagrams that display the state of the system and is constructed using graphical design techniques [19]. FTA involves five steps: define the undesired event, obtain an understanding of the system, construct the fault tree, evaluate the fault tree, and control the hazards identified.
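The evaluation step of the five FTA steps above can be sketched as follows. The gate semantics assume independent basic events, and the example tree with its probabilities is hypothetical, not taken from any of the cited systems:

```python
# Illustrative fault-tree evaluation: gates are ("AND"|"OR", children);
# leaves are basic-event failure probabilities. Basic events are assumed
# independent, so AND multiplies and OR uses 1 - prod(1 - p).
def evaluate(node):
    if isinstance(node, float):          # basic event
        return node
    gate, children = node
    probs = [evaluate(c) for c in children]
    if gate == "AND":
        p = 1.0
        for q in probs:
            p *= q
        return p
    if gate == "OR":
        p = 1.0
        for q in probs:
            p *= (1.0 - q)
        return 1.0 - p
    raise ValueError(gate)

# Hypothetical top event: "control channel lost" = sensor fault OR
# (primary link down AND backup link down)
tree = ("OR", [0.01, ("AND", [0.05, 0.05])])
top = evaluate(tree)   # 1 - (1 - 0.01) * (1 - 0.0025), about 0.012475
```

The recursion mirrors how the graphical tree is read: probabilities propagate from the basic events at the leaves up to the undesired top event.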
Event trees are used to analyze systems in which all components are continuously operating, or in which some or all components are in standby mode, that is, systems involving sequential operational logic and switching [20]. Failure Modes and Effects Analysis (FMEA) is a procedure in operations management for analyzing potential failure modes within a classification system by severity, or for determining the effect of failures on the system [21]. Failure Mode, Effects, and Criticality Analysis (FMECA) is an extension of Failure Mode and Effects
Analysis (FMEA). In addition to the basic FMEA, it includes a criticality analysis, which is used to chart the probability of failure modes against the severity of their consequences [21].

4.2 Intelligence Fuzzy Sets for Risk Analysis

According to the R3 requirement in Table 2, quantitative risk analysis methods fall under the broad category of probabilistic risk assessment (PRA). A natural extension to PRA involves the use of fuzzy set concepts [15]. Fuzzy sets were introduced by Lotfi A. Zadeh as an extension of classical sets; they permit the gradual assessment of the membership of elements in a set and can be used in a wide range of domains in which information is incomplete or imprecise [22]. The main fuzzy set methods for risk analysis are:
(1) measures of similarity between interval-valued fuzzy numbers and interval-valued fuzzy number arithmetic operators [23];
(2) similarity of trapezoidal fuzzy numbers [24];
(3) similarity measures of generalized fuzzy numbers [25];
(4) ranking generalized fuzzy numbers with different heights and spreads [26];
(5) ranking fuzzy numbers using α-cuts, belief features and signal/noise ratios [27];
(6) fuzzy numbers with different shapes and different deviations [28].
To protect the smart grid from cyber attack, computational intelligence algorithms are used in risk assessments. Fuzzy sets, as one of the computational intelligence families, can be used in IDS [29]. A hidden Markov model (HMM) detection engine and a normal-database detection engine have been combined to exploit their respective advantages [30]. Fuzzy systems have been hybridized with learning and adaptation methods in the realm of soft computing: neuro-fuzzy systems and genetic fuzzy systems combine the approximate reasoning of fuzzy systems with the learning capabilities of neural networks and evolutionary algorithms [31]. A fuzzy rule-based system has been evolved from an agent-based evolutionary framework with multi-objective optimization [32].
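As a concrete taste of method (2) above, the sketch below uses one simple similarity measure between trapezoidal fuzzy numbers, S(A, B) = 1 - Σ|ai - bi| / 4, with coordinates assumed to lie in [0, 1]. The cited papers refine this basic idea in various ways, and the linguistic risk terms here are hypothetical:

```python
# One simple similarity measure between trapezoidal fuzzy numbers
# A = (a1, a2, a3, a4) and B = (b1, b2, b3, b4):
#   S(A, B) = 1 - sum(|ai - bi|) / 4
# S is 1 for identical numbers and decreases as the trapezoids drift apart.
def similarity(a, b):
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / 4.0

# Hypothetical linguistic risk terms expressed as trapezoidal fuzzy numbers
low_risk  = (0.0, 0.1, 0.2, 0.3)
high_risk = (0.7, 0.8, 0.9, 1.0)
s = similarity(low_risk, high_risk)   # roughly 0.3: the terms are dissimilar
```

In fuzzy risk analysis, such a score is typically used to match an assessed fuzzy risk value against predefined linguistic terms and pick the closest one.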
The use of fuzzy association rules for building classifiers has been reported on the KDD Cup 99 data [33]. Neuro-fuzzy networks, a fuzzy inference approach and genetic algorithms have been investigated, with parallel neuro-fuzzy classifiers used for an initial classification [34]. Fuzzy sets applied to smart grid cyber security are currently active in the computational intelligence area, combined with: (1) neural networks, evolutionary algorithms and genetic algorithms for classification and rule definition; (2) decision trees and machine learning algorithms such as SVMs or HMMs [29].

4.3 Hybrid Neural Networks and Artificial Immune Systems in the Smart Grid

According to the R3 requirement in Table 2, neural networks, a form of artificial intelligence, can be used to model complex relationships between inputs and outputs or to find patterns in data. A neuro-fuzzy network is a fuzzy inference system (FIS) embedded in the body of an artificial neural network. Depending on the FIS type, several layers simulate the processes involved in fuzzy inference, such as fuzzification, inference,
aggregation and defuzzification. Embedding an FIS in the general structure of an ANN has the benefit of using available ANN training methods to find the parameters of the fuzzy system. We have studied neural networks with fuzzy sets for Denial of Service (DoS) intrusion detection on KDD Cup 99 records [35]; the method can classify malicious software behaviour from smart grid cyber data [36]. Neural network algorithms can find a stochastic frontier from input-output observational data without requiring explicit assumptions about the functional structure of the stochastic frontier [37]. They have also been used for dynamic voltage stability assessment of power transmission systems [38] and for power system transient stability [39]. According to the R3 requirement in Table 2, Artificial Immune Systems (AIS) will be widely used in the smart grid in the future, according to a US National Energy Technology Laboratory research report [41]. IBM has been preparing a defense against fast-spreading viruses for several years: its Digital Immune System for Cyberspace can automatically detect viral activity during its early spread. The smart grid's integrated immune security systems will reduce physical and cyber vulnerabilities and improve the speed of recovery from disruptions and security breaches. AIS algorithms are composed of models such as self/non-self discrimination, lifecycle, evolutionary and network models [29]. To capture possible failures and anomalous operating conditions in smart grid critical infrastructures, we first need to detect early anomalies and failures inside information-intensive critical infrastructures; neural networks have been used to analyze intrusion detection data from an emulated SCADA system of an electrical power transmission grid [40].
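The self/non-self discrimination model mentioned above can be sketched with a toy negative-selection procedure: candidate detectors that match "self" patterns are censored, and the survivors flag anomalies. The pattern length, match radius and self set below are all hypothetical choices, not taken from any cited system:

```python
import random

random.seed(1)

def hamming(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

SELF = ["0000", "0001", "0011"]   # hypothetical normal-behaviour signatures
RADIUS = 1                        # a detector matches within this distance

def generate_detectors(n_bits=4, n_detectors=4):
    detectors = []
    candidates = [format(i, "04b") for i in range(2 ** n_bits)]
    random.shuffle(candidates)
    for c in candidates:
        # censoring step: discard candidates that match any self pattern
        if all(hamming(c, s) > RADIUS for s in SELF):
            detectors.append(c)
        if len(detectors) == n_detectors:
            break
    return detectors

detectors = generate_detectors()

def is_anomalous(pattern):
    """Non-self if any surviving detector matches the pattern."""
    return any(hamming(pattern, d) <= RADIUS for d in detectors)
```

By construction, no detector can fire on the self set, so false alarms on known-normal behaviour are impossible in this toy setting; coverage of non-self patterns depends on how many detectors are kept.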
5 Conclusions

Cyber attack risks are threatening smart grid security. Smart grid security protocols will contain elements of deterrence, prevention, detection, response and mitigation, and a mature smart grid will be capable of thwarting multiple, coordinated attacks over a span of time. This paper presented a cyber security risk analysis using computational intelligence methods. According to the requirements of the smart grid cyber security reliability standard CIP-002-2, this review covered the core methods: the risk assessment algorithms HHM, IIM and RFRM; the fault analysis algorithms FTA, ETA, FMEA and FMECA; fuzzy sets; intrusion detection systems; artificial neural networks; and artificial immune systems. Through the analysis in our power system network security lab, we summarized the smart grid cyber security standards requirements and compared the core computational intelligence algorithms, which allows us to clearly define existing smart grid research challenges.

Acknowledgments. This work was supported by the National Natural Science Foundation of China under Grant No. 60903188, the Shanghai Postdoctoral Scientific Program (No. 08R214131), and the World Expo Science and Technology Special Fund of the Shanghai Science and Technology Commission (08dz0580202).
References
1. Wang, J.W., Rong, L.L.: Cascade-based Attack Vulnerability on the US Power Grid. Safety Science 47, 1332–1336 (2009)
2. Chen, G., Dong, Z.Y., David, J.H., Zhang, G.H., Hua, K.Q.: Attack Structural Vulnerability of Power Grids: A Hybrid Approach Based on Complex Networks. Physica A: Statistical Mechanics and its Applications 389, 595–603 (2010)
3. Ettore, B., Roberto, N., Fei, X.: Analysis of Structural Vulnerabilities in Power Transmission Grids. International Journal of Critical Infrastructure Protection 2, 5–12 (2009)
4. John, S.: CIA Says Hackers Attack Global Power Grid. Info. Security 5, 9 (2009)
5. Bompard, E., Napoli, R.: Assessment of Information Impacts in Power System Security against Malicious Attacks in a General Framework. Reliability Engineering & System Safety 94, 1087–1094 (2009)
6. DNP3, Wikipedia, http://en.wikipedia.org/wiki/DNP3
7. Todd, M., Richard, C., Farhad, N.: Power System DNP3 Data Object Security using Data Sets. Computers & Security (2009), doi:10.1016/j.cose.2009.10.001
8. Electric Power Research Institute: 2010 Research Portfolio (2010), http://portfolio.epri.com/
9. IEEE Smart Grid Standards P2030, http://grouper.ieee.org/groups/scc21/dr_shared/2030/
10. IEEE Power Engineering Society: IEEE Standard 1402-2000: IEEE Guide for Electric Power Substation Physical and Electronic Security. IEEE, New York (2000)
11. North American Electric Reliability Corporation: Reliability Standards of Cyber Security, http://www.nerc.com/page.php?cid=2|20
12. Smart Grid Device, http://earth2tech.files.wordpress.com/2008/04/silver-demo.jpg
13. Davis, M.: Smart Grid Device Security. Black Hat USA (2009), http://www.blackhat.com/presentations/bh-usa-09/MDAVIS/BHUSA09-Davis-AMI-SLIDES.pdf
14. NERC Reliability Standards of Cyber Security: Standard CIP-002-3 - Cyber Security - Critical Cyber Asset Identification, http://www.nerc.com/files/CIP-002-3.pdf
15. Ralston, P.A.S., Graham, J.H., Hieb, J.L.: Cyber Security Risk Assessment for SCADA and DCS Networks. ISA Transactions 46, 583–594 (2007)
16. Haimes, Y.Y.: Hierarchical Holographic Modeling. IEEE Transactions on Systems, Man, and Cybernetics 11, 606–617 (1981)
17. Crowther, K.G., Haimes, Y.Y.: Application of the Inoperability Input–Output Model (IIM) for Systemic Risk Assessment and Management of Interdependent Infrastructures. Systems Engineering 8, 323–341 (2005)
18. Haimes, Y.Y., Kaplan, S., Lambert, J.H.: Risk Filtering, Ranking, and Management Framework using Hierarchical Holographic Modeling. Risk Analysis 22, 381–395 (2002)
19. Fault Tree Analysis, http://www.fault-tree.net/
20. Event Tree Analysis, http://www.event-tree.com/
21. Failure Mode and Effects Analysis (FMEA) or Failure Mode, Effects, and Criticality Analysis (FMECA), http://www.fmea-fmeca.com/
22. de Ru, W.G., Eloff, J.H.P.: Risk Analysis Modelling with the use of Fuzzy Logic. Computers & Security 15, 239–248 (1996)
23. Chen, S.M., Chen, J.H.: Fuzzy Risk Analysis based on Similarity Measures between Interval-valued Fuzzy Numbers and Interval-valued Fuzzy Number Arithmetic Operators. Expert Systems with Applications 36, 6309–6317 (2009)
24. Xu, Z.Y., Shang, S.C., Qian, W.B., Shu, W.H.: A Method for Fuzzy Risk Analysis Based on the New Similarity of Trapezoidal Fuzzy Numbers. Expert Systems with Applications 37, 1920–1927 (2010)
25. Wei, S.H., Chen, S.M.: A New Approach for Fuzzy Risk Analysis Based on Similarity Measures of Generalized Fuzzy Numbers. Expert Systems with Applications 36, 589–595 (2009)
26. Chen, S.M., Chen, J.H.: Fuzzy Risk Analysis Based on Ranking Generalized Fuzzy Numbers with Different Heights and Different Spreads. Expert Systems with Applications 36, 6833–6842 (2009)
27. Chen, S.M., Wang, C.H.: Fuzzy Risk Analysis Based on Ranking Fuzzy Numbers using α-cuts, Belief Features and Signal/Noise Ratios. Expert Systems with Applications 36, 5576–5581 (2009)
28. Lee, L.W., Chen, S.M.: Fuzzy Risk Analysis Based on Fuzzy Numbers with Different Shapes and Different Deviations. Expert Systems with Applications 34, 2763–2771 (2008)
29. Shelly, X.W., Wolfgang, B.: The Use of Computational Intelligence in Intrusion Detection Systems: A Review. Applied Soft Computing 10, 1–35 (2010)
30. Xuan, D.H., Jiankun, H., Peter, B.: A Program-based Anomaly Intrusion Detection Scheme using Multiple Detection Engines and Fuzzy Inference. Journal of Network and Computer Applications 32, 1219–1228 (2009)
31. Abadeh, M.S., Habib, J., Lucas, C.: Intrusion Detection using a Fuzzy Genetics-based Learning Algorithm. Journal of Network and Computer Applications 30, 414–428 (2007)
32. Tsang, C.H., Kwong, S., Wang, H.L.: Genetic-Fuzzy Rule Mining Approach and Evaluation of Feature Selection Techniques for Anomaly Intrusion Detection. Pattern Recognition 40, 2373–2391 (2007)
33. Tajbakhsh, A., Rahmati, M., Mirzaei, A.: Intrusion Detection using Fuzzy Association Rules. Applied Soft Computing 9, 462–469 (2009)
34. Adel, N.T., Mohsen, K.: A New Approach to Intrusion Detection Based on an Evolutionary Soft Computing Model using Neuro-fuzzy Classifiers. Computer Communications 30, 2201–2212 (2007)
35. Wang, Y., Gu, D.W., Wen, M., Xu, J.P., Li, H.M.: Denial of Service Detection with Hybrid Fuzzy Set based Feed Forward Neural Network. In: Advances in Neural Networks - ISNN 2009. LNCS, Springer, Heidelberg (2009)
36. Wang, Y., Gu, D.W., Wen, M., Li, H.M., Xu, J.P.: Classification of Malicious Software Behaviour Detection with Hybrid Set based Feed Forward Neural Network. In: Advances in Neural Networks - ISNN 2009. LNCS, Springer, Heidelberg (2010)
37. Azadeh, A., Ghaderi, S.F., Anvari, M., Saberi, M.: Performance Assessment of Electric Power Generations Using an Adaptive Neural Network Algorithm. Energy Policy 35, 3155–3166 (2007)
38. Vakil-Baghmisheh, M.T., Razmi, H.: Dynamic Voltage Stability Assessment of Power Transmission Systems using Neural Networks. Energy Conversion and Management 49, 1–7 (2008)
39. Sawhney, H., Jeyasurya, B.: A Feed-forward Artificial Neural Network with Enhanced Feature Selection for Power System Transient Stability Assessment. Electric Power Systems Research 76, 1047–1054 (2006)
40. Balducelli, C.S., Lavalle, L., Vicoli, G.: Safeguarding Information Intensive Critical Infrastructures Against Novel Types of Emerging Failures. Reliability Engineering & System Safety 92, 1218–1229 (2007)
41. Research on the Characteristics of a Smart Grid by the NETL Modern Grid Strategy Team, http://www.netl.doe.gov/moderngrid/referenceshelf/articles/EC%20Self%20Heals_Renz_APPROVED_2008_12_02.pdf
42. Wang, Y., Gu, D.W., Xu, J.P., Du, H.Z.: Hacking Risk Analysis of Web Trojan in Electric Power System. In: 2009 International Conference on Web Information Systems and Mining, pp. 1047–1054. IEEE Press, Los Alamitos (2009)
Using AOBP for Definitional Question Answering

Junkuo Cao, Weihua Wang, and Yuanzhong Shu

NanChang HangKong University, Department of Computer Application, Fenghe South Road 696, 330063 NanChang, China
[email protected], [email protected], [email protected]
Abstract. This paper presents an integrated system for the task of definitional question answering. First, we extract as much question-related knowledge as possible, in three categories. The first is based on a language model, trained on four different corpora. The second comprises the syntax dependency relations extracted by Minipar. The third contains only one feature, the document score provided by an Information Retrieval (IR) engine. We then use a novel Adaptive Optical Back-Propagation (AOBP) neural network to score candidate sentences using the extracted knowledge, and the top k candidates are selected as the final question answers. We experiment on the definitional question answering tasks of TREC 2005 and TREC 2006; the experimental results show that our method greatly improves performance.

Keywords: definitional question answering; language model; Adaptive Optical Back-Propagation (AOBP).
1 Introduction

Related work on definitional question answering has mostly concentrated on centroid-based ranking, pattern extraction, and the use of external knowledge. In this paper, we present an integrated system for the task of definitional question answering. First, we extract as much question-related knowledge as possible; we then use an AOBP network to score candidate sentences using the extracted knowledge, and the top k candidate sentences are selected as the final question answers. This paper is organized as follows. Section 2 introduces the multiple resources used in the system; Section 3 provides the learning process of AOBP; Section 4 presents experiments and analysis; finally, our conclusions are given in Section 5.
on the syntax dependency relations of the sentences, which include 20 different types of relations used in Minipar. The last is the document score returned by the IR engine.

2.1 Features Based on Language Model

General Language Model. Given a sentence s and its word sequence w1 w2 ... wn, the probability P(s) can be rewritten using the chain rule as follows:

P(w1,...,n) = P(w1) P(w2|w1) P(w3|w1,2) ... P(wn|w1,...,n-1) .
(1)
Assuming the word occurrences are independent of one another, the probability can be calculated by the following equation: P(w1,..,n)=∏i P(wi) .
(2)
The prior probability P(wi) of a word wi is estimated by maximum likelihood estimation (MLE) based on the whole collection where the answer is searched: P(wi)=C# (wi)/∑jC# (wj) .
(3)
Where C# (wi) is the occurrence count of word wi in the whole collection. Corpus Construction. We construct a corpus composed of definitional sentences by web knowledge on the train targets. In our system, the knowledge is extracted from some specific websites, such as online biography dictionaries or online cyclopedia. Because some words like named entity phrase and number word may be high related with the specific target, we rewrite them with general tags. For each candidate sentence s, we train our language model on the constructed corpus. To calculate P(s), s is also rewritten in the same way. After that, we use unigram model to calculate the probabilities for each sentence feature. 2.2 Features Based on Syntax Relation The second resource is the syntax dependency relation extracted by Minipar (Some familiar relation shown as Table 1). Each target Ti is associated with a set of answer sentences Ai. To evaluate each answer a∈Ai, we define ψ (Ti, a) as the relation feature vector between them. Firstly, we get a set of triples relation R(Ti, a)={<word1, r, word2>} by Minipar, where r is one of Minipar relation, and the one word of word1 or word2 should be occurred in target Ti , another one in answer a. And then we extract 20 relation patterns R={R1,R2…R20}, after removing those relations whose occurrence in returned is less than a predefined threshold times. So we can define the feature vector ψ (Ti, a) as follows:
ψ(Ti, a) = {ψj(Ti, a)}, j = 1, ..., 20 .

(4)
and define the pattern feature ψj (Ti, a) as:
ψj(Ti, a) = Score(Rj) if Rj ∈ R(Ti, a), and 0 if Rj ∉ R(Ti, a) .

(5)

where Score(Rj) is the appearance proportion of the relation Rj in the answer set A.
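Eqs. (4)-(5) can be sketched as follows; the relation set is trimmed to three patterns for brevity, and the triples and Score(Rj) values are hypothetical stand-ins for real Minipar output and corpus statistics:

```python
# Sketch of the relation feature vector psi(T, a) from Eqs. (4)-(5).
RELATIONS = ["subj", "mod", "det"]              # stands in for R1..R20
SCORE = {"subj": 0.5, "mod": 0.3, "det": 0.2}   # hypothetical Score(Rj)

def relation_features(target_words, answer_words, triples):
    # R(T, a): keep relations whose triple links a target word to an answer word
    present = {r for (w1, r, w2) in triples
               if (w1 in target_words and w2 in answer_words)
               or (w2 in target_words and w1 in answer_words)}
    # psi_j = Score(Rj) if Rj occurs in R(T, a), else 0   (Eq. 5)
    return [SCORE[r] if r in present else 0.0 for r in RELATIONS]

triples = [("Einstein", "subj", "developed"), ("theory", "det", "the")]
psi = relation_features({"Einstein"}, {"developed", "relativity"}, triples)
```

Here only the "subj" triple links the target to the answer, so the feature vector carries that pattern's score and zeros elsewhere.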
Table 1. The familiar dependency relation list

Abbr.  Definition                      Example
nn     noun-noun modifier              John Wayne airport
det    determiner of a noun            the cat
gen    genitive modifier of a noun     Mary's bag
poss   possessive modifier of a noun   Mary 's bag
appo   appositive of a noun            the captain, Tom
mod    adjunct modifier of any head    black horse
pred   predicative                     John is beautiful
pre    pre-determiner                  all the participants
s      surface subject                 The dog is chased by the cat
subj   subject of a verb               Susan loves rose
2.3 Features Based on IR Engine

The last resource contains only one feature, the document score provided by the Information Retrieval (IR) engine. For each target, we receive a set of documents, ranked and scored according to their relevance to the target. All of the returned documents are then tokenized into candidate sentences, denoted S = {s1, s2, ..., sm}, and each sentence is given an initial score equal to that of the document it belongs to, expressed as ScoreD(s). Since a single sentence may occur in several documents, we calculate the sentence score by IR, ScoreIR(s), as follows:

ScoreIR(s) = Max(ScoreD(s)) × (2 − 2 CountD(s) / (CountD(s)² + 1)) .

(6)

where Max(ScoreD(s)) denotes the maximal score among the documents that sentence s belongs to, and CountD(s) is the total number of these documents. When CountD(s) = 1, this formula reduces to ScoreD(s); moreover, as CountD(s) increases, ScoreIR(s) also rises, in accordance with our previous assumption.
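Eq. (6) can be sketched directly; here doc_scores stands for the list of ScoreD(s) values of the documents containing sentence s (the values themselves are hypothetical):

```python
# Sketch of Eq. (6): combine per-document IR scores for a sentence that
# occurs in several retrieved documents.
def score_ir(doc_scores):
    count = len(doc_scores)                       # Count_D(s)
    return max(doc_scores) * (2 - 2 * count / (count ** 2 + 1))

single = score_ir([0.8])           # Count_D(s) = 1: reduces to Score_D(s)
multi = score_ir([0.8, 0.6, 0.7])  # more documents: the score rises
```

The multiplier 2 − 2c/(c² + 1) equals 1 at c = 1 and grows toward 2 as c increases, which is exactly the boost for sentences found in many documents described above.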
3 AOBP Algorithm

3.1 BP Neural Network

The artificial neural network (ANN) was inspired by biological models of neurological systems and is an established machine learning model with robust learning properties and simple deployment. Since the introduction of the Back-Propagation (BP) learning algorithm [2], this gradient descent method has emerged as one of the most well-known and popular learning algorithms for artificial neural networks (ANNs). However, in
various cases its convergence speed tends to be very slow. Therefore, the Optical Back-Propagation (OBP) neural network was introduced to improve the performance of the BP algorithm and speed up the learning process [3]. In standard BP, the error at a single output unit is defined as:
ErrorBP = (Ypk − O pk ) .
(7)
Where the subscript “p” refers to the pth training vector, and “k” refers to the kth output unit. In this case, Ypk is the desired output value, and Opk is the actual output from kth unit. The error at a single output unit in OBP will be defined as follows:
Otair and Salameh have proved that OBP minimizes the errors of each output unit more quickly than general BP, and that the weights on certain units change greatly from their starting values [3].

Algorithm 1. The rules to tune up the momentum
If ErrorBP(t+1) − ErrorBP(t) > τ · ErrorBP(t)
    Set η = η + η × τ
Else if ErrorBP(t+1) − ErrorBP(t) > 0
    Set η = η + η × ErrorBP(t) / ErrorBP(t+1)
Else
    Set η = ε1

3.2 Adaptive OBP Neural Network

Several suggestions have been proposed to train a BP neural network with an adaptive learning rate. For example: start with a small learning rate and increase it if successive epochs keep the gradient direction fairly constant, or rapidly decrease it if the direction of the gradient varies greatly at each epoch [4]; give each weight an individual learning rate, which increases if successive changes in the weight are in the same direction and decreases otherwise [5][6]; or use a closed formula to calculate a common learning rate for all the weights at each iteration [7]. Note that all the above-mentioned strategies employ heuristic parameters in an attempt to enforce a decrease of the learning error at each iteration and to secure the convergence of the training algorithm. In this paper, we propose a simple and novel BP neural network that not only applies adaptive rules to tune the learning rate and momentum but
Using AOBP for Definitional Question Answering
also adjusts the error through OBP. For the steps of OBP training, please refer to [3]. Into the OBP training steps we incorporate adaptive rules based on the following strategy: increase the learning rate and momentum exponentially if successive epochs reduce the error, or decrease them if a significant error increase occurs [8]. Denote the learning rate at iteration t by α(t), the momentum by μ(t), and the error between desired and actual output by ErrorBP(t). The learning rate is tuned as follows:
α(t+1) = λ · α(t) · 2^( Σ_k (Ypk − Opk)² ) + ε0 .  (9)

where

λ = +1  if ErrorBP(t) − ErrorBP(t+1) ≤ 0
λ = −1  if ErrorBP(t) − ErrorBP(t+1) > 0 .  (10)
Here ε0 is the initial learning rate. The pseudo code to tune up the momentum is shown in Algorithm 1, in which τ is a threshold between 0 and 1 and ε1, analogous to ε0, is the initial momentum.
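The two adaptive rules can be sketched as follows; this is an illustrative transcription of Eqs. (9)-(10) and Algorithm 1 as printed (including their sign conventions), with argument names chosen for readability rather than taken from the paper:

```python
# Sketch of the learning-rate rule of Eqs. (9)-(10) and the momentum rule
# of Algorithm 1. sq_err_sum is the summed squared output error of the
# current epoch; eps0, eps1 and tau are the initial learning rate, the
# initial momentum and the threshold of Algorithm 1.
def tune_learning_rate(alpha, err_prev, err_curr, sq_err_sum, eps0):
    lam = 1.0 if err_prev - err_curr <= 0 else -1.0    # Eq. (10)
    return lam * alpha * 2 ** sq_err_sum + eps0        # Eq. (9)

def tune_momentum(eta, err_prev, err_curr, tau, eps1):
    if err_curr - err_prev > tau * err_prev:   # significant error increase
        return eta + eta * tau
    elif err_curr - err_prev > 0:              # mild error increase
        return eta + eta * err_prev / err_curr
    else:                                      # error did not increase: reset
        return eps1
```

Note that the exponential factor 2^(Σ_k (Ypk − Opk)²) realizes the "exponential" adaptation mentioned in the text: the change in the learning rate grows with the current epoch's error.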
4 Experiments

4.1 Evaluation Metric

We adopt the evaluation metrics used in the TREC definitional question answering task. For each topic, TREC provides a list of answer nuggets against which a system's response is evaluated. Following the official evaluation criterion, [9] designed an effective automatic evaluation tool for definitional question answering, and all experimental results are scored by this tool.

Table 2. Comparison with BP, ABP, OBP on TREC2006 definitional question answering with training data TREC2005. Given the sensitivity of BP neural networks, we take the average F3-score of each model over 10 runs.
Table 3. Comparison with FDUQA, SP and HIM on TREC2005 definitional question answering with the same training data (TREC2004)

System     F3-Score
FDUQA      0.310
SP         0.287
HIM        0.303
AOBP       0.308
4.2 Process and Analysis of Experiments

Our experiments cover 65 TREC 2004 targets, 75 TREC 2005 targets and 75 TREC 2006 targets. We look for answer sentences in AQUAINT, the corpus of the TREC 2003-2006 QA track. To build a training corpus, we collect TREC's evaluation of all the answers submitted by participants: if a [string, docid] pair is judged to cover a certain nugget of a target, we extract the original sentence from AQUAINT according to the pair and add it to the corpus. To train our neural network, we use the 25 features (see Section 2) as the input vector. In this experiment, we use two types of structure for the task of definitional question answering. Both structures have the same input dimension and one hidden layer, but one has a single output neuron while the other has two output neurons. For the single-output structure, we set the desired output to 0.999 for an answer sentence and 0.001 for a non-answer sentence. For the two-output structure, we set the desired output vector to {0.999, 0.001} for an answer sentence and {0.001, 0.999} for a non-answer sentence. At test time, the candidate sentence score of the first structure is the actual output, while in the second structure it is the difference between the two output units. Both structures select the top 12 candidate sentences as the final answers. Before training the AOBP neural network, three parameters must be determined: the initial learning rate, the initial momentum and the threshold τ. For the first two we use the empirical values 0.02 and 0.01, respectively. To evaluate the threshold τ, we run experiments over all values between 0.0 and 1.0 in steps of 0.1.
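The answer-selection step for the two-output structure can be sketched as follows; the function name and data layout are hypothetical, chosen only to make the scoring rule concrete:

```python
# Sketch of candidate selection for the two-output structure: a candidate's
# score is the difference between the answer and non-answer output units,
# and the top-scoring candidates are kept as the final answers.
def select_answers(candidates, outputs, top_n=12):
    """candidates: candidate sentences; outputs: (o_answer, o_nonanswer) pairs."""
    scored = [(o_ans - o_non, s)
              for s, (o_ans, o_non) in zip(candidates, outputs)]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [s for _, s in scored[:top_n]]
```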
Training τ on TREC2005 data and testing on TREC2006, we find that the F3-score peaks when τ is around 0.4 and decreases sharply when τ becomes too large or too small.

4.3 Comparison with Other Systems

To evaluate the effectiveness of our approach, we design two experiments on the TREC2004, TREC2005 and TREC2006 definitional question answering tracks. The first experiment evaluates BP, ABP, OBP and AOBP on the same task, where "A" denotes adaptive and "O" denotes optical; the structure notation gives the number of neurons in each layer. Given the sensitivity of BP neural networks, we take the average F3-score of each model over 10 runs. As Table 2 shows, standard BP usually performs worse than the modified BP variants, and AOBP is slightly better than OBP and ABP.
In the second experiment, four systems are compared: our AOBP system, using the structure (25-8-2) and setting τ = 0.4, and three state-of-the-art systems, FDUQA [9], the Human Interests Model (HIM) and the Soft Pattern model (SP). All systems are tested and evaluated with the same setting as [9], with the same training data (TREC2004) and test data (TREC2005). As Table 3 shows, AOBP clearly outperforms SP, is slightly better than HIM and slightly worse than FDUQA (the best participating system in the official TREC evaluation).
5 Conclusion

In this paper, we integrate multiple resources for the task of definitional question answering. Specifically, we propose a novel adaptive optical BP neural network to rank candidate sentences, incorporating effective adaptive rules into OBP. Experimental results indicate that the proposed method is comparable to state-of-the-art systems. In future work, we will apply our method to ranking problems in other tasks such as summarization and query expansion. To acquire reliable information, external knowledge and related words, phrases and entities were extracted; using these multiple knowledge sources, the definitional QA system can rank candidate answers effectively.
References

1. Han, K.S., Song, Y.I., Rim, H.C.: Probabilistic model for definitional question answering. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2006)
2. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning internal representations by error propagation. In: Parallel Distributed Processing: Explorations in the Microstructure of Cognition 1, 318–362 (1986)
3. Otair, M.A., Salameh, W.A.: An improved back-propagation neural networks using a modified non-linear function. In: Proceedings of the IASTED International Conference, pp. 442–447 (2004)
4. Chan, L.W., Fallside, F.: An adaptive training algorithm for back-propagation networks. Computer Speech and Language 2, 205–218 (1987)
5. Jacobs, R.A.: Increased rates of convergence through learning rate adaptation. Neural Networks 1, 295–307 (1988)
6. Riedmiller, M., Braun, H.: A direct adaptive method for faster backpropagation learning: the Rprop algorithm. In: Proceedings of the IEEE International Conference on Neural Networks, San Francisco, pp. 586–591 (1993)
7. Magoulas, G.D., Vrahatis, M.N., Androulakis, G.S.: Effective back-propagation with variable stepsize. Neural Networks 10, 69–82 (1997)
8. Battiti, R.: Accelerated back-propagation learning: two optimization methods. Complex Systems 3, 331–342 (1989)
9. Qiu, X., Li, B., Shen, C., Wu, L., Huang, X., Zhou, Y.: FDUQA on TREC2005 QA Track. In: Proceedings of the Sixteenth Text REtrieval Conference (2007)
Radial Basis Function Neural Network Based on PSO with Mutation Operation to Solve Function Approximation Problem

Xiaoyong Liu¹,²,³

¹ Department of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, Guangdong, 510665, China
² National Science Library, Chinese Academy of Sciences, Beijing, 100190, China
³ Graduate University of Chinese Academy of Sciences, Beijing 100049, China
[email protected]
Abstract. This paper presents a novel learning algorithm, called MuPSO-RBFNN, for training and constructing a Radial Basis Function Neural Network (RBFNN). The algorithm combines Particle Swarm Optimization (PSO) with a mutation operation to train the RBFNN. PSO with mutation and a genetic algorithm are used, respectively, to train the weights and spreads of oRBFNN, the traditional gradient-trained RBFNN referred to in this article. The Sum of Squared Errors (SSE) is used to evaluate the performance of the three algorithms: oRBFNN, GA-RBFNN and MuPSO-RBFNN. Several function approximation experiments show that MuPSO-RBFNN outperforms oRBFNN and GA-RBFNN.

Keywords: Radial Basis Function Neural Network (RBFNN), Genetic Algorithm, Particle Swarm Optimization, Function Approximation.
1.2 Function Approximation

The need for an approximating functional form arises when one could, in principle, compute the function value for any given set of arguments, but it is very expensive to do so. For example, the function value may be the outcome of many complex calculations, and computing one value may take a lot of computing time. With an approximating functional form, one can obtain (approximate) function values much more quickly. Again, the goal is to come up with an approximating functional form using a finite set of data points. The freedom to choose the location of the arguments makes it much easier to obtain accurate approximations. The theory of function approximation is also very useful when solving for a function that is (implicitly) defined by a system of functional equations.
2 Methods

2.1 RBFNN

The structure of the RBFNN is shown in Fig. 1. RBF networks have three layers [3]:
Fig. 1. The structure of the RBF neural network
Input layer – There is one neuron in the input layer for each predictor variable. In the case of categorical variables, N−1 neurons are used, where N is the number of categories. The input neurons (or processing before the input layer) standardize the range of the values by subtracting the median and dividing by the interquartile range. The input neurons then feed the values to each of the neurons in the hidden layer.
Hidden layer – This layer has a variable number of neurons (the optimal number is determined by the training process). Each neuron consists of a radial basis function centered on a point with as many dimensions as there are predictor variables. The spread (radius) of the RBF may differ for each dimension. The centers and spreads are determined by the training process. When presented with the x vector of input values from the input layer, a hidden neuron computes the Euclidean distance of the test case from the neuron's center point and then applies the RBF kernel function to this distance using the spread values. The resulting value is passed to the output layer.

Output layer – The value coming out of a neuron in the hidden layer is multiplied by a weight associated with the neuron and passed to the summation, which adds up the weighted values and presents this sum as the output of the network. Not shown in the figure is a bias value of 1.0 that is multiplied by a weight w_k0 and fed into the summation layer. For classification problems, there is one output (and a separate set of weights and summation unit) for each target category; the value output for a category is the probability that the case being evaluated has that category. Weights are applied to the RBF function outputs as they are passed to the summation layer. Mathematically, the network output is expressed as follows:

y_k(x) = Σ_{j=1}^{M} w_kj F_j(x) + w_k0 .  (1)

where x is the n-dimensional input vector with elements x_i, the w_kj are the output layer weights, and w_k0 is the bias. For the popular Gaussian choice, the basis function F_j(x) is expressed as follows:

F_j(x) = exp( − ||x − u_j||² / (2 r_j²) ) .  (2)
where r_j is the width (radius) of the Gaussian basis function and u_j is the vector determining the center of basis function F_j, with elements u_ji. Generally, r is set equal to 1 in the RBFNN algorithm. Training an RBF network with linear outputs is accomplished in two stages. The first stage is unsupervised and obtains cluster centers of the training set input vectors; a popular method is k-means clustering, as applied by Moody and Darken [4]. The second stage consists of solving a set of linear equations, whose solution can be obtained by a matrix inversion technique or by least squares [5]. Various methods have been used to train RBF networks. One traditional approach first uses k-means clustering to find cluster centers, which are then used as the centers of the RBF functions. The RBFNN based on clustering is called oRBFNN in this paper.
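The second (supervised) stage can be sketched as follows, assuming the centers have already been obtained (for instance by k-means) and using a fixed spread r = 1, as in oRBFNN; the function names are illustrative:

```python
import numpy as np

# Sketch of the classical two-stage RBF training described above: given the
# centers (stage 1), the linear output weights are obtained by least squares
# on the Gaussian design matrix (stage 2).
def gaussian_design(X, centers, r=1.0):
    # F_j(x) = exp(-||x - u_j||^2 / (2 r^2)), plus a bias column for w_0
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    Phi = np.exp(-d2 / (2 * r ** 2))
    return np.hstack([Phi, np.ones((len(X), 1))])

def train_rbf(X, y, centers, r=1.0):
    Phi = gaussian_design(X, centers, r)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # stage 2: least squares
    return w

def predict_rbf(X, centers, w, r=1.0):
    return gaussian_design(X, centers, r) @ w
```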
2.2 GA-RBFNN

Holland [5] illustrated how the Darwinian evolution process can be applied, in the form of an algorithm, to solve a wide variety of problems. Due to this biological motivation, the resulting highly parallel adaptive system is now called the genetic algorithm (GA). A GA has a population of individuals competing against each other with respect to a measure of fitness, with some individuals breeding, others dying off, and new individuals arising through combination and mutation. Generally, a GA has three operations: reproduction, crossover and mutation. Harpham et al. [6] pointed out in their review that GAs have been applied to RBF networks in different ways, such as searching for an optimal subset or an optimal architecture, optimizing all parameters, optimizing network learning parameters, and using a GA as a clustering algorithm.

2.3 MuPSO-RBFNN

Particle Swarm Optimization (PSO) is a population-based stochastic optimization technique developed by Kennedy and Eberhart [7], inspired by the social behavior of bird flocking and fish schooling. In the PSO algorithm, a swarm of particles moves in an n-dimensional problem space, where each particle represents a potential solution. In simple terms, particles are 'flown' through a multidimensional search space, where the position of each particle is adjusted according to its own experience and that of its neighbors [8]. There are a number of studies in the literature that use PSO to train neural networks (NNs) for different applications. Zhang et al. [9] studied a new evolutionary system, based on PSO, for evolving artificial neural networks. Yin [10] presented a new polygonal approximation approach based on discrete PSO. Zhang et al. [11] developed a hybrid PSO-BP algorithm for feed-forward NN training, illustrating different applications with test results. Da and Xiurun [12] developed a PSO-based ANN with a simulated annealing technique.
Das et al. [13] developed a PSO-NN toolbox for signature verification (SV). Lee and Ko [14] used PSO and an RBFNN to solve time series prediction problems. This article presents a new RBFNN training algorithm based on PSO with a mutation operation. Mutation, as in a GA, provides a mechanism for introducing new material into the gene pool, thus preventing the algorithm from getting stuck in local minima. The strings are mutated in a bit-by-bit process; the probability of mutation is usually set very low, and if a binary character is selected it is flipped from 0 to 1 or vice versa. This study uses a structure of RBFNN in which the centers of the radial basis functions and all other free parameters of the network undergo a supervised learning process; that is to say, the RBFNN takes on its most generalized form. A gradient-descent procedure is adopted to train the RBFNN [3].
In this work, MuPSO-RBFNN is proposed. Its pseudo-code is as follows:

Step 1  Initialization of the RBFNN
        Determine the number of hidden neurons (centers) and a learning rate between 0 and 1
Step 2  Choose the initial values of the weights between -1 and 1 randomly
        Choose the initial values of the spreads of the centers randomly
Step 3  PSO for training the parameters of the RBFNN
        Define the PSO parameters
        Initialize the population
        Calculate the fitness value of each particle
        While (error criterion is not attained) {
            Calculate the lbest value of each particle
            Calculate the gbest value
            Update the velocity and position vector of each particle
            Apply the mutation operation to particles according to the mutation rate
            Evaluate
        }
        End criterion (maximum iterations)
        Output the parameters: weights and spreads
Step 4  Use the weights and spreads to train the RBFNN
        Positions of centers, spreads of centers, linear weights
Step 5  Use the RBFNN to solve problems
        Positions of centers in the test dataset, spreads of centers in the test dataset
        Output of MuPSO-RBFNN
Step 6  Calculate the SSE of MuPSO-RBFNN
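The inner loop of Step 3 can be sketched as below. Since the particles here encode real-valued weights and spreads rather than binary strings, the mutation is shown as a small random perturbation of coordinates selected with the mutation rate; this real-valued form, like all parameter names, is an assumption for illustration, not the paper's exact operator:

```python
import numpy as np

# Illustrative sketch of one MuPSO iteration: a standard PSO velocity and
# position update followed by a mutation that perturbs a few randomly
# selected coordinates to preserve diversity and escape local minima.
rng = np.random.default_rng(0)

def mupso_step(pos, vel, pbest, gbest, w=0.7, c1=2.0, c2=2.0, mut_rate=0.04):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    # mutation: each coordinate is perturbed with probability mut_rate
    mask = rng.random(pos.shape) < mut_rate
    pos = np.where(mask, pos + rng.normal(scale=0.1, size=pos.shape), pos)
    return pos, vel
```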
3 Simulation 3.1 Results For the compare of performance in function approximation between oRBFNN , GARBFNN and MuPSO-RBFNN, the function , as showed in equation (3), is chosen to test the novel algorithm.
⎛ x2 ⎞ y = 1.1 1 − x + 2 x 2 exp⎜⎜ − ⎟⎟ ⎝ 2 ⎠
(
)
(3)
In the range [−4, 4], 100 data points are generated randomly. This paper uses the SSE (Sum of Squared Errors) as the index for evaluating the algorithms. The SSE is a network performance function for comparing different NNs; it measures a network's performance as the sum of the squared errors.
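The experimental setup just described can be sketched as follows; the variable names are illustrative:

```python
import numpy as np

# The benchmark function of Eq. (3) and the SSE index used to compare the
# networks: 100 random points in [-4, 4].
rng = np.random.default_rng(42)

def target(x):
    return 1.1 * (1 - x + 2 * x ** 2) * np.exp(-x ** 2 / 2)   # Eq. (3)

x = rng.uniform(-4, 4, 100)
y = target(x)

def sse(y_true, y_pred):
    return ((y_true - y_pred) ** 2).sum()
```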
The programs for oRBFNN, GA-RBFNN and MuPSO-RBFNN are written in MATLAB 2008a. All trials were executed on a PC with a 2.0 GHz CPU and 1 GB DDR RAM. The parameter values of oRBFNN, GA-RBFNN and MuPSO-RBFNN are set as in Table 1, and each algorithm is run several times. The SSE values calculated by the three algorithms are shown in Table 2. As Table 1 shows, each algorithm iterates one thousand times in each run, and the learning rate of the RBFNN, with ten hidden neurons, is fixed at 0.001.

Table 1. Parameter settings of the algorithms

Parameter       Value
Iterations      1000
Mutation rate   0.04
Learning rate   0.001
Hidden layer    10
Table 2. Result of comparison among the three algorithms

Algorithm       SSE
oRBFNN          1.1476
GA-RBFNN        1.1447
MuPSO-RBFNN     0.8756

Fig. 2. SSE of oRBFNN
Fig. 3. Curve between oRBFNN and Real Value
Fig. 4. SSE of GA-RBFNN
Fig. 5. Curve between GA-RBFNN and Real Value
Fig. 2, Fig. 4 and Fig. 6 show the change of the SSE of the three algorithms over one thousand iterations. Fig. 3, Fig. 5 and Fig. 7 compare the outputs of the three algorithms with the real values of the function in equation (3): the blue curve is the approximation of the function produced by each algorithm, and the red curve is the actual function.
Fig. 6. SSE of MuPSO-RBFNN
Fig. 7. Curve between MuPSO-RBFNN and Real Value
3.2 Discussion

From Table 2, the SSE of MuPSO-RBFNN is the smallest among the three algorithms, and the result of GA-RBFNN is better than that of oRBFNN. By the SSE index, the performance of MuPSO-RBFNN, which is optimized by PSO with a mutation operation, is better than that of the other algorithms, and the RBFNNs trained by GA and PSO are better than the standard RBFNN.
4 Conclusions

The ANN is an AI technique that has recently been widely used to model activities of human interest in many fields. This paper presents a new RBFNN algorithm, MuPSO-RBFNN. The algorithm, which combines a gradient-trained RBFNN with PSO with a mutation operation, uses PSO to train the parameters of the RBFNN and gives a new method for constructing the structure of the RBFNN. Taking function approximation as an example, the paper shows that MuPSO-RBFNN performs better than oRBFNN and GA-RBFNN by the index of SSE.

Acknowledgement. The author is thankful to the reviewers who provided valuable comments that greatly improved the quality of this article.
References

1. Rafiq, M.Y., Bugmann, G., Easterbrook, D.J.: Neural network design for engineering applications. Computers & Structures 79(17), 1541–1552 (2001)
2. Zhu, Q., Cai, Y., Liu, L.: A global learning algorithm for a RBF network. Neural Networks 12(3), 527–540 (1999)
3. Haykin, S.: Neural networks: a comprehensive foundation. Prentice-Hall, Englewood Cliffs (2008)
4. Moody, J., Darken, C.: Fast learning in networks of locally-tuned processing units. Neural Computation 1(2), 281–294 (1989)
5. Holland, J.: Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor (1975)
6. Harpham, C., Dawson, C.W., Brown, M.R.: A review of genetic algorithms applied to training radial basis function networks. Neural Computing & Applications 13(3), 193–201 (2004)
7. Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: Proceedings of the IEEE International Conference on Neural Networks, Piscataway, NJ, pp. 39–43 (1995)
8. Armand, S., Blumenstein, M., Muthukkumarasamy, V.: Off-line signature verification using an enhanced modified direction feature with single and multi-classifier approaches. IEEE Computational Intelligence Magazine, 18–25 (2007)
9. Zhang, C., Shao, H., Li, Y.: Particle swarm optimisation for evolving artificial neural network. In: Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, pp. 2487–2490 (2000)
10. Yin, P.-Y.: A discrete particle swarm algorithm for optimal polygonal approximation of digital curves. Journal of Visual Communication and Image Representation 15, 241–260 (2004)
11. Zhang, J.R., Zhang, J., Lok, T.-M., Lyu, M.R.: A hybrid particle swarm optimization-back propagation algorithm for feedforward neural network training. Applied Mathematics and Computation 185(2), 1026–1037 (2007)
12. Da, Y., Xiurun, G.: An improved PSO-based ANN with simulated annealing technique. Neurocomputing, 527–533 (2005)
13. Das, M.T., Dulger, L.C.: Signature verification (SV) toolbox: Application of PSO-NN. Engineering Applications of Artificial Intelligence 22, 688–694 (2009)
14. Lee, C.-M., Ko, C.-N.: Time series prediction using RBF neural networks with a nonlinear time-varying evolution PSO algorithm. Neurocomputing 73, 449–460 (2009)
CRPSO-Based Integrate-and-Fire Neuron Model for Time Series Prediction

Liang Zhao and Feng Qian

Key Laboratory of Advanced Control and Optimization for Chemical Processes, Ministry of Education, East China University of Science and Technology, Shanghai 200237, P.R. China
{lzhao,fqian}@ecust.edu.cn
Abstract. The single Integrate-and-Fire neuron (IFN) model has recently been used for time series prediction, a task for which a multilayer neural network is usually employed. An improved particle swarm optimization (PSO) algorithm, named cooperative random learning particle swarm optimization (CRPSO), is put forward to train the IFN model in order to enhance its approximation and generalization capabilities. The proposed CRPSO-based IFN model is applied to the Mackey-Glass time series prediction problem. The experimental results demonstrate the superiority of the CRPSO-based model in efficiency and robustness over the PSO algorithm, the BP algorithm and the GA.

Keywords: Time series prediction, Integrate-and-Fire neuron model, Particle swarm optimization, Cooperative random learning particle swarm optimization.
In order to enhance the approximation capability of the IFN model, the CRPSO algorithm is introduced to train it [6]. The CRPSO algorithm, which evolves multiple sub-swarms simultaneously and uses randomly selected best information from all the sub-swarms to calculate the velocity and position of each particle, was proposed to overcome the shortcomings of conventional PSO [7]. The IFN model with the CRPSO algorithm is applied to the Mackey-Glass (MG) time series prediction problem, and the results are compared with the BP algorithm, GA and PSO algorithm. The rest of the paper is organized as follows: Section 2 describes the IFN model and the well-defined BP learning algorithm. The CRPSO algorithm is presented in Section 3. Section 4 discusses the detailed application of the IFN model with various learning algorithms to the Mackey-Glass time series prediction problem, and conclusions are drawn in Section 5.
2 The Single Integrate-and-Fire Neuron Model

The IFN model, first proposed by Yadav et al., is used as a learning machine for time series prediction and classification. The detailed simplification from biological integrate-and-fire neuron models to the artificial IFN model can be found in [3]. Here, the structure of the IFN model and the well-defined BP learning algorithm are provided.

2.1 The Structure of the Single Multiplicative Neuron Model

The diagram of a generalized single IFN model with its learning algorithm is illustrated in Fig. 1.
Fig. 1. The structure of the IFN model
where (x1, x2, ..., xn) is the input pattern, n is the dimension of the input pattern, (b1, b2, ..., bn) and (w1, w2, ..., wn) are the weights of the IFN model, (d1, d2, ..., dn) are the biases of the IFN model, Π is the multiplicative operation in Eq. (1), and u is the output of the multiplicative operation.

Π(x, b, w, d) = ∏_{i=1}^{n} ( w_i ln(b_i x_i) + d_i ) .  (1)
102
L. Zhao and F. Qian
The sigmoid function is selected as the activation function of the IFN model, defined as in Eq. (2):

f(u) = 1 / (1 + e^(−u)) .  (2)
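A forward pass through the single IFN, combining Eqs. (1) and (2), can be sketched as follows; the function name is illustrative, and note that the inputs must satisfy b_i · x_i > 0 for the logarithm to be defined:

```python
import math

# Sketch of the IFN forward pass: the multiplicative aggregation of Eq. (1)
# followed by the sigmoid activation of Eq. (2).
def ifn_output(x, b, w, d):
    u = 1.0
    for xi, bi, wi, di in zip(x, b, w, d):
        u *= wi * math.log(bi * xi) + di      # Eq. (1)
    return 1.0 / (1.0 + math.exp(-u))         # Eq. (2)
```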
y is the output of the IFN model and y_desired is the desired output. The learning algorithms are used to minimize the error between y and y_desired; the CRPSO algorithm, BP algorithm, GA and PSO algorithm are employed as learning algorithms for the IFN model in this paper.

2.2 The BP Algorithm for the IFN Model

The BP algorithm has been widely used in neural network learning. It is based on the steepest-descent gradient method applied to the minimization of an energy function representing the instantaneous error. BP is adopted to train the IFN model by minimizing the mean square error (MSE) function:
E = MSE = (1 / 2N) Σ_{p=1}^{N} ( y_p − y_p^desired )² ,  (3)

where y_p^desired and y_p represent the desired output and the actual output for the pth input pattern of the neuron shown in Fig. 1, respectively. Using the steepest-descent gradient approach and the chain rule for partial derivatives, the learning rules for the weights and biases are given in equations (4), (5) and (6), respectively:
b_i^new = b_i^old − η ∂E/∂b_i ,  (4)

w_i^new = w_i^old − η ∂E/∂w_i ,  (5)

d_i^new = d_i^old − η ∂E/∂d_i ,  (6)
where η is the learning rate parameter, which controls the convergence speed of the algorithm. The partial derivatives ∂E/∂b_i, ∂E/∂w_i and ∂E/∂d_i are defined as follows:
∂E/∂b_i = (t − y) y (1 − y) u ( w_i / (w_i ln(b_i x_i) + d_i) ) (1 / b_i)  (7)

∂E/∂w_i = (t − y) y (1 − y) u ( ln(b_i x_i) / (w_i ln(b_i x_i) + d_i) )  (8)

∂E/∂d_i = (t − y) y (1 − y) u ( 1 / (w_i ln(b_i x_i) + d_i) )  (9)

According to equations (4), (5) and (6), the iterative procedure is repeated until a predefined termination criterion, such as the maximum number of generations or the error goal, is reached.
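One BP update for a single training pattern (N = 1) can be sketched as follows. The sign convention is made explicit here: with E = (y − t)²/2, the common factor is (y − t)·y·(1 − y), and each parameter moves opposite its gradient as in Eqs. (4)-(6). Function and variable names are assumptions for illustration:

```python
import math

# Sketch of one BP step for the IFN model: forward pass via Eqs. (1)-(2),
# then gradient updates of b, w, d following Eqs. (4)-(9) for one pattern.
def bp_step(x, b, w, d, t, eta=0.3):
    terms = [wi * math.log(bi * xi) + di
             for xi, bi, wi, di in zip(x, b, w, d)]
    u = math.prod(terms)                       # Eq. (1)
    y = 1.0 / (1.0 + math.exp(-u))             # Eq. (2)
    delta = (y - t) * y * (1.0 - y)            # dE/du for E = (y - t)^2 / 2
    old = list(zip(list(x), list(b), list(w), list(d)))
    for i, (xi, bi, wi, di) in enumerate(old):
        ti = terms[i]
        b[i] -= eta * delta * u * (wi / ti) / bi           # Eqs. (4), (7)
        w[i] -= eta * delta * u * math.log(bi * xi) / ti   # Eqs. (5), (8)
        d[i] -= eta * delta * u / ti                       # Eqs. (6), (9)
    return y
```

Repeating the step drives the output toward the target, as the termination criterion in the text describes.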
3 The Cooperative Random Learning Particle Swarm Optimization

Particle swarm optimization is an evolutionary algorithm paradigm that imitates the movement of bird flocks or fish schools looking for food. Each particle has a position and a velocity, representing a solution to the optimization problem and a search direction in the search space. A particle adjusts its velocity and position according to the best experiences found by itself (pbest) and by all its neighbors (gbest). The updating equations of the velocity and position of a particle are as follows:

V(t+1) = w V(t) + c1 r1 ( P − X(t) ) + c2 r2 ( Pg − X(t) )  (10)

X(t+1) = X(t) + V(t+1)  (11)

where X and V represent the position and velocity of the particle; c1 and c2 are positive constants referred to as acceleration constants; r1 and r2 are random numbers following the uniform distribution between 0 and 1; P is the best position found by the particle itself and Pg is the best position found by its neighbors. Introduced by Shi and Eberhart [8], w is the inertia weight, which balances the global and local search abilities of the algorithm by controlling the influence of previous velocity information on the updated velocity [8][9]. Under the cooperative search framework, the authors have presented the CRPSO algorithm [6] and applied it to train a single multiplicative neuron model successfully [4]. In the CRPSO algorithm, multiple sub-swarms search different portions of the search space simultaneously, and when updating their velocities and positions the particles in each sub-swarm learn from their own gbest and from a randomly selected gbest among all the sub-swarms. The velocity updating equation for a particle in sub-swarm j is rewritten as follows:

V_j(t+1) = w V_j(t) + c1 r1 ( P_j − X_j(t) ) + c2 r2 ( P_g(j) − X_j(t) ) + c3 r3 ( P_g(r) − X_j(t) )  (12)

where j = 1, ..., n indexes the sub-swarms, n is the number of sub-swarms, and r is a random integer between 1 and n which gives the index of the gbest selected to update the velocity at the current iteration. The information exchange is implemented through r. The schematic diagram of CRPSO is shown in Fig. 2.
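The CRPSO velocity update of Eq. (12) can be sketched as follows; the archive layout and parameter values are assumptions for illustration:

```python
import random

# Sketch of Eq. (12): a particle in sub-swarm j learns from its pbest, from
# its own sub-swarm's gbest, and from the gbest of a randomly chosen
# sub-swarm r drawn from the shared archive.
def crpso_velocity(v, x, pbest, gbest_archive, j,
                   w=0.7, c1=1.5, c2=1.5, c3=1.5):
    r = random.randrange(len(gbest_archive))   # random sub-swarm index r
    return [w * vi
            + c1 * random.random() * (p - xi)
            + c2 * random.random() * (gj - xi)
            + c3 * random.random() * (gr - xi)
            for vi, xi, p, gj, gr in zip(v, x, pbest,
                                         gbest_archive[j], gbest_archive[r])]
```

The random draw of r at every update is what implements the information exchange between sub-swarms described above.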
Fig. 2. The schematic diagram of the information exchange of CRPSO
where the central circle represents the archive of the gbest values found by all the sub-swarms and the neighboring circles represent the different sub-swarms. Three sub-swarms are used, as experiments show that this choice provides the best solutions. The initialization and search processes of the sub-swarms are performed independently, and the particles do not always move toward a single gbest location. This mechanism helps the algorithm escape from local optima, which is the primary advantage of CRPSO over PSO. Thus, the diversity of the swarm is maintained efficiently and more feasible solutions can be found due to the enlarged search space. Furthermore, the multiple independent random initializations of the sub-swarms increase the probability of finding the global optimum and make the algorithm more robust. On the other hand, much useful information from different sub-swarms is used when updating the velocities of the particles during evolution, which offsets the additional computation cost of preserving and selecting the gbest values and leads to high-speed convergence.

3.1 Encoding Strategy

The vector encoding strategy, which encodes each particle as a real vector, is adopted in this study. The encoding style is [w1, w2, ..., wn, b1, b2, ..., bn, d1, d2, ..., dn]; the vector represents all the parameters of the IFN model. When calculating the outputs of the model, each particle is decoded into weight vectors and a bias vector. With this strategy, the weight vectors [w1, w2, ..., wn] and [b1, b2, ..., bn] and the bias vector [d1, d2, ..., dn] can be obtained by splitting the vector directly, and the fitness of the particles can be calculated easily.
4 Results and Discussion 4.1 The Parameters of the Related Algorithms The developed CRPSO-based IFN model is applied to the MG time series prediction problem [10]. The performance of the proposed method is compared with the BP algorithm, the PSO
CRPSO-Based Integrate-and-Fire Neuron Model for Time Series Prediction
105
algorithm and GA. The learning rate η of the BP algorithm is set to 0.7 and the maximum number of training epochs to 6000. The MATLAB Genetic Algorithm and Direct Search Toolbox (GADS) is used to carry out the optimization task. The population size of GA is set to 30, the number of generations to 2000, and the other parameters are left at their default values. In the PSO algorithm, the population size is 30 and the number of generations is 2000; c1 and c2 are set to 2 and w is decreased linearly from 0.9 to 0.4. In the CRPSO algorithm, c1, c2 and w are set the same as in the PSO algorithm, but the number of sub-swarms is 3, the population size is 20 and the number of generations is 1000. Thus, the four algorithms have the same computational cost. 4.2 Mackey-Glass Time Series Prediction Problem The Mackey-Glass (MG) series, based on the Mackey-Glass differential equation [10], is often regarded as a benchmark for testing the performance of neural network models. This chaotic time series is generated from the following time-delay ordinary differential equation:
dx(t)/dt = a·x(t − τ) / (1 + x^10(t − τ)) − b·x(t),    (13)
where τ = 17, a = 0.2 and b = 0.1. The task in this study is to predict the value of the time series at the point x(t+1) from the earlier points x(t), x(t−6), x(t−12) and x(t−18). The training is performed on 450 samples, and 500 samples are used for testing the generalization ability of the model. The data sets have been pre-processed by normalizing them between 0.1 and 0.9. The training and testing MSEs are given in Table 1.

Table 1. The training and testing performance for predicting the MG time series
Learning algorithm | Training MSE Mean | Training MSE Std. | Testing MSE Mean | Testing MSE Std.
BP                 | 0.0821            | 0.1346            | 0.0919           | 0.1510
GA                 | 0.0018            | 0.0023            | 0.0019           | 0.0025
PSO                | 8.818e-4          | 1.094e-5          | 9.165e-4         | 1.184e-5
CRPSO              | 9.384e-4          | 2.997e-5          | 8.556e-4         | 1.079e-5
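For reproducibility, the MG series defined by Eq. (13) can be generated numerically. The sketch below (Python; the Euler step size, constant initial history and series length are illustrative assumptions, not taken from the paper) integrates the delay equation, normalizes the series to [0.1, 0.9], and builds the inputs x(t), x(t−6), x(t−12), x(t−18) with target x(t+1):

```python
import numpy as np

def mackey_glass(n_points, tau=17, a=0.2, b=0.1, dt=1.0, x0=1.2):
    """Euler integration of dx/dt = a*x(t-tau)/(1 + x(t-tau)**10) - b*x(t)."""
    history = int(tau / dt)
    x = np.full(n_points + history, x0)          # constant initial history (assumed)
    for t in range(history, n_points + history - 1):
        x_tau = x[t - history]
        x[t + 1] = x[t] + dt * (a * x_tau / (1.0 + x_tau**10) - b * x[t])
    return x[history:]

def normalize(x, lo=0.1, hi=0.9):
    """Linearly rescale the series into [lo, hi]."""
    return lo + (hi - lo) * (x - x.min()) / (x.max() - x.min())

series = normalize(mackey_glass(1000))
# Inputs x(t-18), x(t-12), x(t-6), x(t) predict the target x(t+1):
lags = [18, 12, 6, 0]
X = np.array([[series[t - l] for l in lags] for t in range(18, len(series) - 1)])
y = series[19:]
```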
From Table 1, it is observed that the CRPSO algorithm, the PSO algorithm and GA perform better than the BP algorithm in both the mean MSEs and the standard deviations for the training and testing cases. The PSO algorithm has a better mean training MSE and standard deviation than the CRPSO algorithm. In all other cases, the CRPSO algorithm shows the best performance of the four algorithms, so it can be concluded that CRPSO is the most effective learning algorithm for training the IFN model. The training and testing results as well as the errors are shown in Fig. 3 and Fig. 4.
Fig. 3. The prediction results for the MG time series using the CRPSO-based model
Fig. 4. The prediction errors for the MG time series using the CRPSO-based model
It can be observed from Fig. 3 and Fig. 4 that the proposed IFN model approximates the chaotic behavior of the MG series very well during both the training and testing stages.

4.3 Discussion

The results in Table 1 show that the BP algorithm performs worse than the other three algorithms. The reason is that BP is a gradient-descent-based algorithm and cannot escape once trapped in a local minimum, particularly when facing non-differentiable problems or other complicated tasks. Meanwhile, the BP algorithm is very sensitive to the initialization values. Thus, the BP algorithm tends to converge to a local optimum. The PSO algorithm is also sensitive to its parameters and initial values, which may cause it to be easily trapped in a local optimum; therefore, the PSO algorithm performs worse than the CRPSO algorithm. The cooperative random learning mechanism maintains the diversity of the population effectively and provides more useful information during the iterative process, so CRPSO achieves the best testing MSEs. Another benefit of the mechanism is that multiple sub-swarms search the space independently. This ensures that the search space is sampled thoroughly, which increases the chances of finding a good solution; this is why the CRPSO algorithm has better standard deviations than the other algorithms.
5 Conclusions

The CRPSO algorithm is introduced to train the IFN model for MG time series prediction. The IFN model is a neural network with a simple structure and few parameters, and it is used as a learning machine for function approximation. The CRPSO, PSO, GA and BP algorithms have been used as learning algorithms for the IFN model. The simulation results show that the CRPSO algorithm exhibits much better performance than the other algorithms. Thus, the CRPSO-based IFN model can predict the chaotic behavior of the MG time series accurately and effectively. Furthermore, the proposed CRPSO-based IFN model can also be applied to complex system identification and other function approximation problems.

Acknowledgements. This research is supported by the National Science Fund for Distinguished Young Scholars (60625302), the National Natural Science Foundation of China (60704028), the High-Tech Research and Development Program of China (863 Program) (2007AA041402), the Shanghai Key Technologies R&D Program (09DZ1120400) and the Shanghai Leading Academic Discipline Project (B504).
References

1. Koch, C.: Computation and single neuron. Nature 385, 207–210 (1997)
2. Yadav, R.N., Kalra, P.K., John, J.: Time series prediction with single multiplicative neuron model. Applied Soft Computing 7, 1157–1163 (2007)
3. Yadav, A., Mishra, D., Yadav, R.N., Ray, S., Kalra, P.K.: Time-series prediction with single integrate-and-fire neuron. Applied Soft Computing 7, 739–745 (2007)
4. Zhao, L., Yang, Y.: PSO-based single multiplicative neuron model for time series prediction. Expert Systems with Applications 36, 2805–2812 (2009)
5. McKenna, T., Davis, J., Zornetzer, S.F.: Single Neuron Computation (Neural Nets: Foundations to Applications). Academic Press, London (1992)
6. Zhao, L., Yang, Y., Zeng, Y.: Cooperative Random Learning Particle Swarm Optimization for Functions Optimization. In: The 4th International Conference on Natural Computation, pp. 609–614 (2008)
7. Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: Proceedings of the IEEE International Conference on Neural Networks, Perth, Australia (1995)
8. Shi, Y., Eberhart, R.C.: A modified particle swarm optimizer. In: Proceedings of the IEEE Congress on Evolutionary Computation, Piscataway, USA, pp. 69–73 (1998)
9. Shi, Y., Eberhart, R.C.: Empirical study of particle swarm optimization. In: Proceedings of the IEEE Congress on Evolutionary Computation, Piscataway, USA, pp. 1945–1950 (1999)
10. Mackey, M., Glass, L.: Oscillation and chaos in physiological control systems. Science 197, 287–289 (1977)
An Agent-Based Model of Make-to-Order Supply Chains

Jing Li 1,2 and Zhaohan Sheng 2,∗

1 School of Engineering, Nanjing Agricultural University, Nanjing, China
[email protected]
2 School of Management Science and Engineering, Nanjing University, Nanjing, China
[email protected]
Abstract. Make-to-order supply chains are an important kind of supply chain. This paper proposes a common agent-based model for the simulation of make-to-order supply chains. Based on the model, scholars can easily study the management of make-to-order supply chains. Agents, which are used to simulate the members of supply chains, produce appropriate products by intelligent choices, and the relationships among agents are connected by their products. Agents' attributes are represented by their knowledge, and four actions of agents are introduced in the paper. A foreign trade supply chain is simulated to show the availability of the agent-based model. The model is intended to serve as a toolkit for the study of make-to-order supply chains.

Keywords: agent-based model; make-to-order supply chains; supply chain management; simulation model.
1 Introduction

Traditionally, companies produce products and stock them as inventory until they are sold (make-to-stock). However, some products cannot be produced with a make-to-stock strategy. For many foreign trade companies, the product design differs from order to order, so all companies must produce different products to satisfy different demands. These companies have designed their production systems to produce a product only after it is ordered. Thus, many companies have shifted to "pull", holding no inventory at all and producing to order [1]. This kind of supply chain, consisting of such companies, is referred to as a make-to-order supply chain. This paper proposes an agent-based model to simulate make-to-order supply chains; based on the model, scholars can conduct research on the management of make-to-order supply chains. Since the phenomena modeled in our paper involve non-linear relationships, it is implausible to make simplifying assumptions until the equations become solvable. Scholars have adopted various perspectives in order to resolve problems of non-linear systems. Computer simulation can be used to model either quantitative theories or qualitative ones. It is particularly good at modeling processes, and although non-linear relations can generate some methodological problems, there is no difficulty in representing them with a computer program [2]. Although agent-based modeling has
become an increasingly important tool for scholars studying supply chains, there are no common models for describing and testing make-to-order supply chains. An agent-based model is proposed in this paper to simulate the make-to-order supply chain. Rajagopalan provided insights into the impact of various problem parameters on make-to-order versus make-to-stock decisions using a computational study [3]; this research is beneficial to the study of make-to-order supply chains. Gilbert built a multi-agent model embodying a theory of innovation networks [4]; this work addressed a key issue in the building of multi-agent systems. Bhavnani provided a general introduction to an agent-based computational framework for studying the relationship between natural resources, ethnicity, and civil war [5]; the framework in Bhavnani's work is beneficial for the building of our model. Lei discussed a distributed modeling architecture in a multi-agent-based behavioral economic landscape (MABEL) model that simulated land-use changes over time and space [6]. A few agent-based models have been proposed with different technologies [7, 8]. Janssen reported on the establishment of the Open Agent-Based Modelling Consortium, www.openabm.org, a community effort to foster agent-based modelling development, communication, and dissemination for research, practice and education [9]. These agent-based models are significant for building the model of our paper. The structure of a supply chain model depends on the knowledge of the agents and the characteristics of the demands; the C2 architecture, which was studied to present a network [10], inspired the structure definition of our paper. In the paper, Section 2 introduces the agent-based model of make-to-order supply chains for the study of supply chain management. Besides the agent-based framework, agents' attributes and four actions are proposed in Section 2.
The simulation of a foreign trade supply chain is presented in Section 3 to illustrate the validity of the model. Finally, the conclusions of the work are given in Section 4.
2 The Agent-Based Model

The agent-based model contains heterogeneous agents (virtual companies) which act in a virtual environment. In the model, the production of a virtual company may be supported by several parts which are produced by other virtual companies. Virtual companies accomplish their work with their knowledge, and each virtual company is simulated by an agent. The agent-based model G (G = <V, E, P>) consists of N agents (V = {v1, v2, v3, ..., vN}), where each agent can be considered as a unique node in the virtual supply chain. The relationships in the network are modeled by an adjacency matrix E, where an element e_ij = 1 if agent v_i uses its knowledge to support v_j in satisfying its demand (M_vj), and e_ij = 0 otherwise. The relations among agents are directed, so e_ij ≠ e_ji. The relation between v_i and v_j is shown in Figure 1 with an arrow: the arrowed line between v_i and v_j means that v_i produces the parts of v_j's products.
P = {maxParttimeWork, initialPopulation, requirementCount, subRequirementCount} is the set of attributes of the model. How many products an agent can produce at the same time is decided by the parameter maxParttimeWork; this parameter increases the complexity of the supply chains. The initial population of agents is set by the parameter initialPopulation. The total numbers of customer requirements (products) and sub-requirements (parts) are controlled by the parameters requirementCount and subRequirementCount.

2.1 Customer Requirements

Customer requirements are decided by the environment of the virtual system. The requirements (products) consist of several sub-requirements (parts). In the paper, all requirements are defined as AT = {subt_i, i = 0, 1, 2, ...}. Each sub-requirement subt_i has two characteristics, f_subti and t_subti. f_subti denotes the field-knowledge requirement of subt_i for an agent that wants to satisfy this sub-requirement, and t_subti denotes the requirement on technologies in the field f_subti. Only if an agent has enough field knowledge and technologies in the specific field is it an adaptive candidate to satisfy the sub-requirement. The total number of sub-requirements depends on the parameters of the model, such as requirementCount and subRequirementCount. A special software interface is provided to design the requirements based on users' needs. One of the most important actions for agents in the model is to find suitable requirements (sub-requirements) according to their knowledge and the tasks' characteristics.

2.2 Agent States

The state of agent v_i is defined as S_vi = {k_vi, f_vi}, where k_vi is the knowledge of v_i and f_vi is the fitness of the agent. If f_vi ≤ f_dead, v_i is deleted from the model. In the model, the agent is a member with an individual knowledge base. The knowledge of v_i is represented as k_vi = {{k_vi^F, k_vi^T}, {k_vi^F, k_vi^T}, ..., {k_vi^F, k_vi^T}}, where k_vi^F (k_vi^F ∈ [1, k_vi^Fmax]) is the research field and k_vi^T (k_vi^T ∈ [1, k_vi^Tmax]) is the special technology in the field k_vi^F. The length of k_vi is between kl_vi^min and kl_vi^max. The agent's performance in the model is expressed by the fitness f_vi, which can be explained as the sum of the rewards over all previous periods. In the paper, all revenues and costs are in fitness units. Each new agent's fitness is f_initial.
2.3 Agent Actions

A finite set of actions for agent v_i is defined as A_vi = {aa_vi, ab_vi, ac_vi, at_vi}. aa_vi denotes the action by which v_i computes its attributes, such as fields, technologies and work qualities. ab_vi denotes the bid action of v_i. ac_vi denotes the action by which v_i calls for bids and chooses adaptive agents as its suppliers; in make-to-order supply chains, v_i has stable suppliers. at_vi is used to pay the taxes of agent v_i at each period of the simulation. The tax rate in the model is t_rate, so the tax paid by v_i in each simulation period is f_vi * t_rate. The relation among these four actions within a simulation period is shown in Figure 1.

Fig. 1. Actions of v_i at simulation period t

At the beginning of each period, agents check their work plans. If agent v_i gave offers to other agents to buy some parts in previous periods, v_i purchases these parts only once they have been produced by the parts' manufacturers. If v_i has no purchasing mission in this period, v_i bids for requirements according to its abilities. If no agent wins the offer, v_i bids again with a new price until one of the agents gets the offer. If v_i wins the offer, v_i chooses the suppliers of the parts. The actions of v_i are described in the following.

1. Action of computing attributes (aa_vi)

For the first action aa_vi, v_i computes its work fields (α_vi), technologies (β_vi) and qualities (q_vi) of work results. The fields of v_i are calculated by Formulas (1) and (2):

α_vi^min = Σ_{j=1..kl_vi} k_vi,j^F − γ * kl_vi ,    (1)

α_vi^max = Σ_{j=1..kl_vi} k_vi,j^F + γ * kl_vi ,    (2)

where α_vi^min is the minimal value and α_vi^max the maximal value of α_vi (the field of v_i), kl_vi is the length of k_vi, and γ (γ ∈ [0, 1]) is a system parameter. The field of v_i is a value between α_vi^min and α_vi^max (α_vi ∈ [α_vi^min, α_vi^max]).

The technologies of v_i are calculated by Formulas (3) and (4), where N is the maximal value of k_vi^T:

β_vi^min = (Σ_{j=1..kl_vi} k_vi,j^F * k_vi,j^T) * (1 − λ) / N ,    (3)

β_vi^max = (Σ_{j=1..kl_vi} k_vi,j^F * k_vi,j^T) * (1 + λ) / N ,    (4)

where β_vi^min is the minimal value and β_vi^max the maximal value of β_vi (the technologies of v_i), and λ (λ ∈ [0, 1]) is a system parameter. The technologies of v_i is a value between β_vi^min and β_vi^max (β_vi ∈ [β_vi^min, β_vi^max]).

If α_vi^min ≤ f_subti ≤ α_vi^max and β_vi^min ≤ t_subti ≤ β_vi^max, agent v_i is an adaptive candidate for sub-requirement subt_i. This rule is used in the "find demands" process in Figure 1. The quality of an agent's work depends on the agent's knowledge. The quality q_vi is calculated by Formula (5):

q_vi = (Σ_{j=1..kl_vi} k_vi,j^T * (1 − e^(−k_vi,j^F))) / kl_vi .    (5)

2. Action of bidding (ab_vi)

If agent v_i is adaptive for task subt_i (α_vi^min ≤ f_subti ≤ α_vi^max and β_vi^min ≤ t_subti ≤ β_vi^max), v_i calculates its bid price bp_vi^subti for subt_i:

bp_vi^subti = [η * (α_vi + β_vi) + (1 − η) * (f_subti + t_subti)] * (1 + ipr) * n_subti ,    (6)

where η (η ∈ [0, 1]) is a system parameter, ipr is the initial profit rate, and n_subti is the number of suppliers of subt_i. If v_i accomplishes subt_i successfully, every k_vi^T in k_vi is improved as k_vi^T = k_vi^T * (1 + Δk_vi^T), where Δk_vi^T is the step of improvement. If v_i cannot win the offer, v_i decreases bp_vi^subti by a minor step in the next bid period.
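As a sketch of how the attribute formulas and the bid price fit together (Python; the sample knowledge base, parameter values and function names are illustrative assumptions chosen near the Table 1 settings, not the authors' code):

```python
import math

# Hypothetical system parameter values (gamma, lam, N, eta, ipr are assumptions).
gamma, lam, N, eta, ipr = 0.1, 0.1, 100, 0.7, 0.1

def field_range(k):
    """alpha range, Formulas (1)-(2); k is a list of (kF, kT) pairs."""
    s = sum(kF for kF, _ in k)
    return s - gamma * len(k), s + gamma * len(k)

def tech_range(k):
    """beta range, Formulas (3)-(4)."""
    s = sum(kF * kT for kF, kT in k)
    return s * (1 - lam) / N, s * (1 + lam) / N

def work_quality(k):
    """q, Formula (5)."""
    return sum(kT * (1 - math.exp(-kF)) for kF, kT in k) / len(k)

def is_candidate(k, f_subt, t_subt):
    """Adaptive-candidate check for a sub-requirement (f_subt, t_subt)."""
    a_min, a_max = field_range(k)
    b_min, b_max = tech_range(k)
    return a_min <= f_subt <= a_max and b_min <= t_subt <= b_max

def bid_price(alpha, beta, f_subt, t_subt, n_suppliers):
    """Bid price, Formula (6)."""
    return (eta * (alpha + beta) + (1 - eta) * (f_subt + t_subt)) * (1 + ipr) * n_suppliers

k = [(10, 5), (20, 8)]   # sample knowledge base: (field kF, technology kT) pairs
```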
3 The Implementation

This paper simulates the supply chain of down coats to show the validity of the multi-agent model. The foreign trade company has down coat suppliers in China. In recent years, this company has dealt with down coat orders with different categories of designs. Since the design of a foreign customer's coat differs from order to order, all companies in the supply chain produce the products only after they are ordered (make-to-order strategy). Since the situation is similar for all suppliers, this paper simulates the supply chain with one down coat supplier. The supply chain consists of one Down-Plant (DP), one Fabric-Plant (FCP), one Fastener-Plant (FRP), one Down-Coat-Plant (DCP), and one Foreign-Trade-Company (FTC). DP supplies the pure down, FCP supplies the fabric, and FRP supplies various fasteners. DCP is a down coat producer, and FTC is a special foreign trade company. The adjacencies of the companies in the supply chain are summarized in Figure 2. In the remainder of the paper, the company network made up of these five node-companies is considered (surrounded by the dashed line in Figure 2).
Fig. 2. Foreign down coat supply chain architecture
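The adjacency matrix E of Section 2 can be written down directly for this five-company chain (a sketch; the node ordering is an arbitrary assumption, and e[i][j] = 1 means company i supplies parts for company j's products):

```python
# Nodes of the down-coat supply chain, using the abbreviations above.
nodes = ["DP", "FCP", "FRP", "DCP", "FTC"]
idx = {v: i for i, v in enumerate(nodes)}

# Directed adjacency matrix: supplier -> buyer edges.
E = [[0] * len(nodes) for _ in nodes]
for supplier, buyer in [("DP", "DCP"), ("FCP", "DCP"), ("FRP", "DCP"), ("DCP", "FTC")]:
    E[idx[supplier]][idx[buyer]] = 1
```

Note that the matrix is not symmetric, reflecting the directed relations e_ij ≠ e_ji.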
The agent-based model is programmed in Java based on RePast, and the program runs on Windows XP. In order to simulate the supply chain in Figure 3, the parametric settings of the model are given in Table 1.
Table 1. Parametric settings

maxParttimeWork = 1, initialPopulation = 5, requirementCount = 1, subRequirementCount = 4
kl_vi^min = 3, kl_vi^max = 30, k_vi^Fmax = 100, k_vi^Tmax = 100, f_dead = 100
γ = 0.1, λ = 0.1, ipr = 0.1, Δk_vi^T = 0, f_initial = 3000
η = 0.7, φ = 0.8, trate = 0.005

Different settings of the parameters will influence the running of the model; however, the validity of the model does not depend on the parametric settings, and different supply chains can be modeled with different parameters. Figure 3 shows a run of the model with the parametric settings of Table 1. Each agent is shown as a point in the figure, and the commercial relationship between two agents is depicted as a line between them. The model runs iteratively with different customers' demands.
Fig. 3. Simulation of the supply chain by the model
In the model of the supply chain, the agent (virtual FTC) first finds demands (step i). Virtual FTC then gives an offer to the agent (virtual DCP) with the special coat designs (step i+1). Virtual DCP gives offers to three agents (virtual DP, virtual FCP, and virtual FRP) (step i+2). All virtual companies in the model produce products only after they are ordered (step i+3). When the virtual FTC finds new demands, the model runs again with different coat designs. Researchers can carry out studies of supply chain management based on this model.
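The four steps above can be sketched as a recursive order cascade (Python; the supplier map and the design label are illustrative assumptions):

```python
# Who supplies whom in the simulated chain (assumed from the description above).
suppliers = {"FTC": ["DCP"], "DCP": ["DP", "FCP", "FRP"]}

def propagate_order(company, design, log):
    """Make-to-order: an order cascades down the chain, and each company
    produces only after the order for the given design has reached it."""
    for s in suppliers.get(company, []):
        log.append((company, s, design))      # the company offers the order to supplier s
        propagate_order(s, design, log)
    log.append((company, "PRODUCE", design))  # produce once all parts are ordered

log = []
propagate_order("FTC", "coat-A", log)
```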
4 Conclusion

An agent-based model is proposed in the paper to simulate make-to-order supply chains. This model is an attempt to improve our understanding of the complex processes going on in make-to-order supply chains. The validity of the model is shown by the simulation of a foreign trade supply chain. Agents are used to simulate the members of the supply chain; agents' attributes are represented by their knowledge k_vi, and four actions (A_vi = {aa_vi, ab_vi, ac_vi, at_vi}) are defined to support the agents' decisions. The purpose of the paper is to propose a common agent-based model to simulate make-to-order supply chains, with which studies of make-to-order supply chains can be carried out easily.

Acknowledgments. This research was supported by the NSFC (National Natural Science Foundation of China) key program under Grant 70731002 and a fund of the Jiangsu Agricultural Machinery Bureau (Grant Number GXS08005).
References

1. Kaminsky, P., Kaya, O.: Combined make-to-order/make-to-stock supply chains. IIE Transactions 41, 103–119 (2009)
2. Gilbert, N., Terna, P.: How to build and use agent-based models in social science. Mind & Society 1, 57–72 (2000)
3. Rajagopalan, S.: Make to Order or Make to Stock: Model and Application. Management Science 48(2), 241–256 (2002)
4. Gilbert, N., Pyka, A., Ahrweiler, P.: Innovation Networks - A Simulation Approach. Journal of Artificial Societies and Social Simulation 4(3) (2001), http://jasss.soc.surrey.ac.uk/4/3/8.html
5. Bhavnani, R., Miodownik, D., Nart, J.: REsCape: An agent-based model for modeling resources, ethnicity and conflict. Journal of Artificial Societies and Social Simulation 11(2) (2008), http://jasss.soc.surrey.ac.uk/11/2/7.html
6. Lei, Z., Pijanowski, C.B., Olson, J.: Distributed Modeling Architecture of a Multi-Agent-Based Behavioral Economic Landscape (MABEL) Model. Simulation 81(7), 503–515 (2005)
7. Dunham, B.J.: An Agent-Based Spatially Explicit Epidemiological Model in MASON. Journal of Artificial Societies and Social Simulation 9(1) (2005), http://jasss.soc.surrey.ac.uk/9/1/3.html
8. Shibuya, K.: A Framework of Multi-Agent-Based Modeling, Simulation, and Computational Assistance in an Ubiquitous Environment. Simulation 80(7), 367–380 (2004)
9. Janssen, A.M., Alessa, L.N., Barton, M., Bergin, S., Lee, A.: Towards a Community Framework for Agent-Based Modelling. Journal of Artificial Societies and Social Simulation 11(2) (2008), http://jasss.soc.surrey.ac.uk/11/2/6.html
10. Krackhardt, D., Carley, M.K.: A PCANS Model of Structure in Organization. In: Proceedings of the 1998 International Symposium on Command and Control Research and Technology, Evidence Based Research, Vienna, VA, pp. 113–119 (1998)
Pricing and Bidding Strategy in AdWords Auction under Heterogeneous Products Scenario∗ E. Zhang and Yiqin Zhuo School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai, China, 200433 [email protected], [email protected]
Abstract. This paper focuses on bidding and pricing strategies in a scenario where two manufacturers of heterogeneous products sell through an online channel. The firms compete for customers in the quality-to-price ratio. The value of the prominent AdWords advertising position and the resulting price dispersion patterns are studied. Based on a game-theoretic analysis, we find that the prominent position of an Adword is not always favorable to all firms. For the firm which produces high-quality products, the revenue gained from being listed in the prominent place is always higher than in the second place; however, for the low-quality firm, the revenue gained from its advertisement being listed in the prominent place might be less than in the second place. Meanwhile, the attractiveness of the prominent ad place depends on the market structure in terms of consumer preference and consumer search behavior: the more consumers purchase from the firm listed in the prominent ad place, or the more consumers prefer the high-quality product, the stricter the region in which the low-quality manufacturer earns a positive profit.

Keywords: AdWords auction; heterogeneous product pricing.
1 Introduction

With the boom of the Internet market, online advertising has become a popular way for many businesses to advertise, and it accounts for a growing proportion of the entire advertising market. PricewaterhouseCoopers estimated that from 2006 to 2011, the online advertising market would rise from $31.6 billion to $73.1 billion. Typical online ad styles such as banner ads and button ads impress surfing people in the same way as advertising on TV, while other styles, such as game ads and keyword ads (abbreviated as Adwords), are more interactive. Adwords has already become one of the fastest-growing styles and currently accounts for 40% of online advertising market revenue [1]. Since people's surfing time online is limited, advertisers value the prominent locations much more highly than the posterior ones. Thus online advertising platforms such as Google and Taobao allocate the slots by auction: the highest bidder gets the top ad impression,
This work was supported by the National Natural Science Foundation of China (NSFC Grant Nos.70602031).
followed by the second-highest bidder, and so on [2]. Researchers from different areas study Adwords auctions from different points of view. As for Adwords allocation research, previous literature focused on the following aspects: how effective is the auction mechanism in total revenue when selling thousands of Adwords? What is the optimal bidding strategy for advertisers under the current auction system? How does product quality impact the bid and price of Adwords? Regarding efficient auction mechanisms in the online environment, Yossi Azar et al. found that the first-price auction and second-price auction are quite different [3]. Benjamin Edelman and Michael Schwarz showed that the second-price auction with a reserve price is the optimal auction mechanism, and that the optimal reserve price does not change as the number of bidders increases [4]. These studies focused on settings where bidders bid only once or repeat the same bid, and did not take into account that an advertiser's current bidding behavior is related to its past bidding behavior. Liu and Chen studied this problem by designing an auction mechanism under a unit-price contract based on past performance [5]. They expressed the advertisers' past behavior in the form of performance and found that the auctioneer obtains more revenue when the key elements of this auction mechanism are set appropriately. Regarding bidders' behavior, researchers are primarily concerned with the effect of the internal and external environment on bidding decisions. An inspection of optimal bidding strategies under continuous-time market conditions shows that users tend to bid less under market fluctuation [6]. However, the authors did not consider how the relationship between bidders, who may cooperate or compete, affects bidding behavior and equilibrium. Oliver Kirchkamp and J.
Philipp Reiß demonstrated that in the first-price auction, bidders who adopt a low-price bidding strategy obtain better returns [7]. However, they did not discuss the universal applicability of the second-price auction in reality. For the case where bidders are asymmetric, Xu inspected bidders' strategies under the second-price auction mechanism: the low-type firm bids aggressively, as the profit received in the first position exceeds that in the second position, while the high-type firm should reduce its bid in some cases, as at times the return in the second position is higher than in the first [8]. Edelman et al. show that all bidders submit bids equal to their true values when quality factors are not taken into account [9]. Several studies indicate that product quality does impact Adwords' bids and prices. Animesh et al. analyzed the relationship between product quality and sponsored-link advertising from an empirical view, pointing out that firms' bidding strategies differ across different categories of products [10]. For products of different quality, Nelson indicated that in the market for search products, product quality and advertising bidding have a positive relationship, in which high-quality firms bid more actively than low-quality firms, especially in the case of repeated purchases [11]; there are significant differences in the cost structures of high-quality and low-quality firms [12]. However, Katona and Sarvary extend the analysis by taking the quality factor into account and show that there are multiple equilibria which do not have closed-form solutions [13].
Although the existing research results contribute much to the Adwords auction area, the searchers' consumption behavior and the product market need to be taken into account together. In this paper we combine the searchers' behavior with the product market and analyze the bidding strategies of firms producing heterogeneous products in the Adwords market. Under the framework of [8], we assume that there are two advertisers bidding for two advertising positions. The values and bids of the two positions are all different. Each advertiser measures the revenue in each position and also considers how to set a reasonable price for its products. Therefore, using backward induction, we obtain the revenue in both positions for each firm. A firm would like to stay in the position which generates the higher revenue; we then compare the difference between the revenues, and the firm which bids higher wins the top position. Based on this game-theoretic analysis, we find that the prominent position of an Adword is not always favorable to all firms. For the firm which produces high-quality products, the revenue gained from being listed in the prominent place is always higher than in the subordinate place; however, for the low-quality firm, the revenue gained from its advertisement being listed in the prominent place might be less than in the subordinate place. Meanwhile, the attractiveness of the prominent place depends on the market structure in terms of consumer preference and consumer search behavior: the more consumers purchase from the firm listed in the prominent ad place, or the more consumers prefer the high-quality product, the stricter the region in which the low-quality manufacturer earns a positive profit. This paper is organized as follows. We describe our model and solve it in Section 2. In Section 3, we discuss the effect of changes in heterogeneous consumers on the firms' bidding strategies. In Section 4 we conclude the paper.
2 Model Set Up

2.1 Description of the Problem

Suppose there are two heterogeneous firms: one produces only a high-quality product (denoted H), the other only a low-quality product (denoted L), and this does not change with the market environment. The prices of the two products are pH and pL respectively. To produce the high-quality product, firm H incurs an extra cost c per unit. The firms bid on the AdWords positions to maximize their own revenue; the winner obtains the prominent position, while the other stays in the subordinate one. Following the framework of [8], there are two types of consumers: non-shoppers (a fraction α) and shoppers (a fraction 1 − α). Non-shoppers click the ad in the prominent place and purchase there; shoppers click on both positions and then make a purchase decision based on their consumer surplus. Consumers' preferences may differ (e.g., some people may only accept high-quality products): a fraction β of consumers are willing to buy only the H product, while the remaining 1 − β are homogeneous consumers who decide based on product quality and price. A consumer who clicks on an ad position purchases the product. Consumers' willingness to pay for the H and L products differ;
Pricing and Bidding Strategy in AdWords Auction
119
denote them WH and WL respectively, with WH > WL and WH > c. Let WH = θWL, with θ > 1.
The consumer surplus from product i is Wi − pi, i ∈ {H, L}. The timing of the game is as follows: in the first stage, the two firms bid under the rules of the generalized second-price auction, and the higher bidder obtains the prominent position; in the second stage, the firms price their products, and consumers click on the positions and make purchase decisions.

2.2 Model Solution
Using backward induction, we begin with the second stage: we calculate each firm's expected revenue in both positions and find the equilibrium of the price competition in the different situations. Back in the first stage, we compare which position yields the higher revenue for each firm and discuss their bidding strategies. In the second stage there is no pure-strategy Nash equilibrium, for two reasons: first, with product quality fixed, each firm must cut its price to attract more consumers; second, once the price drops to a certain level, the firm in the prominent place has an incentive to raise its price so as to extract more profit from the non-shoppers. Hence no equilibrium pricing point exists, and the firms play mixed pricing strategies. We use FH(p) and FL(p) to denote the mixed pricing strategies, i.e. the probability that firm H (L) charges a price less than or equal to p, and π_i^j to denote the expected revenue of firm i in position j, i ∈ {H, L}, j ∈ {1, 2}.

When firm H wins the prominent position, its price must be below WH. Since it is guaranteed the revenues α(WH − c) and (1 − α)β(WH − c) from non-shoppers and loyal shoppers respectively, its price must exceed (α + β − αβ)(WH − c) + c. The pricing support is therefore p ∈ [(α + β − αβ)(WH − c) + c, WH]. For firm L, the price must be below WL, and the minimum price satisfies WL − pL = WH − [(α + β − αβ)(WH − c) + c]; hence firm L's pricing support is p ∈ [WL − (θWL − c)(1 − α − β + αβ), WL).

Proposition 1. When firm H wins the prominent position, the two firms' mixed pricing strategies are as follows:

FL(p) = { p + [(1 − α − β + αβ)θ − 1]WL − (1 − α)(1 − β)c } / { (1 − α)(1 − β)[p + (θ − 1)WL − c] },
  p ∈ [(1 + (α + β − αβ − 1)θ)WL + (1 − α)(1 − β)c, WL];

FH(p) = 1 − [WH/θ − (WH − c)(1 − α)(1 − β)] / [p − (1 − 1/θ)WH],
  p ∈ [(α + β − αβ)WH + (1 − α)(1 − β)c, WH), and FH(p) = 1 at p = WH.

When firm L wins the prominent position, the revenue of firm L is

π_L^1 = α(1 − β)pL + (1 − α)(1 − β){1 − FH(pL + (1 − 1/θ)WH)} pL
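As a quick numerical sanity check of Proposition 1 (the parameter values below are illustrative, not taken from the paper), one can verify that FL is a proper distribution function on its support: zero at the lower bound, one at WL, and increasing in between.

```python
# Illustrative check of the mixed-strategy CDF F_L in Proposition 1.
# Parameter values are arbitrary examples, not taken from the paper.
alpha, beta, theta, WL, c = 0.3, 0.2, 2.0, 1.0, 0.5

def F_L(p):
    """F_L(p) from Proposition 1 (firm H wins the prominent position)."""
    num = p + ((1 - alpha) * (1 - beta) * theta - 1) * WL - (1 - alpha) * (1 - beta) * c
    den = (1 - alpha) * (1 - beta) * (p + (theta - 1) * WL - c)
    return num / den

# Lower end of firm L's pricing support
p_lo = (1 + (alpha + beta - alpha * beta - 1) * theta) * WL + (1 - alpha) * (1 - beta) * c

print(abs(F_L(p_lo)) < 1e-9)       # True: F_L is 0 at the lower support bound
print(abs(F_L(WL) - 1.0) < 1e-9)   # True: F_L reaches 1 at p = WL
```

The same kind of check can be repeated for FH, whose atom at p = WH corresponds to the residual probability mass 1 − FH(WH−).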
The revenue of firm H is

π_H^2 = (1 − α)(1 − β){1 − FL(pH − (1 − 1/θ)WH)}(pH − c) + (1 − α)β(pH − c)
Since WL is firm L's highest price, its profit is at least α(1 − β)WL and its minimum price is αWL; since WH is firm H's highest price, its profit is at least (1 − α)β(WH − c) and its minimum price is β(WH − c) + c. If both products are priced at these minima, the corresponding consumer surpluses are TL = WL − αWL and TH = WH − β(WH − c) − c. When TL > TH, the surplus from buying the L product exceeds that from the H product, so the fraction (1 − α)(1 − β) of comparing consumers all buy product L. By profit maximization, firm L then raises its price until the two surpluses are equal, so the pricing supports of L and H are [WL − (1 − β)(θWL − c), WL] and [β(WH − c) + c, WH] respectively. Conversely, when TL < TH, firm H raises its price; in this situation the supports are [αWL, WL] for firm L and [(1 − (1 − α)/θ)WH, WH] for firm H.

Proposition 2. When firm L wins the prominent position:

(i) when c/WH > 1 − (1 − α)/[(1 − β)θ], the two firms' mixed pricing strategies are as follows:

FL(p) = [p + (θ − 1 − βθ)WL − (1 − β)c] / {(1 − β)[p + (θ − 1)WL − c]},
  p ∈ [(θβ + 1 − θ)WL + (1 − β)c, WL];

FH(p) = [p − βWH − (1 − β)c] / {(1 − α)[p − (1 − 1/θ)WH]},
  p ∈ [βWH + (1 − β)c, WH), and FH(p) = 1 at p = WH.

(ii) when 0 < c/WH < 1 − (1 − α)/[(1 − β)θ], the two firms' mixed pricing strategies are as follows:

FH(p) = [p − (1 − (1 − α)/θ)WH] / {(1 − α)[p − (1 − 1/θ)WH]},
  p ∈ [(1 − (1 − α)/θ)WH, WH];

FL(p) = (p − αWL) / {(1 − β)[p + (θ − 1)WL − c]},
  p ∈ [αWL, WL), and FL(p) = 1 at p = WL.
Proposition 3. For the firm that produces high-quality products, the revenue gained in the prominent place is always higher than in the subordinate one. For the firm that produces low-quality products, if β < 1 − (1 − 2α)/θ and

1 − (1 − 2α)/[(1 − α)²(1 − β)θ] < c/WH < 1 − 1/[(1 − β)(2 − α)θ],

it obtains a higher income in the subordinate place than in the prominent one.
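The interval in Proposition 3 can be checked numerically (a hedged sketch based on the bounds as reconstructed above; the parameter values are illustrative). The interval is non-empty only for α below (3 − √5)/2, the threshold marked on the α axis of the figures:

```python
import math

def subordinate_better_interval(alpha, beta, theta):
    """Bounds on c/WH from Proposition 3 between which firm L earns
    more in the subordinate position (as reconstructed here)."""
    lo = 1 - (1 - 2 * alpha) / ((1 - alpha) ** 2 * (1 - beta) * theta)
    hi = 1 - 1 / ((1 - beta) * (2 - alpha) * theta)
    return lo, hi

alpha_star = (3 - math.sqrt(5)) / 2  # threshold marked on the alpha axis in Fig. 1

lo, hi = subordinate_better_interval(0.2, 0.0, 2.0)  # alpha below the threshold
print(lo < hi)   # True: a non-empty range of c/WH exists

lo, hi = subordinate_better_interval(0.5, 0.0, 2.0)  # alpha above the threshold
print(lo < hi)   # False: no such range
```

Note that the comparison lo < hi reduces to (1 − 2α)(2 − α) > (1 − α)², i.e. α² − 3α + 1 > 0, which holds exactly for α < (3 − √5)/2 regardless of β and θ.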
Advertisers should therefore make different bidding decisions depending on their products. In some cases it is not worth bidding aggressively for the top position, since a firm may obtain higher returns in the subordinate place.
3 Discussion

In this section we investigate how the value of β affects the bidding strategy, distinguishing the cases β = 0 and β > 0.

(1) When β = 0, the model degenerates to the case of homogeneous consumers. For different values of θ, we plot (as the shaded area) the region in which firm L's income in the subordinate place exceeds that in the prominent one:
[Figure 1 shows three panels, (a) θ = 1.5, (b) θ = 2, and (c) θ = 5, each plotting the shaded region in the (α, c/WH) plane; the α axis is marked at (3 − √5)/2.]
Fig. 1. The region where firm L earns more in the subordinate place, for different θ (β = 0)
As Figure 1 shows, when the difference between the prominent and subordinate positions is not significant (α small), the cost paid by firm H is high (1 − (1 − 2α)/[(1 − α)²θ] < c/WH), and consumers are willing to pay a relatively high price for the low-quality product (c/WL < θ − 1/(2 − α)), firm L can still obtain better returns even if it does not bid for the top position. The value of θ reflects the degree of quality difference between the two products; as this difference grows (θ increases), firm L obtains better returns without bidding only when firm H incurs a higher cost or consumers' willingness to pay for the low-quality product is higher.

(2) When β > 0, the market contains heterogeneous consumers. Take θ = 2. Proposition 3 shows that if β < 1/2 + α and

1 − (1 − 2α)/[2(1 − α)²(1 − β)] < c/WH < 1 − 1/[2(1 − β)(2 − α)],

firm L has better earnings in the subordinate place. The cost, and the price consumers are willing to pay for the low-quality product, have an effect on the area
where firm L has better earnings in the subordinate place. As more consumers come to prefer the H product, firm L retains a better return in the subordinate position even if firm H spends less cost or consumers are willing to pay a lower price for the low-quality product.
[Figure 2 shows three panels, (a) β = 0.1, (b) β = 0.15, and (c) β = 0.3, each plotting the shaded region in the (α, c/WH) plane; the α axis is marked at (3 − √5)/2.]
Fig. 2. The region where firm L earns more in the subordinate place, for different β (θ = 2)
4 Conclusion

In this paper we discussed bidding and pricing strategies in the AdWords auction in a scenario where two firms manufacture heterogeneous products and sell them to consumers with different preferences and search behavior. We found that the high-quality manufacturer is always better off winning the prominent AdWords place rather than the subordinate one. For the low-quality manufacturer the story is different: sometimes it is better off winning the prominent place, but in some cases it earns a higher profit in the subordinate place. The degree of product heterogeneity (θ), consumer preference (β), and consumer search behavior (α) strongly affect the firms' bidding and pricing strategies. Although in some cases both firms earn higher profits in the prominent place than in the subordinate one, the high-quality manufacturer has the better chance of winning it, especially when more consumers prefer high-quality products. Meanwhile, the opportunity to earn a higher profit in the subordinate place shrinks as θ increases or β decreases. By taking heterogeneous products and consumer preference into account, this study extends the research in [8]. Some questions remain for future work. We assumed that consumers' information is complete, i.e., they know each firm's product quality and price exactly; under incomplete information, will the firms' bidding behavior change? And if each firm's product is not fixed, in other words, if firms can choose what to manufacture according to the market structure, will they keep the same bidding strategies? These questions require further research.
References
1. Online advertising category (in Chinese), http://news.iresearch.cn/0472/62203.shtml
2. Aggarwal, G., Feldman, J., Muthukrishnan, S.: Bidding to the Top: VCG and Equilibria of Position-Based Auctions. In: Erlebach, T., Kaklamanis, C. (eds.) WAOA 2006. LNCS, vol. 4368, pp. 15–28. Springer, Heidelberg (2007)
3. Azar, Y., Birnbaum, B., Karlin, A.R., Nguyen, C.T.: Thinking Twice about Second-Price Ad Auctions. CoRR abs/0809.1895 (2008)
4. Edelman, B., Schwarz, M.: Optimal Auction Design in a Multi-unit Environment: The Case of Sponsored Search Auctions. American Economic Review (forthcoming)
5. De, L., Jianqing, C.: Designing online auctions with past performance information. Decision Support Systems 42(3), 1307–1320 (2006)
6. Fang, W.: Optimal Bidding Strategy for Keyword Auctions and Other Continuous-time Markets. Working paper, CiteSeerX (2007)
7. Kirchkamp, O., Reiß, J.P.: Heterogeneous bids in auctions with rational and markdown bidders: Theory and Experiment. Jena Economic Research Papers 2008-066 (2008)
8. Lizhen, X., Jianqing, C., Andrew, W.: To Place Better or Price Cheaper? Bidding and Pricing under Keyword Advertising. In: Whinston (ed.) Proceedings of the Seventh Workshop on e-Business (WeB), Paris, France (2008)
9. Edelman, B., Ostrovsky, M., Schwarz, M.: Internet advertising and the generalized second price auction: Selling billions of dollars worth of keywords. The American Economic Review 97, 242–259 (2007)
10. Animesh, A., Ramachandran, V., Viswanathan, S.: Quality Uncertainty and the Performance of Online Sponsored Search Markets. Information Systems Research (2010) (in press)
11. Nelson, P.: Advertising as Information. Journal of Political Economy 82, 729–754 (1974)
12. Laurent, L.: Price and advertising as signals of quality when some consumers are informed. International Journal of Industrial Organization 20(7), 931–947 (2002)
13. Katona, Z., Sarvary, M.: The race for sponsored links: A model of competition for search advertising. Working Paper, INSEAD (2007)
FIR Cutoff Frequency Calculating for ECG Signal Noise Removing Using Artificial Neural Network Sara Moein Computer Engineering Department, Islamic Azad University of Najafabad Branch, Najafabad, Esfahan, Iran [email protected]
Abstract. In this paper, an automated approach to electrocardiogram (ECG) signal noise removal using an artificial neural network is investigated. First, 150 noisy heart signals are collected from the MIT-BIH databases. The signals are then transformed to the frequency domain and the cutoff frequency is calculated. Since heart signals are low-frequency, a Finite Impulse Response (FIR) lowpass filter is adequate to remove the noise. In the next step, a dataset is configured for training a multilayer perceptron (MLP) with a feedforward algorithm. Finally, the MLP is trained and the results of the cutoff frequency calculation are shown. Keywords: Finite Impulse Response (FIR), Cutoff frequency, Dataset, Multilayer Perceptron.
Fig. 1. ECG signal from a normal human
2 Literature Review

Many researchers work on signal noise removal [4], [5], [6], [7], [9]. Hyun, D.K. et al. [5] grouped various noisy signals into six categories by context estimation and effectively reconfigured the noise-reduction filter with a neural network and a genetic algorithm (GA): a neural-network-based control module selects the optimal filter block by noise-context clustering at running mode, and filtering performance is improved by the GA at evolution mode. Manash, S. et al. [4] designed and applied a notch filter to the ECG signal containing power-line noise; the complete design is performed with the FDA tool in Matlab, and the designed equiripple notch filter has a high order, which increases the computational complexity. In another study, Moein, S. [8] used a Kalman filter for ECG signal noise removal, applying a neural network to calculate the Kalman filter parameters; the results show that a neural network can be applied to noise removal using a Kalman filter. Building on these studies, this paper applies an artificial neural network to obtain the cutoff frequency for an FIR filter. In other words, this study mainly focuses on intelligent cutoff-frequency calculation.
3 FIR Filter Given a finite duration of nonzero input values, the effect is that an FIR filter will always have a finite duration of nonzero output values, and that is how FIR filters got their name. FIR filters were selected over IIR as they are stable and the effect of finite word length on the specified frequency or time domain response of the output noise is smaller than that for Infinite Impulse Response (IIR) filters [10], [11].
The ideal lowpass filter is one that passes all frequency components of a signal below a designated cutoff frequency ωc and rejects all frequency components above ωc.
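To make the cutoff frequency's role concrete, a lowpass FIR filter can be sketched with a windowed-sinc design (an illustration only, not the design procedure used in the paper; the tap count and window choice here are assumptions):

```python
import numpy as np

def fir_lowpass(num_taps, cutoff):
    """Windowed-sinc lowpass FIR design.
    cutoff is the normalized cutoff (1.0 = Nyquist); num_taps should be odd."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = cutoff * np.sinc(cutoff * n)   # ideal lowpass impulse response
    h *= np.hamming(num_taps)          # window to control ripple
    return h / h.sum()                 # normalize for unit DC gain

h = fir_lowpass(51, 0.33)              # e.g. a 0.33 normalized cutoff
H = np.abs(np.fft.rfft(h, 1024))       # frequency-response magnitude
print(round(H[0], 3))                  # 1.0: passband gain at DC
print(H[-1] < 0.01)                    # True: strong attenuation near Nyquist
```

An FIR filter built this way is applied by convolving the coefficients with the signal; only the cutoff and the filter order need to be chosen, which is exactly what the paper's neural network is meant to supply.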
4 Methodology

4.1 Database

Physiobank is a comprehensive collection of heart signals from normal subjects and patients. In this research, 150 noisy ECG signals are collected from three databases: the MIT-BIH Arrhythmia Database, a collection of 48 fully annotated half-hour two-lead ECGs; the MIT-BIH Normal Sinus Rhythm Database, which includes 18 long-term ECG recordings of subjects referred to the Arrhythmia Laboratory at Boston's Beth Israel Hospital; and the MIT-BIH Supraventricular Arrhythmia Database, which includes 78 half-hour ECG recordings chosen to supplement the examples of supraventricular arrhythmias in the MIT-BIH Arrhythmia Database [8]. All natural signals are corrupted by noise and artifacts from various sources. An FIR filter is adequate to remove the noise of ECG signals; its cutoff frequency and order are the two effective variables in the filter design. Here the objective is to find the cutoff frequency using a neural network.

4.2 Frequency Domain

Considering the noise interference in ECG signals, all 150 signals must be transformed to the frequency domain using the Fast Fourier Transform (FFT). Figures 2 and 3 show examples of noisy signals and their Fourier transforms.

4.3 Dataset Configuration

After transforming the signals to the frequency domain, statistical attributes of each signal (standard deviation, variance, etc.) are extracted to configure the dataset. The standard deviation is defined as (1):
σ = [ (1/N) Σ_{n=0}^{N−1} (x_n − μ)² ]^{1/2}   (1)

where N is the number of samples, x_n is a data sample, μ is the mean, and σ is the standard deviation. Using Matlab, these features are extracted from the Fourier transform of each signal and the results are collected in a dataset; Table 1 shows part of it. Since the proposed method is based on supervised learning for the neural network, it is necessary to calculate a target for each record of the dataset. The target is the value of
cutoff frequency, which can be calculated from the frequency spectrum. The frequency must then be normalized (2):

ω = 2 ωc / fs   (2)

where fs is the sampling frequency.

Fig. 2. Arrhythmia ECG signal: (a) noisy signal, (b) frequency spectrum

Fig. 3. Arrhythmia ECG signal: (a) noisy signal, (b) frequency spectrum
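The feature-extraction and normalization steps described above can be sketched as follows (the signal here is synthetic, not an actual MIT-BIH record, and the 360 Hz sampling rate is only an example):

```python
import numpy as np

def fft_features(signal):
    """Statistical attributes of a signal's FFT magnitude, as used
    to build the dataset (mean, variance, standard deviation)."""
    spectrum = np.abs(np.fft.rfft(signal))
    return spectrum.mean(), spectrum.var(), spectrum.std()

def normalized_cutoff(fc_hz, fs_hz):
    """Equation (2): omega = 2*fc/fs, i.e. cutoff as a fraction of Nyquist."""
    return 2 * fc_hz / fs_hz

rng = np.random.default_rng(0)
n = np.arange(1000)
noisy = np.sin(2 * np.pi * 5 * n / 360) + 0.1 * rng.standard_normal(1000)

mean_fft, var_fft, std_fft = fft_features(noisy)
print(abs(std_fft - np.sqrt(var_fft)) < 1e-9)      # True: std is the root of the variance
print(round(normalized_cutoff(40, 360), 3))        # 0.222
```

One such (mean, variance, std) triple per signal, plus the normalized cutoff as target, yields records of the form shown in Table 1.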
Table 1. Part of the provided dataset

No.  Signal            Mean FFT  Variance FFT  Standard Deviation  ωc
1    Arrhythmia        0.3303    107.4256      10.3646             0.33
2    Arrhythmia        0.2076    69.0750       8.3111              0.80
3    Arrhythmia        0.1390    122.1468      11.0520             0.80
4    Arrhythmia        0.3211    144.7610      12.0317             0.33
5    Arrhythmia        0.1124    113.5184      10.6545             0.22
6    Supraventricular  0.0595    47.3639       6.8821              0.22
7    Supraventricular  0.3818    186.4406      13.6543             0.33
8    Normal ECG        0.5862    331.1704      18.1981             0.33
9    Normal ECG        0.4293    1.2263e+003   35.0192             0.44
10   Supraventricular  0.5550    268.9242      16.3989             0.33
11   Supraventricular  0.0579    104.3169      10.2136             0.60
12   Supraventricular  0.6750    89.7411       9.4732              0.60
13   Supraventricular  0.0585    43.8329       6.6206              0.44
14   Arrhythmia        0.1148    222.9100      14.9302             0.44
15   Arrhythmia        0.0121    23.7992       4.8784              0.33
16   Normal ECG        0.0596    19.6352       4.4312              0.60
17   Normal ECG        0.1812    45.8616       6.7721              0.80
18   Normal ECG        0.2852    380.5688      19.5082             0.60
Table 2 assigns a label to each ωc; the configured dataset thus consists of 6 classes.

Table 2. Assigning a label to classes of cutoff frequency

ωc:     0.11  0.22  0.33  0.60  0.80  0.44
Label:   0     1     2     3     4     5
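The labeling of Table 2 amounts to a simple lookup table; a minimal sketch:

```python
# Class labels for the six cutoff-frequency values of Table 2
LABELS = {0.11: 0, 0.22: 1, 0.33: 2, 0.60: 3, 0.80: 4, 0.44: 5}

def encode(cutoff):
    """Map a cutoff frequency to its class label."""
    return LABELS[round(cutoff, 2)]

print(encode(0.33))  # 2
print(encode(0.44))  # 5
```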
4.4 MLP Training

The Multilayer Perceptron (MLP) is an efficient neural network for classification problems with considerable performance [7]. Several issues are involved in training an MLP network:

• the number of hidden layers
• the number of nodes in each hidden layer
• converging to an optimal solution in a reasonable period of time
• testing the network for overfitting

An MLP with 3 nodes in the input layer and 1 node in the output layer is trained; 130 records of the dataset are used to train the neural network and 20 records are used to test
Fig. 4. Effect of training cycles on the MSE performance

Table 3. Comparison of real cutoff frequencies and cutoff frequencies calculated with the MLP [entries for test samples 1-20 not reproduced in this copy]
Fig. 5. Error of MLP in calculating the cutoff frequency
the trained network. The number of nodes in the hidden layer and the number of training cycles are varied across tests to achieve good performance. Fig. 4 shows the effect of the number of training cycles on the Mean Square Error (MSE) performance function of the network, with 10 nodes in the hidden layer; the best performance occurs with 300 and 700 training cycles. Table 3 shows the results of MLP training with 10 hidden nodes and 700 training cycles: the first column lists the real cutoff frequencies and the second the calculated ones. Fig. 5 presents the error of the network. Using the calculated cutoff frequency and a chosen FIR filter order, the required parameters are available and noise removal can be performed.
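A minimal numpy sketch of such an MLP (3 inputs, 10 hidden nodes, 1 linear output, trained by plain batch gradient descent for 700 cycles) is shown below. The data are synthetic stand-ins; the paper's actual dataset and Matlab toolbox settings are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the (mean, variance, std) -> cutoff-target dataset
X = rng.random((130, 3))
y = (X.sum(axis=1, keepdims=True) > 1.5).astype(float)  # toy target in [0, 1]

# 3-10-1 MLP: sigmoid hidden layer, linear output, MSE loss
W1 = rng.standard_normal((3, 10)) * 0.5
b1 = np.zeros(10)
W2 = rng.standard_normal((10, 1)) * 0.5
b2 = np.zeros(1)

def forward(X):
    h = 1 / (1 + np.exp(-(X @ W1 + b1)))   # hidden activations
    return h, h @ W2 + b2                  # linear output

lr = 0.1
_, out = forward(X)
mse_before = float(((out - y) ** 2).mean())
for _ in range(700):                       # 700 training cycles, as in the paper
    h, out = forward(X)
    err = out - y                          # gradient of the MSE w.r.t. the output
    W2 -= lr * h.T @ err / len(X)
    b2 -= lr * err.mean(axis=0)
    dh = (err @ W2.T) * h * (1 - h)        # backprop through the sigmoid
    W1 -= lr * X.T @ dh / len(X)
    b1 -= lr * dh.mean(axis=0)
_, out = forward(X)
mse_after = float(((out - y) ** 2).mean())
print(mse_after < mse_before)  # True: training reduces the MSE
```

In practice the 130/20 train/test split described above would be applied to the real feature records, with the cutoff-frequency class as the target.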
5 Conclusion

The results show that a neural network can be applied to solve signal-processing problems. It performs well in calculating the cutoff frequency: the figures and tables show that the network error is less than 0.5, i.e., the performance exceeds 90%. It is therefore reasonable to claim that noise removal can be developed toward an automatic solution using neural networks. In future work, applying recurrent and unsupervised neural networks may improve the presented method.
References 1. Sornmo, L., Laguna, P.: Bioelectrical Signal Processing in Cardiac and Neurological Applications. Elsevier, Amsterdam (2005) 2. Medical Device Safety Reports, http://www.mdsr.ecri.org/ 3. Behbahani, S.C.: Investigation of Adaptive Filtering for Noise Cancellation in ECG signals. In: Second International Multi-Symposiums on Computer and Computational Sciences (2007)
4. Mahesh, S., Agarvala, A., Uplane, M.: Design and implementation of digital FIR equiripple notch filter on ECG signal for removal of power line interference. WSEAS Transactions on Signal Processing 4, 221–230 (2008)
5. Hyun, D.K., Chul, H.M., Tae, S.K.: Advances in Neural Networks: Adaptable Noise Reduction of ECG Signals for Feature Extraction. Springer, Heidelberg (2006)
6. Moein, S.: Hepatitis Diagnosis by Training a MLP Artificial Neural Network. In: Worldcomp 2008 Conference, Las Vegas, USA (2008)
7. Moein, S.T.: Advances in Computational Biology: A MLP Neural Network for ECG Noise Removal Based on Kalman Filter. In: Arabnia, H.R. (ed.). Springer, Heidelberg (2010) (accepted, in publishing stage)
8. Archive of ECG signals, http://www.physionet.org/physiobank/database/PTB
9. Karimi-Ghartemani, K., Ziarani, A.C.: A nonlinear time frequency analysis method. IEEE Trans. Signal Process., 1585–1595 (2004)
10. Lian, Y., Hoo, P.J.: Digital elliptic filter application for noise reduction in ECG signal. WSEAS Transactions on Electronics 3, 65–70 (2006)
11. Sornmo, L.: Time-varying digital filtering of ECG baseline wander. J. Med. Biol. Eng. Comput. 31, 503–508 (1993)
A System Identification Using DRNN Based on Swarm Intelligence

Qunzhou Yu1, Jian Guo2, and Cheng Zhou3

1 School of Environmental Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
2 Department of Control Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China. Tel.: +86-13607134003
3 School of Civil Engineering and Mechanics, Huazhong University of Science and Technology, Wuhan 430074, China
[email protected], [email protected], [email protected]
Abstract. The original Elman network, one of the well-known dynamic recurrent neural networks (DRNN), has been improved over the past decade to ease its application to dynamic system identification. In this paper, a learning algorithm for the original Elman neural network (ENN) based on modified particle swarm optimization (MPSO), a swarm intelligence algorithm (SIA), is presented. MPSO and Elman are hybridized to form the MPSO-ENN hybrid algorithm as a system identifier. Simulation experiments show that MPSO-ENN is an effective swarm intelligence hybrid algorithm (SIHA), which results in an identifier with the best trained model; a dynamic identification system (DIS) based on MPSO-ENN is thus obtained. Keywords: system identification, swarm intelligence, recurrent neural network, particle swarm optimization.
1 Introduction

The identification of unknown nonlinear systems is a branch of research in the automatic control domain. However, most methods for system identification and parameter adjustment are based on linear analysis and are difficult to extend to complex nonlinear systems; their ability remains limited and they cannot achieve good identification efficiency. Much related research has therefore looked for new methods to overcome these problems. Various intelligent control techniques have been proposed over the past decade to identify dynamic systems. These methods can be divided into three types: artificial intelligence (AI) [1,2], evolutionary computation (EC) [3], and swarm intelligence (SI) [4], including the artificial neural network (ANN) [5], the genetic algorithm (GA) [6], particle swarm optimization (PSO) [4,7,8], and evolutionary programming (EP) [9,10].
The characteristics of the ANN (artificial neural network), namely nonlinear transformation and support for highly parallel operation, provide effective techniques for system identification and control, especially for unknown nonlinear systems [11]. It has been applied to system identification and classification [12,13] mainly because (1) it can approximate the nonlinear input-output mapping of a dynamic system, and (2) it can model the behavior of complex systems and achieve accurate control through training, without a priori information about the structure or parameters of the system. There has recently been growing interest in applying neural networks to dynamic system identification. The original Elman network is one of the well-known dynamic recurrent neural networks (DRNN): the Elman recurrent multilayer network introduces links that dynamically memorize feedback information about the history of the system. The approach has great developmental potential in the fields of system modeling, identification, and control [14]. PSO (particle swarm optimization), one of the SI algorithms, is particularly effective for the optimization of nonlinear and multimodal functions [15] and has been applied in many areas [4,16,17]. It has been recognized as a computational intelligence technique related to evolutionary algorithms: a population-based search method for the optimization of continuous nonlinear functions that resembles the movement of organisms in a bird flock or fish school, whose key concept is that potential solutions fly through hyperspace and are accelerated towards better solutions. In this paper, a modified particle swarm optimization (MPSO) is used to obtain the values of the network weights. One feature that distinguishes MPSO from conventional optimization methods such as hill climbing is its much greater ability to escape local minima stochastically. This paper evaluates the performance of the original Elman network when trained by MPSO for dynamic system identification, so that complex theoretical analysis and an exact mathematical description of the strongly nonlinear system are avoided.
2 Modified PSO

A real optimization problem can be formulated as the following functional optimization problem:

min f(X),  X = [x_1, x_2, ..., x_n]   (1)

s.t. x_i ∈ [a_i, b_i],  i = 1, 2, ..., n   (2)

where f(·) is the objective function and X is the decision vector consisting of n variables. PSO is a simple evolutionary stochastic algorithm [7]: it finds optimal solutions through the interaction of individuals in a population of particles, and is an effective method for optimizing complex multidimensional functions [18]. Based on the concept of fitness, the evolution equations are described as follows:
134
Q. Yu, J. Guo, and C. Zhou
V_i^{t+1} = w V_i^t + c_1 r_1 (P_i − X_i) + c_2 r_2 (P_g − X_i)   (3)

X_i^{t+1} = X_i^t + V_i^{t+1}   (4)
where V_i = [v_{i,1}, v_{i,2}, ..., v_{i,n}] is the velocity of particle i, representing the distance to be traveled by this particle from its current position; X_i = [x_{i,1}, x_{i,2}, ..., x_{i,n}] is the position of particle i; P_i is the best previous position of particle i; P_g is the best position among all particles in the population X = [X_1, X_2, ..., X_N]; r_1 and r_2 are two independent uniformly distributed random variables; c_1 and c_2 are positive constants called acceleration coefficients, which control the maximum step size; and w is the inertia weight, which controls the impact of a particle's previous velocity on its current one. In standard PSO, equation (3) calculates the new velocity from the previous velocity and from the distances of the current position to the particle's own best historical position and to its neighbors' best position. Generally, each component of V_i is clamped to the range [−v_max, +v_max] to control excessive roaming of particles outside the search space; the particle then flies toward a new position according to equation (4), and this process is repeated until a user-defined stopping criterion is reached. However, PSO easily gets trapped in local optima when solving multimodal, highly complicated nonlinear function problems, inducing a state of premature convergence in the particles. Many researchers have modified PSO algorithms to alleviate this problem, and a great deal of research findings have been obtained [19]. In the proposed method, if the absolute velocity of a particle is smaller than a threshold v^th (> 0), it is increased to a larger value; in this way the particle has a large probability of escaping from the local minimum point. The new equation of the particle velocity is defined as
v̂_i^{t+1} = (1 − ξ)·v_i^t + ξ·(1 − r_3·|v_i^t| / v^th)·v_max   (5)

where v_max is a designated maximum velocity; r_3 is a random number in the range (0, 1); and ξ is a self-adaptive coefficient, namely

ξ = 0 if |v_i^t| > v^th;  ξ = 1 if |v_i^t| ≤ v^th   (6)

The P_i in equation (3) is then reinitialized with a new value, that is

P_i = X_i   (7)
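Equations (3)-(6) can be sketched for a single scalar dimension as follows. One plausible reading (an assumption, since this excerpt does not state the order of operations explicitly) applies the escape rule to the velocity obtained after the standard update:

```python
import random

rng = random.Random(0)  # fixed seed for a reproducible sketch

def mpso_velocity(v, x, p_i, p_g, w=0.7, c1=2.0, c2=2.0, v_max=1.0, v_th=1e-3):
    """One MPSO velocity update for a scalar dimension.
    Parameter values are illustrative, not taken from the paper."""
    r1, r2 = rng.random(), rng.random()
    v_new = w * v + c1 * r1 * (p_i - x) + c2 * r2 * (p_g - x)  # equation (3)
    v_new = max(-v_max, min(v_max, v_new))                     # clamp to [-v_max, v_max]
    if abs(v_new) <= v_th:                                     # xi = 1: near-stagnant particle
        r3 = rng.random()
        v_new = (1 - r3 * abs(v_new) / v_th) * v_max           # equation (5) with xi = 1
    return v_new

# A stagnant particle (v = 0, sitting at its own and the global best) gets kicked
v = mpso_velocity(0.0, 0.5, 0.5, 0.5)
print(abs(v) > 0.9)  # True: escape velocity close to v_max
```

After such an escape step, equation (7) would also overwrite the particle's personal best P_i with its current position X_i, so the particle is not immediately pulled back into the stagnation point.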
3 Original Elman Neural Network

The original Elman neural network introduced by Elman [20] is a typical dynamic recurrent neural network algorithm. It has several characteristics: self-feedback, memorization ability, and a one-step time delay [21]. Many numerical experiments have shown that the Elman network is superior to feed-forward neural network algorithms, with better convergence and stability. The topological structure of the Elman network is commonly divided into four layers: input layer, hidden layer, context layer, and output layer; the connections among the input, hidden, and output layers are similar to a feed-forward network. The mathematical description of the Elman network is given as follows [20]:
x(t) = f(w^(1)·x_c(t) + w^(2)·u(t − 1) + θ^(1))   (8)

x_c(t) = x(t − 1)   (9)

y_N(t) = g(w^(3)·x(t) + θ^(2))   (10)

where f(·) is often taken as the sigmoid activation function of the hidden layer and g(·) is a linear function; x(t) is the output of the hidden layer at time t; x_c(t) is the output of the context layer at time t; u(t) and y_N(t) are the network input and output vectors at time t, respectively; w^(1) is the weight matrix connecting the context layer to the hidden layer; w^(2) is the weight matrix connecting the input layer to the hidden layer; w^(3) is the weight matrix connecting the hidden layer to the output layer; and θ^(1) and θ^(2) are the threshold values of the hidden layer and output layer, respectively.
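Equations (8)-(10) amount to the following forward step (a sketch with randomly initialized weights; the layer sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hidden, n_out = 1, 5, 1
W1 = rng.standard_normal((n_hidden, n_hidden)) * 0.1  # context -> hidden, w(1)
W2 = rng.standard_normal((n_hidden, n_in)) * 0.1      # input -> hidden,   w(2)
W3 = rng.standard_normal((n_out, n_hidden)) * 0.1     # hidden -> output,  w(3)
th1 = np.zeros(n_hidden)                              # hidden thresholds, theta(1)
th2 = np.zeros(n_out)                                 # output thresholds, theta(2)

def elman_step(u_prev, xc):
    """One Elman step: equations (8) and (10). The caller feeds x back
    as the next context state, which realizes equation (9)."""
    x = 1 / (1 + np.exp(-(W1 @ xc + W2 @ u_prev + th1)))  # (8), sigmoid f
    y = W3 @ x + th2                                      # (10), linear g
    return x, y

xc = np.zeros(n_hidden)          # initial context state
for u in ([0.1], [0.2], [0.3]):  # a short input sequence
    x, y = elman_step(np.array(u), xc)
    xc = x                       # (9): context copies the hidden state
print(y.shape)  # (1,)
```

In the MPSO-ENN identifier, the entries of W1, W2, W3, th1, and th2 are the coordinates of a particle, so training consists of moving particles rather than back-propagating gradients.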
4 Identification System

Off-line training of the Elman network mainly adopts the following methods: the back-propagation algorithm, the gradient descent method and the genetic algorithm. These leave the trained weights and thresholds of the Elman network fixed. Once the system order or the number of hidden-layer units changes, weak adaptive ability and low approximation precision result. To improve the Elman network's real-time performance, the MPSO is employed to train and optimize the Elman network online. The dynamic system is described as
y_i(t) = f[Y_t, X_t, V_t]^T    (11)
136
Q. Yu, J. Guo, and C. Zhou
where f(·) is a nonlinear function; y_i(t) is the actual output value of particle i at time t; Y_t and X_t are the output and input vectors of the system at time t, respectively; V_t is the random noise vector.
The Elman network function f_N(·) is used to approximate the nonlinear relation f(·) and maps the input/output of the network, namely

y_{N,i}(t) = f_N[Y_t, X_t, V_t]^T    (12)
In order to realize nonlinear dynamic identification, the network output y_{N,i} is trained to approximate the actual value y_i, i.e. y_{N,i} ≈ y_i. The fitness function of a particle is taken as the reciprocal of the system performance index E_i, namely

J_i(t) = \frac{1}{E_i(t)} = \frac{1}{\sum_{t=p-n+1}^{p} \left[ y_i(t) - y_{N,i}(t) \right]^2}    (13)

where J_i(t) is the fitness value of particle i at time t; p is the sample number; y_{N,i}(t) is the training output value of particle i at time t.
5 Simulation Results

Simulations are conducted to study the ability of the Elman network, trained by the MPSO, to model a dynamic non-linear system. A sampling period of 0.01 s is assumed in all cases. The non-linear system has the following discrete-time equation:
y(k+1) = \frac{y(k) - 0.3\, y(k-1) + 0.5\, u(k)}{1.5 + y^2(k)}    (14)
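The plant of Eq. (14) can be simulated directly; this is an illustrative sketch under assumed zero initial conditions, not the authors' code:

```python
import numpy as np

def simulate_plant(u, y0=0.0, y1=0.0):
    """Iterate the discrete-time system of Eq. (14) over an input sequence u.

    y0 and y1 are the assumed initial values y(k-1) and y(k)."""
    y = np.zeros(len(u) + 1)
    y[0], y_prev = y1, y0  # y(k) and y(k-1) initial conditions
    for k in range(len(u)):
        y[k + 1] = (y[k] - 0.3 * y_prev + 0.5 * u[k]) / (1.5 + y[k] ** 2)
        y_prev = y[k]
    return y[1:]
```

Such a loop would generate the training targets that the network output y_N is fitted against.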
The Elman network and the MPSO-ENN, with non-linear neurons in the remaining layers, are employed. The MPSO control parameters are maintained at the following values in all simulations:

Population size: 10
Acceleration coefficients: c_1 = c_2 = 2.0
Independent random numbers: r_1, r_2 ~ U(0, 1)
Inertia weight: w = w_{max} - \frac{w_{max} - w_{min}}{iter_{max}}\, t, with w_{max} = 0.9, w_{min} = 0.4.
The above control parameters are within ranges found suitable for a variety of optimization problems; no other parameter values were tried in this work. The hyperbolic tangent function is adopted as the activation function of the non-linear neurons. The neural networks are trained using the same sequence of random input signals as mentioned previously. The responses obtained using the networks, taking only the feed-forward connections as variable, are presented in Figs. 1-2, respectively.
Fig. 1. Response of the first system using Elman
Fig. 2. Responses of the first system using MPSO-ENN
The simulation results indicate that the MPSO-ENN is superior to the Elman network in identifying the non-linear system. For all network structures, training is significantly faster when all connection weights are modifiable than when only the feed-forward connection weights can be changed, probably because in the former case the MPSO has more freedom to evolve good solutions.
6 Conclusion

This paper has investigated the use of the MPSO to train the Elman network for the identification of dynamic systems. Identification results were obtained for a non-linear system. The main conclusion is that the MPSO succeeds in the training, at the expense of computation time, confirming the findings of previous PSO studies. Confirmation has also been made of the superiority of the modified Elman network over the original Elman network.

Acknowledgment. This paper is partially supported by the National Natural Science Foundation of Hubei (No. 2008CDZ057) and the Science and Technology Project Plan of the Ministry of Housing and Urban-Rural Development of China (No. 2009-K3-16).
References 1. Luger, G.F., Stubblefield, W.A.: Artificial Intelligence: Structures and Strategies for Complex Problem Solving. Addison Wesley, MA (1998) 2. Nilsson, N.J.: Artificial Intelligence: A New Synthesis. Morgan Kaufmann, San Francisco (1998) 3. Fogel, D.B.: Evolutionary computation: toward a new philosophy of machine intelligence. John Wiley & Sons, Hoboken (2006) 4. Kennedy, J., Eberhart, R.C.: Swarm Intelligence. Morgan Kaufmann Publishers, San Francisco (2001) 5. Yegnanarayana, B.: Artificial Neural Networks. Prentice-Hall of India, New Delhi (1999) 6. Holland, J.H.: Adaptation in Natural and Artificial Systems. MIT Press, Cambridge (1992) 7. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of IEEE International Conference on Neural Networks, pp. 1942–1948. IEEE Press, Piscataway (1995) 8. Shi, Y.: Particle swarm optimization. IEEE Connect. Newsletter IEEE Neural Networks Soc. 2(1), 8–13 (2004) 9. Fogel, D.B.: Evolving artificial intelligence. University of California, San Diego (1992) 10. Hayakawa, T., Haddad, W.M., Bailey, J.M.: Passivity-based neural network adaptive output feedback control for nonlinear nonnegative dynamical systems. IEEE Trans. Neural Networks 16, 387–398 (2005) 11. Ku, C., Lee, K.Y.: Diagonal recurrent neural networks for dynamic systems control. IEEE Trans. on Neural Networks 6(1), 144–156 (1995) 12. Pham, D.T., Liu, X.: Dynamic system identification using partially recurrent neural networks. Journal of Systems Engineering 2(2), 90–97 (1992) 13. Cong, S., Gao, X.P.: Recurrent neural networks and their application in system identification. System Eng. Electron 25, 194–197 (2003)
14. Davis, L.: Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York (1991) 15. Kennedy, J.: The particle swarm: social adaptation of knowledge. In: Proceedings of the 1997 International Conference on Evolutionary Computation, Indianapolis, pp. 303–308 (1997) 16. Clerc, M., Kennedy, J.: The particle swarm-explosion, stability and convergence in a multidimensional complex space. IEEE Transactions on Evolutionary Computation 6(1), 58–73 (2002) 17. Chau, K.W.: Particle swarm optimization training algorithm for ANNs in stage prediction of Shing Mun River. Journal of Hydrology 329(3-4), 363–367 (2006) 18. Vincent, T.L., Grantham, W.J.: Optimality in Parametric Systems. Wiley, New York (1981) 19. Shi, Y.H., Eberhart, R.C.: Experimental study of particle swarm optimization. In: Proceedings of SCI Conference, Piscataway, pp. 1945–1950. IEEE Press, Los Alamitos (1999) 20. Elman, J.L.: Finding structure in time. Cognitive Science 14(2), 179–211 (1990) 21. Pham, D.T., Liu, X.: Dynamic system identification using partially recurrent neural networks. Journal of Systems Engineering 2(2), 90–97 (1992)
Force Identification by Using SVM and CPSO Technique

Zhichao Fu, Cheng Wei, and Yanlong Yang

School of Aeronautic Science and Engineering, Beijing University of Aeronautics and Astronautics, Beijing, 100191, P.R. China
[email protected]
Abstract. A novel method is presented in this paper to determine the external dynamic forces applied to structures from measured structural responses. The method utilizes a new SVM-CPSO model that hybridizes the chaos particle swarm optimization (CPSO) technique and support vector machines (SVM) to tackle the problem of force identification. Both numerical simulations and an experimental study are performed to demonstrate the effectiveness, robustness and applicability of the proposed method. The results indicate that the proposed method is practical for real-life applications.

Keywords: force identification; inverse problem; support vector machine; particle swarm optimization; chaos.
1 Introduction

Knowledge of the dynamic forces is an essential requirement during the design and optimization stages of mechanical systems. However, in many practical cases it is difficult or impossible to directly measure the dynamic forces acting on a structure, so it is valuable and necessary to find alternative methods of force estimation. Force estimation using vibration data has attracted a lot of attention from researchers. The classical approach to force reconstruction is a frequency-domain technique [1], in which dynamic load spectra are identified from the response spectra and the matrix of frequency response functions (FRF). It is preferable in some applications to have a time-domain algorithm capable of estimating the forces acting on a structure in real time. There are two main time-domain techniques: the Sum of Weighted Accelerations Technique (SWAT) [2] and the Inverse Structural Filter (ISF) [2]. Allen and Carne [2] presented a comparison between the ISF and SWAT in time-domain force identification. The support vector machine (SVM) is an emerging technique for learning relationships in data within the framework of statistical learning theory [3]. The SVM technique has been applied successfully in many fields, and recently it has been employed for structural dynamics. Mita and Fujimoto [4] employed SVM to
detect structural damages. Zhang et al. [5] applied an SVM method to structural identification. Lute et al. [6] applied the SVM technique to analyze the aerodynamic characteristics (flutter derivatives) of cable-stayed bridges. Bornn [7] utilized SVM and nonlinear time-series models for structural health monitoring. In the present study, SVM is applied to dynamic force estimation. Proposed by Kennedy and Eberhart [8] and inspired by social behavior in nature, PSO is a population-based search algorithm that is initialized with a population of random solutions, called particles. PSO has been successfully applied to function optimization, artificial network training and fuzzy system control, etc. Moreover, it has also been applied to structural dynamics: Flores et al. [9, 10] employed PSO for force identification in mechanical systems. In this paper, a novel algorithm of parameter selection is proposed based on the chaos particle swarm optimization (CPSO) technique. The purpose of the present study is to employ support vector regression (SVR) combined with CPSO to tackle the problem of force determination. This paper is organized as follows: in Section 2, the SVM and CPSO are described briefly. In Section 3, hyper-parameter selection based on CPSO is explained in detail. In Section 4, the present scheme is applied to the force reconstruction problem with both numerical cases and an experimental study. Section 5 gives the summary and conclusions.
2 Method

The quality of SVM models strongly depends on a proper setting of parameters. In this research, the CPSO technique is employed to select the optimal hyper-parameters of the SVM model, namely the penalty parameter and the kernel function parameter.

2.1 Support Vector Machine (SVM)

Support vector regression (SVR) aims at producing an optimum regression function to predict future data accurately [3]. Learning systems for regression estimation can be described as follows [6]. Given a set of input/output training data D = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)} ⊂ R^n × R, we try to find a function f: R^n → R that will correctly predict unseen data underlying the same probability distribution as the training data. The generic SVR regression function is expressed as
f(x) = w^T \cdot \Phi(x) + b    (1)
where w ∈ R^n, b ∈ R, and Φ denotes a nonlinear transformation from R^n to a high-dimensional feature space. The goal is to find the values of w and b that minimize the regression risk. The optimal function is given by the minimum of the functional
\Psi(w, \xi) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} (\xi_i^- + \xi_i^+)    (2)
142
Z. Fu, C. Wei, and Y. Yang
where the constant C controls the trade-off between complexity and losses, and \xi_i^-, \xi_i^+ are non-negative slack variables representing the upper and lower constraints. The \varepsilon-insensitive loss function is given by Eq. (3a) [3]:

\Gamma(f(x) - y) = \begin{cases} |f(x) - y| - \varepsilon & \text{for } |f(x) - y| \ge \varepsilon \\ 0 & \text{for } |f(x) - y| < \varepsilon \end{cases}    (3a)
The optimization problem is solved by minimizing the regression function and the loss function simultaneously through Eq. (3b):

L(\alpha^*, \alpha) = \sum_{i=1}^{l} \left( \alpha_i (y_i + \varepsilon) - \alpha_i^* (y_i - \varepsilon) \right) + \frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j) K(x_i, x_j)    (3b)
where the dual variables \alpha, \alpha^* are Lagrange multipliers, the kernel K(x_i, x_j) is a symmetric function, and \varepsilon is the given approximation precision. In practice, a low-degree polynomial kernel or an RBF kernel with a reasonable width is a good initial trial. In this work, the RBF kernel is considered:
K(x, y) = \exp\left( -\frac{\|x - y\|^2}{2\sigma^2} \right)    (4)
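As a hedged sketch (not the authors' code), the RBF kernel above and the SVR prediction built from the support-vector expansion can be written as follows; the multipliers would come from solving the dual problem, and all names are assumptions:

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel of Eq. (4)."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def svr_predict(x, support_vectors, alpha_star, alpha, b, sigma=1.0):
    """Evaluate the SVR expansion at a new point x.

    alpha_star / alpha are the Lagrange multipliers of the p support
    vectors, assumed to come from solving the dual problem (Eq. 3b)."""
    coeffs = alpha_star - alpha
    return sum(c * rbf_kernel(xi, x, sigma)
               for c, xi in zip(coeffs, support_vectors)) + b
```

In the hybrid scheme below, CPSO would search over the penalty parameter C and the kernel width sigma that shape this predictor.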
Thus, the approximation function is expressed as

f(x) = \sum_{i=1}^{p} (\alpha_i^* - \alpha_i)\, K(x_i, x) + b    (5)
where p is the number of support vectors.

2.2 Chaos Particle Swarm Optimization (CPSO)
PSO is a population-based optimization method that searches for optima by updating generations. The mathematical description and executive steps are as follows [11]. Let the ith particle in a D-dimensional search space be represented as x_i = (x_{i1}, …, x_{id}, …, x_{iD}). The best previous position of the ith particle is recorded and represented as p_i = (p_{i1}, …, p_{id}, …, p_{iD}), called pbest. The index of the best pbest among all particles is marked by the symbol g, and the location p_g is called gbest. The velocity of the ith particle is denoted as v_i = (v_{i1}, …, v_{id}, …, v_{iD}). The concept of PSO consists of changing the velocity and location of each particle towards its pbest and gbest locations according to Eqs. (6a), (6b) and (7) at each time step.
v_{id}(t) = w \cdot v_{id}(t-1) + c_1 r_1 (p_{id} - x_{id}(t-1)) + c_2 r_2 (p_{gd} - x_{id}(t-1))    (6a)

x_{id}(t+1) = x_{id}(t) + v_{id}(t+1)    (6b)

w = w_{max} - \frac{I_{current}}{I_{max}} (w_{max} - w_{min})    (7)
where w is the inertia coefficient, adjusted by the linear decrease of Eq. (7); c_1 and c_2 are learning rates, which are non-negative constants; r_1 and r_2 are generated randomly in the interval [0, 1]; v_{id} ∈ [−v_{max}, v_{max}], where v_{max} is a designated maximal velocity; I_{current} is the current iteration and I_{max} the predefined maximal iteration; w_{max} and w_{min} are the predefined maximal and minimal inertia weights, respectively. Iterations terminate when either the maximum generation or a designated fitness value is reached. In this study, these parameters are set as follows: c_1 = 2, c_2 = 2, w_{min} = 0.1, w_{max} = 0.9. To avoid being trapped in a local optimum, chaos is incorporated into the above PSO, since chaos sequences can visit all states without repetition. The logistic equation is employed here to obtain chaos sequences, defined as follows:

x_{n+1} = \mu x_n (1 - x_n), \quad n = 0, 1, 2, \ldots    (8)
where \mu is the control parameter. Suppose 0 ≤ x_0 ≤ 1; when \mu = 4 and x_0 ∉ {0, 0.25, 0.5, 0.75}, the system has been proved to be entirely chaotic.
Chaos initialization, which is applied both to initialize the hyper-parameters of the SVM model and to the random coefficients in Eq. (6a), is adopted to locate the positions of particles and to increase the diversity of the population and the ergodicity of the search without changing the randomness of the algorithm.
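A sketch of generating logistic-map chaos sequences (Eq. 8) and mapping them onto a search range for particle initialization; both helper names are hypothetical, and the mapping step is an assumption about how the sequence is used:

```python
import numpy as np

def logistic_sequence(x0, n, mu=4.0):
    """Chaos sequence from the logistic map of Eq. (8).

    For mu = 4 and x0 in (0, 1) outside {0.25, 0.5, 0.75} the map
    is fully chaotic, so the sequence visits the interval ergodically."""
    xs = np.empty(n)
    x = x0
    for i in range(n):
        x = mu * x * (1.0 - x)
        xs[i] = x
    return xs

def chaos_init_particles(n_particles, lo, hi, x0=0.123):
    """Map a chaos sequence onto the search range [lo, hi] (hypothetical helper)."""
    return lo + (hi - lo) * logistic_sequence(x0, n_particles)
```

The same sequence could replace r_1, r_2 in Eq. (6a) without changing their distribution over [0, 1].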
3 SVM-CPSO Hybrid Model

In this research, two parameters are to be optimized: the regularization parameter C and the kernel parameter \sigma. The fitness function of each particle is evaluated by the following formulation:

RMSE = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( y_{pre}(i) - y_{ori}(i) \right)^2 }    (9)
where n is the number of sample points in the test data set, and y_{pre}, y_{ori} are the predicted and original values, respectively. The RMSE serves as the fitness function.
In summary, the SVM-CPSO hybrid method for input estimation is implemented as follows: (1) structural dynamic responses are preprocessed to zero mean and unit standard deviation; (2) the SVR-CPSO hybrid method is utilized to obtain the optimal hyper-parameters; (3) the force is estimated from a new data set using the trained hybrid SVM model. The principle of the proposed method is illustrated in Fig. 1.
Fig. 1. The principle of the proposed method
4 Numerical Cases and Experiment Study

The aim of this section is to illustrate the properties of the proposed method for resolving the input estimation problem in structural dynamics. Both numerical simulations and experimental tests are investigated, and the results demonstrate that the proposed method tackles the input estimation problem effectively.

4.1 Numerical Simulations
The model, depicted in Fig. 2, represents a 3-DOF mass-spring system used in the following numerical examples. The stiffness of all springs is k = 100 N/m, and all masses are equal, m = 100 kg. Mass-proportional damping with a proportionality constant of 1% is added to the model. The system is excited by a transient force applied to the first mass m_1, whose non-zero part is given as follows:

u = (1 - \cos 2\pi f t) \sin 6\pi f t, \quad 0 < t < 1/f    (10)
where f = 0.5 Hz. The signal is sampled for 30 seconds with a sampling frequency f_s = 200 Hz. To simulate field measurements, a set of contaminated versions of the structural dynamic responses is constructed by artificially adding noise at different levels of
Fig. 2. Discrete structure with 3 DOF
signal-to-noise ratio (SNR) = 50 db, 40 db, 30 db, 20 db and 10 db. The comparison is made quantitatively by way of the relative estimation error:

\tilde{f} = \frac{\| y_{exact} - y_{estimated} \|}{\| y_{exact} \|} \times 100\%    (11)
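The excitation of Eq. (10) and the error measure of Eq. (11) can be sketched as follows; this is illustrative code with assumed parameter names, not the authors' implementation:

```python
import numpy as np

def transient_force(fs=200.0, T=30.0, f=0.5):
    """Transient force of Eq. (10): non-zero only on 0 < t < 1/f."""
    t = np.arange(0.0, T, 1.0 / fs)
    u = (1.0 - np.cos(2 * np.pi * f * t)) * np.sin(6 * np.pi * f * t)
    u[t >= 1.0 / f] = 0.0  # zero outside the pulse
    return t, u

def relative_error(y_exact, y_estimated):
    """Relative estimation error of Eq. (11), in percent."""
    return 100.0 * np.linalg.norm(y_exact - y_estimated) / np.linalg.norm(y_exact)
```

The norm in `relative_error` is taken as the Euclidean norm over the sampled time history, which is one common reading of Eq. (11).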
For transient force estimation, both noise-free and noisy cases are considered. The comparison of the exact input and the force estimated from noise-free accelerations is plotted in Fig. 3; excellent conformity is attained. Fig. 4 illustrates input estimation carried out with accelerations contaminated by noise at a level of SNR = 10 db. From Fig. 4, it is clear that there is fairly good agreement between the exact and identified forces. For further investigation, input estimation is performed under noise levels of SNR = 50, 40, 30, 20 and 10 db; the results are listed in Table 1.

Table 1. Relative error under different conditions
Fig. 3. Transient force recovered from noise-free responses
SNR = 10 db: relative error 9.34%
Fig. 4. Transient force recovered from responses at the noise level of SNR=10db
4.2 Experiment Study

An experiment on force determination of a test rig is conducted to validate the effectiveness and applicability of the proposed scheme in practice. The test sample is made of aluminum, as shown in Fig. 5. Eight piezoelectric sensors are mounted to measure the responses. The impulse load was applied using a PCB instrumented hammer. An LMS data acquisition system was set up to measure the signals from the sensors. The sampling frequency was set to 2048 Hz, and a total of 8 seconds was sampled for each channel per impact. A pre-trigger was set to ensure that all of the appropriate waveforms were recorded. In this experiment, only 9 channels were adopted for data acquisition.
Fig. 5. Photo of test bed rig for experimental measurement
The measured forces and corresponding responses are required to estimate the input. The experimental force determined using the proposed method is presented in Fig. 6. The identified impact force history agrees with the experimental one very well, validating the effectiveness of the new force estimation method.
Fig. 6. Experimental force recovered from measured responses
5 Conclusions

This paper presented an identification method to determine the external forces applied to mechanical structures. The identification strategy is based on a hybrid SVM-CPSO technique. The approach is tested on simulations and real-life measurements, and the results demonstrate the effectiveness and robustness of the scheme, which successfully extends the SVM to the inverse problem of input estimation. Further studies will focus on the case of multi-point excitation and on open problems such as the number and locations of sensors and the types of structural responses.
References 1. Doyle, J.F.: Experimental determining the contact force during the transverse impact of an orthotropic plate. Journal of Sound and Vibration 118(3), 441–448 (1987) 2. Allen, M.S., Carne, T.G.: Comparison of Inverse Structural Filter (ISF) and Sum of Weighted Accelerations Technique (SWAT) Time Domain Force Identification Methods. In: 47th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, Newport, Rhode Island (2006) 3. Vapnik, V.N.: Statistical learning theory. Wiley, New York (1998) 4. Mita, A., Fujimoto, A.: Active detection of loosened bolts using ultrasonic waves and support vector machines. In: Proceeding of the 5th International Workshop on Structural Health Monitoring, pp. 1017–1024 (2005)
5. Zhang, J., Sato, T., Iai, S., Hutchinson, T.: A pattern recognition technique for structural identification using observed vibration signals: Linear case studies. Engineering Structures 30, 1439–1446 (2008) 6. Lute, V., Upadhyay, A., Singh, K.K.: Support vector machine based aerodynamic analysis of cable stayed bridges. Advances in Engineering Software 40, 830–835 (2009) 7. Bornn, L., Farrar, C.R., Park, G., Farinholt, K.: Structural Health Monitoring With Autoregressive Support Vector Machines. ASME Journal of Vibration and Acoustics 131(4), 021004-1– 021004-9 (2009) 8. Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: Proceedings of IEEE International Conference on Neural Networks, Piscataway, NJ, pp. 1942–1948 (1995) 9. Flores, J.E.R., Viana, F.A.C., et al.: Force Identification of Mechanical Systems by Using Particle Swarm Optimization. In: 10th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, Albany, New York (2004) 10. Flores, J.E.R., Viana, F.A.C., et al.: Identification of external forces in mechanical systems by using LifeCycle model and stress-stiffening effect. Mechanical Systems and Signal Processing 21, 2900–2917 (2007) 11. Guo, X.C., Yang, J.H., Wu, C.G., Wang, C.Y., Liang, Y.C.: A novel LS-SVMs hyper-parameter selection based on particle swarm optimization. Neurocomputing 71, 3211–3215 (2008)
A Novel Dual Watermarking Scheme for Audio Copyright Protection and Content Authentication

Zhaoyang Ma, Xueying Zhang, and Jinxia Yang

Department of Information Engineering, Taiyuan University of Technology, Taiyuan 030024, China
[email protected], [email protected]
Abstract. A novel dual watermarking scheme is presented that incorporates two watermarks in a host audio to achieve copyright protection and content authentication. The original audio signal is divided into four parts, matching the segmented binary watermark image. Our technique introduces wavelet packet coefficients to construct the zero-watermarking and embeds one watermark image block in each part by quantization, which differs from previous systems. Moreover, it removes the process of converting the two-dimensional image into a one-dimensional sequence. On one hand, experimental results demonstrate that the robustness of the zero-watermarking under a variety of common attacks is improved. On the other hand, the semi-fragile watermarking can detect and localize malicious attacks block-wise, accurately showing the tampered region of the original audio, yet tolerates mild modifications.

Keywords: Wavelet packet, dual watermark, image block, copyright protection, content authentication.
1 Introduction

Watermarking, which allows the secret embedding of information in host data, has emerged as a widely accepted approach for ownership identification [1] and for the authentication of multimedia content [2]. However, a single watermark cannot meet the needs of many fields, and the application of digital watermarking to multiple authentications has received growing attention. Dual watermarking [3] has grown sharply toward this goal: the technology embeds both a robust watermark and a fragile watermark into the host audio. Copyright protection of the audio signal is achieved by robust watermarking, which includes embedded watermarking and zero-watermarking [4], while fragile watermarking serves content authentication. In this paper, we introduce zero-watermarking and semi-fragile watermarking into the wavelet packet coefficients of segmented audio. This method is superior to previously proposed ones, since the robustness of the zero-watermarking is improved and malicious alterations of the audio are detected more accurately by locating the affected block.
2 Watermarking Embedding Process

Our technique uses a binary image V (64 by 64 pixels), which is segmented into four small blocks v_i (1 ≤ i ≤ 4). Arnold scrambling [5] is adopted to encrypt each block to increase security. The segmented binary image and the encrypted image are shown in Fig. 1 (a) and Fig. 1 (b), respectively.
Fig. 1. Segmented image (a) and encrypted image (b)

Fig. 2. The diagram of 3-level WPT
2.1 Wavelet Packet Transform

The Wavelet Packet Transform (WPT) [7] is a further development of the wavelet transform. In WP decomposition, not only the output of the low-pass-filtered audio is used for the subsequent decomposition level, but also the high-pass filter outputs. This gives much higher precision and flexibility in selecting the bands used in the embedding process. The diagram of a 3-level wavelet packet decomposition of an audio signal is depicted in Fig. 2.

2.2 The Embedding Algorithm

The main steps of the embedding procedure are presented here and summarized in Fig. 3. The original audio X is first divided into four (the number of parts is the first key K1) equal parts x1, x2, x3, x4, where block i is inserted in part x_i (1 ≤ i ≤ 4). The process is as follows:
1) Divide part x_i into N sections with N = 32 (N is the secure key K2), and perform the WP decomposition of these sections with the selected wavelet, Daubechies 1, at level 3, obtaining the wavelet packet coefficients.
2) Choose the numbers whose absolute values are the largest in the low-frequency coefficients of each section (the coefficients of node '8' in Fig. 2) and record their positions (selected as the identification key K3). These numbers are then arranged into a sequence s_i (1 ≤ i ≤ 4).
3) Pick up the higher low-frequency coefficients of the WPT of each section (the coefficients of node '9' in Fig. 2) and subdivide them into N sub-sections. The mean aver of each selected sub-section is computed individually. Each bit of block i is inserted in the WP domain by modifying the mean of a sub-section; the mean is adjusted by the computed value u as follows:

u = \begin{cases} -r + 0.5\Delta & \text{if } q(i,j) = v'(i,j) \\ -r + 1.5\Delta & \text{if } q(i,j) \ne v'(i,j) \text{ and } r > 0.5\Delta \\ -r - 0.5\Delta & \text{if } q(i,j) \ne v'(i,j) \text{ and } r \le 0.5\Delta \end{cases}    (1)

where q(i,j) = \mathrm{mod}\left( \lfloor aver/\Delta \rfloor, 2 \right) and r = aver - \lfloor aver/\Delta \rfloor \cdot \Delta.
Here \Delta denotes the selected quantization step, r is the quantization noise, v'(i,j) is a pixel of image block i, and u is the change of the mean; the coefficients of the sub-section are shifted accordingly by u.
4) Repeat steps 1 and 2 to get sequences s1, s2, s3, s4, and combine them into a whole sequence S with mean m. The following rule is then used to generate a binary-valued watermark W: if s(i) > m, then w(i) = 1; else w(i) = 0. To ensure security, W is scrambled by a chaotic sequence to obtain a zero-watermarking W with higher robustness.
5) Repeat step 3 until the watermark image embedding is finished. Rewrite the modified coefficients into the corresponding nodes and reconstruct the audio signal from the new WP tree.
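A sketch of the quantization embedding rule of Eq. (1) for a single sub-section mean; the function names and the parity-based extraction are assumptions consistent with the text, not the authors' code:

```python
import math

def embed_bit(aver, bit, delta=0.01):
    """Shift the sub-section mean `aver` so that the parity of its
    quantized value encodes `bit` (Eq. 1)."""
    q = math.floor(aver / delta) % 2             # current parity of the quantized mean
    r = aver - math.floor(aver / delta) * delta  # quantization noise
    if q == bit:
        u = -r + 0.5 * delta
    elif r > 0.5 * delta:
        u = -r + 1.5 * delta
    else:
        u = -r - 0.5 * delta
    return aver + u  # sub-section coefficients are shifted by u

def extract_bit(aver, delta=0.01):
    """Recover the embedded bit from the (possibly perturbed) mean."""
    return math.floor(aver / delta) % 2
```

Because the shifted mean lands at the center of a quantization cell, extraction tolerates perturbations smaller than delta/2, which is what makes the watermark semi-fragile.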
Fig. 3. Embedding scheme

Fig. 4. Decoding scheme
3 Watermark Decoding Process

Fig. 4 shows the block diagram of watermark extraction, which mainly consists of two steps. The zero-watermarking is recovered first. Content authentication then proceeds only if the detected audio passes the ownership verification; otherwise, no further verification is conducted.

Step 1: The decoding procedure of the robust watermarking used for copyright protection is as follows:
1) The detected audio X' is segmented into parts x'_i (1 ≤ i ≤ K1) based on the secure key K1. Each part is then divided into N sections corresponding to the key K2 and decomposed in the WP domain using the fixed parameters.
2) Pick up the low-frequency coefficients of all sections used in the embedding process according to the secure key K3 and combine them into a sequence S'. Then retrieve the zero-watermarking W' by the same calculation and scrambling operation as before.
3) A similar approach is adopted in detection to compute the similarity between the extracted watermark W' and the designated watermark W, deciding whether the designated watermark is present in the detected model. If it is genuine, the following content authentication is performed.

Step 2: Extraction of the semi-fragile watermarking. Choose the higher low-frequency coefficients of the WP decomposition of each section and divide them into N sub-sections. Then extract the inserted bit of block i from the calculated mean aver' in part x'_i by applying the rule

v'_t(i,j) = \begin{cases} 1, & q'(i,j) = 1 \\ 0, & q'(i,j) = 0 \end{cases}

where q'(i,j) is obtained in the same way as q(i,j), and v'_t(i,j) denotes a pixel of the scrambled block. Subsequently, the extracted watermark V' is restored by inverse scrambling and combination.
4 Simulation Results

The schemes presented in this paper were tested using a host audio with a sampling frequency of 44.1 kHz and 16 bits/sample; MATLAB 7.0 was used as the simulation tool to perform all embedding and decoding operations. The selected quantization step was 0.01. For the watermarked audio produced, we obtained a signal-to-noise ratio of 30.3860 dB (defined by Eq. (2), where L is the length of x(n)) between the marked and the original audio. There is no perceptible difference between the original and marked audio, either by hearing or by waveform assessment.
SNR\,(dB) = 10 \cdot \lg \left[ \frac{\sum_{n=1}^{L} x^2(n)}{\sum_{n=1}^{L} \left[ x'(n) - x(n) \right]^2} \right]    (2)
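Eq. (2) is straightforward to compute; the following is an illustrative sketch with an assumed function name:

```python
import numpy as np

def snr_db(x, x_marked):
    """SNR of Eq. (2) between the original and watermarked audio, in dB."""
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum((x_marked - x) ** 2))
```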
4.1 Robustness Testing
To verify the ability of our system to resist common attacks, the similarity between the recovered watermark and the original watermark is measured by the normalized cross-correlation (NC) value (Eq. (3)) or the bit error rate (BER) (Eq. (4)). In zero-watermarking detection, we choose NC = 0.75 as the threshold; if the similarity value exceeds the chosen threshold, we conclude that the detected model is copyright-protected.
NC(W, W') = \frac{\sum_i w(i)\, w'(i)}{\sqrt{\sum_i w^2(i) \sum_i w'^2(i)}}    (3)

BER = \frac{1}{N} \sum_{n=1}^{N} b(n) \times 100\%, \quad b(n) = \begin{cases} 1, & w'(n) \ne w(n) \\ 0, & w'(n) = w(n) \end{cases}    (4)
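Illustrative implementations of Eqs. (3) and (4); function names and the NumPy formulation are assumptions:

```python
import numpy as np

def nc(w, w_prime):
    """Normalized cross-correlation of Eq. (3) between two {0,1} watermarks."""
    return np.sum(w * w_prime) / np.sqrt(np.sum(w ** 2) * np.sum(w_prime ** 2))

def ber(w, w_prime):
    """Bit error rate of Eq. (4), in percent."""
    return 100.0 * np.mean(w != w_prime)
```

Detection would then compare `nc` against the chosen threshold of 0.75.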
From Table 1, it is obvious that common attacks cause only small alterations to the zero-watermarking, which is comparable with the scheme of [6]. The results illustrate that the proposed scheme achieves great robustness through the introduction of wavelet packet analysis, which contributes to copyright protection.

Table 1. Zero-watermarking detection results for various attacks (NC of the proposed scheme: 1, 0.9969, 0.9923, 0.9846, 0.8798)
Some commonly used audio signal processing manipulations were applied to estimate the robustness of the semi-fragile watermark when no tampering occurred. The detection results listed in Table 2 show that our semi-fragile watermark resists common operations. However, the above criteria, NC and BER, only reflect the robustness of the semi-fragile watermark; they cannot determine whether the content was tampered with or where the tampering occurred.

Table 2. Semi-fragile watermarking detection results for various attacks
Of course, an important aspect of our system is its ability to localize tampering of the watermark, or even of the original audio. The tampering detection ability can be assessed by the Tamper Assessment Function (TAF), which is defined by

TAF(i) = (1/N) Σ_{j=1}^{N} v′(i, j) ⊕ v′t(i, j)    (5)
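Eq. (5) is the fraction of mismatched bits per line of the watermark block; a sketch with hypothetical bit rows:

```python
def taf(v_row, vt_row):
    """Tamper Assessment Function of Eq. (5) for one line of the watermark block."""
    n = len(v_row)
    return sum(a ^ b for a, b in zip(v_row, vt_row)) / n

print(taf([1, 0, 1, 0], [1, 0, 1, 0]))  # untampered line -> 0.0
print(taf([1, 0, 1, 0], [0, 1, 0, 1]))  # fully tampered line -> 1.0
```

A TAF value near 0 marks an intact line, while a large value localizes the tampered region.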
where N is the number of bits in each line. We tampered with the previously watermarked audio and tested the ability of our system to detect and highlight the doctoring. We mainly consider two kinds of malicious operations: a cutting attack and a replacement operation. First, we produced tampered audio by cutting in a specific area; the detection results for the audio are then presented. As shown in Fig. 5, our system can recognize tampering in any region of the audio with very high precision.
Fig. 5. Tampered image after cutting (a), tampered region in block (b), and tampered section in part xi of the audio (c)
Fig. 6. Tampered image after replacement (a), tampered region in block (b), and tampered section in part xi of the audio (c)
The tampering was conducted by replacing a segment of the original audio with another audio section of the same length. The destroyed region can be detected and located as shown in Fig. 6, which demonstrates the strong tampering detection ability of our system. Fig. 6(a)-(c) respectively show which block of the image is destroyed, which line of that block is tampered, and which section of the specific part of the audio is destroyed. In addition, comparing Fig. 6(a)-(b) with the image after common attacks, we can conclude that the distribution of pixels in the tampered region is not uniform.
156
Z. Ma, X. Zhang, and J. Yang
5 Conclusions

An efficient dual watermarking scheme is proposed to solve two problems: owner identification and tampering localization for audio. The robust watermark achieves great robustness and imperceptibility, while the fragile watermark implements more accurate tampering detection. The scheme combines WPT, image segmentation, and mean quantization for watermark embedding and recovery; among these, the image segmentation is the biggest difference between our scheme and others. Moreover, we conclude that NC and BER are no longer suitable criteria for evaluating malicious tampering once tampering has occurred. Simulation results demonstrate the outstanding nature of our algorithm. However, some issues still deserve further exploration. Our future work will concentrate on introducing a synchronization strategy into the scheme so that it can resist synchronization attacks.
References
1. Malvar, H.S., Florencio, D.F.: A New Modulation Technique for Robust Watermarking. IEEE Trans. Signal Process. 51, 898–905 (2003)
2. Fu, X.-B.: Digital Audio Fragile Watermarking Algorithm Based on Mean Quantization. Applied Science and Technology 32(8), 17–19 (2005)
3. Lu, B.-L., Zhu, Y.-Q.: An Algorithm of Dual Watermarking Based on Wavelet Transform. Microelectronics and Computer 24(8), 31–34 (2007)
4. Wen, Q., Wang, S.-X.: The Concept and Application of Zero-Watermarking. Electronic Journal 31(2), 214–216 (2003)
5. Sun, X.-D., Lu, L.: Application and Research of Arnold Transform in Digital Image Watermarking. Information Technology Magazine 10, 129–132 (2006)
6. Zhong, X., Tang, X.-H.: Audio Characteristics-Based Zero-Watermarking Scheme in Wavelet Domain. Journal of Hangzhou University of Electronic Science and Technology 27(2), 33–36 (2007)
7. Zhang, D.-F.: Matlab Wavelet Analysis. Machinery Industry Press (2009) (in Chinese)
On the Strength Evaluation of Lesamnta against Differential Cryptanalysis Yasutaka Igarashi and Toshinobu Kaneko Tokyo University of Science, 2641 Yamazaki, Noda, Chiba, 278-8510, Japan [email protected], [email protected]
Abstract. We focus on the cryptographic hash algorithm Lesamnta-256. Lesamnta-256 consists of the Merkle-Damgård iteration of a compression function and an output function. The compression function consists of a mixing function and a key scheduling function. The mixing function consists of 32 rounds of a four-way generalized Feistel structure. On each round there is a nonlinear function F with 64-bit input/output, which consists of 4 steps of an AES-type SPN (Substitution Permutation Network) structure. A subkey is XORed only at the first step of the SPN. The designers analyzed its security by assuming that a subkey is XORed at every step of the SPN. Such an independent subkey assumption is also applied to the analysis of other SHA-3 candidates, e.g. Grøstl, LANE, and Luffa. However, we analyze the security of these components of Lesamnta as is. We show that the 2 steps of SPN referred to as XS have the maximum differential probability 2^−11.415. This probability is greater than both the differential characteristic probability 2^−18 and the differential probability 2^−12 derived under the independent subkey assumption. On the strength of the whole compression function, we show that there are at least 15 active F functions in the mixing function under 64-bit truncated analysis. As the input bit length of the mixing function is 256, we can say that it is secure against differential attack if the maximum differential probability of the F function is less than 2^−256/15 ≈ 2^−17.067. We also show that the key scheduling function is secure against differential cryptanalysis.
1 Introduction
Lesamnta is a family of hash functions proposed by Hirose, Kuwakado, and Yoshida in 2008, which was one of the candidates for the new hash algorithm SHA-3 [1], [2]. A semi-free-start collision and preimage attack on Lesamnta was reported, and the designers modified its round constants to prevent these attacks [3]. Lesamnta was not selected for the second round of the SHA-3 competition; however, no vulnerability of the modified version of Lesamnta has been found yet [4]. Lesamnta provides 4 different sizes of message digest, i.e. 224, 256, 384, and 512 bits. These message digests are produced by 4 algorithms: Lesamnta-224, Lesamnta-256, Lesamnta-384, and Lesamnta-512. Lesamnta employs the block cipher E as its major component. If the block cipher is assumed to be truly random, Lesamnta is indifferentiable from a random oracle. In other words, the
Y. Tan, Y. Shi, and K.C. Tan (Eds.): ICSI 2010, Part II, LNCS 6146, pp. 157–166, 2010.
© Springer-Verlag Berlin Heidelberg 2010
158    Y. Igarashi and T. Kaneko

Fig. 1. Lesamnta-256 (message blocks M(1), ..., M(N) processed by chained compression functions built on block cipher E, followed by an output function producing the message digest H(N))

Fig. 2. Block cipher E (a key scheduling function of 32 rounds of fK, with round constants C(0), ..., C(31) producing 64-bit keys K(0), ..., K(31), and a mixing function of 32 rounds of fM on the 256-bit input)
security of Lesamnta is based on the security of the block cipher E. So we focus on Lesamnta-256 and analyze the block cipher E. We then derive some properties of Lesamnta-256 from the analysis. Block cipher E consists of a mixing function and a key scheduling function. The mixing function and the key scheduling function consist of the F function and the G function, respectively. We analyze the differential properties of the mixing function and the key scheduling function in Sect. 5 and 6, respectively. In the analysis we focus on nonlinear functions without independent key addition, and evaluate the independence of the nonlinear functions properly. Our analysis is valid for the modified version of Lesamnta because round constants do not affect the differential property. Moreover, our analysis is more precise than the analysis under the independent subkey assumption, which is the novelty of this paper. Such an analysis is indeed absent from the submission documents of Lesamnta, and is important for the cryptanalysis of a hash function. As preliminaries to the analysis, we give an outline of Lesamnta-256 in Sect. 2, and review the differential probability and the differential characteristic probability in Sect. 3 and 4.
2 Lesamnta-256
We survey Lesamnta-256 with a focus on the block cipher E in this section. Further details are given in [1], [2]. Lesamnta consists of the Merkle-Damgård iteration of a compression function and an output function, which is similar to the compression function. Figure 1 shows a schematic diagram of Lesamnta-256. A message is divided into 256-bit message blocks M(i) (1 ≤ i ≤ N) that are put into the compression function. The compression function consists of the block cipher E and XOR (⊕). M(i) is encrypted using the chaining variable H(i−1) as a secret key to the block cipher E. The output of the compression function is put into the next compression function as the chaining variable. The final message block M(N) is put into an
Fig. 3. The zeroth (first) round of fK (left side) and fM (right side)

Fig. 4. G function (inputs a0, ..., a7 pass through an S-box layer and a mixing layer with multiplications by 2 and 3 over GF(2^8), producing outputs a'0, ..., a'7)

Fig. 5. F function (a ByteSwap layer, 4 rounds of SPN — S-box layer, ShiftRow layer, MixColumn layer — and a final ByteSwap layer)
output function. Its output H(N) is the hash value of the input message, i.e. the message digest. Figure 2 shows a schematic diagram of the block cipher E, which consists of a key scheduling function and a mixing function. C(i) denotes the round constants, which were modified by the designers to prevent the semi-free-start collision and preimage attack [3]. The key scheduling function and the mixing function consist of 32 rounds of the fK and fM functions, respectively. The fK and fM functions have a 4-way generalized Feistel structure as shown in Fig. 3. Each function consists of XOR and the G function or F function. In fM, for example, the message block M(i) is divided into 4 blocks of 64-bit data m_j^(i) (0 ≤ j ≤ 3), and m'_j^(i) is derived as its output. Figure 4 shows a schematic diagram of the G function, which is bijective. a_i and a'_i (0 ≤ i ≤ 7) denote the 8-bit inputs and outputs (I/O) of the G function, respectively. The G function employs the AES type of S-box. This S-box has a differential probability of 2^−6, and its algebraic degree is 7. 2 and 3 denote multiplication of the input by 2 and 3 over GF(2^8), whose characteristic polynomial ϕ(x) is the same as for AES, given by ϕ(x) = x^8 + x^4 + x^3 + x + 1. Representing the output of an S-box as S(a_i), the output a'_0 in Fig. 4 is, for example, given by a'_0 = 2 · S(a_0) ⊕ S(a_1) ⊕ S(a_2) ⊕ 3 · S(a_3).
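The multiplications by 2 and 3 over GF(2^8) reduce modulo the AES polynomial; a minimal sketch (the S-box output bytes s0..s3 below are placeholder values, not the real AES table):

```python
def xtime(a):
    """Multiply by 2 over GF(2^8) with the AES polynomial x^8+x^4+x^3+x+1 (0x11B)."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a

def gf_mul(a, b):
    """General GF(2^8) product via shift-and-add (Russian peasant method)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r

# Output byte a'_0 = 2*S(a0) ^ S(a1) ^ S(a2) ^ 3*S(a3), with hypothetical
# S-box outputs s0..s3 standing in for S(a0)..S(a3).
s0, s1, s2, s3 = 0x63, 0x7C, 0x77, 0x7B
a0_out = gf_mul(2, s0) ^ s1 ^ s2 ^ gf_mul(3, s3)
```

Multiplication by 3 is simply xtime(x) ^ x, so the whole mixing layer needs only shifts and XORs.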
Fig. 6. The first 4 layers and last layer from the input of F function (inputs x0, ..., x7; intermediate bytes x'0, ..., x'7; outputs x''0, ..., x''7; the MixColumn layer uses multiplications by 2)

Fig. 7. Typical model of an n-bit block cipher (plaintext P, round functions F0, ..., Fr−1 with round keys K0, K1, ..., intermediate values Pi and Ci, and secret key K)
Figure 5 shows a schematic diagram of the F function. The S-box layer, ShiftRow layer, and MixColumn layer are collectively referred to as the SPN (Substitution Permutation Network). The F function is bijective and consists of 4 rounds of SPN sandwiched between ByteSwap layers. The S-box layer is a nonlinear function, and the other layers are linear functions. Figure 6 shows the first 4 layers and the last ByteSwap layer from the input of the F function. x_i and x'_i (0 ≤ i ≤ 7) denote the 8-bit I/O of the first 4 layers, respectively. x''_i denotes an 8-bit output of the F function. The S-box and the characteristic polynomial used in the F function are the same as in AES. As an example, x'_0 is given by x'_0 = S(x_0) ⊕ 2 · S(x_3).
3 Differential Probability and Differential Characteristic Probability
In this section we describe the differential probability (DP) and the differential characteristic probability (DCP), which are security indexes against differential cryptanalysis [5]. We review DP and DCP [6] by taking the typical model of an n-bit block cipher shown in Fig. 7 as an example. Fi (0 ≤ i ≤ r − 1) denote the encryption functions. Pi and Ci denote the I/O of Fi, respectively. The round key (subkey) Ki (0 ≤ i ≤ r) is assumed to be uniformly random. When Pi is also uniformly random, the differential probability DPFi of Fi is given by

DPFi(ΔPi → ΔCi) = #{Pi | Fi(Pi) ⊕ Fi(Pi ⊕ ΔPi) = ΔCi} / 2^n    (1)
where ΔPi and ΔCi denote differences of Pi and Ci , respectively. #{•|∗} denotes the number of occurrences of the variable • conditioned by ∗. When n is large,
e.g. n > 32, it is generally difficult to derive the DP of the block cipher. In such a case, we derive the differential characteristic probability DCP given by

DCP(ΔP → ΔP0 → · · · → ΔPr−1 → ΔC) = ∏_{i=0}^{r−1} DPFi(ΔPi → ΔCi)    (2)

where

ΔP = ΔP0, ΔCi = ΔPi+1, and ΔCr−1 = ΔC.    (3)
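For a component with a small I/O width, Eq. (1) can be evaluated by exhaustive enumeration. A sketch for a toy 3-bit S-box (the table is illustrative only, not a component of Lesamnta):

```python
from collections import Counter

def dp_table(sbox, n):
    """DP(dP -> dC) of Eq. (1) for an n-bit S-box, by exhaustive counting."""
    size = 1 << n
    table = {}
    for dp in range(size):
        counts = Counter(sbox[p] ^ sbox[p ^ dp] for p in range(size))
        for dc, c in counts.items():
            table[(dp, dc)] = c / size
    return table

toy_sbox = [0, 1, 3, 6, 7, 4, 5, 2]  # a 3-bit permutation
table = dp_table(toy_sbox, 3)
max_dp = max(p for (dp, _), p in table.items() if dp != 0)
print(max_dp)  # 0.25 for this table
```

The zero difference always maps to zero with probability 1, and the probabilities for a fixed input difference sum to 1 — useful sanity checks for any DP computation.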
If the block cipher shown in Fig. 7 is ideally random, we expect that DCP ≤ 2^−n regardless of ΔP, ΔPi, ΔCi, and ΔC when ΔP ≠ 0. If we find a differential characteristic (ΔP → ΔPi → ΔCi → ΔC) that gives DCP > 2^−n, we can distinguish the block cipher from a random function. If we also find a differential characteristic that gives DCP > 2^−k, where k is the bit length of the secret key K, the block cipher does not ensure the security provided by K against a differential attack. As described above, DCP is practically a security index against the differential attack on a block cipher. And if the maximum differential characteristic probability DCPmax, defined by the following equation, does not exceed the thresholds 2^−n and 2^−k, the block cipher is assumed to be secure against the differential attack:

DCPmax = max_{ΔP,ΔP0,··· ,ΔPr−1,ΔC} DCP(ΔP → ΔP0 → · · · → ΔPr−1 → ΔC).    (4)

4 Differential Characteristic Probability and Data Truncation
We describe a data truncation technique related to DCP [7] in this section. First, we equally divide the n-bit difference ΔP of plaintext P in Fig. 7 into d blocks of difference ΔP(i) (1 ≤ i ≤ d) as follows:

ΔP = ΔP(1) || ΔP(2) || · · · || ΔP(d)    (5)

where the symbol x||y denotes the concatenation of data x and y. ΔP(i) denotes an n' (= n/d)-bit difference. Next we represent ΔP(i) by the 1-bit difference ΔP'(i) as follows:

ΔP'(i) = 0 if ΔP(i) = 0, and ΔP'(i) = 1 if ΔP(i) ≠ 0.    (6)

Such an operation is referred to as data truncation, and ΔP'(i) is referred to as a truncated difference. From (6), the XOR operation on truncated differences is given by

ΔP'(i) ⊕ ΔP'(j) = 0 if ΔP'(i) = ΔP'(j), and 1 if ΔP'(i) ≠ ΔP'(j).    (7)
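Truncation per Eqs. (5)-(6) maps each sub-block of a difference to a single bit; a sketch for a 64-bit difference split into 8 bytes:

```python
def truncate(diff, n=64, d=8):
    """Truncated difference of Eqs. (5)-(6): one bit per n/d-bit block."""
    width = n // d
    mask = (1 << width) - 1
    return tuple(
        0 if (diff >> (n - (i + 1) * width)) & mask == 0 else 1
        for i in range(d)
    )

print(truncate(0))                   # (0, 0, 0, 0, 0, 0, 0, 0)
print(truncate(0xFF00000000000001))  # (1, 0, 0, 0, 0, 0, 0, 1)
```

This compresses the 2^64 possible differences into 2^8 truncated patterns, which is what makes the later path search tractable.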
From (5) and (6), the truncated difference ΔP' is given by ΔP' = ΔP'(1) || ΔP'(2) || · · · || ΔP'(d). The differential characteristic probability DCP' of the truncated difference ΔP' is given by

DCP'(ΔP' → ΔP'0 → · · · → ΔP'r−1 → ΔC') = ∏_{i=0}^{r−1} DP'Fi(ΔP'i → ΔC'i)    (8)

Equation (10) is true, but the opposite is not always true. Therefore, the maximum truncated differential characteristic probability DCP'max satisfies the following equation:

DCP'max = max_{ΔP',ΔP'0,··· ,ΔP'r−1,ΔC'} DCP'(ΔP' → ΔP'0 → · · · → ΔP'r−1 → ΔC') ≥ DCPmax,    (11)

which shows that DCP'max is an upper bound of DCPmax. The complexity of (11) is much less than that of (4). Accordingly, we employ DCP'max as a security index when the computation of (4) is difficult.
5 Differential Property of Mixing Function
In this section, we describe the differential property of the mixing function by 64-bit truncation. We also describe the differential property of the F function and its equivalent modification.

5.1 Differential Property of Mixing Function by 64-Bit Truncation
In this section we derive the upper bound on the DP of the F function that ensures the security of the mixing function. We first evaluate (11) for P = M(i), C = H(i) shown in Fig. 2, and DPFi = DPfM, r = 32 in (8). Note that we assume n' = 64 because the mixing function has a 4-way 64-bit generalized Feistel structure. Both ΔP' and ΔC' are therefore (256/64 =) 4-bit variables. We also assume that the round keys K(i) in Fig. 2 are uniformly random. Since the fM function is bijective, we can find that there are 4 kinds of DPfM(ΔP'i → ΔC'i), as follows:

DPfM(x0 x1 0 x3 → x3 x0 x1 0) = 1 (passive),    (12)
0 < DPfM(x0 x1 1 0 → 1 x0 x1 1) < 1 (active),    (13)
0 < DPfM(x0 x1 1 1 → x3 x0 x1 1) < 1 (active),    (14)
DPfM(the other characteristics) = 0    (15)
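Rules (12)-(15) define a small nondeterministic transition on the 4-bit truncated state; a sketch assuming, as in Fig. 3, that F acts on the third 64-bit word (an active F output is forced nonzero only when XORed with a zero difference, since F is bijective):

```python
def next_states(state):
    """Possible truncated states after one fM round, per Eqs. (12)-(15)."""
    p0, p1, p2, p3 = state
    if p2 == 0:                 # Eq. (12): F passive, pure word rotation
        return {(p3, p0, p1, 0)}
    if p3 == 0:                 # Eq. (13): active F output XORed with zero stays nonzero
        return {(1, p0, p1, 1)}
    # Eq. (14): the active output may or may not cancel the incoming difference
    return {(0, p0, p1, 1), (1, p0, p1, 1)}

print(next_states((1, 0, 0, 0)))  # single passive successor
print(next_states((0, 0, 1, 1)))  # two possible active successors
```

Searching over 32 rounds of such transitions is the kind of computer search used below to count active F functions.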
Fig. 8. Differential path satisfying (11) for the mixing function where n' = 64, ΔP' = 0001, and ΔC' = 0110 (32 rounds of F with keys K(0), ..., K(31))

Fig. 9. 16-bit I/O function referred to as XS (input bytes r0, r1 pass through an 8-bit S-box layer, a MixColumn with multiplications by 2, and a second 8-bit S-box layer, producing output bytes u0, u1)

Fig. 10. Diagram of mF (input bytes r0, ..., r7 pass through four XS in parallel, the linear layer R-M-R, and four more XS, producing output bytes w0, ..., w7)
where xi denotes "0" or "1." ΔP'i and ΔC'i are represented by four binary digits. Equation (12) represents the DP of the fM function when the input difference to the F function is zero. In this case, we say that the F function is passive. Equations (13) and (14) represent the probability when the input difference to the F function is nonzero. In this case, we say that the F function is active. Equation (15) represents the probability when (ΔP'i → ΔC'i) takes any other differential characteristic, i.e. the other characteristics do not exist. By a computer search under these conditions, the minimum number of DPfM(ΔP'i → ΔC'i) (≠ 0, 1) satisfying (11) is derived as 15, i.e. the number of active F functions in the mixing function is 15. The difference propagates as shown in Fig. 8 (referred to as the differential path). A bold line denotes a path with a nonzero difference (referred to as an active path), while a thin line denotes a path with a zero difference (referred to as a passive path). We can find that a total of 15 bold lines are put into F functions. The mixing function is assumed to be secure against differential attack if DCP does not exceed 2^−256, because the mixing function is a 256-bit block encryption function. Accordingly, the following property is obtained.
Property 1. The security of the mixing function is assured if the maximum differential probability DPmax of the F function satisfies DPmax ≤ 2^−256/15 ≈ 2^−17.067.

5.2 Differential Property of F Function
The F function consists of 4 rounds of SPN. At the input to the first round of SPN, a 64-bit subkey is XORed with the data. The subkey is not inserted into the other rounds of SPN. Therefore, it is not appropriate to estimate (4) or (11) by assuming that all the S-boxes in the F function behave independently of each other, as is done in [1]. To be precise, we must estimate the differential probability (1) of the F function. However, such an estimation is difficult because the I/O bit length of the F function is as large as 64.

5.3 Equivalent Modification of F Function
We denote by S, R, and M the S-box layer, ShiftRow layer, and MixColumn layer in Fig. 5, respectively. The 4 rounds of the SPN procedure can be represented as (S-R-M)-(S-R-M)-(S-R-M)-(S-R-M), where we can interchange S and the following R because S is a byte-oriented substitution and R is a byte-oriented swap. Accordingly, the 4 rounds of the SPN procedure can be rewritten as

R-mF-(R-M)    (16)

where mF = (S-M-S)-(R-M-R)-(S-M-S). The first R and the last R-M in (16) do not affect DPmax because they are linear transformations. So we only have to analyze the middle part, i.e. mF. Since S is a parallel operation of 8 S-boxes and M is a parallel operation of four 16-bit I/O MixColumns as shown in Fig. 6, the operation S-M-S can be represented by a parallel operation of the four 16-bit I/O functions (referred to as XS) shown in Fig. 9. ri and ui (i = 0, 1) denote the 1-byte I/O of XS. si and ti (i = 0, 1) denote intermediate bytes between ri and ui. By using XS, the structure of mF can be shown as in Fig. 10, where ri and wi (i = 0, 1, · · · , 7) denote the I/O bytes of mF, and ui and vi denote intermediate bytes between ri and wi. R-M-R is a linear transformation with branch number 3, i.e. nonzero differences pass through at least 3 XS when differences are put into the F function. Through an exhaustive search of (1) for XS, we find the following property.

Property 2. DPmax of XS is given by DPmax = 2^−11.415, where the I/O differences are 1313 and b0b0 in hexadecimal, respectively.

If a subkey were XORed at the inputs of all 4 S-boxes in Fig. 9, i.e. if all these S-boxes were independent of each other, the maximum differential characteristic probability DCPmax of XS would be DCPmax = (2^−6)^3 = 2^−18. And the key-averaged DPmax of XS would be DPmax = (2^−6)^(3−1) = 2^−12, because the branch number of MixColumn is 3 and the DPmax of the S-box is 2^−6 [8]. However, the real value
Fig. 11. Differential path of the key scheduling function representing DCP'max = 2^−294, where ΔP' and ΔC' are "00 00 00 01" and "00 00 0f 00" in hexadecimal, respectively (32 rounds of G with round constants C(0), ..., C(31); truncated byte differences along the path include 01, c3, 77, and 0f)
of DPmax of XS is even larger than 2^−12. If all the eight XS in Fig. 10 were independent of each other, the DPmax of the F function would be given by

DPmax = (2^−11.415)^(3−1)    (17)
because the branch number of R-M-R is 3. If this were true, the mixing function would be assumed to be secure, because (17) is smaller than the security threshold 2^−17.067 shown in Sect. 5.1. However, the four XS below R-M-R are not actually independent, because subkeys are not inserted at the inputs to these XS. Similar analyses, i.e. based on the independent subkey assumption, have been found in other proposal documents for SHA-3, e.g. Grøstl, LANE, and Luffa. We should pay attention to the security of these candidates if their security proofs rely mainly on the independent subkey assumption.
6 Differential Cryptanalysis of Key Scheduling Function by 8-Bit Truncation
We can properly evaluate the security of the key scheduling function shown in Fig. 2 because all the S-boxes in that function are independent of each other. We
assume that the round constants are uniformly random. From (11), where n' = 8, P = H(i−1), C is the output of the 31st round of the fK function, and DPFi = DPfK, we derive the DCP'max of the key scheduling function as follows:

DCP'max = 2^−294.    (18)
Figure 11 shows the differential path corresponding to (18). For example, the truncated I/O differences of the G function on the 4th round are 01 and c3 in hexadecimal, respectively. From this result, we find that the key scheduling function is secure against differential attack.
7 Conclusion
We have discussed the strength of Lesamnta-256 against differential attack. We showed that Lesamnta-256 is secure against the attack if the DPmax of the F function is less than 2^−17.067. We noted that it is not appropriate to estimate the strength by applying the independent subkey assumption to every step of the SPN, as was done in the proposal. We equivalently modified the F function, and showed that the real value of DPmax of XS is 2^−11.415, which is greater than the DCPmax of 2^−18 and the DPmax of 2^−12 derived under the independent subkey assumption. We also showed that the key scheduling function is secure against differential cryptanalysis.
References
1. First round candidates for SHA-3, http://csrc.nist.gov/groups/ST/hash/sha-3/Round1/submissions_rnd1.html
2. The Hash Function Family Lesamnta, http://www.sdl.hitachi.co.jp/crypto/lesamnta/
3. Hirose, S., Kuwakado, H., Yoshida, H.: Security analysis of the compression function of Lesamnta and its impact, http://csrc.nist.gov/groups/ST/hash/sha-3/Round1/documents/LESAMNTA Comments.pdf
4. Regenscheid, A., Perlner, R., Chang, S.-j., Kelsey, J., Nandi, M., Paul, S.: Status Report on the First Round of the SHA-3 Cryptographic Hash Algorithm Competition, NISTIR 7620, http://csrc.nist.gov/groups/ST/hash/sha-3/Round1 documents/sha3 NISTIR7620.pdf
5. Seki, H., Kaneko, T.: Differential Cryptanalysis of Reduced Rounds of GOST. In: Stinson, D.R., Tavares, S. (eds.) SAC 2000. LNCS, vol. 2012, pp. 315–323. Springer, Heidelberg (2001)
6. Biham, E., Shamir, A.: Differential Cryptanalysis of DES-like Cryptosystems. In: Menezes, A., Vanstone, S.A. (eds.) CRYPTO 1990. LNCS, vol. 537, pp. 2–21. Springer, Heidelberg (1991)
7. Knudsen, L.R.: Truncated and Higher Order Differentials. In: Preneel, B. (ed.) FSE 1994. LNCS, vol. 1008, pp. 196–211. Springer, Heidelberg (1995)
8. Hong, S., Lee, S., Lim, J., Sung, J., Cheon, D., Cho, I.: Provable Security against Differential and Linear Cryptanalysis for the SPN Structure. In: Schneier, B. (ed.) FSE 2000. LNCS, vol. 1978, pp. 273–283. Springer, Heidelberg (2001)
Sparse Source Separation with Unknown Source Number

Yujie Zhang¹, Hongwei Li¹, and Rui Qi²

¹ School of Mathematics and Physics, China University of Geosciences, Wuhan, 430074
[email protected]
² School of Sciences, Naval University of Engineering, Wuhan, 430033, China
Abstract. Sparse blind source separation (BSS) problems have recently received some attention, and some methods have been proposed for an unknown number of sources. However, they only consider the overdetermined case (i.e. with more sensors than sources). In practical BSS, there are no prior assumptions on the number of sources. In this paper, we use clustering and principal component analysis (PCA) to estimate the number of sources and the separation matrix, and then estimate the sources. Experiments with speech signals demonstrate the validity of the proposed method.
1 Introduction
BSS consists in recovering unknown sources from their unknown mixtures. Since the pioneering work by Jutten and Herault [1], many methods for BSS have been proposed. Most of them assume that the number of sources is equal to the number of sensors. In many cases, the number of sources changes over time; the mixing matrix and demixing matrix are then not square and not invertible. Recently, some approaches for overdetermined and underdetermined BSS have assumed that the number of sources is known [2-4]. Only a few papers have discussed the unknown-source-number case [5-6], but they all assume there are more sensors than sources. In this paper, we consider instantaneous mixture systems and make no assumption on the source number, which is an important practical issue; the method for BSS of sparse sources is available for either the overdetermined or the underdetermined case. The paper is organized as follows. Section 2 discusses the signal model. Section 3 introduces the process of estimating the number of sources and the separation matrix. Section 4 presents the algorithm steps and details some simulation experiments. Finally, a conclusion is drawn in Section 5.
2 Under- and Over-Determined BSS Problems
Assume that the sources are stationary zero-mean processes and sufficiently sparse. Let s(t) = [s1(t), s2(t), · · · , sn(t)]^T be an unknown sparse source vector
Y. Tan, Y. Shi, and K.C. Tan (Eds.): ICSI 2010, Part II, LNCS 6146, pp. 167–172, 2010.
© Springer-Verlag Berlin Heidelberg 2010
168
Y. Zhang, H. Li, and R. Qi
and x(t) = [x1(t), x2(t), · · · , xm(t)]^T be a sensor vector, which is a linear instantaneous mixture of the sources:

x(t) = As(t)    (1)

where A ∈ R^{m×n} is an unknown mixing matrix of full rank (i.e. rank(A) = min(m, n)). The blind separation problem is to recover the original signals from the observations x(t) without prior knowledge of the sources and the mixture, except for the sparsity of the sources. The demixing model here is a linear transformation of the form

y(t) = W x(t)    (2)

where y(t) = [y1(t), y2(t), · · · , yn(t)]^T and W ∈ R^{n×m} is a separating matrix. If n > m, the mixture is called underdetermined, and if n < m, the mixture is called overdetermined.
3 Estimation of Number and Separating Matrix
The sparsity of the sources can be used to estimate the mixing matrix in clustering approaches [7-8]. But [7-8] only discussed the case of a known number of sources. Here, we discuss the BSS problem with an unknown number of sources. In [9], it is indicated that to recover N sparse signals from N observations, N (hyper-)planes of the form α1 x1 + · · · + αN xN = 0 must first be "fitted" onto the scatter plot of observations. Each row of the separating matrix is then the coefficient vector (α1, α2, · · · , αN) of one of these (hyper-)planes. Consider the problem of fitting an N-dimensional hyper-plane α1 x1 + · · · + αN xN = 0 onto a set of K data points {xi = (x1^(i), x2^(i), · · · , xN^(i))^T, i = 1, 2, · · · , K}; the best hyper-plane is obtained by minimizing the cost function

ϕ(α1, · · · , αN) = Σ_{i=1}^{K} (α1 x1^(i) + α2 x2^(i) + · · · + αN xN^(i))²    (3)
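The constrained minimization of (3) reduces, via Lagrange multipliers, to finding the eigenvector of the data's scatter matrix with the smallest eigenvalue. A pure-Python sketch using power iteration on trace(R)·I − R (the data points are illustrative and lie exactly on the plane x1 + x2 + x3 = 0):

```python
import math

# Points orthogonal to alpha = (1, 1, 1)/sqrt(3), i.e. on the plane x1+x2+x3 = 0.
points = [(1.0, -1.0, 0.0), (1.0, 0.0, -1.0), (0.0, 1.0, -1.0), (2.0, -1.0, -1.0)]

# Scatter (correlation) matrix R = sum_i x_i x_i^T.
R = [[sum(p[a] * p[b] for p in points) for b in range(3)] for a in range(3)]

def min_eig_direction(R, iters=300):
    """Smallest-eigenvalue eigenvector of R via power iteration on trace(R)*I - R."""
    t = R[0][0] + R[1][1] + R[2][2]
    M = [[(t if a == b else 0.0) - R[a][b] for b in range(3)] for a in range(3)]
    v = [1.0, 0.3, -0.7]  # arbitrary start, not orthogonal to the answer
    for _ in range(iters):
        w = [sum(M[a][b] * v[b] for b in range(3)) for a in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

alpha_hat = min_eig_direction(R)
```

For these exactly coplanar points the recovered direction is ±(1, 1, 1)/√3; with noisy data it is the least-squares plane normal, which is exactly the per-cluster computation used below.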
subject to the constraint α1² + α2² + · · · + αN² = 1. Using Lagrange multipliers, the coefficient vector α = (α1, · · · , αN)^T of the hyper-plane α1 x1 + · · · + αN xN = 0 which best fits the set of data points {xi = (x1^(i), x2^(i), · · · , xN^(i))^T, i = 1, 2, · · · , K} is the eigenvector of the correlation matrix Rx corresponding to its minimum eigenvalue. That is to say, the BSS matrix can be obtained by clustering the observation samples and then taking the direction of the smallest principal component of each cluster as a row of W; the k-means algorithm is used to separate the mixtures into C clusters [9]. If we know the number of sources, we can obtain the source estimates directly. But sometimes the number of sources is unknown; we can then suppose C (C > n) as the number of clusters. Thus some of the clusters represent mixtures of overlapping source intervals, which should be removed from the clusters when estimating the number of sources. To reduce the effect of overlapping sources on the estimation of the number of sources and the separating matrix, we present an improved procedure. Let χi
Sparse Source Separation with Unknown Source Number
169
denote the ith i = 1, 2, · · · , C set of clustering results, Ni be the number of (i) (i) (i) elements in χi , αi = (α1 , α2 , · · · , αm )T be the coefficient of the hyper-plane (i) (i) (i) of the ith cluster, and cij = (x1j , x2j , · · · , xmj )T be the jth element of χi , (i)
ˆ = maxi Ni , and we use the following procedure to refine the hyper-plane, N clustering results: Algorithm 1. (estimate number) Randomly distribute the observation samples into C clusters χ1 , · · · , χC For each cluster Computer αi (i.e. the eigenvector of the correlation matrix of the points in χi which corresponds to its minimum eigenvalue) (i)
If dij < ρd , then remove the cij from χi End for update χi and the element number Ni of χi until αi do not change ˆ < ρN , then remove the ith cluster If Ni /N Update the clusters End for Where ρN and ρd are preset thresholds whose value should be selected properly. If the source signals are not sparse enough, a small value should be selected for ρd while a large value for ρN . If the sources are not sparse enough, we used the Short-Time Fourier transform (STFT) or Discrete Cosine Transform (DCT) of observations. This is because these transform increases the sparsity of speech signals, without affecting the mixing matrix, since these transform is linear [8]. The algorithm may get trapped in a local minimum. One approach for escaping local minima is to run the algorithm with several randomly chosen initializations, and then to take the result which produces the minimum cost-function. Here, too, we use the same idea for reducing the probability of getting trapped in a local minimum: run the algorithm 1 with several random initializations, and calculate the final cost function ϕ = d2 (xi , l1 ) + · · · + d2 (xi , lC ).Take xi ∈ϕ1
xi ∈ϕC
the answer which results in the smallest final cost function.
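The smallest-eigenvector fit at the heart of this procedure can be sketched numerically. In this sketch the hyperplane normal, the noise level, and the sample count are illustrative assumptions, not values from the paper: points drawn near a plane through the origin are fitted by taking the eigenvector of their correlation matrix with the minimum eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(1)

# Points lying near the plane alpha^T x = 0 (unit normal alpha_true),
# plus small noise -- one "cluster" chi_i of observation samples.
alpha_true = np.array([1.0, -2.0, 2.0]) / 3.0
B = np.linalg.svd(alpha_true[None, :])[2][1:]          # orthonormal basis of the plane
X = rng.normal(size=(500, 2)) @ B + 0.01 * rng.normal(size=(500, 3))

# Best-fit hyperplane coefficients: eigenvector of the correlation matrix
# Rx = sum_i x_i x_i^T with the minimum eigenvalue (the Lagrange solution).
Rx = X.T @ X
eigvals, eigvecs = np.linalg.eigh(Rx)                  # eigenvalues in ascending order
alpha = eigvecs[:, 0]

print(abs(alpha @ alpha_true))                         # close to 1 (up to sign)
```

With noiseless data the recovery is exact; the small noise term above shows the fit is stable in practice.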
4 Simulation Experiments
To obtain the source number and the separation matrix, we use the following steps:
Step 1. If the signals are not sparse enough, apply the Fourier transform or the Discrete Cosine Transform (DCT) to the observations; let the resulting signals be X(t), t = 1, 2, · · · , N.
Step 2. Choose C > n.
Y. Zhang, H. Li, and R. Qi
Step 3. Randomly divide X into C clusters.
Step 4. Compute the eigenvector of the correlation matrix R^(i) of χi which corresponds to its minimum eigenvalue as αi.
Step 5. Compute the distance dij of each element cij of χi to the hyperplane with coefficient vector αi.
Step 6. If dij > ρd, then remove cij from χi. Let X = [χ1, χ2, · · · , χC]^T and go to Step 3 until the αi do not change.
Step 7. If Ni / N̂ < ρN, then remove the i-th cluster.
Step 8. Compute the cost function ϕ = Σ_{xi∈ϕ1} d²(xi, l1) + · · · + Σ_{xi∈ϕC} d²(xi, lC).
Step 9. Repeat Steps 3 to 8 several times and choose the C and αi that give the smallest cost function.
After this refined clustering procedure, we use the refined cluster number C as the number of sources and the αi as the estimates of the rows of the matrix W. Once the number of sources and the separation matrix are obtained, we can separate the sources in the following two cases. If the estimated source number C ≤ m, we estimate the source vector by s = (W^T W)^{-1} W^T x, which is proved in [10] to be the optimum solution in the least-squares sense under assumption 1). If C > m, the solution can be found with [8]. If the observations were transformed, the inverse transform is applied to each separated signal to obtain the source estimates. In this paper, we consider three experiments: 1. The underdetermined BSS, using all four speech signals in Fig. 1 as the sources; A is a 3 × 4 random matrix. 2. The well-determined BSS, using the first three speech signals as the sources; A is a 3 × 3 random matrix. 3. The overdetermined BSS, using the first two speech signals as the sources; A is a 3 × 2 random matrix.
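The C ≤ m case above is ordinary least squares applied sample by sample: s = (W^T W)^{-1} W^T x is the least-squares solution of x ≈ W s. A minimal sketch, where W and the sources are random placeholders rather than quantities from the paper; in the noiseless case the recovery is exact.

```python
import numpy as np

rng = np.random.default_rng(7)
m, C, N = 3, 2, 1000
W = rng.normal(size=(m, C))          # stand-in for the estimated mixing matrix
S = rng.normal(size=(C, N))          # placeholder sources
X = W @ S                            # noiseless overdetermined mixtures

# Least-squares estimate s = (W^T W)^{-1} W^T x, for all samples at once
S_hat = np.linalg.solve(W.T @ W, W.T @ X)
print(np.allclose(S_hat, S))         # True: exact recovery without noise
```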
Fig. 1. Signal sources (the four speech source signals, plotted over 5000 samples)
Fig. 2. The SNR measure of the separation (three panels for experiments 1–3; SIR in dB versus sample number from 5000 to 13000, with one curve per source)
In all the experiments, we consider separating speech sources from their mixtures and select ρd = 0.3, ρN = 0.1. In each simulation, we applied the algorithm on the DCT of the observations. To measure the performance of the algorithm, the SNR is defined as

SNR_i = 10 log ( ‖s_i(k)‖² / ‖ŝ_i(k) − s_i(k)‖² )   (4)

where ‖·‖² is the sum of squares over discrete time k. In each experiment, the sample size of the sources ranges from 5000 to 13000. The algorithm is run 10 times (with 10 different random cluster initializations) and the averaged SNR is calculated. The results of the separation of the different sources are shown in Fig. 2 and clearly show the performance. On average, the performance of each experiment improves as the sample number increases; once the sample number exceeds 8000, every experiment achieves an ideal separation of the source signals, with an average SNR over 10 dB.
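Equation (4) is straightforward to evaluate; the sketch below assumes a base-10 logarithm (as usual for dB) and uses an illustrative sinusoid plus noise rather than the paper's speech data.

```python
import numpy as np

def snr_db(s, s_hat):
    """SNR of eq. (4): 10 log10( ||s||^2 / ||s_hat - s||^2 ), in dB."""
    return 10.0 * np.log10(np.sum(s ** 2) / np.sum((s_hat - s) ** 2))

# Illustrative signal and a noisy estimate of it
s = np.sin(np.linspace(0, 20 * np.pi, 5000))
s_hat = s + 0.05 * np.random.default_rng(6).normal(size=s.size)
print(snr_db(s, s_hat))   # roughly 23 dB for this noise level
```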
5 Conclusion
This paper studies the BSS problem where the number of sources is unknown. Our method estimates the source number and the separation matrix via clustering and PCA. The simulation results on speech signals show the validity of the proposed method. As further work, we are currently interested in extending the methods presented here to correlated sources.
Acknowledgment. This work was supported by the National Natural Science Foundation of China (Grant No. 60672049 and Grant No. 40776006) and the Special Fund for Basic Scientific Research of Central Colleges, China University of Geosciences (Wuhan) (Grant No. CUGL090252).
References
1. Jutten, C., Herault, J.: Blind separation of sources part I: An adaptive algorithm based on neuromimetic architecture. Signal Process 24(1), 1–10 (1991)
2. Albera, L., Ferreol, A., Comon, P., Chevalier, P.: Blind identification of overcomplete mixtures of sources (BIOME). Linear Algebra Appl. 391, 3–30 (2004)
3. Naini, M.F., Mohimani, F.H., Babaie-Zadeh, M., Jutten, C.: Estimating the mixing matrix in Sparse Component Analysis based on partial k-dimensional subspace clustering. Neurocomputing 71, 2330–2343 (2008)
4. Cichocki, A., Karhunen, J., Kasprzak, W., Vigario, R.: Neural networks for blind separation with unknown number of sources. Neurocomputing 24(1), 55–93 (1999)
5. Lewicki, M.S., Sejnowski, T.J.: Learning overcomplete representations. Neural Computation 12(2), 337–365 (2000)
6. Xiao, M., Xie, S.L., Fu, Y.L.: Underdetermined blind source separation algorithm based on normal vector of hyperplane 34(2), 142–149 (2008)
7. Zibulevsky, M., Pearlmutter, B.A., Bofill, P., Kisilev, P.: Blind source separation by sparse decomposition. In: Independent Component Analysis: Principles and Practice. Cambridge Univ. Press, Cambridge (2001)
8. He, Z.S., Xie, S.L., Fu, Y.L.: Sparse representation and blind separation of overcomplete. Science in China Ser. E. Information Sciences 36(8), 864–879 (2006)
9. Babaie-Zadeh, M., Jutten, C., Mansour, A.: Sparse ICA via cluster-wise PCA. Neurocomputing 69, 1458–1466 (2006)
10. Haykin, S.: Adaptive Filter Theory, 3rd edn. Prentice-Hall, Englewood Cliffs (1996)
Matrix Estimation Based on Normal Vector of Hyperplane in Sparse Component Analysis
Feng Gao1, Gongxian Sun1, Ming Xiao1,2, and Jun Lv1
1 School of Electric & Information Engineering, South China University of Technology, Guangzhou 510640, China
2 School of Computer & Electrical Information, Maoming University, Maoming, Guangdong, 525000, China
[email protected], [email protected], {xiaoming1968,rylj}@163.com
Abstract. This paper discusses matrix estimation for sparse component analysis under the k-SCA condition. To estimate the mixing matrix using hyperplane clustering, we propose a new algorithm based on the normal vector of a hyperplane. Compared with the Hough SCA algorithm, we give a method to calculate the normal vector of a hyperplane, and the algorithm has lower complexity and higher precision. Two examples demonstrate its performance.
Keywords: sparse component analysis (SCA); hyperplane clustering; underdetermined blind signal separation (BSS); normal vector.
1 Introduction
One task of sparse component analysis is to estimate the matrix from a large data set X (an m × N matrix):

X = AS,  A ∈ R^{m×n},  S ∈ R^{n×N}   (1)
where A is a mixing matrix and S is the sparse component. When the sparse components (sources) are statistically independent, the sources can be estimated through the independent component analysis (ICA) [1−5] method. If the sources are sparse, the mixing matrix and sources can be estimated by sparse component analysis (SCA) [6−9]. Recently, Georgiev, Theis and Cichocki presented the k-SCA conditions:
A1) the mixing matrix A ∈ R^{m×n} has the property that any square m × m submatrix of it is nonsingular;
A2) the sources are sparse of level n − m + 1, i.e. each column of S has at least n − m + 1 zero elements;
A3) the sources are sufficiently richly represented in the following sense: for any index set of n − m + 1 elements I = {i1, ..., i_{n−m+1}} ⊂ {1, ..., n}, there exist at least m column vectors of the matrix S such that each of them has zero elements in the places with indexes in I and each m − 1 of them are linearly independent [10][11].
In fact, the k-SCA condition is an extension of SCA. Therefore, hyperplane clustering can also be applied to identify the matrix, for example, the Hough SCA algorithm [11].
Y. Tan, Y. Shi, and K.C. Tan (Eds.): ICSI 2010, Part II, LNCS 6146, pp. 173–179, 2010. © Springer-Verlag Berlin Heidelberg 2010
In the Hough SCA algorithm, Georgiev, Theis and Cichocki did not have a method to calculate the normal vector of a hyperplane, so their algorithm is complex and not precise. To overcome this problem, we give a formula to calculate the normal vector of a hyperplane and then analyze the properties of the mixtures under the k-SCA condition. Based on the k-SCA condition, a new algorithm using the normal vector of a hyperplane is proposed. Finally, some experimental results validate our algorithm.
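Condition A2 above is easy to impose on synthetic sources (as the experiments in Section 4 do by zeroing randomly chosen components); a minimal sketch with illustrative dimensions n = 5, m = 3:

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, N = 5, 3, 2000

# Random sources, then enforce condition A2: each column of S gets at
# least n - m + 1 zeros, i.e. at most m - 1 active sources per sample.
S = rng.normal(size=(n, N))
for t in range(N):
    S[rng.choice(n, size=n - m + 1, replace=False), t] = 0.0

zeros_per_column = (S == 0).sum(axis=0)
print(zeros_per_column.min() >= n - m + 1)   # True: A2 holds for every sample
```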
2 A Hyperplane and Its Normal Vector
Given a group of linearly independent m-dimensional vectors {u_k}_{k=1}^{m−1}, where u_k = (u_{k1}, ..., u_{km})^T (k = 1, ..., m − 1), they generate a subspace H, written

H := {y | y ∈ R^m, y = c1 u1 + · · · + c_{m−1} u_{m−1}, c1, ..., c_{m−1} ∈ R}.

The symbol (·)^T denotes the transpose. As the subspace generated by any two linearly independent vectors is a plane in 3-dimensional linear space, a subspace H of this form in m-dimensional linear space is often called a hyperplane. Set U = (u1, ..., u_{m−1}), an m × (m − 1) matrix, and remove the l-th row from U to obtain the submatrix

U_l = the (m − 1) × (m − 1) matrix whose rows are (u_{1j}, · · · , u_{m−1,j}) for j = 1, · · · , m, j ≠ l,  l = 1, ..., m.   (2)
Definition 1. Given a nonzero vector n, if ⟨n, y⟩ = 0 for all y ∈ H, the vector n is an orthogonal (normal) vector of the hyperplane H (i.e. n ⊥ H), where ⟨·, ·⟩ denotes the inner product of two vectors.

Theorem. The vector n0 = (det(U1), − det(U2), ..., (−1)^{m−1} det(Um))^T is the unique normal vector of the hyperplane H up to scaling.

Proof. According to the above definition,

⟨n0, u_j⟩ = u_{j1} det(U1) − u_{j2} det(U2) + · · · + (−1)^{m−1} u_{jm} det(Um) = det(u_j, u1, · · · , u_{m−1}) = 0  (j = 1, ..., m − 1),

since this is the cofactor expansion, along its first column, of a determinant with a repeated column. Hence the vector n0 is a normal vector of the hyperplane H. To find all the normal vectors, set n ⊥ H, i.e. ⟨n, u_l⟩ = 0 (l = 1, ..., m − 1); thus all the normal vectors of the hyperplane are solutions of the linear equation U^T n = 0. As rank(U) = m − 1 and the vector n0 is one of its solutions, the solution set of the linear equation is c n0 (i.e. n = c n0, c ∈ R). Therefore, the vector n0 is the unique normal vector of the hyperplane H up to scaling. We often use the unit normal vector of the hyperplane H, normalized as n = n0 / ‖n0‖.
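The cofactor formula of the Theorem translates directly into code; the sketch below (with an arbitrary random U as an illustrative input) builds n0 from the signed minors det(U_l) and checks orthogonality:

```python
import numpy as np

def normal_vector(U):
    """n0 from the Theorem: l-th entry (-1)^(l-1) det(U_l), 1-based l,
    where U_l is the m x (m-1) matrix U with its l-th row deleted."""
    m = U.shape[0]
    n0 = np.array([(-1) ** l * np.linalg.det(np.delete(U, l, axis=0))
                   for l in range(m)])
    return n0 / np.linalg.norm(n0)

rng = np.random.default_rng(2)
U = rng.normal(size=(4, 3))            # three generic vectors spanning a hyperplane in R^4
n = normal_vector(U)
print(np.max(np.abs(n @ U)) < 1e-9)    # True: n is orthogonal to every u_k
```

For m = 3 this reduces to the familiar cross product of the two spanning vectors.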
3 Matrix Estimation
In order to analyze a large data set X, expression (1), with A decomposed into its columns a_j, can be written

x^t = Σ_{j=1}^{n} a_j s_{tj}   (3)
where the vector x^t is a column of the data set X, and the column vectors a_j are assumed normalized to unit length (‖a_j‖ = 1). Here, the sparse components and the matrix A are assumed to satisfy conditions A1), A2) and A3). Any m − 1 columns {a_{i_k}}_{k=1}^{m−1} of A generate a hyperplane in m-dimensional linear space, written

H_q := {y | y = c_{i1} a_{i1} + · · · + c_{i_{m−1}} a_{i_{m−1}}, c_{i_k} ∈ R}   (4)

and there are Q = C(n, m−1) such hyperplanes. According to the Theorem, the normal vector b_q of the hyperplane H_q is b_q = (det(A_{q1}), − det(A_{q2}), ..., (−1)^{m−1} det(A_{qm}))^T, where the submatrix A_{ql} is obtained from (a_{i1}, ..., a_{i_{m−1}}) by deleting its l-th row, as in (2). Since each column s_t of the sources S has at least n − m + 1 zero elements, it is obvious that each vector x^t lies in one of the Q hyperplanes generated by the columns of A, namely, x^t ∈ ∪_{q=1}^{Q} H_q (Q = C(n, m−1), t = 1, ..., N).
According to the above description, a new algorithm is outlined as follows. Remove the zero vectors from {x^t}_{t=1}^{N} and normalize the rest, i.e. x^j := x^j / ‖x^j‖. If x^j + x^k = 0 or x^j − x^k = 0, they are the same vector or opposite vectors; let y_l = x^j and remove all the columns of X equal or opposite to y_l, recording the removed number h_l. Repeating this, obtain all the distinct vectors {y_j}_{j=1}^{N_l} and construct the data set Y = (y1, ..., y_{N_l}); their multiplicities are h1, ..., h_{N_l}. Select any m linearly independent vectors {y_{j_i}}_{i=1}^{m} in the data set Y, in the sense that det(y_{j1}, ..., y_{j_m}) ≠ 0, then select any m − 1 vectors of them and compute the corresponding unit normal vector n1. Given a threshold N0, if the number of columns of Y orthogonal to n1 is more than N0, remove all those columns from Y. Repeat to obtain further normal vectors, and continue to select and compute
unit normal vectors until there are no m linearly independent vectors left in the data set Y. At last, we obtain the distinct normal vectors {n_j}_{j=1}^{L}; the remaining columns are few enough to be neglected. The data set X then has L distinct hyperplanes {H_j}_{j=1}^{L}. Detect the columns in the data set Y orthogonal to n_j; if their indexes are j1, ..., j_k ∈ {1, ..., N_l}, we can count the nonzero columns in the data set X orthogonal to n_j, namely m_j = Σ_{i=1}^{k} h_{j_i}. Then we select the normal vectors {n_j}_{j=1}^{L} in terms of m_j. The normal vectors generated by the data set X number only C(n, m−1), i.e. L = C(n, m−1). Therefore, the normal vectors of the hyperplanes generated by the columns of the matrix A have been obtained, and it is easy to obtain the estimate of the normal vector b_q of the hyperplane H_q, i.e. b̂_j = n_j (j = 1, ..., Q). After estimating the normal vectors {b_j}_{j=1}^{Q} of the hyperplanes, the vectors a_j are identified as generators of the n lines lying at the intersections of C(n−1, m−2) hyperplanes; that is, each vector a_j must be orthogonal to C(n−1, m−2) of the vectors in {b_j}_{j=1}^{Q}.
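For m = 3 the cofactor formula for a hyperplane normal reduces to a cross product, and the final identification step — each a_j is orthogonal to the C(n−1, m−2) normals of the hyperplanes containing it — becomes a small null-space computation. A sketch with a random A, used here only to verify that the recovery works:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
m, n = 3, 4
A = rng.normal(size=(m, n))
A /= np.linalg.norm(A, axis=0)                  # unit-length columns a_j

# Normal of the hyperplane spanned by each pair of columns (m - 1 = 2),
# via the cross product (the m = 3 case of the cofactor formula).
normals = {pair: np.cross(A[:, pair[0]], A[:, pair[1]])
           for pair in combinations(range(n), m - 1)}

# a_0 lies in the C(n-1, m-2) = 3 hyperplanes whose index pair contains 0,
# so it is orthogonal to their normals: recover it as the null direction
# of the stacked normals (smallest right-singular vector).
B = np.vstack([b for pair, b in normals.items() if 0 in pair])
a0_hat = np.linalg.svd(B)[2][-1]

print(abs(a0_hat @ A[:, 0]))                    # ~1: a_0 recovered up to sign
```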
4 Experiments and Results
Example 1: We consider the case of 4 sparse components and 3 mixtures. To satisfy the k-SCA condition, we made four artificial sparse components using the randn command of Matlab (see Fig. 1). From Fig. 1, we know
Fig. 1. The four sparse components satisfied the k-SCA condition
Fig. 2. (a) the scatter plot of the data set X; (b) the scatter plot of the projection of X on unite spherical surface
all the projected data of the mixtures lie on the six unit circles, for they satisfy the k-SCA condition. The matrix is

A = [ 0.4545 −0.7912 −0.6634 0.4568 ; 0.4545 0.2120 −0.3830 −0.7912 ; 0.7660 0.5736 0.6428 0.4067 ].

After the simulation, six normal vectors are obtained as follows:

[b̂1, ..., b̂6] = [ 0.5857 0.6105 0.5653 0.8012 0.3655 0.0999 ; −0.8005 0.2197 0.6112 0.1672 0.5837 −0.8806 ; 0.1275 0.7610 0.5539 −0.5746 0.7250 0.4632 ].

There are respectively 400, 317, 302, 286, 257 and 244 samples orthogonal to them. The two vectors (0.4942, −0.7598, −0.4225)^T and (0.2646, 0.8419, −0.4703)^T found in the simulation have only 10 and 7 samples orthogonal to them. According to the six normal vectors, we can get the estimated mixing matrix

Â = [ 0.6634 0.4545 0.7912 0.4568 ; 0.3830 0.4545 −0.2120 −0.7912 ; −0.6428 0.7660 −0.5736 0.4067 ].

Comparing it with the original mixing matrix, we get min_{P∈p} ‖ÂP − A‖₂ = 0, where P is a permutation matrix and p is the set of permutation matrices.
Example 2: In order to show the performance of this algorithm again, we experimented under the condition n = 5, m = 3. A is decided randomly as follows:

A = [ 0.5525 0.3919 0.5707 0.3934 0.6904 ; 0.6863 −0.6066 0.5166 0.8634 −0.6007 ; 0.4730 0.6917 −0.6383 −0.3158 0.4032 ].

We generate 2,000 samples as artificial sparse components and substitute zeros into 3 components chosen randomly (see Fig. 3). The sparse components do not satisfy the k-SCA condition, but most samples of the sparse components have at least n − m + 1 zero elements, so most of their projected samples lie on the ten unit circles in Fig. 4(b).
Fig. 3. The five sparse components
Fig. 4. (a)The scatter plot of the data set X; (b)the scatter plot of their projection on unite spherical surface
After the simulation, ten normal vectors b̂1, ..., b̂10 are obtained. The normal vectors [b̂1, ..., b̂5] are respectively

[ 0.5681 0.0331 0.7930 0.4046 0.1779 ; 0.1051 0.7610 −0.1448 0.7851 0.6812 ; −0.8162 0.6479 0.5917 0.4690 0.7102 ]

and the vectors [b̂6, ..., b̂10] are respectively

[ 0.4270 0.7678 0.5066 0.8326 0.1655 ; 0.7877 −0.1986 −0.4899 −0.4803 −0.4094 ; 0.4440 −0.6091 −0.7095 −0.2758 −0.8972 ].

There are respectively 406, 372, 364, 359, 358, 358, 356, 351, 340 and 338 samples orthogonal to them. Only the vector b̂10 is not a true normal vector. According to the ten normal vectors, we can get the estimated mixing matrix

Â = [ 0.6918 0.3755 0.3931 0.5527 0.5703 ; −0.5981 −0.6102 0.8636 0.6860 0.5179 ; 0.4045 0.6976 −0.3156 0.4731 −0.6376 ].
Comparing it with the original matrix, we get min_{P∈p} ‖ÂP − A‖₂ = 3.2956 × 10⁻⁴, where P is a permutation matrix and p is the set of permutation matrices. From the examples, our algorithm is simple and has high precision.
5 Summary and Conclusions
Matrix estimation under the k-SCA condition is studied in this paper. The matrix estimation algorithm based on the normal vector of a hyperplane is the main contribution. Our algorithm improves Georgiev's Hough SCA algorithm, for we give a formula to calculate the normal vector of a hyperplane. The experimental results show that our algorithm is simpler and practical.
Acknowledgments. The work was supported by the national basic research project (2010CB731800), the key project of the NSFC-Guangdong Natural Science Foundation (U0635001, U0835003), NSFC (60874061, 60774094) and the Guangdong international cooperative research project (2009B050700020).
References
1. Cardoso, J.F.: Blind signals separation: Statistical principles. Proc. IEEE 86, 1129–1159 (1998)
2. Zhang, J., Xie, S., Wang, J.: Multi-input single-output neural network blind separation algorithm based on penalty function. DCDIS-Series B-Applications & Algorithms Suppl. SI, 353–361 (2003)
3. Xie, S., He, Z., Gao, Y.: Adaptive Theory of Signal Processing. Chinese Science Press, Beijing (2006)
4. Xie, S., He, Z., Fu, Y.: A note on Stone's conjecture of blind signal separation. Neural Computation 17, 321–330 (2005)
5. Cichocki, A., Amari, S.: Adaptive blind signal and image processing: learning algorithms and applications. Wiley, New York (2002)
6. Bofill, P., Zibulevsky, M.: Underdetermined blind source separation using sparse representations. Signal Processing 81, 2353–2362 (2001)
7. Li, Y., Amari, S., Cichocki, A., et al.: Underdetermined Blind Source Separation Based on Sparse Representation. IEEE Transactions on Signal Processing 54(2), 423–437 (2006)
8. He, Z., Xie, S., Fu, Y.: FIR convolutive BSS based on sparse representation. In: Wang, J., Liao, X.-F., Yi, Z. (eds.) ISNN 2005. LNCS, vol. 3497, pp. 532–537. Springer, Heidelberg (2005)
9. He, Z., Cichocki, A.: K-EVD Clustering and its Applications to Sparse Component Analysis. In: Rosca, J.P., Erdogmus, D., Príncipe, J.C., Haykin, S. (eds.) ICA 2006. LNCS, vol. 3889, pp. 90–97. Springer, Heidelberg (2006)
10. Georgiev, P.G., Theis, F.J., Cichocki, A.: Sparse component analysis and blind source separation of underdetermined mixtures. IEEE Transactions on Neural Networks 16(4), 992–996 (2005)
11. Theis, F.J., Georgiev, P.G., Cichocki, A.: Robust overcomplete matrix recovery for sparse sources using a generalized Hough transform. In: Proceedings of 12th European Symposium on Artificial Neural Networks (ESANN 2004), Bruges, Belgium, April 2004, pp. 343–348 (2004)
A New HOS-Based Blind Source Extraction Method to Extract μ Rhythms from EEG Signals
Kun Cai1,2 and Shengli Xie2
1 College of Engineering, South China Agriculture University, Guangzhou, Guangdong, China, 510641
2 School of Electronic and Information Engineering, South China University of Technology, Guangzhou, Guangdong, China, 510641
[email protected]
Abstract. The μ rhythm is a type of EEG rhythm, which usually occurs over the motor-sensory cortex of the brain. It is believed to reflect the limb movement and imaginary limb movement controlled by the brain, thus it is one of the important sources for BCI systems. In this paper, a new fixed-point BSE algorithm based on skewness is proposed to extract μ rhythms by the feature of asymmetric distribution. The local stability of the algorithm is also proved. The results from simulations indicate that, for μ rhythm extraction, the proposed skewness-based algorithm performs better than the negentropy-based FastICA.
1 Introduction
The μ rhythm is one of the rhythms in the brain, usually recorded over the central sulcus of most healthy people. It is also called the arciform rhythm because of the shape of its pattern. Usually encompassed in the frequency range of 8–12 Hz, the μ rhythm is suppressed during the performance of contralateral motor acts, tactile stimulations, and movement imagery. It is believed that the modulation of the μ rhythm reflects the electrical activity of the synchronization of large portions of pyramid neurons of the motor cortex, which control the limb movements, when inactive. Event-related desynchronization (ERD) and event-related synchronization (ERS) are considered key features in motor-imagery-based BCI systems, corresponding to the amplitude decrease and increase of the μ rhythm, respectively [1][2][3][4]. The modulated μ rhythms like ERD and ERS are the sources we desire to obtain. Therefore, it is significant to develop a μ rhythm extraction algorithm to improve the performance of BCI systems. Among the conventional methods to track μ rhythms, spectral analysis methods are the most used. They were proposed to resolve sinusoidal components in the characteristic 8–12 Hz band, such as Fourier-based methods [5], autoregressive (AR) models [6], and narrow-band power estimation [7]. However, there
Y. Tan, Y. Shi, and K.C. Tan (Eds.): ICSI 2010, Part II, LNCS 6146, pp. 180–187, 2010. © Springer-Verlag Berlin Heidelberg 2010
are some drawbacks with these approaches. Firstly, they can be interfered with by other rhythms in the brain which may occupy the same band as the μ rhythm, such as the visual α rhythm. Secondly, although the prominent component of the μ rhythm is in the 8–12 Hz band, they are incapable of modeling the real-world μ rhythm accurately because of the existence of higher-frequency components above 12 Hz. Dean J. Krusienski proposed a method to control a BCI with μ rhythms processed by a matched filter [8]. Nevertheless, the output of matched filtering indicates the presence of a known template in an unknown signal. In other words, the matched filtering method can locate the μ rhythm in multichannel EEG observations in the time domain, but cannot extract μ rhythms from the mixed EEG signals. In this paper, we propose a new BSE method to extract the μ rhythm based on its asymmetric distribution. This article is organized as follows. In Section 2, the skewness-based characteristics of the μ rhythm are analyzed. In Section 3, a new BSE algorithm is derived according to the findings in Section 2 and the local stability of the proposed algorithm is proved. In Section 4, real-world EEG signals are used to evaluate the performance of the proposed method.
Fig. 1. The simulated and real-world μ rhythm: (a) The simulated μ rhythm. (b) A simulation of a real-world μ rhythm as it occurs in EEG signals. (c) A real-world μ rhythm signal obtained from clinical EEG data.
2 The Skewness-Based Characteristics of the μ Rhythm
In this section, the skewness-based characteristics of μ rhythms are analyzed. Since it is difficult to obtain an accurate model describing the generation of the μ rhythm in neurophysiology, a rough one [8] is used to generate a simulated μ rhythm signal as follows:

s_μ(t) = Σ_{i=1}^{3} a_i cos(2πif₀t + θ_i)   (1)

where f₀ ∈ [10, 12] is the fundamental frequency of the μ rhythm, and a_i and θ_i are the amplitudes and phases of the first three harmonics, respectively. Let f₀ = 10, a₁ = 1, a₂ = 0.26, a₃ = 0.04, θ₁ = 0, θ₂ = π, θ₃ = 0, and sampling rate f_s = 128 Hz; a simulated μ rhythm is obtained and depicted in Fig. 1(a). As real-world μ rhythms occurring in clinical EEGs are usually modulated in the brain during motor activities or mental tasks, the signal in Fig. 1(b) simulates the amplitude changes of the signal by multiplying with a Gaussian window function. It is important to note that this is the source signal we desire to obtain. The signal in Fig. 1(c) displays a real-world μ rhythm contaminated by noise, which can be treated as a modulated μ rhythm plus a noise signal. Due to the facts that the third-order statistic skewness is a measure of the asymmetry of a probability distribution, and that the arciform μ rhythms are asymmetrically distributed, it is reasonable to take skewness as the feature distinguishing μ rhythms from the other source signals mixed in EEGs. The values of the skewness are calculated and listed in Table 1, where the skewness is defined by

skew(x) = E[x³] / (E[x²])^{3/2}   (2)

The waves displayed in Fig. 1 and the data listed in Table 1 imply that the negative skewness value of the modulated μ rhythm is less than that of the rhythm without modulation, which means that the modulation makes μ rhythms more seriously asymmetrical. Even if the modulated μ rhythms are interfered with by noise, the skewness of the signals is still less than that of the unmodulated one. Fortunately, the modulated μ rhythm is exactly the source signal we desire to extract. Therefore, the real-world μ rhythms that were modulated in clinical EEGs can be extracted with an algorithm that maximizes the asymmetry of the estimated source signals.

Table 1. Skewness values of the signals depicted in Fig. 1

Signal   | Fig. 1(a) | Fig. 1(b) | Fig. 1(c)
Skewness | -0.5390   | -1.0449   | -0.5647
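Equations (1) and (2) can be checked numerically. The sketch below reproduces the unmodulated skewness of Table 1 and shows that amplitude modulation makes it more negative; the Gaussian window (centered at 2.5 s, σ = 0.5 s) is an assumed choice, since the text does not give its parameters.

```python
import numpy as np

def skew(x):
    """Skewness as defined in eq. (2): E[x^3] / (E[x^2])^(3/2)."""
    return np.mean(x ** 3) / np.mean(x ** 2) ** 1.5

fs, f0 = 128, 10
t = np.arange(0, 5, 1.0 / fs)
a = [1.0, 0.26, 0.04]
theta = [0.0, np.pi, 0.0]

# Simulated mu rhythm, eq. (1)
s_mu = sum(a[i] * np.cos(2 * np.pi * (i + 1) * f0 * t + theta[i])
           for i in range(3))

# Amplitude modulation by an assumed Gaussian window (hypothetical parameters)
window = np.exp(-((t - 2.5) ** 2) / (2 * 0.5 ** 2))
s_mod = window * s_mu

print(round(skew(s_mu), 3))        # -0.539, matching Table 1 for Fig. 1(a)
print(skew(s_mod) < skew(s_mu))    # True: modulation makes skewness more negative
```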
3 The Proposed Algorithms

3.1 Objective Functions
The observed signals x(t) = [x₁(t), x₂(t), · · · , x_n(t)]ᵀ are described by the following equation:

x(t) = A s(t),   (3)

where E[x(t)] = 0, s(t) = [s₁(t), s₂(t), · · · , s_n(t)]ᵀ is a vector of unknown sources with zero mean and unit variance, and A ∈ R^{n×n} is the unknown mixing matrix. Because only one desired signal is to be extracted from the observations, the algorithm can be formulated as

y(t) = wᵀ x(t)   (4)

where w ∈ R^{n×1} is the weight vector. The observed signals are whitened by an n-by-n whitening matrix V in advance, so that each component of

x̃(t) = V x(t)   (5)

has unit variance and is uncorrelated. Thus, (4) can be further rewritten as

ỹ(t) = wᵀ x̃(t)   (6)

In the following of this paper, x̃(t) and ỹ(t) are denoted by x̃ and ỹ for convenience. The objective function can be described by the following constrained maximization problem based on the skewness of the desired source:

max_{‖w‖₂=1} J(w) = (1/6) [skew(ỹ)]²   (7)

where skew(·) is defined by (2). Because ‖w‖₂ = 1 and E[x̃x̃ᵀ] = I, we have E[ỹ²] = E[wᵀx̃x̃ᵀw] = wᵀE[x̃x̃ᵀ]w = 1. Thus, (7) simplifies to

max_{‖w‖₂=1} J(w) = (1/6) (E[ỹ³])²   (8)

3.2 Learning Algorithms
Maximizing the objective function in (8), a fixed-point BSE algorithm can be derived. The gradient of the objective function J(w) with respect to w is

∇J = ∂J(w)/∂w = E[ỹ³] E[x̃ ỹ²]   (9)

Thus, the fixed-point algorithm to extract μ rhythms is

w⁺(l + 1) = ∇J(w(l))   (10)

w(l + 1) = w⁺(l + 1) / ‖w⁺(l + 1)‖₂   (11)

where l is the l-th iteration step.
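The whitening step (5) and the fixed-point iteration (9)–(11) can be sketched as follows. The synthetic sources (one skewed exponential target and two symmetric distractors) and all parameter choices are illustrative assumptions, not the paper's EEG data:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 20000

# Three zero-mean, unit-variance sources; only the first is skewed.
s_target = rng.exponential(1.0, T) - 1.0            # skewness 2 (the target)
s_sym1 = rng.uniform(-np.sqrt(3), np.sqrt(3), T)    # symmetric distractor
s_sym2 = rng.normal(0.0, 1.0, T)                    # symmetric distractor
S = np.vstack([s_target, s_sym1, s_sym2])

A = rng.normal(size=(3, 3))                         # unknown mixing matrix, eq. (3)
X = A @ S

# Whitening matrix V so that E[x_tilde x_tilde^T] = I, eq. (5)
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
V = E @ np.diag(d ** -0.5) @ E.T
Xw = V @ X

# Fixed-point iteration, eqs. (9)-(11): w+ = E[y^3] E[x_tilde y^2], then normalize
w = rng.normal(size=3)
w /= np.linalg.norm(w)
for _ in range(200):
    y = w @ Xw
    w_new = np.mean(y ** 3) * (Xw @ y ** 2) / T     # sample version of eq. (9)
    w_new /= np.linalg.norm(w_new)                  # eq. (11)
    if 1.0 - abs(w @ w_new) < 1e-12:
        w = w_new
        break
    w = w_new

y = w @ Xw
corr = abs(np.corrcoef(y, s_target)[0, 1])
print(corr)   # close to 1: the skewed source is recovered up to sign
```

Because the update is sign-equivariant, the extracted signal may be a sign-flipped copy of the source, which is the usual BSS ambiguity.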
3.3 Local Stability Analysis
Here we analyze the local stability of the proposed algorithm.

Theorem 1. Assume that the input data follow the model (3) with whitened signals x̃ = V As, where V is the whitening matrix. Furthermore, E(s_i) = 0, σ(s_i) = 1, and s_i, s_j, ∀i ≠ j, i, j = 1, 2, . . . , n, are mutually independent. Then the local maxima of J(w) under the constraint ‖w‖₂ = 1 include one row of the inverse of the mixing matrix V A.

Proof. Assuming that {s_i, s_j}, ∀i ≠ j, i, j = 1, 2, . . . , n, are mutually independent and x̃ is white, we obtain

E[x̃x̃ᵀ] = E[V A s sᵀ Aᵀ Vᵀ] = V A E[s sᵀ] Aᵀ Vᵀ = V A Aᵀ Vᵀ = I   (12)

which means that V A is an orthogonal matrix. Let p = (p₁, p₂, · · · , p_n)ᵀ = Aᵀ Vᵀ w; then the coordinates of the objective function can be changed by the rotation V A, and we have
max_{‖p‖₂=1} J(p) = (1/6) (E[(pᵀs)³])²   (13)

Furthermore, we also have the gradient vector and the Hessian matrix of (13) as follows:

∂J(p)/∂p = E[(pᵀs)³] E[(pᵀs)² s]   (14)

∂²J(p)/∂p² = 3 E[(pᵀs)² s] E[(pᵀs)² s]ᵀ + 2 E[(pᵀs)³] E[(pᵀs) s sᵀ]   (15)

Then, we analyze the stability of the point p = e_i, where e_i = [0, · · · , 0, 1, 0, · · · , 0]ᵀ, i.e. the i-th element of e_i is one and the others are zero. Making a small perturbation ε = [ε₁, · · · , ε_n]ᵀ at p = e_i and using the independence assumptions, we obtain

∂J(e_i)/∂p = (0, · · · , 0, (E[s_i³])², 0, · · · , 0)ᵀ   (16)

∂²J(e_i)/∂p² = diag(0, · · · , 0, 5 (E[s_i³])², 0, · · · , 0)   (17)

and expand (13) as a Taylor series to obtain

J(e_i + ε) = J(e_i) + εᵀ ∂J(e_i)/∂p + (1/2) εᵀ (∂²J(e_i)/∂p²) ε + o(‖ε‖²)   (18)

Then, substituting (16) and (17) into (18), we have

J(e_i + ε) = J(e_i) + ε_i (E[s_i³])² + (5/2) (E[s_i³])² ε_i² + o(‖ε‖²)   (19)
Due to the constraint ‖w‖₂ = 1 and the orthogonality of V A, we have ‖p‖₂ = 1. Thus, we get

ε_i = sqrt(1 − Σ_{j≠i} ε_j²) − 1   (20)

Due to the fact that sqrt(1 − γ) = 1 − γ/2 + o(γ), the term of order ε_i² is o(‖ε‖²) and can be neglected. Finally we have

J(e_i + ε) = J(e_i) − (1/2) (E[s_i³])² Σ_{j≠i} ε_j² + o(‖ε‖²),   (21)

which clearly proves that p = e_i is a local maximum (provided E[s_i³] ≠ 0).
4 Experimental Results
To evaluate the performance of the two algorithms on real-world EEG signals, some trials of Data set III of BCI Competition II [9] are used as observed signals. Fig. 2 plots the results of the 15th trial of Data set III processed by OurALG and FastICA.
Fig. 2. The results of the 15th trial of Data set III of BCI Competition II processed by OurALG and FastICA (panels, top to bottom: x1, x2, FastICA IC1, FastICA IC2, OurALG y1; 800 samples each)
Fig. 3. The spectra of the signals extracted by FastICA and OurALG (power in dB over 0–70 Hz; upper panel: FastICA IC2 vs. OurALG y1; lower panel: FastICA IC1 vs. OurALG y1)
x1 and x2 are the signals recorded by electrodes placed over C3 and C4, and the sampling rate is 128 Hz. FastICA-IC1 and FastICA-IC2 denote the output independent components of the FastICA algorithm. OurALG-y1 is the signal extracted by our proposed algorithm. Among these three estimated signals, only FastICA-IC2 and OurALG-y1 are arciform, while FastICA-IC1 is more like a triangular wave. Using Welch's spectrum analysis method, the power spectra of OurALG-y1, FastICA-IC1, and FastICA-IC2 are estimated and displayed in Fig. 3. In this figure, except for the 10 Hz band, the rest of the spectrum of FastICA-IC1 is totally different from that of OurALG-y1, while in the frequencies near 10 Hz, 20 Hz, 30 Hz, 40 Hz, and 50 Hz, the spectra of FastICA-IC2 and OurALG-y1 almost overlap. Moreover, it is clear that FastICA-IC2 is noisier than OurALG-y1, as the energy of OurALG-y1 from 0–9 Hz, 23–28 Hz, and 34–39 Hz is lower than that of FastICA-IC2. The experiments, repeated 100 times, show that the robustness of our algorithm is similar to that of FastICA.
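The Welch estimate used for Fig. 3 averages windowed periodograms over overlapping segments. A minimal NumPy sketch of that idea (not the exact settings used in the paper), applied to a noisy 10 Hz tone at the EEG sampling rate of 128 Hz:

```python
import numpy as np

def welch_psd(x, fs, nperseg=256):
    """Minimal Welch estimate: average Hann-windowed periodograms of
    half-overlapping segments (a sketch, not scipy.signal.welch)."""
    step = nperseg // 2
    win = np.hanning(nperseg)
    scale = fs * np.sum(win ** 2)
    segs = [x[i:i + nperseg] * win
            for i in range(0, len(x) - nperseg + 1, step)]
    psd = np.mean([np.abs(np.fft.rfft(s)) ** 2 for s in segs], axis=0) / scale
    freqs = np.fft.rfftfreq(nperseg, 1.0 / fs)
    return freqs, psd

fs = 128
t = np.arange(0, 8, 1.0 / fs)
rng = np.random.default_rng(4)
x = np.cos(2 * np.pi * 10 * t) + 0.3 * rng.normal(size=t.size)

freqs, psd = welch_psd(x, fs)
print(freqs[np.argmax(psd)])   # 10.0: the injected 10 Hz component dominates
```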
5 Conclusions
In this paper, a new skewness-based blind source extraction method is proposed to separate μ rhythms from mixed EEG signals. Due to the fact that the shape of the μ rhythm is arciform, the third-order statistic skewness is introduced into the objective function as a measure of the shape of the signal. Maximizing the objective function, a fixed-point algorithm is derived. The analysis of
A New HOS-Based Blind Source Extraction Method to Extract µ Rhythms
187
local stability of the algorithm proves that interfered by a small perturbation, the proposed fixed-point algorithm will converge to one of the local maxima. Experiments on real-world EEG signals describe that the skewness-based BSE algorithm is better than the kurtosis-based or the negentropy-based BSE methods for the μ rhythm signal extraction. Acknowledgement. This paper is funded by National Basic Research Program of China (973 Program, No. 2010CB731800), National Natural Science Foundation of China (Grant U0635001, U0835003, 60874061, 60974072).
An Adaptive Sampling Target Tracking Method of WMSNs

Shikun Tian, Xinyu Jin*, and Yu Zhang

Department of Information Science & Electronic Engineering, Zhejiang University, China
[email protected]
Abstract. A novel energy-efficient target tracking approach is proposed for wireless multimedia sensor networks: the ARMA and piecewise Cubic Spline interpolation based Adaptive Sampling model (ACSAS). A least-squares-based acoustic-signal energy-ratio localization model is presented. Unequal-interval historical target positions are interpolated by piecewise cubic spline interpolation, and the target position is then forecasted by an ARMA model. The sampling interval is dynamically determined and updated based on the forecasted target location and velocity, and the sensor nodes near the forecasted position are awakened at the next sampling instant. Simulation results verify that, compared with Non-ACSAS, ACSAS greatly reduces the tracking energy consumption of the WMSN thanks to its much lower sampling and computational cost.

Keywords: Wireless multimedia sensor networks, energy efficiency, target tracking, adaptive sampling, ARMA, piecewise cubic spline.
1 Introduction

The availability of low-cost CMOS cameras and microphones has fostered the development of Wireless Multimedia Sensor Networks (WMSNs), which behave as swarm-intelligent systems able to process multimedia data and are applicable to a wide range of applications such as military operations, environmental monitoring, and smart homes [1]. Target localization and tracking with a WMSN using acoustic signals has become a research hotspot. It is a difficult task due to the amount of signal processing involved and the limited resources for sensing, communication, and computation. Since sensor nodes usually work in unsupervised areas where batteries cannot be recharged or replaced, energy efficiency is a crucial issue for prolonging the lifetime of a WMSN.

Many methods have been studied for target localization and tracking with WMSNs. The Energy Ratio method is commonly used for localization [2, 3]. Target tracking with a WMSN needs energy-efficient methods, and some studies have taken the ARMA (autoregressive moving average) model into account [4]. A robust forecasting method combining the ARMA model and RBFNs was proposed in [5]: sensor nodes around the forecasted target position are awakened, and energy is saved by sending nodes to sleep when there is no task. Adaptive sampling can yield a further significant improvement in energy consumption. Based on the tracking accuracy and cost predicted by an Extended Kalman Filter (EKF), an adaptive sensor scheduling strategy
was proposed by jointly selecting the tasking sensor and determining the sampling interval [6]. An adaptive Kalman-filtering-based algorithm was also presented in which the future target location and velocity are predicted to adaptively determine the active tracking region and sampling interval [7].

In this work, a novel adaptive sampling approach, ACSAS, is proposed to achieve better energy efficiency by combining the ARMA model with piecewise cubic spline interpolation. Least squares is employed for target localization. Piecewise cubic spline interpolation is adopted so that the ARMA model can predict from unequal-interval historical location data. The future target location and velocity are predicted to determine the adaptive sampling interval and the sensor nodes that should be awakened.
2 Preliminaries for WMSN Target Tracking

2.1 Target Localization Model

Assume that the acoustic source is an omni-directional point and that the signal propagates in free air. Sensor nodes are placed in a two-dimensional field without obstacles. The target position is $p_s = [x_s, y_s]$. For a sensor node at position $p_d = [x_d, y_d]$, the acoustic signal sensed by the node is

$$u_d = g_d \cdot \frac{S}{\|p_s - p_d\|^{\alpha}} + n_d, \tag{1}$$
where $S$ is the acoustic signal at $p_s$, $g_d$ is the gain of the microphone, $\|\cdot\|$ denotes the vector 2-norm, and $n_d$ is zero-mean additive white Gaussian noise. When the acoustic energy decay exponent is $\alpha = 2.0818$, the mean square error is 0.016 [2]. Let nodes $i$ and $j$ be two of the $N_s$ active sensor nodes. According to the Energy Ratio method, the ratio $k_{ij}$ is obtained as

$$k_{ij} = \frac{g_j (u_i - n_i)}{g_i (u_j - n_j)} = \frac{\|p_s - p_j\|^2}{\|p_s - p_i\|^2}. \tag{2}$$
When $k_{ij} = 1$, Eq. (2) can be transformed into a linear equation:

$$2(x_i - x_j)\,x + 2(y_i - y_j)\,y = x_i^2 + y_i^2 - x_j^2 - y_j^2. \tag{3}$$

When $k_{ij} \neq 1$, it can be transformed into a circle equation:

$$\left(x - \frac{k_{ij} x_i - x_j}{k_{ij} - 1}\right)^{\!2} + \left(y - \frac{k_{ij} y_i - y_j}{k_{ij} - 1}\right)^{\!2} = \frac{k_{ij}\left[(x_i - x_j)^2 + (y_i - y_j)^2\right]}{(k_{ij} - 1)^2}. \tag{4}$$

Due to the noise, the lines and circles may intersect at more than one point. Least squares or maximum likelihood (ML) estimation [2, 3] can be adopted to estimate the target location $[x_s, y_s]$; for the sake of energy consumption, the computationally simple least squares method is adopted here. Equations (3) and (4) can be written in matrix form as $AX = B$, where $X$ is the target position and $A$, $B$ are coefficient matrices. The target position is then obtained by the least squares estimate

$$X = \left[(A^T A)^{-1} A^T B\right]^T. \tag{5}$$
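As an illustration of how Eqs. (2)-(5) turn energy ratios into a least-squares problem, the sketch below linearizes the constraints by treating $x^2 + y^2$ as an auxiliary unknown (a common trick, not spelled out in the paper). Equal microphone gains, $\alpha = 2$, noise-free energies, and the node layout are all assumptions of this sketch:

```python
import numpy as np

def energy_ratio_localize(nodes, u):
    # Least-squares localization from pairwise energy ratios (Eqs. (2)-(5)).
    # Assumptions: equal gains and alpha = 2, so u_i is proportional to 1/d_i^2.
    # Each pair (i, j) gives a linear equation in [x, y, r], with r := x^2 + y^2:
    #   2(k*xi - xj)x + 2(k*yi - yj)y + (1 - k)r = k(xi^2 + yi^2) - (xj^2 + yj^2)
    rows, rhs = [], []
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            k = u[i] / u[j]                     # k_ij = d_j^2 / d_i^2
            xi, yi = nodes[i]
            xj, yj = nodes[j]
            rows.append([2 * (k * xi - xj), 2 * (k * yi - yj), 1.0 - k])
            rhs.append(k * (xi ** 2 + yi ** 2) - (xj ** 2 + yj ** 2))
    sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return sol[:2]                              # estimated [x_s, y_s]
```

With exact (noise-free) energies the overdetermined system is consistent and the least-squares solution recovers the true position.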
2.2 Target Forecasting by an ARMA Model

With the current and historical target locations available at the active sensor nodes, the location at the next sensing period can be forecasted. Compared with the EKF, the unscented Kalman filter (UKF), and the unscented particle filter (UPF), the ARMA model is adopted here for its good performance and lightweight computational cost [5, 8]. The AR process of order $p$ is

$$y_k = \varphi_1 y_{k-1} + \varphi_2 y_{k-2} + \cdots + \varphi_p y_{k-p} + \varepsilon_k, \tag{6}$$

i.e., the current value $y_k$ is expressed linearly in terms of the previous series $\{y_{k-1}, y_{k-2}, \ldots, y_{k-p}\}$ and a random noise $\varepsilon_k$; $\{\varphi_i \mid i = 1, 2, \ldots, p\}$ are the AR coefficients. The MA process of order $q$ is

$$y_k = \varepsilon_k - \theta_1 \varepsilon_{k-1} - \theta_2 \varepsilon_{k-2} - \cdots - \theta_q \varepsilon_{k-q}, \tag{7}$$

where $\{\varepsilon_k, \varepsilon_{k-1}, \ldots, \varepsilon_{k-q}\}$ is a white noise series constructed from the prediction errors and $\{\theta_i \mid i = 1, 2, \ldots, q\}$ are the MA coefficients. Introducing the backshift operator $B$, with $y_{k-i} = B^i y_k$ and $\varepsilon_{k-i} = B^i \varepsilon_k$, define

$$\Phi(B) = 1 - \varphi_1 B - \varphi_2 B^2 - \cdots - \varphi_p B^p, \tag{8}$$

$$\Theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q, \tag{9}$$

so that the ARMA(p, q) model can be written as

$$\Phi(B)\, y_k = \Theta(B)\, \varepsilon_k. \tag{10}$$

The order of the ARMA model can be determined by autocorrelation function (ACF) and partial autocorrelation function (PACF) analysis. The time series of the target's movement can be modeled by an AR(p) process, with the order set to p = 4 [5]. Least squares estimation is adopted to determine the coefficients of the AR(p) model [9].

2.3 Energy Model
Energy consumption of an active sensor node consists of transmission energy, receiving energy, and CPU computation energy [10]:

$$E_{Tx}(k, d) = E_{elec} \cdot k + \varepsilon_{amp} \cdot k \cdot d^2, \tag{11}$$

$$E_{Rx}(k) = E_{elec} \cdot k, \tag{12}$$

$$E_{comp}(V_{dd}, f) = N_{cyc} C V_{dd}^2 + V_{dd} \left(I_0\, e^{V_{dd} / (n V_T)}\right) (N_{cyc} / f). \tag{13}$$

Here $E_{elec}$ and $\varepsilon_{amp}$ are hardware-related parameters, $d$ (in meters) is the distance between two sensor nodes, and $k$ (in bits) is the data size. $V_{dd}$, $f$, and $N_{cyc}$ are the CPU supply voltage, frequency, and number of cycles run; $C$, $I_0$, and $n$ are processor-related parameters. Ignoring the sleeping energy, the total energy consumption is

$$E_{total} = E_{Tx}(k, d) + E_{Rx}(k) + E_{comp}(V_{dd}, f). \tag{14}$$
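Eqs. (11)-(14) translate directly into code. In the sketch below, $E_{elec}$, $\varepsilon_{amp}$, $C$, $I_0$, and $n$ take the values used in the Sect. 4 simulation, while the thermal voltage $V_T \approx 26$ mV is an assumed typical value not stated in the paper:

```python
import math

# Radio and CPU energy model of Eqs. (11)-(14)
E_ELEC = 50e-9            # J/bit
EPS_AMP = 100e-12         # J/(bit*m^2)

def e_tx(k_bits, d_m):
    # Eq. (11): energy to transmit k bits over distance d
    return E_ELEC * k_bits + EPS_AMP * k_bits * d_m ** 2

def e_rx(k_bits):
    # Eq. (12): energy to receive k bits
    return E_ELEC * k_bits

def e_comp(n_cyc, vdd, f, C=0.67e-9, i0=1.196e-3, n=21.26, vt=0.026):
    # Eq. (13): switching energy plus leakage over the active period
    return n_cyc * C * vdd ** 2 + vdd * (i0 * math.exp(vdd / (n * vt))) * (n_cyc / f)

def e_total(k_bits, d_m, n_cyc, vdd, f):
    # Eq. (14): total consumption, sleeping energy ignored
    return e_tx(k_bits, d_m) + e_rx(k_bits) + e_comp(n_cyc, vdd, f)
```

For the paper's 480-bit packets, for instance, the radio terms alone give `e_tx(480, 100)` plus `e_rx(480)` per hop.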
3 WMSN Adaptive Sampling Tracking Method: ACSAS

During tracking, a fixed sampling interval yields an uneven density of location points. If the points are too dense, the large amount of sampling, computation, and communication increases the energy consumption; if they are too sparse, the tracking accuracy is reduced and the target may even be lost. With ACSAS, the WMSN can dynamically change the sampling interval so that the location points become uniform.

3.1 Interpolation Method for ARMA with Missing Data
The time series used by an ARMA model must have a fixed interval, but adaptive sampling makes the interval variable. To forecast from a variable-interval time series with ARMA, either a missing-data ARMA model or fixed-interval interpolation can be adopted. Missing-data ARMA models need special and complex parameter estimation methods such as Kalman filtering and ML [11, 12], which are unsuitable for a WMSN requiring high energy efficiency. Fixed-interval interpolation avoids Kalman filtering and ML. Linear and low-order polynomial interpolation are easy to compute but cannot guarantee accuracy; high-order polynomial interpolation is computationally expensive and can even give rise to Runge's phenomenon. Piecewise low-order polynomial interpolation is stable and easy to compute, but it is hard to guarantee the smoothness of the whole curve. Cubic spline interpolation is stable, easy to compute, and also improves the smoothness of the curve [13-15]. Since a target trajectory is usually smooth, cubic spline interpolation is preferable. Piecewise cubic spline interpolation is therefore adopted in ACSAS, so the interpolation accuracy is guaranteed while the computation and energy consumption are both reduced.

3.2 Process of ACSAS
By combining the ARMA model and piecewise cubic spline interpolation, ACSAS predicts the future target location and velocity, updates the interval for the next sampling, and wakes up the sensor nodes near the future location to sample the acoustic signals. The process of ACSAS is as follows.

Step 1: Initialization. Sensor nodes sample the acoustic signals with the fixed time interval Tmin (the minimum interval) and locate the target with the model of Sect. 2.1. ACSAS is enabled once N sets of target location data are ready.

Step 2: Select the N latest sets of historical location data as a time series

$$D = \begin{bmatrix} t_1 & t_2 & \cdots & t_N \\ x_1 & x_2 & \cdots & x_N \\ y_1 & y_2 & \cdots & y_N \end{bmatrix}.$$

From the first row of $D$, form the series $T_D = [T_{12}\;\; T_{23}\;\; \cdots\;\; T_{(N-1)N}]$, where $T_{12} = t_2 - t_1$, $T_{23} = t_3 - t_2$, ..., $T_{(N-1)N} = t_N - t_{N-1}$. Check whether any element of $T_D$ is greater than 1 (time being measured in units of Tmin). If none is, interpolation is unnecessary. Otherwise, going from $T_{(N-1)N}$ back to $T_{12}$, whenever $T_{(M-1)M}$ is greater than 1 ($M \in [2, N]$), set $T_{(M-1)M} - 1$ interpolation points in series $D$ with interval Tmin between $t_{M-1}$ and $t_M$, until the total number of original and interpolated points is N. Applying piecewise cubic spline interpolation at the interpolation points gives the series

$$D_P = \begin{bmatrix} tp_1 & tp_2 & \cdots & tp_N \\ xp_1 & xp_2 & \cdots & xp_N \\ yp_1 & yp_2 & \cdots & yp_N \end{bmatrix}.$$

Step 3: The time series $D_P$ is now an equal-interval sequence. Applying the ARMA model to the abscissa sequence $[xp_1\;\; xp_2\;\; \cdots\;\; xp_N]$ and the ordinate sequence $[yp_1\;\; yp_2\;\; \cdots\;\; yp_N]$ gives the target location after Tmin: $p_p = [x_p, y_p]$.
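Steps 2 and 3 (uniform resampling of the track followed by per-coordinate AR forecasting) can be sketched as follows. This is a simplified whole-track version with assumed function names, using SciPy's cubic spline and the least-squares AR(4) fit described in Sect. 2.2 rather than the paper's exact piecewise windowing:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def resample_track(t, xy, t_min=1.0):
    # Step 2: interpolate unevenly sampled positions onto a uniform T_min grid
    t = np.asarray(t, dtype=float)
    grid = np.arange(t[0], t[-1] + 0.5 * t_min, t_min)
    cols = [CubicSpline(t, xy[:, k])(grid) for k in range(2)]
    return grid, np.column_stack(cols)

def ar_forecast(y, p=4):
    # Step 3: least-squares AR(p) fit (Eq. (6)) and one-step-ahead forecast.
    # Row k of X holds [y[k-1], ..., y[k-p]]; the regression target is y[k].
    y = np.asarray(y, dtype=float)
    X = np.column_stack([y[p - 1 - i:len(y) - 1 - i] for i in range(p)])
    phi, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return float(phi @ y[::-1][:p])

def predict_next(t, xy, t_min=1.0):
    # Forecast the target position one T_min ahead: p_p = [x_p, y_p]
    _, uniform = resample_track(t, xy, t_min)
    return np.array([ar_forecast(uniform[:, 0]), ar_forecast(uniform[:, 1])])
```

On a noiseless track the spline resampling is exact and the AR fit reproduces the underlying recurrence, so the forecast lands on the true next position.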
Step 4: The future target velocity after Tmin is obtained from $p_p$:

$$V_p = \frac{\sqrt{(x_p - x_N)^2 + (y_p - y_N)^2}}{T_{min}}. \tag{15}$$

The time interval $T_s$ for the next sampling is then

$$T_s = \left\lceil k_{ep} \times \frac{V_{max}}{V_p} \right\rceil \times T_{min}, \tag{16}$$

where $V_{max}$ is the assumed maximum velocity of the target. The minimum sampling interval $T_{min}$ is determined by the sampling, computing, and communication capability of the sensor nodes. To balance energy consumption against tracking accuracy, $k_{ep} \in [0, 1]$ is used to adjust the size of $T_s$ dynamically: a greater $k_{ep}$ gives lower energy consumption, while a smaller $k_{ep}$ gives better accuracy.

Step 5: The target location P after the next interval $T_s$ is estimated from $T_s$ and $V_p$. Select the $N_s$ sensor nodes nearest to P as a cluster; the $u_d$ values in Eq. (1) of these $N_s$ nodes are larger than those of the other nodes. Select the node with the largest $u_d$ as the cluster head, which performs localization and forecasting, calculates $T_s$, and wakes up the sensor nodes for the next localization task. After $T_s$, the selected $N_s$ sensor nodes wake from sleep to sample the acoustic signals, and the localization result is saved as historical data. Repeat Steps 2 to 5 to track the target continuously until the tracking task is finished.
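The interval update of Step 4 (Eqs. (15)-(16)) reduces to a few lines. The zero-velocity guard below is an addition of this sketch, not part of the paper; Tmin, Vmax, and kep defaults follow the Sect. 4 simulation:

```python
import math

def next_interval(p_pred, p_last, t_min=1.0, v_max=40.0, kep=0.5):
    # Eq. (15): predicted speed over the next T_min
    v_p = math.hypot(p_pred[0] - p_last[0], p_pred[1] - p_last[1]) / t_min
    v_p = max(v_p, 1e-9)              # guard for a (near-)stationary target
    # Eq. (16): ceil(kep * Vmax / Vp) multiples of T_min
    return math.ceil(kep * v_max / v_p) * t_min
```

A slow target (small Vp) thus gets a long interval, and a target moving at Vmax is sampled every Tmin.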
4 Experimental Results

The simulation was performed in MATLAB. A sensing field of 600×600 m² was covered by 1000 WMSN nodes with random distribution. The minimum sampling period for target localization is Tmin = 1 s. The acoustic signal decay exponent is α = 2, and the white Gaussian noise level is 20 dB. Experimental comparison showed that localization with 6 cooperating nodes gives the best accuracy for the model of Sect. 2.1, so Ns = 6. In the ARMA model, the orders are set to p = 4, q = 0. In the energy model, the data packet size k is 480 bits, and the hardware-related parameters are Eelec = 50 nJ/bit, εamp = 100 pJ/(bit·m²), C = 0.67 nF, I0 = 1.196 mA, n = 21.26. The assumed maximum target velocity is Vmax = 40 m/s. Three target motion patterns were simulated: (A) uniform linear movement at 6.4 m/s, (B) variable-speed linear movement at 2-12 m/s, and (C) variable-speed curved movement at 6.8-40 m/s.
4.1 An Overview of ACSAS Tracking Result
Take motion pattern C as an example and apply ACSAS and Non-ACSAS to the target tracking. With Non-ACSAS, the sampling points were dense when the target moved slowly and sparse when it moved fast; when they were dense, more sensor nodes were awakened to work. With ACSAS, the sampling points were uniform and the number of awakened sensor nodes was reduced.

[Figure: the 600×600 m sensing field showing sleeping nodes, awakened nodes, the target trajectory, and the sampling points of Non-ACSAS vs. ACSAS.]

Fig. 1. Contrast of ACSAS and Non-ACSAS
4.2 Contrast of Energy Consumption and Accuracy
The Root Mean Square Error (RMSE) is adopted as the measure of target tracking accuracy [16]. For each of the motion patterns A, B, and C, simulations were performed with kep set to 0.2, 0.5, and 0.8. For each case, 100 contrast runs were made between ACSAS and Non-ACSAS, and the average energy consumption and RMSE were recorded in Table 1.

Table 1. Contrast of average energy consumption and average RMSE
Energy consumption was reduced when ACSAS was applied, and more energy was saved with greater kep: up to about 85% of the energy could be saved using ACSAS. Fig. 2 shows the contrast of energy consumption.
Fig. 2. Contrast of energy consumption
The accuracy characteristics of ACSAS were studied by analyzing the distribution of the RMSE data. For motion pattern C with kep = 0.8, the distribution of 100 groups of RMSE data is shown in Fig. 3. The ACSAS average RMSE was 0.685 m with variance 0.0673 m²; the Non-ACSAS average RMSE was 0.484 m with variance 0.0103 m². Although both the average and the variance of the ACSAS RMSE are greater than those of Non-ACSAS, ACSAS achieves good energy efficiency with an acceptable accuracy.

[Figure: RMSE (m) of 100 simulation runs for Non-ACSAS and ACSAS, together with their averages.]

Fig. 3. Contrast of tracking RMSE
5 Conclusions

Energy efficiency is significant for WMSNs. The main contribution of this paper is an energy-efficient adaptive sampling target tracking model, ACSAS, which greatly reduces the amount of sensor node sampling, data communication, and processing while keeping an acceptable accuracy; up to about 85% of the energy is saved. Compared with computation-expensive methods based on the EKF or ML, ACSAS has lower computational complexity and is more suitable for WMSNs. To further improve accuracy and energy efficiency and to keep the tracking system robust, much work remains: optimal sensor nodes should be selected as a cluster, the Doppler effect needs to be compensated, and the forecasting model should be optimized to improve prediction accuracy and tracking stability.

Acknowledgments. This work was funded by the Science and Technology Program of Zhejiang Province in P.R. China (Grant No. 2005CS31001).
References

1. Akyildiz, I.F., Melodia, T., Chowdury, K.R.: A survey on wireless multimedia sensor networks. Computer Networks 51(4), 921–960 (2007)
2. Li, D., Hu, Y.H.: Energy based collaborative source localization using acoustic microsensor array. In: IEEE Workshop on Multimedia Signal Processing, December 9–11, pp. 371–375 (2002)
3. Li, D., Hu, Y.H.: Least square solutions of energy based acoustic source localization problems. In: Proceedings of the 2004 International Conference on Parallel Processing Workshops, ICPPW 2004, pp. 443–446 (2004)
4. Liu, W., Farooq, M.: An ARMA model based scheme for maneuvering target tracking. In: Proceedings of the Midwest Symposium on Circuits and Systems 1994, Hiroshima, Japan, July 25–28, pp. 1408–1411 (1994)
5. Wang, X., Ma, J.J., Ding, L., Bi, D.W.: Robust forecasting for energy efficiency of wireless multimedia sensor networks. Sensors 7, 2779–2807 (2007)
6. Xiao, W.D., Wu, J.K., Xie, L.H.: Adaptive sensor scheduling for target tracking in wireless sensor network. In: Proc. SPIE, vol. 5910, pp. 59100B-1–59100B-9 (2005)
7. Yick, J., Mukherjee, B., Ghosal, D.: Analysis of a prediction-based mobility adaptive tracking algorithm. In: 2nd International Conference on Broadband Networks, BROADNETS 2005, pp. 809–816 (2005)
8. Broersen, P.M.T.: Automatic identification of time series models from long autoregressive models. IEEE Transactions on Instrumentation and Measurement 54, 1862–1868 (2005)
9. Biscainho, L.W.P.: AR model estimation from quantized signals. IEEE Signal Processing Letters 11, 183–185 (2004)
10. Wang, A., Chandrakasan, A.: Energy-efficient DSPs for wireless sensor networks. IEEE Signal Processing Magazine, 68–78 (July 2002)
11. Jones, R.H.: Maximum likelihood fitting of ARMA models to time series with missing observations. Technometrics 22(3), 389–395 (1980)
12. Broersen, P., Waele, S., Bos, R.: Estimation of autoregressive spectra with randomly missing data. In: Proc. 20th IEEE Instrument. Measure. Technol. Conf., Vail, CO, vol. 2, pp. 1154–1159 (2003)
13. Zerfran, M., Kumar, V.: Interpolation schemes for rigid body motions. Computer-Aided Design 30(3), 179–189 (1998)
14. Zhang, T., Yan, J.B.: Numerical Analysis. Metallurgical Industry Press, Beijing (2001)
15. Xiao, X.N., Zhao, L.J., Dang, L.L.: Modern Method of Numerical Calculation. Peking University Press, Beijing (2003)
16. Mazomenos, E., Reeve, J., White, N.: An accurate range-only tracking system using wireless sensor networks. In: Proceedings of the Eurosensors XXIII Conference, Lausanne, Switzerland, pp. 1199–1202 (2009)
Asymptotic Equivalent Analysis for LTI Overlapping Large-Scale Systems and Their Subsystems

Qian Wang and Xuebo Chen

School of Electronics and Information Engineering, Liaoning University of Science and Technology, Anshan 114051, China
[email protected], [email protected]
Abstract. Based on the matrix exponential function and matrix stability, a criterion of asymptotic equivalence is proposed in this paper. The criterion applies to linear time-invariant (LTI) overlapping large-scale systems and their pair-wise subsystems obtained by decomposition under the inclusion principle. The study of asymptotic equivalence facilitates stability analysis and, furthermore, provides a rationale for the asymptotic equivalent analysis of other large-scale systems and their isolated subsystems. An example is given to illustrate the feasibility and validity of the method.

Keywords: LTI overlapping large-scale systems, pair-wise subsystems, asymptotic equivalence, matrix exponential function.
1 Introduction

The general methodology for studying the stability of large-scale systems consists of two steps. First, the large-scale system is decomposed into isolated subsystems and interconnection terms. Second, using an appropriate Lyapunov function or the matrix exponential function together with certain algebraic relations among the interconnections of the lower-order subsystems, the stability of the large-scale system is established. In this paper the large-scale systems are decomposed by the inclusion principle. However, the stability of a large-scale system and that of its isolated subsystems are not equivalent in general. This paper studies the asymptotic equivalence of LTI overlapping large-scale systems and their pair-wise subsystems based on theories such as the matrix exponential function and matrix stability [1-4].
The compensation matrix is $M_A = [M_{11}, \ldots, M_{1N}, \ldots, M_{N(N-1)}, M_{NN}] \in R^{n(n-1) \times n(n-1)}$, where

$$M_{11} = \begin{bmatrix} \frac{n-2}{n-1} a_{11} & \frac{-1}{n-1} a_{11} \\[1mm] \frac{-1}{n-1} a_{11} & \frac{n-2}{n-1} a_{11} \end{bmatrix}, \quad M_{12} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \;\ldots,\; M_{NN} = \begin{bmatrix} \frac{n-2}{n-1} a_{nn} & \frac{-1}{n-1} a_{nn} \\[1mm] \frac{-1}{n-1} a_{nn} & \frac{n-2}{n-1} a_{nn} \end{bmatrix}.$$

When the matrix $A$ of the expansion space satisfies the condition $\tilde{A} V = VA$, then, according to $\tilde{A} = VAU + M_A$, $\tilde{A}$ can be written in the form

$$\tilde{A} = \mathrm{block\_diag}\!\left(\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \ldots, \begin{bmatrix} a_{11} & a_{1n} \\ a_{n1} & a_{nn} \end{bmatrix}, \begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix}, \ldots, \begin{bmatrix} a_{22} & a_{2n} \\ a_{n2} & a_{nn} \end{bmatrix}, \ldots, \begin{bmatrix} a_{(n-1)(n-1)} & a_{(n-1)n} \\ a_{n(n-1)} & a_{nn} \end{bmatrix}\right) + B(B_{ij}),$$

where

$$B_{11} = \begin{bmatrix} 0 & a_{13} \\ 0 & a_{23} \end{bmatrix}, \quad B_{12} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \;\ldots,\; B_{N(N-1)} = \begin{bmatrix} a_{(n-1)(n-2)} & 0 \\ a_{n(n-2)} & 0 \end{bmatrix}, \quad B_{NN} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$
Therefore, we can rewrite the system S of formula (1) as

$$\frac{dy}{dt} = \mathrm{diag}(A_{11}, \ldots, A_{NN})\, y + B(B_{ij})\, y, \tag{2}$$

with its pair-wise subsystems

$$\frac{dx}{dt} = \mathrm{diag}(A_{11}, \ldots, A_{NN})\, x, \tag{3}$$

where

$$A_{11} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad A_{22} = \begin{bmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{bmatrix}, \;\ldots,\; A_{NN} = \begin{bmatrix} a_{(n-1)(n-1)} & a_{(n-1)n} \\ a_{n(n-1)} & a_{nn} \end{bmatrix};$$
$x$ and $y$ are $n(n-1)$-dimensional column vectors: $y = \mathrm{col}(y_1, y_2, \ldots, y_1, y_n, y_2, y_3, \ldots, y_{n-2}, y_n, y_{n-1}, y_n)$ and $x = \mathrm{col}(x_1, x_2, \ldots, x_1, x_n, x_2, x_3, \ldots, x_{n-2}, x_n, x_{n-1}, x_n)$. Let $y = \mathrm{col}(Y_1, \ldots, Y_N)$ and $x = \mathrm{col}(X_1, \ldots, X_N)$, where $Y_1 = \mathrm{col}(y_1, y_2)$, $Y_2 = \mathrm{col}(y_1, y_3)$, ..., $Y_N = \mathrm{col}(y_{n-1}, y_n)$; $X_1 = \mathrm{col}(x_1, x_2)$, $X_2 = \mathrm{col}(x_1, x_3)$, ..., $X_N = \mathrm{col}(x_{n-1}, x_n)$. Here $N = \frac{n(n-1)}{2}$ (unless otherwise stated, $N$ denotes $\frac{n(n-1)}{2}$ in what follows), and $i, j = 1, 2, \ldots, N$. Each $A_{ii}$ is a $2 \times 2$ constant matrix and each $B_{ij}$ is a $2 \times 2$ constant matrix, $i, j = 1, 2, \ldots, N$; $B(B_{ij})$ is an $n(n-1) \times n(n-1)$ constant matrix describing the interconnections among the $N$ pair-wise subsystems.

Definition 1. Let $x(t, t_0, x_0)$ and $y(t, t_0, y_0)$ be the solutions of formula (3) and formula (2), respectively. If there is a homeomorphism $x(t, t_0, x_0) \to y(t, t_0, y_0)$ and $\lim_{t \to \infty} \|x(t, t_0, x_0) - y(t, t_0, y_0)\| = 0$, then formula (2) and formula (3) are said to be asymptotically equivalent.
3 Main Results

Theorem 1. Suppose the following conditions are satisfied:

(I) $C(t, t_0) = \mathrm{diag}(C_{11}(t, t_0), \ldots, C_{NN}(t, t_0))$ is the matrix exponential function of formula (3), with $\|C_{ii}(t, t_0)\| \le M_i e^{-\alpha_i (t - t_0)}$ for $i = 1, 2, \ldots, N-1$ and $\|C_{NN}(t, t_0)\| \le M_N$, where $M_i$, $M_N$ and $\alpha_i$ are positive constants; moreover, $\|B_{ij}\| \le L_{ij}$ for $i, j = 1, 2, \ldots, N-1$, where each $L_{ij}$ is a positive constant;

(II) the $B_{ij}$ satisfy $\int_{t_0}^{\infty} \|B_{Nj}\|\, dt < \infty$ and $\int_{t_0}^{\infty} \|B_{iN}\|\, dt < \infty$, $i, j = 1, 2, \ldots, N$;

(III) the matrix $G := -\mathrm{diag}(\alpha_1, \ldots, \alpha_{N-1}) + \mathrm{diag}(M_1, \ldots, M_{N-1})\,(L_{ij}) \in R^{(N-1) \times (N-1)}$ is stable.

Then formula (2) and formula (3) are asymptotically equivalent.

Proof. (a) We first prove that the solution of formula (2) is bounded. Let $Y_i(t, t_0, y_0) = Y_i(t)$ and $Y_i(t_0) = Y_{i0}$; then the solution of formula (2) is

$$Y_i(t) = C_{ii}(t, t_0) Y_{i0} + \int_{t_0}^{t} C_{ii}(t, \tau) \sum_{j=1}^{N} B_{ij} Y_j(\tau)\, d\tau, \quad i = 1, 2, \ldots, N. \tag{4}$$

Therefore we get
$$\|Y_i(t)\| \le M_i \|Y_{i0}\|\, e^{-\alpha_i (t-t_0)} + M_i \sum_{j=1}^{N-1} \int_{t_0}^{t} e^{-\alpha_i (t-\tau)} L_{ij} \|Y_j(\tau)\|\, d\tau + M_i \int_{t_0}^{t} e^{-\alpha_i (t-\tau)} \|B_{iN}\| \|Y_N(\tau)\|\, d\tau, \tag{5}$$

where $i = 1, 2, \ldots, N-1$, and $\|Y_N(t)\| \le M_N \|Y_{N0}\| + M_N \sum_{j=1}^{N} \int_{t_0}^{t} \|B_{Nj}\| \|Y_j(\tau)\|\, d\tau$. Let

$$\xi_i(t) = M_i \sum_{j=1}^{N-1} \int_{t_0}^{t} e^{-\alpha_i (t-\tau)} L_{ij} \|Y_j(\tau)\|\, d\tau + M_i \int_{t_0}^{t} e^{-\alpha_i (t-\tau)} \|B_{iN}\| \|Y_N(\tau)\|\, d\tau, \tag{6}$$

where $i = 1, 2, \ldots, N-1$, and

$$\xi_N(t) = M_N \sum_{j=1}^{N} \int_{t_0}^{t} \|B_{Nj}\| \|Y_j(\tau)\|\, d\tau; \tag{7}$$

then $\|Y_i(t)\| \le M_i \|Y_{i0}\| e^{-\alpha_i (t-t_0)} + \xi_i(t)$ for $i = 1, 2, \ldots, N-1$, and $\|Y_N(t)\| \le M_N \|Y_{N0}\| + \xi_N(t)$. Therefore

$$\begin{cases} \dfrac{d\xi_i(t)}{dt} \le -\alpha_i \xi_i(t) + M_i \sum_{j=1}^{N-1} L_{ij}\, \xi_j(t) + M_i \|B_{iN}\|\, \xi_N(t) + f_i(t), \\[2mm] \dfrac{d\xi_N(t)}{dt} \le M_N \sum_{j=1}^{N} \|B_{Nj}\|\, \xi_j(t) + f_N(t). \end{cases} \tag{8}$$

Here

$$\begin{cases} f_i(t) = M_i \sum_{j=1}^{N-1} M_j L_{ij} \|Y_{j0}\|\, e^{-\alpha_j (t-t_0)} + M_i M_N \|B_{iN}\| \|Y_{N0}\|, \\[2mm] f_N(t) = M_N \sum_{j=1}^{N-1} M_j \|B_{Nj}\| \|Y_{j0}\|\, e^{-\alpha_j (t-t_0)} + M_N^2 \|B_{NN}\| \|Y_{N0}\|. \end{cases} \tag{9}$$

Now consider the comparison equations of formula (8):

$$\begin{cases} \dfrac{d\eta_i}{dt} = -\alpha_i \eta_i + M_i \sum_{j=1}^{N-1} L_{ij}\, \eta_j + M_i \|B_{iN}\|\, \eta_N + f_i(t), \\[2mm] \dfrac{d\eta_N}{dt} = M_N \sum_{j=1}^{N} \|B_{Nj}\|\, \eta_j + f_N(t). \end{cases} \tag{10}$$

Let $W = \begin{bmatrix} G & 0 \\ 0 & 0 \end{bmatrix} \in R^{N \times N}$. From condition (III) we know that $W$ has only a single zero eigenvalue, while all of its other eigenvalues have negative real parts. Thus, for

$$\frac{d\eta}{dt} = W \eta, \tag{11}$$
the matrix exponential function solution of formula (11) is bounded for each $t \ge t_0$, where $\eta = \mathrm{col}(\eta_1, \ldots, \eta_N)$. Let $K(t, t_0)$ be the fundamental-solution matrix of the homogeneous part of formula (10). Since $\|B_{Nj}\| \in (0, +\infty)$ and $\|B_{iN}\| \in (0, +\infty)$, $i, j = 1, 2, \ldots, N$, condition (II) implies that $K(t, t_0)$ is also bounded. The solution of formula (10) is

$$\eta(t) = K(t, t_0)\, \eta(t_0) + \int_{t_0}^{t} K(t, \tau) f(\tau)\, d\tau, \tag{12}$$

where $f(t) = \mathrm{col}(f_1(t), \ldots, f_N(t))$ is defined in formula (9). Since $f(t) \in L^1(t_0, \infty)$ and $\eta(t)$ is bounded, the comparison principle [5], together with formulas (4), (6), and (7), shows that $y(t)$ is bounded.

(b) We now prove that $\lim_{t \to +\infty} y_i(t, t_0, y_0) = 0$, $i = 1, 2, \ldots, N-1$. Let $\tilde{\eta} = (\eta_1, \ldots, \eta_{N-1})^T$
and $\tilde{f}(t) = (M_1 \|B_{1N}\|\, \eta_N(t), \ldots, M_{N-1} \|B_{(N-1)N}\|\, \eta_N(t))^T + (f_1(t), \ldots, f_{N-1}(t))^T$. Because $\eta_N(t)$ is bounded and $\|B_{iN}\| \in (0, +\infty)$, $i = 1, 2, \ldots, N$, there is a constant $K > 0$ such that $\int_{0}^{\infty} \|\tilde{f}(t)\|\, dt \le K$. From formula (10) we can infer that $\tilde{\eta}$ satisfies the non-homogeneous equation $\frac{d\tilde{\eta}}{dt} = G \tilde{\eta} + \tilde{f}(t)$, whose solution is

$$\tilde{\eta}(t, t_0, \tilde{\eta}_0) = e^{G(t - t_0)}\, \tilde{\eta}_0 + \int_{t_0}^{t} e^{G(t - \tau)} \tilde{f}(\tau)\, d\tau.$$

Because the matrix $G$ is stable, by condition (III) there are constants $c > 0$ and $\beta > 0$ such that $\|e^{G(t-\tau)}\| \le c\, e^{-\beta (t-\tau)}$. We can then infer that $\xi_i(t, t_0, \xi_0) \to 0$ as $t \to \infty$, and thus $y_i(t, t_0, y_0) \to 0$, $i = 1, 2, \ldots, N-1$.

(c) We now construct the homeomorphism between the solutions of formula (2) and formula (3). Let

$$I_1 = \mathrm{diag}(\overbrace{1, \ldots, 1}^{n_1}, 0, \ldots, 0) \in R^{n(n-1) \times n(n-1)}, \;\ldots,\; I_N = \mathrm{diag}(0, \ldots, 0, \overbrace{1, \ldots, 1}^{n_N}) \in R^{n(n-1) \times n(n-1)}; \tag{13}$$

then the matrix exponential function of the equation set (3) can be expressed as

$$C(t, t_0) = \sum_{i=1}^{N} C(t, t_0)\, I_i. \tag{14}$$
The solution $y(t)$ can thus be expressed as

$$y(t) = C(t, t_0) y_0 + \sum_{i=1}^{N} \int_{t_0}^{t} C(t, \tau) I_i B(B_{ij}) y(\tau)\, d\tau = C(t, t_0)\!\left(y_0 + \int_{t_0}^{\infty} C(t_0, \tau) I_N B(B_{ij}) y(\tau)\, d\tau\right) + \sum_{i=1}^{N-1} \int_{t_0}^{t} C(t, \tau) I_i B(B_{ij}) y(\tau)\, d\tau - \int_{t}^{\infty} C(t, \tau) I_N B(B_{ij}) y(\tau)\, d\tau. \tag{15}$$

Suppose $y(\tau, t_0)$ is the transition matrix of formula (2), and let

$$x_0 = y_0 + \int_{t_0}^{\infty} C(t_0, \tau) I_N B(B_{ij}) y(\tau)\, d\tau = \left(E + \int_{t_0}^{\infty} C(t_0, \tau) I_N B(B_{ij})\, y(\tau, t_0)\, d\tau\right) y_0. \tag{16}$$

From formula (16), $x_0$ is a single-valued function of $y_0$. Write $z_0 = \int_{t_0}^{\infty} C(t_0, \tau) I_N B(B_{ij})\, y(\tau, t_0)\, d\tau$. From the integrability of $y(t)$ and the absolute integrability of $\|B_{Nj}\|$, we can infer that $z_0 \to 0$ as $t_0 \to \infty$. Therefore $t_0$ can be chosen sufficiently large that $E + z_0$ is nonsingular, and then $y_0 = (E + z_0)^{-1} x_0$, so $y_0$ is a single-valued continuous function of $x_0$. Hence formula (16) gives a homeomorphism between the initial values in $R^n_{y_0}$ and $R^n_{x_0}$, which is also a homeomorphism between the solutions of formula (2) and formula (3).

(d) Finally, we prove that $\lim_{t \to \infty} \|x(t, t_0, x_0) - y(t, t_0, y_0)\| := \lim_{t \to \infty} \|x(t) - y(t)\| = 0$, where $x_0 = (E + z_0) y_0$.
From formulas (15) and (16) we get

$$\|y(t) - x(t)\| \le \sum_{i=1}^{N-1} M_i \int_{t_0}^{t} e^{-\alpha_i (t-\tau)} \sum_{j=1}^{N-1} \|B_{ij}\| \|Y_j(\tau)\|\, d\tau + \sum_{i=1}^{N-1} M_i \int_{t_0}^{t} e^{-\alpha_i (t-\tau)} \|B_{iN}\| \|Y_N(\tau)\|\, d\tau + \sum_{j=1}^{N} \int_{t_0}^{\infty} \|C_{NN}(t, \tau)\| \|B_{Nj}\| \|Y_j(\tau)\|\, d\tau := J_1 + J_2 + J_3.$$

By direct calculation we obtain

$$\lim_{t \to \infty} J_1 = \lim_{t \to \infty} \sum_{i=1}^{N-1} \frac{1}{\alpha_i} M_i \sum_{j=1}^{N-1} \|B_{ij}\| \|Y_j(t)\| = 0,$$

$$\lim_{t \to \infty} J_2 = \sum_{i=1}^{N-1} M_i \lim_{t \to \infty} \int_{t_0}^{t/2} e^{-\alpha_i (t-\tau)} \|B_{iN}\| \|Y_N(\tau)\|\, d\tau + \sum_{i=1}^{N-1} M_i \lim_{t \to \infty} \int_{t/2}^{t} e^{-\alpha_i (t-\tau)} \|B_{iN}\| \|Y_N(\tau)\|\, d\tau \le \sum_{i=1}^{N-1} \left( \lim_{t \to \infty} c\, e^{-\alpha_i t/2} + \lim_{t \to \infty} c_1 \int_{t/2}^{t} \|B_{iN}\|\, d\tau \right) = 0,$$

where $c_1$ is a sufficiently large number, and

$$\lim_{t \to \infty} J_3 = \sum_{j=1}^{N} \lim_{t \to \infty} \int_{t_0}^{\infty} \|C_{NN}(t, \tau)\| \|B_{Nj}\| \|Y_j(\tau)\|\, d\tau = 0.$$

This completes the proof.
4 Example

To illustrate the feasibility and validity of the proposed method, we take an n = 3 overlapping large-scale system decomposed into N = 3 pair-wise subsystems by the inclusion principle. Consider the system S:

$$\dot{x} = \begin{bmatrix} -4 & 0 & 0 \\ 0 & -5 & -1 \\ -1 & -1 & -2 \end{bmatrix} x.$$

Matrix A is decomposed by the inclusion principle with the expansion matrices

$$V = \mathrm{block\_diag}\left[(1,1)^T, (1,1)^T, (1,1)^T\right], \quad U = \frac{1}{2}\, \mathrm{block\_diag}\left[(1,1), (1,1), (1,1)\right],$$

and the compensation matrix

$$M_A = \left[\begin{bmatrix} -2 & 2 \\ 2 & -2 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \ldots, \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix}\right].$$

According to $\tilde{A} = VAU + M_A$, and after $\tilde{A}$ is processed by a permutation transformation,

$$\tilde{A} = \mathrm{block\_diag}\left[\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \begin{bmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{bmatrix}, \begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix}\right] + B(B_{ij}).$$

We rewrite the system S in the form of formula (2):

$$\dot{y} = \mathrm{diag}(A_{11}, A_{22}, A_{33})\, y + B(B_{ij})\, y, \tag{17}$$

where $i, j = 1, 2, 3$, $y = \mathrm{col}(Y_1, Y_2, Y_3)$, $Y_1 = \mathrm{col}(y_1, y_2)$, $Y_2 = \mathrm{col}(y_1, y_3)$, $Y_3 = \mathrm{col}(y_2, y_3)$, with its pair-wise subsystems

$$\frac{dx}{dt} = \mathrm{diag}(A_{11}, A_{22}, A_{33})\, x, \tag{18}$$

where

$$A_{11} = \begin{bmatrix} -4 & 0 \\ 0 & -5 \end{bmatrix}, \quad A_{22} = \begin{bmatrix} -4 & 0 \\ -1 & -2 \end{bmatrix}, \quad A_{33} = \begin{bmatrix} -5 & -1 \\ -1 & -2 \end{bmatrix},$$

$x = \mathrm{col}(X_1, X_2, X_3)$, $X_1 = \mathrm{col}(x_1, x_2)$, $X_2 = \mathrm{col}(x_1, x_3)$, $X_3 = \mathrm{col}(x_2, x_3)$. The matrix exponential function of formula (18) satisfies the bounds of condition (I), and condition (II) holds since $\int_{t_0}^{\infty} \|B_{i3}\|\, dt < \infty$, $i = 1, 2, 3$. The matrix

$$G = \begin{bmatrix} -4 & 0 \\ 0 & -1 \end{bmatrix} + \begin{bmatrix} 0 & 2 \\ 0 & 0 \end{bmatrix}$$

is stable by the matrix stability theory [5]. Therefore, formula (17) and formula (18) are asymptotically equivalent according to Theorem 1.
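A quick numerical sanity check of this example (matrix values as reconstructed from the text) confirms that the full system, every pair-wise subsystem, and the matrix G are all stable, consistent with the conclusion of Theorem 1:

```python
import numpy as np

# Numerical check of the Sect. 4 example: the full system, each
# pair-wise subsystem, and G should all be Hurwitz (stable).
A = np.array([[-4., 0., 0.],
              [0., -5., -1.],
              [-1., -1., -2.]])
A11 = np.array([[-4., 0.], [0., -5.]])    # states {x1, x2}
A22 = np.array([[-4., 0.], [-1., -2.]])   # states {x1, x3}
A33 = np.array([[-5., -1.], [-1., -2.]])  # states {x2, x3}
G = np.array([[-4., 0.], [0., -1.]]) + np.array([[0., 2.], [0., 0.]])

def is_hurwitz(M):
    # Stable iff every eigenvalue lies in the open left half-plane
    return bool(np.all(np.linalg.eigvals(M).real < 0))

stable = [is_hurwitz(M) for M in (A, A11, A22, A33, G)]
```

For this A the eigenvalues are -4 and the roots of λ² + 7λ + 9, all with negative real parts.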
5 Conclusion

The asymptotic equivalence of LTI overlapping large-scale systems and their pair-wise subsystems has been studied in this paper. Consequently, the pair-wise subsystems can be studied instead of the large-scale system without losing any stability information. This method reduces the complexity of the stability analysis of LTI overlapping large-scale systems and is suitable for practical application.

Acknowledgment. The research reported herein was supported by the NSF of China under grant No. 60874017.
References
1. Chen, X.B., Stankovic, S.S.: Inclusion Principle of Stochastic Discrete-Time Systems. Acta Automatica Sinica (1997)
2. Chen, X.B.: Special Decompositions of Interconnected Power Systems. Journal of Anshan University of Science and Technology 26(6), 47–53 (2003)
3. Chen, X.B., Stankovic, S.S.: Decomposition and Decentralized Control of Systems with Multi-overlapping Structure. Automatica 41(10), 1765–1772 (2005)
4. Chen, X.B., Stankovic, S.S.: Overlapping Decentralized Approach to Automation Generation Control of Multi-area Power Systems. International Journal of Control 80(3), 386–402 (2007)
5. Liao, X.X.: The Mathematics Theory and Appliance of the Stability. Huazhong University of Science and Technology Press, Wuhan (2006) (in Chinese)
6. Chen, J.L., Chen, X.H.: Special Matrix. Tsinghua University Press, Beijing (2000) (in Chinese)
7. Huang, L.: The Rationale of Stability and Robustness. Press of the Scientific and Technical University, Beijing (2003) (in Chinese)
Brain-Computer Interface System Using Approximate Entropy and EMD Techniques Qiwei Shi1 , Wei Zhou1 , Jianting Cao1,2,3, Toshihisa Tanaka2,4, and Rubin Wang3 1
Saitama Institute of Technology 1690 Fusaiji, Fukaya-shi, Saitama 369-0293, Japan 2 Brain Science Institute, RIKEN 2-1 Hirosawa, Wako-shi, Saitama 351-0198, Japan 3 East China University of Science and Technology Meilong Road 130, Shanghai 200237, China 4 Tokyo University of Agriculture and Technology 2-24-16, Nakacho, Koganei-shi, Tokyo 184-8588, Japan [email protected]
Abstract. A brain-computer interface (BCI) is a technology that would enable us to communicate with the external world via brain activities. Electroencephalography (EEG) is one of the non-invasive approaches and has been widely studied for brain-computer interfaces. In this paper, we present a motor imagery based BCI system. The subject's EEG data recorded during left and right wrist motor imagery is used as the input signal of the BCI system. It is known that motor imagery attenuates the EEG μ and β rhythms over the contralateral sensorimotor cortices. Through offline analysis of the collected data, an approximate entropy (ApEn) based complexity measure is first applied to analyze the complexity between two channels located in different hemispheres. Then, empirical mode decomposition (EMD) is used to extract informative brain activity features to discriminate left and right wrist motor imagery tasks. The satisfactory results we obtained suggest that the proposed method has potential for the classification of mental tasks in a brain-computer interface system.

Keywords: Brain-computer Interface (BCI), Motor Imagery, Electroencephalography (EEG), Approximate Entropy (ApEn), Empirical Mode Decomposition (EMD).
1 Introduction

A brain-computer interface (BCI) is a system that uses electric, magnetic, or cortical neuronal activity signals, rather than peripheral nerves and muscles, to control external devices such as computers, switches, and wheelchairs. Like any communication or control system, a BCI system has input (e.g., electrophysiological activity from the subject), output (e.g., device commands), components that translate input into output, and a protocol that determines the onset, offset, and timing of operation [1].

Y. Tan, Y. Shi, and K.C. Tan (Eds.): ICSI 2010, Part II, LNCS 6146, pp. 204–212, 2010. © Springer-Verlag Berlin Heidelberg 2010
The most exploited signal in BCI is the scalp-recorded electroencephalogram (EEG), a noninvasive measurement of the brain's electrical activities with a temporal resolution of milliseconds. Most existing BCI systems use three basic signal-processing blocks [2]. First, the system applies a preprocessing step to remove noise and artifacts, mostly of ocular, muscular, and cardiac origin. Next, the system performs feature extraction and selection to detect the specific target patterns in brain activity that encode the user's mental tasks or motor intentions. The last step translates these specific features into useful control signals to be sent to an external device [9].

Recently, BCI research has evolved tremendously. BCI provides control capabilities to people with motor disabilities. Many experimental approaches, including P300, VEP (visual evoked potential), SSVEP (steady-state visual evoked potential), and motor imagery, have been carried out to study BCI systems [3,4]. The movement-related BCI aims at providing an alternative non-muscular communication path and control system for motion-disabled people to send commands to the external world using measures of brain activity. This type of brain-computer interface is based upon the detection and classification of changes in EEG rhythms during different motor imagery tasks, such as the imagination of left and right hand movements. One approach of motor imagery based BCI is to exploit the spectral characteristics of the μ rhythm (8–12 Hz) and β rhythm (12–30 Hz). These oscillations typically decrease during, or in preparation for, a movement (event-related desynchronization, ERD) and increase after movement and in relaxation (event-related synchronization, ERS) [6]. That is to say, for example, left hand motor imagery makes the μ or β rhythm decrease in the sensorimotor region of the right hemisphere.
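The μ-band attenuation described above can be quantified with a simple spectral feature. The sketch below is our own illustration (not the paper's code): it compares the mean FFT power in the 8–12 Hz band between a synthetic "rest" trace with a strong 10 Hz component and a synthetic "imagery" trace where that component is attenuated (ERD). All signal names are made up.

```python
# Minimal sketch: mu-rhythm (8-12 Hz) band power via FFT on synthetic EEG.
import numpy as np

fs = 500                       # sampling rate used in the paper (Hz)
t = np.arange(0, 1.0, 1.0 / fs)
rng = np.random.default_rng(0)

# "Rest" EEG with a strong 10 Hz mu component; "imagery" with it attenuated.
rest = 5.0 * np.sin(2 * np.pi * 10 * t) + rng.standard_normal(t.size)
imagery = 1.0 * np.sin(2 * np.pi * 10 * t) + rng.standard_normal(t.size)

def band_power(x, fs, lo=8.0, hi=12.0):
    """Mean squared spectral magnitude inside [lo, hi] Hz."""
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    spec = np.abs(np.fft.rfft(x)) ** 2
    band = (freqs >= lo) & (freqs <= hi)
    return spec[band].mean()

# ERD: mu-band power drops during motor imagery relative to rest.
assert band_power(rest, fs) > band_power(imagery, fs)
```

In a real BCI pipeline this comparison would be made between contralateral channels (e.g., C3 vs. C4) rather than between synthetic traces.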
This paper describes a complexity measure combined with the EMD technique. Approximate entropy (ApEn) reflects the different complexity of the signals from two electrodes, while EMD is used to extract the features that distinguish left and right motor imagery. The experimental results illustrate that the proposed method is effective in the classification of motor imagery EEG.
2 Methods

2.1 The Approximate Entropy
Approximate entropy (ApEn) is a regularity statistic quantifying the unpredictability of fluctuations in a time series that appears to have potential application to a wide variety of physiological and clinical time-series data [7,8]. Intuitively, one may reason that the presence of repetitive patterns of fluctuation in a time series renders it more predictable than a time series in which such patterns are absent. To compute ApEn(m, r) (m: length of the series of vectors, r: tolerance parameter) of a time series {x(k)}, (k = 1, . . . , N), the vectors v(k) = [x(k), x(k+1), . . . , x(k+m−1)] are first constructed from the signal samples {x(k)}. Let D(i, j) denote the distance between two vectors v(i) and v(j) (i, j ≤ N − m + 1), defined as the maximum difference in their scalar components:

D(i, j) = max_{l=1,...,m} |v_l(i) − v_l(j)| .   (1)

Then, compute the metric N^{m,r}(i), the total number of vectors v(j) whose distance with respect to the generic vector v(i) is within the tolerance r, i.e., D(i, j) ≤ r. Now define C^{m,r}(i), the probability of finding a vector that differs from v(i) by less than the distance r, as follows:

C^{m,r}(i) = N^{m,r}(i) / (N − m + 1) ,   (2)

φ^{m,r} = (1 / (N − m + 1)) Σ_{i=1}^{N−m+1} log C^{m,r}(i) .   (3)

For m + 1, repeat the above steps and compute φ^{m+1,r}. The ApEn statistic is given by

ApEn(m, r) = φ^{m,r} − φ^{m+1,r} .   (4)
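A direct O(N²) implementation of the ApEn statistic is short; the sketch below follows Eqs. (1)–(4) and is written for this article as an illustration (it is not the authors' code):

```python
# Approximate entropy, implemented directly from Eqs. (1)-(4).
import numpy as np

def apen(x, m=2, r_factor=0.2):
    """ApEn(m, r) of a 1-D series x, with r = r_factor * std(x)."""
    x = np.asarray(x, dtype=float)
    N = x.size
    r = r_factor * x.std()

    def phi(m):
        # Embedding vectors v(k) = [x(k), ..., x(k+m-1)].
        v = np.array([x[k:k + m] for k in range(N - m + 1)])
        # Chebyshev distance D(i, j), Eq. (1), between all vector pairs.
        d = np.max(np.abs(v[:, None, :] - v[None, :, :]), axis=2)
        # C^{m,r}(i): fraction of vectors within tolerance r, Eq. (2).
        c = (d <= r).sum(axis=1) / (N - m + 1)
        return np.log(c).mean()            # Eq. (3)

    return phi(m) - phi(m + 1)             # Eq. (4)

rng = np.random.default_rng(1)
regular = np.sin(np.linspace(0, 20 * np.pi, 500))   # predictable rhythm
noisy = rng.standard_normal(500)                    # irregular series
assert apen(regular) < apen(noisy)   # more regularity -> lower ApEn
```

The final assertion mirrors the intuition in the text: a repetitive series (the sine) yields a lower ApEn than an unpredictable one (white noise).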
The typical values m = 2 and r between 10% and 25% of the standard deviation of the time series {x(k)} are often used in practice [7].

2.2 Empirical Mode Decomposition
The EMD method, a time-frequency analysis tool for nonlinear and nonstationary signals, was proposed in [5]. EMD is a fully data-driven technique with which any complicated data set can be decomposed into a finite and often small number of intrinsic mode functions (IMF). An IMF component, as a narrow-band signal, is a function having the same numbers of zero-crossings and extrema, and symmetric envelopes defined by the local maxima and minima, respectively. The procedure to obtain the IMF components from an observed signal is called sifting, and it consists of the following steps:
1. Identification of the extrema of an observed signal.
2. Generation of the waveform envelopes by connecting the local maxima as the upper envelope and the local minima as the lower envelope.
3. Computation of the local mean by averaging the upper and lower envelopes.
4. Subtraction of the mean from the data to obtain a primitive value of the IMF component.
5. Repetition of the above steps until the first IMF component is obtained.
6. Separation of the first IMF component from the data, so that the residue component is obtained.
7. Repetition of the above steps on the residue component, which contains information about longer periods and is further resifted to find additional IMF components.
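The sifting steps above can be sketched compactly. The following is an illustrative toy implementation written for this article (real EMD codes, e.g. following Huang et al. [5], handle end effects and stopping criteria more carefully); function names are our own:

```python
# Compact EMD sketch: cubic-spline envelopes, SD stopping criterion.
import numpy as np
from scipy.interpolate import CubicSpline

def local_extrema(x):
    """Indices of interior local maxima and minima (step 1)."""
    dx = np.diff(x)
    maxima = np.where((dx[:-1] > 0) & (dx[1:] < 0))[0] + 1
    minima = np.where((dx[:-1] < 0) & (dx[1:] > 0))[0] + 1
    return maxima, minima

def sift(x, t, sd_limit=0.3, max_iter=50):
    """Extract one IMF from x (steps 2-5), stopping on the SD criterion."""
    h = x.copy()
    for _ in range(max_iter):
        mx, mn = local_extrema(h)
        if mx.size < 2 or mn.size < 2:
            break                                   # too few extrema to sift
        upper = CubicSpline(t[mx], h[mx])(t)        # step 2: envelopes
        lower = CubicSpline(t[mn], h[mn])(t)
        mean = (upper + lower) / 2.0                # step 3: local mean
        h_new = h - mean                            # step 4: subtract mean
        sd = np.sum((h - h_new) ** 2 / (h ** 2 + 1e-12))  # SD criterion
        h = h_new
        if sd < sd_limit:
            break
    return h

def emd(x, t, n_imfs=4):
    """Repeat sifting on the residue (steps 6-7)."""
    imfs, residue = [], x.copy()
    for _ in range(n_imfs):
        imf = sift(residue, t)
        imfs.append(imf)
        residue = residue - imf
    return imfs, residue

t = np.linspace(0, 1, 1000)
x = np.sin(2 * np.pi * 30 * t) + np.sin(2 * np.pi * 5 * t)  # fast + slow tone
imfs, residue = emd(x, t, n_imfs=2)
# By construction the decomposition is exact: x = sum of IMFs + residue.
assert np.allclose(sum(imfs) + residue, x)
```

The final assertion checks the defining identity of the decomposition, which appears below as Eq. (6).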
The sifting algorithm calculates the IMF components based on a criterion limiting the size of the standard deviation (SD) computed from two consecutive sifting results:

SD = Σ_{t=0}^{T} [ (h_{k−1}(t) − h_k(t))² / h²_{k−1}(t) ] ,   (5)

in which a typical value for SD can be set between 0.2 and 0.3 for the sifting procedure. Based on the sifting procedure for one channel of the real-measured EEG data, we finally obtain

x(t) = Σ_{i=1}^{n} c_i(t) + r_n(t) .   (6)

In Eq. (6), c_i(t) (i = 1, · · · , n) represents the n IMF components, and r_n represents a residual component. The residual component can be either a mean trend or a constant. Since each IMF component has a specific frequency, it is easy to discard high-frequency electrical power interference after raw data decomposition. The remaining desirable components are combined into a new signal x̂(t).
3 Experiments and Results

3.1 Motor Imagery Experiment
In our experiment, the EEG signal was recorded by a NEUROSCAN ESI system. As illustrated in Fig. 1(a), six exploring electrodes (F3, F4, C3, C4, P3, P4) are placed on the scalp and two references are placed on the earlobes (A1, A2) based on the standardized 10-20 system. The sampling rate of the EEG is 500 Hz and the resistance of each electrode is set to less than 8 kΩ. EEG data recorded during right or left wrist movement imagery is used to simulate a BCI input data source. The subject sat in a relaxed condition and was presented with a series of sound cues by a STIM AUDIO SYSTEM, so that the subject could alternately perform motor imagery and eye closing. As shown in Fig. 1(b), the onset of mental wrist movement is paced with an interval of 3 seconds. In the first session, the subject attempted to imagine left wrist movement after the sound signal; in the second session, right wrist movement imagery was carried out. Each session lasted about 300 seconds, including 50 wrist movement imagery trials and 50 eye-closing trials. As an example, a small time window of eight seconds (79–87 sec.) of right wrist movement imagery EEG signal is shown in Fig. 2. Event 1 (i.e., red line) is the sound signal for motor imagery and event 2 (i.e., green line) is the one for eye closing. The subject began to imagine the right wrist movement when event 1 was presented and stopped as soon as event 2 appeared.
Fig. 1. (a) The location of the six electrodes and two references (A1, A2) based on the standardized 10-20 system in the motor imagery experiment. (b) The process of each wrist mental movement task (motor imagery and eye closing alternating at 3-second intervals).

Fig. 2. A view of 8 seconds of right wrist movement imagery EEG data. Event 1 is the motor imagery stimulation and event 2 is for the subject's eye closing.
3.2 ApEn Results for Wrist Motor Imagery EEG

In this subsection, we first use the ApEn measure to analyze the recorded EEG signals from channels C3 and C4. These two typical channels are located separately in the two areas relevant to classifying the type of motor imagery. The ApEn results for some motor imagery time points are shown in Table 1 and Table 2. In the left wrist motor imagery process (Table 1), the ApEn results of channel C3 are usually lower than those of C4. We suspect this implies that the EEG signal from channel C3 is more predictable, since certain brain wave rhythms occur in the left hemisphere when the subject performs left wrist motor imagery. Conversely, the lower ApEn of the channel C4 EEG signal in right wrist motor imagery (Table 2) implies regular rhythms occur in the right hemisphere.
Table 1. ApEn results for left wrist motor imagery at some time points in sec. (r = 0.25)

Chan. | 52–53  | 84–85  | 91–92  | 117–118 | 135–136 | 238–239 | 270–271 | 309–310
C3    | 0.1599 | 0.1760 | 0.1855 | 0.1966  | 0.1641  | 0.1196  | 0.1056  | 0.2224
C4    | 0.2721 | 0.3048 | 0.2996 | 0.2865  | 0.2822  | 0.3054  | 0.1887  | 0.3339

Table 2. ApEn results for right wrist motor imagery at some time points in sec. (r = 0.25)

Chan. | 75–76  | 94–95  | 95–96  | 133–134 | 145–146 | 203–204 | 209–210 | 274–275
C3    | 0.1645 | 0.2361 | 0.2663 | 0.2583  | 0.1848  | 0.2675  | 0.1781  | 0.2576
C4    | 0.1540 | 0.1483 | 0.1630 | 0.1819  | 0.1132  | 0.2224  | 0.1544  | 0.2405

3.3 EMD for Wrist Motor Imagery EEG
Based on the results of the ApEn measure, we carry out further analysis by applying the EMD method to the EEG signals from channels C3 and C4 during the motor imagery task. As shown in Fig. 3, the signal from channel C3 of one left wrist movement imagery task from 270 to 271 sec. is selected as an example. By applying the EMD method described in Section 2, we obtained four IMF components (C1 to C4) with frequencies from high to low and a residual component (r). Generally in our experiment, a component with a frequency as high as that of C1 reflects electrical interference from the environment, and the residual component (r) is also not a typically useful component.

Several factors suggest that μ and/or β rhythms could be good signal features for EEG-based communication. Mental imagery of movement or preparation for movement is typically accompanied by a decrease in μ and β rhythms, particularly contralateral to the movement. This decrease has been labeled 'event-related desynchronization' (ERD). Its opposite, a rhythm increase, or 'event-related synchronization' (ERS), occurs in the cortex areas without movement or in relaxation [6].

Fig. 3. EMD result for a one-second (270–271 sec.) signal in channel C3.
Fig. 4. EMD results for channel C3 and C4 signals from a left wrist movement imagery task (270–271 sec.). (a) Decomposed IMFs for channel C3 in time and frequency domains. (b) Decomposed IMFs for channel C4 in time and frequency domains.

Fig. 5. EMD results for channel C3 and C4 signals from a right wrist movement imagery task (145–146 sec.). (a) Decomposed IMFs for channel C3 in time and frequency domains. (b) Decomposed IMFs for channel C4 in time and frequency domains.
Therefore, after the EMD processing, the remaining three IMF components (C2 to C4, in a dotted-line box in Fig. 3), taken as the desirable ones, are displayed in the frequency domain by applying the fast Fourier transform (FFT) (Fig. 4(a)). With the frequency axis ranging from 0 to 40 Hz, one component whose frequency corresponds to the range of the μ rhythm is visible (the second block in the right column of Fig. 4(a)). Applying the EMD method to the EEG of channel C4 obtained from the same motor imagery task, the amplitude of each desirable IMF component is this time in a low range (Fig. 4(b)). Comparing the analysis results in Fig. 4, it is clear that the μ rhythm can be extracted from channel C3 rather than channel C4. Without loss of generality, the same process is applied to the EEG signals of the other left wrist movement imagery tasks. The similar results we obtained imply that left wrist movement imagery leads to the decrease in μ rhythm in the right hemisphere. In the session of right wrist motor imagery, comparatively, the analysis of channels C3 and C4 demonstrates that the μ rhythm can be extracted from channel C4 (Fig. 5). The EMD method shows that the appearance of the μ rhythm accounts for the difference in ApEn between the two hemispheres, and the correlation between μ rhythm and motor imagery task obtained from the analysis corresponds to the theoretical fact (Fig. 6).

Fig. 6. Extraction of μ rhythm from channel C3 in left wrist movement imagery as well as that from channel C4 in right wrist movement imagery.
4 Conclusion

In our study, the EEG data recorded in an offline BCI experimental setting present two classes, corresponding to left wrist and right wrist motor imagery. We suggest the applicability of the frequency spatial pattern approach to the classification of motor imagery EEG. The approximate entropy (ApEn) is applied as a preliminary measure of the complexity difference between the channels that dominate the motor imagery. The EMD method is used to classify the subject's motor imagery conditions based on spectral analysis of μ rhythms. Since the amplitude of multichannel EEG recordings exhibits different spatial patterns depending on the part of the body imagined moving, we note that the proposed method showed its effectiveness.

Acknowledgments. This work was supported in part by KAKENHI (22560425, 21360179).
References
1. Wolpaw, J.R., Birbaumer, N., McFarland, D.J., Pfurtscheller, G., Vaughan, T.M.: Brain-Computer Interfaces for Communication and Control. Clinical Neurophysiology 113(6), 767–791 (2002)
2. Cichocki, A., Washizawa, Y., Rutkowski, T., Bakardjian, H., Phan, A., Choi, S., Lee, H., Zhao, Q., Zhang, L., Li, Y.: Noninvasive BCIs: Multiway Signal-Processing Array Decompositions. Computer 41(10), 34–42 (2008)
3. Farwell, L.A., Donchin, E.: Talking off the top of your head: toward a mental prosthesis utilizing event-related brain potentials. Electroencephalography and Clinical Neurophysiology 70, 510–523 (1988)
4. Gao, X., Xu, D., Cheng, M., Gao, S.: A BCI-based environmental controller for the motion-disabled. IEEE Trans. Neural Syst. Rehabil. Eng. 11(112), 137–140 (2003)
5. Huang, N.E., Shen, Z., Long, S.R., Wu, M.C., Shih, H.H., Zheng, Q., Yen, N.C., Tung, C.C., Liu, H.H.: The Empirical Mode Decomposition and the Hilbert Spectrum for Nonlinear and Non-stationary Time Series Analysis. Proceedings of the Royal Society of London A 454, 903–995 (1998)
6. Pfurtscheller, G.: EEG Event-Related Desynchronization (ERD) and Event-Related Synchronization (ERS). In: Electroencephalography: Basic Principles, Clinical Applications and Related Fields, 4th edn., pp. 958–967. Williams and Wilkins, Baltimore (1999)
7. Pincus, S.M.: Approximate Entropy (ApEn) as a Measure of System Complexity. Proc. Natl. Acad. Sci. 88, 110–117 (1991)
8. Pincus, S.M., Goldberger, A.L.: Physiological Time-series Analysis: What Does Regularity Quantify? Am. J. Physiol. 266, 1643–1656 (1994)
9. Sajda, P., Mueller, K.-R., Shenoy, K.V.: Brain Computer Interfaces. IEEE Signal Processing Magazine, Special Issue, 16–17 (January 2008)
An Application of LFP Method for Sintering Ore Ratio Xi Cheng, Kailing Pan, and Yunfeng Ma School of Management, Wuhan University of Science and Technology, P.R. China, 430081 [email protected]
Abstract. The proper ratio decision for the sintering burden is a significant step both for decreasing sintering costs and for increasing the quality of iron. At present, most companies in China use the Fixed-ratio method and a linear programming (LP) model to calculate the proper ratio for sintering. The former is the performance appraisal method for the production cost management of iron; the latter uses mathematical methods to improve the computation process. This paper proposes a linear fractional programming (LFP) model combining the advantages of both methods to compute the proper ratio that minimizes the iron cost per ton for sintering. Next, based on production data from a steel company, this paper uses MATLAB to solve the problem. The solutions of the original method, the traditional LP model, and the LFP model are then compared, and conclusions are drawn at the end.

Keywords: Linear fractional programming, Linear programming, Optimization models, Sintering ore blending.
2 Construction of LFP Model

According to production practice, only raw materials which conform to the indicators of physical and chemical properties can be put into the furnace, and the requirement of ore dressing is that the proportion and content of chemicals are uniformly distributed and stable. Suppose that the particle size, chemical composition of the raw materials, and other indicators fully comply with the sintered metallurgical performance requirements. Fuel of the same kind from the same origin is assumed to have the same chemical composition. Suppose that the proportion and content of chemicals are uniformly distributed, stable, and well mixed.

I is the set of names of raw materials and J is the set of names of chemicals. The other symbols are defined as follows:
- a_ij is the j-th chemical composition percentage of the i-th raw material (%), for any i in I and any j in J;
- x_i is the ratio of the i-th raw material, which is also the decision variable (%), for any i in I;
- p_i is the price of the i-th raw material (yuan per ton), for any i in I;
- E_j and Ē_j are the lower and upper limits of the j-th chemical composition percentage (%), for any j in J;
- M_i and M̄_i are the lower and upper limits of the i-th raw material ratio (%), for any i in I;
- R and R̄ are the lower and upper limits of the sinter basicity, where the sinter basicity is the quotient of the sum of the composition percentages of CaO divided by the sum of the composition percentages of SiO2 (dimensionless);
- r_fuel and r̄_fuel are the lower and upper limits of the fuel percentage (%);
- S_i is the supply of the i-th raw material within the plan period (ton), for any i in I;
- r_f is the sum of the foreign ore percentages (%);
- λ2 is the iron recovery rate coefficient, which is a constant;
- Q_s is the output of sinter (ton).

The LFP model is made up of the decision variables, the objective function, and the constraints as follows.
Min f(x) = Σ_{i=1}^{30} p_i x_i / [ λ2 Σ_{i=1}^{30} x_i (1 − a_{i,15}/100) ]   (1)

s.t.

E_j ≤ Σ_{i=1}^{30} a_ij x_i / Σ_{i=1}^{30} x_i (1 − a_{i,15}/100) ≤ Ē_j , ∀j ∈ J   (2)

Q_s x_i / [ λ2 Σ_{i=1}^{30} x_i (1 − a_{i,15}/100) ] ≤ S_i , ∀i ∈ I   (3)

Σ_{i=1}^{12} x_i = r_f   (4)

R ≤ Σ_{i=1}^{30} a_{i,3} x_i / Σ_{i=1}^{30} a_{i,5} x_i ≤ R̄   (5)

M_i ≤ x_i ≤ M̄_i , ∀i ∈ I   (6)

r_fuel ≤ x_29 + x_30 ≤ r̄_fuel   (7)

Σ_{i=1}^{23} x_i = 100   (8)
The objective function, formula (1), is a linear fraction whose numerator and denominator are both sums of linear terms. The numerator is the mixed price of all materials, and the denominator is the total TFe content of the mixed materials after sintering. λ2 is the iron recovery rate coefficient, i.e., the percentage of melted iron that becomes pure iron; obviously the burning loss a_{i,15} is included. f(x) is the total price divided by the output of TFe; in other words, the objective is to minimize the cost of iron per ton. The constraints are composed of chemical restrictions, supply constraints, the foreign ore limit, and the sinter basicity boundary, as follows. Formula (2) gives the chemical composition percentage restrictions, including TFe, CaO, MgO, SiO2, Al2O3, S, P, and Ig, where Ig is the burning loss of sintering. In formula (2), x_i is a variable and the other coefficients are known, so formula (2) can be equivalently transformed into inequalities with linear relationships as follows. Multiplying both sides of formula (2) by the denominator Σ_{i=1}^{30} x_i (1 − a_{i,15}/100) gives

E_j Σ_{i=1}^{30} x_i (1 − a_{i,15}/100) ≤ Σ_{i=1}^{30} a_ij x_i ≤ Ē_j Σ_{i=1}^{30} x_i (1 − a_{i,15}/100) ,

so the lower limit of the j-th chemical composition percentage is transformed into

Σ_{i=1}^{30} [ a_ij − E_j (1 − a_{i,15}/100) ] x_i ≥ 0 , ∀j ∈ J ,   (9)

and the upper limit into

Σ_{i=1}^{30} [ Ē_j (1 − a_{i,15}/100) − a_ij ] x_i ≥ 0 , ∀j ∈ J .   (10)

After that, condition (2) becomes linear constraints. Formula (3) gives the supply constraints, which by the same argument can be transformed into

S_i λ2 Σ_{i=1}^{30} (1 − a_{i,15}/100) x_i − Q_s x_i ≥ 0 , ∀i ∈ I .   (11)

Formula (4) is the foreign ore limit, which is an equality. Formula (5) is the sinter basicity boundary, which can be transformed into

Σ_{i=1}^{30} [ R̄ a_{i,5} − a_{i,3} ] x_i ≥ 0 ,   (12)

Σ_{i=1}^{30} [ a_{i,3} − R a_{i,5} ] x_i ≥ 0 .   (13)

Formulas (6), (7), and (8) are already linear, so all constraints become linear relationships. According to numerical optimization theory [3], a line search method can drive the iterative calculation to solve the model, whose algorithmic effectiveness has been proved in reference [4]. Many kinds of mathematical software, such as MATLAB, provide optimization toolboxes that can solve the problem directly.
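The paper linearizes the fractional constraints directly and then solves the model with MATLAB's SQP solver. As an alternative standard route, the Charnes-Cooper transformation turns the whole LFP min (pᵀx)/(dᵀx) into an ordinary LP by substituting y = t·x with the normalization dᵀy = 1. The sketch below illustrates this on a tiny two-material toy instance (the numbers are invented for illustration, not the sintering data):

```python
# Charnes-Cooper transformation of a toy LFP into an LP, solved with SciPy.
import numpy as np
from scipy.optimize import linprog

p = np.array([2.0, 3.0])   # cost coefficients (numerator)
d = np.array([1.0, 2.0])   # iron-output coefficients (denominator)
# Original LFP: min (p.x)/(d.x)  s.t.  x1 + x2 = 1,  0 <= x_i <= 1.
# Charnes-Cooper with y = t*x, t > 0, and the normalization d.y = 1:
#   min p.y  s.t.  d.y = 1,  y1 + y2 = t,  y_i <= t,  y, t >= 0.
c = np.concatenate([p, [0.0]])                  # variables: (y1, y2, t)
A_eq = np.array([[d[0], d[1], 0.0],             # d.y = 1
                 [1.0, 1.0, -1.0]])             # sum(y) - t = 0
b_eq = np.array([1.0, 0.0])
A_ub = np.array([[1.0, 0.0, -1.0],              # y_i - t <= 0, i.e. x_i <= 1
                 [0.0, 1.0, -1.0]])
b_ub = np.zeros(2)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 3)
t = res.x[2]
x = res.x[:2] / t                               # recover the original ratios
assert abs(res.fun - 1.5) < 1e-6                # optimal cost/output ratio
assert np.allclose(x, [0.0, 1.0], atol=1e-6)    # all of material 2
```

For this instance the optimum ratio is 1.5, attained by using only the second material; the same transformation scales to the 30-material model once constraints (2)-(8) are rewritten in terms of y and t.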
3 Calculation and Solving

Though the optimization toolbox of MATLAB has a GUI interface into which a simple mathematical model can be entered directly, the LFP model has too many variables and parameters for the convenient GUI panel, so it is necessary to write m-files defining the functions that solve the problem. Reference [5] describes in detail how to write MATLAB code to input the model into the optimization toolbox and save the program as m-files. When the call to these functions is entered in the command window, the program automatically chooses the sequential quadratic programming (SQP) method together with the quasi-Newton line search method, also called the variable metric algorithm [3], to iterate toward the optimum. From the optimum of the model, the iron cost per ton can be calculated. All data should be input into a mat file in the same directory as the m-files. The chemical composition matrix of the raw materials, the supply, and the other known values are shown in Table 1. The upper and lower limits of each chemical composition of the ore blend are shown in Table 2. From Table 1 and Table 2, |I| = 30 and |J| = 8 are known. The value of r_f is 65, and the sinter basicity is bounded between 1.75 and 2.20. The Table 1 data are from the production database of a steel company.
Table 1. Prices and percentages of sinter raw materials (for each of the 30 raw materials: content of TFe, CaO, MgO, SiO2, Al2O3, S, P, and Ig in %; upper and lower ratio limits in %; supply in t; and price in yuan/t).
Table 2. The chemical composition restrictions and the quality of the optimum

             | TFe   | CaO   | MgO  | SiO2  | Al2O3 | S    | P    | Ig
Upper %      | 80.00 | 30.00 | 2.00 | 10.00 | 2.50  | 2.00 | 2.00 | 20.00
Lower %      | 57.00 | 5.00  | 1.80 | 5.00  | 1.80  | 0.00 | 0.00 | 0
Sinter ore % | 57    | 10.54 | 2    | 5.48  | 2.37  | 0.1  | 0.06 | 14.3
Table 3. Iterative calculation report

Iter | F-count | f(x)     | Max constraint
0    | 31      | 1 316.64 | 803.2
1    | 62      | 1 195.61 | 4.313e-013
2    | 93      | 1 194.55 | 6.064e-009
3    | 124     | 1 194.33 | 1.843e-008
Fig. 1. Bar 1 is the cost of the Fixed-ratio method, bar 2 is the calculated cost of the LP model, and bar 3 is the iron cost per ton of the LFP model. The bar chart uses data from Table 4.
Fig. 2. Pie 1 is the iron cost of sintering ore computed by the LP model. Pie 2 is the cost saving percentage (1.43%) relative to the Fixed-ratio method.
Fig. 3. Pie 1 is the iron cost per ton calculated by the LFP model. Pie 2 is the cost saving percentage (4.28%) relative to the Fixed-ratio method.
Taking the vector of zeros, X0, X1 as start point to compute can obtain the same optimal X*. Calculation report which taking X1 as the start point shown in Table 3. X0=(18,13,0,2,13,0,5,0,12,2,0,0,0,17,0,0,0,0,8,0,3,2,5,0,0,0,0,0,5,0)T, X1=(18,12,0,2,15,0,5,0,11,2,0,0,0,17,0,0,0,0,8,0,3,2,5,0,0,0,0,0,5,0)T, X*=(18,13,0,2,14,0,5,0,11,2,0,0,0,17,0,0,0,0,8,0,3,2,5,7.2042,0,8.8915,0,0,5,0)T. Step length and iterations indicate that convergence of the quasi-Newton search algorithm perform well. The effectivity of the SQP algorithm to solve the quadratic form is already testified in reference[4]. The result shows that the selection of start point can impact the convergence rate to some degree but acquire the same solution. The final optimization result is not affected. LFP model calculations need not select the feasible ratio as the initial iteration point. This method is more advanced than the Fixed-ratio method. All content of chemical composition of the optimal are within the boundary as shown in Table 2. That content of MgO and TFe reached the boundary reveals that these chemical composition restrictions have effect on the optimal. Table 4 shows the result of calculations of the traditional LP model, the LFP model and the Fixed-ratio method. According to Table 4, the iron cost per ton of Fixed-ratio method is 1 247.74 yuan per ton. The cost of LP model is 1 229.85 yuan per ton. And the iron cost per ton of LFP model is 1 194.33 yuan per ton. So LP model can save 17.89 yuan per ton. That is 1.43% of Fixed-ratio method. LFP model can save 53.41
220
X. Cheng, K. Pan, and Y. Ma
yuan per ton, which is 4.28% of the Fixed-ratio cost. The LFP model can thus save considerably more money for sintering.
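The cost figures above come from solving the linear-fractional (LFP) model. As an illustration of how such a linear-fractional program can be solved, the sketch below uses Dinkelbach's parametric method rather than the SQP approach of the paper, and assumes simple box constraints so that each linear subproblem is solved coordinate-wise at an interval endpoint; all names and data are illustrative, not the sintering model itself:

```python
def dinkelbach(c, c0, d, d0, lo, hi, tol=1e-9, max_iter=100):
    """Minimize (c.x + c0)/(d.x + d0) over the box lo <= x <= hi,
    assuming d.x + d0 > 0 on the box.  Dinkelbach's method: repeatedly
    solve min_x N(x) - lam*D(x) and update lam = N(x)/D(x) until the
    parametric optimum reaches zero."""
    n = len(c)
    x = list(lo)
    lam = 0.0
    for _ in range(max_iter):
        num = sum(ci * xi for ci, xi in zip(c, x)) + c0
        den = sum(di * xi for di, xi in zip(d, x)) + d0
        lam = num / den
        # inner LP: minimize sum_i (c_i - lam*d_i) x_i over the box,
        # attained at an interval endpoint for each coordinate
        x_new = [lo[i] if (c[i] - lam * d[i]) >= 0 else hi[i] for i in range(n)]
        f_val = sum((c[i] - lam * d[i]) * x_new[i] for i in range(n)) + c0 - lam * d0
        if abs(f_val) < tol:      # F(lam) = 0  =>  lam is the optimal ratio
            return x_new, lam
        x = x_new
    return x, lam
```

For instance, minimizing (x + 1)/(x + 2) over [0, 4] converges in one step to x = 0 with ratio 0.5, since the ratio is increasing on that interval.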
4 Conclusion

The results of the LFP model support the following conclusions.
(1) The iron cost per ton of the optimization models is smaller than that of the Fixed-ratio method, indicating that the model application is scientific and effective. The optimization program that computes the raw material ratio for sintering greatly simplifies the Fixed-ratio method, which relies on a traditional manual spreadsheet.
(2) The results show that applying LFP to calculate the sintering ore ratio is feasible and works well.
(3) The optimal solution obtained by LP is only an approximate improvement on the Fixed-ratio method; from the perspective of assessed cost savings, its effect is smaller than that of the LFP model.
(4) In addition to the LFP model based on constraint sets, some tight constraints can be used as goals to establish another type of model based on fuzzy set theory, such as a multi-objective LFP model. For example, the pure iron (TFe) content only reaches its lower limit constraint in the LFP model, so it is natural to consider maximizing the TFe content as one of the goals.

Acknowledgments. This research was supported by the School of Management of WUST and the Xiangtan Iron and Steel Company. We wish to thank the referees for their very useful suggestions on the project.
References
1. Wang, D.-q.: Application of Linear Programming in Production of Mixing Materials to Sinter. China Metallurgical 15(8), 19–22 (2005) (in Chinese)
2. Na, S.-r.: Iron Calculation, pp. 73–80. Metallurgy Industry Press, Beijing (2005) (in Chinese)
3. Sun, W.-y., Xu, C.-x., Zhu, D.-t.: Optimization Method, pp. 5–11, 173–202. Higher Education Press, Beijing (2004) (in Chinese)
4. Benson, H.P.: Fractional programming with convex quadratic forms and functions. European Journal of Operational Research 173(2), 351–369 (2006)
5. Gong, C.: Proficient in MATLAB Optimization Calculation, pp. 231–260. Electronics Industry Press, Beijing (2009) (in Chinese)
Contour Map Plotting Algorithm for Evaluating Characteristics of Transient Electron Beam

Chunlong Shen1,2, Miping Zhang1, Kehong Wang3, Yong Peng3, and Jianhua Xu1,2

1 Institute of Computer Science & Technology, Nanjing Normal University
2 Jiangsu Center of Security and Confidence Engineering
3 Dept. Materials, Nanjing University Science & Technology
210097 Nanjing, P.R. China
[email protected]
Abstract. The performance of an electron beam is closely associated with the shape of the beam stream, which reflects its focus state. With a contour map algorithm, the power distribution is segmented into regions by multi-level sets of power density values. Based on an iterating policy, the power density value is computed from a specified power region value. The theory is analyzed and the algorithm steps are described in detail. The 2D regular-grid logical interpolation states with isovalues are explained, and four cases for contour propagation are identified. A criterion for selecting candidate isovalue points is analyzed. An illustration of contour propagation shows the tracing process for isovalue points across grid units, which forms the contour map. Contour map plotting of multi-value power regions was achieved, with parameters computed for various shapes. Experimental results on various datasets demonstrate the effectiveness and robustness of our method.

Keywords: Contour Map, Contour Propagation Algorithm, Electron Beam.
recognition and contour-based region analysis. The power density distribution characterizes the properties of the electron beam. Different power segmentation methods and the corresponding power density contour generation are explored and applied.
2 Power Segmentation Algorithm Based on Iterating Policy

The distribution of power regions over the 2D scalar field helps portray the overall focus performance of the electron beam. The various power areas are described by multi-level sets of floating-point power density values. Obtaining the power density value that corresponds to a given power value is a challenging research issue.
Fig. 2. Variant Evolvement of Iterating Processing (power E(e) in W plotted against power density e in W/mm²; the search interval [T_min, T_max] shrinks toward the density corresponding to the target power E_(e)^T)
Contour Map Plotting Algorithm for Evaluating Characteristics of Transient EB
223
Assume E(e) is the power function of the electron beam and f(x, y, e) is the power density function, whose value is positive. E(e) can be given in integral form as

E(e) = \int_{e_{\min}}^{e} \int_{x_{\min}}^{x_{\max}} \int_{y_{\min}}^{y_{\max}} f(x, y, e)\, dx\, dy\, de .   (1)

where [e_min, e_max] is the interval of power density for the electron beam, and [x_min, x_max] and [y_min, y_max] are the section locations of the electron beam. Obviously E(e) is strictly monotonically increasing; namely, E(e) is a single-valued function. Based on the above, an iterating algorithm that obtains the corresponding power density e from a specified E(e) is proposed; the principle is shown in Fig. 2. The denotation \Delta E_i^{i+1} represents the power difference of the hatched region between the (i+1)th step and the ith step, E_{(e)}^{i} the power value at the ith step, and e^i the power density value at the ith step. Then \Delta E_i^{i+1} can be written as

\Delta E_i^{i+1} = E_{(e)}^{i+1} - E_{(e)}^{i} = \int_{e^i}^{e^{i+1}} \int_{x_{\min}}^{x_{\max}} \int_{y_{\min}}^{y_{\max}} f(x, y, e)\, dx\, dy\, de .   (2)

The error E_{i+1}^{T} between the specified power value E_{(e)}^{T} and the computed power value E_{(e)}^{i+1} at the (i+1)th step is

E_{i+1}^{T} = E_{(e)}^{T} - E_{(e)}^{i+1} .   (3)

The criterion for having found the corresponding power density value is

fabs(E_{i+1}^{T}) \le E_\xi ,

where fabs(·) is the absolute value function and E_\xi is the specified final error.
The pseudo-code for the computing procedure is as follows.

Step 1: Initialization. Define E_\xi; set i ← 0; T_min^0 ← e_min; T_max^0 ← e_max.
Step 2: Set e^i ← (T_min^i + T_max^i)/2; compute E_{(e)}^{i} by formula (1) and the error E_i^T = E_{(e)}^{T} - E_{(e)}^{i}.
Step 3: If fabs(E_i^T) ≤ E_\xi, go to Step 4 (terminate). Otherwise, if E_i^T > 0, set T_min^{i+1} ← e^i and T_max^{i+1} ← T_max^i (search along direction ①); else set T_max^{i+1} ← e^i and T_min^{i+1} ← T_min^i (search along direction ②). Set i ← i + 1 and return to Step 2.
Step 4: Output the value map pair {E_{(e)}^{T} ↔ e^i}.
3 Contour Map Generation Algorithm

3.1 Logical States Analysis of Grid

The intersection of an isoline with a rectangular element can have a maximum of 16 different topological states [4]. Fig. 3 shows all possible states, where a value of 1 at a node means that the criterion value at that node is greater than the isovalue, whereas a value of 0 means that the nodal criterion value is less than the isovalue. By the symmetry of the nodes, only 8 topological states need to be considered. The number of isovalue points on every grid cell must be even; all cases for the isoline of each grid cell are shown in Fig. 3.
Fig. 3. 2D Grid Logical States (the 16 possible 0/1 configurations of the four cell nodes, numbered (1)–(16))
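The 16 states of Fig. 3 can be encoded compactly as a 4-bit index, one bit per corner node, which is the usual lookup key in marching-squares style implementations. A small sketch (the bit ordering is an arbitrary choice for illustration, not prescribed by the paper):

```python
def cell_state(v00, v01, v11, v10, w):
    """Encode the four corner values of a grid cell against isovalue w
    as a 4-bit state index (bit set when the corner value exceeds w),
    giving one of the 16 topological states of Fig. 3."""
    corners = (v00, v01, v11, v10)
    return sum((v > w) << i for i, v in enumerate(corners))
```

For example, a cell entirely below the isovalue maps to state 0 and a cell entirely above it to state 15; the remaining 14 states carry isoline segments.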
3.2 Isovalue Interpolation

A contour is made up of a series of piecewise isolines and can be of open or closed type. The start tracing point of an open contour is a boundary point, and its end point is a boundary point too. The start tracing point of a closed contour is a non-boundary point, and its end point obviously coincides with the start point. The generation of each contour consists of three major components [5].
● Interpolation on each edge of the grid by the isovalue
● Open contour tracing
● Closed contour tracing

Before describing the contour tracing algorithm, some notations are defined. Let w denote the isovalue, pData[i][j] the node value, and xSide[i][j] and ySide[i][j] the stored interpolation value r for the row edge and column edge respectively. On a row edge, r is given by

r = (w − pData[i][j]) / (pData[i][j+1] − pData[i][j]) ,   (5)

where pData[i][j+1] ≠ pData[i][j]; on a column edge,

r = (w − pData[i][j]) / (pData[i+1][j] − pData[i][j]) ,   (6)

where pData[i+1][j] ≠ pData[i][j]. Obviously, if r lies in the interval from 0 to 1, the relevant edge includes an isovalue point; otherwise no isovalue point exists on the edge.
Fig. 4. Four cases of the selection criteria for candidate isovalue points (Steps 1–4; legend: current tracing point P, next tracing point, isovalue interpolation point, nonexistent isovalue interpolation point)
226
C. Shen et al.
3.3 Selection Criteria for Candidate Isovalue Points

Given an initial cell that contains an isovalue point, the remaining isovalue points can be found by a propagation procedure. For an open contour, the initial cell must be selected from the boundary isovalue points; for a closed contour it is selected from an arbitrary inner isovalue point not yet marked as traced. The tracing process over grid units, namely contour propagation, falls into four cases: from the bottom of a grid cell to its top, from top to bottom, from left to right, and from right to left. The selection of the next isovalue point for the bottom-to-top case is demonstrated in Fig. 4. Suppose the current tracing isovalue point P lies on the bottom of grid unit Ⅰ. The next isovalue point may be one of P31, P32 and P33. Since one isovalue point enters the cell, exactly one isovalue point must leave the same cell, and the selection criterion among the candidate points is determined by the relation of ySide[i][j], ySide[i][j+1] and xSide[i+1][j]. A candidate isovalue point becomes the next tracing point according to the following steps, in order.

Step 1: When both P31 and P32 exist, the point nearer the bottom of the cell becomes the next contour tracing point; namely, the point corresponding to the smaller of ySide[i][j] and ySide[i][j+1].
Step 2: When the distances from P31 and P32 to the bottom side are equal, the point whose horizontal distance to P is smaller becomes the next tracing point.
Step 3: When only one of P31 and P32 exists, the existing one becomes the next tracing point.
Step 4: When neither P31 nor P32 exists, P33 becomes the next tracing point.

For the other three cases, the selection criteria for candidate isovalue points are analogous to the bottom-to-top case.
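Steps 1-4 above can be sketched as a small decision function. This is an illustrative reading of the criterion, assuming P31 and P32 denote the candidates on the left and right edges (stored in ySide[i][j] and ySide[i][j+1]) and P33 the candidate on the top edge, with the cell normalized to the unit square:

```python
def next_tracing_point(p_x, ys_left, ys_right):
    """Select the next tracing point when the current point P (at
    horizontal position p_x in the unit cell) lies on the bottom edge.
    ys_left / ys_right are the interpolation values on the left/right
    edges, or None when no isovalue point exists there.  When neither
    side carries a point, the top edge must (even-count property),
    so P33 is returned."""
    if ys_left is not None and ys_right is not None:
        if ys_left < ys_right:            # Step 1: nearer the bottom wins
            return 'P31'
        if ys_right < ys_left:
            return 'P32'
        # Step 2: equal heights -> smaller horizontal distance to P
        return 'P31' if p_x <= 1.0 - p_x else 'P32'
    if ys_left is not None:               # Step 3: only one side exists
        return 'P31'
    if ys_right is not None:
        return 'P32'
    return 'P33'                          # Step 4: take the top edge point
```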
Fig. 5. Process for Contour Propagation
3.4 Contour Propagation and Contour Map Plotting

For an open or closed contour, after initializing the isovalue points, the grid set is scanned from left to right to find the first isovalue point. Relying on the logical state of the isovalue points, the next tracing isovalue point is determined within a grid cell and connected with the previous isovalue point. Then the next isovalue point is searched for in the adjacent grid cells, and so on. Contour propagation is completed by a series of piecewise isolines; the principle is shown in Fig. 5. After all contours are generated, the contour maps are plotted with a drawing tool (OpenGL). Fig. 6 shows different power regions and the power density distribution of the electron beam.
Fig. 6. Contour Map and Parameter Computing
4 Conclusion and Future Work

The contour map is an effective solution for handling the various kinds of 2D scalar field data found ubiquitously in science and engineering. In this paper, multi-level energy regions were segmented and a set of significant parameters was calculated accurately, providing helpful input for a diagnostic tool for electron beam features. Future research will focus on pattern building and recognition to evaluate particular characteristics of the electron beam quantitatively. Based on slices of data acquired in different layers, 3D volume rendering of the electron beam will be realized and spatial geometry measurement will be handled, so that further spatial information can be discovered.

Acknowledgment. Project supported by the Natural Science Foundation of China (No. 60875001) and the Natural Science Foundation of Jiangsu Province (No. 08KJD520008).
References
1. Dilthey, U., et al.: The focussing of electron beam effected by ion compensation and self magnetic field. IIW Doc. No. IV-702-98 (1998)
2. Dilthey, U.: Electron beam diagnostics: a new release of the diabeam system. Vacuum 62, 77–85 (2001)
3. Olszewska, K.: Control of the electron beam active zone position in electron beam welding processes. Vacuum 74, 29–43 (2004)
4. Bartier, P.M., Keller, C.P.: Interpolation for geochemical surface reconstruction incorporating topographic catchment definitions. Mathematical Geology 3, 253–273 (1996)
5. Spedding, G.R., Rignot, E.J.M.: Performance analysis and application of grid interpolation techniques for fluid flows. Experiments in Fluids 15, 417–430 (1993)
Study on Modification Coefficient of Planetary Gear

Tao Zhang and Lei Zhu

College of Engineering and Mechanical, Chang An University, Xi'an 710064, China
Abstract. The improvement of load capacity for a planet gear depends not only on choosing a reasonable meshing angle, but also on a reasonable distribution of the modification coefficients. That is to say, the optimal choice of the external-gearing modification coefficients for a planet gear should be treated as a two-variable optimization problem. Based on this theory, this paper discusses how to choose the optimal modification coefficients for an external-gearing planet gear.

Keywords: Gear, Planet gear.
2 Selection of the Optimum Modification Coefficient by the 0.618 Method

Build up the coordinate system shown in Fig. 1. The node is the coordinate origin, and the limit meshing area lies on the coordinate axis.
Fig. 1. The comprehensive curvature radius of the mesh point
where \alpha is the pressure angle, \alpha_w the meshing angle, m the module, Z_1 and Z_2 the numbers of teeth, r_{b1} and r_{b2} the base circle radii, p_b the base pitch length, and \alpha_{a1} and \alpha_{a2} the outside (tip) pressure angles. The comprehensive curvature radius of a pair of conjugate tooth profiles at any point K(s) on the mesh line is

\rho_\Sigma = \frac{(r_{b1}\tan\alpha_w + s)(r_{b2}\tan\alpha_w - s)}{(r_{b1} + r_{b2})\tan\alpha_w}   (1)

so the relative comprehensive curvature radius at K(s) is

R = \frac{\rho_\Sigma}{m} = \frac{\cos\alpha}{2} \cdot \frac{(z_1\tan\alpha_w + s^*)(z_2\tan\alpha_w - s^*)}{(z_1 + z_2)\tan\alpha_w}   (2)

where s^* = 2s/(m\cos\alpha). Equation (1) shows that the relative comprehensive curvature radius along the meshing line is symmetric about the middle point of the limit mesh area; the distribution is shown in Fig. 1. The maximum occurs at one fourth of the mesh area length, that is,

\rho_{\Sigma\max} = \frac{1}{4}(r_{b1} + r_{b2})\tan\alpha_w  [3].
Study on Modification Coefficient of Planetary Gear
231
So, in theory, the maximum contact stress of the gear surface can only occur at one of the four points B1, B2, C and D. If the manufacturing precision is high enough, the contact stress in the double-tooth mesh areas is ensured to be lower than in the single-tooth mesh area, so the point with the maximum contact stress lies between points C and D. The objective function after optimization is then:
\min F_0 = -\min\{R_C, R_D\}   (3)

with the standard parameters

h_a^* = 1, \alpha = 20°, i = 1, 2 ;   (4)

the restriction of undercutting:

M_i = x_i - \left(h_a^* - \frac{z_i \sin^2\alpha}{2}\right) \ge 0 ;   (5)

the restriction of contact ratio:

\varepsilon = \frac{1}{2\pi}\left[z_1\tan\alpha_{a1} + z_2\tan\alpha_{a2} - (z_1 + z_2)\tan\alpha_w\right] \ge 1.2 ;   (6)

the restriction of addendum thickness:

S_{ai}^* = \frac{z_i\cos\alpha}{\cos\alpha_{ai}}\left(\frac{\pi}{2 z_i} + \frac{2 x_i\tan\alpha}{z_i} + \mathrm{inv}\,\alpha - \mathrm{inv}\,\alpha_{ai}\right) \ge 0.25 ;   (7)

the restriction of interference:

G_1 = \frac{z_1\cos\alpha}{2}(\tan\alpha_w - \tan\alpha) + \frac{h_a^* - x_1}{\sin\alpha} - \frac{z_2\cos\alpha}{2}(\tan\alpha_{a2} - \tan\alpha_w) \ge 0 ,   (8)
G_2 = \frac{z_2\cos\alpha}{2}(\tan\alpha_w - \tan\alpha) + \frac{h_a^* - x_2}{\sin\alpha} - \frac{z_1\cos\alpha}{2}(\tan\alpha_{a1} - \tan\alpha_w) \ge 0 ;   (9)

the slip rates:

y_1 = \frac{(z_1 + z_2)(\tan\alpha_{a2} - \tan\alpha_w)}{(z_1 + z_2)\tan\alpha_w - z_2\tan\alpha_{a2}} ,   (10)
y_2 = \frac{(z_1 + z_2)(\tan\alpha_{a1} - \tan\alpha_w)}{(z_1 + z_2)\tan\alpha_w - z_1\tan\alpha_{a1}} ;   (11)

the pressure rates:

\zeta_1 = \frac{1 + y_1 z_2/(z_1 + z_2)}{1 + y_1} ,   (12)
232
T. Zhang and L. Zhu
\zeta_2 = \frac{1 + y_2 z_1/(z_1 + z_2)}{1 + y_2} .   (13)
General requirements: y_i \le 4 and \zeta_i \le 1.4 (i = 1, 2). Input the coincidence degree \varepsilon = 1.2; the decision variables are the tooth numbers Z_1 and Z_2, and the state variable is the meshing angle \alpha_w. From the non-backlash meshing formula and the contact ratio formula it can be derived that:

b_1 = (Z_1 + Z_2)\left(\frac{2}{\cos\alpha_w} - \frac{1}{\cos\alpha} - \frac{\mathrm{inv}\,\alpha_w - \mathrm{inv}\,\alpha}{\sin\alpha}\right) + \frac{4 h_a^*}{\cos\alpha}
b_2 = 2\pi\varepsilon + (Z_1 + Z_2)\tan\alpha_w
b_3 = \frac{b_1^2 - b_2^2 + Z_2^2 - Z_1^2}{2 Z_2}
D = b_3^2 + b_2^2 - b_1^2
\tan\alpha_{a2} = \frac{b_2 b_3 \pm b_1\sqrt{D}}{b_1^2 - b_2^2}
\tan\alpha_{a1} = \frac{b_2 - Z_2\tan\alpha_{a2}}{Z_1}   (14)
This mathematical model can be solved by the 0.618 method. Take the meshing angle of the maximum modification as the upper limit and the meshing angle of zero modification as the lower limit. A meshing angle can have two corresponding objective function values (in equation (14), D > 0 and both solutions of \alpha_{a1} and \alpha_{a2} meet the restrictive conditions). The difference between the two objective function values reflects the influence on load capacity when the modification coefficients are divided unequally under the same meshing angle. The modification coefficient that is more beneficial to the objective function should then be chosen as the optimal modification coefficient.
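The 0.618 method referred to above is golden-section search over the meshing-angle interval. A minimal sketch for maximizing a unimodal objective (here f would play the role of \min\{R_C, R_D\} from formula (3); names are illustrative):

```python
def golden_section_max(f, lo, hi, tol=1e-6):
    """0.618-method (golden-section) search for the maximizer of a
    unimodal function f on [lo, hi].  Each step shrinks the interval
    by the factor 0.618 and reuses one previous evaluation."""
    g = 0.6180339887498949
    x1 = hi - g * (hi - lo)
    x2 = lo + g * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:                 # maximum lies in [x1, hi]
            lo = x1
            x1, f1 = x2, f2         # old right probe becomes new left probe
            x2 = lo + g * (hi - lo)
            f2 = f(x2)
        else:                       # maximum lies in [lo, x2]
            hi = x2
            x2, f2 = x1, f1         # old left probe becomes new right probe
            x1 = hi - g * (hi - lo)
            f1 = f(x1)
    return 0.5 * (lo + hi)
```

The golden ratio is what lets each iteration reuse one of the two previous function evaluations, so only one new evaluation of the objective is needed per step.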
3 Results and Discussion

When 12 ≤ Z_1 ≤ 27 and Z_1 ≤ Z_2 ≤ 70, the modification coefficients x_1, x_2 and the meshing angle \alpha_w that are most beneficial to the load capacity of the gear surface are solved by the above simplified mathematical model. Under this condition, K_1, the ratio of R_{B1} to R_L; K_2, the ratio of R_{B2} to R_L; and y_max, the maximum slip rate, are shown in Fig. 4 and Fig. 5 respectively. R_{B1} is the relative comprehensive curvature radius at the big gear tooth, R_{B2} that of the small gear, and R_L the minimum relative comprehensive curvature radius in the single-tooth meshing areas.
Fig. 2. The optimal modification coefficient of the small gear
Fig. 3. The optimal modification coefficient of the big gear
It can be seen from Fig. 2 and Fig. 3 that x_1, the modification coefficient of the small gear, is larger than x_2, that of the big gear, except when the gear ratio is 1, where the modification coefficients are equal. When the gear ratio is 1.5, x_1 cannot grow any larger because of the restrictions of addendum thickness and interference of the small gear, so the optimal modification point is the maximum of x_1. When the gear ratio is larger than 1.6, for the same meshing angle, the larger x_1 is, the higher the load capacity. When the gear ratio is smaller than 1.5, the point which meets the condition Z_1 \tan\alpha_{a1} = Z_2 \tan\alpha_{a2} is acceptable. It can also be seen from Fig. 3 that the meshing angle which is most beneficial to the load capacity is not the maximum meshing angle, except when the gear ratio is 1. Moreover, the smaller Z_1 is and the bigger Z_\Sigma, the sum of the tooth numbers, the larger the difference between the optimal meshing angle and the maximum meshing angle.
Fig. 4. The optimum meshing angle
Fig. 5. The comprehensive curvature radius after optimum
It can be seen from Fig. 4 that, according to the calculation, K_1 and K_2 are larger than 73%, so the maximum contact stress of the gear surface occurs at the single-tooth meshing point of the small gear, according to the calculation for the choice of modification coefficient. It can be seen from Fig. 5 that the maximum slip rate of the gear surface occurs on the big gear tooth, with a value between 0.8 and 3.8, which is much smaller than that of a standard gear. The modification coefficients agree with those of the standard JB1799-76 planetary gear reducer.
References
1. Zhu, J., et al.: The choice of modification coefficient of involute gear. The People's Education Press, Beijing (1982)
2. Xianbo, Z.Z.: Gear modification. Shanghai Scientific and Technical Publishers, Shanghai (1984)
3. Pu, L., et al.: Mechanical design. Academic Press, Beijing (2001)
4. Fan, M., et al.: Basis of optimal technique. Tsinghua University Press, Beijing (1982)
The Automatic Feed Control Based on OBP Neural Network

Ding Feng, Bianyou Tan, Peng Wang, Shouyong Li, Jin Liu, Cheng Yang, Yongxin Yuan, and Guanjun Xu

School of Mechanical Engineering, Yangtze University, Jingzhou, Hubei, China, 434023
[email protected]
Abstract. Optimal control of automatic drilling according to the actual situation is an important technology in oilfield drilling. Due to the complexity of the drilling process and the non-linear relationship between the input and output of the drilling system, it is difficult to obtain satisfactory results with general control methods. This article presents a new control method based on the OBP neural network. The OBP algorithm and the design of the control system are elaborated in detail. The automatic feed control method based on the OBP neural network has been applied successfully in the Liaohe and Xinjiang oilfields. The results indicate that the control system is efficient, the response, stability and control precision of the system are improved, and all characteristic indexes meet the control requirements.

Keywords: Neural network; OBP neural network; Automatic feed control.
The Automatic Feed Control Based on OBP Neural Network
237
keep steady finally, and the bit weight will then reach the setting value at this speed [1].
2 Neural Network and OBP Neural Network

A neural network is a parallel information processing system composed of simple processing units called "neurons"; these neurons are arranged and connected in different topological ways according to the function to be realized. Its basic structure is shown in Fig. 1. A neural network offers massive parallel distributed processing, self-adaptability, self-learning, associative memory, fault tolerance, the ability to treat complicated models, and so on. Moreover, it can adjust its topology to solve problems in environments with clutter and great uncertainty [2]. OBP is an operational rule based on an optimization model established between the multiple layers of a feed-forward network. Furthermore, OBP is a novel learning algorithm for multilayer feed-forward neural networks, which is very useful for constructing concrete algorithms. High-dimensional data are represented by general two-dimensional drawings, and the relationships among them are revealed while the topological relations of the data remain unchanged.
Fig. 1. Neural network structure
3 OBP Algorithm

The OBP algorithm, independent of gradient equations, uses iterative equations to optimize the connection weights and thresholds. Its steps are as follows.

(1) Network initialization
① Determine the structure of the network: the number of hidden layers and the numbers of input and output nodes;
② Initialize the weights;
③ Determine the maximal number of iterations IC_max;
238
D. Feng et al.
④ Determine the output accuracy EE;
⑤ Determine the initial weight factor μ and the adjusting coefficients γ and β.

(2) Output weight optimization
① Increment the iteration count: IC = IC + 1;
② Calculate V_opt according to the formula V_opt = A⁻¹B, where V_opt is the optimized connection weight vector between the output layer and the hidden layer, A = [a_ij] with a_ij = Σ_k z_i z_j, z being the output value of every hidden node, B = [b_i] with b_i = Σ_k d · z_i, and d the gradient;
③ Renew the weights of the output-hidden layer: V = V_opt;
④ Calculate the objective function E = E(W, V), where W is the connection weight vector between the input and hidden layers.

(3) Hidden layer optimization
① Use W1, V to calculate E_t = E(W1, V), and calculate ΔW_opt according to the formula ΔW_opt = (A*)⁻¹ · b*, where ΔW_opt is the optimized connection weight vector between the hidden and input layers; A* = [c*], with c* = c_jt if j ≠ h and c* = c_jt + μ c_i if j = h; c_jt = Σ_K V_lh x_m V_lj x_i (h, j = 1, …, H; m, i = 1, …, M); c_i = V Σ_K f(s) x_m x_i (m, i = 1, …, M); b* = [d_t], d_t = Σ_K e V_Lh x_m.
② If E_t < E:
ⅰ renew the weights W = W + ΔW_opt;
ⅱ set E = E_t;
ⅲ decrease the effect of μ;
ⅳ turn to Step (4); otherwise:
ⅴ increase the effect of μ;
ⅵ turn to Step ② of Step (3).

(4) Result output
① If (IC < IC_max) and (E > EE), turn to Step (2); otherwise, follow the next step;
② Save the parameters.
4 The Automatic Feed Control Based on OBP Neural Network

First, two principles should be followed when selecting the inputs. The inputs should be variables that strongly affect the outputs and can be collected or detected easily, and they should be unrelated or only slightly related to each other. We select five important parameters of the drilling process as the inputs: the bit diameter, well depth, bit weight (WOB), rotary table speed and drilling mud flow volume, which have a direct or indirect relationship with the bit weight. The output is the control signal of the disc brake. In a given field these parameters are widely selected as the training sample set [3]; once trained successfully on a certain well of the field, the network can be applied throughout the field. Generally speaking, the more training samples there are, the more correctly the training results reflect their intrinsic laws, but the collection of samples is always confined by objective conditions. In addition, it is difficult to further increase the precision of the network once the samples are numerous enough [4]. The relationship between the quantity of training samples and the network error is presented in Fig. 2.
Fig. 2. The direct relationship between the quantity of the train sample and network error
After the problem of network training samples is solved, the numbers of nodes of the input layer and the output layer are determined. The structural design of the OBP network mainly settles the number of hidden layers and the number of hidden-layer nodes. Theoretical analysis has already proved that a feed-forward network with a single hidden layer can represent all continuous functions, and that two hidden layers are necessary only when learning discontinuous functions. Generally the OBP network requires one hidden layer, and another hidden layer may be added when the number of nodes in the single hidden layer is not enough to improve the network performance. How many hidden nodes to set depends on the quantity of training samples, the amount of sample noise, and the complexity of the law underlying the samples. Generally speaking, complex nonlinear functions that fluctuate frequently and
change a lot require the network to contain more hidden nodes to enhance its mapping ability. The general method used to fix the optimal number of hidden nodes is the cut-and-try method [4]. First the network is trained with fewer hidden nodes; the quantity is then increased step by step, training with the same sample set each time, and finally the quantity which yields the minimum network error is selected. When using the cut-and-try method, the following formula can be used; it is necessary to remark that the number of hidden nodes resulting from this formula is just an approximate estimate, which can be used as the initial value for the cut-and-try method:

m = \sqrt{n + l} + a   (1)

where m is the number of hidden nodes, n the number of input layer nodes, l the number of output layer nodes, and a a constant from 1 to 10. Based on the above rule, 10 hidden nodes are selected, giving the OBP neural network topology structure shown in Fig. 3. The next steps are writing the programs of the OBP algorithm and testing the network. These steps have been carried out for many other neural-network-based automatic feed control methods, their technology is mature, and we do not detail them again in this paper [5].
Fig. 3. The OBP neural network topology structure
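The initial estimate m = \sqrt{n + l} + a (a common reading of formula (1)) plus the cut-and-try loop can be sketched as follows; train_error stands in for one full training run returning the network error and is an assumed placeholder, not an interface from the paper:

```python
import math

def cut_and_try(train_error, n, l, a_range=range(1, 11)):
    """Cut-and-try selection of the hidden node count.  Candidate
    counts come from the estimate m = sqrt(n + l) + a of formula (1)
    for each constant a; each candidate is trained on the same sample
    set and the count with the smallest network error is kept."""
    candidates = sorted({round(math.sqrt(n + l) + a) for a in a_range})
    return min(candidates, key=train_error)
```

With n = 5 inputs and l = 1 output, the candidate range covers the 10 hidden nodes chosen above.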
The control method based on the OBP network is essentially implemented after completing the above work. This control method has been applied successfully in the Xinjiang and Liaohe oilfields and shows good control performance.
5 Application

Fig. 4 shows a monitor screenshot from the IPC of a drilling site in Karamay, Xinjiang, where the method was adopted. From the figure, we can see that the disc brake control signal values output by the WOB control system followed the ideal output values required by the training samples [6]. The measured WOB changes in real time according to the control law of the training samples. This illustrates that the trained neural network controller effectively learned and reflected the control law contained in the training samples; it also shows the good generalization ability of the neural network [7]. Compared with the former constant-WOB control system in drilling, this design can improve drilling efficiency. The initial comparative test results are shown in Table 1.
Fig. 4. The OBP control chart

Table 1. Comparing result

Control type                  Bit feeding speed     Bit wear
Constant WOB control          1.5 h / single pipe   80 h
OBP neural network control    1.3 h / single pipe   85 h
6 Conclusions

(1) A neural network is a parallel information processing system, and as a new and broad subject it has many advantages.
(2) OBP is an operational rule based on an optimization model established between the layers of a multilayer feed-forward network. Its algorithm uses iterative equations to optimize the connection weights and thresholds, which gives it advantages over other methods.
242
D. Feng et al.
(3) The network design is the foremost work in implementing OBP control, and there are many rules for selecting the number of input, hidden and output layers and the nodes of each layer.
(4) This method is easy to realize and meets the requirements of the drilling process, such as nonlinearity, uncertainty and real-time operation.
References
1. Feng, D., Tang, H., et al.: The Status Quo and Development Trend of Modularized Drilling Rig. China Petroleum Machinery 36, 143–149 (2008) (in Chinese)
2. Wang, S.: Research on Damage Detection Theory of K-Type Derrick Structure Based on Frequency Domain System Identification. Thesis for the Master degree in Engineering 03, 5–60 (2007) (in Chinese)
3. Yu, J., Feng, D., et al.: Research on Damage Detection of Submersible Pump Based on Neural Network. Machinery 32, 54–57 (2005) (in Chinese)
4. Wang, F., Xiaoping, Z.: A Study of a DRNN-Based Smart Control Experimental System for Use with Automatic Bit Feed on Rig. Inner Mongolia Petrochemical Industry 12, 58–60 (2006) (in Chinese)
5. Zhang, N., Jing, G., Jingtian, X., et al.: Study of drilling-rig safety monitoring system based on fuzzy neural network. China Petroleum Machinery 37(2), 53–57 (2009) (in Chinese)
6. Feng, D.: Application Research of Neural Network in Bit Selection. Petroleum Drilling Techniques 26, 43–47 (1998) (in Chinese)
7. Jenks, W.G., et al.: SQUIDs for Nondestructive Evaluation. Journal of Physics D: Applied Physics 30, 293–323 (1997)
A Capacitated Production Planning Problem for Closed-Loop Supply Chain

Jian Zhang1 and Xiao Liu2

1 Department of Mechanical and Manufacturing Engineering, University of Calgary, 2500 University Drive NW, Calgary, Alberta T2N1N4, Canada
2 Department of Industrial Engineering Management, Shanghai Jiao Tong University, 800 Dongchuan Road, Min-Hang District, Shanghai, 200240, P. R. China
x [email protected]
Abstract. This paper addresses a dynamic Capacitated Production Planning (CPP) problem in a steel enterprise employing a closed-loop supply chain strategy in which a remanufacturing process is adopted. We develop a model in which all demands are met by production or remanufacturing without backlogs, under the assumption that both the production and remanufacturing setup cost functions are arbitrary and time-varying. We also develop a corresponding genetic algorithm (GA) heuristic and run numerical simulations to test the algorithm's efficiency by comparing it with the Branch and Bound method. The simulation results illustrate the algorithm's accuracy and efficiency on large-scale problems.

Keywords: capacitated production planning; closed-loop supply chain; genetic algorithms; lot sizing (batches); inventory.
1 Introduction
Nowadays, environmental legislation and increasing global uncertainties drive manufacturing firms to adopt appropriate management strategies in their production planning. One strategy for dealing with the Capacitated Production Planning (CPP) problem is to set up a program for collecting and processing used products from customers [1,2]. Possible options for processing returned products, such as batteries, cameras, glass, metal, paper, and computers, include remanufacturing, repairing, recycling, and disposal. Closed-loop supply chains are characterized by the recovery of returned products. Production planning in such a hybrid system is a real challenge, especially under the increasing uncertainties triggered by unexpected attacks, natural disasters, and other risks. Companies engaged in remanufacturing operations face more complicated planning situations than traditional ones, which calls for new production planning approaches to facilitate the scheduling procedure. Recently, some studies on the CPP problem with remanufacturing have been addressed with analytical methods. Savaskan et al. [3] designed a closed-loop supply chain channel structure by choosing the appropriate reverse channel for the collection of used products from customers. Guide and Van Wassenhove [4]

Y. Tan, Y. Shi, and K.C. Tan (Eds.): ICSI 2010, Part II, LNCS 6146, pp. 243–251, 2010.
© Springer-Verlag Berlin Heidelberg 2010
proposed a framework for managing product returns in remanufacturing and analyzed the influence of reuse activities on operational requirements. Souza and Ketzenberg [5] studied the situation where remanufactured and new products are perfect substitutes for satisfying make-to-order demand and share production facilities. Due to the computational complexity of the CPP problem with a general cost function, heuristic approaches have been widely adopted. The aim of this paper is to develop a GA heuristic for the large-scale general CPP problem with remanufacturing. The remainder of this paper is organized as follows. In Section 2, a CPP problem for a closed-loop supply chain is presented. In Section 3, a genetic algorithm heuristic is proposed. A numerical experiment is designed to test the algorithm in Section 4. Finally, in Section 5, the innovations and limitations of this research are discussed.
2 Problem Description
The closed-loop supply chain with remanufacturing can be described as follows: each demand must be satisfied by production, remanufacturing, and/or inventory from previous periods, as shown in Fig. 1.

Fig. 1. Structure of the remanufacturing problem (production x_t and returns U_t feed the inventory of finished products I_t and the inventory of returns i_t; returns are remanufactured at rate r_t or disposed of; demand d_t is served from the finished-product inventory)
In the general CPP-with-remanufacturing model of this paper, demands, capacities, and cost functions are all considered time-varying. Furthermore, the production, remanufacturing, and inventory are all capacitated. In particular, the production and remanufacturing setup costs are time-varying, and there is an extra startup cost in the first period of a series of strictly positive manufacturing (or remanufacturing) periods.

2.1 Notations
Decision variables:
x_t: the quantity of products newly manufactured in period t;
r_t: the quantity of products remanufactured in period t.
Status variables:
i_t: the inventory of reclaimed products at the end of period t;
I_t: the inventory of finished products at the end of period t;
a_t: binary variable indicating the occurrence of a manufacturing setup cost in period t;
b_t: binary variable indicating the occurrence of a remanufacturing setup cost in period t;
\bar{a}_t: binary variable indicating the occurrence of a manufacturing startup cost in period t;
\bar{b}_t: binary variable indicating the occurrence of a remanufacturing startup cost in period t.

Parameters:
U_t: the quantity of returned products in period t;
d_t: the quantity of demand in period t;
\bar{x}_t: the manufacturing capacity in period t;
\bar{r}_t: the remanufacturing capacity in period t;
c_t: the unit manufacturing cost in period t;
v_t: the unit remanufacturing cost in period t;
h_t: the unit inventory cost of a returned product in period t;
H_t: the unit inventory cost of a finished product in period t;
SP_t: the manufacturing setup cost in period t;
SR_t: the remanufacturing setup cost in period t;
ESP_t: the manufacturing startup cost in period t;
ESR_t: the remanufacturing startup cost in period t;
M: a large positive number.

2.2 Model
The model can be formulated as:

\min_{x,r} \sum_{t=1}^{T} [SP_t a_t + ESP_t \bar{a}_t + x_t c_t + SR_t b_t + ESR_t \bar{b}_t + r_t v_t + i_t h_t + I_t H_t]   (1)

s.t.
i_t = i_{t-1} + U_t - r_t,   t = 1, 2, ..., T;   (2)
I_t = I_{t-1} + x_t + r_t - d_t,   t = 1, 2, ..., T;   (3)
0 \le x_t \le \bar{x}_t,   t = 1, 2, ..., T;   (4)
0 \le r_t \le \bar{r}_t,   t = 1, 2, ..., T;   (5)
i_t \ge 0,   t = 1, 2, ..., T;   (6)
I_t \ge 0,   t = 1, 2, ..., T;   (7)
x_t \le a_t M,   t = 1, 2, ..., T;   (8)
r_t \le b_t M,   t = 1, 2, ..., T;   (9)
1 - \bar{a}_t \le M[1 - a_t + a_{t-1}],   t = 1, 2, ..., T;   (10)
1 - \bar{b}_t \le M[1 - b_t + b_{t-1}],   t = 1, 2, ..., T;   (11)
i_0 = 0, I_0 = 0, a_0 = 0, b_0 = 0;   (12)
a_t, \bar{a}_t, b_t, \bar{b}_t \in \{0, 1\},   t = 1, 2, ..., T.   (13)
The objective function (1) minimizes the total of manufacturing costs, remanufacturing costs, inventory costs, setup costs, and startup costs. Constraints (2) and (3) are the balance constraints of the two independent inventory systems. Constraints (4) and (5) are the capacity constraints. Constraints (8) and (9) force the setup cost to be incurred whenever the production quantity in the same period is positive. Constraints (10) and (11) force the startup cost to occur in period t only if the manufacturing (or remanufacturing) batch is zero in period t-1 and strictly positive in period t. Constraint (12) gives the initial conditions for constraints (2), (3), (10), and (11).

In the above model, (2) can be rewritten as:

r_t = U_t + i_{t-1} - i_t,   t = 1, 2, ..., T.   (14)

From (5), (6), and (14), the range of r_t becomes 0 \le r_t \le \min[U_t + i_{t-1}, \bar{r}_t], t = 1, 2, ..., T. Let \tilde{r}_t = \min[U_t + i_{t-1}, \bar{r}_t]; then

\sum_{k=1}^{t} r_k \le \sum_{k=1}^{t} \tilde{r}_k,   t = 1, 2, ..., T.

To ensure that the model has a feasible solution, we need enough available-to-promise capacity. Since from (3) we have

\sum_{k=1}^{t} d_k = \sum_{k=1}^{t} x_k + \sum_{k=1}^{t} r_k - I_t,   t = 1, 2, ..., T,

we obtain the necessary condition for the existence of a feasible solution:

\sum_{k=1}^{t} d_k \le \sum_{k=1}^{t} \bar{x}_k + \sum_{k=1}^{t} \tilde{r}_k,   t = 1, 2, ..., T.   (15)

Equation (15) will be used in our algorithm to test feasibility.
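As an illustration (our sketch, not the authors' code), the feasibility test of Eq. (15) can be implemented as follows; the greedy choice r_t = r̃_t stocks all unused returns and maximizes cumulative remanufacturing:

```python
# Sketch of the feasibility test from Eq. (15): cumulative demand must not
# exceed cumulative manufacturing capacity plus cumulative effective
# remanufacturing capacity r~_t = min(U_t + i_{t-1}, r_bar_t).
def is_feasible(d, x_bar, r_bar, U):
    """d, x_bar, r_bar, U: per-period demand, manufacturing capacity,
    remanufacturing capacity, and returns (lists of equal length T)."""
    cum_d = cum_x = cum_r = 0.0
    i_prev = 0.0  # inventory of returned products
    for t in range(len(d)):
        r_tilde = min(U[t] + i_prev, r_bar[t])
        i_prev += U[t] - r_tilde  # stock the returns not remanufactured
        cum_d += d[t]
        cum_x += x_bar[t]
        cum_r += r_tilde
        if cum_d > cum_x + cum_r:
            return False  # Eq. (15) violated in period t
    return True
```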
3 Algorithm
The algorithm consists of two main steps. First, we generate a binary string that decides the non-zero production periods (manufacturing and remanufacturing). Second, we compute the exact values of the decision variables through dynamic programming. In the rest of this section, the coding scheme, parent selection, crossover, and mutation methods are described.
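The two-step scheme can be sketched as follows (illustrative Python, not the authors' implementation; `decode_cost` is a toy stand-in for the decoding step described later in this section, and truncation survival stands in for the roulette-wheel selection of Section 3.3 for brevity):

```python
import random

# Schematic two-step GA driver: step 1 evolves the 2 x T binary setup
# matrix; step 2 ("decode") would compute x_t, r_t and the total cost.
def decode_cost(chrom):
    return sum(sum(row) for row in chrom)  # toy placeholder for decoding

def random_chrom(T):
    return [[random.randint(0, 1) for _ in range(T)] for _ in range(2)]

def ga(T=12, pop_size=20, generations=50, seed=0):
    random.seed(seed)
    pop = [random_chrom(T) for _ in range(pop_size)]
    for g in range(generations):
        pop.sort(key=decode_cost)              # lower cost = fitter
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = random.sample(survivors, 2)
            cut = random.randrange(1, T)       # single-point crossover
            child = [p1[0][:cut] + p2[0][cut:], p1[1][:cut] + p2[1][cut:]]
            rate = 0.1 * (1 - g / generations)  # decreasing mutation rate
            for row in child:
                for j in range(T):
                    if random.random() < rate:
                        row[j] ^= 1
            children.append(child)
        pop = survivors + children
    return min(pop, key=decode_cost)
```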
3.1 Coding Scheme
In our GA heuristic, the chromosome is coded as a 2 × T binary matrix B in which B[i, j] indicates the setup occurrence of production method i in period j. In matrix B, the first row shows the manufacturing setup status and the second row shows the remanufacturing setup status. Since the values of x_t and r_t cannot be obtained directly from the chromosome, a decoding process in which x_t and r_t are heuristically computed is presented in Section 3.4. In order to develop a time-efficient GA heuristic, two criteria constrain an individual chromosome: (1) production in the first period must be positive if the demand in the first period is positive; (2) the length of a run of successive "0" periods must satisfy the necessary condition that the production cost in the first "1" period plus the holding cost incurred over the following successive "0" periods is less than the production cost of the next "1" period.
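A hedged sketch of these two repair criteria (our reading of the text; the single unit-cost vector c and holding-cost vector H are illustrative stand-ins for the period costs of the production method in question):

```python
# Check the two chromosome-admissibility criteria on a 2 x T matrix B.
def valid_chromosome(B, d, c, H):
    # Criterion (1): positive period-1 demand needs a period-1 setup
    # (the initial finished-product inventory I_0 is zero).
    if d[0] > 0 and B[0][0] == 0 and B[1][0] == 0:
        return False
    # Criterion (2): a run of zero periods after a '1' period t1 is only
    # admissible if producing ahead at t1 and holding through the run is
    # cheaper than producing at the next '1' period t2.
    for row in B:
        ones = [t for t, v in enumerate(row) if v == 1]
        for t1, t2 in zip(ones, ones[1:]):
            if t2 - t1 > 1 and c[t1] + sum(H[t1:t2]) >= c[t2]:
                return False
    return True
```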
3.2 Fitness Function
An individual with a higher fitness level has a lower total cost. In this study, we use a relative fitness function Fit_R to measure fitness within the population of the current generation. Fit_R is defined as:

Fit_R = (Obj_max - Obj) / (Obj_max - Obj_min),
where Obj is the objective value of the candidate concerned, Obj_max is the objective value of the worst-fit candidate in the current generation, and Obj_min is that of the best-fit candidate. Note that we compare candidates across different generations using the objective function only.
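The relative fitness of a generation can be computed as in the following sketch (ours):

```python
# Relative fitness within one generation: lower total cost maps to
# higher fitness in [0, 1].
def relative_fitness(objs):
    worst, best = max(objs), min(objs)
    if worst == best:
        return [1.0] * len(objs)  # degenerate generation: all equal
    return [(worst - o) / (worst - best) for o in objs]
```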
3.3 Genetic Operators
In this research, we use three standard genetic operators: parent selection, crossover, and mutation. We adopt the roulette-wheel method for selection because it tends to promote diversity in the chromosome population and is thus conducive to avoiding premature convergence of the GA. The crossover adopted in this paper is standard single-point crossover. For mutation, we use a decreasing mutation rate for better convergence in the later generations.
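A minimal sketch of these operators (ours; the exact mutation schedule is not given in the paper, so the linear schedule below is illustrative):

```python
import random

# Roulette-wheel parent selection over relative fitness values.
def roulette_select(population, fitness, rng=random):
    total = sum(fitness)
    if total == 0:                      # all equally unfit: pick uniformly
        return rng.choice(population)
    pick = rng.uniform(0, total)
    acc = 0.0
    for ind, fit in zip(population, fitness):
        acc += fit
        if pick <= acc:
            return ind
    return population[-1]

# Decreasing mutation rate for better late-stage convergence.
def mutation_rate(gen, max_gen, initial=0.05):
    return initial * (1 - gen / max_gen)
```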
3.4 Computing x_t and r_t
It has been proved that the optimal solution of the uncapacitated lot-sizing problem features zero-inventory production [6]. We use (·, ·)_t to represent the t-th column of matrix B: (1, 0)_t means manufacture only, (0, 1)_t means remanufacture only, and (1, 1)_t means both. We adopt Bitran and Yanasse's (1982) method to calculate the corresponding values of x_t and r_t.
For each non-zero production period, the manufacturer needs to produce enough to cover the demands of the following zero-production periods. We define the cumulative demand D_{t1}, the quantity produced in period t1, as D_{t1} = \sum_{k=t1}^{t2-1} d_k, where t1 is a non-zero production period and t2 is the next non-zero production period. Next we revise D_{t1} so that no overflow happens: the production quantity is constrained by the production capacity and the inventory capacity. Failures such as machine breakdown or material shortage may cause a dramatic decrease in capacity. Intuitively, there are two approaches to stabilize the production system: (I) increasing the stock level of the finished products, and (II) activating the other production method to relieve the productivity shortage. We design our revision algorithm as follows. Suppose the productivity shortage in period t is E_t, and the previous non-zero production period (manufacturing or remanufacturing) is t'.

1. If (1, 1)_t, compare the total costs of Strategy (I) and Strategy (II); the one with the lower cost is chosen for the productivity supplement. The cost of Strategy (I) is

Cost_I = { E_t c_{t'} + \sum_{k=t'}^{t-1} E_t H_k,   if x_{t'} \succ r_{t'};
           E_t v_{t'} + \sum_{k=t'}^{t-1} E_t H_k - \sum_{k=t'}^{T} E_t h_k,   if r_{t'} \succ x_{t'}, }

where \succ means "is superior to" in terms of cost. The cost of Strategy (II) is

Cost_II = { E_t c_t,   if manufacturing is activated;
            E_t (v_t - \sum_{k=t}^{T} h_k),   if remanufacturing is activated. }

2. If (1, 0)_t or (0, 1)_t, the other production method is constrained to zero, which means only Strategy (I) is available.

Note that the overflow fixing process is iterative: revision continues whenever a new overflow occurs. Recall that (15) gives a necessary condition for a feasible solution without overflow, so it is natural that some infeasible individuals cannot be fixed. In these cases, an extra penalty is added to lower the infeasible individual's fitness.
4 Numerical Experiment
The algorithm is coded in MS Visual C++ 6.0 and tested on a PC with a 1.66 GHz Intel Core 2 Duo CPU. The demand, manufacturing capacity, reclaimed quantity, and remanufacturing capacity in each period are randomly generated from the uniform distributions U(50, 100), U(50, 150), U(50, 150), and U(50, 150), respectively. The setup costs of manufacturing and remanufacturing are 500 and 550, respectively. The extra startup costs of manufacturing and remanufacturing
are both set to 200. The unit production costs of manufacturing and remanufacturing in each period are drawn from the uniform distributions U(4.5, 5) and U(5, 5.5), respectively. The unit finished-product inventory cost and the unit reclaimed-product inventory cost are set to 1 and 0.5, respectively. We test our algorithm in four steps:
1. Generate the parameters from the distributions above and check feasibility using Equation (15). If there is no feasible solution, regenerate the parameters.
2. Run the algorithm with different sets of GA operation parameters, i.e., the population size, the number of generations, and the initial mutation rate, and choose the best among the different sets.
3. Analyze the results using statistical methods to test the reliability of the proposed algorithm.
4. Run Branch and Bound and test the accuracy of the proposed algorithm.

Table 1 shows the running results for different combinations of population size and number of generations, with 50 runs for each combination. We use the relative standard deviation (RSD) to measure the stability of our algorithm. To reduce the bias in estimating the RSD, the average of all 600 results is used as the denominator. From Table 1, we can observe that when the population size and the number of generations are set to 200 and 2000 or more, respectively, the RSD is less than 0.50%, which illustrates the stability of the algorithm.

Table 1. Running results for different combinations of population size and number of generations

Population size | Number of generations | Average time (s)
 50 |  500 |  1.65
 50 | 1000 |  3.70
 50 | 2000 |  7.82
 50 | 4000 | 15.56
100 |  500 |  4.38
100 | 1000 |  8.77
100 | 2000 | 17.05
100 | 4000 | 33.99
200 |  500 | 11.56
200 | 1000 | 22.17
200 | 2000 | 40.67
200 | 4000 | 79.55
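The RSD computation described above can be sketched as follows (using the grand mean of all runs as the denominator, per the text):

```python
import statistics

# RSD as used in Table 1: sample standard deviation of one combination's
# runs divided by the grand mean of all 600 runs (reduces denominator bias).
def rsd_percent(runs, grand_mean):
    return 100.0 * statistics.stdev(runs) / grand_mean
```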
Table 2 describes the performance of the proposed algorithm at different problem scales, in terms of accuracy and efficiency. From Table 2, it can be observed that even when the problem scale reaches 200, the proposed algorithm remains stable, with an RSD under 0.34%, meaning that the computed solutions vary only within a small range around the optimum.
Table 2. Performance at different problem scales

Problem scale | Total cost (worst) | Total cost (best) | Total cost (average) | RSD (%) | Average time (s)
 20 |  18277.42 |  18264.14 |  18285.64 | 0.24 |   8.545
 50 |  47087.91 |  46413.15 |  46618.52 | 0.33 |  24.088
100 |  96467.02 |  94610.2  |  95854.58 | 0.33 |  75.543
200 | 196436.40 | 192333.70 | 194733.49 | 0.34 | 272.61
Fig. 2 shows the convergence curves at different problem scales. The vertical axis represents the total cost and the horizontal axis the generations. Intuitively, the larger the problem scale, the more generations are required for convergence.
Fig. 2. Convergence curves
Table 3 compares the proposed algorithm with Branch and Bound (BB). We use the error level, i.e., the relative offset from the optimum, to measure the inaccuracy of the proposed algorithm against the optimal solution computed by BB.

Table 3. Comparison with BB

Problem size | Heuristic avg. cost | Heuristic best cost | Error level | Avg. time (s) | BB optimal cost | BB avg. time (mm:ss)
10 |  9254.32 |  9252.36 | 0.02% |  4.48 |  9252.36 | 00:03
20 | 18277.42 | 18264.14 | 0.46% |  8.55 | 18194.20 | 00:58
30 | 27886.68 | 27871.88 | 0.57% | 15.32 | 27728.13 | 52:08

From Table 3 we can observe that when the problem
scale increases to 30, BB cannot solve the problem in a reasonable time, while our algorithm takes only 15.32 s with a tolerable error.
5 Conclusions
In this paper, a model of the capacitated production planning problem in a closed-loop supply chain is formulated, in which remanufacturing is regarded as a supplement that increases the supply chain's robustness. The model considers two independent inventories and incorporates startup costs, which occur at the beginning of a series of successive production periods. A problem-specific GA is proposed to solve the problem, and statistical results are provided in the numerical experiment. The comparison with BB illustrates the accuracy and efficiency of the proposed algorithm, and also implies that for small problems (usually fewer than 20 periods) BB is the first choice, while otherwise the proposed algorithm is.
References
1. Fleischmann, M., Bloemhof-Ruwaard, J.M., Dekker, R., van der Laan, E., van Nunen, J.A.E.E., Van Wassenhove, L.N.: Quantitative models for reverse logistics: A review. European Journal of Operational Research 103(1), 1–17 (1997)
2. Hammond, D., Beullens, P.: Closed-loop supply chain network equilibrium under legislation. European Journal of Operational Research 183(2), 895–908 (2007)
3. Savaskan, R.C., Bhattacharya, S., Van Wassenhove, L.N.: Closed-loop supply chain models with product remanufacturing. Manage. Sci. 50(2), 239–252 (2004)
4. Guide, V.D.R., Van Wassenhove, L.N.: Managing product returns for remanufacturing. Production and Operations Management 10, 142–154 (2001)
5. Souza, G.C., Ketzenberg, M.E., Guide, V.D.R.: Capacitated remanufacturing with service level constraints. Production and Operations Management 11(2), 231–248 (2002)
6. Wagner, H.M., Whitin, T.M.: Dynamic version of the economic lot size model. Manage. Sci. 5, 89–96 (1958)
Distributed Hierarchical Control for Railway Passenger-Dedicated Line Intelligent Transportation System Based on Multi-Agent

Jingdong Sun¹, Yao Wang², and Shan Wang³

¹ Department of Traffic and Transportation, Southwest Jiaotong University, No. 1 Jingqu Road, Emeishan, Sichuan 614202, China
[email protected]
² Department of Computer Science and Communication Engineering, Southwest Jiaotong University, No. 1 Jingqu Road, Emeishan, Sichuan 614202, China
[email protected]
³ Department of Electronic Information Engineering, Jincheng Campus, Sichuan University, No. 1 Xiyuan Road, Chengdu, Sichuan 611731, China
[email protected]
Abstract. Through a detailed analysis of the control problem of a railway passenger-dedicated line intelligent transportation system, a distributed hierarchical control method is proposed that blends the artificial division of a hierarchical system with the natural division of a distributed control system. The method is based on the hierarchical control theory of large-scale systems and comprises three levels: the organization level, the coordination level, and the execution level. With the objectives of decreasing the total delay time and increasing the absorption of passengers by successive trains, a Multi-Agent method for train operation adjustment is developed, and a rule-based control strategy is proposed. The distributed hierarchical control method handles train planning, train coordination, and train action control in different layers: the organization level responds to real-time changes in transport demand and determines the planned order of the trains, the coordination level resolves the underlying conflicts between trains in the system, and the execution level controls the actions of the trains. The simulation results show the effectiveness of the model and of the revised Multi-Agent algorithm compared with traditional control theory.

Keywords: railway passenger-dedicated line, intelligent transportation system, distributed hierarchical control, Multi-Agent, dispatch.
and is prone to analytical description. A distributed control system, by contrast, has the following characteristics: geographic distribution, dispersed control functions, and centralized management and operation [1]. The train operation process of an intelligent railway passenger-dedicated line is a complex human-computer cooperation process involving qualitative and quantitative, temporal and spatial, and locally linear and globally non-linear analysis. The system has many input and output variables, and most parameters are non-linear, time-varying, uncertain, and spatially distributed [2]. For such a discrete-event dynamic system, train operation adjustment is very difficult, and it is hard to recover delayed trains quickly using traditional control techniques such as optimization methods, scheduling rules, or simulation. A distributed hierarchical intelligent control system based on Multi-Agent, which blends the artificial division of a hierarchical system with the natural division of a distributed control system, can better solve the above problems. It coordinates a set of decentralized intelligent units (agents), each with its own objectives, autonomous behaviors, and possibly sub-units, to solve problems. Since no single unit can solve the global problem, the units must be coordinated.
2 Model Establishment

Train operation adjustment is the process of rescheduling the train timetable, through various adjustment measures, to keep most trains running on time when the actual train operation status deviates from the planned schedule. Train operation adjustment in a railway passenger-dedicated line intelligent transportation system must adopt a real-time control mode, which, given the characteristics of passenger-dedicated line operation, takes decreasing the total delay time and increasing the absorption of passengers by successive trains as its performance guidelines.

Suppose there are n stations in district A-B; the station set is S = {1, 2, 3, ..., n}, the corresponding section set is Q = {1, 2, 3, ..., n-1}, and the set of trains running in the district is L = {1, 2, 3, ..., m}. The planned arrival time of train l (l ∈ L) at station k (k ∈ S) is A_{k,l} and the actual arrival time is A'_{k,l}; the planned departure time of train l from station k is D_{k,l} and the actual departure time is D'_{k,l}. The minimal running time of train l between stations k and k+1 is t_{k,k+1}^l, the additional starting time at station k is τ_q^k, and the additional stopping time at station k is τ_t^k. The planned stop time of train l at station k is T_{k,l} and the actual stop time is T'_{k,l}. The minimal train tracking interval time is I, the minimal arrival headway for trains at station k is I_d^k, and the minimal departure interval for trains at station k is I_f^k. The skylight (maintenance window) starting and stopping times of section i are TCB_i and TCE_i, and the number of arrival and departure tracks at station k is G_k. Define level(l) as the level of train l: the smaller the value, the higher the train level. Define ω(j) as the operation adjustment weight of level j, with j = level(l): the higher the train level, the bigger the value of ω(j). The optimization model of the trains' total delay time is formula (1) [3]; the optimization model of passenger waiting time is formula (2):

min F = \sum_{k=1}^{n} \sum_{l=1}^{m} ω(level(l)) [(A'_{k,l} - A_{k,l}) + (D'_{k,l} - D_{k,l})]   (1)

min P = \sum_{k=1}^{n-1} \sum_{l=1}^{m-1} level(l) [(A_{k,l+1} - A_{k,l} - T_{k,l}) - (A'_{k,l+1} - A'_{k,l} - T'_{k,l})]   (2)
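Formula (1) can be evaluated as in the following sketch (the dictionary-based containers and the weight map are our assumptions):

```python
# Total weighted delay F of formula (1). A, A_act, D, D_act are dicts keyed
# by (station k, train l); w maps a train's level to its weight omega.
def total_delay(trains, stations, level, w, A, A_act, D, D_act):
    return sum(
        w[level[l]] * ((A_act[k, l] - A[k, l]) + (D_act[k, l] - D[k, l]))
        for k in stations for l in trains
    )
```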
Formula (3) is the constraint on section running times; formula (4) is the constraint on station working times; formula (5) is the constraint on station tracking intervals; formula (6) is the constraint on train overtaking; formula (7) is the constraint on departure times; formula (8) is the constraint on the capacity of a station's arrival and departure tracks, where u(x, t) is the binary step function with u(x, t) = 1 for x ≤ t and u(x, t) = 0 for x > t; and formula (9) is the skylight-time constraint.
3 Multi-Agent Characteristics and Structure of Train Operation Adjustment

The characteristics of a railway passenger-dedicated line train agent include autonomy, communication, reaction, oriented-objective, facing-environment, and so on. Autonomy indicates that an agent has control over its own behavior and completes its special task autonomously, without external interference. Communication indicates that each agent accepts task assignments or feeds back task-execution information through intercommunication within the organized train group. Reaction indicates that an agent can perceive its environment and react to it. Oriented-objective indicates that an agent can evaluate its own behavior and gradually move toward its targets;
Facing-environment indicates that an agent works only in its special environment, realizing safe and reliable train operation through communication between neighboring agents or between an agent and the dispatcher agent. According to the above requirements, the Multi-Agent System (MAS) uses a compound structure with both a cognitive process and a reactive process, as shown in Fig. 1. The external environment mainly includes dispatching-section status information, train operation adjustment decisions, and so on. The perceptron receives environment information through sensors and carries out preprocessing and feature identification. The reactor judges the information from the perceptron and, in an emergency or simple state, directly starts the performer to control train operation; in a non-emergency state, the performer is usually started only after the decider makes a decision.
Fig. 1. Multi-Agent compound structure of train operation adjustment
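The reactive/deliberative split of this compound structure can be sketched as a toy dispatch cycle (state names and the decider interface are illustrative, not from the paper):

```python
# Toy sketch of the compound agent cycle: the reactor handles emergency or
# simple states directly; otherwise the decider plans before the performer
# acts. Returns a string describing the action taken.
def agent_step(state, decider):
    if state in ("emergency", "simple"):
        return "react:" + state      # reactor -> performer directly
    return "plan:" + decider(state)  # decider -> performer
```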
4 Distributed Hierarchical Control Structure of Train Operation Adjustment

Railway passenger-dedicated line operation demands high speed, high density, high punctuality, and high safety, so transport task decomposition and train coordination are considered at different levels through a distributed hierarchical control structure comprising three levels: the organization level, the coordination level, and the execution level (see Fig. 2). The organization level, located in the top layer of the MAS, receives and translates input instructions and related system feedback, determines the task, and decomposes it into sub-tasks with an appropriate execution order; it is realized by the administration computer. The coordination level, located in the middle layer of the MAS, receives instructions from the organization level and feedback information during the execution of each sub-task, and coordinates the execution course with the execution level; it is realized by the operating station and monitoring computers. The execution level, located in the bottom layer of the MAS, executes specific movements; this requires knowledge of the process mathematical model, the process end state, and the performance criterion or cost function defined by the coordinator, and it is realized by the on-site controllers.
Fig. 2. Distributed hierarchical control structure
4.1 Organization Level

The organization level represents the dominant thought of the control system and is governed by artificial intelligence [4]. Railway passenger-dedicated line operation adopts a rule-based control strategy.
Rule 1: among trains of the same grade, on-time trains depart first and delayed trains later; trains of the same grade depart in order of earliest departure time; trains with special requests depart first.
Rule 2: among trains of different grades, a lower-grade train must not cause knock-on delays to higher-grade trains; higher-grade trains may accept knock-on delays from lower-grade trains within a certain time limit; some higher-grade trains can accept such delays within a limit, while other trains have absolute priority.
Rule 3: lower-grade trains cannot overtake higher-grade ones, and trains of equal grade cannot overtake each other.
Rule 4: if delays are widespread, global adjustment is performed; otherwise, local adjustment is performed. Global adjustment offsets the planned train diagram, shortening the adjustment time by reducing the number of trains. Local adjustment adopts a rolling-horizon optimization method: every adjustment improves the operation of the whole system, after which the system situation is reappraised and a new set of adjustment measures is made, ultimately recovering the planned train diagram after several adjustments.

4.2 Coordination Level

The coordination level is the interface between the organization level and the execution level and plays a connecting role. The essence of train operation adjustment is the process of resolving conflicts between delayed trains and other trains. In a railway passenger-dedicated line, based on where and how a train conflict happens, there are three basic types: section conflicts, station headway conflicts, and station track conflicts.
Section conflicts are of two kinds: departure conflicts and arrival conflicts. Methods for resolving conflicts (except station track conflicts) include reordering train releases and shifting waiting trains. (1) Confirming the conflict type and the solving method: based on the characteristics of high-speed train operation and the algorithm's recursive, time-ordered solution process, train events, from the solving point of view, include departure conflicts, arrival conflicts, and arrival-departure conflicts; that is, a train conflict is confirmed in the order departure conflict, arrival conflict, arrival-departure conflict. If a
certain conflict is confirmed, confirmation stops, the conflict is defined as that of the two train events, and its nature is reported. (2) Confirming the conflict type when several train events happen simultaneously: when several train events conflict simultaneously, the combination of conflicts becomes complicated. Considering all possible identities and matchings would increase the computation and make it impossible to guarantee the real-time character of the algorithm. This paper therefore adopts a strategy of gradually resolving the conflicts, which overcomes the above difficulties, as follows.

Define train l occupying section i as a train event, denoted e_i^l. When it conflicts with several events, suppose Q_i^l = {e_i^{q_1}, e_i^{q_2}, ..., e_i^{q_h}} (q_h ∈ Q), and let D_{il}^q (q ∈ {q_1, q_2, ..., q_h}) be the discharge function of the conflict between train events e_i^l and e_i^q. When D_{il}^q = 1, train event e_i^l is discharged first; otherwise, when D_{il}^q = 0, train event e_i^q is discharged first. Δe_i^l is the movement for train event e_i^l, while Δe_i^q is the movement for train event e_i^q. Let D_i^l be the first-discharge function of train event e_i^l:

D_i^l = D_{il}^{q_1} D_{il}^{q_2} ⋯ D_{il}^{q_h}.   (10)

If D_i^l = 1, event e_i^l is discharged first; otherwise D_i^l = 0 and e_i^l must wait.

The conflict-resolution steps are as follows:
Step 1: For each train-event conflict in the conflict set, ∀q ∈ {q_1, q_2, ..., q_h}, compute D_{il}^q, Δe_i^l, and Δe_i^q separately.
Step 2: Compute D_i^l by formula (10). When D_i^l = 1, event e_i^l is discharged first and the movement for the conflicting event e_i^q is e_i^q + Δe_i^q. Otherwise, when D_i^l = 0, the current event e_i^l waits to give way, and its movement is e_i^l + Δe, where

Δe = min { Δe_{il}^q : Δe_{il}^q ≠ {0, 0} }.   (11)

Formula (11) means that, among all the coming conflicts of train event e_i^l, the waiting time chosen for e_i^l is minimal. The purpose is to avoid the confusion caused by repeatedly re-solving train events when the happening times of the train events change later.
Formula (11) means that, when all the train events eil in the coming conflicts, the time for event eil is the minimal. The purpose is to avoid the confused phenomena by repeat solving train events, when the happening time of each train event changes in future. 4.3 Execution Level Execution level is the lowest layer of the distributed hierarchical intelligent control system, which requires higher precision and lower intelligence. It controls by control theory, and execute appropriate control function to correlated course. In execution level, Agent directly faces controlled objects and detected objects, and it queries the state of the controlled train by the task of coordination level and the decision results. The central control subsystem obtains the passenger information from each station
and the state information of each train as the inputs, computes the control instruction for each train, and sends it to each train in real time to accomplish the transportation task.
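The two-step discharge rule of Section 4.2 can be sketched in code. This is a hedged toy implementation: the event names, discharge values D_ilq and movement quantities below are illustrative assumptions, not the paper's actual data structures.

```python
# Hedged toy sketch of the two-step conflict-discharge rule of Section 4.2.
# The event names, discharge values D_ilq and movement quantities below are
# illustrative assumptions, not the paper's actual data structures.

def resolve_conflicts(D, moves_self, moves_other):
    """D[q] is the pairwise discharge function D_ilq (0 or 1) for each event
    e_iq conflicting with e_il; moves_self[q] / moves_other[q] are the
    movements Delta_e_il / Delta_e_iq worked out for that pairwise conflict."""
    # Step 2, formula (10): e_il is discharged first only if it wins
    # every pairwise comparison, i.e. the product of all D_ilq equals 1.
    D_il = 1
    for q in D:
        D_il *= D[q]
    if D_il == 1:
        # e_il goes first; each conflicting event e_iq is shifted by Delta_e_iq.
        return "discharge e_il first", {q: moves_other[q] for q in D}
    # Formula (11): e_il waits, taking the minimal nonzero movement.
    delta = min(m for m in moves_self.values() if m != 0)
    return "e_il waits", {"e_il": delta}

decision, shifts = resolve_conflicts({"q1": 1, "q2": 0},
                                     {"q1": 3, "q2": 2},
                                     {"q1": 1, "q2": 4})
print(decision, shifts)  # e_il waits {'e_il': 2}
```

Resolving pairwise decisions first and only then combining them via the product in (10) is what keeps the cost linear in the number of conflicting events, which is the real-time property the paper emphasizes.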
5 Model Simulation

In order to verify the validity of the algorithm, a simulation is made in the Matlab environment. Taking the transport organization of the passenger dedicated line from Zhengzhou to Xi'an as an example, the simulation conditions are supposed as follows:
Fig. 3. Objective function evolution process
The total length is 511 km, the number of stations is n = 10, and L = 24 trains are planned, including 12 up-direction and 12 down-direction trains. The train levels are [1 1 1 2 2 4 3 2 1 3 2 2], in which level 1 represents passenger dedicated trains with speed above 250 km/h; level 2 represents crossing-over high-speed trains with a speed of 200 km/h; level 3 represents middle-speed trains with a speed of 160 km/h; and level 4 represents middle-speed trains with a speed of 120 km/h. The minimal running times of trains of different levels and the corresponding minimal operating times refer to literature [5]; the additional stopping time is 1 min, the additional starting time is 2 min, the station tracking train headway time is 4 min, the station arrival and departure tracks matrix is [3 2 2 3 2 3 2 2 3 3], the skylight (maintenance window) time is 0~6 h, the weights of trains of different levels are [4 3 2 1], the population size is 30, the crossover probability is 0.8, the mutation probability is 0.001, and the maximum number of generations is 200. With the genetic algorithm so configured, the evolution of the objective function is shown in Fig. 3. The test results show that the distributed hierarchical control method based on Multi-Agent can efficiently complete train operation adjustment in disturbed situations.
6 Conclusion

The train is an object with distributed characteristics in both logic and physics, and the train operation dispatch system is one of the important measures to improve railway transport
efficiency and keep the safety and punctuality of train transportation. The transport task of a railway passenger-dedicated-line intelligent transport system has the characteristics of uncertainty, complexity and spatial distribution, and the thought and method of large-scale system control is an effective way to solve its control problem. Multi-Agent technology, which possesses autonomy, reactivity and goal-oriented characteristics, is used to design a train operation adjustment algorithm for the railway passenger-dedicated line based on the distributed hierarchical control method. By the methods of decomposition and coordination, and of local and global adjustment, the cooperation ability and initiative of every part of the control system can be improved, the total delay time of the train group can be decreased, the attractiveness of the successive trains to passengers can be increased, and ultimately the automation level of railway train dispatching is improved.

Acknowledgments. Sponsored by the Promising Star Project-Distinguished Teacher Backbone of Young Teachers, Southwest Jiaotong University (2009).
References
1. Wang, J.P., Chen, H.: Study of Agent-based Distributed Hierarchical Intelligent Control. Control and Decision 16(2), 177–180 (2001)
2. Dong, H.Y., Jia, S.F.: Distributed Intelligent Control of Train Based on Multi-agent. Journal of the China Railway Society 26(5), 61–65 (2004)
3. Wang, H.N.: Study on Passenger-special Line Train Regulation Model and Algorithm. Master's Thesis, Southwest Jiaotong University, 16–21 (2006)
4. Cai, Z.X.: Structures for Hierarchical Intelligent Control Systems. Control and Decision 14(6), 642–647 (1999)
5. Zhao, S.X., Dang, J.W.: Study on Chaos-improved Genetic Algorithm for Passenger-dedicated Lines Train Operation Adjustment. Computer Engineering and Applications 45(9), 220–222 (2009)
GA-Based Integral Sliding Mode Control for AGC

Dianwei Qian¹, Xiangjie Liu¹, Miaomiao Ma¹, and Chang Xu²

¹ School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, P.R. China
[email protected]
² College of Energy and Electricity, Hohai University, Nanjing 210098, P.R. China
Abstract. This paper addresses an integral sliding mode control approach for automatic generation control (AGC) of a single area power system. A genetic algorithm (GA) is employed to search the parameters of the sliding surface. The proposed design is investigated for AGC of a single area power system made up of reheated thermal and gas power generation. Compared with GA-based proportional-integral (PI) control, simulation results show the feasibility of the presented method.
1 Introduction
Automatic generation control (AGC) is one of the most important issues in the operation and design of contemporary power systems [1]. The primary objectives of AGC are to adjust the power output of the electrical generators within a prescribed area in response to changes in system frequency and tie-line loading (for interconnected areas), so as to maintain the scheduled system frequency and the interchange with the other areas within predetermined limits [2]. A large number of approaches to the AGC problem have been presented in the last two decades, e.g., optimal control [3], variable structure control [4], adaptive control [5], robust control [6], and intelligent control [7]. In the referenced literature, the AGC problem is categorized as a single area with a thermal or hydro power source, or interconnected double areas with thermal-thermal power sources. In this paper, we focus on a single area with reheated thermal and gas power sources, which is rarely treated in the above references. With the increase in size and complexity of power systems, a prescribed area may contain a large number of generation sources of various kinds, which makes our research interesting on practical accounts [8].
Integral sliding mode control (ISMC) [9] is a robust feedback control method possessing the property that the order of its motion equation equals the order of the original system. The technique avoids the chattering phenomenon of conventional sliding mode control while preserving the robustness and accuracy provided by the sliding mode. However, the parameters of the sliding surface of ISMC have to be selected by trial and error during the design process. Genetic algorithm (GA) is a searching strategy inspired by natural evolution behavior, pointing out a gateway to free the designer from this time-consuming business. Thus, the combination of ISMC and GA provides a good candidate for solving the AGC problem of a single area with multiple sources.

Y. Tan, Y. Shi, and K.C. Tan (Eds.): ICSI 2010, Part II, LNCS 6146, pp. 260–267, 2010.
© Springer-Verlag Berlin Heidelberg 2010
2 Power System Models
The power system for the AGC problem under consideration is subjected only to relatively small changes, so it can be adequately represented by the linear model in Fig. 1. Figure 1 shows the block diagram of a single area system with multi-source power generation. The total generation comes from reheated thermal and gas power generating units equipped with speed governors, which represent the thermal and gas generating units lumped together in this prescribed area, respectively.
Fig. 1. Linear model of a single area power system
The symbols in Fig. 1 are explained as follows: a and c are constants of the valve positioner, b is the time constant of the valve positioner, X is the gas turbine speed governor lead time constant, Y is the gas turbine speed governor lag time constant, Tgf is the fuel time constant, Tgcr is the combustion reaction time delay, Tgcd is the compressor discharge volume time constant, Kg is the gas power generation contribution, Rg is the speed governor regulation parameter of the gas unit, Tthg is the steam turbine speed governor time constant, Kthr is the coefficient of the reheater steam turbine, Tthr is the reheater time constant, Ttht is the steam turbine time constant, Kth is the thermal power generation contribution, KPS is the power system gain constant, TPS is the power system time constant, Rth is the speed governor regulation parameter of the reheated thermal unit, ΔPGth is the thermal power deviation, ΔPGg is the gas power deviation, ΔPG is the total power deviation, ΔPd is the load disturbance, ΔF is the frequency deviation, ΔPCth is the AGC control signal (speed-changer position change) of the reheated thermal unit, and ΔPCg is the AGC control signal (speed-changer position change) of the gas unit.
It is obvious that the plants for the AGC in the area consist of three parts:

– Reheated thermal turbine and its governor, with dynamics Gthg(s) = 1/(Tthg s + 1) and Gtht(s) = (1 + s Kthr Tthr) / ((1 + s Tthr)(1 + s Ttht))
– Valve positioner, fuel system, gas turbine and its generator, with dynamics Gvp(s) = a/(bs + c), Gfc(s) = (Xs + 1)/(Ys + 1), Ggt(s) = 1/(1 + Tgcd s) and Ggg(s) = (1 − Tgcr s)/(1 + Tgf s)
– Power system, with dynamics Gps(s) = KPS/(TPS s + 1)
Assume there is no mismatch between generation and load under normal operating conditions. The total generation is determined by

    PG = PGth + PGg    (1)

where PGth = Kth PG, PGg = Kg PG, and Kth + Kg = 1. The values of Kth and Kg depend upon the total load and also involve economic load dispatch. For a small perturbation, (1) can be written as

    ΔPG = ΔPGth + ΔPGg    (2)

In Fig. 1, both the AGC control signals ΔPCth and ΔPCg will be produced by two integral sliding mode controllers, designed in the following section.
3 GA-Based Integral Sliding Mode Controller

3.1 Design of Integral Sliding Mode Controller
In the integral sliding mode method, a sliding surface should be constructed from the system state variables in the state space, so the state space models of the above single area with the two generation sources should be transformed from their transfer functions. Without loss of generality, the state space expression of the AGC problem of a single area with one generation source can be depicted as

    Ẋ = AX + Bu + f(X)    (3)

where X is the n-dimensional state vector, u is the control scalar produced by the integral sliding mode controller, A is the n × n state matrix, B is the n × 1 input vector, and f(X) is the nonlinear perturbation vector with a known upper bound vector f0 > 0. Equation (3) is a simple model associated with the AGC problem of a single area with multiple generation sources in this paper. From Fig. 1, we can get a state space model of the form (3) by separating one generation unit from the other. For example, letting Kth = 0, we obtain the transfer function model of a single area with a gas generation unit, from which the state space model of that system can be derived.
For the control design, we define the control input u as

    u = uic + urp    (4)

where uic is the predetermined ideal control, denoting the state trajectory of the system Ẋ = AX + Buic (e.g. uic may be obtained through linear static feedback control uic = −k^T X), and urp is a discontinuous term to reject the perturbation.
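The transformation from cascaded transfer functions to a state space model of the form (3) can be sketched as follows. This is a hedged illustration: the helper names and the example time constants are assumptions, and the controllable companion form shown merely mirrors the structure of the matrices reported in Sect. 4.

```python
import numpy as np

# Hedged sketch: building a companion-form state-space model of type (3) for
# one generation unit by multiplying out the denominators of its cascaded
# transfer functions. The example time constants below are illustrative; the
# resulting structure (negated coefficients in the first row, shifted
# identity below) mirrors the matrices Ath and Ag reported in Sect. 4.

def series_den(*dens):
    """Multiply the denominator polynomials of a cascade (highest power first)."""
    d = np.array([1.0])
    for den in dens:
        d = np.polymul(d, np.asarray(den, dtype=float))
    return d

def companion(den):
    """Controllable companion form x' = Ax + Bu for a denominator polynomial."""
    den = np.asarray(den, dtype=float)
    den = den / den[0]            # make the polynomial monic
    n = len(den) - 1
    A = np.zeros((n, n))
    A[0, :] = -den[1:]            # first row: negated monic coefficients
    A[1:, :-1] = np.eye(n - 1)    # shifted identity on the subdiagonal
    B = np.zeros(n)
    B[0] = 1.0
    return A, B

# e.g. governor 1/(0.08s+1), turbine lag 1/(0.3s+1), power system 1/(11.43s+1)
A, B = companion(series_den([0.08, 1.0], [0.3, 1.0], [11.43, 1.0]))
print(A[0])  # first row carries the cascade's pole polynomial, as in Ath
```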
Then, the sliding surface s is defined as

    s = s0(X) + z    (5)

where s0 may be designed as a linear combination of the system states, s0 = C^T X (similar to the conventional sliding surface), and z introduces the integral term, determined by ż = −C^T(AX + Buic), z(0) = −C^T X(0).

Theorem 1. Consider the single area power system with a generation unit as (3). If the control law and the integral sliding surface are defined as (4) and (5), then the sliding motion of the nominal system of (3) will occur at t = 0.

Proof. Let f(X) = 0 in (3); its nominal system can be written as

    Ẋ = AX + Bu    (6)

From (5), we can get the sliding surface as

    s = s0(X) + z = C^T X + ∫₀ᵗ [−C^T(AX + Buic)] dt    (7)

When the sliding mode occurs, we have

    ṡ = C^T Ẋ − C^T(AX + Buic) = 0    (8)

which means the motion equation of the sliding surface coincides with the nominal system dynamics. Further, at t = 0, we have s(0) = C^T X(0) + z(0) = C^T X(0) − C^T X(0) = 0. Thus, the sliding mode of the nominal system occurs at t = 0.

Theorem 2. Consider the single area power system with a generation unit as (3); define the control law (4), the integral sliding surface (5) and the discontinuous term urp = −ρs − σ sign(s) (ρ > 0, σ > 0). If C^T Bσ ≥ C^T f0 is satisfied, then the integral sliding surface is asymptotically stable.

Proof. Define the Lyapunov function as V = s²/2. Differentiating V with respect to time t, we obtain

    V̇ = s ṡ    (9)

Substituting (3), (4) and (5) into (9), we have

    V̇ = s ṡ = s(ṡ0 + ż)
       = s[C^T Ẋ − C^T(AX + Buic)]
       = s{C^T [AX + Bu + f(X)] − C^T(AX + Buic)}
       = s{C^T [AX + B(uic + urp) + f(X)] − C^T(AX + Buic)}
       = s{C^T Burp + C^T f(X)}
       = s{C^T B[−ρs − σ sign(s)] + C^T f(X)}
       = −C^T Bρs² − C^T Bσ|s| + C^T f(X)s
       ≤ −C^T Bρs² − C^T Bσ|s| + C^T f0 s < 0    (10)

Thus, the sliding surface s with the integral term is asymptotically stable.
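A minimal numerical sketch of the control law (4) with the integral sliding surface (5) on a toy second-order plant may make the construction concrete. The matrices, gains and perturbation below are illustrative assumptions, not the paper's AGC model.

```python
import numpy as np

# Hedged numerical sketch of the control law (4) with the integral sliding
# surface (5) on a toy second-order plant. The matrices A, B, the gains and
# the perturbation are illustrative assumptions, not the paper's AGC model.

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([0.0, 1.0])
k = np.array([1.0, 1.0])      # ideal feedback u_ic = -k^T x (nominal design)
C = np.array([1.0, 1.0])      # sliding surface vector C^T
rho, sigma = 0.5, 1.0         # discontinuous term u_rp = -rho*s - sigma*sign(s)

def perturb(t):
    # bounded matched perturbation f(X); |C^T f| = 0.3 < C^T B * sigma = 1
    return np.array([0.0, 0.3 * np.sin(t)])

dt = 0.001
x = np.array([1.0, 0.0])
z = -C @ x                    # z(0) = -C^T X(0), hence s(0) = 0 (Theorem 1)
for i in range(20000):        # 20 s of Euler simulation
    t = i * dt
    s = C @ x + z             # integral sliding surface (5)
    u_ic = -k @ x
    u = u_ic - rho * s - sigma * np.sign(s)      # control law (4)
    z += dt * (-C @ (A @ x + B * u_ic))          # z' = -C^T (A x + B u_ic)
    x = x + dt * (A @ x + B * u + perturb(t))
print(x, s)  # x decays toward the origin while s stays pinned near zero
```

Because s(0) = 0 by construction, the sliding phase starts immediately, which is exactly the point of Theorem 1; the condition on sigma mirrors the inequality of Theorem 2.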
3.2 Parameter Tuning by Genetic Algorithm
Genetic algorithm (GA) is a searching strategy inspired by natural evolution behavior. Each individual, consisting of a set of parameters to be tuned, can be represented by a chromosome. A simple GA includes individual selection, mutation and crossover steps. The selection from the whole population is based on each individual's fitness; a roulette selection strategy is adopted in the following comparison. The mutation randomly flips a gene bit to its opposite. The crossover exchanges part of the information between two individuals. After the genetic operations, new individuals are generated to form a new population.
The fitness mapping is a key problem for the genetic learning process. The reciprocal of the integral squared error (ISE) of the system states is selected as the individual fitness

    J = 1 / ∫₀^∞ { Σ_{i=1}^{n} x_i²(t) } dt    (11)

Here x_i is the ith element of the n-dimensional state vector X. A good individual corresponds to a small objective value, i.e. a big fitness. As the genetic operation goes on, the individual maximum fitness and the population average fitness increase steadily.
In our simulations, we find that the controller parameters can be searched out by using such a simple genetic algorithm, but they vary greatly with different crossover probabilities, mutation probabilities, and population sizes. The controller parameters often converge to different results in different experiments, which may not be an optimized solution and may even be a false solution. For the AGC problem in this paper, some modifications are therefore proposed on the basis of the simple genetic algorithm. A large crossover probability and a small mutation probability ensure population diversity and prevent premature convergence of the maximum individual fitness, so the crossover fraction and mutation fraction are set to 0.95 and 0.05, respectively. Elitist individual reservation is applied to ensure that the maximum fitness keeps increasing and to prevent fluctuation of the maximum fitness caused by the large crossover probability.
From (5), the design of the sliding surface with integral term can be summarized as the process of finding a suitable vector C^T. Further, since uic can be designed as the state feedback controller uic = −C^T X, we can conclude, for system stability, the constraint that the eigenvalues of (A − BC^T) be negative. To accelerate the search, we preset σ and ρ before the optimization by GA.
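The modified GA described above (roulette selection, elitist reservation, large crossover and small mutation fractions) can be sketched as follows. This is a hedged toy: the scalar plant, the ISE fitness surrogate and the parameter bounds are assumptions made only so the loop is runnable, not the paper's AGC setup.

```python
import numpy as np

# Hedged sketch of the modified GA: roulette selection, elitism, crossover
# fraction 0.95 and mutation fraction 0.05. The scalar plant and parameter
# bounds are toy assumptions, not the paper's AGC model.

def simulate_ise(c, a=-1.0, b=1.0, x0=1.0, dt=0.01, steps=500):
    """Integral squared error of the scalar system x' = a*x + b*u, u = -c*x."""
    x, ise = x0, 0.0
    for _ in range(steps):
        x += dt * (a * x - b * c * x)
        ise += dt * x * x
    return ise

def ga_tune(pop_size=30, gens=40, pc=0.95, pm=0.05, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(0.0, 5.0, pop_size)        # candidate feedback gains c
    for _ in range(gens):
        fit = np.array([1.0 / (1e-9 + simulate_ise(c)) for c in pop])
        elite = pop[np.argmax(fit)]              # elitist reservation
        probs = fit / fit.sum()                  # roulette-wheel selection
        parents = rng.choice(pop, size=pop_size, p=probs)
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):      # arithmetic crossover
            if rng.random() < pc:
                w = rng.random()
                children[i], children[i + 1] = (w * parents[i] + (1 - w) * parents[i + 1],
                                                (1 - w) * parents[i] + w * parents[i + 1])
        mut = rng.random(pop_size) < pm          # small random mutation
        children[mut] += rng.normal(0.0, 0.5, mut.sum())
        children = np.clip(children, 0.0, 5.0)
        children[0] = elite                      # keep the best individual
        pop = children
    fit = np.array([1.0 / (1e-9 + simulate_ise(c)) for c in pop])
    return pop[np.argmax(fit)]

best = ga_tune()
print(best)  # larger gains decay faster here, so the GA pushes c upward
```

The elitist copy in each generation is what guarantees the monotone growth of the maximum fitness that the text relies on.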
4 Simulation Results
In this section, we demonstrate the application of the presented GA-based integral sliding mode control to the AGC problem of a single area with reheated thermal and gas generation units, as shown in Fig. 1. The values of the parameters of this power system are taken from [8] as follows.
– Reheated thermal generation unit: Tthg = 0.08 s, Ttht = 0.3 s, Tthr = 10 s, Kthr = 0.3, Rth = 2.4 Hz/puMW
– Gas generation unit: X = 0.6, Y = 1.0, a = 1, b = 0.05, c = 1, Tgf = 0.23 s, Tgcr = 0.01 s, Tgcd = 0.2 s, Rg = 2.4 Hz/puMW
– Power system: KPS = 68.57 Hz/puMW, TPS = 11.43 s for the operating load 1750 MW; KPS = 75 Hz/puMW, TPS = 12.5 s for the operating load 1600 MW

For verifying the robustness of the integral sliding mode controller, we employ the parameters at the operating load 1750 MW as the design point and the parameters at the operating load 1600 MW as the checking point.
Let the system output fth = [0 0 74.99 24.99] × [x1th x2th x3th x4th]^T, where the xith (i = 1, 2, 3, 4) are the state variables. For the reheated thermal generation unit, the corresponding values of the state matrix Ath and the input vector Bth are

    Ath = [ −16.02  −44.66  −39.19  −10.78
               1        0       0       0
               0        1       0       0
               0        0       1       0 ],   Bth = [1 0 0 0]^T

Similarly, the values of the state matrix Ag and the input vector Bg for the gas generation unit are

    Ag = [ −30.43  −240.70  −657.66  −1132.37  −1124.76
              1        0        0         0         0
              0        1        0         0         0
              0        0        1         0         0
              0        0        0         1         0 ],   Bg = [1 0 0 0 0]^T

where the system output fg = [0 0 −15.65 1538.99 2608.22] × [x1g x2g x3g x4g x5g]^T, and the xig (i = 1, ..., 5) are the state variables.
Due to the relatively small changes of the AGC problem in Fig. 1, we set σ = 2 and ρ = 0.3, which is enough to resist perturbations of up to 30% according to (10), far beyond the small changes. Utilizing the modified GA with the constraint Ath − Bth Cth^T < 0, and adopting the two-point crossover method and the uniform mutation function, we obtain the optimized Cth^T over four consecutive experiments, shown in Table 1, and average each element of Cth^T to get the parameters of the integral sliding mode controller of the reheated thermal generation unit.
Similarly, Table 2 shows the optimized Cg^T over four consecutive experiments by the modified GA with the constraint Ag − Bg Cg^T < 0; each element of Cg^T is averaged to obtain the parameters of the integral sliding mode controller of the gas generation unit. As shown in both tables, the controller parameters converge to similar results with the modified GA.
Employing the searched Cth^T and Cg^T, the simulation results in Fig. 2 show the frequency deviation for a 1% step load disturbance at the operating load 1750 MW, where Kth = 1 and Kg = 0 in Fig. 2(a), and Kth = 71.43% and Kg = 28.57% in
Fig. 2(b). To verify the robustness of the GA-based integral sliding mode control method for the AGC problem of the single area with multiple sources, Fig. 3 shows the frequency deviation for a 1% step load disturbance at the operating load 1600 MW with the same AGC controller parameters, where Kth = 1 and Kg = 0 in Fig. 3(a), and Kth = 71.43% and Kg = 28.57% in Fig. 3(b). Compared with the GA-based PI controllers, it is obvious that the ISMC method decreases the overshoot in both Fig. 2 and Fig. 3.
Fig. 2. Frequency deviation of the single area at the setting load point 1750 MW
Fig. 3. Frequency deviation of the single area at the checking load point 1600 MW
5 Conclusions
This paper designs an integral sliding mode control approach for the AGC problem of a single area power system with multiple-source power generation. The stability of the integral sliding surface is proven as well. GA is employed to search the two groups of sliding surface parameters of both AGC controllers. Simulation results show the controller's feasibility.
Acknowledgements This work was supported by the NSFC Projects (No.60904008, 60974051), the Fundamental Research Funds for the Central Universities (No. 09MG19, 09QG29), the National 863 Program (No. 2007AA05Z445).
References
1. Saadat, H.: Power System Analysis. McGraw-Hill, New York (1999)
2. Kundur, P.: Power System Stability and Control. McGraw-Hill, New York (1994)
3. Mariano, S., Pombo, J., Calado, M., Ferreira, L.: Optimal output control: Load frequency control of a large power system. In: Proceedings of the International Conference on Power Engineering, Energy and Electrical Drives, vol. 2, pp. 369–374 (2009)
4. Al-Hamouz, Z.M., Al-Duwaish, H.N.: A new load frequency variable structure controller using genetic algorithms. Electric Power Systems Research 55, 1–6 (2000)
5. Shoults, R.R., Jativa Ibarra, J.A.: Multi-area adaptive LFC developed for a comprehensive AGC simulator. IEEE Transactions on Power Systems 8, 541–547 (1993)
6. Yu, X.F., Tomsovic, K.: Application of linear matrix inequalities for load frequency control with communication delays. IEEE Transactions on Power Systems 19, 1508–1515 (2004)
7. Çam, E.: Application of fuzzy logic for load frequency control of hydroelectrical power plants. Energy Conversion and Management 48, 1281–1288 (2007)
8. Ramakrishna, K.S.S.: Automatic generation control of single area power system with multi-source power generation. Proceedings of the Institution of Mechanical Engineers, Part A: Journal of Power and Energy 222, 1–11 (2008)
9. Utkin, V., Shi, J.X.: Integral sliding mode in systems operating under uncertainty conditions. In: Proceedings of the 35th IEEE Conference on Decision and Control, vol. 4, pp. 4591–4596 (1996)
Stable Swarm Formation Control Using Onboard Sensor Information

Viet-Hong Tran and Suk-Gyu Lee

Department of Electrical Engineering, Yeungnam University, Gyeongsan, Korea
[email protected], [email protected]
Abstract. In this paper, a stable leader-following formation control for multiple mobile robot systems with limited sensor information is studied. The proposed algorithm controls a robot (follower) to follow another robot (leader), and is easily extended to form any complex formation. The control algorithm requires only information available from onboard sensors, and utilizes an estimate of the leader's acceleration in a simple form to reduce the measurement of indirect information. A rule for tuning the control parameters in applications is also given.

Keywords: formation control, leader-following control, swarm robotics, stability, nonlinear.
by a local sensor carried by the follower robot. It must be estimated from positioning measurements, which tends to enhance measurement noise dramatically, and estimation of the absolute speed would be difficult because the estimated speed is at the same time required in the robot's own speed control.
In this paper, we propose a formation control algorithm for multiple nonholonomic mobile robot systems which considers the acceleration in a simple form to eliminate measuring the leader's velocity, and which controls stably even when the leader has a complex trajectory. The algorithm is stable, and all errors in the relative states converge to zero. A tuning rule is provided to adjust the parameters of the controller.
2 Problem Statement

In this section, we discuss the problem of designing control algorithms for mobile robots moving in formation. Our scope is just switching and maintaining the formation; formation protocols for coordinating and organizing the grouped robots to accomplish the formation task are not our purpose. The system is divided into subsystems of two robots each, where one robot has the role of leader and the other the role of follower. The control rule is applied to these pairs of robots, which makes it very easy to extend the scale of the system without changing the main algorithm.

2.1 Formation

Every formation has a global leader that is responsible for the formation's collective motion. Each of the other robots is assigned a local leader among its neighbors (which may be the same as the global leader) such that the resulting formation structure is connected. The objective of each robot is to keep the relative distance and angle to its assigned local leader as close as possible to some desired reference values. Implicitly, relative distances and angles to the other agents will also be kept fixed. This sort of formation control can be used for a wide range of complex formations, as shown in the two formation examples in Fig. 1.
Fig. 1. Leader–follower formation in various forms (a) and (b)
In Fig. 1(a), the formation is an arrow form with one global leader R0. R1, R2, and R3 are followers of R0, while R3 itself is the leader of R4. Another formation of the robots, in which R0 and R1 are two leaders, is shown in Fig. 1(b). By changing the relative distance and orientation between each pair of robots (R2–R0, R1–R0, R3–R1, R4–R3), the formation can be changed easily from form (a) to form (b).

2.2 Formation Control Framework

We now consider a system of n mobile robots, R0, ..., Rn−1, where R0 is the global leader. Each of the remaining robots lines up with its own leader (an adjacent robot). The problem we study deals with wheeled mobile robots with two degrees of freedom. The dynamics of the ith robot is described by the unicycle model as follows:
    ẋi = vi cos θi
    ẏi = vi sin θi    (1)
    θ̇i = ωi

or, in matrix form, [ẋi, ẏi, θ̇i]^T = [cos θi, sin θi, 0]^T vi + [0, 0, 1]^T ωi.
where vi is the linear velocity and ωi is the angular velocity of the mobile robot, θi is the angle between the heading direction and the x-axis, and (xi, yi) are the Cartesian coordinates of the center of mass of the vehicle (see Fig. 2). The problem for a system of n robots can be considered as a series of problems for each pair of robots. Therefore, we analyze the basic leader-follower formation of two robots as shown in Fig. 2; every complex formation can be a combination of these basic formations. In this configuration, Ri is the leader robot and Rk is the follower robot. Let dk,i denote the actual distance between Ri and Rk, and ϕk,i the actual bearing angle from the orientation of Rk to the d-axis (the axis connecting Ri and Rk). The definition of the formation requires that the distance between Ri and Rk equal dk0 and that the bearing angle from the orientation of Rk to the d-axis be ϕk0.
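The unicycle model (1) can be integrated numerically; a minimal Euler-step sketch (the step size and the constant velocities are illustrative assumptions) is:

```python
import math

# Minimal Euler-integration sketch of the unicycle model (1). The step size
# and the constant velocities are illustrative assumptions.

def step_unicycle(x, y, theta, v, omega, dt=0.1):
    """Advance the pose (x, y, theta) by one Euler step of model (1)."""
    return (x + dt * v * math.cos(theta),
            y + dt * v * math.sin(theta),
            theta + dt * omega)

pose = (0.0, 0.0, 0.0)
for _ in range(100):  # 10 s of motion with v = 1 m/s and omega = 0.2 rad/s
    pose = step_unicycle(*pose, v=1.0, omega=0.2)
print(pose)  # the robot traces an arc of radius v/omega = 5 m
```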
Fig. 2. Basic leader-follower formation
Fig. 3. Basic leader-follower formation in details
Based on the configuration and those definitions, the dynamics of the system is:

    ḋk,i = −vk cos(ϕk,i) + vi cos(ϕk,i + Δθk)    (4)

    ϕ̇k,i = −ωk + vk sin(ϕk,i)/dk,i − vi sin(ϕk,i + Δθk)/dk,i    (5)
    Δθ̇k = ωk − ωi    (6)
where Δθk = θk − θi is the difference between the headings of the two robots. The objective of the leader–follower control is therefore stated as follows:

Problem: Given vi(t) and ωi(t), find controls vk(t) and ωk(t) such that

    dk,i → dk0,  ϕk,i → ϕk0,  Δθk → 0  as t → ∞,  ϕk0 ≤ π/2.    (7)
In order to solve the problem, we choose a reference point (xk0, yk0) (see Fig. 2) on the direction whose angular deviation from the orientation of the follower robot is ϕk0, at a distance dk0 from the center of the follower:

    [xk0; yk0] = [xk; yk] + [cos(θk + ϕk0); sin(θk + ϕk0)] dk0    (8)
With the definition of this reference point, we obtain the desired control of the robot by controlling (xk0, yk0) towards the position of the leader (xi, yi). That is, we simultaneously control the distance and the orientation angle so that the relative distance dk,i and the relative bearing angle ϕk,i approach dk0 and ϕk0, respectively.
3 Proposed Control Algorithm

Fig. 3 shows the formation with some supporting information used to derive the control. As t → ∞, we would like (xk0(t), yk0(t)) to converge to (xi(t), yi(t)), i.e. the point C needs to approach A. On the dk0-axis, the change of position from C to A is mirrored as the change from C to D. The component on the dk0-axis of the relative velocity between vk and vi performs this task:

    vk cos(ϕk0) − vi cos(ϕk0 + Δθk) = K1 {dk,i cos(Δϕk) − dk0}    (9)

where Δϕk = ϕk,i − ϕk0, and K1 is a positive constant.
Next, in order to find ωk, we need to consider the ak-axis, as shown in Fig. 3; the ak-axis is perpendicular to vk. Similarly to the way of calculating vk, the change of position on this axis (L→M) is caused by the rotation with angular velocity ωk and by vi. We have:

    dk,i cos(ϕk0) · ωk + vi sin(Δθk) = K1 {dk,i sin(ϕk,i) − dk0 sin(ϕk0)}    (10)

The above results can be shown quite straightforwardly. In fact, let ex = xk0 − xi and ey = yk0 − yi; then we have:

    ėx = −K1 ex  and  ėy = −K1 ey    (11)

This implies that ex and ey converge exponentially to zero, with an exponential rate of K1. As a consequence, the dimension of K1 is [s⁻¹]. Apparently, for any given rate of convergence K1, the control actions in (9) and (10) will grow large as ϕk0 approaches π/2 and become singular at ϕk0 = π/2. In fact, it seems that no controller
can handle the whole range from 0 to π/2, so for angles close to π/2 one should switch to another controller to avoid saturation, see e.g. [17]. One problem is that the above control rule, like most other controllers in the literature, requires the measurement of the leader's speed vi. However, using only onboard sensors to accurately measure the leader's speed is practically impossible, in particular if the speed is supposed to be fed back into the robot's own speed regulation as in (9) and (10). Therefore, we estimate vi from Fig. 3. The estimated leader's velocity ve has the same direction as vi. As time goes on, when the stable formation state is reached, ve equals vi, i.e. ve → vi as t → ∞. This time, we consider the behavior on the vi-axis. The change of ve over time on this axis is related to the change of position on this axis (C→A). If the difference between A and C decreases, the velocity ve does not change much, because the state of the system is approaching the stable state, and vice versa. From this idea, we propose a method to estimate vi as follows:

    v̇e = K2 {dk,i cos(ϕk,i + Δθk) − dk0 cos(ϕk0 + Δθk)}    (12)

where K2 > 0 is a constant whose dimension is [s⁻²]. In summary, the proposed leader-follower control is:
    v̇e = K2 {dk,i cos(ϕk,i + Δθk) − dk0 cos(ϕk0 + Δθk)}

    vk = K1 [dk,i cos(Δϕk) − dk0] / cos(ϕk0) + ve cos(ϕk0 + Δθk) / cos(ϕk0)    (13)

    ωk = K1 [dk,i sin(ϕk,i) − dk0 sin(ϕk0)] / (dk,i cos(ϕk0)) − ve sin(Δθk) / (dk,i cos(ϕk0))

In this case, the role of v̇e is like that of an acceleration. K2 also decides the convergence rate of ve to vi. There is a relation between K1 and K2, which will be mentioned in the following section.

Theorem: Suppose that the motion of the leader robot Ri satisfies the following condition: vi(t) ≥ v0 > 0, v̇i(t) ∈ L2[0, ∞), ωi(t) ∈ L2[0, ∞). Then, with the control (13), where we let

    K1² > K2 dk0    (14)
as t → ∞ we will have globally dk,i → dk0 and ϕk,i → ϕk0. Furthermore, Δθk → 0 from almost all initial conditions.

Proof: Because of the existence of ve, let Δve = ve − vi; then (11) changes to ėx = −K1 ex + Δve cos(θi) and ėy = −K1 ey + Δve sin(θi). Letting ēx = ex sin θi − ey cos θi and ēy = ex cos θi + ey sin θi, then using the Lyapunov function
    V = K (ēx)² + (ēy − (1/K1) Δve)² + (1/K1²) (Δve)²
where K > 0 is sufficiently large, we can easily show that V̇ ≤ 0 and that ēx(t), ēy(t) and Δve(t) globally converge to zero exponentially if vi > 0 (see more in [6]). Besides, by the well-known classical results on input-to-state stability (see [18]), it is obvious that there is only a finite set of initial conditions that will not result in Δθk → 0 for all k.
In addition, a larger K1 makes convergence faster and reduces the steady error. However, it is not appropriate to have the time constant 1/K1 comparable to the sampling time of the robot's hardware: with a too large K1, the control system tends to be oscillatory and unstable even in its stop state. We have to make a tradeoff when choosing the coefficients.
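Putting the pieces together, the control (13) with the estimator (12) can be simulated against the unicycle model (1). This is a hedged sketch: the gains, initial poses, the leader's straight-line motion and the Euler integration are all illustrative assumptions, not the paper's simulation setup.

```python
import math

# Hedged sketch: the proposed control (13) plus the velocity estimator (12),
# closed around the unicycle model (1). Gains, initial poses, the leader's
# straight-line motion and the Euler integration are illustrative assumptions.

def follower_control(dki, phiki, dtheta, ve, dk0, phik0, K1):
    """Return (vk, wk) from control law (13)."""
    c0 = math.cos(phik0)
    vk = (K1 * (dki * math.cos(phiki - phik0) - dk0) / c0
          + ve * math.cos(phik0 + dtheta) / c0)
    wk = (K1 * (dki * math.sin(phiki) - dk0 * math.sin(phik0)) / (dki * c0)
          - ve * math.sin(dtheta) / (dki * c0))
    return vk, wk

def simulate(T=40.0, dt=0.02, K1=0.5, K2=0.1, dk0=2.0, phik0=math.pi / 4):
    # Condition (14) holds here: K1**2 = 0.25 > K2 * dk0 = 0.2
    xi, yi, thi = 0.0, 0.0, 0.0     # leader pose, moving straight at 1 m/s
    xk, yk, thk = -3.0, -3.0, 0.0   # follower pose
    ve = 0.0                        # estimated leader speed
    for _ in range(int(T / dt)):
        dx, dy = xi - xk, yi - yk
        dki = math.hypot(dx, dy)                 # measured distance d_{k,i}
        phiki = math.atan2(dy, dx) - thk         # measured bearing phi_{k,i}
        dtheta = thk - thi                       # heading difference
        vk, wk = follower_control(dki, phiki, dtheta, ve, dk0, phik0, K1)
        ve += dt * K2 * (dki * math.cos(phiki + dtheta)
                         - dk0 * math.cos(phik0 + dtheta))  # estimator (12)
        xi += dt * math.cos(thi); yi += dt * math.sin(thi)  # leader update
        xk += dt * vk * math.cos(thk); yk += dt * vk * math.sin(thk)
        thk += dt * wk                                      # follower update
    return dki, phiki

d, phi = simulate()
print(d, phi)  # expected to settle near dk0 = 2.0 and phik0 = pi/4
```

Note that only quantities measurable by onboard sensors (dk,i, ϕk,i, Δθk) and the internal estimate ve enter the control, which is the point of the design.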
4 Simulations and Analysis

In order to show the validity, quality and feasibility of the proposed leader-follower control method, several simulations were carried out. The time step is chosen as 0.2 s, of which 0.1 s is assumed for measurement and communication, and 0.1 s for driving and transportation. First, a simulation with control (13) was performed to tune the coefficients of the controller. The follower is required to keep a distance of d_{k0} = 2√2 m and bearing angle ϕ_{k0} = π/4 to the leader. The initial states of the leader robot and follower robot are [x1, y1, θ1] = [0, 0, 0] and [x2, y2, θ2] = [-3, -3, 0]. K₁ takes the four values 1.5, 1.0, 0.1 and 0.001, while K₂ is chosen as K₁²/(4 cos²(ϕ_{k0})), as suggested in [6]. The simulated trajectories of the leader robot as seen from the follower after 10000 time steps are shown in Fig. 4. When the formation is in the desired state, the leader seen from the follower is just one point at distance 2√2 m and bearing angle π/4 (marked as the target point in Fig. 4). Since the initial state is not correct yet, the control (13) drives the leader towards the target point in different ways depending on K₁ and K₂. Fig. 4(a) and Fig. 4(b) show oscillation, a fast convergence rate, and a small steady-state error; the leader reaches the target point in both cases. Conversely, in Fig. 4(c) and Fig. 4(d), the convergence rate is slow. Although the leader's position approaches the target point almost in a straight line, it stays far from the target most of the time. In particular, the steady-state error is large when K₁ = 0.001 (Fig. 4(d)). This confirms the tuning rule mentioned above. In this simulation, the time step is 0.2 s. The time constant 1/K₁ in case (a) is 0.67 s and in case (b) is 1 s. Those values are comparable with the time step (sampling time), and therefore the controller output oscillates.
1/K₁ should be chosen tens of times greater than the sampling time in order to balance convergence rate, oscillation and steady-state error. In the second simulation, we demonstrate the ability to apply the algorithm to a multi-robot system. Fig. 5 shows that 4 robots can keep the diamond formation (R1 is the only leader) while the leader R1 moves. The initial poses of R1, R2, R3, and R4 are (0, 0, 0), (-3, 3, π/4), (-6, 0, 0), and (-3, -3, -π/4) respectively. The distances between robots are (d20, d30, d40) = (3√2 m, 6 m, 3√2 m), and the bearing angles between each pair of robots are (ϕ20, ϕ30, ϕ40) = (-π/4, 0, π/4).
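The tuning rules used in this simulation (K₂ derived from K₁, and the time-constant check against the sampling time) can be written down directly; `margin` is our illustrative choice for "tens of times":

```python
import math

def k2_from_k1(K1, phi_k0):
    # Relation suggested in [6]: K2 = K1^2 / (4 cos^2(phi_k0)).
    return K1**2 / (4.0 * math.cos(phi_k0)**2)

def time_constant_ok(K1, dt, margin=10.0):
    # Heuristic from the text: 1/K1 should be tens of times the sampling time dt.
    return (1.0 / K1) >= margin * dt
```

For K₁ = 1.5 and ϕ_{k0} = π/4 this gives K₂ = 1.125, while 1/K₁ ≈ 0.67 s fails the check against the 0.2 s time step, which is exactly the oscillatory case (a).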
274
V.-H. Tran and S.-G. Lee
[Figure 4: four panels (a)–(d), each plotting the leader's trajectory in the follower's coordinate frame (x-axis vs. y-axis, roughly −3.5 to −1), with the target point marked at (−2, −2).]
Fig. 4. Trajectory of leader robot viewed from the follower’s coordinate with tuning parameters: (a) K1 = 1.5 , (b) K1 = 1, (c) K1 = 0.1, (d) K1 = 0.001
Moreover, if we change the relative distance and bearing angle between the robots, the formation can be changed as shown in Fig. 6. The diamond form (form A) of the above simulation is changed to a line form (form B: R2 follows R1, R3 follows R2, R4 follows R3) which has (d20, d30, d40) = (3 m, 3 m, 3 m) and (ϕ20, ϕ30, ϕ40) = (π/6, π/6, π/6).
Fig. 5. Diamond formation of 4 robots
Fig. 6. Transformation from diamond formation to line formation
5 Conclusions

In this paper, a stable leader-follower control for swarm formation was proposed. Simulations have shown that stability of the control algorithm can be achieved by tuning the parameters properly. The algorithm works well at any scale of formation. The formation can also be changed easily and stably just by adjusting the relative distance and bearing angle between each pair of robots. Moreover, the leader's velocity is estimated by a simple approximation of acceleration using only available onboard sensor data; as a consequence, the number of measurements is reduced, the measurement error is smaller and the computation time is shorter.
References

1. Liu, B., Chu, T., Wang, L., Xie, G.: Controllability of a leader–follower dynamic network with switching topology. IEEE Trans. Autom. Control 53, 1009–1013 (2008)
2. Xu, W.B., Chen, X.B.: Artificial moment method for swarm robot formation control. Sci. China Ser. F-Inf. Sci. 51, 1521–1531 (2008)
3. Reynolds, C.W.: Flocks, herds, and schools: A distributed behavioral model. Computer Graphics 21, 25–34 (1987)
4. Lawton, J.R.T., Beard, R.W., Young, B.J.: A decentralized approach to formation maneuvers. IEEE Trans. Robot. Autom. 19(6), 933–941 (2003)
5. Das, A.K., Fierro, R., Kumar, V., et al.: A vision-based formation control framework. IEEE Trans. Robot. Autom. 18, 813–825 (2002)
6. Gustavi, T., Hu, X.: Observer-based leader-following formation control using onboard sensor information. IEEE Transactions on Robotics 24, 1457–1462 (2008)
7. Wang, J., Wu, X., Xu, Z.: Potential-based obstacle avoidance in formation control. J. Control Theory Appl. 6, 311–316 (2008)
8. Barnes, L.E., Fields, M.A., Valavanis, K.P.: Swarm formation control utilizing elliptical surfaces and limiting functions. IEEE Trans. on Systems, Man, and Cybernetics, Part B: Cybernetics 39, 1434–1445 (2009)
9. Tanner, H.G., Jadbabaie, A., Pappas, G.J.: Flocking in teams of nonholonomic agents. Lect. Notes Contr. Inf., pp. 229–239. Springer, Berlin (2005)
10. Warburton, K., Lazarus, J.: Tendency-distance models of social cohesion in animal groups. Journal of Theoretical Biology 150, 473–488 (1991)
11. Lewis, M.A., Tan, K.H.: High precision formation control of mobile robots using virtual structures. Auton. Robots 4, 387–403 (1997)
12. Egerstedt, M., Hu, X., Stotsky, A.: Control of mobile platforms using a virtual vehicle approach. IEEE Trans. Autom. Control 46, 1777–1782 (2001)
13. Desai, J.P.: A graph theoretic approach for modeling mobile robot team formation. J. Robot Syst. 19, 511–525 (2002)
14. Fierro, R., Das, A.K.: A modular architecture for formation control. In: Proceedings of the 3rd Int. Workshop on Robot Motion and Control, Poznan, pp. 285–290. IEEE Press, Los Alamitos (2002)
15. Kang, W., Xi, N., Zhao, Y., Tan, J., Wang, Y.: Formation control of multiple autonomous vehicles - Theory and experimentation. Intell. Autom. Soft Comput. 10(2), 1–17 (2004)
16. Tanner, H.G., Pappas, G.J., Kumar, V.: Leader-to-formation stability. IEEE Trans. Robot. Autom. 20(3), 443–455 (2004)
17. Gustavi, T., Hu, X., Karasalo, M.: Robust formation adaptation using on-board sensor information. In: 24th Chinese Control Conference, Guangzhou, pp. 1782–1788 (2005)
18. Khalil, H.: Nonlinear Systems, 2nd edn. Prentice Hall, New Jersey (1996)
A Distributed Energy-aware Trust Topology Control Algorithm for Service-Oriented Wireless Mesh Networks

Chuanchuan You 1, Tong Wang 1,2, BingYu Zhou 2, Hui Dai 3, and Baolin Sun 1

1 School of Computer, Hubei University of Economics, Wuhan, China 430205
2 School of Computer, Wuhan University, Wuhan, China 430072
3 School of Engineering, ChangJiang Professional College, Wuhan, China 430074
[email protected]
Abstract. In this paper, we introduce the Energy-aware Trust Topology Control algorithm based on an ant colony approach (ETTC), which adapts the biological metaphor of swarm intelligence to control the topology of wireless mesh networks. As trust is important to consider while forwarding packets, this paper proposes a novel model that integrates energy consumption and trust evaluation. Simulations of ETTC show the joint effect of energy awareness and trust on performance metrics such as network connectivity, node failure rate, etc.

Keywords: WMN; ant colony; topology control; energy model; trust.
A Distributed ETTC Algorithm for Service-Oriented WMNs
277
One strategy to improve the security of ad hoc networks is to develop mechanisms that allow a node to evaluate the trustworthiness of other nodes. Trust routing aims to solve this problem [4]. However, most such schemes do not consider the energy metric, which is related to the trust model [5]. There are also only a few works on the joint study of connectivity and trust in topology control algorithms. In this paper we adapt the biological metaphor of swarm intelligence to design a novel distributed topology control algorithm termed Energy-aware Trust Topology Control (ETTC) for mobile ad hoc networks. In the rest of the paper, we first briefly review the related work on topology control and trust routing in Section 2. Section 3 details our system model, which combines energy awareness and trust evaluation; Section 4 presents the ETTC protocol; Section 5 evaluates the efficiency of our ant colony algorithm; and conclusions are drawn in the final section.
2 Related Works

In [4] Yannis Stelios proposed a trust model and pointed out that trust models which detect malicious nodes based on direct and indirect evidence can cause additional energy consumption, but the paper does not analyze the energy in the model quantitatively. Reference [6] proposes a secure routing protocol (Ambient Trust Sensor Routing, ATSR) which takes into account the remaining energy of each neighbor and the exchange of indirect trust information, thus allowing for better load balancing and network lifetime extension. The authors of [7] point out that in current TC schemes the transmission range of each node is mostly taken as the exclusive estimator of its energy consumption, while the amount of data it forwards is ignored; such schemes, coupled with the popular shortest-path routing, usually create a highly loaded area at the center of the network in which nodes deplete their batteries very quickly. In [8] the idea is to utilize some non-neighbor nodes to reduce the probability that packets end up in a local minimum during greedy forwarding, thereby improving the routing performance of existing geographic routing algorithms; the scheme is called Small World Topology-Aware. The work most related to ours is [9], which studies the TC problem from a service-oriented perspective. Compared to existing works that concentrate mainly on the basic connectivity of the underlying network graph, the main aim of that paper is to maximize the overall throughput, provide satisfactory end-to-end delay, and enhance the security and reliability of communications. However, trust is not considered there either.
3 System Model Description

3.1 Assumptions

1) Nodes are connected if they are neighbors.
2) Nodes move randomly according to the random waypoint mobility model [10].
3) The capacity of every link is binary and homogeneous; that is, a link either exists at a specified capacity or it does not exist.
4) Channel and radio resources of the MANET are sufficient, so a link can be built whenever two nodes enter each other's radio coverage.
3.2 Network Model

Our network model is very similar to the one used in [7]. We assume an ad hoc network which consists of a set of wireless nodes that are uniformly distributed with density d within a circle C_OR of center O and radius R. Each node sends messages to any other node with an average uniform rate k per flow. The initial transmission range of all nodes is the same and is configurable to a real value T. An arbitrary node A can directly communicate with any other node within the distance of its transmission range. The transmission range T, along with the geographical positions of the nodes in the network, represents the topology as an undirected graph in which there is a link between any pair of nodes which can communicate directly.

3.3 Energy Model

We are given a set V of transceivers, with |V| = n, each equipped with an omnidirectional antenna which is responsible for sending and receiving signals. An ad hoc network is established by assigning a transmission power p_u to each transceiver u ∈ V. Each node can (possibly dynamically) adjust its transmitting power, based on the distance to the receiving nodes and on the background noise. In the most common power attenuation model [11], the signal power falls off as 1/d^ε, where d is the distance from the transmitter and ε is the path loss exponent (typical values of ε are between 2 and 4). Under this model, the power required at node u for supporting the transmission through a link from u to v is given by

p_u ≥ d_uv^ε · q_v    (1)

where d_uv is the Euclidean distance between the transmitter u and the receiver v, and q_v is the receiver's power threshold for signal detection, which is usually normalized to 1. We assume that the power required for supporting a transmission between nodes u and v separated by a distance d_uv is e(u, v) = d_uv^ε. Communication from node u to node v is enabled whenever p_u ≥ e(u, v). Therefore, the transmission graph associated with a power assignment p_u for each transceiver u ∈ V is defined as the directed graph G(p) = (V, E(p)), where E(p) = {(u, v) : u ∈ V, v ∈ V, p_u ≥ e(u, v)}.
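Under the model above, the transmission graph G(p) can be computed directly from node positions and a power assignment; this is a sketch with illustrative names, assuming the receiver threshold q_v is normalized to 1:

```python
import math

def transmission_graph(positions, powers, eps=2.0):
    # positions: {node: (x, y)}, powers: {node: p_u}.
    # Edge (u, v) exists iff p_u >= e(u, v) = d_uv^eps (with q_v = 1).
    edges = set()
    for u, pu_pos in positions.items():
        for v, pv_pos in positions.items():
            if u == v:
                continue
            d = math.dist(pu_pos, pv_pos)
            if powers[u] >= d ** eps:
                edges.add((u, v))
    return edges
```

Note the graph is directed: a high-power node may reach a low-power neighbor that cannot answer back.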
We adopt a transmission power model similar to the one used in [12, 13] for the communication energy model. The energy spent by a node for transmitting one packet can be computed as follows:

E_tx(p, d, l) = (l/M) · (d^α · b + a) · p    (2)

where a, b, and α are constants dependent on the characteristics of the communication channel, M is the bandwidth, l is the packet length, and p is the power. The value of α is usually greater than or equal to 2. As in [13], we do not consider the energy consumption for node mobility; this is realistic when the node is a PDA, mobile phone, etc.

3.4 Trust Model

We adopt an energy-proportional cost model. In this model, the trust related to energy degradation can be calculated from the residual energy E_res:

T = k · E_res    (3)

where k is a constant dependent on the environment.
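Equations (2) and (3) translate into two one-line helpers; the parameter names follow the text, and the numeric values in any use are arbitrary illustrations:

```python
def tx_energy(p, d, l, M, a, b, alpha):
    # Equation (2): E_tx(p, d, l) = (l / M) * (d^alpha * b + a) * p
    return (l / M) * (d ** alpha * b + a) * p

def trust_from_energy(e_res, k):
    # Equation (3): T = k * E_res (energy-proportional trust).
    return k * e_res
```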
4 ETTC Protocol

In essence, ETTC works as follows. Every node v_j periodically executes a neighbor discovery protocol using the full power p_full to obtain its neighbor set N(v_j). In addition, node v_j periodically broadcasts ant packets using different transmission power levels, and the values of the transmission power and trust metrics are carried inside the ant packets. In the meantime, upon receiving ant packets, node v_j evaluates whether the trust is higher than a threshold. If this condition holds, ETTC assigns p to P_{v_j}.

4.1 ETTC Algorithm

Algorithm 1 presents the ETTC protocol. Each node, say v_j, executes this protocol asynchronously and independently from other nodes, and the ETTC protocols running on different nodes interact via sending and receiving ant packets. Notice that ETTC is an event-driven protocol which reacts to three types of events: (1) ticks of a periodic timer to originate ant packets, (2) receptions of ant packets to update the local pheromone table and decide whether to forward them, and (3) ticks of a periodic timer to evaporate pheromone values.
Algorithm 1. The ETTC protocol (run at each node v_j)

INPUT: p_full
BEGIN
  p_{v_j} ← p_full; all entries in the pheromone table are set to 0;
  loop
    upon event: the periodic timer for ant packet origination ticks
      compute trust according to equation (3);
      txPower ← P_best;
      if txPower ≤ p_full then
        advance txPower to the next power level in a round-robin order;
      end if
      seq(v_j)++;
      totalPower ← p_full;
      relaySet ← {v_j};
      with power txPower, broadcast ant packet <v_j, seq(v_j), txPower, totalPower, relaySet>;
    end upon event
    upon event: receiving an ant packet <origin, seq, txP, totalP, relayS>
      if the ant packet, identified by <origin, seq>, has not been received recently then
        update the local pheromone table with org ← origin and p ← txP;
        if (∀ v_i ∈ relayS, v_i is a neighbor of v_j) ∧ (totalP ≥ txP) then
          broadcast ant packet <origin, seq, txP, txP, relayS ∪ {v_j}>;
        end if
      end if
    end upon event
    upon event: the periodic timer for pheromone evaporation ticks
      evaporate the local pheromone values;
    end upon event
  end loop
END
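The per-node state and the two packet-handling events of Algorithm 1 can be sketched as follows; the class and field names are ours, and the trust threshold test and power ladder are omitted for brevity:

```python
class EttcNode:
    def __init__(self, node_id, neighbors, rho=0.5):
        self.node_id = node_id
        self.neighbors = set(neighbors)
        self.rho = rho            # fraction of pheromone kept per evaporation tick
        self.pheromone = {}       # origin -> pheromone value

    def evaporate(self):
        # Periodic evaporation event: decay every entry of the pheromone table.
        for origin in self.pheromone:
            self.pheromone[origin] *= self.rho

    def receive_ant(self, origin, tx_power, total_power, relay_set):
        # Reception event: reinforce pheromone toward `origin`, then decide
        # whether to re-broadcast the ant: every relay must be a known
        # neighbor and enough power budget must remain (totalP >= txP).
        self.pheromone[origin] = self.pheromone.get(origin, 0.0) + tx_power
        return relay_set <= self.neighbors and total_power >= tx_power
```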
5 Computational Results

We use QualNet to analyze our ETTC algorithm. The parameters used in the simulation study are as follows. Initially, 20 nodes are uniformly distributed in a 1200 × 1200 terrain. The propagation path-loss model is two-ray. The channel data rate is 2.0 Mbps. The sensing and receiving thresholds are -91.0 and -81.0 dBm, respectively. The mobility model is random way-point, with a 3-second pause time, a maximum speed ranging from 0 m/sec to 80 m/sec, and a minimum speed of 10 m/sec for non-zero maximum speeds. There is no mobility when the maximum speed is 0. The simulation time is 150 seconds.
[Figure 1: network connectivity (y-axis, 0–8) plotted against simulation time (x-axis, 0–150 s).]
Fig. 1. Network connectivity evaluation from the ETTC
6 Conclusion

In this paper, we described the ant-based Energy-aware Trust Topology Control (ETTC) algorithm for mesh networks. ETTC adapts the biological metaphor of swarm intelligence as a heuristic search mechanism to discover power assignments that achieve better performance. In particular, ETTC is a distributed algorithm in which each mobile node asynchronously collects local information from its neighbors to search for its proper transmission power. Its operation does not require any location, angle-of-arrival, topology, or routing information. The positive feedback mechanism of swarm intelligence allows ETTC to converge quickly to good power assignments with respect to network performance, while the amplification of fluctuations lets ETTC discover new and better power assignments to adapt to topology changes due to mobility.
Acknowledgment

This work is supported by the Young and Middle-aged Elitists' Scientific and Technological Innovation Team Project of the Institutions of Higher Education in Hubei Province (No. T200902) and the Key Scientific Research Project of the Hubei Education Department (No. D20081904).
References

1. Ding, X.Y., Luo, H.Q.: Trust Evaluation Based Reliable Routing in Wireless Mesh Network, pp. 2294–2297. IEEE, New York (2007)
2. Kim, K., Han, K.: A Topology Control Scheme for Selecting Active Nodes in Wireless Sensor Networks. IEICE T. Commun. E92.B, 3915–3918 (2009)
3. Omar, M., Challal, Y., Bouabdallah, A.: Reliable and fully distributed trust model for mobile ad hoc networks. Comput. Secur. 28, 199–214 (2009)
4. Stelios, Y.: A Distributed Energy-Aware Trust Management System for Secure Routing in Wireless Sensor Networks. In: Mobile Lightweight Wireless Systems, pp. 85–92 (2009)
5. Melchor, C.A., Salem, B.A., Gaborit, P., Tamine, K.: AntTrust: A novel ant routing protocol for wireless ad-hoc network based on trust between nodes. In: ARES 2008: Proceedings of the Third International Conference on Availability, Security And Reliability, pp. 1052–1059 (2008)
6. Zahariadis, T., Leligou, H.C., Voliotis, S., Maniatis, S., Trakadas, P., Karkazis, P.: An Energy and Trust-aware Routing Protocol for Large Wireless Sensor Networks, pp. 216–224. World Scientific and Engineering Acad. and Soc., Athens (2009)
7. Zarifzadeh, S., Nayyeri, A., Yazdani, N., Khonsari, A., Bazzaz, H.H.: Joint range assignment and routing to conserve energy in wireless ad hoc networks. Computer Networks 53, 1812–1829 (2009)
8. Xi, F., Liu, Z.: Small World Topology-Aware Geographic Routing in Wireless Sensor Networks. In: 2009 WRI International Conference on Communications and Mobile Computing: CMC 2009, vol. 2, pp. 116–120 (2009)
9. Zhang, T., Yang, K., Chen, H.H.: Topology Control for Service-Oriented Wireless Mesh Networks. IEEE Wireless Communications 16, 64–71 (2009)
10. Camp, T., Boleng, J., Davies, V.: A survey of mobility models for ad hoc network research. Wireless Communications & Mobile Computing 2, 483–502 (2002)
11. Rappaport, T.: Wireless Communications: Principles and Practice. Prentice-Hall, Englewood Cliffs (2002)
12. Ooi, C.C., Schindelhauer, C.: Utilizing detours for energy conservation in mobile wireless networks. Telecommun Syst. 43, 25–37 (2010)
13. Tang, C., Mckinley, P.K.: Energy optimization under informed mobility. IEEE Transactions on Parallel and Distributed Systems 17, 947–962 (2006)
A Quay Crane Scheduling Model in Container Terminals

Qi Tang

School of Management, Tianjin Polytechnic University, Tianjin 300160, China
[email protected]
Abstract. This paper discusses the problem of scheduling quay cranes, the most important equipment in port terminals. A simulation model is developed for evaluating the operation time of quay cranes. Then a dynamic scheduling model using objective programming for quay cranes is developed based on a genetic algorithm approach. Finally, numerical experiments on a specific container terminal are conducted for the proposed approach. Computational results suggest that the proposed method is able to solve the problem efficiently.

Keywords: quay crane, container terminal, genetic algorithm.
1 Introduction

The fast storage and retrieval of containers at the ship's hold and the deck are essential for the economic performance of container terminals. These issues directly affect the traffic of the handling equipment and consequently the dwell and turnaround time of vessels. The process of unloading and loading a ship at a container
2 Literature Review

Daganzo was the first to discuss the QC scheduling problem [2]. He suggested an algorithm for determining the number of cranes to assign to ship-bays of multiple vessels. Peterkofsky and Daganzo also provided an algorithm for determining the departure times of multiple vessels and the number of cranes to assign to individual holds of vessels in a specific time segment [3], and they also attempted to minimize the delay costs. The studies by Daganzo and by Peterkofsky and Daganzo assumed one task per ship-bay (a task being crane operations over a specific length of time) and did not consider the interference among QCs or precedence relationships among tasks. Lim et al. augmented the static quay crane scheduling problem for multiple container vessels by taking into account non-interference constraints [4]. They assumed that containers from a given area on a container vessel were a job, and that there was a profit value when a job was assigned to a quay crane. The objective was to find a crane-to-job matching which maximized the total profit. Dynamic programming algorithms, a probabilistic tabu search, and a squeaky wheel optimization heuristic were proposed for solving the problem. Kim et al. studied the load-sequencing problem for outbound containers in port container terminals, which is to determine simultaneously the pick-up sequence of transfer cranes in the yard and the loading sequence of slots in the container vessel by quay cranes [5]. A beam search algorithm was proposed to solve this problem. Kim and Park discussed the quay crane scheduling problem with non-interference constraints in which only a single container vessel was considered [6]. This paper focuses on quay crane scheduling and attempts to determine the schedule of each QC assigned to a vessel, with the goal of completing all of the ship operations of a vessel as rapidly as possible.
3 Quay Crane Scheduling Problem

The total transshipment operation of a ship is a set of tasks, including unloading and loading processes. Our goal is to determine the sequence of discharging and
A Quay Crane Scheduling Model in Container Terminals
285
loading operations that a QC will perform so that the completion time of a ship operation is minimized. As illustrated in Fig. 2, container vessels are typically divided longitudinally into holds that open to the deck through a hatch. Interference between quay cranes can occur when two QCs work at the same time, since they are on the same track. In practice, only one quay crane can work on a hold at any time, and a quay crane cannot move to another hold until it completes the current one. To avoid crossing of cranes, the QCSP (quay crane scheduling problem) requires a spatial constraint, which is not involved in machine scheduling problems. As a further spatial constraint, sophisticated QCSP models also include compliance with safety margins between adjacent cranes. Additional attributes for tasks and cranes in fact lead to a variety of different models for QC scheduling. The tasks to be scheduled on a QC describe the granularity at which the workload of a vessel is considered in a QCSP model [8]. The idea of dividing the workload of a vessel into bay areas is to serve each bay area exclusively by one QC. If the bay areas are non-overlapping, crane interference is completely avoided. However, a sufficient balance of the workload distribution among the cranes is often not possible.
Fig. 2. Collocation of container
4 Model Formulation

This section proposes a mathematical formulation for the QC scheduling problem. The following assumptions are imposed in formulating the QC scheduling:

Assumptions
1) All QCs are on the same track and they cannot cross each other.
2) Some tasks must be performed first.
3) Some tasks cannot be performed simultaneously.
4) There are enough QCs to handle the containers.

In order to formulate the quay crane scheduling, the following parameters and decision variables are introduced:

Parameters
QC_k   quay crane, k = 1,…,K;
T      the total time of the task;
I      the total quantity of containers;
H      the number of holds;
p_h    the processing time of hold h by a quay crane (1 ≤ h ≤ H);
c_i    the time required to handle container i;
t_ijk  the travel time of QC_k from container i to container j;
r_k    the earliest possible operation time of a QC;
S      the safety margin between two adjacent QCs;
t_k    the completion time of QC_k, k = 1,…,K;
F_it   total workload of hold i within period t;
F'_it  delayed workload of hold i from period t−1;
R      the number of container types;
CN_it  number of loading and discharging containers in hold i within period t;

Decision variables
O_hk = 1 if hold h is handled by quay crane k, and 0 otherwise (1 ≤ h ≤ H, 1 ≤ k ≤ K);
P_ij = 1 if container j starts later than the completion time of container i, and 0 otherwise;
T_h  the completion time of hold h (1 ≤ h ≤ H).
The mathematical model can be formulated as follows:

Min Z = Σ_{h=1..H} T_h Σ_{k=1..K} O_hk    (1)

Subject to:

Σ_{k=1..K} O_hk = 1, ∀h = 1,…,H    (2)

T_h − p_h ≥ 0, ∀h = 1,…,H    (3)

Σ_{k=1..K} (t_k + r_k + t_ijk) + (K − 1)S ≥ Σ_{i=1..I} c_i    (4)

R Σ_{i=1..I} Σ_{t=1..T} CN_it ≤ Σ_{i=1..I} (F_it + F'_it)    (5)
Equation (1) minimizes the total completion time over all holds at each scheduling, which synthetically considers all unloading and loading processes. Constraint (2) ensures that each hold is handled by exactly one QC, i.e. the overall task consists of each QC's sub-tasks. Constraint (3) defines the property of the decision variable T_h. Constraint (4) states that the total QC time consists of the working time plus the safety-margin time. Constraint (5) ensures that the total allocated workload of all deployed yard cranes is not more than the workload of the entire hold.
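Objective (1) and the simplest constraints can be checked mechanically for a candidate assignment; the helper names below are ours, used only for illustration:

```python
def objective(T, O):
    # Equation (1): Z = sum over holds h of T_h * sum over cranes k of O_hk.
    return sum(T[h] * sum(O[h]) for h in range(len(T)))

def one_crane_per_hold(O):
    # Constraint (2): every hold is handled by exactly one crane.
    return all(sum(row) == 1 for row in O)

def completion_after_processing(T, p):
    # Constraint (3): T_h - p_h >= 0 for every hold h.
    return all(T[h] >= p[h] for h in range(len(T)))
```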
5 Genetic Algorithm Based Solution

The proposed mathematical model includes nonlinear components in the objective function (1) and four constraints. Thus, we propose a genetic-algorithm-based heuristic in order to obtain good solutions. The Genetic Algorithm (GA) proposed by Goldberg is an efficient meta-heuristic approach for solving combinatorial problems [9]. Fig. 3 demonstrates the steps of this approach.
[Figure: flowchart — Genetic Algorithm → Solve QC Scheduling Problem → Compute the Total Time → if Gen < Max-Gen loop back (Y), otherwise End (N).]

Fig. 3. Steps of the proposed GA
5.1 Code of Chromosome

For the QC scheduling, the holds are detected in each period by using heuristic rules. In detail, if the container is processed by a QC, the value of the gene is 1; otherwise the value is 0. The code of the chromosome is shown in Fig. 4.

[Figure: chromosome layout — chromosomes 1…n, each divided into holds 1…n, with one 0/1 gene per container (1…n) per quay crane QC1…QCn.]

Fig. 4. The code of chromosome
5.2 Calculation of Fitness Value

In the simulation optimization process, GA is first used to create the loading order of outbound containers. This order is then input into the simulation model, in which QCs are dispatched to containers according to certain rules. Finally, the loading/unloading time is obtained by running the simulation model. Therefore, simulation can not only be used to evaluate the loading/unloading order obtained by the GA, but also to construct a schedule considering all the constraints.
In particular, we first generate the set of loading/unloading time variables from chromosomes using genetic operators and then evaluate them. Then, based on the set of these variables, the problem reduces to an operation time problem in both loading and unloading processes. The fitness function is the same as equation (1).

5.3 Selection Process

The parent selection operator is an important process that directs a GA search toward promising regions in a search space. Two parents are selected from the solutions of a particular generation by selection methods that assign reproductive opportunities to each individual parent in the population. There are several selection methods, such as roulette wheel selection, tournament selection, rank selection, elitism selection, random selection and so on. For this study, we used a binary tournament selection that works by forming two teams of chromosomes, each consisting of two chromosomes randomly drawn from the current population. The best chromosome from each of the two teams is chosen for the crossover operation. In this way, two offspring are generated and entered into the new population.

5.4 Crossover Process

The crossover operator generates new children by combining the information contained in the chromosomes of the parents, so that the new chromosomes inherit the good parts of the parents' chromosomes. A crossover probability indicates how often crossover will be performed. There are several types of crossover, including single-point, multi-point, and uniform crossover. We applied the two-point crossover, in which one point is used for the loading and one for the unloading process. The two crossover points are randomly selected only in the loading/unloading decisions of the initial time period, since expansion decisions are dependent upon installation decisions. Then, the blocks of the two parents' strings are swapped to produce two children.
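The selection and crossover operators described above can be sketched for 0/1 chromosomes; this is a generic GA sketch (minimizing total time), not the authors' implementation:

```python
import random

def tournament_select(population, fitness, rng):
    # Binary tournament: draw two individuals, keep the one with lower fitness.
    a, b = rng.sample(population, 2)
    return a if fitness(a) <= fitness(b) else b

def two_point_crossover(p1, p2, rng):
    # Swap the segment between two random cut points of the parent bit strings.
    i, j = sorted(rng.sample(range(len(p1)), 2))
    return p1[:i] + p2[i:j] + p1[j:], p2[:i] + p1[i:j] + p2[j:]

def mutate(bits, p_mut, rng):
    # Flip each bit independently with small probability p_mut.
    return [b ^ 1 if rng.random() < p_mut else b for b in bits]
```

Every gene of each child comes from one of the parents at the same position, so feasibility repairs (e.g. re-enforcing constraint (2)) can be applied afterwards.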
5.5 Mutation Process

After recombination, some children undergo mutation. Mutation operates by inverting each bit in the solution with some small probability, usually between zero and 10 percent. The rationale is to provide a small amount of randomness and to prevent solutions from being trapped at a local optimum. The type of mutation varies depending on the encoding as well as the crossover. In the GA used for this study, the mutation operator first randomly selects a time period and a bit value of only the opening/closing decision variables on a chromosome. Then, the bit value is flipped from 0 to 1, or from 1 to 0. If the changed bit value is 0, the corresponding two bits for expansion and the amount of expansion are changed to zero; otherwise they are randomly generated. Hence, a good level of diversity in each generation is achieved.

5.6 Experiment

We tested the proposed solution approach by using the proposed GA, setting the parameter values through extensive experiments. The experiments were run on a
ACER AMD Athlon 2.6G, 3.2G system with 2G RAM. To implement the GA, the population size and the total number of generations were set to 100 and 500, respectively. The crossover, reproduction, and mutation probabilities were set to 80%, 10%, and 10%, respectively. In crossover, three techniques (random generation, semi-random generation, and tournament) were employed with an equal chance of occurrence. In the semi-random technique, one of the binary values was selected randomly and the best solution in the previous generation was chosen as the second binary string to participate in the crossover. Fig. 5 shows the best fitness value at each generation as a function of the number of generations. Different parameter combinations were used to test the performance of our algorithm; the results show that the method has very robust performance, so it is very useful in practice. Because of space limitations, some results are omitted.
Fig. 5. Evolution curves of genetic algorithm
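The mutation repair rule described in Section 5.5 (flip one period's open/close bit, then zero or re-randomize its two dependent expansion bits) can be sketched as follows. The per-period triple encoding `[open, expand, amount]` is a hypothetical layout, since the paper does not give the exact chromosome format.

```python
import random

def mutate(chromosome, rng=random):
    """Mutation of Section 5.5: pick one random time period, flip its
    open/close bit, then repair the two dependent expansion bits.
    `chromosome` is a list of [open, expand, amount] triples (one per
    period) -- an assumed encoding for illustration."""
    child = [genes[:] for genes in chromosome]   # work on a copy
    t = rng.randrange(len(child))                # random time period
    child[t][0] ^= 1                             # flip open/close bit
    if child[t][0] == 0:
        child[t][1] = child[t][2] = 0            # closed: no expansion
    else:
        child[t][1] = rng.randint(0, 1)          # reopened: expansion bits
        child[t][2] = rng.randint(0, 1)          # are regenerated randomly
    return child
```

The repair step keeps every chromosome feasible (a closed terminal cannot be expanded), which is why the operator maintains diversity without producing invalid solutions.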
6 Conclusion

The efficiency of container terminals depends heavily on the effectiveness of terminal resource allocation at the different container handling stages. This paper proposed an efficient genetic algorithm (GA) for the QC scheduling problem, which is an important problem in the operation of port container terminals. Experimental results show that the proposed approach performs well in both computation speed and solution quality.
References

1. Steenken, D., Voß, S., Stahlbock, R.: Container terminal operation and operations research – a classification and literature review. OR Spectrum 26, 3–49 (2004)
2. Daganzo, C.F.: The crane scheduling problem. Transportation Research Part B 23, 159–175 (1989)
3. Peterkofsky, R.I., Daganzo, C.F.: A branch and bound solution method for the crane scheduling problem. Transportation Research Part B 24(3), 159–172 (1990)
Q. Tang
4. Lim, A., Rodrigues, B., Xiao, F., Zhu, Y.: Crane scheduling with spatial constraints. Naval Research Logistics 51, 386–406 (2004)
5. Kim, K.H., Kang, J.S., Ryu, K.R.: A beam search algorithm for the loadi (2004)
6. Kim, K.H., Park, Y.M.: A crane scheduling method for port container terminals. European Journal of Operational Research 156, 752–768 (2004)
7. Pinedo, M.: Scheduling – Theory, Algorithms and Systems, 2nd edn. Prentice-Hall, Englewood Cliffs (2002)
8. Bierwirth, C., Meisel, F.: A survey of berth allocation and quay crane scheduling problems in container terminals. European Journal of Operational Research 202, 615–627 (2010)
9. Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading (1989)
Leader-Follower Formation Control of Multi-robots by Using a Stable Tracking Control Method

Yanyan Dai, Viet-Hong Tran, Zhiguang Xu, and Suk-Gyu Lee

Department of Electrical Engineering, Yeungnam University, 214-1 Daedong, Gyeongsan, Gyeongbuk 712-749, Korea
[email protected], [email protected], [email protected], [email protected]
Abstract. In this paper, a leader-waypoint-follower robot formation is constructed from the relative motion states to form and maintain a multi-robot formation by a stable tracking control method. The main idea of this method is to find reasonable target linear and angular velocities to change each robot's current state. The proposed Lyapunov functions prove that the robots change their current velocities to the proposed target velocities in a globally asymptotically stable manner. Simulation results based on the proposed approach show better accuracy and efficiency compared with the EKF-based approach commonly applied in multi-robot systems.

Keywords: leader-waypoint-follower robot formation, stable tracking control method, EKF, Lyapunov function.
p(X, Y | Z, U) by a multivariate Gaussian with mean µ and covariance Σ. We linearize the nonlinear functions g and h and use the standard Kalman filter equations to update this Gaussian. However, since a single vehicle measurement generally affects all parameters of the Gaussian, the update procedure requires a long calculation time in environments with many landmarks (or features). Moreover, in [3] and [5], since the EKF takes a long time to update, the leader robot has already moved by the time the follower robot finishes calculating its desired state. This can cause hysteresis and reduce the accuracy of the multi-robot formation. In addition, the extended Kalman filter is not good at minimizing the error in θ, which is the most important quantity for the motion of the robot. To overcome these drawbacks, we propose a stable tracking control method which is as stable as the EKF but produces smaller errors in less time.
2 Formation Control Framework

2.1 Modeling of the Robot

For the n mobile robots under consideration, the motion of each robot is described in terms of P = (x, y, θ), where x, y and θ are the x coordinate, y coordinate and bearing, respectively. The trajectory of each robot has the form (x, y), where the inputs of each robot are the velocity v and angular velocity ω. The model of robot R is

    ẋ = v cos θ,  ẏ = v sin θ,  θ̇ = ω.    (1)
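Model (1) can be integrated numerically with a simple Euler step; the following sketch is only an illustration of the kinematics, not part of the paper's method.

```python
import math

def step(pose, v, omega, dt):
    """One Euler step of the unicycle model (1):
    x' = v*cos(theta), y' = v*sin(theta), theta' = omega."""
    x, y, theta = pose
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + omega * dt)
```

With `omega = 0` the robot moves in a straight line along its current heading; a constant nonzero `omega` produces a circular arc.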
2.2 Problem Formulation

In the formation control considered in this paper, the n robots are controlled to move following desired distances and desired angles in a relative coordinate frame in a stable mode. In the multi-robot system, one robot is assigned as the leader robot, which determines a follower robot's motion by defining the desired distance and desired bearing angle that the follower must maintain. According to the desired distance and desired bearing angle, the follower robot calculates its waypoint and then moves toward it. In this system, leader and follower robots are linked by a wireless connection. Therefore, the speed of data connection and update is important, since it directly affects the overall formation of the multi-robot system. There are two problems: the first is how to define and keep the formation stable; the second is how to accelerate the connection and reduce calculation time.

2.3 Formation Control Framework

Let R0 and Ri be the leader and follower robot, respectively. We denote di as the actual distance between R0 and Ri, d0i as the desired distance, and βi as the actual
bearing angle from the orientation of the follower robot to the axis connecting R0 and Ri, and β0i as the desired bearing angle. The formation among leader robot R0, waypoint Rw and follower robot Ri with the desired distance and desired bearing angle is shown in Figure 1. The waypoint of follower robot Ri is denoted by Pw = (xw, yw, θw), which is calculated by Equation (2):

    xw = x0 + d0i cos(β0i + θ0)
    yw = y0 + d0i sin(β0i + θ0)    (2)
    θw = θ0.
Fig. 1. Leader-waypoint-follower robot formation control framework
3 Stable Tracking Control Method for Each Robot

When controlling the motion of each robot, two points should be considered: reducing the connection and calculation time, and keeping the robot motion stable. To achieve these goals, the proposed approach is motivated by the method introduced in [10]. In the proposed method, the robot's waypoint model is defined as Pw = (xw, yw, θw) and the robot's current state model as Pc = (xc, yc, θc). According to the waypoint and current state error posture, the robot calculates the target velocity and angular velocity (v, ω) and changes its current velocities to track them. Finally, based on the velocities of the robots, each robot produces the derivative of its current state to minimize the error and make the formation stable, especially the θ error, which is the weak point of the EKF. The derivative of the current state is derived from the target velocity and angular velocity. Since this method does not require landmarks for measurement, the matrices are smaller than those in the EKF. This directly accelerates the calculation, solves the problem of hysteresis and improves accuracy and efficiency. The related equations are derived similarly to [10]. We define the waypoint and current state error e = (ex, ey, eθ) = Pw − Pc and transfer the state error e to the error posture E = (xe, ye, θe) by the following equation:

    E = Te e,  where  Te = [  cos θc   sin θc   0
                             −sin θc   cos θc   0
                                0         0     1 ].    (3)
If Pw = Pc and the error e = 0, then E = 0. Using Equation (1) and the constraint equation ẋ sin θ − ẏ cos θ = 0 [10], the derivative of E can be derived:

    ẋe = vw cos θe − v + ye ω
    ẏe = vw sin θe − xe ω    (4)
    θ̇e = ωw − ω.
Based on the waypoint and state error posture, we propose the target velocities

    v = vw cos θe + Kx xe
    ω = ωw + vw ye + Kθ sin θe,    (5)
where Kx and Kθ are positive constants. To transfer the current state velocities to the target velocities, substituting (5) into (4) gives the derivative of the error posture:

    ẋe = ye ω − Kx xe
    ẏe = vw sin θe − xe ω    (6)
    θ̇e = −vw ye − Kθ sin θe.
Theorem (Lyapunov stability of autonomous systems). Let x = 0 be an equilibrium point for a system described by

    ẋ = f(x),    (7)

where f : U → Rⁿ is locally Lipschitz and U ⊂ Rⁿ is a domain that contains the origin. Let V : U → R be a continuously differentiable, positive definite function on U.
1. If V̇(x) = (∂V/∂x) f(x) is negative semidefinite, then x = 0 is a stable equilibrium point.
2. If V̇(x) is negative definite, then x = 0 is an asymptotically stable equilibrium point.
In both cases V is called a Lyapunov function. Moreover, if the conditions hold for all x ∈ Rⁿ and ‖x‖ → ∞ implies V(x) → ∞, then x = 0 is globally stable in case 1 and globally asymptotically stable in case 2.

Proposition. Assume we use the target velocities (5) as the current state velocities. If the waypoint velocity vw > 0 and Kx, Kθ > 0, then as t → ∞, E = 0 is an asymptotically stable equilibrium point.

Proof. We propose a natural Lyapunov function candidate for this system:

    V(E) = (xe² + ye²)/2 + (1 − cos θe).    (8)
Notice that V(0) = 0 and V(E) > 0 for all E ≠ 0, so V is positive definite. The derivative of V(E) along the trajectories of the system is

    V̇(E) = xe ẋe + ye ẏe + θ̇e sin θe = −Kx xe² − Kθ sin² θe ≤ 0.    (9)

Therefore, by the Theorem, as t → ∞ the origin E = 0 is globally asymptotically stable.
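The error posture (3), the control law (5) and the Lyapunov function (8) can be checked numerically. The sketch below uses the gains Kx = 0.1 and Kθ = 0.28 quoted in Section 4; the simulation settings themselves (start pose, time step) are illustrative assumptions.

```python
import math

def error_posture(waypoint, pose):
    # Equation (3): rotate the world-frame error (Pw - Pc) into the robot frame.
    xw, yw, tw = waypoint
    xc, yc, tc = pose
    dx, dy = xw - xc, yw - yc
    xe = math.cos(tc) * dx + math.sin(tc) * dy
    ye = -math.sin(tc) * dx + math.cos(tc) * dy
    return xe, ye, tw - tc

def target_velocities(err, vw, ww, kx=0.1, kt=0.28):
    # Equation (5): v = vw*cos(te) + Kx*xe,  w = ww + vw*ye + Kt*sin(te)
    xe, ye, te = err
    return vw * math.cos(te) + kx * xe, ww + vw * ye + kt * math.sin(te)

def lyapunov(err):
    # Equation (8): V = (xe^2 + ye^2)/2 + (1 - cos(te))
    xe, ye, te = err
    return 0.5 * (xe * xe + ye * ye) + 1.0 - math.cos(te)
```

Simulating a follower that tracks a waypoint moving straight ahead at vw = 0.3 m/s shows V(E) shrinking over time, as (9) predicts.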
4 Simulation Results

Figure 2 shows the simulation result of a square formation with one leader robot and three follower robots. In the simulation, the first follower robot is designed to keep the desired distance d01 = 0.2 m and desired angle β01 = π/4, and the second follower robot is supposed to maintain the desired distance d02 = 0.2 m and desired angle β02 = −π/4. In addition, the third robot should hold the desired distance d03 = √0.08 m and desired angle β03 = 0. Each robot is controlled based on the tracking control method. The constants Kx = 0.1 s⁻¹ and Kθ = 0.28 s⁻¹ were chosen by experimental trial.
Fig. 2. One leader and three follower robots square formation
Figures 3, 4 and 5 compare the state errors (x, y, θ) of a follower robot under the EKF control method and the stable tracking control method with the same input and measurement noises. The simulation was performed for 240 seconds and was run 100 times to obtain average values. The leader robot's velocity is kept at 0.3 m/s, while the angular velocity changes smoothly. The input noise is σv = 0.01 m/s and σω = 0.01 rad/s. The results clearly show that the stable tracking control method is superior to the EKF when controlling each robot, producing smaller errors. To examine the merit of the stable tracking control method over the EKF under high-noise conditions, we also simulated the system with input noise 10 times larger than above, i.e. σv = 0.1 m/s, σω = 0.1 rad/s. The root mean square (rms) error values were calculated and are shown in Table 1.
Fig. 3. Comparison of the error of EKF and stable tracking control method in x-axis
Fig. 4. Comparison of the error of EKF and stable tracking control method in y-axis
Fig. 5. Comparison of the error of EKF and stable tracking control method in θ-axis
Table 1. The rms errors in various noisy conditions

Method                           Noise                          x-error         y-error         θ-error
Stable tracking control method   σv=0.01 m/s, σω=0.01 rad/s     2.0224 × 10⁻⁴   4.9353 × 10⁻⁴   0.0609
                                 σv=0.1 m/s,  σω=0.1 rad/s      8.6353 × 10⁻⁴   0.0014          0.0999
EKF                              σv=0.01 m/s, σω=0.01 rad/s     5.2112 × 10⁻⁴   0.0011          0.3223
                                 σv=0.1 m/s,  σω=0.1 rad/s      0.0039          0.0041          1.2222
As shown in Table 1, at each noise level the stable tracking control method performs better than the EKF. Moreover, when the noise level is increased 10 times, the x, y and θ errors of the stable tracking control method increase by factors of 4.2698, 2.8367 and 1.6404, while those of the EKF increase by factors of 7.4839, 3.7273 and 3.7921. Therefore, the proposed method is more robust to larger noise than the EKF-based approach.
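The growth factors quoted above follow directly from the rms values in Table 1; a quick check:

```python
# rms errors (x, y, theta) from Table 1, at the low and high noise levels
stable = {"low": (2.0224e-4, 4.9353e-4, 0.0609),
          "high": (8.6353e-4, 0.0014, 0.0999)}
ekf = {"low": (5.2112e-4, 0.0011, 0.3223),
       "high": (0.0039, 0.0041, 1.2222)}

def growth(rows):
    """Factor by which each rms error grows when the noise is increased 10x."""
    return tuple(h / l for h, l in zip(rows["high"], rows["low"]))

print([round(f, 4) for f in growth(stable)])  # [4.2698, 2.8367, 1.6404]
print([round(f, 4) for f in growth(ekf)])     # [7.4839, 3.7273, 3.7921]
```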
5 Conclusion

In this paper, we presented a leader-waypoint-follower formation framework using a stable tracking control method based on target velocities that are less complicated than those of comparable approaches. The two advantages of the proposed control method are a stable formation throughout navigation and enhanced motion accuracy despite reduced calculation time. The simulation results show that the proposed method has better tracking accuracy than the EKF-based approach, which is the common, fundamental control method. In the future, to improve the safety and stability of multi-robot systems, we will consider clusters of robots. Cooperation of leader robots in each cluster will be studied in depth to enhance the efficiency of the whole system.
References

1. Chen, J., Sun, D., Yang, J., Chen, H.: Leader-Follower Formation Control of Multiple Non-holonomic Mobile Robots Incorporating a Receding-horizon Scheme. The International Journal of Robotics Research 28 (2009)
2. Song, Y., Kim, D.Y., Ahn, H.-S., Shin, V.: Simultaneous Pedestrian and Multiple Mobile Robots Localization Using Distributed Extended Kalman Filter. In: 2008 IEEE International Conference on Robotics and Biomimetics, Thailand, pp. 1065–1069 (2009)
3. Simmons, R., Apfelbaum, D., Burgard, W., Fox, D., Moors, M., Thrun, S., Younes, H.: Coordination for Multi-Robot Exploration and Mapping. In: Proceedings of the Seventeenth National Conference on Artificial Intelligence, pp. 852–858 (2001)
4. Schneider, F.E., Wildermuth, D.: Using an Extended Kalman Filter for Relative Localisation in a Moving Robot Formation. In: Fourth International Workshop on Robot Motion and Control, pp. 85–90 (2004)
5. Gustavi, T., Hu, X.: Observer-Based Leader-Following Formation Control Using Onboard Sensor Information. IEEE Transactions on Robotics 24(6), 1457–1462 (2008)
6. Desai, J.P., Ostrowski, J.P., Kumar, R.V.: Modeling and Control of Formations of Nonholonomic Mobile Robots. IEEE Transactions on Robotics and Automation 17(6), 905–908 (2001)
7. Fontes, F.A.C.C.: A General Framework to Design Stabilizing Nonlinear Model Predictive Controllers. Systems & Control Letters 42, 127–143 (2001)
8. Ögren, P., Egerstedt, M., Hu, X.: A Control Lyapunov Function Approach to Multiagent Coordination. IEEE Transactions on Robotics and Automation 18(5), 847–851 (2002)
9. Thrun, S., Liu, Y.: Multi-Robot SLAM With Sparse Extended Information Filters. In: Proceedings of the 11th International Symposium of Robotics Research (ISRR 2003), Sienna, Italy (2003)
10. Kanayama, Y., Kimura, Y., Miyazaki, F., Noguchi, T.: A Stable Tracking Control Method for an Autonomous Mobile Robot. In: Proceedings of the IEEE International Conference on Robotics and Automation, pp. 384–389 (1990)
11. Consolini, L., Morbidi, F., Prattichizzo, D., Tosques, M.: A Geometric Characterization of Leader-Follower Formation Control. In: IEEE International Conference on Robotics and Automation, April 2007, pp. 2397–2402 (2007)
12. Balch, T., Arkin, R.C.: Behavior-based Formation Control for Multi-robot Teams. IEEE Transactions on Robotics and Automation 14, 926–939 (1998)
13. Lalish, E., Morgansen, K.A., Tsukamaki, T.: Formation Tracking Control using Virtual Structures and Deconfliction. In: Proceedings of the 2006 IEEE Conference on Decision and Control (2006)
Research on the Coordination Control of Vehicle EPS and ABS

Weihua Qin, Qidong Wang, Wuwei Chen, and Shenghui Pan

School of Mechanical and Automotive Engineering, Hefei University of Technology, Hefei 230009, China
[email protected]
Abstract. Based on a multibody dynamics model of the Electric Power Steering (EPS) system and a lateral-longitudinal vehicle dynamics model, the EPS and Anti-lock Braking System (ABS) sub-controllers and a coordination controller are designed according to the motion coupling between the steering and braking systems. The coordination controller, as the upper-level controller, supervises and coordinates each sub-controller. Simulation in Matlab and a vehicle hardware-in-the-loop (HIL) test based on LabVIEW were used to evaluate and verify the vehicle maneuverability and braking performance under coordination control. The results show that coordination control effectively improves the comprehensive performance of the vehicle, and that the multibody dynamics model and hardware-in-the-loop testing are convenient and feasible for coordination control research on the vehicle chassis.
Keywords: Multibody dynamics, Electric power steering system, Anti-lock braking system, Coordination control, Hardware in the loop.
hierarchical coordination control. Such a control structure can shorten the controller development cycle and makes it easier to guarantee the control effect. A multibody dynamics model of EPS is applied to the coordination control research, and a set of dynamical equations is established through the constraint relations between the members. This embodies the nonlinear relation between the steering input and steering wheel output, and describes more accurately the influence of changing working conditions (tire-road friction, vehicle speed, etc.) on the vehicle's steering motion and the internal stress of the steering system. The hardware-in-the-loop test is a method that connects software and actual hardware, such as actuators, into the system loop; it has become an important development approach for vehicle control systems and is widely used in the design of rapid control prototypes [5-6]. LabVIEW is a software platform based on a graphical programming language which integrates seamlessly with hardware through its open software environment, making it possible to simulate and realize the control system. In this paper, the effectiveness of the control strategy is validated through simulation and a LabVIEW-based HIL test.
2 System Model

2.1 EPS Multibody Dynamics Model

A multibody dynamics model of the steering-column-assisted, rack-and-pinion electric power steering system is established by the Cartesian coordinate method. The dynamic equation [7] based on Cartesian coordinates is
z ——Generalized force matrix.

2.2 Vehicle Model

The longitudinal, lateral and yaw motion model is applied. The vehicle is regarded as a single rigid mass in free motion in the x–y plane. The dynamic equation of the vehicle is established as Equation (1). In the model, the generalized Cartesian coordinates are q = [x₁ y₁ ψ]ᵀ, and the yaw velocity is ωr = ψ̇. With the forward velocity along x₁ denoted u and the lateral velocity along y₁ denoted v, the coordinate conversion gives

    ax = u̇ − v·ωr
    ay = u·ωr + v̇    (2)
The force element is z = [fx fy n]ᵀ, where

    fx = Fx1 cos δ + Fx2 − Fy1 sin δ    (3)
    fy = Fy1 cos δ + Fy2 + Fx1 sin δ    (4)
    n = lf Fy1 cos δ − lr Fy2 + lf Fx1 sin δ    (5)
Wherein:
Fxi, Fyi ——Longitudinal and lateral forces exerted on the front and rear wheels (i = 1, 2)
lf, lr ——Distances from the front and rear wheels to the mass center.
2.3 Braking System Model

Equations of wheel rolling motion:

    I₁ ω̇f = Fx1 R1 − Tb1    (6)
    I₂ ω̇r = Fx2 R2 − Tb2    (7)

Wherein:
Ii ——Mass rotational inertia of the front and rear wheels (i = 1, 2)
ωf, ωr ——Angular velocities of the front and rear wheels
Ri ——Rolling radii of the front and rear wheels (i = 1, 2)
Tbi ——Brake torques [8].
2.4 Tyre Model

The PACEJKA nonlinear tire model for the combined condition is applied; the equations can be found in [9].
3 Controller Design

Owing to the coupling between steering and braking, each sub-controller is designed with two working modes according to the priority of control objectives under different working conditions. The coordination controller has the following functions: issuing the switching command between the two working modes of the subsystems according to the identification of the driver's intention and the vehicle's motion state, supervising and monitoring the subsystems, and altering the control parameters of the sub-controllers to guarantee the best overall control effect.

3.1 ABS Controller

(1) Normal mode. To guarantee braking performance, a sliding-mode variable structure control strategy based on a proportional switching function is applied. The difference e between the actual slip rate λ and the expected slip rate λ0, together with ė, is taken as the input; the variation p of brake oil pressure is taken as the output. The expected slip rate is taken as 0.2.
The switching function is s = c·e + ė, in which c is the slope ratio. With the proportional switching control method, the control law is

    p = (α·e + β·ė) sgn(s)    (8)
Wherein, α and β are constants larger than 0.
(2) Steering mode. During simultaneous steering and braking, in order to guarantee the comprehensive lateral and longitudinal performance of the vehicle, the value of the expected slip rate λ0 is adjusted in real time.

3.2 EPS Controller

(1) Normal mode. To improve steering portability, a fuzzy neural network control strategy is adopted. The vehicle speed and steering wheel torque are fuzzified by Gaussian membership functions, and the fuzzy inference rule is of Mamdani type. The three middle layers of the structure form a BP neural network, and the gradient-descent method is applied to optimize the centers and widths of the membership functions and the weights of the neural network, so that the trained relation between vehicle speed, torque and current approaches the assistance characteristic curve [10]. Finally, the centre-of-gravity method is used for defuzzification to determine the assisted target current.
(2) Braking mode. During simultaneous steering and braking, to ensure braking performance, the lateral friction coefficient always decreases, causing a change in the steering moment that tends to weaken the road feel and destabilize the vehicle. The normal assisted target current I of the EPS controller is therefore multiplied by a correction coefficient ε, so the target current is I′ = I·ε (0 < ε < 1), and the assist moment is moderately reduced to improve the steering response and handling stability of the vehicle.

3.3 Coordination Controller

A rule-based fuzzy control method is adopted. The coordination control block diagram is shown in Figure 1.
Fig. 1. Coordination control block diagram
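The proportional-switching law of Section 3.1 (s = c·e + ė, p = (α·e + β·ė)·sgn(s)) can be sketched as follows. The gains c = 1, α = 1 and β = 0.1 are illustrative values; the paper only requires α, β > 0.

```python
def sign(s):
    """Three-valued sign function sgn(s)."""
    return (s > 0) - (s < 0)

def brake_pressure_delta(slip, slip_prev, dt, slip_target=0.2,
                         c=1.0, alpha=1.0, beta=0.1):
    """Proportional-switching sliding-mode law of Section 3.1:
    e = lambda - lambda0, s = c*e + e_dot, p = (alpha*e + beta*e_dot)*sgn(s).
    e_dot is approximated by a backward difference; gains are illustrative."""
    e = slip - slip_target
    e_dot = (slip - slip_prev) / dt
    s = c * e + e_dot
    return (alpha * e + beta * e_dot) * sign(s)
```

Note that on the sliding surface s = 0 the output is zero, and away from it the switching term drives the slip rate back toward the expected value λ0 = 0.2.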
The coordination rules are as follows:
(1) ABS system: Under the straight-line braking condition, the ABS controller operates in the normal mode; under simultaneous steering and braking, it operates in the steering mode and adjusts the expected slip rate λ0 of the wheels in real time according to the changes of vehicle speed, lateral acceleration and longitudinal acceleration, so that the comprehensive performance is optimized. The reference formula for the expected slip rate [4] is

    λ0 = k · (ay·g / (ax·u))²    (9)
(2) EPS system: Under normal steering, the EPS operates in the normal mode; under simultaneous steering and braking, it operates in the braking mode, identifies the vehicle's steering characteristic based on the deviation e between the actual and expected yaw velocity and the steering wheel angle δ, and determines the current correction coefficient ε, thereby reducing the tendency to oversteer and avoiding vehicle instability. The reference formula for the expected yaw velocity [1] is

    ωr = u·δ / (L·(1 + K·u²))    (10)

Wherein,
K ——Stability factor of the vehicle
L ——Axle base
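The expected yaw velocity formula (10) is straightforward to compute. In the sketch below, the default axle base L = lf + lr = 2.471 m comes from the simulation parameters in Section 4, while the stability factor K = 0.0024 s²/m² is an illustrative value not given in the paper.

```python
def expected_yaw_rate(u, delta, L=2.471, K=0.0024):
    """Equation (10): omega_r = u * delta / (L * (1 + K * u^2)).
    u: forward speed (m/s); delta: steering wheel angle (rad);
    L: axle base (m); K: stability factor (illustrative value)."""
    return u * delta / (L * (1.0 + K * u * u))
```

For K > 0 (understeering vehicle) the yaw-rate gain rises with speed up to u = 1/√K and falls beyond it, which is the usual shape of this reference model.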
4 Simulation and Result Analysis

The above systems were simulated in Matlab. The multibody dynamic equations are programmed and solved by the direct method of numerical analysis. The main parameters of the reference vehicle are: m = 1360 kg, lf = 1.219 m, lr = 1.252 m, Iz = 1087.8 kg·m², etc. Simulation conditions: the initial velocity is 40 km/h, with an angle step input to the steering wheel. After the vehicle reaches a steady circular state, emergency braking is applied until the vehicle stops. The two characteristic quantities of yaw velocity and lateral acceleration are used to inspect vehicle maneuverability, and the braking distance is used to inspect braking performance.
Fig. 2. Yaw velocity
Fig. 3. Lateral acceleration
Fig. 4. Braking distance
Figures 2 and 3 show the response curves of the vehicle's yaw velocity and lateral acceleration under coordination control and separate control (each controller operating only in its normal mode) after the brake is applied. It can be seen that the yaw velocity falls slowly after coordination control is applied, so the vehicle's steering capability is enhanced, and the fluctuation of lateral acceleration is small, so the vehicle is less prone to instability and lateral sliding, thereby effectively increasing vehicle maneuverability. Figure 4 compares the braking distances. The braking distance under coordination control is 0.18 m longer than under separate control; because vehicle stability is guaranteed first in the controller design, the braking distance is slightly increased, but it remains within an acceptable range, and the comprehensive performance of the vehicle is improved.
5 Hardware-in-the-Loop Test

5.1 Overall Design of the Test

The test equipment includes a test vehicle equipped with EPS and ABS actuators, an NI PXI-8196 embedded controller, a 6289 data acquisition card, a terminal box, a self-made hardware driving circuit board, a notebook PC, a gyroscope and so on. The test design is as follows: the vehicle is the controlled terminal of the system. The host computer (running LabVIEW) carries out programming, program debugging and data display. The slave computer (the PXI embedded controller) provides the software running environment; it is a real-time operating system which receives the vehicle status information, analyzes it with the user-defined control algorithm in the software module, makes the corresponding decision and finally outputs control commands to drive the hardware circuit and control the actuators. The host and slave computers are connected over the network through the TCP/IP protocol. The overall test layout is shown in Figure 5.
Fig. 5. Test layout diagram
5.2 Software Program Design

The software adopts modular programming, mainly including a data acquisition and processing module and a control and debugging module. The LabVIEW program uses three threads, namely the EPS, ABS and coordination sections. The three threads are synchronized and associated. The coordination thread communicates with the EPS and ABS threads to coordinate the two subsystems.
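The three-thread coordination just described (one coordination thread posting mode-switch commands to the EPS and ABS threads over queues) can be mimicked outside LabVIEW; the sketch below uses Python's `queue` and `threading` modules, with message names and the mode-switch rule as illustrative assumptions.

```python
import queue
import threading

def coordination(cmd_eps, cmd_abs, status):
    """Coordination thread: reads vehicle status and posts working-mode
    commands to the EPS and ABS threads. A `None` status is a shutdown
    sentinel (an assumption of this sketch)."""
    while True:
        state = status.get()                     # obtain from the queue
        if state is None:
            cmd_eps.put(None)
            cmd_abs.put(None)
            return
        both = state["steering"] and state["braking"]
        cmd_eps.put("braking" if both else "normal")
        cmd_abs.put("steering" if both else "normal")

def subsystem(commands, log, name):
    """EPS/ABS thread: applies the commanded working mode (logged here
    as a stand-in for the actual control action)."""
    while True:
        mode = commands.get()
        if mode is None:
            return
        log.append((name, mode))
```

Because `queue.Queue` is a thread-safe FIFO, this reproduces the obtain/operate/release pattern of the LabVIEW queue technique without explicit locks.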
As a multi-threading language, LabVIEW can execute multiple independent tasks concurrently. Synchronization among the three threads is realized by the synchronization techniques of the LabVIEW language, with priorities set through timed loops. Communication among the three threads is realized through the queue technique; the queue structure is FIFO, and its use divides into the three steps of obtaining, operating and releasing. The control strategies of each system under the two control modes are programmed using the timed loop, Case structure, formula node and fuzzy control toolbox in LabVIEW.

5.3 Test Results

The vehicle test conditions were that the vehicle was in a steady circular state at an initial speed of about 40 km/h and a lateral acceleration of about 0.3 g, and then emergency braking was applied. Figures 6 and 7 show the real-time voltage signals of the yaw velocity and lateral acceleration; the horizontal axis is the acquisition time. Compared with separate control, the yaw velocity and lateral acceleration under coordination control fall more slowly (smaller curve slopes), increasing the vehicle's ability to maintain its steering travel, and the curve fluctuations are smaller, reducing the effect of braking on vehicle maneuverability.
(a) Separate control

(b) Coordination control

Fig. 6. Voltage signal of the yaw velocity
(a) Separate control

(b) Coordination control

Fig. 7. Voltage signal of the lateral acceleration
The braking distance was measured by a photoelectric speed sensor. As seen from the braking distance values in Table 1, since stability is guaranteed preferentially, the braking distance increases slightly, but the comprehensive performance of the vehicle is improved.

Table 1. Braking distance

                       Initial braking speed (km/h)   Braking distance (m)   Braking time (s)
Separate control       40.08                          11.82                  2.35
Coordination control   40.12                          12.05                  2.74
6 Conclusion

(1) A hierarchical coordination control study of the EPS and ABS systems of the vehicle chassis was carried out, applying a multibody model of the EPS system. The simulation results show that coordination control effectively improves the comprehensive performance of the vehicle.
(2) The hardware-in-the-loop test provides a new simulation and analysis technique for the design, testing and performance optimization of vehicle control systems. The results of the LabVIEW-based hardware-in-the-loop test verify the effectiveness of the coordination control strategy in this paper.
(3) This paper offers guidance for the use of multibody dynamics models and hardware-in-the-loop testing in the research and development of vehicle electronic control systems.
References

1. Yu, F., Lin, Y.: Vehicle System Dynamics. China Mechanical Press, Beijing (2005)
2. Taheri, S.: An investigation and design of slip control braking systems integrated with four wheel steering. Department of Mechanical Engineering, Clemson University (1990)
3. Zhang, L.P.: An Analysis of Vehicle Performance in Cornering with Braking & Anti-Lock Braking Control Simulation Research. Yanshan University, Qinhuangdao (2004)
4. Li, J., Yu, F., Zhang, J.W., et al.: An Investigation into Optimal Target Slip Ratio Control for Simulation of an Anti-lock Braking System of Vehicles. Journal of System Simulation 13(6), 789–793 (2001)
5. Chu, C.B., Chen, W.W., Liu, X.Y., et al.: Vehicle Experiment of Hardware-in-the-loop Based on LabVIEW. Journal of Hefei University of Technology 32(1), 89–92 (2009)
6. Luo, Y.G., Yang, D.G., Li, M.H., et al.: Hardware-in-the-loop Simulation on Dynamical Coordinated Control Method in Parallel Hybrid Electric Vehicle (PHEV). Chinese Journal of Mechanical Engineering 44(5), 80–85 (2008)
7. Nikravesh, P.E.: Computer-Aided Analysis of Mechanical Systems. Prentice-Hall, Englewood Cliffs (1988)
8. Chu, C.B., Chen, W.W.: Vehicle Chassis System Based on Layered Coordinated Control. Chinese Journal of Mechanical Engineering 44(2), 157–162 (2008)
9. Bakker, E., Nyborg, L., Pacejka, H.B.: Tyre modelling for use in vehicle dynamics studies. SAE Paper No. 870421, pp. 2190–2198 (1987)
10. Yang, X.J.: Dynamic Analysis and Control Study of EPS System. Hefei University of Technology, Hefei (2003)
SVM Classifier Based Feature Selection Using GA, ACO and PSO for siRNA Design

Yamuna Prasad¹, K.K. Biswas¹, and Chakresh Kumar Jain²

¹ Department of Computer Science and Engineering, Indian Institute of Technology, Delhi, India
[email protected], [email protected]
² Department of Biotechnology, Jaypee Institute of Information Technology University, Noida, India
[email protected]
Abstract. Recently there has been considerable interest in applying evolutionary and natural computing techniques to the analysis of large datasets with many features. In particular, efficacy prediction of siRNA has attracted many researchers because of the large number of features involved. In the present work, we have applied an SVM-based classifier together with PSO, ACO and GA to the Huesken dataset of siRNA features, as well as to the wine and the wdbc breast cancer benchmark datasets, and achieved considerably high accuracy. We have also highlighted the data size necessary for good SVM accuracy with the selected kernel. Both groups of features (sequential and thermodynamic) are important in the efficacy prediction of siRNA. The results of our study are compared with other results available in the literature.
Keywords: siRNA, ACO, GA, PSO, LibSVM, RBF.
widely known evolutionary and natural computing heuristics such as GA, ACO and PSO [12, 13, 17, 21] for better classification, and reported their suitability for problems in a large class of research domains such as network simulation, anomaly detection, security and vigilance, image classification and bioinformatics [11, 12]. Our work is directed towards handling the data size issue for better classification, identifying optimal features, and determining kernel applicability for the siRNA dataset. In this work, we show that an SVM classifier for siRNA efficacy prediction, coupled with evolutionary and natural computing heuristics, significantly improves the prediction results. We use these heuristics to obtain the most appropriate set of features for siRNA efficacy prediction, with both the linear and the RBF kernels for the SVM classifier. We present results of GA-SVM, ACO-SVM and PSO-SVM for three datasets, namely the siRNA dataset, the wine dataset and the wdbc breast cancer dataset [6]. A new control parameter has also been proposed as an ant traversal stopping criterion for ACO. We also compare our results with the published results on the three datasets [5, 17, 21], and a study has been carried out of the stability of the proposed models.
2 Methodology
Heuristic optimization methods based on local neighborhood search are rather sensitive to starting conditions and tend to get trapped in local minima. To avoid these problems, Ant Colony Optimization, Genetic Algorithms and Particle Swarm Optimization explore the search space stochastically and also use information from previous history [11, 12, 13, 14, 15, 18, 19], which yields global optima with proper tuning of the parameters. These approaches to optimal feature selection are described in the subsections below.

2.1 ACO
ACO is a meta-heuristic algorithm applied to several classes of optimization problems. The algorithm was originally developed by Dorigo [8, 10] and extended later in [11, 12, 13, 21]; it is conceptually based on the foraging behavior of ants, and uses approximate methods to provide good solutions to hard combinatorial problems. The construction of a solution starts with an empty partial solution, and in succeeding steps the current partial solution is extended by adding a feasible solution component from the set of solution components [10]. In the current work, SVM classifier performance has been used as the heuristic information for feature selection. The decision of ant m whether to include the ith feature in its solution at time t is jointly influenced by the heuristic information and the pheromone level. The probabilistic transition rule is given by equation (1):

$$p_i^m(t) = \begin{cases} \dfrac{[\tau_i(t)]^{\alpha}\,[\eta_i]^{\beta}}{\sum_{u \in h^m} [\tau_u(t)]^{\alpha}\,[\eta_u]^{\beta}}, & i \in h^m \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$
where $h^m$ is the set of feasible features that can be added to the partial solution; $\tau_i$ and $\eta_i$ are, respectively, the pheromone value and the heuristic desirability associated with feature $i$; $\alpha$ and $\beta$ are two parameters that determine the relative importance of the pheromone value and the heuristic information. The local heuristic desirability $\eta_i$ of the $i$th feature has been evaluated using the SVM classifier's classification accuracy, unlike the mutual information score and other statistical measures used in [12, 13]. After all ants have completed their solutions, pheromone evaporation on all nodes is triggered, and then, according to equation (2), each ant $k$ deposits a quantity of pheromone on each node that it has traversed:

$$\Delta\tau_i^k(t) = \begin{cases} \varphi\, C(S^k(t)) + (1-\varphi)\,\dfrac{N - |S^k(t)|}{N}, & i \in S^k(t) \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$

where $S^k(t)$ is the feature subset found by ant $k$ at iteration $t$, and $|S^k(t)|$ is its length. The pheromone is updated according to both the measure of classifier performance $C(S^k(t))$ and the feature subset length $|S^k(t)|$. The parameter $\varphi$ controls the relative weight of classifier performance and feature subset length; $N$ is the total number of features in the data set. In our experiments we assume that classifier performance is more important than subset length, so the value of $\varphi$ has been set to 0.95. The total pheromone deposited by all ants on the features is updated by the following equation, which also includes the effect of evaporation:

$$\tau_i(t+1) = (1-\rho)\,\tau_i(t) + \sum_{k=1}^{m} \Delta\tau_i^k(t) \qquad (3)$$
where $m$ is the number of ants and $\rho \in (0,1)$ is the pheromone trail decay coefficient. The main role of pheromone evaporation is to avoid stagnation, that is, the situation in which all ants construct the same solution.

2.1.1 Ants Traversal Stopping Criteria
Various methodologies have evolved for the feature selection stopping criterion, such as a fixed number of features or a constant number of accuracy inversions [12, 13]. In the former approach the user defines minimum and maximum limits for the feature subset length, and an ant stops selecting features once the specified number of features has been included; in the latter approach, if selection of a feature degrades the performance it is termed an accuracy inversion, and once the number of inversions reaches the specified limit the ant stops the feature selection and returns the subset. In our proposed methodology a control parameter has been defined to decide the stopping criterion automatically according to equation (4):

$$I = \left\lceil \frac{N}{\mu} \right\rceil \qquad (4)$$

where $I$ is the allowed number of accuracy inversions and $\mu$ ranges from 1 to the total number of features $N$; when $\mu = 1$ the number of allowed inversions equals the total number of features, and when $\mu = N$ the ant stops the feature selection when a single accuracy inversion takes place. After performing several experiments with an artificial data set we have tuned the value of $\mu$ to 3. The details of the ACO algorithm can be found in [13].
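As a concrete illustration, the transition-probability and pheromone-update steps of equations (1)–(3) can be sketched in a few lines of Python. The function names, the dictionary-based bookkeeping and the default parameter values (taken from Table 1) are our own choices, not the paper's C implementation:

```python
def transition_probs(feasible, tau, eta, alpha=1.0, beta=0.1):
    # Eq. (1): probability that an ant adds feature i from its feasible set h^m.
    weights = {i: (tau[i] ** alpha) * (eta[i] ** beta) for i in feasible}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}

def pheromone_update(tau, ant_subsets, ant_scores, n_features,
                     phi=0.95, rho=0.2):
    # Eq. (3): evaporate, then add each ant's deposit from Eq. (2), which mixes
    # classifier performance C(S^k) with a bonus for short feature subsets.
    new_tau = [(1.0 - rho) * t for t in tau]
    for subset, score in zip(ant_subsets, ant_scores):
        deposit = phi * score + (1.0 - phi) * (n_features - len(subset)) / n_features
        for i in subset:
            new_tau[i] += deposit
    return new_tau
```

An ant would sample its next feature from `transition_probs` (e.g. with `random.choices`), query the SVM for each completed subset's accuracy, and feed all subsets into `pheromone_update` once per iteration.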
2.2 GA
The genetic algorithm (GA) is a stochastic computational model inspired by the principles of biological evolution, used to solve optimization problems. GAs model the natural phenomenon of genetic inheritance based on the principle of "survival of the fittest". Several researchers have shown the applicability of GAs to sequential decision processes for function optimization, machine learning and general optimization problems [12, 14, 16]. In this work we have used the BVO methodology [14, 16] and the roulette wheel selection method for crossover, and have also used elitism, which retains some of the best chromosomes in the new generation.

2.3 Particle Swarm Optimization
Particle swarm optimization (PSO) was developed for solving optimization problems using the social and cognitive behavior of swarms [17, 18, 19]. For the feature selection problem, in binary PSO each particle is represented by a vector of binary values '1' or '0' which indicate the selection or removal of the corresponding features from the feature vector represented by the particle. The velocity and particle updates in binary PSO are the same as in the continuous case; however, the final decisions are made from the output of a sigmoid function [18, 19], a probabilistic decision induced by the velocity vector. Details can be found in [17].

2.4 SVM
Machine learning methods are generally used to identify patterns in the input dataset. Support vector machines (SVM), introduced by Vapnik, are a group of supervised learning methods that can be applied to classification or regression. LibSVM [7] is integrated software for support vector classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR) and distribution estimation (one-class SVM). In this work we have applied the SVM classifier for evaluating fitness in each of the methodologies described earlier.
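The binary-PSO update described in Sect. 2.3 can be sketched as follows; the parameter defaults mirror Table 1 (ω = 1.0, c₁ = c₂ = 2.0), while the function itself is an illustrative reconstruction, not the authors' code:

```python
import math
import random

def bpso_step(position, velocity, pbest, gbest, w=1.0, c1=2.0, c2=2.0):
    # Velocity update is identical to continuous PSO.
    new_x, new_v = [], []
    for x, v, pb, gb in zip(position, velocity, pbest, gbest):
        v_next = (w * v
                  + c1 * random.random() * (pb - x)
                  + c2 * random.random() * (gb - x))
        # The sigmoid of the velocity gives the probability that the bit
        # (i.e. "feature selected") is set to 1.
        s = 1.0 / (1.0 + math.exp(-v_next))
        new_x.append(1 if random.random() < s else 0)
        new_v.append(v_next)
    return new_x, new_v
```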
We have pipelined the SVM classifier of the LibSVM library with the feature selection methodologies presented earlier. We have also investigated kernel suitability for the various benchmark datasets [3, 6, 17].

2.5 Feature Selection Criteria
After evaluating each new generation in all the proposed models, the best population so far is computed. For computing the best subset the following rule has been proposed: "If the fitness is the same for two subsets then the subset of minimal size is selected, but if both the fitness and the size of the subsets are the same then the subset with maximal cross-validation accuracy is selected."
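The tie-breaking rule just quoted can be written directly as a comparison function (a sketch; the triple representation of a candidate is our own convention):

```python
def better_subset(cand_a, cand_b):
    # Each candidate is (fitness, feature_indices, cv_accuracy).
    # Rule of Sect. 2.5: higher fitness wins; on equal fitness the smaller
    # subset wins; on equal fitness and size, higher CV accuracy wins.
    fit_a, feats_a, cv_a = cand_a
    fit_b, feats_b, cv_b = cand_b
    if fit_a != fit_b:
        return cand_a if fit_a > fit_b else cand_b
    if len(feats_a) != len(feats_b):
        return cand_a if len(feats_a) < len(feats_b) else cand_b
    return cand_a if cv_a >= cv_b else cand_b
```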
3 Implementation and Results

3.1 Dataset
In our experiments, we have used three benchmark datasets: the siRNA dataset of 2431 siRNA sequences [3], the wine dataset of 178 samples and 13 features
[6], and the wdbc breast cancer dataset of 569 samples and 30 features [6]. The siRNA dataset is comprised of thermodynamic features (columns 1–21) and sequence features (columns 22–110) together with their efficacy. We have carried out a study for selecting training and testing data sets using the SVM classifier, which shows that using the 200 most potent and 200 least potent siRNAs from the original siRNA dataset gives the best classifier learning, with 92.60 cross-validation accuracy, as shown in figure 1. In that figure the X-axis gives the training dataset size, where '1' refers to the 200 most potent and 200 least potent siRNAs, '2' refers to the 300 most potent and 300 least potent siRNAs, and so on, and the Y-axis gives the cross-validation accuracy.

3.2 Experiments
The experiments have been conducted in C on a Linux platform with the gcc compiler. Table 1 lists the parameter values tuned for the ACO-SVM, GA-SVM and PSO-SVM experiments. In the table, v stands for initial velocity, C stands for crossover rate, MR stands for mutation rate, PS stands for population size and MI stands for maximum iterations.

Table 1. Parameter values for the models

Model     τ    α    β    φ     ρ    μ   ω    v    C    MR    PS   MI
ACO-SVM   1.0  1.0  0.1  0.95  0.2  3   -    -    -    -     30   50
GA-SVM    -    -    -    -     -    -   -    -    0.8  0.02  30   50
PSO-SVM   -    2.0  2.0  -     -    -   1.0  0.0  -    -     30   50
The two parameters C and r of the SVM classifier have been set to 2^12 and 16 for the wine dataset, and to 2^12 and 1 for the wdbc and siRNA datasets, after performing several experiments. We do not claim optimality of these parameters; the values were obtained by repeated experiments on a small, artificially generated data set.

3.3 Results and Discussions
All the parameters of the models have been set according to table 1. For the purpose of the experiments, each dataset has been randomly divided into two groups, with 90% of the samples for training and 10% for testing. The proposed models compute more stable results because the dataset (restricted to the feature subset) is randomly partitioned into training and testing sets 10 times, classification is performed each time, and the average is taken as the fitness value of the feature subset. The control parameter (µ) automatically takes care of the size of the feature subset, which improves the accuracy. We have implemented all three proposed models with two different kernels for the SVM classifier, the linear and the radial basis function (RBF) kernel, respectively. The results for the linear kernel and the RBF kernel are given in table 2 and table 3, respectively. From the results it can be seen that all three approaches produce better results than the conventional SVM approach. In the following tables, DS stands for dataset name, WN stands for the wine dataset, WD stands for the wdbc dataset, HN stands for
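For reference, the RBF kernel used here (LibSVM's kernel type 2) is K(x, y) = exp(−γ‖x − y‖²). A vectorized NumPy sketch (NumPy is our choice for illustration; the paper's pipeline is in C with LibSVM):

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    # Squared distances via the expansion ||x - y||^2 = x.x + y.y - 2 x.y,
    # clamped at zero to absorb tiny negative round-off values.
    sq = ((X ** 2).sum(axis=1)[:, None]
          + (Y ** 2).sum(axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))
```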
Table 2. Results using linear kernel SVM-classifier

      SVM with all 110 features   ACO-SVM               GA-SVM                PSO-SVM
DS    TA      CVA                 TA      CVA     NF    TA      CVA     NF    TA      CVA     NF
WN    96.00   96.25               98.15   97.19   8     100     97.19   5     100     96.63   10
WD    92.98   96.09               98.25   95.96   15    98.77   97.19   18    100     97.37   17
HN    87.50   87.78               90.25   87.45   56    95.28   91.11   79    95.94   91.35   71
Table 3. Results using RBF kernel SVM-classifier

      SVM with all 110 features   ACO-SVM               GA-SVM                PSO-SVM
DS    TA      CVA                 TA      CVA     NF    TA      CVA     NF    TA      CVA     NF
WN    97.50   96.25               98.15   97.75   8     100     98.88   9     100     95.51   9
WD    94.55   96.29               98.83   96.84   20    98.95   97.36   7     100     98.07   18
HN    89.23   88.23               91.67   88.75   42    96.50   91.50   75    97.50   91.00   66
siRNA dataset, NF stands for number of features, CVA stands for cross-validation accuracy, and TA stands for test accuracy.
The accuracy of PSO-SVM is higher than that of GA-SVM and ACO-SVM, while the accuracy of ACO-SVM is comparable to that of GA-SVM. The number of features selected by ACO-SVM is much smaller than those of PSO-SVM and GA-SVM for the Huesken and wine data sets. The accuracy with the linear kernel is slightly lower than with the RBF kernel. During the experiments it has also been observed that the linear kernel optimizer takes much longer than the RBF kernel. The observed results have been compared with results reported by other researchers: the comparison for the wine and wdbc datasets is given in table 4, and the comparison for the Huesken dataset in table 5.

Table 4. Comparison of proposed models with previous PSO-SVM models

Dataset   Chung-Jui [17]   Proposed PSO-SVM   Proposed ACO-SVM   Proposed GA-SVM
Wine      100              100                98.15              100
Wdbc      95.61            100                98.83              98.95
Table 5. Comparison of proposed models with previous SVM models for siRNA data

Dataset   Wang et al. [5]   Jain et al. [21]   Proposed PSO-SVM   Proposed ACO-SVM   Proposed GA-SVM
Huesken   77.00             71.10              97.50              91.67              96.50
From table 4 it is clear that our models, with the parameters described in table 1, yield higher accuracy than that reported by [17]. Table 5 shows that the results observed after feature selection outperform the results
reported by [5, 21]. Further, a study has been carried out of the stability of the proposed models by computing the population fitness over the iterations. Figure 2 shows the stability of all the proposed methods on the Huesken dataset.
Fig. 1. Training size and CV accuracy
Fig. 2. Population fitness for siRNA dataset
As figure 2 shows, ACO-SVM and PSO-SVM are more stable than the GA-SVM model in terms of population fitness. The observed results show higher accuracy than the previously reported ones, and ACO-SVM and PSO-SVM reach optimality more stably than GA-SVM, even though the complexity of PSO-SVM and GA-SVM is lower than that of ACO-SVM.
4 Conclusion
In this paper, we have established that the GA, ACO and PSO methods supported by an SVM classifier yield better siRNA efficacy predictions than [5, 21], and that better feature selection is achieved for the wine and wdbc data sets. The results obtained with our proposed GA-SVM and PSO-SVM models indicate the significance of both sequence and thermodynamic features [3, 21], while ACO-SVM assigns more significance to the sequence features in the siRNA dataset [1, 21] generated by Huesken et al. [3].
References
1. Saetrom, P., Snove, O.: A comparison of siRNA efficacy predictors. Biochem. Biophys. Res. Commun. 321(1), 247–253 (2004)
2. Reynolds, A., Leake, D., Boese, Q., Scaringe, S., Marshall, W.S., Khvorova, A.: Rational siRNA design for RNA interference. Nat. Biotechnol. 22(3), 326–330 (2004)
3. Huesken, D., Lange, J., Mickanin, C., Weiler, J., Asselbergs, F., Warner, J., Meloon, B., Engel, S., Rosenberg, A., Cohen, D., Labow, M., Reinhardt, M., Natt, F., Hall, J.: Design of a genome-wide siRNA library using an artificial neural network. Nat. Biotechnol. 23, 995–1001 (2005)
4. Zhi, J.L., David, H.M.: OligoWalk: an online siRNA design tool utilizing hybridization thermodynamics. Nucleic Acids Research 36(Suppl. 2), 104–108 (2008)
5. Xiaowei, W., Xiaohui, W., Verma Rajeev, K., Beauchamp, L., Maghdaleno, S., Surendra, T.J.: Selection of Hyperfunctional siRNAs with improved potency and specificity. Nucleic Acids Research 37(22), 152 (2009)
6. Asuncion, A., Newman, D.J.: UCI Machine Learning Repository, School of Information and Computer Science. University of California, Irvine (2007), http://www.ics.uci.edu/~mlearn/MLRepository.html
7. Chih-Chung, C., Chih-Jen, L.: LIBSVM: a library for support vector machines (2001), http://www.csie.ntu.edu.tw/~cjlin/libsvm
8. Dorigo, M., Maniezzo, V., Colorni, A.: The ant system: Optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics - Part B 26(1), 29–42 (1996)
9. Cheng-Lung, H.: ACO-based hybrid classification system with feature subset selection and model parameters optimization. Neurocomputing 73, 438–448 (2009)
10. Dorigo, M., Blum, C.: Ant colony optimization theory: A survey. Theoretical Computer Science, 243–278 (2005)
11. Tsang, C.H.: Ant Colony Clustering and Feature Extraction for Anomaly Intrusion Detection. In: Swarm Intelligence in Data Mining, pp. 101–123. Springer, Heidelberg (2007)
12. Nemati, S., Basiri, M.E., Ghasem-Aghaee, N., Aghdam, M.H.: A novel ACO–GA hybrid algorithm for feature selection in protein function prediction. Expert Systems with Applications 36, 12086–12094 (2009)
13. Aghdam, M.H., Ghasem-Aghaee, N., Basiri, M.E.: Text feature selection using ant colony optimization. Expert Systems with Applications 36, 6843–6853 (2009)
14. Yang, J., Honavar, V.: Feature subset selection using a genetic algorithm. IEEE Intelligent Systems 13(2), 44–49 (1998)
15. Zhao, X., Huang, D., Cheung, Y., Wang, H., Huang, X.: A Novel Hybrid GA/SVM System for Protein Sequences Classification. In: Yang, Z.R., Yin, H., Everson, R.M. (eds.) IDEAL 2004. LNCS, vol. 3177, pp. 11–16. Springer, Heidelberg (2004)
16. Raymer, M., Punch, W., Goodman, E., Kuhn, L., Jain, A.K.: Dimensionality reduction using genetic algorithms. IEEE Transactions on Evolutionary Computing 4, 164–171 (2000)
17. Chung-Jui, T., Li-Yeh, C., Jun-Yang, C., Cheng-Hong, Y.: Feature Selection using PSO-SVM. IAENG International Journal of Computer Science, IJCS 33(1), 18 (2007)
18. Liu, Y., Qin, Z., Xu, Z., He, H.: Feature selection with particle swarms. In: Zhang, J., He, J.-H., Fu, Y. (eds.) CIS 2004. LNCS, vol. 3314, pp. 425–430. Springer, Heidelberg (2004)
19. Khanesar, M.A., Teshnehlab, M., Soorehdeli, M.A.: A Novel Binary Particle Swarm Optimization. In: Proc. 15th Mediterranean Conference on Control and Automation (2007)
20. Correa, S., Freitas, A.A., Johnson, C.G.: Particle Swarm and Bayesian networks applied to attribute selection for protein functional classification. In: Proc. of the GECCO 2007 workshop on particle swarms: The second decade, pp. 2651–2658 (2007)
21. Jain, C.K., Prasad, Y.: Feature selection for siRNA efficacy prediction using natural computation. In: World Congress on Nature & Biologically Inspired Computing (NaBIC 2009), pp. 1759–1764. IEEE Press, Los Alamitos (2009)
A Discrete-Time Recurrent Neural Network for Solving Systems of Complex-Valued Linear Equations Wudai Liao, Jiangfeng Wang, and Junyan Wang School of Electrical and Information Engineering Zhongyuan University of Technology Zhongyuan West Road 41, Zhengzhou, Henan Province, China [email protected], [email protected], [email protected]
Abstract. A discrete-time recurrent neural network is presented in this paper for solving systems of complex-valued linear equations. The network shown in this paper is simple in structure and can converge to the solutions of complex-valued linear equations. The condition for the neural network to globally converge to the complex-valued linear equations is given. An illustrative example is presented to illustrate its performance.
1 Introduction
The system of complex-valued linear equations has been widely used in many analysis and design problems such as signal processing, circuit design, and controller synthesis, and it is also used in many large-scale systems. The nature of parallel distributed processing makes neural networks viable alternatives for solving simultaneous linear equations in real time, and neural networks for solving linear algebraic systems have been developed (see Refs. [1,2]). Ref. [3] presented a continuous-time recurrent neural network for solving systems of complex-valued linear equations. Consider the equations

$$Ax = b \qquad (1)$$

where x is an n-dimensional vector of the unknown solutions, A ∈ C^{n×n} is a complex matrix of coefficients, and b ∈ C^n is a complex vector of right-hand sides. We write A = A_re + jA_im, b = b_re + jb_im, x = x_re + jx_im, where j denotes the imaginary unit. However, in view of the availability of digital hardware and compatibility with digital computers, a discrete-time version is more desirable in practical applications. In this paper, we propose a discrete-time network, the counterpart of the continuous-time network in Ref. [3], and give the condition under which the discrete-time recurrent neural network is globally convergent.
Y. Tan, Y. Shi, and K.C. Tan (Eds.): ICSI 2010, Part II, LNCS 6146, pp. 315–320, 2010. © Springer-Verlag Berlin Heidelberg 2010
2 Network Descriptions
In this section, we first review the continuous-time network introduced in Ref. [3], which is represented by a system of differential equations. Then we propose its discrete-time version by discretization, which is expressed by a system of difference equations.

2.1 Continuous-Time Network
According to Ref. [3], the system of complex-valued linear equations Ax = b can be explicitly written as (A_re + jA_im)(x_re + jx_im) = b_re + jb_im. By decomposing this expression into real and imaginary parts, the system of n-dimensional complex-valued linear equations can be written as the following 2n real-valued linear equations:

$$\begin{pmatrix} A_{re} & -A_{im} \\ A_{im} & A_{re} \end{pmatrix} \begin{pmatrix} x_{re} \\ x_{im} \end{pmatrix} = \begin{pmatrix} b_{re} \\ b_{im} \end{pmatrix} \qquad (2)$$

Let the augmented coefficient matrix, augmented right-hand side vector, and augmented unknown vector be defined as

$$\tilde{A} = \begin{pmatrix} A_{re} & -A_{im} \\ A_{im} & A_{re} \end{pmatrix}, \quad \tilde{b} = \begin{pmatrix} b_{re} \\ b_{im} \end{pmatrix}, \quad \tilde{x} = \begin{pmatrix} x_{re} \\ x_{im} \end{pmatrix} \qquad (3)$$

The reformulated problem can then be rewritten as $\tilde{A}\tilde{x} = \tilde{b}$. Note that

$$\tilde{x}^{T}\tilde{A}\tilde{x} = \begin{pmatrix} x_{re}^{T} & x_{im}^{T} \end{pmatrix} \begin{pmatrix} A_{re} & -A_{im} \\ A_{im} & A_{re} \end{pmatrix} \begin{pmatrix} x_{re} \\ x_{im} \end{pmatrix} \qquad (4)$$

When $A_{re}$ is positive definite and $A_{im}$ is symmetric, $\tilde{A}$ is positive definite. A recurrent neural network with a single-layer architecture can then be used. The dynamic equation of the recurrent neural network can be expressed as

$$\frac{dv(t)}{dt} = -\eta_{1}\tilde{A}v(t) + \eta_{1}\tilde{b} \qquad (5)$$

where the vector $v(t)$ denotes the activation states of the hidden neurons in the input layer.

2.2 Discrete-Time Network
Discretizing the dynamical equation of the continuous-time network (5), we propose the dynamical equation of its discrete-time counterpart as

$$\frac{x(k+1) - x(k)}{(k+1) - k} = \alpha\left(-\tilde{A}x(k) + \tilde{b}\right) \qquad (6)$$

so we get

$$x(k+1) = (I - \alpha\tilde{A})x(k) + \alpha\tilde{b} \qquad (7)$$

where $I$ denotes an identity matrix and $\alpha$ is a positive fixed step size. Figure 1 shows the block diagram of the configuration of the discrete-time network, where $x(k) = \begin{pmatrix} x_{re}(k) \\ x_{im}(k) \end{pmatrix}$ and $Q = I - \alpha\tilde{A}$.
Fig. 1. Block diagram for the configuration of the discrete-time network
3 Convergence Analysis
Lemma 1. The equilibrium point of the dynamical system (7) is equal to the solution of equation (1).

Proof. Let $v^{*}$ be the equilibrium point of the dynamical system (7). By the nature of an equilibrium point, we have

$$v^{*} = (I - \alpha\tilde{A})v^{*} + \alpha\tilde{b} \qquad (8)$$

Equation (8) implies

$$\tilde{A}v^{*} - \tilde{b} = 0 \qquad (9)$$

which satisfies the optimality condition (2). Hence the equilibrium point of the dynamical system (7) maps to the solution of equation (1). The proof is complete.

Theorem 1. The discrete-time network (7) is asymptotically convergent to the solution of equation (1) if

$$\alpha < \min_{1 \le i \le 2n} \frac{2\,\mathrm{Re}[\rho_{i}(\tilde{A})]}{|\rho_{i}(\tilde{A})|^{2}} \qquad (10)$$

where $\rho_{i}(\tilde{A})$ denotes the $i$th eigenvalue of the matrix $\tilde{A}$, and $\mathrm{Re}[\rho_{i}(\tilde{A})]$ and $|\rho_{i}(\tilde{A})|$ are the real part and the absolute value of $\rho_{i}(\tilde{A})$, respectively.

Proof. The discrete-time network (7) is described by a first-order difference equation, so it is a linear system. By linear system theory, system (7) is asymptotically stable if all eigenvalues of the matrix $(I - \alpha\tilde{A})$ have absolute values less than 1, i.e.,

$$|\rho_{i}(I - \alpha\tilde{A})| < 1, \quad \forall i = 1, 2, \cdots, 2n. \qquad (11)$$
By the properties of matrix eigenvalues, inequality (11) can be written as

$$|\rho_{i}(I) - \alpha\rho_{i}(\tilde{A})| < 1, \quad \forall i = 1, 2, \cdots, 2n. \qquad (12)$$

Since $\rho_{i}(I) = 1$ for all $i$, let $\rho_{i}(\tilde{A}) = \mathrm{Re}[\rho_{i}(\tilde{A})] + j\,\mathrm{Im}[\rho_{i}(\tilde{A})]$, where $j = \sqrt{-1}$. For all $i = 1, 2, \cdots, 2n$, inequality (12) becomes

$$\left|1 - \alpha\,\mathrm{Re}[\rho_{i}(\tilde{A})] - j\alpha\,\mathrm{Im}[\rho_{i}(\tilde{A})]\right| < 1. \qquad (13)$$

Squaring both sides gives $(1 - \alpha\,\mathrm{Re}[\rho_{i}(\tilde{A})])^{2} + \alpha^{2}\,\mathrm{Im}[\rho_{i}(\tilde{A})]^{2} < 1$, which reduces to

$$\alpha < \frac{2\,\mathrm{Re}[\rho_{i}(\tilde{A})]}{|\rho_{i}(\tilde{A})|^{2}}. \qquad (14)$$

Since inequality (14) must be satisfied for all $i = 1, 2, \cdots, 2n$, $\alpha$ has an upper bound equal to the minimum of the right-hand side of (14). Therefore, the condition on the positive fixed step size $\alpha$ that makes the discrete-time network asymptotically stable is

$$\alpha < \min_{1 \le i \le 2n} \frac{2\,\mathrm{Re}[\rho_{i}(\tilde{A})]}{|\rho_{i}(\tilde{A})|^{2}}. \qquad (15)$$

Note that the right-hand side of (15) is always greater than zero, because $\tilde{A}$ is positive definite. It has been proven in Ref. [4] that the continuous-time network (5) is asymptotically stable, and the fixed step size $\alpha$ obtained by (15) is positive. From Lemma 1 and the convergence condition (15), the discrete-time network converges to the equilibrium point, which is equal to the solution of equation (1). The proof is complete.
4 Simulation Results
Consider the system of linear equations (1) with

$$A = \begin{pmatrix} 0.1 + j0.15 & 0.2 + j0.2 & -0.3 - j0.19 \\ 0.1 + j0.2 & 0.5 + j0.23 & 0.6 + j0.3 \\ 0.1 - j0.19 & -0.4 + j0.3 & 0.7 + j0.17 \end{pmatrix}, \quad b = \begin{pmatrix} 0.5 + j0.7 \\ 0.4 - j2.3 \\ -1.2 + j2.9 \end{pmatrix}$$

The theoretical solution of this system is

$$x = \begin{pmatrix} -4.79 + j5.6 & 4.09 - j3.24 & -1.87 - j1.11 \end{pmatrix}^{T}$$
Fig. 2. Time evolution of the decision variable generated by the discrete-time network with different fixed steps. Transient states of the decision variable with (a) α = 0.2; (b) α = 0.9; (c) α = 1.17; (d) α = 1.35.
By the orthogonal decomposition method, and according to equations (2) and (3), we form the augmented matrix $\tilde{A} = \begin{pmatrix} A_{re} & -A_{im} \\ A_{im} & A_{re} \end{pmatrix}$ and vector $\tilde{b} = \begin{pmatrix} b_{re} \\ b_{im} \end{pmatrix}$, and compute the eigenvalues $\rho_{i}(\tilde{A})$, where $\mathrm{Re}[\rho_{i}(\tilde{A})]$ and $|\rho_{i}(\tilde{A})|$ are the real part and the absolute value of $\rho_{i}(\tilde{A})$, respectively. By condition (15), the discrete-time network is globally convergent to the solution of the desired system if we select $\alpha < 1.269$.
Fig. 2 shows the time evolution of the decision variable generated by the discrete-time network for different values of the fixed step size α. In subplot (a), the fixed step size is α = 0.2; the discrete-time network reaches the solution x = [−5.1 + j5.44, 4.14 − j3.14, −1.83 − j1.18]^T after 50 iterations. In subplot (b), with α = 0.9, it reaches the theoretical solution after only about 25 iterations. In subplot (c), with α = 1.17, which is near but below the upper bound, the estimated solution after 50 iterations is x = [−4.76 + j5.60, 4.12 − j3.18, −1.90 − j1.05]^T, which is not equal to the theoretical solution, and the convergence process is unsteady. Subplot (d) shows that the network diverges for α = 1.35, which is greater than the upper bound. Therefore, we can speed up convergence by selecting a larger step size below its upper bound, but not too close to it.
5 Concluding Remarks
This paper has presented a discrete-time network for solving complex-valued linear equations. The discrete-time recurrent neural network is obtained by discretization of its continuous-time counterpart, introduced in Ref. [3]. We have given a condition on the fixed step size of the discrete-time model which guarantees that the proposed network converges asymptotically to the solution of the desired system of equations. The structure of the discrete-time model is very simple, and it is easy to implement in practical applications thanks to the availability of digital hardware.
Acknowledgement. This work was supported by the National Natural Science Foundation of China (60774051).
References
1. Jun, W.: Electronic Realisation of Recurrent Neural Network for Solving Simultaneous Linear Equations. Electronics Letters 28(5), 493–495 (1992)
2. Cichocki, A., Unbehauen, R.: Neural Networks for Solving Systems of Linear Equations and Related Problems. IEEE Trans. 39(2), 124–138 (1992)
3. Jun, W.: Recurrent Neural Networks for Solving Systems of Complex-valued Linear Equations. Electronics Letters 28(18), 1751–1753 (1992)
4. Wai, S.T., Jun, W.: A Discrete-time Lagrange Network for Solving Constrained Quadratic Programs. International Journal of Neural Systems 10(4), 261–265 (2000)
A Recurrent Neural Network for Solving Complex-Valued Quadratic Programming Problems with Equality Constraints Wudai Liao, Jiangfeng Wang, and Junyan Wang School of Electrical and Information Engineering Zhongyuan University of Technology Zhongyuan West Road 41, Zhengzhou, Henan Province, China [email protected], [email protected], [email protected]
Abstract. A recurrent neural network is presented for solving systems of quadratic programming problems with equality constraints involving complex-valued coefficients. The proposed recurrent neural network is asymptotically stable and able to generate optimal solutions to quadratic programs with equality constraints. An opamp based analogue circuit realization of the recurrent neural network is described. An illustrative example is also discussed to demonstrate the performance and characteristics of the analogue neural network.
1 Introduction
Quadratic programming represents a class of optimization problems in which the objective function to be optimized is quadratic and the constraints to be conformed to are linear. Letting x be an n-dimensional column vector of decision variables, a quadratic programming problem with equality constraints is

$$\min \; \frac{1}{2}x^{T}Qx + x^{T}c \quad \text{s.t.} \quad Ax = b \qquad (1)$$

where $x \in R^{n}$ is the vector of decision variables and $Q \in R^{n\times n}$ is the matrix of objective coefficients, assumed symmetric and positive definite. $c \in R^{n}$ is the column vector of objective coefficients. $A \in R^{m\times n}$ ($m < n$); each row of A is formed by the coefficients of the corresponding constraint, and A is assumed to have full rank. $b \in R^{m}$ is the vector of constraint coefficients. Recently, neural networks for solving quadratic programming problems have been developed by several researchers (see Refs. [1,2,3]). The nature of parallel distributed processing makes neural networks a viable alternative for solving quadratic programs in real time. However, these published results are concerned with real-valued quadratic programming. This paper shows that recurrent neural networks can also be applied to solving systems of complex-valued quadratic programs, building on the results of Refs. [1,2,3].
Y. Tan, Y. Shi, and K.C. Tan (Eds.): ICSI 2010, Part II, LNCS 6146, pp. 321–326, 2010. © Springer-Verlag Berlin Heidelberg 2010
2 Neural Network Design
When the coefficients in (1) are real-valued, the Lagrangian of the quadratic programming problem with equality constraints described in (1) can be defined as follows (Ref. [2]):

$$L(x, \lambda) = \frac{1}{2} x^T Q x + x^T c + \lambda^T (Ax - b) \qquad (2)$$

where λ is an m-dimensional column vector of Lagrange multipliers. By setting the gradients of L(x, λ) with respect to x and λ to zero, the Lagrange necessary condition gives rise to the following matrix-form algebraic equation:

$$\begin{pmatrix} Q & A^T \\ A & N \end{pmatrix} \begin{pmatrix} x \\ \lambda \end{pmatrix} = \begin{pmatrix} -c \\ b \end{pmatrix} \qquad (3)$$
where N is an m × m null matrix. It is known that the (n + m)-dimensional linear system of algebraic equations above has a unique solution if Q is positive definite and A has full rank. When the coefficients in (1) are complex-valued rather than real, we let Q_re and Q_im denote the real and imaginary parts of the complex matrix Q, respectively; c_re and c_im the real and imaginary parts of the complex vector c; A_re and A_im the real and imaginary parts of the complex matrix A; b_re and b_im the real and imaginary parts of the complex vector b; and x_re and x_im the real and imaginary parts of the complex solution vector x. According to equation (3), we get

$$\begin{pmatrix} Q & A^T \\ A & N \end{pmatrix}_{re} \begin{pmatrix} x \\ \lambda \end{pmatrix}_{re} - \begin{pmatrix} Q & A^T \\ A & N \end{pmatrix}_{im} \begin{pmatrix} x \\ \lambda \end{pmatrix}_{im} = \begin{pmatrix} -c \\ b \end{pmatrix}_{re}$$

$$\begin{pmatrix} Q & A^T \\ A & N \end{pmatrix}_{im} \begin{pmatrix} x \\ \lambda \end{pmatrix}_{re} + \begin{pmatrix} Q & A^T \\ A & N \end{pmatrix}_{re} \begin{pmatrix} x \\ \lambda \end{pmatrix}_{im} = \begin{pmatrix} -c \\ b \end{pmatrix}_{im} \qquad (4)$$

That is,
For a system of simultaneous linear algebraic equations Φz = ξ, the dynamical equation of the recurrent neural network is as follows (Ref. [4]):

$$\frac{dz(t)}{dt} = -\alpha \Phi^T \Phi z(t) + \alpha \Phi^T \xi \qquad (5)$$

where α > 0 is a scaling parameter. Let

$$\Phi = \begin{pmatrix} Q_{re} & A_{re}^T & -Q_{im} & -A_{im}^T \\ A_{re} & N & -A_{im} & N \\ Q_{im} & A_{im}^T & Q_{re} & A_{re}^T \\ A_{im} & N & A_{re} & N \end{pmatrix}, \quad z = \begin{pmatrix} x_{re} \\ \lambda_{re} \\ x_{im} \\ \lambda_{im} \end{pmatrix}, \quad \xi = \begin{pmatrix} -c_{re} \\ b_{re} \\ -c_{im} \\ b_{im} \end{pmatrix} \qquad (6)$$
According to equation (6), the dynamical equation of the recurrent neural network can be described as

$$\frac{d}{dt}\begin{pmatrix} x_{re}(t) \\ \lambda_{re}(t) \\ x_{im}(t) \\ \lambda_{im}(t) \end{pmatrix} = -\alpha \Phi^T \Phi \begin{pmatrix} x_{re}(t) \\ \lambda_{re}(t) \\ x_{im}(t) \\ \lambda_{im}(t) \end{pmatrix} + \alpha \Phi^T \xi \qquad (7)$$
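The dynamics in (7) can be checked numerically; the sketch below (an illustration in numpy with explicit Euler integration, not the paper's op-amp realization; the 2 × 2 system and the step size are hypothetical) confirms that the steady state of dz/dt = −αΦᵀΦz + αΦᵀξ solves Φz = ξ:

```python
import numpy as np

def simulate(Phi, xi, alpha=1000.0, dt=1e-5, steps=20000):
    """Explicit Euler integration of Eq. (7): dz/dt = -alpha*Phi^T Phi z + alpha*Phi^T xi."""
    z = np.zeros(Phi.shape[1])
    M = -alpha * (Phi.T @ Phi)
    f = alpha * (Phi.T @ xi)
    for _ in range(steps):
        z = z + dt * (M @ z + f)
    return z

# small real-valued instance: the network settles on the solution of Phi z = xi
Phi = np.array([[2.0, 1.0], [1.0, 3.0]])
xi = np.array([3.0, 4.0])
z = simulate(Phi, xi)
assert np.allclose(Phi @ z, xi, atol=1e-6)
```

Since −ΦᵀΦ is negative definite whenever Φ has full column rank, the trajectory converges to the unique solution from any initial state; α only scales the convergence rate, as noted in the conclusion.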
The proposed analogue neural network is composed of 2(n + m) massively connected neurons representing x(t) and λ(t). With Φ and ξ as defined in (6), the symmetric connection weight matrix W and the biasing threshold vector θ are defined respectively as

$$W = -\alpha \Phi^T \Phi, \qquad \theta = \alpha \Phi^T \xi \qquad (8)$$

Fig. 1. Op-amp-based circuit realization of the analogue neural network
Fig. 1 illustrates an op-amp-based analogue circuit schematic diagram of the neural network for solving complex-valued quadratic programs with equality constraints. Each neuron can be realized by a summer, an integrator, and an inverter. The ohmic value of each connection resistor is determined by the magnitude of the corresponding connection weight, i.e., for i, j = 1, 2, ..., 2(n + m), R_ij = R_f / |w_ij|. The connecting terminal of each connection resistor is determined by the sign of the connection weight: if the connection from neuron j to neuron i is excitatory (w_ij > 0), then R_ij is connected to the v_j terminal; if the connection from neuron j to neuron i is inhibitory (w_ij < 0), then R_ij is connected to the −v_j terminal. The biasing threshold for neuron i can be realized by a voltage source E_i such that R_f E_i / R_i = θ_i, or a current source I_i such that R_f I_i = θ_i.
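The mapping from weights to circuit elements just described can be sketched in code (an illustrative numpy helper; the function name and the toy weight values are hypothetical, not from the paper):

```python
import numpy as np

Rf = 100e6  # feedback resistance, 100 MOhm as in the example of Section 3

def resistor_network(W, theta, Ri=10e3):
    """R_ij = Rf/|w_ij| (inf = no link); the sign selects the +v or -v terminal;
    voltage sources satisfy Rf*E_i/R_i = theta_i, i.e. E_i = theta_i*Ri/Rf."""
    with np.errstate(divide="ignore"):
        R = np.where(W != 0, Rf / np.abs(W), np.inf)
    sign = np.sign(W)  # +1 excitatory (v terminal), -1 inhibitory (-v terminal)
    E = theta * Ri / Rf
    return R, sign, E

W = np.array([[2.0, -4.0], [0.0, 1.0]])   # toy weights
theta = np.array([1000.0, -500.0])        # toy thresholds
R, sign, E = resistor_network(W, theta)
assert R[0, 0] == 50e6 and np.isinf(R[1, 0])
assert sign[0, 1] == -1 and E[0] == 0.1
```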
3 An Illustrative Example
The following illustrative example demonstrates the operating characteristics of the designed analogue neural network in solving quadratic programming problems with equality constraints. Consider the numerical example with the following coefficients:

$$Q = \begin{pmatrix} 1+j & 2+j & 3+j \\ 2+j & 5+j2 & 1+j \\ 3+j & 1+j & 7+j3 \end{pmatrix}, \quad c = (\,1+j \;\; 2+j \;\; 3+j\,)^T$$

$$A = \begin{pmatrix} 1+j & 2+j & 3+j \\ 2+j & 1+j & 3+j \end{pmatrix}, \quad b = (\,1+j \;\; 3+j\,)^T$$
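The real-embedding formulation can be cross-checked numerically on this example; the sketch below (numpy; a direct complex linear solve stands in for the network's steady state) verifies that the resulting optimum satisfies Ax = b and the stationarity condition Qx + c + Aᵀλ = 0 implied by equation (3):

```python
import numpy as np

j = 1j
Q = np.array([[1+j, 2+j, 3+j], [2+j, 5+2*j, 1+j], [3+j, 1+j, 7+3*j]])
c = np.array([1+j, 2+j, 3+j])
A = np.array([[1+j, 2+j, 3+j], [2+j, 1+j, 3+j]])
b = np.array([1+j, 3+j])

n, m = 3, 2
# complex form of Eq. (3): [[Q, A^T], [A, 0]] [x; lam] = [-c; b]
K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([-c, b]))
x, lam = sol[:n], sol[n:]
assert np.allclose(A @ x, b)                   # equality constraints hold
assert np.allclose(Q @ x + c + A.T @ lam, 0)   # stationarity of the Lagrangian
```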
The optimal solution of the problem is x* = ( 1.507 − j0.214  −0.409 − j0.215  0.077 − j0.184 )^T and the associated Lagrange multiplier is λ* = ( −0.846 − j1.319  −1.756 + j0.96 )^T. According to equation (8), the 10 × 10 connection weight matrix W = −αΦᵀΦ is computed from these coefficients, and
θ = α ( −15 −25 −30 −24 −25 −10 −20 −6 −9 −5 )^T

Let α = 1000, R_f = 100 MΩ, R_c = 10 kΩ, C = 100 μF, and R_i = 10 kΩ for i = 1, 2, ..., 2(n + m); the connection resistance matrix in kilo-ohms and the voltage source array in volts can then be determined as follows, where the plus and minus signs indicate connection resistances associated with excitatory and inhibitory connections, respectively:

$$[R_{ij}] = \begin{pmatrix} G & B \\ B^T & G \end{pmatrix}$$

$$[E_i] = (\,−1.5 \; −2.5 \; −3 \; −2.4 \; −2.5 \; −1 \; −2 \; −0.6 \; −0.9 \; −0.5\,)^T$$

The matrices G and B in [R_{ij}] are as follows:
$$G = \begin{pmatrix} −4.17 & −4 & −2.38 & −5.88 & −6.25 \\ −4 & −2.33 & −2.86 & −5.26 & −6.25 \\ −2.38 & −2.86 & −1.11 & −3.23 & −3.03 \\ −5.88 & −5.26 & −3.23 & −5.88 & −6.25 \\ −6.25 & −6.25 & −3.03 & −6.25 & −5.88 \end{pmatrix}$$

$$B = \begin{pmatrix} ∞ & ∞ & −50 & ∞ & ∞ \\ ∞ & ∞ & −20 & ∞ & 100 \\ 50 & 20 & ∞ & −100 & −100 \\ ∞ & ∞ & 100 & ∞ & ∞ \\ ∞ & −100 & 100 & ∞ & ∞ \end{pmatrix}$$

R_{ij} = ∞ denotes that there is no link between neuron j and neuron i through R_{ij}.
Fig. 2. Transient states of the op-amp-based analogue neural network
The simulation results show that the steady states of the analogue neural network indeed represent the optimal solution and the associated Lagrange multiplier. Fig. 2 illustrates the transient behavior of the op-amp-based analogue neural network's activation states.
4 Conclusion
In this paper, an analogue neural network for solving complex-valued quadratic programming problems with equality constraints has been developed, and an op-amp-based circuit realization has been designed. An illustrative example has been discussed. The proposed analogue neural network has been shown to be capable of generating optimal solutions to quadratic programs with equality constraints, and to be realizable by an analogue circuit. Because the solution process is inherently parallel and distributed, the convergence rate is independent of the problem size. Furthermore, the convergence rate of the neural network can be scaled by properly selecting the design parameter α. These features enable the proposed neural network to solve large-scale quadratic programming problems in real time. Acknowledgement. This work was supported by the National Natural Science Foundation of China (60774051).
References

1. Kennedy, M., Chua, L.O.: Neural Networks for Nonlinear Programming. IEEE Trans. Circuits and Systems CAS-35(5), 554–562 (1988)
2. Wang, J.: Recurrent Neural Network for Solving Quadratic Programming Problems with Equality Constraints. Electronics Letters 28(14), 1345–1347 (1992)
3. Wudai, L., Jiangfeng, W.: A Lower Order Recurrent Neural Network for Solving Higher Order Quadratic Programming Problems with Equality Constraints. In: Proceedings of the Second International Joint Conference on Computational Sciences and Optimization (CSO 2009), Sanya, Hainan, China, April 24–26, pp. 176–178 (2009), ISBN 978-0-7695-3605-7
4. Jun, W.: Electronic Realisation of Recurrent Neural Network for Solving Simultaneous Linear Equations. Electronics Letters 28(5), 493–495 (1992)
Computer-Aided Detection and Classification of Masses in Digitized Mammograms Using Artificial Neural Network Mohammed J. Islam, Majid Ahmadi, and Maher A. Sid-Ahmed Department of Electrical and Computer Engineering University of Windsor, Windsor, ON, Canada {islam1l,ahmadi,ahmed}@uwindsor.ca
Abstract. In this paper we present a computer-aided diagnosis (CAD) system for mass detection and classification in digitized mammograms, which performs mass detection on regions of interest (ROIs) followed by benign-malignant classification of the detected masses. In order to detect masses effectively, a sequence of preprocessing steps is proposed to enhance the contrast of the image, remove noise, remove the x-ray label and pectoral muscle, and locate the suspicious masses using Haralick texture features generated from the spatial gray level dependence (SGLD) matrix. The main aim of the CAD system is to increase the effectiveness and efficiency of the diagnosis and classification process in an objective manner and to reduce the number of false positives for malignancies. An artificial neural network (ANN) is proposed for classifying the marked regions as benign or malignant, and 83.87% correct classification for benign and 90.91% for malignant is achieved. Keywords: Mammograms, Artificial Neural Network, Region Growing, Haralick Texture Features.
1 Introduction
Breast cancer continues to be a public health problem throughout the world. It is the second leading cause of cancer death for women in Canada, after lung cancer [1]. Early detection of breast cancer, allowing treatment at an earlier stage, can significantly reduce breast cancer mortality. Mammography has been one of the most reliable methods for early detection of breast carcinomas, and x-ray mammography is currently considered the standard procedure for breast cancer diagnosis. However, retrospective studies have shown that radiologists do not detect all breast cancers that are visible on mammograms [2]. Double reading has been suggested as an effective approach to improve sensitivity, but it is costly because it requires twice as much radiologist reading time, and cost effectiveness is one of the major requirements for a mass screening program to be successful. So, the main objective of this paper is to develop a CAD system for breast cancer diagnosis and detection, based on automated segmentation of masses in mammograms, to increase sensitivity in aid of radiologists.

Y. Tan, Y. Shi, and K.C. Tan (Eds.): ICSI 2010, Part II, LNCS 6146, pp. 327–334, 2010.
© Springer-Verlag Berlin Heidelberg 2010

It is
expected that the automated methods for detection and classification can be used as a second opinion to aid the radiologist by indicating the locations of suspicious abnormalities, called ROIs, in mammograms based on their physical attributes. The final decision regarding the likelihood of the presence of a cancer rests with the radiologist. The principal stages of breast cancer detection and classification are depicted in figure 1.

Fig. 1. Sample mammogram and principal stages of breast cancer detection
In this paper an automated seeded region growing method based on Haralick texture features is presented to extract the mass from the suspicious area of the breast region. Once the mass is extracted from the ROI, it is classified as benign or malignant using an ANN. This paper is organized as follows. Section 2 briefly reviews some existing techniques for segmentation and classification. Section 3 describes the materials and proposed methods for image segmentation and classification. Section 4 presents some simulation results and their performance evaluation, and finally conclusions are drawn in section 5.
2 Literature Review
Masses and microcalcifications are two important early signs of breast cancer. Masses are often indistinguishable from the surrounding parenchyma because their features can be obscured by, or similar to, normal inhomogeneous breast tissue [2]. This makes automatic mass detection and classification challenging. Reports show that the estimated sensitivity of radiologists is about 75%; to increase this rate it is especially important to use a computer-aided system [2]. In recent years, several researchers have used different approaches for the detection and classification of masses. The segmentation and classification steps
are vital for the performance of the CAD system shown in figure 1. The segmentation techniques reported in the literature can be divided into six classes [13]: (1) histogram-based methods, (2) edge-based methods, (3) template-matching-based methods, (4) region-growing-based methods, (5) bilateral image subtraction, and (6) fuzzy techniques. Histogram-based methods [3] are divided into two types, global thresholding and local thresholding. Global thresholding methods are based on global information, such as the histogram of the mammogram; they are widely used and easy to implement, but they are not good at identifying ROIs, and the false-positive (FP) and false-negative (FN) rates may be too high. In local thresholding, the threshold value is determined locally; it can refine the results of global thresholding and is better for mass detection. Edge-based methods [4], the traditional methods for image segmentation, detect discontinuities in mammograms. In template matching [5], possible masses are segmented from the background using prototypes. Region-growing methods [6] first find a set of seed pixels, then grow iteratively, aggregating the pixels that have similar properties; in these methods the main challenge is finding the seed points, because of the peculiarity of this kind of object of interest. Bilateral image subtraction [7] is based on the normal symmetry between the left and right breast. Fuzzy techniques [8] include fuzzy thresholding and fuzzy region growing; they can handle the unclear boundary between normal tissue and tumors, but it is not easy to determine suitable membership functions and rules. Feature extraction and selection is a key step in mass detection and classification. Features are calculated from region of interest (ROI) characteristics such as size, shape, density and smoothness [9].
The feature space is very large and complex due to the wide diversity of normal tissues and the variety of abnormalities. It can be divided into three sub-spaces: intensity features, shape features and texture features. Haralick texture features [10] are based on the gray-level co-occurrence matrix (GLCM), also called the gray-level dependence matrix. Once the features are extracted and selected, they are input into a classifier to classify the detected suspicious areas as benign or malignant. Classifiers such as linear discriminant analysis (LDA) [11] and ANNs have performed well in mass classification. ANNs are adjusted, or trained, so that a particular input leads to a specific target output. This work uses the most common neural network model, the multi-layer perceptron (MLP), with supervised training. The ANN approach is robust, requires no explicit rules or expressions, and is widely applicable [13].
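As an illustration of the SGLD/GLCM matrix on which the Haralick features are built, the following minimal numpy sketch may be helpful (a single displacement vector, a symmetric normalized matrix, and one feature, the angular second moment or "energy"; the tiny image is hypothetical, and this is not the paper's full 13-feature implementation):

```python
import numpy as np

def sgld(img, levels, dx=1, dy=0):
    """Symmetric, normalized gray-level co-occurrence (SGLD) matrix
    for the displacement (dx, dy)."""
    P = np.zeros((levels, levels))
    h, w = img.shape
    for r in range(h - dy):
        for c in range(w - dx):
            P[img[r, c], img[r + dy, c + dx]] += 1
    P = P + P.T               # count each pair in both directions
    return P / P.sum()

img = np.array([[0, 0, 1], [0, 1, 1]])    # 2-level toy image
P = sgld(img, levels=2)
energy = float(np.sum(P ** 2))            # Haralick angular second moment
assert np.isclose(P.sum(), 1.0)
```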
3 Proposed Methods
In this section, we propose an automatic seeded region growing method to extract the mass from the ROI. Haralick texture features are used to select the seed in the ROI from which region growing starts; the region then grows iteratively, aggregating pixels with similar properties. Then the statistical
and textural features [12] are extracted from the extracted mass and used for classification into benign or malignant. The proposed steps for mass extraction and classification are as follows.

3.1 ROI Preprocessing
The size of a digitized mammogram is generally very large, but the ROIs are very small and limited to areas determined to be suspicious regions of masses, as shown in figure 1(a). So the first step is to separate the ROIs from the image background, so that the image processing will not be overwhelmed and dominated by the large image background. To do that, x-ray label removal, pectoral muscle removal, and breast region and ROI extraction are the key steps, as shown in figure 2.

Fig. 2. Breast extraction, ROI extraction and sample ROI
3.2 Mass Extraction
In order to extract suspicious regions (ROIs) from the whole mammographic image, a preliminary automatic identification of areas that probably contain massive lesions is performed. The next step is to extract the contour of tumoral masses from the ROI. The proposed system consists of contrast enhancement followed by segmentation with an automatic seeded region growing algorithm using Haralick texture features [10].

Contrast Enhancement. The contrast of the ROI is enhanced using the following nonlinear operator:

$$I_{en}(i, j) = \left( \frac{I(i, j)}{I_{max}} \right)^k I_{max} \qquad (1)$$
where k = 2, 3, 4, ..., and I(i, j) and I_en(i, j) are the pixel intensities of the original and enhanced images, respectively; I_max is the maximum intensity of the original image. In this way we penalize the dark pixels more than the bright pixels [6].
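A minimal sketch of equation (1) in numpy (the array values and the choice k = 3 are hypothetical):

```python
import numpy as np

def enhance(I, k=3):
    """Nonlinear contrast stretch of Eq. (1): I_en = (I/Imax)^k * Imax."""
    Imax = I.max()
    return (I / Imax) ** k * Imax

I = np.array([[10.0, 100.0], [200.0, 255.0]])
Ien = enhance(I)
assert np.isclose(Ien.max(), 255.0)                 # the brightest pixel is unchanged
assert Ien[0, 0] / I[0, 0] < Ien[1, 0] / I[1, 0]    # dark pixels are penalized more
```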
Automatic Seeded Region Growing Using Haralick Texture Features. Region growing approaches exploit the important fact that pixels which are close together have similar gray levels. The process starts from one or more points called seed points [6]. We propose an automated method to select the seeds. The steps are as follows:

1. Divide the enhanced ROI into R × R non-overlapping blocks. Figure 3 shows the block division and the seed block location.

Fig. 3. Block division and extracted mass
2. Calculate the Haralick texture features from the SGLD matrix of each block, then select the significant features that can easily discriminate mass from non-mass regions.
3. Select the block that contains the mass based on these features. The maximum gray level of that block is the seed point.
4. Region growing starts from that point, growing iteratively and aggregating pixels with similar properties, which results in the segmented mass region.
5. Extract the mass region from the original image; it is used as the input for classification. Figure 3 shows the extracted mass.
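Step 4's growing procedure can be sketched as a breadth-first aggregation of 4-connected neighbors (an illustrative numpy version using a simple intensity tolerance around the seed rather than the paper's texture-based similarity; the toy image is hypothetical):

```python
import numpy as np
from collections import deque

def region_grow(img, seed, tol):
    """Aggregate pixels 4-connected to the seed whose intensity is within tol of the seed's."""
    h, w = img.shape
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    frontier = deque([seed])
    s = float(img[seed])
    while frontier:
        r, c = frontier.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and not mask[nr, nc] \
                    and abs(float(img[nr, nc]) - s) <= tol:
                mask[nr, nc] = True
                frontier.append((nr, nc))
    return mask

img = np.zeros((8, 8))
img[2:5, 2:5] = 200.0          # a bright 3x3 "mass" on a dark background
mask = region_grow(img, seed=(3, 3), tol=50.0)
assert mask.sum() == 9         # exactly the bright blob is segmented
```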
3.3 Mass Classification
One of the major mammographic characteristics for mass detection and classification is texture. The ANN exploits this important factor to classify the mass as benign or malignant. The statistical textural features used in characterizing the masses are mean, standard deviation, entropy, smoothness, skewness, kurtosis and uniformity [12]. These 7 features, computed from the whole extracted mass region, are used in preparing the training data for a multi-layer perceptron (MLP) neural network. The 7 features and their corresponding target value (benign = 0, malignant = 1) are stored in a file and then used as inputs to train the network, producing the weights needed for testing the classifier. Figure 4 shows a sample screen capture of training data preparation and classification.
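The 7-5-1 MLP can be sketched as a forward pass (an illustrative numpy version with random, untrained weights; the sigmoid activation is an assumption, and the count of 40 covers connection weights only, without biases):

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.standard_normal((5, 7))    # input layer (7) -> hidden layer (5)
W2 = rng.standard_normal((1, 5))    # hidden layer (5) -> output unit (1)
assert W1.size + W2.size == 40      # 7*5 + 5*1 = 40 weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def classify(features):
    """Forward pass; output > 0.5 is read as malignant (target 1), else benign (target 0)."""
    hidden = sigmoid(W1 @ features)
    return float(sigmoid(W2 @ hidden))

p = classify(rng.standard_normal(7))   # hypothetical 7-feature vector
assert 0.0 < p < 1.0
```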
Fig. 4. Training data preparation and mass classification
4 Simulation Results and Performance Evaluation
To develop and evaluate the proposed system we used the Mammographic Image Analysis Society (MiniMIAS) [11] database. It includes radiologists' "truth" markings of the locations of any abnormalities that may be present. The data in the collection consist of the location of the abnormality, its radius, the breast position (left or right), the type of breast tissue, and the tumor type if one exists (benign or malignant). Among the region-based approaches [6], the region growing algorithm appears the natural choice for mass segmentation, since the peculiarity of these kinds of objects of interest is the connectivity of pixels; neither edges nor luminance alone can be used for the isolation of the region inside the mass. The proposed seed selection method made the process easier. The proposed method was applied to 82 benign and malignant images, and 84.15% correct segmentation and 15.85% incorrect segmentation were obtained, whereas the radiologists' sensitivity is 75%. The performance of the proposed algorithm is assessed by comparing the area segmented by our algorithm with, as ground truth, the area within a radiologist-marked contour [6]. The terms used for this purpose are: estimated region (ER), the region segmented by our algorithm; reference region (RR), the circular area estimated by the radiologist; area difference (AD), the difference between RR and ER; true positive (TP), the intersection of ER and RR; false positive (FP), the area in ER not identified in RR; false negative (FN), the area in RR not identified in ER; and completeness (CM) and correctness (CR), defined as CM = TP/(TP + FN) and CR = TP/(TP + FP). For 10 images, 78% CM and 94% CR were achieved in this experimentation. An ANN is used to classify the extracted mass as benign or malignant. It has a 3-layer structure comprising 7 units in the input layer, 5 units in the hidden layer and 1 unit in the output layer, so the total number of weights is 40.
In total, 69 correctly segmented masses were used for classification, with 25% of the images used for training and 75% for testing; the overall classification rate for benign is 83.87% and for malignant 90.91%, whereas biopsy results show that 65–90% of cases turn out to be benign.
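The completeness and correctness measures can be computed directly from binary masks (a minimal numpy sketch; the toy reference and estimated masks are hypothetical):

```python
import numpy as np

def completeness_correctness(est, ref):
    """CM = TP/(TP+FN), CR = TP/(TP+FP) for boolean segmentation masks."""
    tp = np.logical_and(est, ref).sum()
    fp = np.logical_and(est, ~ref).sum()
    fn = np.logical_and(~est, ref).sum()
    return tp / (tp + fn), tp / (tp + fp)

ref = np.zeros((6, 6), dtype=bool); ref[1:5, 1:5] = True   # 16-px radiologist region
est = np.zeros((6, 6), dtype=bool); est[2:6, 2:6] = True   # 16-px estimate, 9-px overlap
cm, cr = completeness_correctness(est, ref)
assert np.isclose(cm, 9 / 16) and np.isclose(cr, 9 / 16)
```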
Fig. 5. Segmentation performance evaluation
5 Conclusion
In this paper a computer-aided system for the detection, segmentation and classification of masses has been presented. Initially the x-ray label is removed using a global Otsu thresholding technique followed by connected component labeling. The pectoral muscle is removed using an automatic region growing method. The ROI is extracted using peak analysis of the histogram of the breast tissue. Automated seeded region growing using Haralick texture features is proposed for image segmentation. Sum Average is found to be the most discriminative feature among the 13 features, and the segmented image is finally smoothed using mathematical morphology operators. The performance of the proposed method is evaluated in terms of efficiency, adaptability and robustness. Correct segmentation of 84.15% is achieved, which is very promising compared to the radiologists' sensitivity of 75%. A 3-layer artificial neural network is proposed for mass classification; correct classification of 83.87% for benign and 90.91% for malignant is achieved. The results are encouraging and show the promise of our proposed system.
References

1. Canadian Breast Cancer Foundation, http://www.cbcf.org/breastcancer/bc_whatbc_bc.asp
2. Yang, S.C., Wang, C.M., et al.: A Computer-aided System for Mass Detection and Classification in Digitized Mammograms. J. Bio. Med. Engg.: Appl., Basis and Comm. 17, 215–228 (2005)
3. Gimenez, V., Manrique, D., Rios, J., Vilarrasa, A.: Iterative method for automatic detection of masses in digital mammograms for computer-aided diagnosis. In: Proceedings of SPIE - The International Society for Optical Engineering, vol. 3661(II), pp. 1086–1093 (1999)
4. Abdel-Mottaleb, M., Carman, C.S., Hill, C.R., Vafai, S.: Locating the boundary between breast skin edge and the background in digitized mammograms. In: Digital Mammography, pp. 467–470. Elsevier, Amsterdam (1996)
5. Lai, S.M., Li, X., Bischof, W.F.: On techniques for detecting circumscribed masses in mammograms. IEEE Trans. Med. Imaging 8(4), 377–386 (1989)
6. Mencattini, A., Rabottino, G., Salmeri, M., Lojacono, R., Colini, E.: Breast Mass Segmentation in Mammographic Images by an Effective Region Growing Algorithm. In: Blanc-Talon, J., Bourennane, S., Philips, W., Popescu, D., Scheunders, P. (eds.) ACIVS 2008. LNCS, vol. 5259, pp. 948–957. Springer, Heidelberg (2008)
7. Mendez, A.J., Tahoces, P.G., Lado, M.J., Souto, M., Vidal, J.J.: Computer-aided diagnosis: automatic detection of malignant masses in digitized mammograms. Med. Phys. 25(6), 957–964 (1998)
8. Sameti, M., Ward, R.K.: A fuzzy segmentation algorithm for mammogram partitioning. In: Doi, K., Giger, M.L., Nishikawa, R.M., Schmidt, R.A. (eds.) Digital Mammography, pp. 471–474. Elsevier, Amsterdam (1996)
9. Undrill, P., Gupta, R., Henry, S., Dowing, M.: Texture analysis and boundary refinement to outline mammography masses. In: Proceedings of the 1996 IEE Colloquium on Digital Mammography, pp. 51–56 (1996)
10. Haralick, R.M., Shanmugam, K., Dinstein, I.: Textural Features for Image Classification. IEEE Transactions on Systems, Man, and Cybernetics 3(6), 610–621 (1973)
11. Suckling, J., et al.: The Mammographic Image Analysis Society Digital Mammogram Database. Excerpta Medica International Congress Series, vol. 1069, pp. 375–378 (1994)
12. Alginahi, Y.: Thresholding and character recognition in security documents with watermarked background. In: Proceedings of Intl. Conf. on Digital Image Computing: Techniques and Applications (DICTA 2008), pp. 220–225 (2008)
13. Cheng, H.D., Shi, X.J., Hu, L.M., Cai, X.P., Du, H.N.: Approaches for automated detection and classification of masses in mammograms.
Pattern Recognition 39, 646–668 (2006)
Gene Selection and PSO-BP Classifier Encoding a Prior Information Yu Cui, Fei Han∗, and Shiguang Ju School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang, Jiangsu, China [email protected], [email protected], [email protected]
Abstract. Selecting a relevant and discriminative combination of genes for cancer classification and building a high-performing classifier are common and critical tasks in cancer classification problems. In this paper, a new approach is proposed to address the two issues at the same time. In detail, a BP neural network is employed to construct a classifier, and the PSO algorithm is used to select a discriminative combination of genes and optimize the BP classifier accordingly. In addition, the samples' prior information is encoded into the PSO algorithm for better performance. The proposed approach is validated on the leukemia data set. The experimental results show that our novel method selects fewer discriminative genes while achieving performance comparable to traditional classification approaches. Keywords: Cancer classification, High-dimensional and small sample, BP neural network, Particle swarm optimization.
1 Introduction

The diagnosis of cancer has traditionally been made on the basis of non-molecular criteria such as tumor tissue type, which cannot reveal the underlying genetic disorders or biological processes that contribute to the malignant process. As DNA microarray technology attracts tremendous attention in both the scientific community and industry, a number of machine learning methods, such as decision trees and nearest neighbors, have been developed for cancer classification based on the high-throughput gene expression data obtained by this technology [1], [2]. Microarray data is characterized by high dimensionality and small sample size, and the inherent noise arising from complex experimental procedures makes the problem harder still; traditional classification methods were not designed to handle this kind of data. To address these problems, gene selection methods such as K-means, SOM (self-organizing map) and hierarchical methods were proposed to select the most informative genes from these wide data sets [3], [4]. Removing irrelevant genes decreases noise, confusion, and complexity, and thus improves the identification of the most important genes, classification
accuracy, and running time. However, the large number of predictive gene sets and the disparity among them make identifying potential genes an NP-complete problem. Recently, researchers have been exploring new gene selection methods from a biological viewpoint, for example, gene selection based on rough information entropy [5], [6], a BAN-based classification approach [7], and a novel approach that maps gene expression levels into gene regulation levels while optimizing the regulation levels with a GA algorithm [8]. As mentioned above, to address the cancer classification issue, three points should be considered: studying the problem from a biological view, selecting a discriminative combination of genes, and obtaining a precise classifier. Hence, in this paper, a new approach is proposed to address the three points at the same time: it employs the PSO algorithm to search for the gene subset with the best combinative performance and, at the same time, optimizes a classifier based on a BP neural network. Specifically, our method consists of two steps. First, a group-based ensemble gene selection method proposed by Huawen Liu is employed to preprocess the gene dataset from a biological viewpoint, yielding a candidate gene subset. Second, the PSO algorithm, which is good at global search, is used to select a discriminative group of genes with the best combinative performance from the candidate gene subset. A BP neural network, which is good at classifying nonlinearly separable patterns, is then employed to construct a cancer classifier. Since the BP algorithm easily gets trapped in local minima, especially for nonlinearly separable pattern classification problems [9], the PSO algorithm is applied to optimize the BP network's parameters to address the local minima problem while searching for the best combination of genes.
Besides, sample’s prior information is encoded into PSO algorithm for a better searching performance. Our proposed approach is carried on the leukemia data set, and the experimental results show the nice performance of the novel approach. Several common classification methods are compared with our approach, and the comparison results validate the novel algorithm’s superior performance on both gene selection and classification.
2 Particle Swarm Optimization

In 1995, inspired by the complex social behavior of natural species such as flocks of birds, PSO was proposed by James Kennedy and Russell Eberhart. Unlike the BP algorithm, the PSO algorithm has a good ability for global search. PSO can be stated as initializing a team of random particles and finding optimal solutions by iteration. Each particle updates itself using two best values, pbest and gbest, defined below. The original PSO algorithm is described as follows:
$$V_i(t+1) = V_i(t) + c_1 r_1 (P_i(t) - X_i(t)) + c_2 r_2 (P_g(t) - X_i(t)) \qquad (1)$$

$$X_i(t+1) = X_i(t) + V_i(t+1) \qquad (2)$$
where Vi is the velocity of the ith particle; Xi is the position of the ith particle; Pi is the best position achieved by the particle so far; Pg is the best position among all particles
in the population; r1 and r2 are two independently and uniformly distributed random variables in the range [0, 1]; c1 and c2 are positive constant parameters called acceleration coefficients, which control the maximum step size. The adaptive particle swarm optimization (APSO) algorithm was proposed by Shi and Eberhart in 1998. This algorithm can be stated as follows:
$$V_i(t+1) = w V_i(t) + c_1 r_1 (P_i(t) - X_i(t)) + c_2 r_2 (P_g(t) - X_i(t)) \qquad (3)$$

$$X_i(t+1) = X_i(t) + V_i(t+1) \qquad (4)$$
where w is called the inertia weight, which controls the impact of the particle's previous velocity on its current one. Several selection strategies for the inertia weight w have been proposed. Generally, in the early stages of the algorithm the inertia weight should be reduced rapidly, while around the global optimum it should be reduced slowly. Another important variant of standard PSO is CPSO, proposed by Clerc and Kennedy. CPSO ensures the convergence of the search procedure and can generate higher-quality solutions than standard PSO with inertia weight on some studied problems.
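Equations (3) and (4) can be sketched as follows (a minimal numpy implementation minimizing a toy sphere function; the linear decay of w from 0.9 to 0.4, the swarm size, and c1 = c2 = 2.0 are illustrative assumptions, not the settings used in this paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def apso_minimize(f, dim, n_particles=20, iters=200, c1=2.0, c2=2.0):
    """APSO per Eqs. (3)-(4); inertia weight w decays linearly from 0.9 to 0.4."""
    X = rng.uniform(-1.0, 1.0, (n_particles, dim))
    V = np.zeros((n_particles, dim))
    P = X.copy()                                   # personal bests (pbest)
    pcost = np.array([f(x) for x in X])
    g = P[pcost.argmin()].copy()                   # global best (gbest)
    for t in range(iters):
        w = 0.9 - 0.5 * t / iters
        r1, r2 = rng.random((2, n_particles, dim))
        V = w * V + c1 * r1 * (P - X) + c2 * r2 * (g - X)   # Eq. (3)
        X = X + V                                           # Eq. (4)
        cost = np.array([f(x) for x in X])
        better = cost < pcost
        P[better], pcost[better] = X[better], cost[better]
        g = P[pcost.argmin()].copy()
    return g, float(pcost.min())

best, val = apso_minimize(lambda x: float(np.sum(x ** 2)), dim=3)
assert val < 1e-2   # the swarm contracts onto the minimum of the sphere function
```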
3 The Proposed Method

3.1 Data Preprocessing

Data preprocessing is performed using the group-based ensemble gene selection method proposed by Huawen Liu, based on the theory of information entropy and the Markov blanket [10]. As a novel gene selection method, group-based ensemble gene selection is motivated by a biological view. The rationale behind the method is that, given a microarray profile, many gene subsets give similarly good prediction performance even though they have only a few genes in common; when two genes are highly correlated with each other, the class-discriminative power of each will not change much after the other has been removed. That is to say, a gene is good enough if it is highly correlated with the classes and uncorrelated with the already selected genes. The method consists of three steps. First, a concept called the information correlation coefficient (ICC) is proposed to evaluate the degree of correlation between each gene and the class patterns, and between each pair of genes. Second, an approximate Markov blanket technique is used to organize the genes into several groups based on the ICC. After this grouping procedure, similar genes are bound within the same group, while dissimilar genes belong to different groups. Third, for each group, the most representative gene is picked out to compose the candidate gene subset. Since this subset summarizes the patterns seen across the entire data set, it is an informative one.

3.2 The Proposed Algorithm

As mentioned in section one, the two goals of this paper are a small gene subset and a precise cancer classifier. For the first goal, since the candidate gene subset obtained in
338
Y. Cui, F. Han, and S. Ju
the last step is not small enough (the preprocessing method tends to select more genes than many other methods), the PSO algorithm is further employed to reduce the size of the candidate gene subset. For the second goal, a BP classifier is built, and the PSO algorithm is adopted to overcome BP's limitations mentioned above and to obtain more precise classification at the same time. In addition, the samples' prior information is encoded into the PSO algorithm for better performance. The details of the proposed algorithm are as follows:

Step 1: Following the 5-fold cross-validation method, the samples are divided into training and test samples. Suppose the number of samples is N; N/5 samples are used for testing, while the rest are used for training. In the next round, another N/5 samples are used for testing and the rest for training. This cycle is repeated 5 times until every sample has been tested. The number of misclassified test samples K is counted, and the average misclassification rate E = K/5 is used to evaluate the algorithm. The smaller the test misclassification rate, the better the gene subset's classification generalization performance.

Step 2: Preprocess the normalized data set with the group-based ensemble gene selection method described above. In this step, a large number of redundant and noisy genes are cut off and an informative candidate gene subset is produced.

Step 3: Based on the candidate gene subset obtained in Step 2, the PSO algorithm is employed to search for a smaller and better combination of genes from the candidate subset and to optimize the BP classifier constructed in this paper. In addition, the samples' prior information is encoded into the PSO algorithm for better performance. The details are as follows. First, initialize the positions and velocities of a group of particles randomly in the range [-1, 1].
As shown in Table 1, [X_1, ..., X_N] represents the candidate gene subset: if the value of X_i lies in [0, +∞), the gene is selected; otherwise, the gene is cut off. As mentioned above, to reduce PSO's search time and give the search a good initial direction, the samples' prior information is encoded into the algorithm [11,12]. Here, the BSS/WSS ratio (between-groups to within-groups sum of squares ratio) is employed to estimate the selection probability of each candidate gene and to rank the genes; the first L genes are then taken as the most probably selected genes, where L is the size of the combination. Hence, the L most probably selected genes are initialized randomly in [0, +∞), while the others are initialized randomly in (−∞, 0].

Table 1. Description of each particle

  1 ~ N                         N+1 ~ N+M
  X_1, ..., X_i, ..., X_N       X_{N+1}, ..., X_{N+i}, ..., X_{N+M}
In each iteration, each particle encodes a subset of selected genes and a set of parameters of the BP classifier, and each particle is evaluated according to the fitness
Gene Selection and PSO-BP Classifier Encoding a Prior Information
339
function, and the worst particle is replaced by the sorted best particle. The fitness function is given as follows:

    fitness = 100 * (1 - accuracy(%)) + E(X_i)    (5)
where accuracy (%) is the classification accuracy of the optimized BP classifier with the selected gene subset on the training samples, and E(X_i) is the learning error of the BP network encoded by the i-th particle. The learning error E is defined as

    E = ∑_{k=1}^{q} ∑_{i=1}^{o} (y_i^k − c_i^k)^2 / (q * o),

where q is the number of training samples, o is the number of output units, and y_i^k − c_i^k is the difference between the actual and desired outputs of the i-th output unit when the k-th training sample is used. The smaller a particle's fitness value, the better the performance of the selected gene subset and the optimized BP classifier. If the fitness value reaches the threshold, or the maximal number of iterations is reached, the particle with the best fitness value is output. Otherwise, all particles continue to be updated according to formulas (3) and (4), and the new particles are evaluated by the fitness function again. In this way, the particles keep updating themselves in search of the best particle until the stop condition is met. At the end of the algorithm, the particle with the best fitness value is output, comprising a gene subset and an optimized BP classifier.

Step 4: With the gene subset and the optimized BP classifier obtained in Step 3, the test samples are fed in to obtain the final classification result. In this experiment, we adopt 5-fold cross-validation to evaluate the generalization performance of the obtained combination. Each validation experiment runs 5 times and ultimately returns the mean and best prediction accuracies.
4 Experimental Results and Discussion

To evaluate the performance of the proposed method, the publicly available leukemia data set is selected. It contains the expression levels of 7129 genes for 72 bone marrow samples labeled with two classes: 47 acute lymphoblastic leukemia (ALL) and 25 acute myeloid leukemia (AML). For the leukemia data set, we first employ the group-based ensemble gene selection method to preprocess the data. As a result of preprocessing, a candidate subset containing 12 genes is obtained. Next, a BP classifier is built, and the samples' prior information is calculated and encoded into the PSO algorithm. The PSO algorithm is then adopted to search for the best combination of L genes from the candidate gene subset and, at the same time, to optimize the parameters of the BP classifier. The search starts with the number of biomarkers L set to 1 and then gradually increases this number. 5-fold cross-validation is employed to validate the classification performance of each combination. To reduce the variance of the estimated errors, the process is repeated 5 times to obtain the best and mean accuracies. For the PSO algorithm, the maximal number of generations is set to 20, and each initial particle consists of two parts: the 12 candidate genes and a set of weights and thresholds generated randomly in the range [-1, 1].
The initial inertia weight w is 0.9; the acceleration constants c1 and c2 are both 2.0; r1 and r2 are two random numbers in the range [0, 1]. The maximum velocity is 1.0 and the minimum velocity is -1.0. The initial velocities of the particles are generated randomly in the range [0, 1], and the population size is set to 50. For the BP neural network, the maximal number of generations is set to 500 and the number of hidden units to 11. The best and average prediction accuracies are given in Table 2. From Table 2, it is found that the novel approach presents the best prediction performance when only one gene is selected. The average accuracy of the approach is 94.5% and the best accuracy reaches 96.4%. Table 3 compares the results of our method with those of previous approaches, including SNR, BGA and others, showing that our method obtains high prediction performance with the fewest selected feature genes.

Table 2. Prediction errors of the proposed method

  Dataset    Prediction accuracy    L=1      L=2      L=3      L=4      L=5
  leukemia   Mean value             94.46    90.46    95.79    91.60    93.13
             Best value             94.46    87.13    92.29    89.02    90.96
Table 3. Performance comparison of the proposed algorithm with several previous approaches on the leukemia data

  Methods                            No. of genes    Accuracies
  SNR [7]                            50              0.85
  Logistic regression method [13]    20              0.972
  PSO-C4.5 [14]                      26.5            0.958
  BGA [7]                            1               0.88
  NLN classifier [15]                2               0.853
  The proposed method                1               0.945
Fig. 1. (a) The relationship between the algorithm’s performance (mean prediction performance, best prediction performance) and the swarm size on the leukemia data set. (b) The relationship between the algorithm’s performance (mean prediction performance, best prediction performance) and the number of hidden units on the leukemia dataset.
Since the number of hidden units and the swarm size are critical parameters for the BP neural network and the PSO algorithm respectively, extra experiments are carried out on these points. The relationship between the swarm size of the PSO and the prediction performance is presented in Fig. 1(a). It is obvious that the algorithm shows the best performance and the lowest testing error when the swarm size is 50. Fig. 1(b) shows the relationship between the number of hidden units and the algorithm's performance. For the leukemia data set, it is found that when the number of hidden units is set to 9, the novel approach shows the best classification accuracy and, accordingly, the lowest testing error.
5 Conclusions

In this paper, we proposed a new method to address the gene selection and cancer classification problems. In the proposed approach, the PSO algorithm is employed to search for the smallest and most discriminative combination of biomarkers and, at the same time, to optimize the weights and thresholds of the BP classifier. In addition, the samples' prior information is encoded into the PSO algorithm for better performance, and its effectiveness is validated on the leukemia data set. The simulation results indicate that the novel method is competitive and effective. Compared with other common methods, it leads not only to a smaller gene subset but also to a high-performing classifier. However, both the PSO algorithm and the BP algorithm have drawbacks that affect the classification performance of the new approach. Thus, our future work will be dedicated to the optimization of the PSO and BP algorithms. In addition, we will further validate the method on multiclass cancer classification problems.

Acknowledgments. This work was supported by the National Natural Science Foundation of China (No. 60702056) and the Natural Science Foundation of Jiangsu Province (No. BK2009197).
References

1. Boulesteix, A.L., Strobl, C., Augustin, T., Daumer, M.: Evaluating microarray-based classifiers: an overview. Cancer Inform. 6, 77–97 (2008)
2. Mehmet, F.A.: Support vector machines combined with feature selection for breast cancer diagnosis. Expert Systems with Applications 36, 3240–3247 (2009)
3. Yu, L.: Feature selection for genomic data analysis. In: Liu, H., Motoda, H. (eds.) Computational Methods of Feature Selection, pp. 337–354. Chapman & Hall/CRC, Boca Raton (2008)
4. Iffat, A.G., Leslie, S.S.: Feature subset selection in large dimensionality domains. Pattern Recognition 43, 5–13 (2010)
5. Wang, J.Y., Wu, Z.J.: Study for gene analysis and selection based on rough information entropy. Application Research of Computers 25, 1713–1716 (2008)
6. Cai, R.C., Hao, Z.F., Yang, X.W., Wen, W.: An efficient gene selection algorithm based on mutual information. Neurocomputing 72, 991–999 (2009)
7. Wang, H.Q., Wong, H.S., Zhu, H.L., Timothy, T.C.: A neural network-based biomarker association information extraction approach for cancer classification. Journal of Biomedical Informatics 42, 654–666 (2009)
8. Wong, H.S., Wang, H.Q.: Constructing the gene regulation-level representation of microarray data for cancer classification. Journal of Biomedical Informatics 41, 95–105 (2008)
9. Han, F., Ling, Q.H., Huang, D.S.: Modified constrained learning algorithms incorporating additional functional constraints into neural networks. Information Sciences 178(3), 907–919 (2008)
10. Liu, H.W., Liu, L., Zhang, H.J.: Ensemble gene selection by grouping for microarray data classification. Journal of Biomedical Informatics (2009) (in press, corrected proof)
11. Han, F., Huang, D.S.: A new constrained learning algorithm for function approximation by encoding a priori information into feedforward neural networks. Neural Computing and Applications 17, 433–439 (2008)
12. Han, F., Gu, T.Y., Ling, Q.H.: A new approach encoding a priori information for function approximation. In: 2008 International Conference on Computer Science and Software Engineering, vol. I, pp. 82–85. IEEE Computer Society Press, Los Alamitos (2008)
13. Dudoit, S., Fridlyand, J., Speed, T.P.: Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association 97, 77–87 (2002)
14. Liao, J.G., Chin, K.V.: Logistic regression for disease classification using microarray data: model selection in a large p and small n case. Bioinformatics 23(15), 1945–1951 (2007)
15. Zhang, J.R., Zhang, J., Lok, T.M., Michael, R.L.: A hybrid particle swarm optimization-back-propagation algorithm for feedforward neural network training. Applied Mathematics and Computation 185(2), 1026–1037 (2007)
A Modified D-S Decision-Making Algorithm for Multi-sensor Target Identification∗ Xiaolong Liang, Jinfu Feng, and An Liu College of Engineering, Air Force Engineering University, Xi'an 710038, China [email protected], [email protected], [email protected]
Abstract. In this paper, we propose a modified D-S decision-making approach. First, the definition of the drop-falling distance is given and the drop-falling similarity is derived, from which the credence of each evidence is obtained. Then, the uncertainties are considered, since they reflect the reasonableness of possibility distributions. Finally, a binary linear weight function is derived with respect to the credence and uncertainty of the evidences, and Murphy's combination rule is used to determine the target type. Simulation results show that the proposed approach is more efficient and advantageous in decision-making with conflicting evidence than existing methods.

Keywords: target identification, evidence theory, conflict evidence, evidence combination rule.
1 Introduction

In recent years, multi-sensor data fusion technology has been widely used in both military and civil applications. Decision-making is an important means of analyzing and processing information from all sources, and it has been applied in many fields such as target recognition, malfunction diagnosis and situation evaluation [3]. The development of applied mathematics, such as operational research and probability statistics, has provided many important methods for conducting quantitative analysis in decision-making. In uncertain decision-making, the probabilities of all situations are assumed to be given. However, it is usually impossible to obtain precise probabilities in practice, and it is both demanding and unnecessary to require decision-makers to offer them. The evidence theory was first proposed by Dempster and then improved and systematized by Shafer, so it is also called the Dempster-Shafer (D-S) evidence theory [1]. In this theory, the basic probability assignment is generalized from probability estimation; compared with a probability, a basic probability assignment is more effective in characterizing incomplete, uncertain and unreliable information. For its natural form of expression and powerful ability of processing

∗
This research is funded in part by National Hi-tech Research and Development Program of China under No.2007AAXX27.
uncertain information, the D-S evidence theory has got wide applications in the field of decision-makings. However, in some cases, this theory will lead to conclusions that obviously contradict intuitions. This means that when the conflict between the evidences is significant, this theory becomes invalid. To solve this problem, we propose a new decisionmaking method which can eliminate not only the conflicts between evidences but also the uncertainties in every evidence themselves in the light of singleton statistic.
2 D-S Evidence Combination Rule

In this section, we briefly introduce the fundamentals of D-S evidence theory; readers interested in the detailed theory can refer to Refs. [1,7]. In this theory, a sample space is called a frame of discernment, usually denoted Θ. Θ is composed of a series of mutually exclusive objects and contains all the objects relevant to the present decision-making, so Θ can be expressed as follows:

    Θ = {θ1, θ2, ..., θn}    (1)
where each θi is called a singleton; a set with only one singleton is called a singleton set. In a data fusion system, such a singleton is the decision made by the system. The fundamental problem of evidence theory is to ascertain the degree to which an element belongs to a subset A (A ⊂ Θ) under the condition that the frame of discernment Θ is known. To each subset of Θ we can assign a probability as the basic probability assignment. Let 2^Θ be the set of all subsets of Θ. The basic probability assignment function is m: 2^Θ → [0,1], such that
    ∑_{A ∈ 2^Θ} m(A) = 1    (2)

    m(∅) = 0    (3)

m(A) denotes the proportion of the available and relevant evidence supporting the claim that a particular element of X belongs to A. A basic strategy of evidence theory is to divide the evidence set into two or more independent parts, to judge the identification frame by using these parts independently, and then to combine them by the D-S evidence rule. The rule has the following form:

    m(A) = (1 / (1 − k)) ∑_{A_i ∩ B_j = A} m1(A_i) m2(B_j),    m(∅) = 0    (4)

where

    k = ∑_{A_i ∩ B_j = ∅} m1(A_i) m2(B_j)    (5)

k reflects the conflict between the evidences.
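Eqs. (4) and (5) can be sketched with mass functions represented as dicts from focal elements (frozensets) to masses; the function name is an illustrative assumption. Run on a Zadeh-style two-sensor example, it reproduces the counterintuitive high-conflict result discussed in the next section.

```python
def dempster_combine(m1, m2):
    """D-S combination, eqs. (4)-(5).

    m1, m2 map frozenset focal elements to masses. Returns the combined
    mass function and the conflict coefficient k.
    """
    k = 0.0
    combined = {}
    for A, mA in m1.items():
        for B, mB in m2.items():
            inter = A & B
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mA * mB
            else:
                k += mA * mB            # conflicting mass, eq. (5)
    if k >= 1.0:
        raise ValueError("total conflict: k = 1, rule undefined")
    return {A: v / (1.0 - k) for A, v in combined.items()}, k

A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
m, k = dempster_combine({A: 0.99, B: 0.01}, {B: 0.01, C: 0.99})
# high conflict (k = 0.9999) and all belief forced onto B
```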
3 Analysis of the Causes of Conflicts and Existing Methods

As described above, k is the coefficient of conflict between evidences, so, in analogy with Shannon entropy, we can define the conflict measure

    conf. = −log(1 − k)    (6)
When conf. = +∞ or conf. → +∞, normalizing the conflicting evidence produces results contrary to intuition, as the following example shows. Consider a target decision-making system containing two sensors, with the frame of discernment Θ = {A, B, C} and the basic probability assignments of the singletons searched by the sensors given by m1: m1(A) = 0.99, m1(B) = 0.01 and m2: m2(B) = 0.01, m2(C) = 0.99. From (4) we deduce that k = 0.9999, m(A) = m(C) = 0, m(B) = 1, and conf. → +∞.
Although the probability assignments m1(B) and m2(B) are both low, the result combined by D-S theory assigns B full belief, contrary to intuition. This is the Zadeh paradox [3]: if one evidence vetoes a target, the combined evidence also vetoes it. In Mahler's work [2], Bayesian theory is regarded as the base of evidence theory rather than a special case of it, which implies that the failure can be ascribed to a limitation inherited from Bayesian theory: if we treat a probability as an evidence and use multiple probabilities to update the situation probability, then whenever the situation probability from one sensor equals zero, the posterior situation probability equals zero regardless of the probabilities from the other sensors. In practical military data fusion, natural and man-made interference becomes part of the sensor output, which causes different sensors to conflict with each other. These adverse situations put us in a passive position and can indirectly misguide the commander's strategies and practices in battle. From the perspective of the entire battlefield environment, the sources of conflict can be reduced to the following causes: (1) interference from other targets; (2) interference from false alarms and noise; (3) deception by targets; (4) flexible evasion by targets; (5) changes in the signal transmission medium; (6) sensor failure. Accordingly, two methods of eliminating the conflicts can be considered. The first is to eliminate the source of conflict; from the above analysis, the sources are quite complex, so it is impractical to find and correct the conflicting evidence in a short time. The other is to search for a more efficient mathematical tool for the multi-sensor setting, namely an evidence decision-making method; this is one of the hottest topics in the study of evidence theory. Existing methods fall into two strategies: (1) Modifying the rule. This strategy ascribes the failure of evidence combination under highly conflicting information to the normalization step of the rule and mainly addresses the reassignment of conflicting credibility. Yager [4] added the conflict to m(Θ) as an uncertainty. Smets [5] argued that it is improper to accept blindly the closed-world assumption proposed by Dempster: because of decision-making risks and the neglect of newly emerging objects, the anti-interference ability of the system is quite weak, and these problems can be attributed to the fact that the predetermined recognition frame cannot cover all possible situations in operation, such as the emergence of new objects or noise. He affirmed the correctness of the evidences and held that newly emerging unknown modes are the origin of conflicts, so the conflicting credence should be assigned to the null set.
(2) Modifying the model. In this view there is nothing wrong with the D-S combination rule itself; highly conflicting evidences should be pretreated and then combined using the D-S rule. On this basis, Murphy [6] presented a rule of averaged evidence combination: first, the basic probability assignments of the evidences are averaged; second, the averaged information is fused by the D-S rule. Compared with other methods, this combination rule can process conflicting evidences and converges quickly. However, Murphy's method simply averages the multi-source information without considering the connections between evidences, which is its deficiency. In this paper, based on Murphy's rule, we consider the drop-falling similarity on each singleton and the reasonableness of the possibility distribution to describe the distribution of targets. A binary linear weight function is defined to evaluate each evidence's weight, and then D-S theory is used to combine and fuse the multi-source evidences with their different weights. The method proposed in this paper inherits all the advantages of Murphy's rule and, moreover, has stronger anti-interference ability and faster convergence.
4 The New Modified Model of the D-S Decision Method

4.1 The Drop-Falling Similarity Degree

Let (Ω, F, P) be a probability space and (Θ, 2^Θ) a measurable space, and let X: Ω → 2^Θ be a random set. For every ω ∈ Ω and every A ∈ 2^Θ, the basic probability assignment of A can be deduced from the random set X(ω):

    m(A) = P(ω ∈ Ω : X(ω) = A)    (7)

For θ in Θ, the drop-falling of the random set X(ω) at θ is:

    μ_X(θ) = P(ω ∈ Ω : θ ∈ X(ω))    (8)
Actually, μ_X(θ) is equivalent to a fuzzy membership function. In many references, μ_X(θ) is called the single-point coverage function, and when θ is replaced by a non-singleton subset it is called the multi-point coverage function. 'Drop falling' was first named by the Chinese scholar Wang Pei-Zhuang [8]. We can use a vivid metaphor to describe the drop-shadow: a random set X over Θ is like a piece of cloud over the ground Θ, each possible realization X(ω) is one part of this cloud, and the probability P of X(ω) is the thickness of that part. By equation (8), μ_X(θ) indicates the total thickness of cloud over θ: the thicker the cloud, the deeper the shadow. The random set combines the membership function of fuzzy mathematics with the probability measure of probability theory; similarly, it connects the drop-falling with the basic probability assignment:
    μ_X(θ) = ∑_A 1_{A ∋ θ} · P(ω ∈ Ω : X(ω) = A)    (9)

           = ∑_{A ∋ θ} m(A)    (10)
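Eq. (10) gives a direct way to compute the drop-falling from a basic probability assignment; a minimal sketch (the function name and the example mass function are illustrative):

```python
def drop_falling(m, theta):
    """mu_X(theta) = sum of m(A) over focal elements A containing theta, eq. (10)."""
    return sum(mass for A, mass in m.items() if theta in A)

# Example mass function over {A, B, C} with one non-singleton focal element.
m = {frozenset("AB"): 0.6, frozenset("B"): 0.3, frozenset("C"): 0.1}
# B is covered by both {A,B} and {B}, so its drop-falling accumulates both masses.
```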
where 1_{A∋θ} is an indicator function. Eqs. (9) and (10) give a discrete single-point statistical method under the random-set framework; in the continuous case, μ_X denotes the Probability Hypothesis Density (PHD), which can be realized by a set integral. For evidences m_τ, τ = 1, 2, ..., N, the drop-falling similarity degree Siml can be obtained from the drop-falling of two evidences on the pre-decision-making singletons; this similarity degree is in fact a fuzzy-mathematics similarity measure.

Definition. Given two evidences m_α and m_β with respective focal elements {A_1, ..., A_i, ..., A_Mα} and {B_1, ..., B_j, ..., B_Mβ}, and the set of pre-decision-making singletons {{θ_1}, ..., {θ_k}, ..., {θ_K}}, the drop-falling distance between m_α and m_β is

    d(m_α, m_β) = (1/2) · ∑_{θ_k} | ∑_{A_i ∋ θ_k} m_α(A_i) − ∑_{B_j ∋ θ_k} m_β(B_j) |    (11)

The drop-falling similarity degree matrix [Siml]_{N×N} is obtained from

    Siml(m_α, m_β) = 1 − d(m_α, m_β)    (12)

The support degree of evidence m_τ is obtained from the drop-falling similarity degrees:

    Supt(m_τ) = ∑_{υ=1, υ≠τ}^{N} Siml(m_υ, m_τ)    (13)

By normalizing, we obtain the credibility degree of m_τ:

    Cred(m_τ) = Supt(m_τ) / ∑_τ Supt(m_τ)    (14)
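The credibility computation of eqs. (11)-(14) can be sketched as below; function names are assumptions. With two identical evidences and one deviating evidence, the identical pair receives the higher credibility.

```python
def drop_falling(m, theta):
    """Eq. (10): total mass of focal elements containing theta."""
    return sum(mass for A, mass in m.items() if theta in A)

def dist(ma, mb, singletons):
    """Drop-falling distance, eq. (11)."""
    return 0.5 * sum(abs(drop_falling(ma, t) - drop_falling(mb, t))
                     for t in singletons)

def credibilities(evidences, singletons):
    """Similarity (12), support (13) and normalized credibility (14)."""
    n = len(evidences)
    simil = [[1.0 - dist(evidences[a], evidences[b], singletons)
              for b in range(n)] for a in range(n)]
    supt = [sum(simil[t][u] for u in range(n) if u != t) for t in range(n)]
    total = sum(supt)
    return [s / total for s in supt]

# Two agreeing evidences (m1, m1) and one deviating evidence (m3).
m1 = {frozenset("A"): 0.8, frozenset("B"): 0.2}
m3 = {frozenset("A"): 0.2, frozenset("B"): 0.8}
cred = credibilities([m1, m1, m3], ["A", "B"])
```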
Obviously, the higher the support degree of an evidence, the higher its credibility degree.

4.2 Evidence Uncertainty
As a result of the various kinds of interference in a practical battlefield environment, the information collected by sensors is uncertain. The uncertainty entropy of this information can be used to evaluate the reliability of the sensors (or evidences). In evidence theory, the uncertainty entropy of an evidence has both a non-specificity part and a randomness part, measured as follows:

    Nons(m) = ∑_A m(A) log2 |A|    (15)

    Stoc(m) = −∑_A m(A) log2 ∑_B m(B) |A ∩ B| / |A|    (16)

The total uncertainty entropy of an evidence is the simple sum of the non-specificity and the randomness, that is:

    Unct(m) = Nons(m) + Stoc(m)    (17)

            = −∑_A m(A) log2 ∑_B m(B) |A ∩ B| / |A|²    (18)
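The total uncertainty entropy of eqs. (15)-(18) can be sketched as below (the function name is an assumption); when all focal elements are singletons it reduces to the Shannon entropy, as the text states.

```python
from math import log2

def unct(m):
    """Total uncertainty entropy, eqs. (15)-(18): Nons(m) + Stoc(m).

    m maps frozenset focal elements to masses.
    """
    nons = sum(mass * log2(len(A)) for A, mass in m.items())
    stoc = -sum(mass * log2(sum(mb * len(A & B) / len(A)
                                for B, mb in m.items()))
                for A, mass in m.items())
    return nons + stoc

# Singleton-only evidence: Nons = 0, Stoc = Shannon entropy = 1 bit.
m_singletons = {frozenset("A"): 0.5, frozenset("B"): 0.5}
# Fully vacuous evidence on {A, B}: Nons = 1, Stoc = 0, total = 1.
m_vacuous = {frozenset("AB"): 1.0}
```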
The total uncertainty entropy of an evidence is compatible with probability and set consistency and is subadditive, and it is a generalized form of the Shannon entropy: when all the focal elements are singleton sets, the non-specificity is zero and the randomness equals the Shannon entropy, so the total uncertainty entropy also equals the Shannon entropy. Entropy is a measure of uncertainty. Jaynes put forward in 1957 the maximum entropy principle, whose main idea is to choose, among the distributions consistent with the already-known knowledge, the one of maximum entropy: because such a distribution is maximally random and its subjective element is minimal, it is the most reasonable inference about the unknown distribution. In a practical multi-sensor target identification system, the purpose is to make reasonable judgments about an unknown target type based on the known target information from the sensors. According to Jaynes's maximum entropy theory, evidences within the power-set framework describe the possible distribution over subsets, and the evidence with larger total uncertainty entropy has an advantage in describing the distribution of targets; that is, the more non-specific and random an evidence is, the more attention we should pay to it. The following formula is used to describe the uncertainty of an evidence:

    U(m) = 2^{Unct(m)} − 1    (19)
On this basis, by normalization, we get the uncertainty of each evidence:

    Ratn(m_τ) = U(m_τ) / ∑_υ U(m_υ)    (20)
4.3 The New Combination Method

From the above analysis, the weight of each evidence depends not only on the degree to which it is supported by the other evidences, i.e. its credibility, but also on its own uncertainty. The weight of evidence m_τ is therefore defined as:

    W_τ = w · Cred(m_τ) + (1 − w) · Ratn(m_τ)    (21)

where w ∈ [0,1] denotes the importance of the credibility Cred. When 0 ≤ w < 0.5, more attention is paid to the reasonableness of the evidence itself in the decision-making; when 0.5 < w ≤ 1, more attention is paid to the degree of mutual support between evidences. After establishing the evidence weights W_τ, following Murphy's method, the D-S combination rule is used to fuse the weighted-average evidence: when there are n evidences in the system, the weighted-average evidence is combined with itself n − 1 times. Weighted-average combination of evidence is an effective way of processing conflicting information while avoiding the loss of effective evidence.
5 Analysis of a Practical Example

Example 2: In a battlefield environment, the target type may be one of A, B and C. For a target identification system containing five characteristic sensors, the basic probability assignments measured and summed up by the five sensors are as follows:

    m1(A) = 0.5,  m1(B) = 0.2, m1(C) = 0.3
    m2(A) = 0,    m2(B) = 0.9, m2(C) = 0.1
    m3(A) = 0.55, m3(B) = 0.1, m3(C) = 0.35
    m4(A) = 0.55, m4(B) = 0.1, m4(C) = 0.35
    m5(A) = 0.55, m5(B) = 0.1, m5(C) = 0.35

Using the D-S combination rule and the methods proposed by Yager, Murphy and Ref. [7], the basic probability assignments of the targets are calculated as shown in Table 1. D-S completely rejects type A once there are more than two evidences; this kind of 'rejection' is clearly inconsistent with common sense. Yager's method assigns most of the probability to the uncertain frame m(Θ) as long as there is a conflicting evidence, no matter how the other evidences behave; correspondingly, the assignment of A is zero, which is also contrary to common sense, and most of the probability assignment loses its value for decision-making. Murphy's method is a simple weighted D-S method with a certain degree of blindness, for it does not take into account the relevance and reasonableness of the various evidences.

Table 1. Comparison of the calculation results of the five combination rules

  Evidence set      D-S's rule     Yager's rule    Murphy's rule   Rule in Ref. [7]   Our method
  m1, m2            m(A)=0         m(A)=0          m(A)=0.1543     m(A)=0.1543        m(A)=0.2615
                    m(B)=0.8571    m(B)=0.1800     m(B)=0.7476     m(B)=0.7469        m(B)=0.6010
                    m(C)=0.1428    m(C)=0.0300     m(C)=0.0976     m(C)=0.0988        m(C)=0.1371
                                   m(Θ)=0.7900
  m1, m2, m3        m(A)=0         m(A)=0          m(A)=0.3118     m(A)=0.5814        m(A)=0.6971
                    m(B)=0.7500    m(B)=0.0270     m(B)=0.6264     m(B)=0.2438        m(B)=0.1677
                    m(C)=0.2500    m(C)=0.0090     m(C)=0.0616     m(C)=0.1744        m(C)=0.1352
                                   m(Θ)=0.9640
  m1, m2, m3, m4    m(A)=0         m(A)=0          m(A)=0.5567     m(A)=0.8060        m(A)=0.9658
                    m(B)=0.7500    m(B)=0.0076     m(B)=0.4300     m(B)=0.0493        m(B)=0.0031
                    m(C)=0.2500    m(C)=0.0021     m(C)=0.0111     m(C)=0.1456        m(C)=0.0308
                                   m(Θ)=0.9910
  m1, ..., m5       m(A)=0         m(A)=0          m(A)=0.9856     m(A)=0.8908        m(A)=0.9998
                    m(B)=0.6429    m(B)=0.0010     m(B)=0.0143     m(B)=0.0085        m(B)=0.0001
                    m(C)=0.3571    m(C)=0.0006     m(C)=0.0001     m(C)=0.1006        m(C)=0.0009
                                   m(Θ)=0.9984

When there is a
lack of adequate evidence, its convergence rate is slow. Ref. [7] takes into account the similarity between evidences and uses it to measure the reliability of each evidence as its weight; based on Murphy's method, the negative impact of the second evidence on the decision result is eliminated, and the results show a good convergence rate. The method proposed in this paper sets the importance of credibility to w = 0.5, considering both the degree of similarity and the reasonableness of the evidence itself for all the evidences in the decision-making. Both our method and that of Ref. [7] can eliminate the impact of 'deceptive' evidence, and when the number of evidences is more than two, target A is identified more rapidly and exactly by the method of this paper.
6 Conclusion

This paper proposed the concept of drop-falling similarity based on the statistical decision-making singleton, together with a D-S decision-making method based on the uncertainty entropy of evidence. The proposed method can overcome the negative impact of interference information on the evidence, and makes effective judgments on the target type through weighted-average evidence combination.
References

1. Shafer, G.: A Mathematical Theory of Evidence. Princeton Univ. Press, Princeton (1976)
2. Mahler, R.: Statistical Multisource-Multitarget Information Fusion. Artech House (2007)
3. Zadeh, L.: A simple view of the Dempster-Shafer theory of evidence and its implication for the rule of combination. AI Magazine 7, 85–90 (1986)
4. Yager, R.R.: On the Dempster-Shafer framework and new combination rules. Information Sciences 41, 93–137 (1989)
5. Smets, P., Kennes, R.: The transferable belief model. Artificial Intelligence 66, 191–234 (1994)
6. Murphy, C.K.: Combining belief functions when evidence conflicts. Decision Support Systems 29, 1–9 (2000)
7. Deng, Y., Shi, W.K., Zhu, Z.F.: Efficient combination approach of conflict evidence. Journal of Infrared and Millimeter Waves 23, 27–32 (2004)
8. Wang, P.Z.: Fuzzy set and random set drop-falling. Beijing Normal Univ., Beijing (1985)
Intelligent Decision Support System for Breast Cancer

R.R. Janghel, Anupam Shukla, Ritu Tiwari, and Rahul Kala

Soft Computing and Expert System Laboratory,
ABV-Indian Institute of Information Technology and Management, Gwalior, India
[email protected], [email protected], [email protected], [email protected]
Abstract. Breast cancer is the second leading cause of cancer deaths in women worldwide and occurs in nearly one out of eight women. Currently there are three techniques to diagnose breast cancer: mammography, FNA (fine needle aspirate) and surgical biopsy. In this paper we develop an integrated expert system for diagnosis, prognosis and prediction of breast cancer using soft computing techniques. The basic aim is to compare various neural network models from the recent literature. The breast cancer databases used for this purpose are the University of Wisconsin data sets from the UCI Machine Learning Repository. Three different data sets have been used, each employing a different diagnostic technique. The system combines diagnosis, prognosis and survivability prediction of breast cancer patients in one intelligent system. We implement six neural network models, namely the Back Propagation Algorithm, Radial Basis Function Networks, Learning Vector Quantization, Probabilistic Neural Networks, Recurrent Neural Networks, and Competitive Neural Networks. Experimental results show that different models give optimal performance for different data sets; however, all the models are able to solve the problem to a reasonable extent.

Keywords: Artificial Neural Networks, Breast Cancer, Expert System, Back Propagation Algorithm, Radial Basis Function Network, Learning Vector Quantization, Probabilistic Neural Networks, Recurrent Neural Networks, Competitive Neural Network.
1 Introduction

Soft computing is an exciting field that deals with learning from historical data. In most problems a lot of data is available from history, and these systems are made to learn this data by the use of training algorithms that may be specific to the system. Learning involves the extraction of rules or patterns from the historical data. It is evident that well-trained systems are able to give correct results for the problems that they have been trained on. Further, the time and memory requirements are reasonably small, as the system has already summarized the historical data into patterns or rules. Generalization is the ability of the system to give correct outputs for unknown but similar problems; it happens through the application of the extracted patterns or rules to the unknown inputs. A system is considered effective if it shows a very high generalizing capability.
2 Literature Review

Yuanjiao et al. proposed a technique to extract micro-calcification clusters with accurate edges in order to obtain hidden information that cannot be detected by the naked eye on mammograms, to help doctors diagnose early breast cancer [2]. Computerized microcalcification detection based on fuzzy logic, vibro-acoustography and probabilistic neural networks on mammograms for breast cancer diagnosis has been carried out by Heng-Da et al. [3]. Image feature extraction was utilized to retrospectively analyze screening mammograms taken prior to the detection of a malignant mass for early detection of breast cancer in [4]. Statistical texture features for breast cancer detection using a Support Vector Machine (SVM) and other machine learning methods such as LDA, NDA, PCA, and ANN were studied in [5]; SVM achieved the best classification accuracy. Early detection of breast cancer is the key to improving the survival rate. Thermography is a promising front-line screening tool, as it is able to warn women of cancer up to 10 years in advance [6]. Laufer et al. proposed a modified self-organizing map with nonlinear weight adjustments to reduce the number of unnecessary biopsies [7]. Taio et al. used a Kohonen self-organizing map and a multilayer perceptron trained with the backpropagation algorithm, and reported results as the sensitivity and the false-positive fraction of actually benign or normal cases that were incorrectly classified [8]. Xiong et al. used statistical methods such as PCA and PLS linear regression analysis, data mining methods, and a hybrid system combining rough sets and a probabilistic neural network; the probabilistic neural network performed supervised classification, while rough sets reduced the number of attributes in the dataset without sacrificing classification accuracy [9, 10, 11]. Karabatak et al. used ANN and ANFIS with linear discriminant analysis and principal component analysis for the diagnosis of breast cancer; target size was classified very reliably, and target shape was also classified [12, 13]. The masses from the mixed classes were input to a supervised linear discriminant (LDA) classifier in [14]. Ky constructed a hierarchical evolutionary RBF network and employed it to detect breast cancer; the hierarchical RBF network model reduced the number of input features while maintaining high detection accuracy [15, 16]. The optimum network for the classification of breast cancer cells was found using a Hybrid Multilayer Perceptron (HMLP) network; a combination of the proposed features gave the highest accuracy [17, 18]. Jain et al. used fuzzy logic; a hybrid neuro-fuzzy generator based on the Knowledge Oriented Design (KOD) concept and cooperative neuro-fuzzy systems using genetic algorithms were applied to the classification (diagnosis) of breast cancer [19, 20]. Pena-Reyes et al. proposed fuzzy-genetic breast cancer identification; the Fuzzy CoCo model evolves a fuzzy system that describes the diagnostic decision and performs the classification [21]. Seker et al. proposed a methodology using neural networks, fuzzy logic, FK-NN and statistical methods for the prognostic analysis of cytometric image data; the FK-NN system gave the highest accuracy compared with the other techniques [22].
3 Methodology

In this work neural network models are used for the diagnosis and prediction of breast cancer. Learning takes place through an iterative process of weight adjustments applied to the initial weights after each epoch of the learning process. Figure 1 shows the overall diagram of the complete methodology. Mammography, biopsy and FNA databases of breast cancer microscopic and clinical test reports are collected. Based on the test reports, our expert system classifies the cancer as either benign or malignant. In this paper we employ several methods, namely the Back Propagation Algorithm (BPA), Radial Basis Function (RBF) networks, Learning Vector Quantization (LVQ), Recurrent Neural Networks (RNN), Probabilistic Neural Networks (PNN), and Competitive Learning (CL), and investigate these machine learning methods to find an optimal classifier. The database contains many missing attribute values, which are filled using data mining techniques: we calculate the average of all available values of an attribute and put this average in place of the missing values. Many neural network models take normalized inputs; therefore we find the maximum value of each attribute and divide all values of the attribute by this maximum, so that all inputs fall in the range between zero and one. The BPA makes use of gradient descent to compute the new values of the weights and biases, and is quickly able to adjust the network weights for good performance. The graph denoting the error of the system for every combination of weights and biases is called the error space; the aim of any training algorithm is to find the global optimum in this space. BPA may often get trapped in local minima, due to the absence of any global guiding strategy and the complexity and high dimensionality of the error space.
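The missing-value imputation and max-normalization described above amount to two column-wise passes over the data. A minimal sketch (the helper and the toy records are illustrative, not the authors' code):

```python
def preprocess(rows, missing=None):
    """Fill missing attribute values with the attribute mean, then divide each
    attribute by its maximum so all inputs lie in (0, 1]. Hypothetical helper
    illustrating the two preprocessing steps described above."""
    n_attrs = len(rows[0])
    cols = []
    for j in range(n_attrs):
        col = [r[j] for r in rows]
        known = [v for v in col if v is not missing]
        mean = sum(known) / len(known)            # average of available values
        filled = [v if v is not missing else mean for v in col]
        mx = max(filled)
        cols.append([v / mx for v in filled])     # scale by the attribute maximum
    # transpose back to row-per-patient form
    return [[cols[j][i] for j in range(n_attrs)] for i in range(len(rows))]

data = [[1.0, None], [3.0, 4.0], [None, 8.0]]
clean = preprocess(data)
# column means 2.0 and 6.0 fill the gaps; columns are then scaled by 3.0 and 8.0
```

Scaling by the maximum (rather than standardizing) keeps all inputs non-negative, which matches the stated goal of mapping every attribute into the range between zero and one.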
The feedforward neural network architecture used in this experiment consisted of one hidden layer along with the input and output layers. The transfer functions in the hidden layer neurons and output layer neurons are sigmoid and purelin, respectively. The performance function used was the mean sum-squared error (MSE) [23]. For the PNN, the architecture consists of four types of units, namely the input unit, the pattern units (class 1 and class 0), the summation unit and the output unit. The PNN is based on the theory of Bayesian classification and the estimation of probability density functions.
Fig. 1. Block diagram of the complete methodology: breast cancer data collection (feature extraction and analysis; feature values stored in the database) → data processing (conversion into matrix form) → preprocessing (removal or inclusion of missing data, data normalization) → machine learning model → diagnostic system → comparison of neural network models
It is necessary to classify the input vectors into one of the two classes in a Bayesian optimal manner. This theory allows a cost function to represent the fact that it may be worse to misclassify a vector that actually belongs to class 1 than to misclassify one that belongs to class 0. The probabilistic neural network estimates the class-conditional probability density function using the following equation:

f₁(x) = 1 / ((2π)^(n/2) σⁿ) · (1/m₁) ∑_{i=1}^{m₁} exp( −(x − x₁ᵢ)ᵀ(x − x₁ᵢ) / (2σ²) )

Here x₁ᵢ is the i-th training pattern from class 1, n is the dimension of the input vectors, m₁ is the number of training patterns from class 1, and σ is the smoothing parameter (corresponding to the standard deviation of the Gaussian distribution). f₁(x) serves as an estimator as long as the parent density is smooth and continuous. A competitive transfer function on the outputs of the summation units selects the maximum of these probabilities and produces 1 for malignant and 0 for benign [24].
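The density estimate above is a Parzen-window sum over the training patterns of one class; the PNN then labels an input by the class with the larger estimate. A minimal sketch (the toy data and σ value are illustrative, not from the paper):

```python
import math

def pnn_class_density(x, patterns, sigma):
    """Parzen estimate of the class-conditional density at x, following the
    equation above: a Gaussian window centred on each training pattern."""
    n = len(x)                       # input dimension
    m = len(patterns)                # number of training patterns in the class
    norm = 1.0 / (((2.0 * math.pi) ** (n / 2.0)) * (sigma ** n) * m)
    total = 0.0
    for p in patterns:
        sq = sum((xi - pi) ** 2 for xi, pi in zip(x, p))  # (x - x_i)^T (x - x_i)
        total += math.exp(-sq / (2.0 * sigma ** 2))
    return norm * total

# The output unit compares the class densities and picks the larger one.
benign = [(0.1, 0.2), (0.2, 0.1)]
malignant = [(0.9, 0.8), (0.8, 0.9)]
x = (0.85, 0.85)
label = 1 if pnn_class_density(x, malignant, 0.5) > pnn_class_density(x, benign, 0.5) else 0
# label is 1 (malignant), since x lies near the malignant patterns
```

The smoothing parameter σ is the only free parameter: a very small σ makes the estimate spiky around the training patterns, while a very large σ blurs the two classes together.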
4 Simulation Results

The results are measured against the following diagnostic performance measures: true positive (TP), the number of positive cases correctly detected; true negative (TN), the number of negative cases correctly detected; false positive (FP), the number of negative cases diagnosed as positive; and false negative (FN), the number of positive cases diagnosed as negative, as shown in Table 1. The various performance measures are summarized in the same table.
Table 1. Diagnostic performance measures

Cancer test   Breast cancer present   Breast cancer absent   Total
Positive      TP                      FP                     TP + FP
Negative      FN                      TN                     FN + TN
Total         TP + FN                 FP + TN                TP + FP + FN + TN
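From the four counts defined above, the measures reported in Tables 2–4 follow directly. A small helper (the counts in the example are illustrative, chosen only to exercise the formulas):

```python
def diagnostic_measures(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy, FPR and FNR (in percent) from the
    confusion counts of Table 1."""
    sensitivity = 100.0 * tp / (tp + fn)          # true positive rate
    specificity = 100.0 * tn / (tn + fp)          # true negative rate
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    fpr = 100.0 * fp / (fp + tn)                  # = 100 - specificity
    fnr = 100.0 * fn / (fn + tp)                  # = 100 - sensitivity
    return sensitivity, specificity, accuracy, fpr, fnr

sens, spec, acc, fpr, fnr = diagnostic_measures(tp=46, tn=23, fp=1, fn=2)
# sens, spec and acc are all about 95.83; fpr and fnr about 4.17
```

Note that FPR and specificity (and likewise FNR and sensitivity) are complementary, which is visible in the tables below: each FPR column is 100 minus the specificity column.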
4.1 Susceptibility Prediction Using Fine Needle Aspiration Microscopic Data

The Wisconsin breast cancer diagnosis (WBCD) database is the result of efforts at the University of Wisconsin Hospital to accurately diagnose breast masses based solely on an FNA test. The database contains the diagnosis results of 699 patients, of which 458 are benign and 241 malignant [25]. Different neural network models were simulated for different parameter settings. Experimental results of the breast cancer system using the different neural network models are shown in Table 2. LVQ emerged as the optimal network; it used 10 hidden neurons and a learning rate of 0.01, with MSE as the performance function.

Table 2. Performance comparison of ANN models

ANN Model  Sensitivity  Specificity  Accuracy  FPR    FNR    Training Time (s)
BPA        17.39        79.47        51.88     65.48  42.93    2.17
RBF        19.63        74.24        49.79     61.82  46.74   10.00
LVQ        95.83        95.80        95.82      4.17   4.20   57.14
PNN        19.63        74.24        49.79     61.82  46.74    0.1094
RNN        21.00        75.54        52.72     61.82  42.93   23.31
CL         25.00        77.09        74.48     94.55   4.89   17.53
4.2 Susceptibility Prediction Using Fine Needle Aspiration Digital Image Data

The purpose of this data set is to classify a tumour as either benign or malignant based on cell descriptions gathered by an FNA image test. The database contains information about 569 patients, with 357 benign and 212 malignant cases [25]. Again, different neural network models were simulated with different architectures. Experimental results of the breast cancer system using the different neural network models are shown in Table 3. The optimal network, BPA in this case, used 25 hidden neurons, the MSE error function, and a learning rate of 0.05 with 1000 epochs and a momentum of 0.8.
Table 3. Performance comparison of ANN models

ANN Model  Training Time (s)
BPA         49.82
RBF         14.85
LVQ         28.09
PNN          0.0675
RNN        104.10
CL          20.14
4.3 Recurrence Prediction Using Fine Needle Aspiration Image Data

This experiment uses the Wisconsin Prognostic Breast Cancer (WPBC) database, which has 34 real-valued features computed from an image. The purpose of the data set is to classify a tumour as either recurrent or nonrecurrent based on cell descriptions gathered by an image test. The database contains records of 198 patients, of which 151 are nonrecurrent and 47 recurrent [25]. Following the same methodology, different neural models with different settings were simulated. Experimental results of the breast cancer system using the neural network models are shown in Table 4. In this case also, BPA emerged as the optimal network; the optimal ANN structure with BPA used 50 hidden neurons and a learning rate of 0.05 with 15000 epochs and a momentum of 0.8.

Table 4. Performance comparison of ANN models

ANN Model  Sensitivity  Specificity  Accuracy  FPR     FNR     Training Time (s)
BPA        1.0000       0.9722       0.9792    0.0769  0        53.48
RBF        1.0000       0.8750       0.8958    0.3846  0         4.2
LVQ        0.9583       0.9583       0.9583    0.0417  0.0417   22.70
PNN        1.0000       0.9459       0.9583    0.1538  0         0.0936
RNN        1.0000       0.9459       0.9583    0.1538  0        34.23
CL         0.3333       0.7333       0.7083    0.9231  0.0571    6.71
5 Conclusions

All the systems were simulated on the breast cancer database. The ultimate aim is to build an expert system that would assist doctors in the diagnosis of the disease. This would prove to be a very useful system in the present scenario, where diseases are on the rise and there is a lack of specialized doctors. Such a system learns from past data, which is a collection of a lot of information in itself; the system tries to extract this hidden information to build a generalized system for the detection of the disease. It may be noted that, although we have built effective systems, we were still not able to reach an accuracy of 100%, which is the ultimate goal of medical diagnosis.
Acknowledgement. The authors sincerely acknowledge Prof. S.G. Deshmukh, Director ABV-IIITM, Gwalior, India for encouraging and providing facilities to carry out this research work. This work is sponsored by ABV-IIITM Gwalior.
References

1. http://www.breastcancer.org
2. Ma, Y., Wang, Z., Lu, J.L., Wang, G., Li, P., Ma, T., Xie, Y., Zheng, Z.: Extracting Micro-calcification Clusters on Mammograms for Early Breast Cancer Detection. In: Proceedings of the 2006 IEEE International Conference on Information Acquisition, Weihai, Shandong, China, August 20-23, pp. 499–504 (2006)
3. Cheng, H.-D., Lui, Y.M., Freimanis, R.I.: A Novel Approach to Microcalcification Detection Using Fuzzy Logic Technique. IEEE Transactions on Medical Imaging 17(3), 442–450 (1998)
4. Sameti, M., Ward, R.K., Morgan-Parkes, J., Palcic, B.: Image Feature Extraction in the Last Screening Mammograms Prior to Detection of Breast Cancer. IEEE Journal of Selected Topics in Signal Processing 3(1), 46–52 (2009)
5. Mumtaz, A., Deris, S., Zaki, N., Ghoneim, D.M.: Breast Cancer Detection Based on Statistical Textural Features Classification, pp. 728–730. IEEE, Los Alamitos (2008)
6. Tan, T.Z., Quek, C., Ng, G.S., Ng, E.Y.K.: A novel cognitive interpretation of breast cancer thermography with complementary learning fuzzy neural memory structure. Expert Systems with Applications 33, 652–666 (2007)
7. Laufer, S., Rubinsky, B.: Tissue Characterization with an Electrical Spectroscopy SVM Classifier. IEEE Transactions on Biomedical Engineering 56(2), 525–528 (2009)
8. Santos-André, T.C.S., da Silva, A.C.R.: A Neural Network Made of a Kohonen's SOM Coupled to a MLP Trained Via Backpropagation for the Diagnosis of Malignant Breast Cancer from Digital Mammograms, pp. 3647–3650. IEEE, Los Alamitos (1999)
9. Revett, K., Gorunescu, F., Gorunescu, M., El-Darzi, E., Ene, M.: A Breast Cancer Diagnosis System: A Combined Approach Using Rough Sets and Probabilistic Neural Networks. In: EUROCON 2005, Serbia & Montenegro, Belgrade, November 22-24, pp. 1124–1127 (2005)
10. Xiong, X., Kim, Y., Baek, Y., Rhee, D.W., Kim, S.-H.: Analysis of Breast Cancer Using Data Mining & Statistical Techniques. In: Proceedings of the Sixth International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing and First ACIS International Workshop on Self-Assembling Wireless Networks (SNPD/SAWN 2005). IEEE, Los Alamitos (2005)
11. Nabil, E., Badr, A., Farag, I., Khozium, M.O.: A Hybrid Artificial Immune Genetic Algorithm with Fuzzy Rules for Breast Cancer Diagnosis. In: INFOS 2008, Cairo, Egypt, March 27-29, pp. 31–40 (2008)
12. Fogel, D.B., Wasson, E.C., Boughton, E.M., Porto, V.W., Angeline, P.J.: Linear and Neural Models for Classifying Breast Masses. IEEE Transactions on Medical Imaging 17(3), 485–488 (1998)
13. Davis, S.K., Van Veen, B.D., Hagness, S.C., Kelcz, F.: Breast Tumor Characterization Based on Ultrawideband Microwave Backscatter. IEEE Transactions on Biomedical Engineering 55(1), 237–246 (2008)
14. Hadjiiski, L., Sahiner, B., Chan, H.-P., Petrick, N., Helvie, M.: Classification of Malignant and Benign Masses Based on Hybrid ART2LDA Approach. IEEE Transactions on Medical Imaging 18(12), 1178–1187 (1999)
15. Van Ha, K.: Hierarchical Radial Basis Function Networks. In: 1998 IEEE, pp. 1893–1898 (1998)
16. Chen, Y., Wang, Y., Yang, B.: Evolving Hierarchical RBF Neural Networks for Breast Cancer Detection. In: King, I., Wang, J., Chan, L.-W., Wang, D. (eds.) ICONIP 2006, Part II. LNCS, vol. 4234, pp. 137–144. Springer, Heidelberg (2006)
17. Salleh, N.M., Sakim, H.A.M., Othman, N.H.: Neural Networks to Evaluate Morphological Features for Breast Cells Classification. IJCSNS International Journal of Computer Science and Network Security 8(9), 51–58 (2008)
18. Salleh, N.M., Sakim, H.A.M., Othman, N.H.: Neural Networks to Evaluate Morphological Features for Breast Cells Classification. IJCSNS International Journal of Computer Science and Network Security 8(9), 51–58 (2008)
19. Rahmoun, A., Berrani, S.-A.: A Genetic-Based Neuro-Fuzzy Generator: NEFGEN. In: 2001 IEEE, pp. 18–23 (2001)
20. Pena-Reyes, C.A., Sipper, M.: A fuzzy-genetic approach to breast cancer diagnosis. Artificial Intelligence in Medicine 17, 131–155 (1999)
21. Peña Reyes, C.A.: Breast Cancer Diagnosis by Fuzzy CoCo. In: Peña Reyes, C.A. (ed.) Coevolutionary Fuzzy Modeling. LNCS, vol. 3204, pp. 71–87. Springer, Heidelberg (2004)
22. Seker, H., Odeyato, M., Petrovie, D., Naguib, R.N.G., Bartoli, C., Alasilo, L., Sherbet, G.V.: Prognostic comparison of statistical, neural and fuzzy methods of analysis of breast cancer image cytometric data. In: Proc. of the 23rd Annual IEEE EMBS Int. Conference, Istanbul, Turkey, October 25-28, pp. 3811–3814 (2001)
23. Shukla, A., Tiwari, R., Janghel, R.R., Kaur, P.: Diagnosis of Thyroid Disorders using Artificial Neural Networks. In: 2009 IEEE International Advance Computing Conference (IACC 2009), Patiala, India, March 6-7, pp. 2722–2726 (2009)
24. Sivanandam, S.N., Deepa, S.N.: Principles of Soft Computing. Wiley India (2007)
25. http://www.cs.wisc.edu
An Automatic Index Validity for Clustering

Zizhu Fan¹,², Xiangang Jiang², Baogen Xu², and Zhaofeng Jiang²

¹ Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, 518055, China
² School of Basic Science, East China Jiaotong University, Nanchang, 330013, China
[email protected]
Abstract. Many validity index algorithms have been proposed to determine the number of clusters. These methods usually employ the Euclidean distance as the measurement. However, it is difficult for the Euclidean distance metric to evaluate the compactness of data when a non-linear relationship exists between different components of the data. Moreover, most current algorithms cannot estimate the range of the number of clusters well. To address these problems, in this paper we adopt the kernel-induced distance to measure the relationship among data points. We first estimate the upper bound of the number of clusters, which effectively reduces the iteration time of the validity index algorithm. We then design a kernelized validity index algorithm to automatically determine the optimal number of clusters. Experiments show that the proposed approach obtains promising results.

Keywords: clusters; validity index; eigenvalue; upper bound.
To cope with the above problems, in this paper we adopt the kernel-induced distance to measure the relationship among data points. First, a kernel-induced matrix, i.e., the kernel matrix [7], is generated. This matrix is then decomposed to obtain its eigenvalues; with this information, we can estimate the upper bound of the number of clusters. To finally determine the number of clusters, we kernelize Rezaee's validity algorithm [3] by using the kernel-induced distance. There are no parameters to be manually specified by the user, so this method can automatically select the optimal number of clusters. Moreover, the iterations of the proposed method are significantly fewer than those of the aforementioned algorithms. The organization of this paper is as follows. In Section 2, we discuss the relationship between the eigenvectors of the kernel matrix and cluster centers. In Section 3, Rezaee's validity algorithm is kernelized to determine the optimal cluster number. In Section 4, experimental results are given to illustrate the effectiveness of the new algorithm. Finally, we conclude in Section 5.
2 Kernel Matrix and Eigen Decomposition

Mark Girolami investigated the relationship between the eigenvectors of the kernel matrix and cluster centers [8]. He pointed out and proved that the possible number of clusters can be estimated using the kernel matrix derived from the dataset. The Gaussian kernel function is usually employed to define the kernel matrix. Given a finite set of observations xi (i = 1, …, n), where xi ∈ R^d, the Gaussian kernel function is defined as [9,10]:

k_ij = k(xi, xj) = exp( −‖xi − xj‖² / (2σ²) ),  xi, xj ∈ R^d (i, j = 1, …, n).

The kernel matrix is then K = (k_ij)_{n×n}. The mean value of the elements of K is

M = (1/n²) ∑_{i=1}^{n} ∑_{j=1}^{n} K_ij = 1_n^T K 1_n ,   (1)

where the n×1 vector 1_n has all elements equal to 1/n. An eigenvalue decomposition of the kernel matrix gives K = VΛV^T, where the columns of V are the individual eigenvectors vi of K and the diagonal matrix Λ contains the eigenvalues, denoted λi. Then the following formula holds:

M = 1_n^T { ∑_{i=1}^{n} λi vi vi^T } 1_n = ∑_{i=1}^{n} λi (1_n^T vi)² ,   (2)
where M is the average kernel distance of all sample data. It can measure the average similarity and the compactness of the whole dataset. According to [8], the kernel matrix will have a block-diagonal structure when there are definite groupings or clusters within the data sample. Therefore, from formula (2), we can conclude that if there are K distinct clustered regions within the N data samples, then there will be K dominant terms λi(1_n^T vi)² in the summation. As k(xi, xj) is positive semidefinite, M is always nonzero, and there must be some nonzero eigenvalues that correspond to the cluster regions. Thus, we can determine the possible number of clusters after computing these eigenvalues. However, some eigenvalues are so small that they can be neglected; they usually correspond to noise or outliers. This work only considers the large eigenvalues. We select the possible number of clusters by counting the eigenvalues whose cumulative sum, as a ratio of the total eigenvalue sum, is no less than a threshold t, say 0.9 or 0.95. The range of this threshold is discussed in the experiment section (Section 4). In order to accurately determine the cluster number, we use the cluster validity index method. Currently, most validity algorithms are based on the Euclidean distance metric, which is not powerful in dealing with nonlinear data. In this paper, we employ a kernel distance to overcome this drawback. This metric is discussed in the following section.
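The whole estimate of this section — build the Gaussian kernel matrix, decompose it, and count the dominant eigenvalues whose cumulative share reaches the threshold t — can be sketched as follows (σ is set to the mean pairwise distance, as in the later experiments; a sketch, not the authors' code):

```python
import numpy as np

def estimate_upper_bound(X, t=0.96):
    """Estimate an upper bound on the number of clusters by counting the
    dominant eigenvalues of the Gaussian kernel matrix, as described above."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    sigma = dist.sum() / (n * n - n)            # mean distance over i != j pairs
    K = np.exp(-dist ** 2 / (2 * sigma ** 2))
    lam = np.sort(np.linalg.eigvalsh(K))[::-1]  # eigenvalues, descending
    ratio = np.cumsum(lam) / lam.sum()
    k = int(np.searchsorted(ratio, t)) + 1      # smallest k whose share >= t
    return min(k, int(np.ceil(np.sqrt(n))))     # cap at sqrt(n), as in the paper

# Two well-separated Gaussian blobs: the dominant-eigenvalue count is small.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(5.0, 0.3, (30, 2))])
ub = estimate_upper_bound(X)
```

With two tight, well-separated blobs the kernel matrix is nearly block diagonal, so only two eigenvalues carry almost all of the trace and the estimate stays far below the conventional √n bound.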
3 Kernel Validity Measure

Rezaee et al. proposed a validity index that measures both the compactness and the separation of a clustering [3]. This index was slightly modified by Sun [4] and used in [11]. Both are based on the Euclidean distance. To be able to deal with nonlinear data, Rezaee's validity index is kernelized and improved in this paper. For simplicity, we focus only on the k-means algorithm. Given the data set X = {x1, x2, …, xn | xi ∈ R^d} with c cluster centers vi, such that V = {v1, v2, …, vc}, the variance of the pattern set X with the kernel-induced distance is denoted by V(X) [12]:

V(X) = (1/n) ∑_{i=1}^{n} ( k(xi, xi) − (2/n) ∑_{j=1}^{n} k(xi, xj) + (1/n²) ∑_{s=1}^{n} ∑_{t=1}^{n} k(xs, xt) )^{1/2} .   (3)
Similarly, the variance of the patterns in class ci, whose center is vi, with the kernel-induced distance is defined as

σ(vi) = (1/ni) ∑_{j=1}^{ni} ( k(xj, xj) − (2/ni) ∑_{l=1}^{ni} k(xj, xl) + (1/ni²) ∑_{s=1}^{ni} ∑_{t=1}^{ni} k(xs, xt) )^{1/2} ,   (4)

where ni is the number of samples in class ci and all sums run over the samples of ci. Then the average scattering for c clusters is defined as

S(c) = (1/c) ∑_{i=1}^{c} σ(vi) / V(X) .   (5)
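The per-cluster quantity σ(vi) above is the mean feature-space distance of the cluster's points to the cluster's kernel-induced center, and it can be computed directly from the kernel matrix. A minimal sketch (Gaussian kernel; the toy clusters and parameter values are illustrative):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all pairs of rows
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def kernel_scatter(C, sigma=1.0):
    """Mean feature-space distance of the points of cluster C to the cluster's
    kernel-induced center, i.e. the sigma(v_i) quantity of Eq. (4)."""
    K = gaussian_kernel(C, C, sigma)
    n = len(C)
    # squared distance of each point to the cluster mean in feature space
    d2 = np.diag(K) - 2.0 * K.mean(axis=1) + K.sum() / n ** 2
    return np.sqrt(np.clip(d2, 0.0, None)).mean()

tight = np.array([[0.0, 0.0], [0.01, 0.0], [0.0, 0.01]])
spread = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
# kernel_scatter(tight) is near 0, kernel_scatter(spread) is clearly larger
```

The `clip` guards against tiny negative values from floating-point round-off before the square root is taken.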
To measure the total scattering between the clusters, we define a kernel distance function D(c) as follows:

D(c) = (Dmax / Dmin) ∑_{i=1}^{c} ( ∑_{j=1}^{c} kd(vi, vj) )^{−1} ,   (6)

where kd(vi, vj) is the kernel distance between cluster ci and cluster cj:

kd(vi, vj) = (1/ni²) ∑_{p=1}^{ni} ∑_{q=1}^{ni} k(xp, xq) − (2/(ni·nj)) ∑_{p=1}^{ni} ∑_{q=1}^{nj} k(xp, xq) + (1/nj²) ∑_{s=1}^{nj} ∑_{t=1}^{nj} k(xs, xt) ,   (7)
where ni and nj are the pattern numbers of cluster ci and cluster cj, respectively. The detailed derivation can be found in Appendix B. In formula (6), Dmax = max(kd(vi, vj)) (i, j = 1, 2, …, c) is the maximum kernel distance between the cluster prototypes. Similarly, Dmin = min(kd(vi, vj)) (i, j = 1, 2, …, c, i ≠ j) is the minimum kernel distance between the cluster centers. Therefore, our kernel-based validation index KRVc is defined by combining the two equations (5) and (6):

KRVc = αS(c) + D(c) ,   (8)

where α is a weighting factor specified in the experiments; usually, it is automatically set to D(cmax). S(c) indicates the average scattering (compactness) within the clusters for c clusters, and D(c) denotes the separation between the clusters. The cluster number that minimizes KRVc is viewed as the optimal number of clusters. The aforementioned validation algorithms choose the class number from cmin to cmax; cmin is always equal to 2, but it is usually difficult to set cmax, which is conventionally set to √n. Our method addresses this problem because we can effectively narrow the range of the possible number of clusters through the eigenvalue decomposition discussed in Section 2. Thus, we only need to evaluate a few candidate numbers to determine the optimal class number.
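Equation (7) can be evaluated from the cluster members alone, since it equals the squared distance between the two clusters' mean vectors in the kernel-induced feature space. A minimal sketch (Gaussian kernel; the cluster data are illustrative):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all pairs of rows
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def kd(Ci, Cj, sigma=1.0):
    """Kernel distance of Eq. (7) between the clusters holding patterns Ci and
    Cj: the squared distance between their feature-space mean vectors."""
    ni, nj = len(Ci), len(Cj)
    return (gaussian_kernel(Ci, Ci, sigma).sum() / ni ** 2
            - 2.0 * gaussian_kernel(Ci, Cj, sigma).sum() / (ni * nj)
            + gaussian_kernel(Cj, Cj, sigma).sum() / nj ** 2)

Ci = np.array([[0.0, 0.0], [0.2, 0.1]])
Cj = np.array([[3.0, 3.0], [3.1, 2.9]])
# kd of a cluster against itself is 0, and kd grows as the clusters separate
```

Because the distance is taken between feature-space means rather than the raw centers vi, it remains meaningful for clusters that are only non-linearly separable in the input space.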
4 Experimental Results

In this section, we present experiments to show the effectiveness of the proposed algorithm. The experiments are divided into two parts. The first part is conducted on three data sets to suggest the upper bound of the optimal cluster number; these data sets, shown in Table 1, include one synthetic data set and two data sets from the UCI Machine Learning Repository. The second part applies the proposed validation index to three data sets, one synthetic and two real, to select the optimal cluster number.
4.1 Upper Bound of the Optimal Cluster Number

According to the above discussion, eigenvalue decomposition of the kernel matrix provides a powerful tool to estimate the possible number of clusters. After decomposition, the eigenvalues are arranged in descending order. In the experiments, the Gaussian parameter σ is set to the average Euclidean distance over all pairs of data points. The threshold t is the ratio of the sum of the several largest eigenvalues to the total eigenvalue sum. As shown in Table 1, we calculate the possible cluster numbers of each data set for t from 0.90 to 0.98. When t equals 0.96, the corresponding possible numbers are no smaller than the optimal cluster numbers of the data sets, so these possible numbers can be viewed as upper bounds of the optimal cluster numbers. In Table 1, the column denoted "UB" lists the upper bound for each data set. Some possible numbers may exceed √n; for such data sets, the upper bound is set to the square root of the instance number. Thus, we can finally determine the upper bound of the optimal cluster number for each data set. In general, the upper bound is less than √n, which the other algorithms always use as cmax. Our algorithm is described as follows:

1. Compute the kernel matrix K.
2. Decompose K to obtain the possible number k.
3. Determine the upper bound UB for the data set.
4. (a) Run kernel k-means to cluster the data set; (b) calculate KRV for c from 2 to UB.
5. Determine the optimal number k corresponding to the minimum of KRV.

Table 1. The possible numbers and upper bounds of data sets

Data set    t=0.90  t=0.92  t=0.94  t=0.96  t=0.98  S   UB  S−UB
Iris        2       2       3       3       4       13  3   10
Wine        2       2       2       3       3       14  3   11
Synthetic   3       3       3       4       5       25  4   21
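Steps 4–5 of the algorithm above reduce to a small search loop over c = 2, …, UB. In the sketch below, `krv` is a hypothetical stand-in for a routine that runs kernel k-means with c clusters and returns the KRV index value:

```python
def optimal_cluster_number(X, ub, krv):
    """Evaluate the validity index for c = 2..ub and return the minimizing c.
    krv(X, c) is assumed to cluster X into c groups and return KRV_c."""
    best_c, best_val = 2, float("inf")
    for c in range(2, ub + 1):
        val = krv(X, c)
        if val < best_val:
            best_c, best_val = c, val
    return best_c

# toy index that is minimized at c = 3, just to exercise the loop
toy_krv = lambda X, c: abs(c - 3)
```

The saving claimed in the last column of Table 1 is exactly the shrinkage of this loop: UB − 1 index evaluations instead of the conventional √n − 1.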
Columns 2 to 6 of Table 1 list the possible numbers of each data set for t from 0.90 to 0.98. The seventh column (S) lists the smallest integer larger than the square root of the instance number of each data set; for each data set, this value indicates the number of iterations needed to compute the optimal cluster number using the usual validity measures. The column "UB" is the upper bound for each data set. The last column shows the iterations saved by the proposed method; according to this column, our algorithm can effectively reduce the iteration time of the validity index, especially when the data set contains a great number of instances.

4.2 Experiments on Three Data Sets

After calculating the upper bound, we use the validation index to determine the optimal number. We choose three data sets: one synthetic and two real. The synthetic data set has four classes which are Gaussian
Z. Fan et al.
distributions (0, 2), (5, 1.5), (10, 1.5), (15, 2). The real data sets are IRIS and WINE, typical linearly non-separable data sets that are widely used in experiments [11,5]. For comparison with other validation methods, we implement Xie-Beni's separation measure (XBV) [4], Rezaee's validity (RV) [5] and Chou's validity (CV) [7]. These validity measures are based on the Euclidean distance, while the proposed algorithm uses the kernel-induced distance. We test these measures on the three data sets and run 20 times for each data set with randomly initialized centers. The smallest value indicates a valid optimal partition. Two parameters must be set in the experiments. One is the Gaussian kernel parameter, which is the mean of the variance norm of each datum vector. The other, α in equation (8), is set to Dis(c_max); its value depends on the data set and is 150, 1200 and 100 for the IRIS, WINE and synthetic Gaussian data sets, respectively. The experimental results on the three data sets are shown in the following tables, where K denotes the possible cluster number. Each entry in the tables contains two parts: the mean value of the measure for each K over the 20 runs, and, in brackets, the frequency with which that K attained the minimum value in the 20 runs. The bold italic entries in the tables mark the highest frequency.

Table 2. Validity indexes on IRIS data set
K     XBV           RV            CV            KRV
K=2   0.07 (20/20)  12.1 (3/20)   0.03 (17/20)  547 (1/20)
K=3   0.27 (0/20)   12.4 (17/20)  0.13 (3/20)   541 (19/20)
K=4   0.54 (0/20)   14.0 (0/20)   0.14 (0/20)   832 (0/20)
K=5   0.50 (0/20)   16.8 (0/20)   0.23 (0/20)   1747 (0/20)
Table 2 shows the experimental results on the IRIS data set, which has been widely used for testing clustering algorithms. It contains three physical classes representing different IRIS subspecies, each with 50 samples. Of the three classes, one is well separated from the other two, which are not easily separable due to the overlap of their convex hulls. From Table 2, KRV yields the optimal number of clusters (3) in 19 of 20 runs and attains the lowest mean value there. RV produces the correct number in 17 of 20 runs, and CV in only 3 of 20 runs. The XBV measure does not find the true cluster number.

Table 3. Validity indexes on WINE data set
K     XBV           RV           CV            KRV
K=2   0.26 (0/20)   0.58 (0/20)  0.16 (0/20)   116.4 (0/20)
K=3   0.18 (0/20)   0.46 (0/20)  0.12 (7/20)   96.2 (17/20)
K=4   0.15 (6/20)   0.36 (0/20)  0.11 (10/20)  106.4 (3/20)
K=5   0.21 (3/20)   0.32 (0/20)  0.20 (5/20)   140.7 (0/20)
An Automatic Index Validity for Clustering
The experimental results on the WINE data set are shown in Table 3. This data set has also been widely used for testing clustering algorithms. It consists of 178 records, each described by 13 attributes. Like the IRIS data set, it contains three clusters which are linearly non-separable. According to Table 3, the proposed KRV again obtains the best result, producing the correct optimal number in 17 of 20 runs. The CV validity yields its most frequent selection in 10 of 20 runs, and the XBV validity in 6 of 20 runs. The RV validity cannot correctly determine the optimal cluster number.
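The eigenvalue-based upper-bound estimation of Section 4.1 (steps 1-3) can be sketched as follows. This is a minimal numpy sketch; the Gaussian kernel form exp(−||x−y||²/(2σ²)) and the synthetic test data are assumptions consistent with the description, not the authors' code:

```python
import numpy as np

def upper_bound_clusters(X, t=0.96):
    """Steps 1-3: kernel matrix, eigen-decomposition, threshold-based upper bound."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    sigma = D.sum() / (n * (n - 1))              # average pairwise Euclidean distance
    K = np.exp(-D**2 / (2.0 * sigma**2))         # Gaussian kernel matrix
    lam = np.sort(np.linalg.eigvalsh(K))[::-1]   # eigenvalues, descending order
    ratio = np.cumsum(lam) / lam.sum()
    k = int(np.searchsorted(ratio, t)) + 1       # smallest k whose top-k ratio >= t
    return min(k, int(np.ceil(np.sqrt(n))))      # cap at sqrt(n), as in the paper

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in ([0, 0], [10, 0], [0, 10])])
ub_low, ub_high = upper_bound_clusters(X, 0.90), upper_bound_clusters(X, 0.98)
```

Because the cumulative eigenvalue ratio is non-decreasing, raising t can only raise the estimate, which is why the bound tightens as t grows in Table 1.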
5 Conclusion

In this paper, we propose a novel method for selecting the optimal number of clusters. The method first establishes, through extensive experiments, the relationship between the eigenvalues of the kernel matrix derived from the data patterns and the number of clusters. Then, based on the kernel-induced distance, we define a new validity measure to determine the optimal cluster number. Our experiments show the effectiveness and robustness of the proposed algorithm. Its main advantage is that it copes well with linearly non-separable problems. The method reduces the iteration cost by using the upper bound of the optimal cluster number, which makes it suitable for large-scale data sets. Moreover, no parameters need to be set manually, so the method is automatic. Of course, our approach faces the problem of high computational cost when computing eigenvalues; addressing this problem is left for future work.

Acknowledgements. The authors would like to thank the education department of Jiangxi province for financially supporting this research under Contract No. GJJ09500.
References
1. Xu, R., Wunsch II, D.: Survey of Clustering Algorithms. IEEE Transactions on Neural Networks 16(3), 645–678 (2005)
2. Xie, X.L., Beni, G.: A validity measure for fuzzy clustering. IEEE Trans. Pattern Anal. Machine Intell. 13(8), 841–847 (1991)
3. Ramze Rezaee, M., Lelieveldt, B.P.F., Reiber, J.H.C.: A new cluster validity index for the fuzzy c-mean. Pattern Recognition Letters 19, 237–246 (1998)
4. Sun, H., Wang, S., Jiang, Q.: FCM-Based Model Selection Algorithms for Determining the Number of Clusters. Pattern Recognition 37, 2027–2037 (2004)
5. Chou, C.-H., Su, M.-C., Lai, E.: A new cluster validity measure and its application to image compression. Pattern Anal. Applic. 7, 205–220 (2004)
6. Wang, W., Zhang, Y.: On fuzzy cluster validity indices. Fuzzy Sets and Systems 158, 2095–2117 (2007)
7. Xu, Y., Yang, J.-y., Jin, Z.: A novel method for Fisher discriminant analysis. Pattern Recognition 37(2), 381–384 (2004)
8. Girolami, M.: Mercer Kernel-Based Clustering in Feature Space. IEEE Transactions on Neural Networks 13(3), 780–784 (2002)
9. Xu, Y., Zhang, D., Jin, Z., et al.: A fast kernel-based nonlinear discriminant analysis for multi-class problems. Pattern Recognition 39(6), 1026–1033 (2006)
10. Xu, Y., Zhang, D., Song, F., Yang, J.-Y., Jing, Z., Li, M.: A method for speeding up feature extraction based on KPCA. Neurocomputing 70(4-6), 1056–1061 (2007)
11. Li, M.J., Ng, M.K., Cheung, Y.-m., Huang, J.Z.: Agglomerative fuzzy K-means clustering algorithm with selection of number of clusters. IEEE Transactions on Knowledge and Data Engineering 20(11), 1519–1534 (2008)
12. Vert, J.P., Tsuda, K., Schölkopf, B.: A Primer on Kernel Methods. In: Kernel Methods in Computational Biology, pp. 35–70. MIT Press, Cambridge (2004)
13. Xu, Y., Yang, J.-y., Lu, J., Yu, D.-j.: An efficient renovation on kernel Fisher discriminant analysis and face recognition experiments. Pattern Recognition 37(10), 2091–2094 (2004)
14. Xu, Y., Yang, J.Y., Yang, J.: A reformative Fisher discriminant analysis. Pattern Recognition 37, 1299–1302 (2004)
15. Xu, Y.: A New Kernel MSE Algorithm for Constructing Efficient Classification Procedure. International Journal of Innovative Computing, Information and Control 5(8), 2439–2447 (2009)
Exemplar Based Laplacian Discriminant Projection

X.G. Tu and Z.L. Zheng
Department of Computer Science, Zhejiang Industry Polytechnic College, China
Abstract. A new algorithm, Exemplar based Laplacian Discriminant Projection (ELDP), is proposed in this paper for supervised dimensionality reduction. ELDP aims at learning a linear transformation and is an extension of Linear Discriminant Analysis (LDA). Specifically, we define three scatter matrices using similarities based on representative exemplars, which are found by affinity propagation clustering. After the transformation, the considered pairwise samples within the same exemplar subset and the same class are as close as possible, while those between classes are as far apart as possible. The structural information of the classes is contained in the exemplar based Laplacian matrices, so the discriminant projection subspace can be derived by controlling the structural evolution of these Laplacian matrices. The performance on several data sets demonstrates the competence of the proposed algorithm.

Keywords: Laplacian Matrix, Exemplars, Supervised Learning, Linear Discriminant Analysis.
1 Introduction

Dimensionality reduction has attracted tremendous attention in the pattern recognition community over the past few decades, and many new algorithms have been developed. Among these, linear dimensionality reduction is widespread for its simplicity and effectiveness. Principal component analysis (PCA), a classic linear method for unsupervised dimensionality reduction, aims at learning subspaces in which the maximum covariance of all training samples is preserved [2]. Locality Preserving Projections (LPP), another typical approach for unsupervised dimensionality reduction, seeks projections that preserve the local structure of the sample space [3]. However, unsupervised learning algorithms cannot properly model the underlying structures and characteristics of different classes [5]. Discriminant features are often obtained by supervised dimensionality reduction. Linear discriminant analysis (LDA) is one of the most popular supervised techniques for classification [6,7]. LDA aims at learning a discriminant subspace in which the within-class scatter is minimized while the between-class scatter of the samples is maximized. Many improved LDAs have demonstrated competitive performance in object classification [10,11,12,14,16]. In the traditional LDA algorithm, the similarity measure of the scatter matrices is based on the distances between sample vectors and the corresponding center vectors. Frey proposed a new clustering method, called affinity propagation clustering (APC), to identify a subset of representative exemplars, which is important for detecting patterns and processing sensory signals in data [21]. Compared with center vectors, the exemplars are
Y. Tan, Y. Shi, and K.C. Tan (Eds.): ICSI 2010, Part II, LNCS 6146, pp. 367–374, 2010. © Springer-Verlag Berlin Heidelberg 2010
much more representative, because the samples within the same subset are closer to their exemplar than to the class center vector. Motivated by APC, traditional LDA, Laplacian Eigenmaps (LE) and the nearest-neighborhood selection strategy [4,8,9], we propose a new dimensionality reduction algorithm, exemplar based Laplacian discriminant projection, for discriminant feature extraction. Our algorithm places much emphasis on exemplar based scatter similarity, which can be viewed as an extension of the within-class and between-class scatter similarities. We formulate the exemplar based scatter by means of similarity criteria commonly used in LE and LPP; the extended exemplar scatters are governed by different Laplacian matrices. In general, LDA can be regarded as a special case of ELDP. Therefore, ELDP not only conquers the non-Euclidean space problem [5], but also provides an alternative way to find potentially better discriminant subspaces.

The paper is organized as follows. In Section 2, we provide a brief introduction to the related work. In Section 3, the proposed exemplar based Laplacian discriminant projection is described in detail. The experimental results and performance comparisons are presented in Section 4. Section 5 covers some conclusions.
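The affinity propagation clustering of [21] that supplies the exemplars can be sketched as follows. This is a minimal from-scratch message-passing implementation with the usual median-similarity preference; the blob data and all parameter values are illustrative assumptions, not the authors' setup:

```python
import numpy as np

def affinity_propagation(S, damping=0.5, n_iter=200):
    """Plain affinity propagation: message passing on a similarity matrix S."""
    n = S.shape[0]
    idx = np.arange(n)
    R = np.zeros((n, n))   # responsibilities
    A = np.zeros((n, n))   # availabilities
    for _ in range(n_iter):
        # r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k'))
        M = A + S
        top = M.argmax(axis=1)
        first = M[idx, top]
        M[idx, top] = -np.inf
        second = M.max(axis=1)
        Rn = S - first[:, None]
        Rn[idx, top] = S[idx, top] - second
        R = damping * R + (1 - damping) * Rn
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        Rp[idx, idx] = R[idx, idx]
        An = Rp.sum(axis=0)[None, :] - Rp
        diag = An[idx, idx].copy()
        An = np.minimum(An, 0)
        An[idx, idx] = diag
        A = damping * A + (1 - damping) * An
    return np.flatnonzero(np.diag(A + R) > 0)   # indices of the chosen exemplars

# Three tight, well-separated 2-D blobs of 5 points each (indices 0-4, 5-9, 10-14)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.1, size=(5, 2)) for c in ([0, 0], [10, 0], [0, 10])])
S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)          # negative squared distance
S[np.arange(15), np.arange(15)] = np.median(S[~np.eye(15, dtype=bool)])  # preference
exemplars = affinity_propagation(S)
```

Unlike a class mean, each returned exemplar is an index of an actual data point, which is the property ELDP exploits when building its exemplar neighborhoods.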
2 Overview of Linear Discriminant Analysis

Let X = [x_1, x_2, ..., x_n] ∈ R^{D×n} denote a data matrix consisting of n samples x_i ∈ R^D. Linear dimensionality reduction algorithms construct a small number d of features by applying a linear transformation W ∈ R^{D×d} that maps each sample x_i of X to the corresponding vector y_i ∈ R^d in the d-dimensional space:

W : x_i ∈ R^D → y_i = W^T x_i ∈ R^d    (1)

Assume that the matrix X contains c classes and is ordered such that the samples appear by class:

X = [X_1, X_2, ..., X_c] = [x_1^(1), ..., x_{n_1}^(1), ..., x_1^(c), ..., x_{n_c}^(c)]    (2)

In traditional LDA, two scatter matrices, the within-class matrix and the between-class matrix, are defined as follows [6]:

S_w = (1/n) ∑_{i=1}^{c} ∑_{x ∈ X_i} (x − m^(i))(x − m^(i))^T    (3)

S_b = (1/n) ∑_{i=1}^{c} n_i (m^(i) − m)(m^(i) − m)^T    (4)

where n_i is the number of samples in the i-th class X_i, m^(i) is the mean vector of the i-th class, and m is the mean vector of all samples. It follows from the definitions that trace(S_w) measures the within-class compactness and trace(S_b) measures the between-class separation.

The optimal transformation matrix W obtained by traditional LDA is computed as follows [6]:

W_opt = arg max_W  tr(W^T S_b W) / tr(W^T S_w W)    (5)

To solve the above optimization problem, traditional LDA computes the generalized eigenvalue equation

S_b w_i = λ_i S_w w_i    (6)

and takes the d eigenvectors associated with the d largest eigenvalues λ_i, i = 1, ..., d.
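The LDA of Eqs. (3)-(6) can be sketched numerically as follows. This is a minimal numpy sketch; the small ridge added to S_w before the Cholesky factorization is an implementation safeguard against singularity, not part of the definition:

```python
import numpy as np

def lda_fit(X, y, d):
    """W maximizing tr(W^T S_b W)/tr(W^T S_w W) via S_b w = lambda S_w w.

    X is (n, D) with samples as rows; a tiny ridge keeps S_w invertible.
    """
    n, D = X.shape
    m = X.mean(axis=0)
    Sw = np.zeros((D, D))
    Sb = np.zeros((D, D))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - m, mc - m)
    Sw /= n
    Sb /= n
    # Whitening by S_w reduces the generalized problem to an ordinary symmetric one
    L = np.linalg.cholesky(Sw + 1e-8 * np.eye(D))
    Li = np.linalg.inv(L)
    evals, V = np.linalg.eigh(Li @ Sb @ Li.T)
    order = np.argsort(evals)[::-1][:d]      # d largest eigenvalues
    return Li.T @ V[:, order]

# Two linearly separated classes; the 1-D LDA projection should separate them
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
              [10, 0], [11, 0], [10, 1], [11, 1]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
W = lda_fit(X, y, 1)
p = X @ W
```

Substituting w = L^{-T} v turns S_b w = λ S_w w into (L^{-1} S_b L^{-T}) v = λ v, which is why a symmetric eigensolver suffices here.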
3 Laplacian Discriminant Projections

3.1 Discriminant Within-Exemplar Scatter

Let x_i^s denote a sample belonging to the s-th exemplar neighborhood of the i-th class. We formulate Eq. (1) as

y_i^s = W^T x_i^s    (7)

Assume that each sample belongs to one exemplar neighborhood within its class: each X_i ⊂ X can be divided into k_i exemplar neighborhoods β_i^s, s = 1, ..., k_i. Let μ_i^s denote the center vector of β_i^s in the projected space. We define the within-exemplar scatter of β_i^s as

α_i^s = ∑_{y_i^s ∈ β_i^s} ω_i^s ||y_i^s − μ_i^s||²    (8)

where ω_i^s is the weight defined as

ω_i^s = exp( −||x_i^s − b_i^s||² / t )    (9)

with b_i^s the exemplar of β_i^s in the input space. To obtain a compact expression of Eq. (8), let Ω_i^s = diag(ω_1, ..., ω_{n_s}) be the diagonal weight matrix and Y_i^s = [y_1, ..., y_{n_s}] collect the projected samples of β_i^s, where n_s is the size of β_i^s. In addition, let e_{n_s} denote the all-one column vector of length n_s, so that μ_i^s = (1/n_s) Y_i^s e_{n_s}. Equation (8) can then be reformulated as

α_i^s = tr( Y_i^s L_i^s (Y_i^s)^T )    (10)

where

L_i^s = Ω_i^s − (2/n_s) Ω_i^s e_{n_s} e_{n_s}^T + ( (e_{n_s}^T Ω_i^s e_{n_s}) / n_s² ) e_{n_s} e_{n_s}^T    (11)

The within-exemplar scatter of class i is

α_i = ∑_{s=1}^{k_i} α_i^s = ∑_{s=1}^{k_i} tr( Y_i^s L_i^s (Y_i^s)^T )    (12)

There exists a 0-1 indicator matrix P_i^s satisfying Y_i^s = Y_i P_i^s; each column of P_i^s records the exemplar membership derived from the clustering process. Hence

α_i = ∑_{s=1}^{k_i} tr( Y_i P_i^s L_i^s (P_i^s)^T Y_i^T ) = tr( Y_i L_i Y_i^T )    (13)

where L_i = ∑_{s=1}^{k_i} P_i^s L_i^s (P_i^s)^T. The total within-exemplar scatter of all classes is

A = ∑_{i=1}^{c} α_i = ∑_{i=1}^{c} tr( Y_i L_i Y_i^T )    (14)

There exists a 0-1 indicator matrix Q_i satisfying Y_i = Y Q_i; each column of Q_i records the class membership, which is known in supervised learning. Therefore

A = ∑_{i=1}^{c} tr( Y Q_i L_i Q_i^T Y^T ) = tr( Y L_Exem Y^T )    (15)

where L_Exem = ∑_{i=1}^{c} Q_i L_i Q_i^T can be called the within-exemplar Laplacian matrix. Plugging Y = W^T X into Eq. (15), we obtain the final form of the total within-exemplar scatter:

A = tr( W^T Σ_Exem W )    (16)
where Σ_Exem = X L_Exem X^T is the within-exemplar scatter matrix.

3.2 Discriminant Within-Class Scatter

A within-class scatter is defined analogously over the k_i exemplar centers of each class. Let ȳ_i^s denote the projected center of the s-th exemplar neighborhood of class i and ȳ_i the center of class i; then

β_i = ∑_{s=1}^{k_i} ω̄_i^s ||ȳ_i^s − ȳ_i||²    (17)

By the same derivation as in Section 3.1, this can be written as

β_i = tr( Ȳ_i L̄_i Ȳ_i^T )    (18)

where Ȳ_i collects the exemplar centers of class i and

L̄_i = Ω̄_i − (2/n_i) Ω̄_i e_{n_i} e_{n_i}^T + ( (e_{n_i}^T Ω̄_i e_{n_i}) / n_i² ) e_{n_i} e_{n_i}^T    (19)

with Ω̄_i the corresponding diagonal weight matrix. Summing over all classes gives the total within-class scatter

∑_{i=1}^{c} β_i = ∑_{i=1}^{c} tr( Ȳ_i L̄_i Ȳ_i^T )    (20)
3.3 Discriminant Between-Class Scatter

Let B denote the exemplar based between-class scatter of all classes, defined over the exemplar centers as

B = ∑_{j=1}^{c} ∑_{i=1}^{k_j} ω_j^i ||ȳ_j^i − ȳ||²    (21)

where ȳ_j^i is the projection of the i-th exemplar center of the j-th class X_j, ȳ is the overall center, and ω_j^i is the weight defined as

ω_j^i = exp( −||x̄_j^i − x̄||² / t )    (22)

Let Ȳ = [ȳ_1^1, ..., ȳ_1^{k_1}, ..., ȳ_c^1, ..., ȳ_c^{k_c}] consist of the exemplar center vectors of all classes. By a similar deduction, B can be formulated as

B = tr( Ȳ L_b Ȳ^T )    (23)

where

L_b = Ω̄ − (2/n_exem) Ω̄ e_exem e_exem^T + ( (e_exem^T Ω̄ e_exem) / n_exem² ) e_exem e_exem^T    (24)

is the exemplar based between-class Laplacian matrix, n_exem is the total number of exemplars, and e_exem is the all-one vector of that length. Taking Ȳ = W^T X̄ into account, we rewrite Eq. (23) as

B = tr( W^T Σ_b W )    (25)

where Σ_b = X̄ L_b X̄^T is the total exemplar based between-class scatter matrix.

The idea of exemplar based Laplacian discriminant projection is to make the samples within the same class cluster as compactly as possible and the samples between classes separate as far as possible in the Laplacian sense; meanwhile, the micro-structures of the data within each class are kept by the clustering. Different from [5,9,15], ELDP mainly focuses on sample subgroups within the same class. In addition, it should be noted that the distances between samples in the original sample space are measured with the Euclidean norm in this paper for simplicity; in fact, other norms can be used, depending on whether the metric of the original sample space is Euclidean or not.
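The rewriting of a weighted scatter into a Laplacian trace form, used in Eq. (11) and again in Eq. (24), can be checked numerically. The following sketch verifies that tr(Y L Y^T), with L = Ω − (2/n) Ω e e^T + ((e^T Ω e)/n²) e e^T, equals the weighted scatter ∑_s ω_s ||y_s − μ||² around the unweighted mean μ:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 8
Y = rng.normal(size=(d, n))           # columns are projected samples y_s
w = rng.uniform(0.1, 1.0, size=n)     # weights omega_s

# Direct weighted scatter around the (unweighted) mean, as in Eq. (8)
mu = Y.mean(axis=1, keepdims=True)
direct = sum(w[s] * np.sum((Y[:, [s]] - mu) ** 2) for s in range(n))

# Laplacian trace form of Eq. (11)
Omega = np.diag(w)
e = np.ones((n, 1))
L = Omega - (2.0 / n) * Omega @ e @ e.T + (e.T @ Omega @ e).item() / n**2 * e @ e.T
laplacian_form = np.trace(Y @ L @ Y.T)
```

Expanding ∑_s ω_s ||y_s − μ||² with μ = (1/n) Y e produces exactly the three terms of L, which is what the check confirms.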
4 Experiments

In this section, we investigate the use of ELDP on several data sets, including UCI¹, USPS² and the PIE-CMU face data set [13]. The data sets belong to different fields in order to test the performance of the ELDP algorithm. We compare our proposed algorithm with PCA [2], LDA [7], LPP [3] and Marginal Fisher Analysis (MFA) [1].

¹ Available at http://www.ics.uci.edu/~mlearn/MLRepository.html
² Available at http://www.kernel-machines.org/data
4.1 On UCI Data Set

In this experiment, we use the iris data taken from the UCI Machine Learning Repository. There are 150 samples of 3 classes (50 samples per class) in the iris data set. We randomly select 20 samples per class for training and use the remaining samples for testing. The average results are obtained over 50 random splits. All algorithms reduce the original samples to a 2-dimensional space, and classification is based on the k-nearest neighbor classifier. The experimental results are shown in Table 1. Several parameters of the ELDP algorithm must be set before the experiments; here k_fw = 15, k_nb = 20, and the time variable t = 10.

Table 1. Recognition accuracy of different algorithms

Algorithm  PCA     LDA     LPP     MFA     ELDP
Accuracy   95.112  95.391  95.391  95.383  95.891
Fig. 1. Embedding results in 2-D space of (a) PCA, (b) LDA, (c) LPP, (d) MFA and (e) ELDP
To demonstrate the performance of the ELDP algorithm, we randomly select one split from the 50 splits. The embedding results of ELDP in 2-D space, together with those of the other four algorithms, are shown in Fig. 1. As illustrated in Table 1, the ELDP algorithm outperforms the other methods with a recognition rate of 95.891%. As illustrated in Fig. 1, the within-class embedding result of ELDP is also more compact than those of the other four methods.
5 Conclusions

In this paper, based on affinity propagation and LDA, we propose a new method, exemplar based Laplacian Discriminant Projection (ELDP), for supervised dimensionality reduction. Using similarity-weighted discriminant criteria, we define the exemplar based within-class Laplacian matrix and between-class Laplacian matrix. In comparison with traditional LDA, ELDP focuses more on enhancing discriminability while keeping local structures, and therefore has the flexibility to find optimal discriminant subspaces. Experiments on several real data sets indicate that the discriminant criteria formulated in ELDP are more suitable for discriminant feature extraction, whether the sample space is Euclidean or not. The performance of ELDP may be further enhanced by trying other
improved LDA strategies. In addition, how to choose the best parameters of ELDP, for example the number of exemplars k, will be an interesting direction for future study.
Acknowledgement. The authors confirm that this research was supported by the National High-Tech Research and Development Plan (No. 2007AA01Z164) and the National Science Foundation of China (No. 60805001).
References
1. Yan, S., Xu, D., Zhang, B., Zhang, H., Yang, Q., Lin, S.: Graph Embedding and Extensions: A General Framework for Dimensionality Reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(1) (2007)
2. Turk, M., Pentland, A.: Eigenfaces for recognition. Journal of Cognitive Neuroscience 3(1), 71–86 (1991)
3. He, X., Yan, S., Hu, Y.X., Niyogi, P., Zhang, H.: Face recognition using Laplacianfaces. IEEE Trans. on Pattern Analysis and Machine Intelligence 27(3), 328–340 (2005)
4. Belkin, M., Niyogi, P.: Laplacian Eigenmaps for dimensionality reduction and data representation. Neural Computation 15, 1373–1396 (2003)
5. Zhao, D., Lin, Z., Xiao, R., Tang, X.: Linear Laplacian Discrimination for Feature Extraction. In: CVPR 2007 (2007)
6. Fukunaga, K.: Introduction to Statistical Pattern Recognition, 2nd edn. Academic Press, Boston (1990)
7. Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J.: Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence 19(7), 711–720 (1997)
8. Weinberger, K., Blitzer, J., Saul, L.: Distance metric learning for large margin nearest neighbor classification. In: NIPS 2006, pp. 1475–1482 (2006)
9. Nie, F., Xiang, S., Zhang, C.: Neighborhood MinMax Projections. In: IJCAI 2007, pp. 993–998 (2007)
10. Howland, P., Park, H.: Generalizing discriminant analysis using the generalized singular value decomposition. IEEE Trans. on Pattern Analysis and Machine Intelligence 26(8), 995–1006 (2004)
11. Liu, C.: Capitalize on dimensionality increasing techniques for improving face recognition grand challenge performance. IEEE Trans. on Pattern Analysis and Machine Intelligence 28(5), 725–737 (2007)
12. Martinez, A., Zhu, M.: Where are linear feature extraction methods applicable. IEEE Trans. on Pattern Analysis and Machine Intelligence 27(12), 1934–1944 (2006)
13. Sim, T., Baker, S., Bsat, M.: The CMU Pose, Illumination, and Expression (PIE) database. In: Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition (2002)
14. Wang, X., Tang, X.: Dual-space linear discriminant analysis for face recognition. In: CVPR 2004, pp. 564–569 (2004)
15. Yan, S., Xu, D., Zhang, B., Zhang, H.: Graph embedding: A general framework for dimensionality reduction. In: CVPR 2005 (2005)
16. Yang, J., Frangi, A., Yang, J., Zhang, D., Jin, Z.: KPCA plus LDA: a complete kernel Fisher discriminant framework for feature extraction and recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence 27(2), 230–244 (2005)
17. Zheng, Z.L., Yang, J., Zhu, Y.: Face detection and recognition using colour sequential images. Journal of Research and Practice in Information Technology 38(2), 135–149 (2006)
18. Zheng, Z.L., Yang, J.: Supervised Locality Pursuit Embedding for Pattern Classification. Image and Vision Computing 24, 819–826 (2006)
19. Wang, X., Tang, X.: A unified framework for subspace face recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence 26(9), 1222–1228 (2004)
20. Wang, X., Tang, X.: Random sampling for subspace face recognition. International Journal of Computer Vision 70(1), 91–104 (2006)
21. Frey, B.J., Dueck, D.: Clustering by Passing Messages Between Data Points. Science 315, 972–976 (2007)
A Novel Fast Non-negative Matrix Factorization Algorithm and Its Application in Text Clustering

Fang Li and Qunxiong Zhu*
School of Information Science and Technology, Beijing University of Chemical Technology (BUCT), Beijing, China, 100029
[email protected], [email protected]
Abstract. In non-negative matrix factorization, it is difficult to find the optimal non-negative factor matrix in each iterative update. However, with the help of a transformation matrix, the optimal non-negative factor matrix for the transformed cost function can be derived. A transformation matrix based non-negative matrix factorization method is proposed and analyzed. We show that this new method, with complexity comparable to prior schemes, is efficient in enhancing non-negative matrix factorization and achieves better performance in NMF based text clustering.

Keywords: Term-Document Matrix; Non-negative Matrix Factorization (NMF); Text Clustering.
1 Introduction

Non-negative Matrix Factorization (NMF) is a method of matrix factorization subject to a non-negativity constraint, which has been widely used in many fields such as text mining, spectral data analysis and image processing [1,2,3,4]. In the field of text mining, NMF plays an important role in dealing with high dimensional data. Applying NMF to the term-document matrix obtained from a document data set yields a term-feature matrix and a feature-document matrix: the term-feature matrix shows the features of the text data set, while the feature-document matrix implies the clustering of the documents. NMF was first proposed in 1999 by D.D. Lee and H.S. Seung [5]. Later, the well-known multiplicative updating method (Mult) was proposed and analyzed in 2001 [6]. However, the multiplicative updating method may get stuck at a non-stationary point and hence suffer from poor convergence [7,8]. In [7], Chih-Jen Lin proposed a projected gradient method for non-negative matrix factorization; however, a considerable amount of computation must be spent choosing an appropriate step size for the projected gradient. In [8], Ngoc-Diep Ho and his colleagues proposed the
rank-one residual iteration (RRI) method. In RRI, a residual matrix R_i (i = 1, ..., r) is obtained with respect to each pair of column vectors u_i and v_i of the factor matrices, and for each residual matrix R_i a pair of non-negative optimal updating vectors can be derived directly. The RRI algorithm not only converges to a stationary point but also converges much faster than previous NMF methods. RRI suggests that, if the optimal non-negative matrix V (or U) in each iterative update could be obtained directly, the iteration overhead of NMF could be greatly reduced. Unfortunately, it is difficult to derive the unique optimal non-negative matrix V (or U) in each iterative update. However, after an appropriate linear transformation, the optimal non-negative matrix for the transformed cost function can be obtained. In this paper, transformation matrix based NMF is proposed and analyzed. In Section 3, we discuss the whole process of the algorithm. Complexity analysis is provided in Section 4, and experimental simulation results in Section 5. The results show that the proposed transformation matrix based NMF method surpasses previous NMF methods in the speed of reducing the residual error, achieves a slightly lower residual error after enough iterations, and works better in text clustering than the known RRI and multiplicative NMF methods.
2 Problem Description

Given a non-negative matrix A ∈ R_+^{m×n}, find non-negative factor matrices U ∈ R_+^{m×r} and V ∈ R_+^{r×n} which satisfy

min_{U ∈ R_+^{m×r}, V ∈ R_+^{r×n}} ||A − UV||_F²    (1)
In non-negative matrix factorization, the alternating least squares (ALS) method is widely used: one least squares step is followed by another [6,7,8]. For example, given U, one first solves for the non-negative matrix V; then, given V, one solves for the non-negative matrix U. This alternating update continues until the stopping condition is reached.
3 Transformation Matrix Based NMF

3.1 Algorithm Proposition

Given U, solve for the non-negative matrix Ṽ, i.e.,

Ṽ = arg min_{V ∈ R_+^{r×n}} ||A − UV||_F²    (2)
Then, it is equivalent to solving for each non-negative column vector independently:

ṽ_i = arg min_{v_i ∈ R_+^r} ||a_i − U v_i||_F²,   i = 1, ..., n

where a_i is the i-th column vector of matrix A. However, for an equation of the form

x̃ = arg min_{x ∈ R_+^r} ||a − Ux||_F²    (3)

it is always difficult to derive the optimal non-negative solution directly. But applying the linear matrix transformation

(U^T U)^{-1} U^T (a − Ux) = (U^T U)^{-1} U^T a − x

we can obtain a non-negative vector x̄ which satisfies

x̄ = arg min_{x ∈ R_+^r} || (U^T U)^{-1} U^T a − x ||_F²    (4)

namely

x̄ = [ (U^T U)^{-1} U^T a ]_+    (5)

Obviously, x̄ is the non-negative part of the least-squares solution to the equation a = Ux.
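Equation (5), clipping the ordinary least-squares solution to its non-negative part, can be sketched as follows. np.linalg.lstsq stands in for the explicit (U^T U)^{-1} U^T a; when the consistent system has a non-negative solution and U has full column rank, the clipping changes nothing:

```python
import numpy as np

def nn_clipped_ls(U, a):
    """x_bar of Eq. (5): non-negative part of the least-squares solution of a = U x."""
    x_ls, *_ = np.linalg.lstsq(U, a, rcond=None)   # = (U^T U)^{-1} U^T a for full-rank U
    return np.maximum(x_ls, 0.0)                   # [ . ]_+ keeps the non-negative part

rng = np.random.default_rng(1)
U = rng.uniform(0.0, 1.0, size=(100, 5))   # tall matrix, full column rank
x_true = rng.uniform(0.0, 1.0, size=5)     # non-negative ground truth
a = U @ x_true
x_bar = nn_clipped_ls(U, a)
```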
Lemma 1: For any column vector y and any matrix G, ||Gy||_F / ||G||_F ≤ ||y||_F.

Lemma 2: Let y(x) = a − Ux, and let G be a matrix with ||G||_F ≠ 0. Let

x_1 = arg min_x || (1/||G||_F) G y(x) ||_F   and   x_0 = arg min_x || y(x) ||_F.

Given any ε > 0, if || (1/||G||_F) G y(x_1) ||_F ≤ ε, then ||y(x_1)||_F − ||y(x_0)||_F ≤ ε.

Remark 1: According to Lemma 2, if || (1/||G||_F) G y(x) ||_F is small enough, then ||y(x_1)||_F will approximate ||y(x_0)||_F. In other words, if ε is small enough, the solution vector of the transformed cost function also achieves (approximately) the optimal value of the original cost function.
Remark 2: From the lemmas above, ||y(x_{k+1})||_F ≤ ||y(x_k)||_F is possibly not guaranteed even with ||G y(x_{k+1})||_F ≤ ||G y(x_k)||_F. So, with the iteratively updated vectors {x_k, k = 1, ...} obtained via gradient descent on the transformed cost function, a monotonic decrease of the original cost function ||y(x)||_F cannot be guaranteed. Therefore, the iterative updates should be terminated if ||y(x)||_F stops decreasing monotonically. It also means that the residual error after transformation matrix based NMF is worth exploiting further, as discussed in Section 3.3.

3.2 Transformation Matrix Based NMF Using Alternating Least Squares (TM-ALS)
According to the above discussion, using the alternating least squares method, the transformation matrix based NMF algorithm can be summarized as below.

Algorithm: Transformation Matrix based NMF using Alternating Least Squares (TM-ALS)
(1) Initialize U and V.
(2) Repeat the following updates until the stopping criterion is satisfied:
    V = [ (U^T U)^{-1} U^T A ]_+    (a.1)
    U = [ A V^T (V V^T)^{-1} ]_+    (a.2)
where the stopping criterion may be a fixed iteration number.

3.3 Enhancement to TM-ALS

Although TM-ALS effectively improves the convergence speed, the residual error is worth exploiting further. Here, TM-ALS is combined with RRI: TM-ALS performs the initial matrix factorization, and RRI performs the subsequent iterative updates. The combined algorithm is summarized as below:
Algorithm: TM-ALS combined with RRI (TM-RRI)
(1) Initialize U and V.
(2) Repeat the following updates until criterion a is satisfied:
    V = [ (U^T U)^{-1} U^T A ]_+    (b.1)
    U = [ A V^T (V V^T)^{-1} ]_+    (b.2)
where criterion a can be a fixed iteration number.
(3) Repeat the following updates until criterion b is satisfied. With respect to each pair of column vectors u_i, v_i (i = 1, ..., r), calculate the related residual matrix R_i:
    R_i = A − ∑_{j ≠ i} u_j v_j^T    (b.3)
Then update the column vectors according to the following:
    v_i = [ R_i^T u_i ]_+ / ||u_i||_2²    (b.4)
    u_i = [ R_i v_i ]_+ / ||v_i||_2²    (b.5)
where criterion b can also be a fixed iteration number. Further discussion of stopping criteria can be found in [7,8].
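A minimal numpy sketch of the combined TM-RRI procedure with fixed-iteration stopping criteria. The pinv calls realize (U^T U)^{-1} U^T and V^T (V V^T)^{-1}; the small eps guarding the divisions is an implementation detail added here, not part of the paper's description:

```python
import numpy as np

def tm_rri(A, r, als_iters=20, rri_iters=20, eps=1e-12, seed=0):
    """TM-ALS phase (b.1)-(b.2) followed by the RRI phase (b.3)-(b.5)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    U = rng.uniform(size=(m, r))
    V = rng.uniform(size=(r, n))
    for _ in range(als_iters):                        # criterion a: fixed iterations
        V = np.maximum(np.linalg.pinv(U) @ A, 0.0)    # (b.1) [(U^T U)^{-1} U^T A]_+
        U = np.maximum(A @ np.linalg.pinv(V), 0.0)    # (b.2) [A V^T (V V^T)^{-1}]_+
    for _ in range(rri_iters):                        # criterion b: fixed iterations
        for i in range(r):
            R = A - U @ V + np.outer(U[:, i], V[i])   # (b.3) residual without pair i
            V[i] = np.maximum(R.T @ U[:, i], 0.0) / (U[:, i] @ U[:, i] + eps)  # (b.4)
            U[:, i] = np.maximum(R @ V[i], 0.0) / (V[i] @ V[i] + eps)          # (b.5)
    return U, V

rng = np.random.default_rng(2)
A = rng.uniform(0.1, 1.0, size=(20, 10))
U, V = tm_rri(A, r=3)
```

Note that, in line with Remark 2, the ALS phase is run for a fixed budget rather than to convergence, and the RRI phase then refines the factorization.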
4 Complexity Analysis

With non-negative matrices A ∈ R^{m×n}, U ∈ R^{m×r} and V ∈ R^{r×n}, the computational complexity of expression (a.1) is about O(mnr + 2mr² + r³). Here, the complexity of calculating U^T U is about O(mr²), the complexity of the matrix inversion is about O(r³), and the complexities of the two matrix multiplications are about O(mr²) and O(mnr), respectively. Furthermore, since m is much greater than r in most practical applications such as text clustering, the main computational complexity of expression (a.1) is about O(mnr). Similarly, the main computational complexity of expression (b.1) is also about O(mnr). The computational complexity of each iterative update in the RRI and multiplicative update algorithms is likewise about O(mnr) [8]. So transformation matrix based NMF has a per-update computational complexity comparable to the current NMF methods.
5 Experimental Simulation

In this section, the different NMF methods mentioned above are evaluated. The normalized residual error after the i-th iterative update is compared. Furthermore, we introduce the documents' silhouette value to measure the effectiveness of the different text clustering algorithms. The silhouette value for point i is defined as:

S(i) = ( min_k b(i, k) − a(i) ) / max( a(i), min_k b(i, k) )

where a(i) is the average distance from the i-th point to all other points within its own cluster, and b(i, k) is the average distance from the i-th point to all points within cluster k. The silhouette value of each point thus measures how similar that point is to the points in its own cluster compared to points in other clusters: the bigger the silhouette values, the better the clustering result.

5.1 Random Non-negative Matrix Simulation

In this part, a computer-generated random non-negative matrix A (m = 100, n = 20) is chosen for factorization. In Fig. 1, the average normalized residual error of the NMF methods TM-ALS, Mult and RRI is plotted against the iteration number. It shows that:
TM-ALS and RRI both outperform Multi algorithm greatly in the speed of reducing residual error. TM-ALS is slightly better than RRI in reducing residual error in the initial updates. With the growth of iterative updating times, residual error of RRI is lower than that of TM-ALS.
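The silhouette measure defined at the start of this section can be sketched as follows (a minimal implementation assuming Euclidean distances and at least two points per cluster):

```python
import numpy as np

def silhouette_values(X, labels):
    """S(i) = (min_k b(i,k) - a(i)) / max(a(i), min_k b(i,k)): a(i) is the
    mean distance from point i to the other points of its own cluster,
    b(i,k) the mean distance from i to the points of another cluster k."""
    X, labels = np.asarray(X, float), np.asarray(labels)
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    S = np.empty(n)
    for i in range(n):
        own = (labels == labels[i]) & (np.arange(n) != i)
        a = D[i, own].mean()                                   # needs cluster size >= 2
        b = min(D[i, labels == k].mean()
                for k in np.unique(labels) if k != labels[i])
        S[i] = (b - a) / max(a, b)
    return S
```

Values close to 1 indicate points much closer to their own cluster than to any other, which is the behaviour compared across methods in Fig. 2.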
5.2 TanCorp Document Corpus

The TanCorp document corpus [9,10] is one of the available test sets for Chinese document clustering, in which documents have been manually clustered according to their topics. In this part, 60 documents from three categories (sports, science and technology, and geography) are selected for the clustering experiment. After necessary preprocessing such as word segmentation and stop-word removal, the term-document matrix is constructed. Then, NMF of this term-document matrix is performed using the TM-RRI, RRI and Mult NMF methods separately. Next, text clustering is carried out based on the right factor matrix of the NMF. In Fig. 2, silhouette values for the clustered documents are plotted. It shows that the silhouette values obtained with TM-RRI are bigger than those obtained with the RRI and Mult methods.
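Clustering from the right factor matrix can be sketched as follows: each document (a column of the $r \times n$ factor $V$) is assigned to the cluster whose coefficient is largest. The numbers below are illustrative, not taken from the experiment:

```python
import numpy as np

def cluster_by_right_factor(V):
    """Assign document j (column j of the r x n right factor V) to the
    cluster with the largest coefficient, i.e. argmax_i V[i, j]."""
    return np.asarray(V).argmax(axis=0)

# hypothetical 3 x 4 right factor for four documents
V = [[0.9, 0.1, 0.0, 0.2],
     [0.1, 0.8, 0.1, 0.1],
     [0.0, 0.1, 0.9, 0.7]]
labels = cluster_by_right_factor(V)  # -> array([0, 1, 2, 2])
```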
A Novel Fast NMF Algorithm and Its Application in Text Clustering
Fig. 1. Average normalized residual error versus iteration number (r = 3, 5)
Fig. 2. Silhouette value chart after text clustering using the Mult (a), RRI (b) and TM-RRI (c) methods
6 Conclusion

In this paper, we have proposed and analyzed transformation-matrix-based NMF methods. The proposed NMF methods provide better performance in reducing the residual error and achieve faster convergence than the known RRI and multiplicative NMF methods, with comparable computational complexity. It also
shows that the proposed TM-RRI method outperforms the RRI and multiplicative NMF methods in text clustering.

Acknowledgement. This work is supported by the Natural Science Foundation of China (No. 60774079). The authors are grateful to the anonymous reviewers for their constructive comments.
References

1. Non-negative matrix factorization, http://en.wikipedia.org/wiki/
2. Liu, W.X., Zheng, N.N., You, Q.B.: Non-negative matrix factorization and its applications in pattern recognition. Chinese Science Bulletin 51, 17–18 (2006)
3. Pascual-Montano, A., Carazo, J.M., Kochi, K., Lehmann, D., Pascual-Marqui, R.D.: Nonsmooth Non-Negative Matrix Factorization (nsNMF). IEEE Transactions on Pattern Analysis and Machine Intelligence 28, 403–415 (2006)
4. Pascual-Montano, A., Carmona-Saez, P., Chagoyen, M., Tirado, F., Carazo, J.M., Pascual-Marqui, R.D.: bioNMF: a versatile tool for non-negative matrix factorization in biology. BMC Bioinformatics (2006)
5. Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–791 (1999)
6. Lee, D.D., Seung, H.S.: Algorithms for Non-negative Matrix Factorization. In: Advances in Neural Information Processing Systems, vol. 13, pp. 556–562. MIT Press, Cambridge (2001)
7. Lin, C.-J.: Projected Gradient Methods for Non-negative Matrix Factorization. Neural Computation 19, 2756–2779 (2007)
8. Ho, N.-D., Van Dooren, P., Blondel, V.D.: Descent Methods for Non-negative Matrix Factorization. Numerical Linear Algebra in Signals (2007)
9. Tan, S.B., Wang, Y.F.: Chinese text categorization corpus TanCorp V1.0, http://www.searchforum.org.cn/tansongbo/corpus.htm
10. Tan, S.B., Cheng, X.Q., Moustafa, M.G., Wang, B., Xu, H.B.: A Novel Refinement Approach for Text Categorization. In: 14th ACM Conf. on Information and Knowledge Management, pp. 469–476. ACM Press, Bremen (2005)
Coordination of Urban Intersection Agents Based on Multi-interaction History Learning Method

Xinhai Xia 1,2 and Lunhui Xu 1

1 School of Civil Engineering and Transportation, South China University of Technology, Guangzhou 510640, China
2 Department of Port and Shipping Management, Guangzhou Marine College, Guangzhou 510725, China
[email protected]
Abstract. The high growth rate of vehicles per capita now poses a real challenge to efficient Urban Traffic Control (UTC). An efficient solution to UTC must be adaptive in order to deal with the highly dynamic nature of urban traffic. In this paper we adopt a multi-interactive history learning approach for the coordination of urban intersection traffic signal agents. The design employs an agent controller for each signalized intersection that coordinates with neighbouring agents. A multi-interaction model for urban intersection traffic signal agents was built based on a two-person game, which has been applied to let agents learn how to cooperate. A multi-interactive history learning algorithm (HL) was constructed. This algorithm takes into account all the historical interaction information that comes from neighbouring agents. In the proposed algorithm, the learning rule assigns greater significance to recent than to past payoff information. To achieve this, a memory factor δ is used in order to avoid the complete neglect of the payoff obtained by one action in the past. The memory factor δ reflects the influence of newer interaction information on the agent's decision. How it affects the algorithm's performance was analysed through an experiment with traffic control of a few connected intersections. Analyzing the results, one sees that the memory factor has an effect on the time needed to reach a given pattern of coordination.

Keywords: Agent; traffic signal control; learning; coordination.
and the communication and fusion of traffic data, must all be taken into consideration. In order to provide a scheme that deals with all these challenges, an adaptive UTC solution must be deployed. The optimization of traffic light control systems is at the heart of work in traffic management. Many of the solutions considered for designing efficient traffic signal patterns rely on controllers that use pre-timed stages. Such systems are unable to identify dynamic changes in the local traffic flow and thus cannot adapt to new traffic conditions. An alternative, novel approach proposed by computer scientists for designing adaptive traffic light controllers relies on the use of intelligent agents. The idea is to let autonomous entities, named agents, learn an optimal behavior by interacting directly in the system. By using machine learning algorithms based on the attribution of rewards according to the results of the actions selected by the agents, we can obtain a control policy that tries to optimize the urban traffic flow. In recent years, there have been two shortcomings in the study of self-coordination mechanisms in the field of Multi-Agent Systems (MAS): one is the complete-knowledge assumption that the utility matrix of each agent is public; the other is the single-interaction assumption that each agent interacts only once, irrespective of the effect of previous interaction history. Multi-interaction may therefore be the key to overcoming the complete-knowledge assumption. Since most control tasks are often too complex to be modeled and solved exactly by humans, we are interested in letting agents acquire their own adaptive behavior by learning. Using this approach, agents receive reward signals in response to taking actions in their environment. Learning algorithms then associate the current state of the agent with the action it should take to maximize its expected reward.
MAS, numerous decentralized learning methods are being customized for UTC. This is a new way of achieving highly adaptive and responsive near-optimal solutions for the UTC problem. A few researchers have studied the use of intelligent agents and learning for traffic light control. Reinforcement learning is considered to be one of the approaches that provide adaptive optimization solutions to control problems. Classical reinforcement learning is a centralized approach, while a number of collaborative and decentralized reinforcement learning methods have been proposed for solving decentralized optimization problems. [1] used model-based reinforcement learning to learn traffic light control strategies. The approach was based on a highly simplified model of traffic flow, and the non-stationary aspects of the problem were its focus. In order to detect changes in the traffic flow, [2] designed a context detection algorithm that learns and triggers different control policies according to the traffic flow detected. The algorithm was able to detect those changing phases and adapt to currently observed conditions. Abdulhai et al. [3,4] have shown that the use of reinforcement learning, particularly Q-Learning, for providing adaptive traffic control solutions is a promising approach to pursue. They argue that the use of Q-Learning is encouraging since it is an off-policy unsupervised learning approach that does not need a predefined model of the environment. [5] proposed a distributed Q-Learning scheme in which optimization is aimed at controlling vehicle speed. More complex learning techniques were used by Richter et al. [6,7]. They exploited the Natural Actor-Critic algorithm, which is based on four different learning methods, i.e., policy gradient, natural gradient, temporal difference and least-squares temporal difference. In their simplified simulation they had 5 scenarios, and every junction on
the grid had 4 phases. In order to provide (intelligent) cooperation schemes among learning-based traffic control agents, learning has been coupled with different genetic algorithms in several cases [8,9].
2 Intersection Traffic Signal Agents Model

The traffic flows at the intersections of an urban area influence one another. An urban-area traffic coordinated control system based on multi-agent technology mostly relies on the interaction of traffic signal control agents to optimize traffic signal control. A traffic signal control agent (TSCA) is an intelligent planning control system that interacts dynamically, autonomously and in real time with a constantly changing external environment. It can perceive and act on the environment, and aims to achieve a certain goal through the actions it takes. TSCAs are classified into intersection agents and area agents. An area agent takes charge of several intersection agents. Local stochastic events happen with probability $\sigma_i$ at each intersection (in an independent way), while global changes of the area agent occur with probability $\gamma$.
For any intersection agent, we can calculate the average waiting time $t_d$ from

$$t_d = \frac{t_{dg} + t_{dr}}{T_n} \qquad (1)$$

$$t_{dr} = \sum_{i=1,\, i \neq p}^{M} \left[ Q_{(n-1)i}\,(T_n - t_{ig}) + \sum_{k=0}^{T_n} q_{ki}\,(T_n - k) - \mathit{off}_c\, t_{ig} \right] \qquad (2)$$

$$t_{dg} = Q_{(n-1)p}\,(T_n - t_{pg}) + \sum_{k=0}^{T_n} q_{kp}\,(T_n - k) - \mathit{off}_c\, t_{pg} \qquad (3)$$

where $t_{dr}$ is the vehicle waiting time in the red-light direction of the current phase; $t_{dg}$ is the vehicle waiting time in the green-light direction from the moment the green signal of the last cycle ended to the present time; $T_n$ is the length of the signal cycle; $M$ is the number of phases; $q_{kp}$ is the number of vehicles arriving at the $k$-th time step of the $p$-th phase; $Q_{(n-1)p}$ is the number of waiting vehicles in this direction at the end of the $p$-th green phase of the $(n-1)$-th cycle; $\mathit{off}_c$ is the leaving rate of vehicles at the intersection; and $t_{pg}$ is the length of green time in the $p$-th phase.
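Equations (1)-(3) translate directly into code; the following is a minimal sketch in which the argument names are illustrative (the paper does not fix an API):

```python
def average_waiting_time(Tn, M, p, Q_prev, q, off_c, t_g):
    """Sketch of Eqs. (1)-(3).  Q_prev[i] = Q_{(n-1)i} (queue left over from
    the previous cycle), q[k][i] = q_{ki} (arrivals at time step k of phase i),
    t_g[i] = green time of phase i, off_c = leaving rate, p = current phase."""
    def phase_wait(i):
        return (Q_prev[i] * (Tn - t_g[i])
                + sum(q[k][i] * (Tn - k) for k in range(Tn + 1))
                - off_c * t_g[i])
    t_dg = phase_wait(p)                                   # Eq. (3)
    t_dr = sum(phase_wait(i) for i in range(M) if i != p)  # Eq. (2)
    return (t_dg + t_dr) / Tn                              # Eq. (1)
```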
An intersection agent is mainly composed of a learning module, an action decision-making module, a communication module, and a coordination module. The learning module infers whether reasonable rules can be drawn from real-time observational data; if such rules exist, it executes them and determines the signal control plan. The coordination module analyzes the present traffic state of the intersection agent, decides whether it is necessary to send messages to the adjacent intersection agents, and handles the coordination among intersection agents. The communication module is mostly responsible for communication with the management agent and the adjacent intersection agents. The action decision-making module performs the reasoning and decision-making functions of the intersection agent. The working process of an intersection agent is as follows. First, the vehicle detection device sends the detected traffic state information to the learning module of the intersection agent, and at the same time the adjacent intersection agents provide their own traffic state information. Second, the learning module offers decision-making references to the decision-making module according to the received or learned information and experience knowledge. Then, the decision-making module selects and executes a rational action. When the control action acts on the intersection, it alters the intersection's traffic state; after a certain time interval, the vehicle detection device sends the intersection's traffic state to the intersection agent, and a feedback is sent to the learning module. Finally, the learning module modifies the feedback value and makes a decision according to the traffic state again. The process above is repeated. The feedback value can be the ratio of the traffic quantity passing during the green-light phase to the traffic quantity passing during the red-light phase in the decision-making interval.
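The feedback quantity mentioned at the end of this process can be sketched as a trivial helper (the zero-denominator guard is our addition, not part of the paper):

```python
def feedback_value(green_passed, red_passed):
    """Ratio of traffic passing during the green-light phase to traffic
    passing during the red-light phase in the decision-making interval,
    used as the learning module's feedback signal."""
    return green_passed / red_passed if red_passed > 0 else float("inf")

# e.g. 42 vehicles passed on green, 14 on red during the interval
ratio = feedback_value(42, 14)  # -> 3.0
```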
3 Multi-interaction Model for Intersection Agents

In this section, agents are intersections located in a network. The central concern is how to select an equilibrium when more than one exists. Each agent $i$ plays a two-person game against each member of its neighborhood in the network. The multi-interaction model for intersection agents is represented by the 3n-tuple $\langle S_i, p_i^k, u_i^k \rangle$, $i = 1, \dots, n$, where:

- $k = 1, 2, \dots$ is the index of the interaction;
- $Agt = \{agt_1, agt_2, \dots, agt_n\}$ is the set of agents;
- $S_i = \{s_{i1}, s_{i2}, \dots, s_{im(i)}\}$ is the set of pure strategies of agent $i$;
- $p_i^k = (p_{i1}^k, \dots, p_{ij}^k, \dots, p_{im(i)}^k)$ (the mixed strategy of agent $i$) is a probability distribution on $S_i$ in the $k$-th interaction;
- $p_{ij}^k$ is the probability assigned to the $j$-th pure strategy of agent $i$ in the $k$-th interaction, with $p_{ij}^k \geq 0$ and $\sum_{j=1}^{m(i)} p_{ij}^k = 1$ for each $s_{ij} \in S_i$;
- $p^k = (p_1^k, \dots, p_{i-1}^k, p_i^k, p_{i+1}^k, \dots, p_n^k)$ is the mixed-strategy situation in the $k$-th interaction;
- $u_i^k(p^k) = \sum_{j_1=1}^{m(1)} \cdots \sum_{j_n=1}^{m(n)} u_i^k\big(s = (s_{1j_1}, s_{2j_2}, \dots, s_{nj_n})\big)\, p_{1j_1}^k p_{2j_2}^k \cdots p_{nj_n}^k$ is the payoff function of agent $i$ in the $k$-th interaction;
- $p^{*k}$ is a mixed equilibrium situation in the $k$-th interaction; it is computed as follows:

$$u_i^k(p^{*k} \| q_i) \leq u_i^k(p^{*k}) \qquad (4)$$

with $p^k \| q_i = (p_1, \dots, p_{i-1}, q_i, p_{i+1}, \dots, p_n)$, for each $agt_i \in Agt$ and every $q_i$ (a probability distribution on $S_i$, $q_i \neq p_i^k$) in the $k$-th interaction.

Each agent $i$ updates its mixed strategy based solely on the payoff received by selecting an action. $p_i^k$, the mixed strategy of agent $i$, is time-dependent. After selecting an action $s_{i,t} \in S_i$ at time $t$, each agent $i$ receives an individual payoff calculated as the sum of the payoffs obtained by playing against each agent $j$.
4 Multi-interactive History Learning Algorithm

Learning is an important component of the approach. Therefore the history of the game, especially the recent one, plays a significant role in the selection of future strategies. This history is partially discarded only when a player detects a change in its environment, in which case it has to react to it in a new way. The dynamics of the model follow the multi-interactive history learning algorithm below. In the $(k-1)$-th interaction, $agt_j$ (agent $j \in Agt$) selects the pure strategy $s_{jh_j} \in S_j$. In the $k$-th interaction, $agt_i$ (agent $i$) takes $p_{jl}^k$ as its estimate of the mixed strategy of agent $j$, where:

(1) $k = 0$: $p_{jl}^0$ = prior knowledge (if there is no prior knowledge, then $p_{jl}^0 = 1/|S_j|$);

(2) $k = 1$: $p_{jl}^1 = 0$ ($l \neq h_j$); $p_{jh_j}^1 = 1$;

(3) $k > 1$: $p_{jl}^k = \delta\, p_{jl}^{k-1}\,(k-1)/k$ ($l \neq h_j$);
$$p_{jh_j}^k = \big(\delta\, p_{jh_j}^{k-1}\,(k-1) + k - \delta\,(k-1)\big)/k$$

where $\delta$ is the memory factor, with $0 \leq \delta \leq 1$.
Then $agt_i$ selects a pure strategy $s_i \in S_i$ according to the predicted $p_{-i}^k$ so as to maximize $u_i^k(p_{-i}^k, s_i)$. If two or more strategies attain the same maximum, $agt_i$ proceeds as follows: if the previously selected pure strategy $s_{ih_i}^{k-1}$ is among them, $agt_i$ selects $s_{ih_i}^{k-1}$; otherwise $agt_i$ selects one of them at random. In the algorithm proposed, the learning rule assigns greater significance to recent than to past payoff information. To achieve this, the memory factor $\delta$ ($0 \leq \delta \leq 1$) is used in order to avoid the complete neglect of the payoff obtained by one action in the past.
5 Simulation

Let $n$ be the number of agents and $Agt$ the set of agents. The network $K$ is an arterial composed of $n = 10$ intersections, each designed as an agent. The range of interaction among neighbors is $r = 1$. The main lanes are those in directions $W$ and $E$. In order to reach full synchronization of the signals, all agents have to select the same action from the set $S_i = \{sp_W, sp_E\}$ of signal plans, for each $agt_i \in Agt$. If two neighboring intersections each have to choose between running a signal plan that gives preference to the flow of vehicles traveling westwards ($sp_W$) or eastwards ($sp_E$), then the payoff that both intersections get is presented in Table 1. Besides, $a = 2$, $b = 1$, $c = 0$ in case the global goal is to coordinate towards the west, or $a = 1$, $b = 2$, $c = 0$ in the opposite case.

The performance of the algorithm is measured by the time needed to coordinate. Interactions occurring between all pairs of neighbors can be of three types: WW (both selecting $sp_W$), EE (both selecting $sp_E$), and miscoordinated WE+EW (i.e., they select either $sp_W, sp_E$ or $sp_E, sp_W$). One expects that all agents place increasingly higher probability on the more profitable action (say $sp_W$), and that in the long run they select this action with probability close to one. Consequently, the Pareto-efficient equilibrium $(sp_W, sp_W)$ is expected to be selected. In this section the effect of the selection of a strategy is verified using the measure of performance discussed above, and also according to the utility values in Table 1. Let $t_c$ be the time needed to reach $p_W = 0.9$, $f_l$ the frequency of learning, $f_i$ the frequency of a local change in traffic conditions, and $\delta$ the memory factor. The more profitable strategy is assumed to be $sp_W$ unless otherwise stated. The simulation time is 50 periods. In order to analyze the influence of the memory factor, experiments were done with $\delta = 0.80$, $0.95$, and $1.00$. In the situation with $f_l = 10$ and $f_i = 20$, $t_c$ is not reached, 22 periods and 28 periods respectively; in the situation with $f_l = 10$ and $f_i = 100$, $t_c$ is 21 periods, 18 periods and 18 periods respectively; in the situation with $f_l = 5$ and $f_i = 20$, $t_c$ is 10 periods, 10 periods and 10 periods respectively. Analyzing these results, one sees that the memory factor has an effect on the time needed to reach a given pattern of coordination. In general, as expected, the lower the memory factor, the higher the time needed for agents to select $sp_W$ with probability $p_W = 0.9$. This happens because the lower $\delta$, the more the weight of past payoffs, and hence the higher the inertia. In the case in which the population is expected to perform poorly (for $\delta = 1.0$, $f_l = 10$, and $f_i = 20$), by setting $\delta = 0.8$ the population even fails to reach $p_W = 0.9$ within the simulation time. For an intermediate situation with $f_l = 10$ and $f_i = 100$, $t_c$ differs only slightly under the three memory factors. And finally, in the situation expected to be the best of those compared, namely $f_l = 5$ and $f_i = 200$, results were the same. Here there is room for employing an even lower memory factor if necessary.

Table 1. Pure-coordination game: payoff matrix

|                 | Agent 2: $sp_W$ | Agent 2: $sp_E$ |
|-----------------|-----------------|-----------------|
| Agent 1: $sp_W$ | $a$ / $a$       | $c$ / $c$       |
| Agent 1: $sp_E$ | $c$ / $c$       | $b$ / $b$       |
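Given the Table 1 payoffs, an agent's best response against its neighbours' predicted strategies can be sketched as follows (a minimal illustration; the function and argument names are ours):

```python
def best_response(p_W_neighbours, a=2.0, b=1.0, c=0.0):
    """Compare the expected payoffs of sp_W and sp_E against neighbours
    believed to play sp_W with the given probabilities, using the
    pure-coordination payoffs of Table 1 (a, b reward coordination on
    sp_W, sp_E respectively; c is the miscoordination payoff)."""
    u_W = sum(p * a + (1 - p) * c for p in p_W_neighbours)
    u_E = sum(p * c + (1 - p) * b for p in p_W_neighbours)
    return "spW" if u_W >= u_E else "spE"
```

With $a = 2$, $b = 1$, $c = 0$, neighbours strongly believed to play $sp_W$ make $sp_W$ the best response, which is the westward coordination the simulation converges to.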
6 Conclusions

The design employs an agent controller for each signalized intersection that coordinates with neighbouring agents. A multi-interaction model for urban intersection traffic signal agents was built based on a two-person game, which has been applied to let agents learn how to cooperate. A multi-interactive history learning algorithm based on the two-person game for multi-agent game theory was constructed. This algorithm takes into account all the historical interaction information that comes from neighbouring agents. In the algorithm proposed, the learning rule assigns greater significance to recent than to past payoff information. To achieve this, a memory factor $\delta$ is used in order to avoid the complete neglect of the payoff obtained by one action in the past. From the experiment with traffic control of a few connected intersections, we can infer that the memory factor has an effect on the time needed to reach a given pattern of coordination. In general, the lower the memory factor, the higher the time needed for agents to select $sp_W$ with probability $p_W = 0.9$. This happens because the lower $\delta$, the more the weight of past payoffs, and hence the higher the inertia. Multi-interaction history learning also helps to accelerate convergence to the equilibrium point and is capable of adapting to changes in dynamic environments.
References

1. Wiering, M.: Multi-Agent Reinforcement Learning for Traffic Light Control. In: Seventeenth International Conference on Machine Learning and Applications, pp. 1151–1158 (2000)
2. Bazzan, A.L.C., da Silva, B.C., de Oliveria, D., Basso, E.W.: Adaptive traffic control with reinforcement learning. In: Conference on Autonomous Agents and Multiagent Systems (AAMAS), pp. 80–86 (2006)
3. Abdulhai, B., Pringle, P.: Autonomous multiagent reinforcement learning - 5gc urban traffic control. In: Annual Transportation Research Board Meeting (2003)
4. Abdulhai, B., Pringle, P., Karakoulas, G.: Reinforcement learning for true adaptive traffic signal control. ASCE Journal of Transportation Engineering 129(3), 278–284 (2003)
5. Pendrith, M.D.: Distributed reinforcement learning for a traffic engineering application. In: AGENTS 2000, pp. 404–411. ACM Press, New York (2000)
6. Richter, S., Aberdeen, D., Yu, J.: Natural actor-critic for road traffic optimization. In: Advances in Neural Information Processing Systems, vol. 19. The MIT Press, Cambridge (2007)
7. Peters, J., Vijayakumar, S., Schaal, S.: Natural actor-critic. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 280–291. Springer, Heidelberg (2005)
8. Cao, Y.J., Ireson, N., Bull, L., Miles, R.: Design of a traffic junction controller using classifier systems and fuzzy logic. In: Proceedings of the 6th International Conference on Computational Intelligence, Theory and Applications, pp. 342–353. Springer, London (1999)
9. Mirchandani, P., Head, L.: A real-time traffic signal control system: architecture, algorithms and analysis. Transportation Research Part C: Emerging Technologies 9, 415–432 (2001)
Global Exponential Stability Analysis of a General Class of Hopfield Neural Networks with Distributed Delays

Chaojin Fu 1, Wei Liu 1, and Meng Yang 2

1 College of Mathematics and Statistics, Hubei Normal University
2 School of Mathematics and Physics, Huangshi Institute of Technology, Huangshi 435000, China
[email protected], [email protected], [email protected]
Abstract. In this paper, based on the contraction mapping principle and a differential inequality technique, we investigate the global exponential stability of a general class of Hopfield neural networks with distributed delays. Some sufficient conditions are derived which ensure the existence, uniqueness, and global exponential stability of the equilibrium point of the neural networks. Finally, an example is given to illustrate the advantages of our approach.

Keywords: Hopfield neural networks, Global exponential stability, Distributed delays.
1 Introduction

Recently, the dynamical behavior of neural networks has attracted increasing attention due to its applicability in signal processing, bidirectional associative memory, and parallel computation [1, 2]. One of the most studied problems in the dynamical behavior of neural networks is the existence, uniqueness, and global stability of the equilibrium points; stability analysis of the equilibrium points is mainly focused on global asymptotic stability [3], global exponential stability [4], and robust stability [5]. Over the past two decades, stability analyses of neural networks with delays have been commonly pursued, since time delays may cause undesirable dynamic behavior such as oscillation and instability. Therefore, it is essential to study the stability of neural networks with delays. Stability analyses of neural networks with delays have been proposed in [3-7]; the method most of them use is the Lyapunov direct method. However, for a complex system, constructing a suitable Lyapunov function is difficult. In this paper, we mainly rely on the properties of the upper-right Dini derivative and argue by contradiction to obtain the corresponding stability results.

The remainder of this paper is organized as follows. In Section 2, the model description and preliminaries are given. In Section 3, our main results are presented. In Section 4, a numerical example is supplied to illustrate the effectiveness of the obtained results. Finally, in Section 5, our conclusion is given.

Y. Tan, Y. Shi, and K.C. Tan (Eds.): ICSI 2010, Part II, LNCS 6146, pp. 391–397, 2010. © Springer-Verlag Berlin Heidelberg 2010
2 Model Description and Preliminaries
Consider a general class of Hopfield neural networks as follows:

$$\dot{x}_i(t) = -d_i x_i(t) + \sum_{j=1}^{n} a_{ij} F_j(x_j(t)) + \sum_{j=1}^{n} b_{ij} G_j(x_j(t-\tau)) + \sum_{j=1}^{n} c_{ij} \int_{-\infty}^{t} k_{ij}(t-s)\, H_j(x_j(s))\,ds + I_i, \quad i = 1, \dots, n, \qquad (1)$$
where $n$ denotes the number of neurons; $x_i(t)$ is the state of the $i$th neuron at time $t$, and the state vector is $x(t) := (x_1(t), x_2(t), \dots, x_n(t))^T$; $d_i > 0$ is the neural self-inhibition of the $i$th neuron, $D := \mathrm{diag}(d_1, d_2, \dots, d_n)$; the Lipschitz continuous functions $F_j(\cdot)$, $G_j(\cdot)$, $H_j(\cdot)$ represent the input-output activations, $F := (F_1, \dots, F_n)^T$, $G := (G_1, \dots, G_n)^T$, $H := (H_1, \dots, H_n)^T$; $a_{ij}$, $b_{ij}$, $c_{ij}$ denote the connections of the $j$th neuron to the $i$th neuron, respectively, $A := (a_{ij}) \in R^{n \times n}$, $B := (b_{ij}) \in R^{n \times n}$, $C := (c_{ij}) \in R^{n \times n}$; the connection weight matrices $A$, $B$, $C$ are not assumed to be symmetric; $\tau$ is a positive constant; $I_i$ is the external bias on the $i$th neuron, $I := (I_1, \dots, I_n)^T$, $I_0 := \max_{1 \leq i \leq n} |I_i| < +\infty$. The kernel functions $k_{ij} : [0, +\infty) \to [0, +\infty)$ are continuous on $[0, +\infty)$ with $\int_0^{+\infty} k_{ij}(t)\,dt = 1$, $i, j = 1, 2, \dots, n$.

System (1) is supplemented with initial conditions $x_i(t) = \phi_i(t)$, $t \in (-\infty, t_0]$, $i = 1, 2, \dots, n$, in which $\phi_i(t) \in C((-\infty, t_0]; R)$, where $C((-\infty, t_0]; R)$ is a Banach space of continuous mappings from $(-\infty, t_0]$ into $R$ with the topology of uniform convergence. $\phi(t) := (\phi_1(t), \phi_2(t), \dots, \phi_n(t))^T$, $\|\phi\| := \sup_{\theta \in (-\infty, 0]} \|\phi(t_0 + \theta)\|_\infty < +\infty$.

For convenience, we introduce some notation. $R$ denotes the set of real numbers. Let $Q = (q_{ij})_{m \times n}$ be a real matrix; $\|Q\|_\infty$ represents the infinity norm of the matrix $Q$, that is, $\|Q\|_\infty := \max_{1 \leq i \leq m} \sum_{j=1}^{n} |q_{ij}|$. Given a column vector $x = (x_1, \dots, x_n)^T \in R^n$, where $(x_1, \dots, x_n)^T$ denotes the transpose of the row vector $(x_1, \dots, x_n)$, the infinity norm of $x$ is $\|x\|_\infty := \max_{1 \leq i \leq n} |x_i|$.

Furthermore, we introduce the following assumptions on (1):

$(A_1)$ The neuron activation functions $F_j$, $G_j$ and $H_j$ are Lipschitz continuous; that is, there exist positive constants $\alpha_j$, $\beta_j$, and $\gamma_j$, $j = 1, 2, \dots, n$, such that $|F_j(u_1) - F_j(u_2)| \leq \alpha_j |u_1 - u_2|$, $|G_j(u_1) - G_j(u_2)| \leq \beta_j |u_1 - u_2|$, $|H_j(u_1) - H_j(u_2)| \leq \gamma_j |u_1 - u_2|$, $\forall u_1, u_2 \in R$.

$(A_2)$ $\alpha_{\max}\|A\|_\infty + \beta_{\max}\|B\|_\infty + \gamma_{\max}\|C\|_\infty := E < d_{\min}$, where $\alpha_{\max} := \max_{1 \leq j \leq n} \alpha_j$, $\beta_{\max} := \max_{1 \leq j \leq n} \beta_j$, $\gamma_{\max} := \max_{1 \leq j \leq n} \gamma_j$, $d_{\min} := \min_{1 \leq j \leq n} d_j$.

$(A_3)$ $\int_0^{+\infty} k_{ij}(t)\, e^{\sigma t}\,dt < \zeta$, where $0 < \sigma < 1$, $1 < \zeta < +\infty$.

$(A_4)$ $\alpha_{\max}\|A\|_\infty + \beta_{\max}\|B\|_\infty + \gamma_{\max}\,\zeta\,\|C\|_\infty < d_{\min}$.
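Assumptions $(A_2)$ and $(A_4)$ are straightforward to check numerically; here is a minimal sketch (the function names are ours, not from the paper):

```python
import numpy as np

def inf_norm(Q):
    """Matrix infinity norm: the maximum absolute row sum."""
    return np.abs(np.asarray(Q)).sum(axis=1).max()

def check_A2(A, B, C, alpha, beta, gamma, d):
    """(A2): alpha_max ||A||_inf + beta_max ||B||_inf + gamma_max ||C||_inf < d_min."""
    E = max(alpha) * inf_norm(A) + max(beta) * inf_norm(B) + max(gamma) * inf_norm(C)
    return E < min(d)

def check_A4(A, B, C, alpha, beta, gamma, d, zeta):
    """(A4): like (A2) but with the C-term scaled by the kernel bound zeta."""
    E = (max(alpha) * inf_norm(A) + max(beta) * inf_norm(B)
         + max(gamma) * zeta * inf_norm(C))
    return E < min(d)
```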
Definition 1. A constant vector $x^* := (x_1^*, \dots, x_n^*)^T \in R^n$ is an equilibrium point of system (1) if and only if $x^*$ is a solution of the following equation:

$$-d_i x_i^* + \sum_{j=1}^{n} a_{ij} F_j(x_j^*) + \sum_{j=1}^{n} b_{ij} G_j(x_j^*) + \sum_{j=1}^{n} c_{ij} H_j(x_j^*) + I_i = 0, \quad i = 1, \dots, n.$$
Definition 2. If for every $Q_1 > 0$, every $\phi \in C((-\infty, t_0]; R^n)$ with $\|\phi\| < Q_1$, and every $t_0 \geq 0$, there exists $Q_2 > 0$ such that $\|x(t)\|_\infty \leq Q_2$ for all $t \geq t_0$, where $x(t)$ is any solution of system (1), then all the solutions of system (1) are uniformly bounded.

Definition 3. The equilibrium point $x^*$ of system (1) is said to be globally exponentially stable if there exist constants $\varepsilon > 0$ and $K \geq 1$ such that $\|x(t) - x^*\|_\infty \leq K \|\phi - x^*\| e^{-\varepsilon (t - t_0)}$, $\forall t \geq t_0$, where $x(t)$ is any solution of system (1) with initial condition $x(t) = \phi(t)$, $t \in (-\infty, t_0]$.

Definition 4. Let $f$ be a continuous function on $R$; the upper-right Dini derivative of $f$ at $t$ is defined as
$$D^+ f(t) = \lim_{h \to 0^+} \sup_{0 < \Delta t \leq h} \frac{f(t + \Delta t) - f(t)}{\Delta t}.$$

Lemma 1 (Contraction Mapping Principle). Suppose that $(X, \rho)$ is a complete metric space, $h : X \to X$, and there is a real number $k$, $0 < k < 1$, such that $\rho(h(x), h(y)) \leq k \rho(x, y)$, $\forall x, y \in X$. Then there is a unique point $x_0 \in X$ such that $h(x_0) = x_0$.

Lemma 2. Let $f$ be a continuous function on $R$ and let $g$ be a differentiable function on $R$. Then $D^+ (f(t) + g(t)) = D^+ f(t) + \dot{g}(t)$.
3 Main Results

Theorem 1. If $(A_1)$, $(A_2)$ hold, then for any $\phi \in C((-\infty, t_0]; R^n)$ and any $Q_1 > 0$ with $\|\phi\| < Q_1$, there exists a constant $Q_2 = \max\!\big(Q_1, \frac{I_0 + 1}{m}\big) > 0$, where $m := d_{\min} - \alpha_{\max}\|A\|_\infty - \beta_{\max}\|B\|_\infty - \gamma_{\max}\|C\|_\infty$, such that

$$\|x(t)\|_\infty \leq Q_2, \qquad (2)$$

where $x(t) = (x_1(t), x_2(t), \dots, x_n(t))^T$ is any solution of system (1) with initial condition $x(t) = \phi(t)$, $t \in (-\infty, t_0]$; i.e., all the solutions of system (1) with such initial conditions are uniformly bounded.
Proof. In order to prove (2), we first prove that

$$\|x(t)\|_\infty < r Q_2, \quad r > 1. \qquad (3)$$

If (3) is false, then since $\|x(t)\|_\infty \leq \|\phi\| < Q_1 < r Q_2$ holds for $t \in (-\infty, t_0]$, there must exist $i$ and $t_1 > t_0$ such that

$$|x_i(t_1)| = r Q_2, \quad |x_i(t)| < r Q_2, \ \forall t \in [t_0, t_1), \qquad (4)$$
$$|x_j(t)| \leq r Q_2, \ j \neq i, \ j = 1, 2, \dots, n, \ \forall t \in [t_0, t_1]. \qquad (5)$$

Thus,
$$D^+ |x_i(t_1)| \geq 0. \qquad (6)$$

We construct the Lyapunov function $|x_i(t)|$ for each subsystem in (1), and calculate the upper-right Dini derivative of $|x_i(t_1)|$ along the trajectory of system (1), $i = 1, 2, \dots, n$. Combining with (4), (5), we have

$$D^+ |x_i(t_1)| \leq -d_{\min} |x_i(t_1)| + \sum_{j=1}^{n} |a_{ij}|\,|F_j(x_j(t_1))| + \sum_{j=1}^{n} |b_{ij}|\,|G_j(x_j(t_1 - \tau))| + \sum_{j=1}^{n} |c_{ij}| \int_{-\infty}^{t_1} k_{ij}(t_1 - s)\,|H_j(x_j(s))|\,ds + I_0 < 0, \qquad (7)$$

where the last inequality follows from $(A_1)$, $(A_2)$, (4), (5) and the choice of $Q_2$. (7) contradicts (6); thus, (3) holds. Letting $r \to 1$ in (3), (2) holds. This completes the proof of Theorem 1.

Theorem 2. Suppose that system (1) satisfies hypotheses $(A_1)$, $(A_2)$. Then system (1) has a unique equilibrium point.

Proof. By Theorem 1, any solution of system (1) supplemented with the initial condition $x(t) = \phi(t)$, $t \in (-\infty, t_0]$, is uniformly bounded; i.e., for any $\phi \in C((-\infty, t_0]; R^n)$ and any $Q_1 > 0$ with $\|\phi\| < Q_1$, there exists some constant $Q_2 > 0$ such that $\|x(t)\|_\infty \leq Q_2$. Thus we get

$$\|A F(x(t)) + B G(x(t)) + C H(x(t)) + I\|_\infty \leq Q_2 \big(\alpha_{\max}\|A\|_\infty + \beta_{\max}\|B\|_\infty + \gamma_{\max}\|C\|_\infty\big) + I_0 := N.$$

Let $d_i x_i^* = u_i^*$, $i = 1, 2, \dots, n$; then
$$u_i^* = \sum_{j=1}^{n} a_{ij} F_j\!\Big(\frac{u_j^*}{d_j}\Big) + \sum_{j=1}^{n} b_{ij} G_j\!\Big(\frac{u_j^*}{d_j}\Big) + \sum_{j=1}^{n} c_{ij} H_j\!\Big(\frac{u_j^*}{d_j}\Big) + I_i.$$
Consider a mapping $\Psi : \Omega \to \Omega$, where $\Omega := \{u = (u_1, u_2, \dots, u_n)^T \mid \|u\|_\infty \leq N\} \subset R^n$, defined componentwise by

$$\big(\Psi(u)\big)_i := \sum_{j=1}^{n} a_{ij} F_j\!\Big(\frac{u_j}{d_j}\Big) + \sum_{j=1}^{n} b_{ij} G_j\!\Big(\frac{u_j}{d_j}\Big) + \sum_{j=1}^{n} c_{ij} H_j\!\Big(\frac{u_j}{d_j}\Big) + I_i, \quad i = 1, \dots, n.$$

We will show that $\Psi$ is a contraction mapping on $\Omega$ endowed with the norm $\|\cdot\|_\infty$. In fact, for any two different points $u := (u_1, u_2, \dots, u_n)^T$, $u' := (u_1', u_2', \dots, u_n')^T \in \Omega$, we have

$$\|\Psi(u) - \Psi(u')\|_\infty \leq \frac{\alpha_{\max}\|A\|_\infty + \beta_{\max}\|B\|_\infty + \gamma_{\max}\|C\|_\infty}{d_{\min}}\, \|u - u'\|_\infty < \|u - u'\|_\infty,$$

which implies that $\Psi$ is a contraction mapping on $\Omega$. Hence, by Lemma 1, there exists a unique fixed point of the mapping $\Psi$, which corresponds to the unique equilibrium point of system (1); the existence of a unique equilibrium point of system (1) follows. This completes the proof.

Let $x^* := (x_1^*, x_2^*, \dots, x_n^*)^T$ be the unique equilibrium point of (1), and make the transformation $y_i(t) := x_i(t) - x_i^*$, $f_i(y_i(t)) := F_i(y_i(t) + x_i^*) - F_i(x_i^*)$, $g_i(y_i(t)) := G_i(y_i(t) + x_i^*) - G_i(x_i^*)$, $h_i(y_i(t)) := H_i(y_i(t) + x_i^*) - H_i(x_i^*)$, $\Phi_i(t) = \phi_i(t) - x_i^*$, $i = 1, 2, \dots, n$. Then system (1) is transformed into the following system:
n j=1
cij
n j=1 t
−∞
aij fj (yj (t)) +
n
bij gj (yj (t − τ ))
j=1
kij (t − s)hj (yj (s))ds,
i = 1, · · · , n,
(8)
By the initial conditions of system (1), system (8) is supplemented with the initial conditions $y_i(t) = \Phi_i(t) \in C((-\infty, t_0]; \mathbb{R})$. Write $\Phi(t) := (\Phi_1(t), \Phi_2(t), \cdots, \Phi_n(t))^T$ and $\|\Phi\| := \sup_{\theta \in (-\infty, 0]} \|\Phi(t_0 + \theta)\|_\infty < +\infty$. Obviously, $f_j$, $g_j$ and $h_j$ are Lipschitz continuous and also satisfy $(A_1)$, $j = 1, 2, \cdots, n$. In this way, we shift the equilibrium point $x^*$ of system (1) to the origin of system (8).

Theorem 3. If $(A_1)$, $(A_3)$, $(A_4)$ are satisfied, then system (1) has a unique equilibrium point, which is globally exponentially stable.

Proof. By Theorem 2, system (1) has a unique equilibrium point. Let
$$\varphi(\varepsilon) = d_{\min} - \alpha_{\max}\|A\|_\infty - \beta_{\max}\|B\|_\infty e^{\varepsilon\tau} - \gamma_{\max}\zeta\|C\|_\infty - \varepsilon.$$
By $(A_4)$, we obtain $\varphi(0) > 0$; thus there exists $\varepsilon$ with $0 < \varepsilon \ll 1$ such that
$$\varphi(\varepsilon) > 0. \quad (9)$$
C. Fu, W. Liu, and M. Yang
For any $\Phi \in C((-\infty, t_0]; \mathbb{R}^n)$, we shall prove that
$$\|y(t)\|_\infty \le K\|\Phi\| e^{-\varepsilon(t - t_0)}, \quad \forall t \ge t_0. \quad (10)$$
In order to prove (10), we first prove that
$$\|y(t)\|_\infty - \kappa K\|\Phi\| e^{-\varepsilon(t - t_0)} < 0, \quad \kappa > 1, \ \forall t \ge t_0. \quad (11)$$
If (11) is not true, then there exist some $i$ and $t_2 > t_0$ such that
$$|y_i(t_2)| - \kappa K\|\Phi\| e^{-\varepsilon(t_2 - t_0)} = 0, \quad (12)$$
$$|y_i(t)| - \kappa K\|\Phi\| e^{-\varepsilon(t - t_0)} < 0, \quad \forall t \in [t_0, t_2), \quad (13)$$
$$|y_j(t)| - \kappa K\|\Phi\| e^{-\varepsilon(t - t_0)} \le 0, \quad \forall t \in [t_0, t_2], \ j \ne i, \ j = 1, 2, \cdots, n.$$
Thus,
$$D^+\{|y_i(t_2)| - \kappa K\|\Phi\| e^{-\varepsilon(t_2 - t_0)}\} \ge 0. \quad (14)$$
By Lemma 2 and (14), we obtain
$$D^+|y_i(t_2)| + \varepsilon\kappa K\|\Phi\| e^{-\varepsilon(t_2 - t_0)} \ge 0. \quad (15)$$
Calculating the upper-right Dini derivative of $|y_i(t_2)|$ along the trajectory of system (8), $i = 1, 2, \cdots, n$, and using (9), (12), (13), $(A_1)$, $(A_3)$, $(A_4)$, we have
$$D^+|y_i(t_2)| \le -d_{\min}|y_i(t_2)| + \sum_{j=1}^n |a_{ij}|\,|f_j(y_j(t_2))| + \sum_{j=1}^n |b_{ij}|\,|g_j(y_j(t_2 - \tau))| + \sum_{j=1}^n |c_{ij}| \int_{-\infty}^{t_2} k_{ij}(t_2 - s)\,|h_j(y_j(s))|\,ds < -\varepsilon\kappa K\|\Phi\| e^{-\varepsilon(t_2 - t_0)}. \quad (16)$$
(16) contradicts (15), so (11) holds. Letting $\kappa \to 1$ in (11), (10) holds. This completes the proof.
4 Illustrative Example
Given a two-state Hopfield neural network as follows:
$$\dot{x}_i(t) = -d_i x_i(t) + \sum_{j=1}^2 a_{ij} F_j(x_j(t)) + \sum_{j=1}^2 b_{ij} G_j(x_j(t - \tau)) + \sum_{j=1}^2 c_{ij} \int_{-\infty}^t k_{ij}(t - s)\, H_j(x_j(s))\,ds + I_i, \quad i = 1, 2,$$
where $D = \mathrm{diag}(d_1, d_2) = \mathrm{diag}(5, 5)$ and
$$A = (a_{ij})_{2\times 2} = \begin{pmatrix} \frac{1}{5} & \frac{1}{10} \\ \frac{1}{15} & \frac{1}{20} \end{pmatrix}, \quad B = (b_{ij})_{2\times 2} = \begin{pmatrix} \frac{1}{3} & \frac{1}{9} \\ \frac{1}{6} & \frac{1}{15} \end{pmatrix}, \quad C = (c_{ij})_{2\times 2} = \begin{pmatrix} \frac{1}{2} & 1 \\ \frac{1}{4} & \frac{1}{6} \end{pmatrix}.$$
Let $k_{ij}(t) = e^{-t}$; then $\int_0^{+\infty} k_{ij}(t)\,dt = 1$ and $\int_0^{+\infty} k_{ij}(t) e^{-\sigma t}\,dt < 2$, $0 < \sigma < 1$, $i, j = 1, 2$; let $\zeta = 2$, and let $\tau$ be a positive constant. Take
$$F_i(v) = \frac{|v + 1| - |v - 1|}{2}, \quad G_i(v) = \sin v, \quad H_i(v) = \cos v, \quad v \in \mathbb{R}, \ i = 1, 2.$$
Then $\alpha_{\max}\|A\|_\infty + \beta_{\max}\|B\|_\infty + \gamma_{\max}\zeta\|C\|_\infty < d_{\min}$, so by Theorem 3 the system has a unique equilibrium point, which is globally exponentially stable.
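The hypothesis of Theorem 3 can be verified numerically for this example (a sketch of ours, not part of the paper; the Lipschitz constants αmax = βmax = γmax = 1 follow from the stated F, G, H, and the induced infinity norm is the maximum absolute row sum):

```python
import numpy as np

# Parameters of the two-state example above.
A = np.array([[1/5, 1/10], [1/15, 1/20]])
B = np.array([[1/3, 1/9],  [1/6, 1/15]])
C = np.array([[1/2, 1.0],  [1/4, 1/6]])
zeta = 2.0
d_min = 5.0
alpha_max = beta_max = gamma_max = 1.0  # Lipschitz constants of F, G, H

def inf_norm(M):
    """Induced infinity norm: maximum absolute row sum."""
    return np.abs(M).sum(axis=1).max()

lhs = alpha_max * inf_norm(A) + beta_max * inf_norm(B) + gamma_max * zeta * inf_norm(C)
print(lhs, lhs < d_min)   # about 3.744, True: the condition of (A4) holds
```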
5 Conclusion
The main purpose of this paper is to provide a new, simple, and feasible mathematical method for analyzing the stability of neural networks. Starting from the definition of global exponential stability of the system, we rely mainly on properties of the upper-right Dini derivative and argue by contradiction; the corresponding results are obtained without the need to construct a Lyapunov function, which for a complicated system can be very difficult. Acknowledgements. The work is supported by the Key Science Foundation of the Educational Department of Hubei Province under Grants D20082201 and Z20062202, and by the Innovation Teams of Hubei Normal University.
Object Recognition of a Mobile Robot Based on SIFT with De-speckle Filtering Zhiguang Xu, Kyung-Sik Choi, Yanyan Dai, and Suk-Gyu Lee Department of Electrical Engineering, Yeungnam University 214-1 Daedong Gyongsan Gyongbuk Republic of Korea [email protected], [email protected], [email protected], [email protected]
Abstract. This paper presents a novel object recognition method for a mobile robot that combines the scale invariant feature transform (SIFT) with de-speckle filtering to enhance recognition capability. The main idea of the proposed algorithm is to apply a de-speckle filtering process to remove external noise before using SIFT to identify other robots. Since the number of detected features is much larger than needed, the SIFT method alone requires a long time to extract and match the features. The proposed method shows faster and more efficient performance, which enhances the localization accuracy of the slave robots. The simulation results show that de-speckle-filtering-based SIFT reduces both the number of features in the extraction process and the number of points in the matching process. Keywords: SIFT, image processing, de-speckle filtering, mobile robot.
correct matching of a single feature. However, there are some drawbacks, such as a long calculation time and the extraction of feature points containing unnecessary elements caused by noise in the images. Speckle, a multiplicative, locally correlated form of noise, plagues imaging applications such as medical ultrasound image processing. For speckled images it is very important to remove the speckle without destroying important image features; since speckling tends to blur images, speckle noise reduction is important in most detection and recognition systems. Speckle is an inherent characteristic of laser, synthetic aperture radar (SAR), and ultrasound images. De-speckle filtering is also used in medical ultrasound B-scan (brightness scan) echo imaging, which is acquired by summation of the echo signals from ultrasound scatterers in the ultrasound beam range [6]. Over the years, a wide variety of techniques have been developed for de-speckling images. The earliest methods were spatial filters based on local statistics, working directly on the intensity image; examples of such filters are the Lee filter [7], the sigma filter [8], and the Kuan filter [9]. In the past decade, speckle filtering based on the wavelet transform (WT) [10] has become more and more popular. In this paper, we combine SIFT and de-speckle filtering to overcome the drawbacks above. The SIFT algorithm and the de-speckle concept are presented in section 2, and de-speckle filtering is applied to edge detection in section 3. In section 4, we perform simulations to compare de-speckle-processing-fused SIFT with the original SIFT. Section 5 concludes the paper.
2 Related Knowledge

2.1 Scale Invariant Feature Transform (SIFT)

2.1.1 Scale Space Extrema Detection
The first stage is to construct a scale space, which consists of a set of blurred and subsampled versions of the original image. The scale space of an image is defined as the function $L(x, y, \sigma)$ produced by convolving a variable-scale Gaussian $G(x, y, \sigma)$ with the input image $I(x, y)$:
$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y). \quad (1)$$
Next, scale-space extrema are detected in the difference-of-Gaussian (DoG) function convolved with the image, $D(x, y, \sigma)$, which can be computed from the difference of two nearby scales separated by a constant multiplicative factor $k$:
$$D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma). \quad (2)$$
In order to detect the local maxima and minima of $D(x, y, \sigma)$, each point is compared with its eight neighbors at the same scale as well as with nine points from the scale above and nine points from the scale below.
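The scale-space construction of Eqs. (1)–(2) and the 26-neighbor extremum test can be sketched as follows (a minimal illustration, not the SIFT implementation used by the authors; `sigma`, `k`, and `n_scales` are assumed values):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(image, sigma=1.6, k=2**0.5, n_scales=4):
    """One octave of the difference-of-Gaussian pyramid, Eqs. (1)-(2):
    D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)."""
    blurred = [gaussian_filter(image, sigma * k**i) for i in range(n_scales)]
    return [blurred[i + 1] - blurred[i] for i in range(n_scales - 1)]

def is_extremum(dogs, s, y, x):
    """True if the point beats its 8 same-scale neighbours and the
    9 neighbours in each of the adjacent scales (26 comparisons)."""
    cube = np.stack([d[y - 1:y + 2, x - 1:x + 2] for d in dogs[s - 1:s + 2]])
    v = dogs[s][y, x]
    return v == cube.max() or v == cube.min()

img = np.zeros((9, 9)); img[4, 4] = 1.0
dogs = dog_pyramid(img)
print(len(dogs))   # 3 DoG levels built from 4 blurred images
```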
2.1.2 Key-Point Localization
Along an edge, the directional derivative is small in the direction of the edge but large in the direction perpendicular to it. The two principal curvatures are given by the eigenvalues of the Hessian matrix:
$$H = \begin{pmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{pmatrix}, \quad (3)$$
where $D_{xx}$, $D_{xy}$, $D_{yy}$ are second-order directional derivatives. Let the eigenvalues of $H$ be $\alpha$ and $\beta$ ($\alpha > \beta$); then
$$\mathrm{Tr}(H) = D_{xx} + D_{yy} = \alpha + \beta, \qquad \mathrm{Det}(H) = D_{xx} D_{yy} - (D_{xy})^2 = \alpha\beta, \quad (4)$$
where $\mathrm{Tr}(H)$ is the trace of the matrix and $\mathrm{Det}(H)$ its determinant. For $\alpha = \lambda\beta$, we have:
$$\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} = \frac{(\alpha + \beta)^2}{\alpha\beta} = \frac{(\lambda + 1)^2}{\lambda}. \quad (5)$$
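Eq. (5) is used to reject edge-like key-points by thresholding the ratio of principal curvatures; a small sketch (r = 10 is the threshold suggested in Lowe's paper [4]; the derivative values below are made up for illustration):

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a key-point only if Tr(H)^2 / Det(H) < (r+1)^2 / r (Eq. (5)),
    i.e. the ratio of principal curvatures alpha/beta is below r."""
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:              # curvatures of opposite signs: reject
        return False
    return tr * tr / det < (r + 1.0) ** 2 / r

print(passes_edge_test(2.0, 2.0, 0.0))    # similar curvatures -> True
print(passes_edge_test(20.0, 0.5, 0.0))   # edge-like (ratio 40) -> False
```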
2.1.3 Orientation Assignment
Before a descriptor is constructed for the key-point, the key-point is assigned an orientation to make the descriptor invariant to rotation. This key-point orientation is calculated from an orientation histogram of local gradients in the closest smoothed image $L(x, y, \sigma)$. For each image sample $L(x, y)$ at this scale, the gradient magnitude $m(x, y)$ and orientation $\theta(x, y)$ are computed using pixel differences:
$$m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2}, \qquad \theta(x, y) = \tan^{-1}\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}. \quad (6)$$
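The pixel-difference gradient of Eq. (6) can be written directly (a sketch; `arctan2` is used instead of a plain arctangent so the full orientation range is kept):

```python
import numpy as np

def gradient_mag_ori(L, y, x):
    """Gradient magnitude m(x, y) and orientation theta(x, y), Eq. (6)."""
    dx = L[y, x + 1] - L[y, x - 1]
    dy = L[y + 1, x] - L[y - 1, x]
    return np.hypot(dx, dy), np.arctan2(dy, dx)

L = np.array([[0., 0., 0.],
              [0., 0., 2.],
              [0., 0., 0.]])
m, theta = gradient_mag_ori(L, 1, 1)
print(m, theta)   # 2.0 0.0: the gradient points along +x
```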
2.1.4 Key-Point Descriptor
Each key-point is described by a feature vector. To keep the descriptor rotation invariant, the coordinate axes are aligned with the key-point's dominant orientation. Using Eq. (6), we calculate the gradient magnitude and direction of each pixel and accumulate their distribution over 8 directions. The descriptor is formed from a vector containing the values of all the orientation histogram entries.

2.2 Speckle Noise Modeling
In order to obtain an effective de-speckle filter, a speckle noise model is needed. For both ultrasound and SAR images, the speckle noise can be approximated as multiplicative [11]. The output signal of the receiver demodulator module of an ultrasound imaging system can be defined as
$$y_{i,j} = x_{i,j}\, n_{i,j} + a_{i,j}, \quad (7)$$
where $y_{i,j}$ represents the noisy pixel in the middle of the moving window, $x_{i,j}$ represents the noise-free pixel, and $n_{i,j}$ and $a_{i,j}$ represent the multiplicative and additive noise, respectively; $i, j$ are the indices of the spatial locations. De-speckling is based on estimating the true intensity $x_{i,j}$ as a function of the intensity of the pixel $y_{i,j}$ and local statistics calculated on a neighborhood of this pixel. Returning to Eq. (7), since the effect of the additive noise is much smaller than that of the multiplicative noise, the model can be written as
$$y_{i,j} \approx x_{i,j}\, n_{i,j}. \quad (8)$$
Therefore, logarithmic compression transforms the model in Eq. (8) into the classical signal-in-additive-noise form:
$$\log(y_{i,j}) = \log(x_{i,j}) + \log(n_{i,j}) \quad (9)$$
and
$$g_{i,j} = f_{i,j} + nl_{i,j}. \quad (10)$$
The term $\log(y_{i,j})$, which is the observed pixel on the ultrasound image display after logarithmic compression, is denoted by $g_{i,j}$, and the terms $\log(x_{i,j})$ and $\log(n_{i,j})$, which are the noise-free pixel and the noise component after logarithmic compression, are denoted by $f_{i,j}$ and $nl_{i,j}$, respectively.
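The multiplicative model of Eq. (8) and the log-compression step of Eqs. (9)-(10) can be demonstrated numerically (a sketch; the Gamma-distributed speckle with unit mean is our assumption, not a model from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.full((64, 64), 100.0)        # noise-free intensity x_ij
# Multiplicative speckle n_ij with mean 1 (assumed Gamma-distributed).
n = rng.gamma(shape=16.0, scale=1.0 / 16.0, size=x.shape)
y = x * n                           # Eq. (8): additive term neglected

# Eq. (9): log compression turns the multiplicative noise into additive noise.
g = np.log(y)                       # observed pixel g_ij
f = np.log(x)                       # noise-free component f_ij
nl = np.log(n)                      # additive noise component nl_ij
print(np.allclose(g, f + nl))       # True: g = f + nl, Eq. (10)
```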
3 De-speckle Filtering Used in Edge Detection to Test the Noise Abatement Effect
As mentioned, in SIFT applications, reducing the processing time for real-time systems is one of the most important problems researchers focus on. Before the experiment, de-speckle filtering was applied to edge detection processing to check whether de-speckle filtering removes noise effectively. We modified the de-speckle filtering program developed by Christos P. Loizou and Constantinos S. Pattichis [12]. During the simulation, we used blurring, de-speckling, and combined blurring and de-speckling, and compared these results to obtain good performance for real-time experimentation.
Fig. 1. Five simulation results with different processing methods: (a) original image, (b) edge detection result of (a), (c) image after de-speckling 5 times, (d) edge detection result of (c), (e) image after blurring 5 times, (f) edge detection result of (e), (g) image after de-speckling 5 times + blurring 5 times, (h) edge detection result of (g), (i) image after blurring 5 times + de-speckling 5 times, (j) edge detection result of (i)
The simulation results show that the blur and de-speckle methods reduce noise in edge detection processing to some extent. Applying de-speckle processing 5 times produced the most obvious noise reduction. Although removing the noise of the original image also removed excess key-points, the result is relatively clean and has almost no distortion of the object itself. As seen in the results, de-speckle processing not only preserves the completeness of the target object's characteristics but also removes the noise of the image. Therefore, the proposed method demonstrates a satisfactory result.
4 Experimentation Procedure

4.1 Experiment Purpose and Environment
In order to verify the de-speckle property of the proposed algorithm, an experiment was conducted with two robots (one master and one slave robot) in a comparatively simply structured environment under fluorescent lamps. In the experiment, the master robot uses a vision sensor as an explorer, and the slave robot is responsible for exploring and constructing a map using other sensors. This paper focuses on how the master robot identifies the slave robot quickly and efficiently, and determines the slave robot's position accurately in a shorter time.
4.2 Simulation Results Using SIFT and De-speckle-Filtering-Based SIFT
To show the enhancement brought by de-speckle-filtering-based SIFT over the original SIFT, the simulation source developed by Lowe [13,14] was modified.

Fig. 2. Matching results using SIFT on the original image (a) and using de-speckle-filtering-based SIFT (b)

Fig. 3. Two simulation results of SIFT features found, with scale and orientation indicated by the size and orientation of the squares: (a) original image of target, (b) image of target after de-speckle processing, (c) original image, (d) image after de-speckle processing
The results show that in (a) of Figure 2 there are 8 points that do not match among 28 characteristic points; those 8 points are considered noise. In (b) of Figure 2 there are 3 unmatched points, which are noise, among 13 characteristic points. In (a) and (c) of Figure 3, the matching process picks up more characteristic points, and many of the characteristic points are concentrated on the ground rather than on the object. This not only reduces the accuracy of extracting the characteristic points but also increases the robot's calculation time. In (b) and (d) of Figure 3, the number of feature points on the floor is reduced dramatically, without affecting the characteristic points of the object the robots explore. The experiment results show that the proposed method reduces matching time and feature extraction processing. As seen above, the noise on the floor has been significantly reduced, and the extraction of characteristic points of the object has not been weakened. Table 1. The results of feature extraction, matching, and noise rate
                Image        Features (No.)   Matches (No.)   Noise rate of matching
Original SIFT   Target            317              28                 0.286
                Background        621
DS-SIFT         Target            189              13                 0.231
                Background        431
Table 1 shows that the proposed method reduces the number of target features by almost 40 percent and the number of background features by 30 percent. According to the noise rate of matching, it not only reduces the program running time but also reduces the impact on the master robot's measurements. Several tests at different positions in the same environment show that the effect becomes more apparent as the complexity of the environment increases. Moreover, under more complex environmental conditions, the number of de-speckle filtering passes can be increased appropriately to remove more noise.
5 Conclusion
In multi-robot cooperation, it is very important to distinguish robots from stationary objects for localization, collision avoidance, mapping, etc. The master robot in particular not only needs to know its own position, but should also be able to obtain information about other objects and send the data to the slave robots to offset the slave robots' measurement errors. Though SIFT improves object recognition, it suffers from a long calculation time. The proposed method removes a large number of unnecessary feature points of target objects; the role of SIFT in a multi-robot system is to recognize the object and determine its position, not to outline the whole figure. The experimental results show
that it improves the mobile robot's moving efficiency. Although the whole program time is not reduced after running 5 de-speckle passes, the proposed algorithm significantly reduces the matching time in the SIFT process. In future research, we plan to extend the number of robots to generalize the proposed algorithm of multi-robot cooperation for localization and mapping in a real environment, and also to make an effort to reduce the whole processing time.
References
1. Grimson, W.E.L., Lozano-Perez, T.: Model-Based Recognition and Localization from Sparse Range or Tactile Data. MIT Artificial Intelligence Lab., AI Memo 738 (1983)
2. Betke, M., Gurvits, L.: Mobile Robot Localization Using Landmarks. IEEE Transactions on Robotics and Automation 13(2), 251–263 (1997)
3. Li, G.-z., An, C.-w.: Scene Recognition for Mobile Robot Localization. ROBOT 27(2) (March 2005)
4. Lowe, D.: Distinctive Image Features from Scale-Invariant Key-Points. International Journal of Computer Vision 60(2), 91–110 (2004)
5. Ke, Y., Sukthankar, R.: PCA-SIFT: A More Distinctive Representation for Local Image Descriptors. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2004), July 2004, pp. 506–513. IEEE Computer Society Press, Los Alamitos (2004)
6. Nadernejad, E.: Despeckle Filtering in Medical Ultrasound Imaging. Contemporary Engineering Sciences 2(1), 17–36 (2009)
7. Lee, J.: Refined Filtering of Image Noise Using Local Statistics. Comput. Graph. Image Process. 15, 255–269 (1981)
8. Lee, J.: Speckle Suppression and Analysis for Synthetic Aperture Radar Images. Opt. Eng. 25, 636–643 (1986)
9. Kuan, D.T., Sawchuk, A.A., Strand, T.C., Chavel, P.: Adaptive Noise Smoothing Filter for Images with Signal-Dependent Noise. IEEE Trans. Pattern Anal. Machine Intell. PAMI-7, 165–177 (1985)
10. Vetterli, M., Kovacevic, J.: Wavelets and Subband Coding. Prentice-Hall, Upper Saddle River (1995)
11. Dutt, V.: Statistical Analysis of Ultrasound Echo Envelope. Ph.D. dissertation, Mayo Graduate School, Rochester, MN (1995)
12. Loizou, C.P., Pattichis, C.S.: Despeckle Filtering Algorithms and Software for Ultrasound Imaging, pp. 36–54. Morgan & Claypool (2008), doi:10.2200/S00116ED1V01Y200805ASE001
13. Park, C.I., Lee, S.H., Jeong, Y.J.: A Hardware Design of a Feature Detector for Real-Time Processing of the SIFT (Scale Invariant Feature Transform) Algorithm in Embedded Systems. Journal of IEEK 46(3), 86–95 (2009)
14. Gao, Q., Li, J., Yang, G.: Vision-Based Road Crossing Scene Recognition for Robot Localization. In: 2008 International Conference on Computer Science and Software Engineering, pp. 62–68 (2008)
Some Research on Functional Data Analysis Hui Liu Tsinghua University, Beijing 100084, PRC [email protected]
Abstract. In order to model functional time series systems, we develop a new model, the Gaussian process hidden Markov model. We use the hidden Markov model to characterize the time order of the system, and a Gaussian process to model the functional observations. We apply this new model to the functional time series classification and prediction problems. Simulation results on real data demonstrate that our model is efficient. Keywords: Functional data analysis, Gaussian process, Hidden Markov model, Time series classification and prediction.
1 Introduction
In many experiments the basic observed objects are curves rather than single data points, such as daily stock and index data. When the measurements are recorded densely over time, they are typically termed function or curve data, and accordingly the methods chosen to analyze them are called functional data analysis (FDA) [1], e.g., functional principal component analysis and functional canonical correlation analysis. The mean and covariance structure is used in many data analysis tasks, but when the dimension of the random vector is high, too many parameters must be estimated, especially for the covariance structure. The Gaussian process is very suitable for analyzing functional objects, because the covariance function of a Gaussian process greatly reduces the parameter estimation task. A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution; it is fully specified by a mean function and a covariance function: $f(x) \sim \mathcal{GP}(m(x), K(x, x'))$. Gaussian processes are widely used in machine learning [2,3]. Time series systems are common in many situations; unlike an ordinary set of curves, a functional time series has a time order, so using only shape information to model such a system is insufficient, and the time order should be considered simultaneously. The hidden Markov model (HMM) [4] has been used successfully to analyze various types of time series. In this paper, we utilize the HMM to characterize the time order of the functional time series system. There are many research works on functional data analysis based on Gaussian processes, such as [5,6,20]. Meanwhile, mixture models have been studied for many
Corresponding author.
Y. Tan, Y. Shi, and K.C. Tan (Eds.): ICSI 2010, Part II, LNCS 6146, pp. 406–413, 2010. © Springer-Verlag Berlin Heidelberg 2010
decades in their two main roles: modeling heterogeneity in data coming from different populations, and serving as a convenient form of flexible population density [8,9,10]. In this paper we also define a mixture of Gaussian processes. Different from the traditional mixture model, we model the states of the HMM as different Gaussian processes. After parameter estimation, we can use the Viterbi algorithm to compute the optimal state path of the observed functional time series and thereby determine the classification of the curve observations. We also consider the prediction problem for functional time series. In time series prediction, conventional statistical techniques such as ARMA (ARIMA) are constrained by the underlying seasonality, non-stationarity, and other factors, and the HMM can efficiently overcome these constraints. The organization of this paper is as follows. In Section 2, we describe the Gaussian process hidden Markov model (GPHMM). In Section 3, we describe functional time series classification and prediction based on the proposed model. Concluding remarks are provided in Section 4.
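The Gaussian process building block used throughout the paper can be illustrated by drawing curves from a zero-mean GP with the squared-exponential kernel of Section 2 (a sketch of ours; the kernel parameters and the diagonal jitter term are assumed values):

```python
import numpy as np

def se_kernel(x, sigma1=1.0, length=0.2, sigma2=1e-4):
    """SE covariance K(x, x') = sigma1 * exp(-(x - x')^2 / l), plus a
    sigma2 * delta_{xx'} term on the diagonal."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return sigma1 * np.exp(-d2 / length) + sigma2 * np.eye(len(x))

rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 50)
K = se_kernel(grid)
# Any finite grid of a GP is jointly Gaussian, so curves are sampled directly.
curves = rng.multivariate_normal(np.zeros(len(grid)), K, size=3)
print(curves.shape)   # (3, 50): three smooth random curves
```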
2 Gaussian Process Hidden Markov Model
A hidden Markov model can provide a probabilistic framework for modeling a time series of multivariate observations. The HMM was introduced in the beginning of the 1970s [4] and has become increasingly popular due to its strong mathematical structure and theoretical basis for use in a wide range of applications. In recent years researchers have proposed the HMM as a classifier or predictor for speech signal recognition [11,12], DNA sequence analysis [13], handwritten character recognition, natural language domains, etc. A Gaussian process models any function $f$ with the probability
$$P[f] \propto \exp\left(-\frac{1}{2}\|f - f_0\|_H^2\right), \quad (1)$$
where $f_0 = E(f)$ is the mean function and $\|\cdot\|_H$ is the norm of a reproducing kernel Hilbert space (RKHS) whose kernel $K$ is also known as the covariance function of the Gaussian process [2,3]. In practice, the data we face are discrete sample points on a finite grid, denoted by $y_{tm}$, $1 \le t \le T$, $1 \le m \le M$, and we assume that each $y_{tm}$, $1 \le m \le M$, is regularly sampled from a smooth function $y_t(x)$. So (1) can be expressed as
$$P(y_t) = (2\pi)^{-M/2} |K|^{-1/2} \exp\left(-\frac{1}{2}(y_t - y_0)^{\top} K^{-1} (y_t - y_0)\right), \quad (2)$$
where $y_t = (y_{t1}, \ldots, y_{tM})^{\top}$ and $y_0$ is the vector of mean function values at $x_1, \ldots, x_M$. In this paper, we take the popular squared exponential (SE) kernel of the form $K(x, x') = \sigma_1 \exp\left(-\frac{(x - x')^2}{l}\right) + \sigma_2 \delta_{xx'}$. Next, we define an $S$-state HMM. The state space is denoted $S = \{1, \cdots, s, \cdots, S\}$, and each state corresponds to a Gaussian process with mean function $\mu_s(x)$ and covariance function $K_s(x, x')$. We also define a state transition matrix $a$, where $a_{sr}$ is the probability of a transition from state $s$ to state $r$. We utilize the B-spline
basis functions to expand the mean function: for each $s$, $\mu_s(x) = u_s \cdot \Phi(x)$, with $\Phi(x) = (\phi_1(x), \cdots, \phi_J(x))^{\top}$ being the selected B-spline basis functions and $u_s$ the expansion coefficients. The full set of parameters of our GPHMM model is denoted $\lambda = \{\{a_{sr}\}_{s,r=1}^S, a_1, \{u_s, \sigma_{1s}, \sigma_{2s}, l_s\}_{s=1}^S\}$, where $a_1$ is the initial probability distribution vector over the states; we write $\theta_s = \{u_s, \sigma_{1s}, \sigma_{2s}, l_s\}$ for $1 \le s \le S$, $Y(x) = (y_1(x), \cdots, y_T(x))^{\top}$, and $\omega = (\omega_1, \cdots, \omega_T)$ for any state chain with $\omega_t \in S$. The likelihood of $Y(x)$ given $\omega$ and $\lambda$ is $P(Y(x) \mid \omega, \lambda) = \prod_{t=1}^T P(y_t(x) \mid \lambda_{\omega_t})$, so the likelihood of $Y(x)$ given $\lambda$ is
$$P(Y(x) \mid \lambda) = \sum_{\omega} P(Y(x) \mid \omega, \lambda)\, P(\omega \mid \lambda), \quad (3)$$
where $\omega$ ranges over all possible state chains. In order to compute this likelihood, we define forward and backward operators as in the original HMM. The forward operator is defined by $\alpha_1(s) = a_1(s) P(y_1(x) \mid \lambda_s)$, $1 \le s \le S$, and $\alpha_t(s) = \sum_{r=1}^S \alpha_{t-1}(r)\, a_{rs}\, P(y_t(x) \mid \lambda_s)$ for $1 \le s \le S$, $2 \le t \le T$. The backward operator is defined by $\beta_T(s) = 1$, $1 \le s \le S$, and $\beta_t(s) = \sum_{r=1}^S \beta_{t+1}(r)\, a_{sr}\, P(y_{t+1}(x) \mid \lambda_r)$ for $1 \le s \le S$, $1 \le t \le T - 1$. Since $\alpha_t(s)\beta_t(s) = P(Y(x), \omega_t = s \mid \lambda)$, the likelihood of $Y(x)$ given $\lambda$ can be computed as $P(Y(x) \mid \lambda) = \sum_{s=1}^S \alpha_t(s)\beta_t(s)$ for any $t$. In particular, $P(Y(x) \mid \lambda) = \sum_{s=1}^S \alpha_T(s)$, whose computation is linear in $T$. Our purpose is to find a model $\lambda$ that maximizes the likelihood of the given functional time series $Y(x)$, namely $\lambda^* = \arg\max_{\lambda} P(Y(x) \mid \lambda)$. In the original HMM, the Baum-Welch algorithm is used to update the model parameters, but in our method the parameter update does not have the simple, elaborated re-estimation formulas of the Baum-Welch algorithm. We instead utilize the EM algorithm [14] to update the desired parameters (the Baum-Welch algorithm is a special case of the EM algorithm). Let $\gamma_t(s) = P(\omega_t = s \mid Y(x), \lambda)$ and $\gamma_t(s, r) = P(\omega_t = s, \omega_{t+1} = r \mid Y(x), \lambda)$; according to the definitions of the forward and backward operators, $\gamma_t(s) = \alpha_t(s)\beta_t(s) / P(Y(x) \mid \lambda)$.
The quantities $\gamma_t(s)$ and $\gamma_t(s, r)$ denote the probability that state $s$ occurs at time $t$ and the probability of moving from state $s$ at time $t$ to state $r$ at the next time step, respectively; they are used to re-estimate $\{a_1(s), 1 \le s \le S\}$ and $\{a_{sr}, 1 \le s, r \le S\}$. Next we give the model update formulas obtained from the EM algorithm. First we define an auxiliary function $Q(\lambda \mid \lambda^n) = \sum_{\omega} P(\omega \mid Y(x), \lambda^n) \log P(\omega, Y(x) \mid \lambda)$, where $\lambda^n$ is the current set of model parameters. We only need to compute $\lambda^{n+1} = \arg\max_{\lambda} Q(\lambda \mid \lambda^n)$, and the obtained $\lambda^{n+1}$ makes the likelihood $L(Y(x) \mid \lambda^{n+1}) >$
$L(Y(x) \mid \lambda^n)$. We can obtain
$$Q(\lambda \mid \lambda^n) = \sum_{\omega} P(\omega \mid Y(x), \lambda^n) \log P(\omega, Y(x) \mid \lambda) = \sum_{s=1}^S \gamma_1(s) \log a_1(s) + \sum_{t=2}^T \sum_{s=1}^S \sum_{r=1}^S \gamma_t(s, r) \log a_{sr} + \sum_{t=1}^T \sum_{s=1}^S \gamma_t(s) \log P(y_t(x) \mid \lambda_s). \quad (5)$$
The above Eq. (5) indicates that the parameters to be re-estimated separate. So, like the re-estimation formulas of the original HMM [4], we re-estimate $\{a_1(s), a_{sr}, 1 \le s \le S, 1 \le r \le S\}$ as follows:
$$a_1(s) = \gamma_1(s), \ 1 \le s \le S; \qquad a_{sr} = \frac{\sum_{t=1}^{T-1} \gamma_t(s, r)}{\sum_{t=1}^{T-1} \gamma_t(s)}. \quad (6)$$
Next, we re-estimate the parameters $\theta_s$ by maximizing the third term on the right side of (5). Let
$$\tilde{L} = \sum_{t=1}^T \sum_{s=1}^S \gamma_t(s) \log P(y_t(x) \mid \theta_s). \quad (7)$$
Substituting (2) and $u_s$ into $\tilde{L}$ and taking the partial derivatives with respect to the parameters to be re-estimated, we obtain
$$\frac{\partial \tilde{L}}{\partial u_s} = \sum_{t=1}^T \gamma_t(s)\, (y_t - \Phi u_s)^{\top} K_s^{-1} \Phi,$$
$$\frac{\partial \tilde{L}}{\partial \theta_{sj}} = \sum_{t=1}^T \gamma_t(s) \left[ -\frac{1}{2} \mathrm{tr}\!\left(K_s^{-1} \frac{\partial K_s}{\partial \theta_{sj}}\right) + \frac{1}{2} (y_t - \Phi u_s)^{\top} K_s^{-1} \frac{\partial K_s}{\partial \theta_{sj}} K_s^{-1} (y_t - \Phi u_s) \right], \quad (8)$$
where $\{\theta_{sj}, j = 1, 2, 3\}$ corresponds to $\sigma_{1s}, \sigma_{2s}, l_s$ of state $s$, and $\Phi$ is an $M \times J$ matrix with $\Phi_{mj} = \phi_j(x_m)$. Setting $\partial \tilde{L} / \partial u_s = 0$, we obtain the analytic solution for $u_s$:
$$u_s = \left( \sum_{t=1}^T \gamma_t(s)\, \Phi^{\top} K_s^{-1} \Phi \right)^{-1} \left( \sum_{t=1}^T \gamma_t(s)\, \Phi^{\top} K_s^{-1} y_t \right). \quad (9)$$
However, no explicit solution of $\partial \tilde{L} / \partial \theta_{sj} = 0$ exists. Therefore, we first update $u_s$ by (9), then update $\theta_{sj}$ by (8) with the given $u_s$, and repeat the iteration multiple times. The number of states in the HMM can be determined by the BIC criterion [22].
3 The Model Application on Functional Time Series Classification and Prediction

3.1 Functional Time Series Classification
The problem of curve clustering has drawn attention recently, and most of the existing methods are based on mixture models and Bayesian posterior estimation, such as [16,17,20]. The standard setup of this sort of method models the observed curves $Y = (y_1, \ldots, y_n, \ldots, y_N)$ as $P(y \mid \theta) = \sum_k \alpha_k p_k(y \mid \theta_k)$, where $p_k$ is the component density with parameters $\theta_k$ for the $k$th cluster, and $\alpha_k$ is the unconditional probability that $y(x)$ was generated by cluster $k$. It is possible to learn the parameters $\theta$ of many mixture models using standard EM-based algorithms [14,15]. In EM estimation, we can introduce a latent variable
$z_{nk}$ indicating that the $n$th observation belongs to the $k$th cluster. We compute $p(z_{nk} = 1 \mid \theta) = \alpha_k p_k(y_n(x) \mid \theta_k) / \sum_k \alpha_k p_k(y_n(x) \mid \theta_k)$ for all $k$, and pick the $k^*$ with the maximum probability value as the cluster index. Commonly, in the multivariate case, $p_k$ is the Gaussian probability density with mean $\mu_k$ and covariance matrix $\sigma_k$; in the functional case, $p_k$ is the likelihood under a Gaussian process of $y_n(x)$ with the given mean function $m_k(x)$ and covariance function $\sigma_k(x, x')$, as in [5], where the clustering task is accomplished by means of a mixture Gaussian process model. Our GPHMM is similar to the mixture Gaussian process model; the difference is that we use $\gamma_t(s)$ in place of the unconditional probability $\alpha_k$ that $y$ was generated by cluster $k$ and the latent index variable $z_{nk}$. From the above discussion, the Gaussian process mixture model is suited to classifying a curve set. But if the curve set is generated by a time series system, classification by the Gaussian process mixture model is not very suitable, because it considers only shape information and not the time order, while our GPHMM is suitable for modeling the functional time series system, because the transition probabilities and observation series of the HMM capture the time series property. After training a GPHMM, we can use the Viterbi algorithm to determine the state chain with the maximum probability given the observation series; each state corresponds to a group or cluster, and the clustering is accomplished naturally. The Viterbi algorithm of our method is the same as for the common HMM; please refer to [4] for more details. Next we provide experimental results. The data we consider are the daily Dow Jones Industrial Average Index (DJI) from 1945 to 2007 (the data source is http://finance.yahoo.com).
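The Viterbi decoding used above to assign each curve to a state/cluster can be sketched as follows (the standard algorithm, written in log space; the per-curve state log-likelihoods `log_b` are assumed given):

```python
import numpy as np

def viterbi(log_b, log_a, log_a1):
    """Most probable state chain given log_b[t, s] = log P(y_t(x) | lambda_s),
    log transition matrix log_a and log initial distribution log_a1.
    The decoded state of curve y_t(x) is its cluster label."""
    T, S = log_b.shape
    delta = np.empty((T, S))
    psi = np.zeros((T, S), dtype=int)
    delta[0] = log_a1 + log_b[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_a    # (previous state, next state)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_b[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):                # backtrack
        path[t] = psi[t + 1, path[t + 1]]
    return path

log_b = np.log(np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]))
uniform = np.log(np.full((2, 2), 0.5))
path = viterbi(log_b, uniform, np.log(np.full(2, 0.5)))
print(path)   # [0 0 1]
```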
The log of the original data, fitted by 30 B-spline basis functions, is displayed in the left panel of Fig. 1. In the clustering task we are mainly concerned with the shape of the curves, so we subtract from each year's curve its mean value. We mainly compare our method with the Gaussian process mixture model (GPMM) proposed in [5]. The number of clusters is an important but difficult problem in cluster analysis. In [5] it is chosen by BIC [22]; we use the same criterion to choose the number of states, and find that 3 states corresponds to the minimum BIC value. The two sets of mean functions found by our method and by the GPMM of [5] are displayed in Fig. 1. The classification results of the two methods are basically the same, but the mean functions differ slightly. The mean functions found by the GPMM are easily influenced by outliers, because the GPMM, like the K-means method, does not consider the time order of the curve series.

3.2 Functional Time Series Prediction
Prediction is the most important problem in time series analysis, and curve time series prediction has attracted attention recently. Many functional autoregressive (FAR) methods have been proposed to predict functional data; for example, [7,18,19,21] study the forecasting of electricity consumption, traffic, climatic variations, electrocardiograms, and ozone concentration, respectively,
Some Research on Functional Data Analysis
Fig. 1. Left: DJI data minus its mean; Middle: mean function found by GPMM; Right: mean function found by our method
using a generalization of the autoregressive model to functional data. In functional time series prediction, the data we face are discrete sample points on a finite grid, denoted $y_{tm}$, $1 \le t \le T$, $1 \le m \le M$, and we assume that each $y_{tm}$, $1 \le m \le M$, is taken from a smooth function $y_t(x)$. Our task is to predict the next curve $y_{T+1}(x)$. After training a GP-HMM, we obtain a set of parameters $\lambda = \{\{a_{sr}\}_{s,r=1}^{S},\, a_1,\, \{u_s, \sigma_{1s}, \sigma_{2s}, l_s\}_{s=1}^{S}\}$. We can obtain the state of $y_T(x)$ by the Viterbi algorithm. Suppose the state variable at time $T$ is $s_0 \in S$; for each $s$ we can compute the probability $p_s = P(\omega_{T+1} = s) = a_{s_0 s}$. So we can compute the mean and covariance of $y_{T+1}$ as follows: $E(y_{T+1}) = E(E(y_{T+1}|\omega_{T+1})) = \sum_s p_s u_s$
where $u = \frac{1}{S} \sum_s u_s$ and $b(s)$, the proportion of each state, is calculated by the Viterbi algorithm. The above mean $E(y_{T+1})$ gives the one-step prediction, i.e., the curve at time $T+1$, and $\mathrm{Var}(y_{T+1})$ gives the pointwise confidence interval of the one-step prediction. We can also give multi-step predictions: we only need to compute the $p_s$ over multiple steps, and the rest of the computation is the same as for the one-step prediction. Next we provide a real-data prediction experiment. The data we consider are the daily mean and minimum temperature of Angers in Quebec, Canada, from 1971 to 2008, and the mean temperature of Montreal, Canada, from 1971 to 2004 (data source: http://climate.weatheroffice.ec.gc.ca). The original mean temperature data of Montreal and the data fitted via B-splines with 30 basis functions are shown in the upper panels of Fig. 2. We take all the data except the last year as the training dataset, and the data of the last year as test data. The one-step prediction results for the mean temperature of Montreal and of Angers are displayed in the lower panels of Fig. 2. From the results we can see that our model gives an acceptable prediction; meanwhile, because of the high dimension and the
Fig. 2. Up-Left: the original mean temperature data of Montreal; Up-Right: The Montreal data fitted by B-spline; Down-Left: the one-step prediction result of Montreal data; Down-Right: the one-step prediction result of mean temperature of Angers, Quebec
insufficient number of observations, the traditional VAR model is infeasible here. So our model provides a feasible prediction technique for functional time series systems.
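The prediction rule just described — decode the current state, read the transition probabilities $p_s$ out of the transition matrix, and average the per-state mean functions — can be sketched as follows. This is our own minimal illustration; the GP covariance terms and the Viterbi decoding itself are omitted.

```python
import numpy as np

def one_step_mean(A, mean_curves, s0):
    """One-step-ahead prediction mean E(y_{T+1}) = sum_s p_s u_s,
    where p_s = a_{s0,s} is the transition probability out of the
    current (Viterbi-decoded) state s0.

    A:           (S, S) HMM transition matrix
    mean_curves: (S, M) per-state mean functions u_s on an M-point grid
    s0:          state of the last observed curve
    """
    p = A[s0]                 # p_s = P(w_{T+1} = s | w_T = s0)
    return p @ mean_curves    # (M,) predicted curve

def multi_step_mean(A, mean_curves, s0, h):
    """h-step-ahead mean: propagate the state distribution h steps
    through the transition matrix, then mix the state means."""
    p = np.eye(A.shape[0])[s0]
    p = p @ np.linalg.matrix_power(A, h)
    return p @ mean_curves
```

Only the mixing weights change between horizons, which is why the paper notes that multi-step prediction reuses the one-step computation.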
4 Conclusion
In this paper, we introduced a new model, the Gaussian process hidden Markov model, for functional time series systems; it characterizes the curve properties and the time series properties of such systems respectively. We considered the classification and prediction problems with the new model, and the experimental results demonstrate that our model is suitable for characterizing functional time series systems.
References

1. Ramsay, J.O., Silverman, B.W.: Functional Data Analysis, 2nd edn. Springer, Heidelberg (2005)
2. Rasmussen, C., Williams, C.: Gaussian Processes for Machine Learning. MIT Press, Cambridge (2006)
3. Seeger, M.: Gaussian processes for machine learning. International Journal of Neural Systems 14(2), 69–106 (2004)
4. Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77, 257–286 (1989)
5. Shi, J.Q., Wang, B.: Curve prediction and clustering with mixtures of Gaussian process functional regression models. Stat. Comput. 18, 267–283 (2008)
6. Schwaighofer, A., Tresp, V., Yu, K.: Learning Gaussian process kernels via hierarchical Bayes. In: NIPS 17 (2005)
7. Besse, P.C., Cardot, H., Stephenson, D.B.: Autoregressive forecasting of some functional climatic variations. Scand. J. Statist. 27, 673–687 (2000)
8. Titterington, D.M., Smith, A.F.M., Makov, U.E.: Statistical Analysis of Finite Mixture Distributions. Wiley, New York (1985)
9. McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley, New York (2000)
10. Fernandez, C., Green, P.: Modelling spatially correlated data via mixtures: a Bayesian approach. J. Roy. Stat. Soc. Ser. B 64, 805–826 (2002)
11. Huang, X., Ariki, Y., Jack, M.: Hidden Markov Models for Speech Recognition. Edinburgh University Press (1990)
12. Xie, H., Anreae, P., Zhang, M., Warren, P.: Learning models for English speech recognition. In: Proceedings of the 27th Conference on Australasian Computer Science, pp. 323–329 (2004)
13. Liebert, M.A.: Use of runs statistics for pattern recognition in genomic DNA sequences. Journal of Computational Biology 11, 107–124 (2004)
14. Bilmes, J.A.: A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models (1998)
15. McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions. John Wiley and Sons, New York (1997)
16. Gaffney, S., Smyth, P.: Curve clustering with random effects regression mixtures. In: Bishop, C., Frey, B. (eds.) Proc. Ninth Inter. Workshop on Artificial Intelligence and Statistics (2003)
17. James, G., Sugar, C.: Clustering for sparsely sampled functional data. J. Am. Stat. Assoc. 98, 397–408 (2003)
18. Damon, J., Guillas, S.: The inclusion of exogenous variables in functional autoregressive ozone forecasting. Environmetrics 13, 759–774 (2002)
19. Kargin, V., Onatski, A.: Curve forecasting by functional autoregression. Journal of Multivariate Analysis 99, 2508–2526 (2008)
20. Shi, J.Q., Murray-Smith, R., Titterington, D.M.: Hierarchical Gaussian process mixtures for regression. Stat. Comput. 15, 31–41 (2005)
21. Algirdas, L.: Functional data analysis for cash flow and transactions intensity continuous-time prediction using Hilbert-valued autoregressive processes. European Journal of Operational Research 185, 1607–1614 (2008)
22. Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)
Optimization Algorithm of Scheduling Six Parallel Activities to Three Pairs Order Activities

Xiuhua Zhao, Jianxun Qi, Shisen Lv, and Zhixiong Su

School of Business Administration, North China Electric Power University, Beijing, China
[email protected]
Abstract. Adjusting 2N parallel activities into N activity pairs within resource limits in a CPM (Critical Path Method) network plan is a special resource allocation problem, and a hot topic in the field of project scheduling. So far, no simple and effective method has been designed to solve it. In this paper, an optimization algorithm is developed to adjust 6 parallel activities into 3 activity pairs. First, an algorithm applicable in any circumstance is designed to calculate the tardiness; then the standard activity pair theory and the normalized activity pair theory are established; finally, an optimization method is developed on the basis of these theories and algorithms.

Keywords: project management; scheduling optimization; CPM network.
scheduling problem, which can only be applied to small networks; the heuristic methods used in references [6,7] have serious limitations. Reference [8] introduced the development and characteristics of heuristic methods and classified them into two groups, the serial scheduling scheme and the parallel scheduling scheme, introducing more than 30 priority rules. A solution based on a heuristic method is near-optimal, but optimality is never guaranteed: the purpose of a heuristic is to find a feasible solution quickly and efficiently, and it cannot predict the gap between a feasible solution and an optimal one. Moreover, the solution to a given problem varies with the heuristic method, and vice versa. There are also many exact procedures, such as linear programming approaches, 0-1 programming approaches, branch-and-bound procedures, and other implicit enumeration methods. But none of these methods can be applied to large and complex networks; they succeed only on small networks with no more than about 50 activities, as in reference [9]. However, the problems that arise most frequently in project management often involve 4 or 6 parallel activities. It is therefore important to find a simple and practical optimization method. Based on the characteristics of the CPM network, the preceding main chain theory and the succeeding main chain theory are developed in this paper, and an optimization algorithm is designed for the 3 activity pairs on the basis of these theories. The method is simple and can be applied in any circumstance.
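For concreteness, the serial schedule generation scheme mentioned above can be sketched as follows. This is a minimal single-resource version with a pluggable priority rule; the data layout and names are our own illustration, not taken from any of the cited references.

```python
def serial_sgs(acts, capacity, priority):
    """Serial schedule generation scheme: repeatedly pick, by the given
    priority rule, an eligible activity (all predecessors scheduled) and
    start it at the earliest precedence- and resource-feasible time.

    acts: {name: (duration, resource_demand, [predecessor names])}
    capacity: renewable resource capacity per time unit
    priority: key function on activity names (smaller = scheduled first)
    """
    start, usage = {}, {}  # usage[t] = resource units in use at time t
    while len(start) < len(acts):
        eligible = [a for a in acts if a not in start
                    and all(p in start for p in acts[a][2])]
        a = min(eligible, key=priority)        # the priority rule decides
        dur, dem, preds = acts[a]
        # earliest precedence-feasible start
        t = max((start[p] + acts[p][0] for p in preds), default=0)
        # shift right until the resource profile admits the activity
        while any(usage.get(u, 0) + dem > capacity for u in range(t, t + dur)):
            t += 1
        start[a] = t
        for u in range(t, t + dur):
            usage[u] = usage.get(u, 0) + dem
    return start
```

Swapping the `priority` key (shortest duration, most total successors, latest start time, etc.) reproduces the "more than 30 priority rules" idea: the scheme is fixed, only the ranking changes.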
2 Conceptions

Activity pair and its tardiness. If two parallel activities A, B are adjusted to a serial chain $A \to B$, this chain is called an activity pair and denoted $(AB)$. The resulting delay of the total project is called the tardiness of the activity pair and denoted $[AB]$.

Ternary pair. If six parallel activities $A_1, A_2, A_3, B_1, B_2, B_3$ are adjusted to $(A_1B_1)$, $(A_2B_2)$, $(A_3B_3)$, the scheme is called a ternary pair and denoted $\begin{pmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \end{pmatrix}$. The resulting delay of the project is called the tardiness of the ternary pair and denoted $\begin{bmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \end{bmatrix}$.

Normalized ternary pair. If $EF_{A_1} \le EF_{A_2} \le EF_{A_3}$ and $LS_{B_1} \le LS_{B_2} \le LS_{B_3}$, then $\begin{pmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \end{pmatrix}$ is called a normalized ternary pair.

Ternary pair normalization and cognate ternary pairs. By changing the positions of $A_1, A_2, A_3$ and of $B_1, B_2, B_3$, $\begin{pmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \end{pmatrix}$ is adjusted to the normalized $\begin{pmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \end{pmatrix}^{*}$; this process is called ternary pair normalization. Ternary pairs with the same normalization result are called cognate ternary pairs.
Activity focus. The sum of $ES_A$ and $LF_A$ is called the focus of A, denoted $C_A$.

Standard ternary pair. If $C_{A_1} \le C_{B_1}$, $C_{A_2} \le C_{B_2}$, $C_{A_3} \le C_{B_3}$, then $\begin{pmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \end{pmatrix}$ is called a standard ternary pair.

Ternary pair standardization and similar ternary pairs. By changing the positions of $A_i, B_i$, $\begin{pmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \end{pmatrix}$ is adjusted to the standard ternary pair $\begin{pmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \end{pmatrix}^{\nabla}$; this process is called ternary pair standardization. Ternary pairs with the same standardization result are called similar ternary pairs.
3 Main Lemmas and Theorems

Lemma 1. The length $\mu_i^{*}$ of the longest path between an arbitrary node $(i)$ and the initial node $(1)$ equals the earliest start time $ES_{ij}$ of activity $(i,j)$, viz.

$$\mu_i^{*} = ES_{ij} \qquad (1)$$

Lemma 2. The length $\mu_j^{\oplus}$ of the longest path between node $(j)$ and the final node $(w)$ equals the length $\mu^{\nabla}$ of the critical path minus the latest finish time of node $(j)$, viz.

$$\mu_j^{\oplus} = \mu^{\nabla} - LF_j \qquad (2)$$

Theorem 1. For any activity pair $(AB)$, the tardiness $[AB]$ can be calculated as

$$[AB] = \max\{EF_A - LS_B,\, 0\} \qquad (3)$$

Proof. After the two parallel activities A and B are adjusted to $(AB)$, no other activity is adjusted; therefore, apart from some newly added paths, no path of the original network is amended. The new paths all pass through A and B in order; assume the longest of them is $\mu_{A \to B}^{\nabla}$. Since the completion time of a project equals the length of the longest path in its CPM network, when $\mu_{A \to B}^{\nabla} \le \mu^{\nabla}$ the critical path is still $\mu^{\nabla}$ and there is no tardiness. When $\mu_{A \to B}^{\nabla} > \mu^{\nabla}$, $\mu_{A \to B}^{\nabla}$ becomes the new critical path, the project completion time is $\mu_{A \to B}^{\nabla}$, and the tardiness of the project is $[AB] = \mu_{A \to B}^{\nabla} - \mu^{\nabla}$. Suppose $A = (u, v)$ and $B = (s, t)$. According to the preceding main chain theory, the longest path between the start node $(u)$ of activity A and the initial node of the network has length $\mu_u^{*} = ES_u$; by definition $ES_u = ES_{uv}$, so $\mu_u^{*} = ES_{uv} = ES_A$.
According to Lemma 2, the longest path between the finish node $(t)$ of activity B and the final node $(w)$ has length $\mu_t^{\oplus} = \mu^{\nabla} - LF_t$; by definition $LF_t = LF_{st}$, so

$$\mu_t^{\oplus} = \mu^{\nabla} - LF_{st} = \mu^{\nabla} - LF_B$$

From the succeeding and preceding main chain theories, the longest path that passes through both activities A and B is

$$\mu_{A \to B}^{\nabla} = \mu_u^{*} + T_A + T_B + \mu_t^{\oplus}$$

From the above it can be deduced that

$$\mu_{A \to B}^{\nabla} = ES_A + T_A + T_B + (\mu^{\nabla} - LF_B) = \mu^{\nabla} + EF_A - LS_B$$

When $\mu_{A \to B}^{\nabla} > \mu^{\nabla}$, the project completion time is delayed, and the tardiness is $[AB] = \mu_{A \to B}^{\nabla} - \mu^{\nabla} = EF_A - LS_B$. From the above it can be deduced that $[AB] = \max\{EF_A - LS_B,\, 0\}$.

Theorem 2. When two parallel activities A and B are adjusted to a serial chain as
$A \to B$ or $B \to A$: if $C_A \le C_B$, then $A \to B$ is the better scheme; if $C_A \ge C_B$, then $B \to A$ is better.

Proof. 1) When $[AB] > 0$ and $[BA] > 0$, by the tardiness theorem

$$[AB] - [BA] = (EF_A - LS_B) - (EF_B - LS_A) = (EF_A + LS_A) - (EF_B + LS_B) = C_A - C_B$$

When $C_A < C_B$, $[AB] < [BA]$, so $A \to B$ is better; when $C_A > C_B$, $[AB] > [BA]$, so $B \to A$ is better. 2) When $[AB] > 0$ and $[BA] = 0$, by the tardiness theorem $(EF_A - LS_B) - (EF_B - LS_A) = (EF_A + LS_A) - (EF_B + LS_B) = C_A - C_B > 0 \Rightarrow C_A > C_B$; since $[AB] > [BA]$, $B \to A$ is better. 3) When $[AB] = 0$ and $[BA] > 0$, $A \to B$ is better. 4) When $[AB] = [BA] = 0$, $A \to B$ and $B \to A$ are equivalent. From cases 1)-4) the theorem is proved.

Theorem 3 (tardiness of a ternary pair). $\begin{bmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \end{bmatrix} = \max\{[A_1B_1], [A_2B_2], [A_3B_3]\}$.
Proof. Suppose the activities $A_1, A_2, A_3, B_1, B_2, B_3$ are parallel to each other, so they share no common path. When $(A_1B_1)$ is created, the longest of the new paths that ensue is $\mu_{A_1 \to B_1}^{\nabla}$. Certainly $A_2, A_3, B_2, B_3$ are never on $\mu_{A_1 \to B_1}^{\nabla}$; otherwise it would contradict the fact that $A_1, A_2, A_3, B_1, B_2, B_3$ are parallel.
Therefore, when $(A_2B_2)$ is created, there is no adjustment to the path $\mu_{A_1 \to B_1}^{\nabla}$ in terms of its activities and their durations, so $\mu_{A_1 \to B_1}^{\nabla}$ does not change. The longest of the new paths that ensue is $\mu_{A_2 \to B_2}^{\nabla}$; certainly it does not pass through $A_1, A_3, B_1, B_3$, otherwise the parallelism assumption would be contradicted. Similarly, when $(A_3B_3)$ is created, there is no change to $\mu_{A_1 \to B_1}^{\nabla}$ and $\mu_{A_2 \to B_2}^{\nabla}$, and the longest new path $\mu_{A_3 \to B_3}^{\nabla}$ does not pass through $A_1, A_2, B_1, B_2$.

The project makespan depends on the longest path of the network. The new paths created by $\begin{pmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \end{pmatrix}$ can be classified into three types: 1) paths passing through both $A_1$ and $B_1$, the longest being $\mu_{A_1 \to B_1}^{\nabla}$; 2) paths passing through both $A_2$ and $B_2$, the longest being $\mu_{A_2 \to B_2}^{\nabla}$; 3) paths passing through both $A_3$ and $B_3$, the longest being $\mu_{A_3 \to B_3}^{\nabla}$. Therefore the longest path in the new network has length $\mu'^{\nabla} = \max\{\mu_{A_1 \to B_1}^{\nabla},\, \mu_{A_2 \to B_2}^{\nabla},\, \mu_{A_3 \to B_3}^{\nabla},\, \mu^{\nabla}\}$. According to the definition of the ternary pair,
$$\begin{bmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \end{bmatrix} = \max\{[A_1B_1],\, [A_2B_2],\, [A_3B_3]\}$$

Theorem 4. The tardiness of the normalized ternary pair is less than or equal to that of its other cognate ternary pairs, viz. $\begin{bmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \end{bmatrix}^{*} \le \begin{bmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \end{bmatrix}$. 3) If $\begin{pmatrix} A_1 & A_2 & A_3 \\ B_1 & B_3 & B_2 \end{pmatrix}$, $\begin{pmatrix} A_1 & A_2 & A_3 \\ B_3 & B_1 & B_2 \end{pmatrix}$, or another pair on the same activities is taken, it can likewise be proved that the theorem is correct.

Theorem 5. The tardiness of the standard ternary pair is less than or equal to that of the similar ternary pairs, viz. $\begin{bmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \end{bmatrix}^{\nabla} \le \begin{bmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \end{bmatrix}$.

Proof. If $\begin{pmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \end{pmatrix}^{\nabla} = \begin{pmatrix} B_1 & B_2 & B_3 \\ A_1 & A_2 & A_3 \end{pmatrix}$, then by the definition $C_{B_i} \le C_{A_i}$, and by Theorem 2, $[A_iB_i] \ge [B_iA_i]$. From Theorem 3 it can be deduced that

$$\begin{bmatrix} B_1 & B_2 & B_3 \\ A_1 & A_2 & A_3 \end{bmatrix} \le \begin{bmatrix} A_1 & B_2 & B_3 \\ B_1 & A_2 & A_3 \end{bmatrix} \le \begin{bmatrix} A_1 & A_2 & B_3 \\ B_1 & B_2 & A_3 \end{bmatrix} \le \begin{bmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \end{bmatrix} \;\Rightarrow\; \begin{bmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \end{bmatrix}^{\nabla} \le \begin{bmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \end{bmatrix}$$

Similarly, the theorem can be proved in the other circumstances.
4 The Standard Normalization Algorithm

4.1 Description of the Algorithm

The algorithm is described as follows:

Step 1. Calculate the focuses of the six parallel activities and order them so that $C_{A_1} \le C_{A_2} \le C_{A_3} \le C_{A_4} \le C_{A_5} \le C_{A_6}$.

Step 2. List the following 5 standard ternary pairs:

$$\begin{pmatrix} A_1 & A_2 & A_3 \\ A_4 & A_5 & A_6 \end{pmatrix}, \begin{pmatrix} A_1 & A_3 & A_4 \\ A_2 & A_5 & A_6 \end{pmatrix}, \begin{pmatrix} A_1 & A_3 & A_5 \\ A_2 & A_4 & A_6 \end{pmatrix}, \begin{pmatrix} A_1 & A_2 & A_4 \\ A_3 & A_5 & A_6 \end{pmatrix}, \begin{pmatrix} A_1 & A_2 & A_5 \\ A_3 & A_4 & A_6 \end{pmatrix}$$

Normalize and standardize these ternary pairs repeatedly.

Step 3. Calculate the tardiness values of the standard, normalized ternary pairs; the one with the minimum value is the best solution.

4.2 Correctness Analysis of the Algorithm
According to the definition, the cognate ternary pairs of a given pair include the following 6 types:

$$\begin{pmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \end{pmatrix}, \begin{pmatrix} A_1 & A_3 & A_2 \\ B_1 & B_3 & B_2 \end{pmatrix}, \begin{pmatrix} A_3 & A_1 & A_2 \\ B_3 & B_1 & B_2 \end{pmatrix}, \begin{pmatrix} A_2 & A_1 & A_3 \\ B_2 & B_1 & B_3 \end{pmatrix}, \begin{pmatrix} A_3 & A_2 & A_1 \\ B_3 & B_2 & B_1 \end{pmatrix}, \begin{pmatrix} A_2 & A_3 & A_1 \\ B_2 & B_3 & B_1 \end{pmatrix}$$

Also by the definition, ternary pairs on different activities are different from each other. For similar ternary pairs there are 8 different types:

$$\begin{pmatrix} A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \end{pmatrix}, \begin{pmatrix} B_1 & A_2 & A_3 \\ A_1 & B_2 & B_3 \end{pmatrix}, \begin{pmatrix} A_1 & B_2 & A_3 \\ B_1 & A_2 & B_3 \end{pmatrix}, \begin{pmatrix} A_1 & A_2 & B_3 \\ B_1 & B_2 & A_3 \end{pmatrix}, \begin{pmatrix} A_1 & B_2 & B_3 \\ B_1 & A_2 & A_3 \end{pmatrix}, \begin{pmatrix} B_1 & A_2 & B_3 \\ A_1 & B_2 & A_3 \end{pmatrix}, \begin{pmatrix} B_1 & B_2 & A_3 \\ A_1 & A_2 & B_3 \end{pmatrix}, \begin{pmatrix} B_1 & B_2 & B_3 \\ A_1 & A_2 & A_3 \end{pmatrix}$$

Therefore each ternary group contains $6 \times 8 = 48$ different ternary pairs. From the definition of the ternary group and the standard ternary pair theorem, the standardization of these 48 ternary pairs yields the same standard ternary pair, which has the minimum tardiness value. Therefore any ternary pair in a group can be used to compute the group's standard ternary pair. By the definition, the ternary pair varies with the group. For any six parallel activities there are $6! = 720$ feasible ternary pairs, and hence $720 \div 48 = 15$ groups in total. By the above analysis, the optimal solution can be obtained from 15 ternary pairs, one from each group. The 15 standard ternary pairs of Step 1 are as follows:
As a matter of fact, they come from the 5 cognate classes listed above. According to the definition of cognateness and the tardiness theorem of the normalized ternary pair, cognate ternary pairs have the same normalization result. Therefore any five ternary pairs from different cognate classes can represent all 15 different standard ternary pairs. We select the first one from each class and call them the basic ternary pairs. From the standard ternary pair theorem and the normalized ternary pair theorem, the best pair must be both standard and normalized. So if a basic ternary pair is not normalized, it is normalized first; if the result is no longer standard, a standardization process ensues, which changes its class. As there are only 5 classes, if a normalized basic ternary pair is not standard, it is eliminated. Step 2 is therefore correct. From the standard and normalized ternary pair theorems, the best pair must be standard and normalized; therefore the standard, normalized ternary pair with the minimum tardiness value must be the best solution.
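As a cross-check of the algorithm's claim, a brute-force reference that scores all $6! = 720$ arrangements directly can be sketched as follows. This is an illustration of ours, not the paper's method; the paper's algorithm reaches the same optimum while examining only the 5 basic ternary pairs.

```python
from itertools import permutations

def tardiness(EF_A, LS_B):
    # [AB] = max{EF_A - LS_B, 0} (Theorem 1).
    return max(EF_A - LS_B, 0)

def ternary_tardiness(pairs, EF, LS):
    # Tardiness of a ternary pair: the max over its three chains (Theorem 3).
    return max(tardiness(EF[a], LS[b]) for a, b in pairs)

def best_ternary_pair(acts, EF, LS):
    """Brute-force reference: try every arrangement of six parallel
    activities into three sequencing chains and keep the cheapest.
    acts: iterable of 6 names; EF, LS: dicts of CPM times per name."""
    best_val, best_pairs = None, None
    for p in permutations(acts):
        pairs = [(p[0], p[3]), (p[1], p[4]), (p[2], p[5])]
        t = ternary_tardiness(pairs, EF, LS)
        if best_val is None or t < best_val:
            best_val, best_pairs = t, pairs
    return best_val, best_pairs
```

Running both this reference and the 5-candidate procedure on the same CPM data is a quick way to validate an implementation of the paper's algorithm.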
5 Conclusion

Based on the properties of the CPM network, this paper developed the preceding main chain theory and the succeeding main chain theory. With these theories, the tardiness theorems for the duality pair and the ternary pair were proved. Then the standard ternary pair theorem, the barycentre theorem, and the normalized ternary pair theorem were developed, and a rigorous demonstration was given for the above problem for the first time. Finally, an optimization method was designed for the 3 activity pairs on the basis of these theorems. The method is simple and can be applied in any circumstance.

Acknowledgments. This work was supported by the Natural Science Foundation of China (70671040) and the Beijing Municipal Commission of Education (X90017).
References

1. Chen, Y.H., Men, F.C.: The barycenter theory and its application in progress control. The Construction Optimization 15, 30–32 (2002)
2. Qi, J.X.: The optimization method of normal activity pairs. Journal of North China Electric Power University 2, 106–116 (1988)
3. Ulusoy, G., Ozdam, L.: Heuristic performance and network/resource characteristics in resource-constrained project scheduling. Journal of Operations Research 40, 1145–1152 (1989)
4. Akpan, E.O.P.: Job-shop sequencing problems via network scheduling technique. International Journal of Operations & Production Management 16, 76–86 (1996)
5. Whitehouse, G.E.: Systems Analysis and Design Using Network Techniques. Prentice-Hall, New Jersey (1973)
6. Elmaghraby, S.E.: On criticality and sensitivity in activity networks. European Journal of Operational Research 127, 220–238 (2000)
7. Montemanni, R., Gambardella, L.M., Donati, A.V.: A branch and bound algorithm for the robust shortest path problem with interval data. Operations Research Letters 32, 225–232 (2004)
8. Bai, S.J.: Modern Project Management, pp. 83–105. Machinery Industry Press, Beijing (2005)
9. Li, X.M., Qi, J.X., Niu, D.X.: The sequential optimal decision of three parallel procedure chains. Journal of Chinese Management Science 1, 46–51 (2007)
Research on the Optimization Decision-Making Two Row-Sequencing-Pairs of Activities with Slacks

Shisen Lv, Jianxun Qi, Xiuhua Zhao, and Zhixiong Su

School of Business Administration, North China Electric Power University, Beijing, China
[email protected]
Abstract. In operations research, scheduling parallel activities into sequential ones is a typical resource-constrained project scheduling problem, and a complicated optimization problem. In this paper, on the basis of the characteristics of the CPM (Critical Path Method) network, theorems on the deficient values of sequencing-pairs and on standard row-sequencing-pairs are deduced. Based on these theorems, an optimization method for selecting 4 activities from N parallel activities to constitute 2 row-sequencing-pairs is designed, and it is proved that the method yields the optimal solution.

Keywords: project scheduling; CPM network planning; row-sequencing-pair; deficient value.
and adjust them to 2 row-sequencing-pairs, all N activities of the whole CPM network must be considered; therefore we must consider 2N constraint equations and inequalities in N variables. In this paper, according to the basic characteristics of the CPM network, we deduce the deficient value theorems for the sequencing-pair and the row-sequencing-pair, and the theorem of the standard row-sequencing-pair. Based on these theorems, we give an optimization method for selecting 4 activities from N parallel activities to constitute 2 row-sequencing-pairs, and we prove that this method is optimal and universal.
2 Basic Conceptions

Sequencing pair and its deficient value. If parallel activities A, B are adjusted so that A precedes B, viz. $A \to B$, the result is called a sequencing pair, denoted $(AB)$. The delay of the project duration caused by the adjustment is called the deficient value of the sequencing pair $(AB)$, denoted $[AB]$.

Row-sequencing-pair. If four parallel activities $A_1, A_2, B_1, B_2$ are adjusted to two sequencing pairs $(A_1B_1)$, $(A_2B_2)$, they are called a row-sequencing-pair, denoted $\begin{pmatrix} A_1 & A_2 \\ B_1 & B_2 \end{pmatrix}$. The delay of the project is called the deficient value of $\begin{pmatrix} A_1 & A_2 \\ B_1 & B_2 \end{pmatrix}$, denoted $\begin{bmatrix} A_1 & A_2 \\ B_1 & B_2 \end{bmatrix}$.

Standard row-sequencing-pair. If $EF_{A_1} \le EF_{A_2}$ and $LS_{B_1} \le LS_{B_2}$, then $\begin{pmatrix} A_1 & A_2 \\ B_1 & B_2 \end{pmatrix}$ is called a standard row-sequencing-pair. For a row-sequencing-pair $\begin{pmatrix} Z_1 & Z_2 \\ Y_1 & Y_2 \end{pmatrix}$, interchanging $Z_1$ and $Z_2$, or $Y_1$ and $Y_2$, to make it a standard row-sequencing-pair $\begin{pmatrix} Z_1 & Z_2 \\ Y_1 & Y_2 \end{pmatrix}^{*}$ is called standardization from $\begin{pmatrix} Z_1 & Z_2 \\ Y_1 & Y_2 \end{pmatrix}$ to $\begin{pmatrix} Z_1 & Z_2 \\ Y_1 & Y_2 \end{pmatrix}^{*}$.

Slacks of a row-sequencing-pair. From the N parallel activities in the network, if any four activities $A_1, A_2, B_1, B_2$ are selected to constitute a row-sequencing-pair $\begin{pmatrix} A_1 & A_2 \\ B_1 & B_2 \end{pmatrix}$, the remaining parallel activities are called the slacks of $\begin{pmatrix} A_1 & A_2 \\ B_1 & B_2 \end{pmatrix}$.
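To make the search problem concrete before the theorems, a brute-force reference over all choices of four activities can be sketched as follows. The names are illustrative; it applies the deficient-value formulas stated as Lemmas 1 and 2 in the next section, whereas the paper's theorems reduce the enumeration to a constant number of candidates.

```python
from itertools import permutations

def deficient(EF_A, LS_B):
    # Deficient value of the sequencing pair A -> B (Lemma 1 below).
    return max(EF_A - LS_B, 0)

def best_row_sequencing_pair(EF, LS):
    """Brute force over all ordered choices of four distinct parallel
    activities (A1, A2, B1, B2), forming chains A1 -> B1 and A2 -> B2.
    The pair's deficient value is the max of its two chains (Lemma 2
    below).  EF, LS: dicts of CPM times keyed by activity name."""
    best_val, best_chains = None, None
    for a1, a2, b1, b2 in permutations(EF, 4):
        val = max(deficient(EF[a1], LS[b1]), deficient(EF[a2], LS[b2]))
        if best_val is None or val < best_val:
            best_val, best_chains = val, ((a1, b1), (a2, b2))
    return best_val, best_chains
```

This $O(N^4)$ enumeration is only a correctness oracle; the theorems below single out the optimal pair directly from the EF and LS orderings.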
3 Basic Theorems

Lemma 1 (deficient value theorem of the sequencing-pair).

$$[AB] = \begin{cases} EF_A - LS_B, & EF_A > LS_B \\ 0, & EF_A \le LS_B \end{cases} = \max\{0,\, EF_A - LS_B\}$$
Lemma 2. The deficient value of a row-sequencing-pair is

$$\begin{bmatrix} A_1 & A_2 \\ B_1 & B_2 \end{bmatrix} = \max\{[A_1B_1],\, [A_2B_2]\}$$

Lemma 3 (theorem of the standard row-sequencing-pair). Among cognate row-sequencing-pairs, the deficient value of the standard row-sequencing-pair is minimal:

$$\begin{bmatrix} A_1 & A_2 \\ B_1 & B_2 \end{bmatrix}^{*} \le \begin{bmatrix} A_1 & A_2 \\ B_1 & B_2 \end{bmatrix}$$

Theorem 1. Suppose activities $A, B, C, D, Z_1, Z_2 \in E$, $EF_{Z_1}$ is larger than $EF_A$ or $EF_B$, $EF_{Z_2}$ is larger than the other, and $\{Z_1, Z_2\} \cap \{C, D\} = \emptyset$. Then, from the standard row-sequencing-pair $\begin{pmatrix} A & B \\ C & D \end{pmatrix}$, choosing an arbitrary activity in $\{Z_1, Z_2\}$ to replace an activity in $\{A, B\}$ whose EF value is smaller, the deficient values of all newly constituted row-sequencing-pairs are larger than $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$.

Proof. $\begin{pmatrix} A & B \\ C & D \end{pmatrix}$ is a standard row-sequencing-pair with $EF_A \le EF_B$ and $LS_C \le LS_D$. Then:

(1) If $EF_A \le EF_{Z_1} \le EF_B \le EF_{Z_2}$, then
① Replace activity B of $\begin{pmatrix} A & B \\ C & D \end{pmatrix}$ with $Z_2$; then

$$[BD] = \max\{EF_B - LS_D,\, 0\} \le \max\{EF_{Z_2} - LS_D,\, 0\} = [Z_2D]$$

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \max\{[AC],\, [BD]\} \le \max\{[AC],\, [Z_2D]\} = \begin{bmatrix} A & Z_2 \\ C & D \end{bmatrix}$$

Since $EF_A \le EF_{Z_2}$ and $LS_C \le LS_D$, according to Lemma 3 the deficient values of the pairs cognate with $\begin{pmatrix} A & Z_2 \\ C & D \end{pmatrix}$ are larger than $\begin{bmatrix} A & Z_2 \\ C & D \end{bmatrix}$, and hence larger than $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$.

② Replace activity A of $\begin{pmatrix} A & B \\ C & D \end{pmatrix}$ with $Z_1$; using the method of ①, we can prove that the deficient values of the new $\begin{pmatrix} Z_1 & B \\ C & D \end{pmatrix}$ and its cognate pairs are larger than $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$.

③ Replace A with $Z_2$; similarly, we reach the same conclusion.

④ Replace $\{A, B\}$ with $\{Z_1, Z_2\}$.

(i) Replace A, B with $Z_1$, $Z_2$ respectively; then $[Z_1C] \ge [AC]$ and $[Z_2D] \ge [BD]$, therefore

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \max\{[AC],\, [BD]\} \le \max\{[Z_1C],\, [Z_2D]\} = \begin{bmatrix} Z_1 & Z_2 \\ C & D \end{bmatrix}$$
Since $EF_{Z_1} \le EF_{Z_2}$ and $LS_C \le LS_D$, $\begin{bmatrix} Z_1 & Z_2 \\ C & D \end{bmatrix}$ is larger than (or equal to) $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$.

(ii) Replace B with $Z_1$ and A with $Z_2$; we obtain $\begin{pmatrix} Z_2 & Z_1 \\ C & D \end{pmatrix}$, a cognate row-sequencing-pair of $\begin{pmatrix} Z_1 & Z_2 \\ C & D \end{pmatrix}$; from (i) we get $\begin{bmatrix} A & B \\ C & D \end{bmatrix} \le \begin{bmatrix} Z_2 & Z_1 \\ C & D \end{bmatrix}$.

(2) Suppose $EF_A \le EF_B \le EF_{Z_1} \le EF_{Z_2}$.

① Replace B with $Z_1$ or $Z_2$; using the method of (1)-①, Theorem 1 holds.

② Replace A with $Z_1$, obtaining $\begin{pmatrix} Z_1 & B \\ C & D \end{pmatrix}$. Since $EF_{Z_1} \ge EF_B$, $\begin{pmatrix} Z_1 & B \\ C & D \end{pmatrix}^{*} = \begin{pmatrix} B & Z_1 \\ C & D \end{pmatrix}$; using the method of (1)-①, we get $\begin{bmatrix} B & Z_1 \\ C & D \end{bmatrix} \ge \begin{bmatrix} A & B \\ C & D \end{bmatrix}$. From Lemma 3, $\begin{bmatrix} Z_1 & B \\ C & D \end{bmatrix}$ is larger than (or equal to) $\begin{bmatrix} B & Z_1 \\ C & D \end{bmatrix}$, therefore it is larger than $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$.

③ Replace A with $Z_2$; using the method of (2)-②, Theorem 1 holds.

④ Replace A, B with $Z_1$, $Z_2$; using (1)-④, Theorem 1 is correct.

Theorem 2. Suppose $A, B, C, D, Y_1, Y_2 \in E$, $LS_{Y_1}$ is smaller than at least one of $LS_C$ and $LS_D$, $LS_{Y_2}$ is smaller than the other, and $\{Y_1, Y_2\} \cap \{A, B\} = \emptyset$. Then, selecting arbitrary activities from $\{Y_1, Y_2\}$ to replace activities in $\{C, D\}$ whose latest start time is larger than theirs in the standard row-sequencing-pair $\begin{pmatrix} A & B \\ C & D \end{pmatrix}$, the deficient values of the newly constituted pairs are larger than $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$.

Proof. It is similar to the proof of Theorem 1.

Theorem 3. Suppose activities $A, B, C, D, Z_1, Z_2, Y_1, Y_2 \in E$; $EF_{Z_1}$ is larger than at least one of $EF_A$ and $EF_B$, and $EF_{Z_2}$ is larger than the other; $\{Z_1, Z_2\} \cap \{C, D\} = \emptyset$; $LS_{Y_1}$ is smaller than at least one of $LS_C$ and $LS_D$, and $LS_{Y_2}$ is smaller than the other; and $\{Y_1, Y_2\} \cap \{A, B\} = \emptyset$. Then, replacing activities of $\{A, B\}$ with $\{Z_1, Z_2\}$ and replacing any activities of $\{C, D\}$ with $\{Y_1, Y_2\}$ in the standard row-sequencing-pair $\begin{pmatrix} A & B \\ C & D \end{pmatrix}$, the deficient value of the new row-sequencing-pair is larger than or equal to $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$.
Proof. The replacement proceeds as follows: (1) replace activities of $\{A, B\}$ with activities of $\{Z_1, Z_2\}$; (2) replace activities of $\{C, D\}$ with activities of $\{Y_1, Y_2\}$ in the row-sequencing-pair constituted in (1). According to Theorems 1 and 2, Theorem 3 is then correct.

Theorem 4. If $\{A_1, A_2\} \cap \{B_1, B_2\} = \emptyset$, the optimal row-sequencing-pair is $\begin{pmatrix} A_1 & A_2 \\ B_2 & B_1 \end{pmatrix}$.

Proof. Since $\{A_1, A_2\} \cap \{B_1, B_2\} = \emptyset$, $\begin{pmatrix} A_1 & A_2 \\ B_2 & B_1 \end{pmatrix}$ exists. Suppose $EF_{A_1} \le EF_{A_2} \le \cdots \le EF_{A_n}$ and $LS_{B_n} \le \cdots \le LS_{B_2} \le LS_{B_1}$; then this row-sequencing-pair is standard. Suppose $A_i \in \{A_3, A_4, \cdots, A_n\}$. If $A_i \in \{B_1, B_2\}$, after replacing an activity of $\{A_1, A_2\}$ with $A_i$, $\begin{pmatrix} A_1 & A_2 \\ B_2 & B_1 \end{pmatrix}$ still exists; if $A_i \notin \{B_1, B_2\}$, replacing an activity of $\{A_1, A_2\}$ with $A_i$ in $\begin{pmatrix} A_1 & A_2 \\ B_2 & B_1 \end{pmatrix}$, by Lemma 1 the resulting deficient values are larger than or equal to $\begin{bmatrix} A_1 & A_2 \\ B_2 & B_1 \end{bmatrix}$. Except for $\begin{pmatrix} A_1 & A_2 \\ B_2 & B_1 \end{pmatrix}$ and its cognate pairs, every row-sequencing-pair constituted by activities of E can be regarded as a new row-sequencing-pair, together with its cognate pairs, obtained by replacing activities of $\{A_1, A_2, B_1, B_2\}$. According to Theorems 1, 2 and 3, the deficient values of such row-sequencing-pairs are larger than or equal to $\begin{bmatrix} A_1 & A_2 \\ B_2 & B_1 \end{bmatrix}$. Since $\begin{pmatrix} A_1 & A_2 \\ B_2 & B_1 \end{pmatrix}$ is standard, according to Lemma 3 the deficient values of the row-sequencing-pairs cognate with it are larger than or equal to $\begin{bmatrix} A_1 & A_2 \\ B_2 & B_1 \end{bmatrix}$. Therefore the deficient values of all row-sequencing-pairs composed of activities of E are larger than or equal to $\begin{bmatrix} A_1 & A_2 \\ B_2 & B_1 \end{bmatrix}$, and thus $\begin{pmatrix} A_1 & A_2 \\ B_2 & B_1 \end{pmatrix}$ is the optimal one.

Theorem 5. If $\{A_1, A_2\} \cap \{B_1, B_2\} = \{Z\}$, $B_3 \notin \{A_1, A_2\}$, $A_3 \notin \{B_1, B_2\}$, and
$R = \{A_1, A_2\} - Z$, $H = \{B_1, B_2\} - Z$, then the optimal row-sequencing-pair constituted by the N parallel activities of E is $\begin{pmatrix} R & A_3 \\ B_2 & B_1 \end{pmatrix}$ or $\begin{pmatrix} A_1 & A_2 \\ B_3 & H \end{pmatrix}$.

Proof. (1) Suppose $A_1 = B_1 = Z$; then $R = A_2$ and $H = B_2$, so the two candidates become $\begin{pmatrix} A_2 & A_3 \\ B_2 & B_1 \end{pmatrix}$ and $\begin{pmatrix} A_1 & A_2 \\ B_3 & B_2 \end{pmatrix}$.
According to Lemma 3, the optimal row-sequencing-pair constituted by the activities of $E$ must be standard, so we only consider the standard row-sequencing-pairs. We can replace activities of $\begin{pmatrix}A_2 & A_3\\ B_2 & B_1\end{pmatrix}$ or $\begin{pmatrix}A_1 & A_2\\ B_3 & B_2\end{pmatrix}$ with some proper activities of $E$ and standardize them to obtain pairs that are not cognate with $\begin{pmatrix}A_2 & A_3\\ B_2 & B_1\end{pmatrix}$ and $\begin{pmatrix}A_1 & A_2\\ B_3 & B_2\end{pmatrix}$. Since we supposed $EF_{A_1} \le EF_{A_2} \le EF_{A_3} \le \cdots \le EF_{A_n}$, $LS_{B_n} \le \cdots \le LS_{B_3} \le LS_{B_2} \le LS_{B_1}$, $A_i \ne A_j$, $B_i \ne B_j$ for $i \ne j$, and $A_i \ne B_j$ (except $A_1 = B_1$), it follows from Theorems 1, 2 and 3 that if we replace activities of $\begin{pmatrix}A_1 & A_2\\ B_3 & B_2\end{pmatrix}$ or $\begin{pmatrix}A_2 & A_3\\ B_2 & B_1\end{pmatrix}$ with any other activities of $E$ and the new row pairs exist, then after standardization their deficient values are larger than or equal to $\begin{bmatrix}A_1 & A_2\\ B_3 & B_2\end{bmatrix}$ or $\begin{bmatrix}A_2 & A_3\\ B_2 & B_1\end{bmatrix}$; thus the optimal row-sequencing-pair is $\begin{pmatrix}A_1 & A_2\\ B_3 & B_2\end{pmatrix}$ or $\begin{pmatrix}A_2 & A_3\\ B_2 & B_1\end{pmatrix}$.
(2) If $Z = A_1 = B_2$, $Z = A_2 = B_1$, or $Z = A_2 = B_2$, the same conclusion can be obtained.

Theorem 6. If $\{A_1, A_2\} \cap \{B_1, B_2\} = \{Z\}$, $\{A_1, A_2, B_1, B_2\} \cap \{A_3, B_3\} = Q \ne \emptyset$, $R = \{A_1, A_2\} - Z$, $H = \{B_1, B_2\} - Z$, then

(1) when $Q = \{A_3\}$, the optimal row-sequencing-pair is $\begin{pmatrix}R & A_4\\ B_2 & B_1\end{pmatrix}$ or $\begin{pmatrix}A_1 & A_2\\ B_3 & H\end{pmatrix}$;
(2) when $Q = \{B_3\}$, the optimal row-sequencing-pair is $\begin{pmatrix}R & A_3\\ B_2 & B_1\end{pmatrix}$ or $\begin{pmatrix}A_1 & A_2\\ B_4 & H\end{pmatrix}$;
(3) when $Q = \{A_3, B_3\}$, the optimal row-sequencing-pair is $\begin{pmatrix}R & A_4\\ B_2 & B_1\end{pmatrix}$ or $\begin{pmatrix}A_1 & A_2\\ B_4 & H\end{pmatrix}$.

Proof. (1) From the supposed condition and Theorems 1, 2 and 3, it is easy to prove that $\begin{pmatrix}R & A_4\\ B_2 & B_1\end{pmatrix}$ and $\begin{pmatrix}A_1 & A_2\\ B_3 & H\end{pmatrix}$ are standard. Replace activities of $\begin{pmatrix}R & A_4\\ B_2 & B_1\end{pmatrix}$ or $\begin{pmatrix}A_1 & A_2\\ B_3 & H\end{pmatrix}$ with other activities of $E$; if the new pairs are row-sequencing-pairs, then the deficient values of the standardized row-sequencing-pairs are larger than or equal to $\begin{bmatrix}R & A_4\\ B_2 & B_1\end{bmatrix}$ or $\begin{bmatrix}A_1 & A_2\\ B_3 & H\end{bmatrix}$. Then we can replace some activities of $\begin{pmatrix}R & A_4\\ B_2 & B_1\end{pmatrix}$ or $\begin{pmatrix}A_1 & A_2\\ B_3 & H\end{pmatrix}$ and standardize the new row-sequencing-pairs to standard pairs that are not cognate with $\begin{pmatrix}R & A_4\\ B_2 & B_1\end{pmatrix}$ or $\begin{pmatrix}A_1 & A_2\\ B_3 & H\end{pmatrix}$. According to Lemma 3, $\begin{bmatrix}R & A_4\\ B_2 & B_1\end{bmatrix}$ and $\begin{bmatrix}A_1 & A_2\\ B_3 & H\end{bmatrix}$ are minimal among their cognate pairs, so the two row-sequencing-pairs are optimal.
428
S. Lv et al.
Similarly, we can prove (2) and (3). Theorem 6 is correct.

Theorem 7. For $\{A_1, A_2\} \cap \{B_1, B_2\} = \{A_1, A_2\}$, the optimal row-sequencing-pair is one of $\begin{pmatrix}A_1 & A_2\\ B_4 & B_3\end{pmatrix}$, $\begin{pmatrix}A_3 & A_4\\ B_2 & B_1\end{pmatrix}$, $\begin{pmatrix}A_1 & A_3\\ B_4 & A_2\end{pmatrix}$, $\begin{pmatrix}A_1 & A_4\\ B_3 & A_2\end{pmatrix}$, $\begin{pmatrix}A_2 & A_3\\ B_4 & A_1\end{pmatrix}$ or $\begin{pmatrix}A_2 & A_4\\ B_3 & A_1\end{pmatrix}$.

Proof. Since $\{A_1, A_2\} \cap \{B_1, B_2\} = \{A_1, A_2\}$ and the $N$ parallel activities are different from each other, it is easy to prove that $\{A_3, A_4\} \cap \{B_1, B_2\} = \emptyset$ and $\{B_3, B_4\} \cap \{A_1, A_2\} = \emptyset$. Therefore, if $A_3 \ne B_3$, then $\begin{pmatrix}A_1 & A_2\\ B_4 & B_3\end{pmatrix}$, $\begin{pmatrix}A_3 & A_4\\ B_2 & B_1\end{pmatrix}$, $\begin{pmatrix}A_1 & A_3\\ B_3 & A_2\end{pmatrix}$, $\begin{pmatrix}A_2 & A_3\\ B_3 & A_1\end{pmatrix}$ are row-sequencing-pairs, and we can deduce that they are standard according to the supposition; if $A_3 = B_3$, then $\begin{pmatrix}A_1 & A_2\\ B_4 & B_3\end{pmatrix}$, $\begin{pmatrix}A_3 & A_4\\ B_2 & B_1\end{pmatrix}$, $\begin{pmatrix}A_1 & A_3\\ B_4 & A_2\end{pmatrix}$, $\begin{pmatrix}A_1 & A_4\\ B_3 & A_2\end{pmatrix}$, $\begin{pmatrix}A_2 & A_3\\ B_4 & A_1\end{pmatrix}$, $\begin{pmatrix}A_2 & A_4\\ B_3 & A_1\end{pmatrix}$ are standard. Using a method similar to Lemmas 6 and 7, we can prove that the deficient value of any standard row-sequencing-pair that is not cognate with the pairs above is larger than or equal to one of the deficient values of these pairs, so it is not optimal. According to Theorem 3, the optimal row-sequencing-pair must be one of the above. Theorem 7 is correct.
4 Optimization Algorithm of Row-Sequencing-Pair with Slacks

The algorithm is described as follows:

(1) Arrange the $N$ activities in ascending order of earliest finish time (EF): $EF_{A_1} \le EF_{A_2} \le EF_{A_3} \le \cdots \le EF_{A_n}$; then rearrange and renumber the activities in descending order of latest starting time (LS): $LS_{B_1} \ge LS_{B_2} \ge LS_{B_3} \ge \cdots \ge LS_{B_n}$.

(2) Examine $A_1, A_2, B_1, B_2$ and ascertain whether any of them are the same activity.

① If no activities coincide, then $\begin{pmatrix}A_1 & A_2\\ B_2 & B_1\end{pmatrix}$ is a row-sequencing-pair and it is the optimal one; otherwise it is not a row-sequencing-pair.

② If there is exactly one couple of identical activities, $A_i = B_j$ with $A_i \in \{A_1, A_2\}$ and $B_j \in \{B_1, B_2\}$, then $A_1, A_2, B_1, B_2$ are actually three activities. From the remaining $N-3$ activities, select the activity $A_r$ whose earliest finish time (EF) is minimal and the activity $B_t$ whose latest starting time (LS) is maximal (note: $A_r = B_t$ is possible). Replacing $A_i$ with $A_r$ and $B_j$ with $B_t$ yields two row-sequencing-pairs; standardize them and calculate their deficient values. The one with the minimal deficient value is optimal.
③ If there are two pairs of identical activities, then $A_1, A_2, B_1, B_2$ are actually two activities, $(A_1, A_2)$ or $(B_1, B_2)$. Examine whether $A_3$ is the same as $B_3$.

(i) If $A_3 \ne B_3$, calculate $\begin{bmatrix}A_1 & A_2\\ B_4 & B_3\end{bmatrix}$, $\begin{bmatrix}A_3 & A_4\\ B_2 & B_1\end{bmatrix}$, $\begin{bmatrix}A_1 & A_3\\ B_3 & A_2\end{bmatrix}$, $\begin{bmatrix}A_2 & A_3\\ B_3 & A_1\end{bmatrix}$; the row-sequencing-pair with the minimum deficient value is the optimal one.

(ii) If $A_3 = B_3$, calculate $\begin{bmatrix}A_1 & A_2\\ B_4 & B_3\end{bmatrix}$, $\begin{bmatrix}A_3 & A_4\\ B_2 & B_1\end{bmatrix}$, $\begin{bmatrix}A_1 & A_3\\ B_4 & A_2\end{bmatrix}$, $\begin{bmatrix}A_1 & A_4\\ B_3 & A_2\end{bmatrix}$, $\begin{bmatrix}A_2 & A_3\\ B_4 & A_1\end{bmatrix}$, $\begin{bmatrix}A_2 & A_4\\ B_3 & A_1\end{bmatrix}$; the row-sequencing-pair with the minimum deficient value is the optimal one.
Proof. From (1) and Theorem 4, we know (2)-① is correct; from (1), Theorems 5 and 6, we know (2)-② is correct; from (1) and Theorem 7, we know (2)-③ is correct.
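The case analysis of steps (1) and (2) can be sketched in code. The paper's deficient-value computation is not defined in this excerpt, so it is passed in as a hypothetical callback, and the candidate lists simply mirror Theorems 4-7 above:

```python
def optimal_row_sequencing_pair(acts, deficient_value):
    """acts: {name: (EF, LS)}.  deficient_value is a hypothetical callback
    scoring a candidate ((top_left, top_right), (bottom_left, bottom_right));
    the paper's deficient-value computation is defined outside this excerpt."""
    A = sorted(acts, key=lambda a: acts[a][0])                 # A1, A2, ... ascending EF
    B = sorted(acts, key=lambda a: acts[a][1], reverse=True)   # B1, B2, ... descending LS
    A1, A2, B1, B2 = A[0], A[1], B[0], B[1]
    shared = {A1, A2} & {B1, B2}
    if not shared:                                             # step (2)-1: optimal directly
        return ((A1, A2), (B2, B1))
    rest = [a for a in A if a not in {A1, A2, B1, B2}]
    if len(shared) == 1:                                       # step (2)-2: one shared activity
        Ar = rest[0]                                           # minimal EF among the rest
        Bt = max(rest, key=lambda a: acts[a][1])               # maximal LS among the rest
        z = shared.pop()
        top = tuple(Ar if a == z else a for a in (A1, A2))
        bot = tuple(Bt if b == z else b for b in (B2, B1))
        cands = [(top, (B2, B1)), ((A1, A2), bot)]
    else:                                                      # step (2)-3: two shared pairs
        A3, A4, B3, B4 = A[2], A[3], B[2], B[3]
        if A3 != B3:
            cands = [((A1, A2), (B4, B3)), ((A3, A4), (B2, B1)),
                     ((A1, A3), (B3, A2)), ((A2, A3), (B3, A1))]
        else:
            cands = [((A1, A2), (B4, B3)), ((A3, A4), (B2, B1)),
                     ((A1, A3), (B4, A2)), ((A1, A4), (B3, A2)),
                     ((A2, A3), (B4, A1)), ((A2, A4), (B3, A1))]
    return min(cands, key=deficient_value)
```

With distinct EF and LS orderings, step (2)-① fires and the pair is returned without any deficient-value evaluation; in the other cases only the small candidate lists from the theorems need to be scored.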
5 Conclusion

In this paper, we have given some definitions of relationships among activities and several theorems about the relationships and scheduling of activities in a CPM network. Furthermore, on the basis of these theorems, we have proposed an optimal method to solve a sequencing problem, namely choosing 4 activities from N activities to arrange two row-sequencing-pairs in a CPM network, which had not been solved optimally by heuristic methods.

Acknowledgments. R.B.G. thanks the Natural Science Foundation of China (70671040) and the Beijing Municipal Commission of Education (X90017).
References

1. Bai, S.J.: Modern Project Management, pp. 83–105. Machinery Industry Press, Beijing (2005)
2. Elmaghraby, S.E.: Activity nets: A guided tour through some recent developments. European Journal of Operational Research 82, 383–408 (1995)
3. Lin, M., Lin, Z.X.: A cost-effective critical path approach for service priority selections in grid computing economy. Decision Support Systems 42, 1628–1640 (2006)
4. Chanas, S., Zieliński, P.: The computational complexity of the criticality problems in a network with interval activity times. European Journal of Operational Research 136, 541–550 (2002)
5. Cao, G.M., Bai, S.J.: Three aspects of international development of PERT/CPM network techniques. System Engineering Theory and Practice 3, 41–46 (1993)
6. Bai, S.J.: Network planning and heuristic optimal method with restrained resources and its evaluation and choice. Chinese Management Science 11, 30–38 (1993)
7. Montemanni, R., Gambardella, L.M., Donati, A.V.: A branch and bound algorithm for the robust shortest path problem with interval data. Operations Research Letters 32, 225–232 (2004)
8. Wang, Z.T.: Network Planning Techniques, pp. 48–62. LiaoNing Press, Shenyang (1994)
A Second-Order Modified Version of Mehrotra-type Predictor-Corrector Algorithm for Convex Quadratic Optimization

Qiang Hu and Mingwang Zhang

College of Science, China Three Gorges University, Yichang 443002, Hubei, China
[email protected], [email protected]
Abstract. Mehrotra-type predictor-corrector algorithms are the core of interior point method software packages. An example given recently by Salahi et al. has shown that a second-order modified version of a Mehrotra-type predictor-corrector algorithm for linear optimization may be forced to take very small steps in order to remain in a certain neighborhood of the central path. This motivated them to introduce a safeguard strategy that guarantees a lower bound for the maximum feasible step size in the corrector, and subsequently ensures polynomial complexity. Based on their research, this paper extends the algorithm to convex quadratic optimization. The polynomial complexity of the new algorithm is derived, namely $O\!\left(n \log \frac{(x^0)^T s^0}{\varepsilon}\right)$. Since the search directions are not orthogonal, the new algorithm differs from their method in the way of computing the barrier parameter and performing the complexity analysis.

Keywords: Convex quadratic optimization, Predictor-corrector methods, Interior point methods, Second-order Mehrotra-type methods, Polynomial complexity.
1 Introduction

Since the landmark paper of Karmarkar [1], the interior-point method (IPM) has become one of the most active research areas. Predictor-corrector methods are not only among the most efficient IPMs, but also the backbones of IPM software packages such as [2]. Recently, Salahi et al. [3] analyzed a modified version of a Mehrotra-type predictor-corrector algorithm and, by means of an example, showed that the algorithm might imply a very small step size for the corrector step in order to remain in a certain neighborhood of the central path, and hence take many iterations to converge. To avoid this, Salahi et al. [3,4] introduced a safeguard strategy to guarantee a lower bound for the maximum feasible step size in the corrector. Their algorithms have polynomial complexity while keeping the practical efficiency. However, they have only been discussed for linear optimization (LO) problems.

Y. Tan, Y. Shi, and K.C. Tan (Eds.): ICSI 2010, Part II, LNCS 6146, pp. 430–438, 2010.
© Springer-Verlag Berlin Heidelberg 2010
A Second-Order Modified Version
431
In this paper, we extend the second-order modification of the algorithm of [4] to convex quadratic optimization (CQO) problems. Based on their work, we also incorporate a safeguard in the algorithm to guarantee a lower bound for the maximum step size in the corrector. Apart from this, we use the corresponding analytical method for CQO problems and prove that the iteration complexity of our new algorithm is $O\!\left(n \log \frac{(x^0)^T s^0}{\varepsilon}\right)$. Since the search directions are not orthogonal, the algorithm differs from their method in the way of computing the barrier parameter and performing the complexity analysis.

Throughout the paper, we consider the following CQO problem in standard form:
$$(CQO) \quad \min \left\{c^T x + \tfrac{1}{2} x^T Q x : Ax = b, \; x \ge 0\right\},$$
with its dual problem
$$(DCQO) \quad \max \left\{b^T y - \tfrac{1}{2} x^T Q x : A^T y - Qx + s = c, \; s \ge 0\right\},$$
where $Q \in S^n_+$ ($S^n_+$ denotes the cone of symmetric positive semidefinite $n \times n$ matrices), $A \in R^{m \times n}$ with $\mathrm{rank}(A) = m$, and $x, s, c \in R^n$, $b, y \in R^m$. Without loss of generality [5], we assume that both CQO and DCQO satisfy the interior point condition (IPC), i.e., there exists $(x^0, y^0, s^0)$ such that
$$Ax^0 = b, \quad x^0 > 0, \quad A^T y^0 - Qx^0 + s^0 = c, \quad s^0 > 0.$$

In the paper [4] the authors have already introduced both the basic idea of primal-dual IPMs and the central path for CQO. In what follows, we briefly describe the variation of Mehrotra's second-order predictor-corrector algorithm for CQO. In the predictor step, the affine scaling search direction
$$A\Delta x^a = 0, \quad A^T \Delta y^a - Q\Delta x^a + \Delta s^a = 0, \quad s\Delta x^a + x\Delta s^a = -xs \qquad (1)$$
is computed and the maximum feasible step size $\alpha_a$ is calculated so that $(x + \alpha_a \Delta x^a, s + \alpha_a \Delta s^a) \ge 0$. However, the algorithm does not take such a step. It just uses the information from the predictor step to compute the centering direction, defined as follows:
$$A\Delta x = 0, \quad A^T \Delta y - Q\Delta x + \Delta s = 0, \quad s\Delta x + x\Delta s = \mu e - \alpha_a \Delta x^a \Delta s^a, \qquad (2)$$
with
$$\mu = \left(\frac{g_a}{g}\right)^2 \frac{g_a}{n}, \qquad (3)$$
where $g_a = (x + \alpha_a \Delta x^a)^T (s + \alpha_a \Delta s^a)$ and $g = x^T s$.
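Systems (1) and (2) share the same Newton left-hand side and differ only in the right-hand side of the complementarity equation. A minimal dense NumPy sketch; `newton_direction` is a hypothetical helper, not from the paper:

```python
import numpy as np

def newton_direction(A, Q, x, s, rhs):
    """Solve  A dx = 0,  A^T dy - Q dx + ds = 0,  s*dx + x*ds = rhs.
    Eliminating ds = (rhs - s*dx)/x yields the reduced KKT system
    (Q + X^{-1}S) dx - A^T dy = X^{-1} rhs,  A dx = 0."""
    m, n = A.shape
    K = np.block([[Q + np.diag(s / x), -A.T],
                  [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([rhs / x, np.zeros(m)]))
    dx, dy = sol[:n], sol[n:]
    ds = (rhs - s * dx) / x
    return dx, dy, ds

# Predictor (1): rhs = -x*s.  Corrector (2): rhs = mu*e - alpha_a*dxa*dsa.
```

This is only an illustrative dense formulation; production IPM codes factor the reduced system with sparse Cholesky instead.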
432
Q. Hu and M. Zhang
Finally, the maximum step size $\alpha$ is computed so that the next iterate, given by
$$x(\alpha) := x + \alpha\sqrt{\alpha_a}\,\Delta x^a + \alpha^2 \Delta x, \quad y(\alpha) := y + \alpha\sqrt{\alpha_a}\,\Delta y^a + \alpha^2 \Delta y, \quad s(\alpha) := s + \alpha\sqrt{\alpha_a}\,\Delta s^a + \alpha^2 \Delta s, \qquad (4)$$
belongs to a certain neighborhood of the central path. In fact, by the definition of $\mu$ we can obtain the following crucial result:
$$(1 - \alpha_a)^3 \mu_g \le \mu = \left(\frac{g_a}{g}\right)^2 \frac{g_a}{n} \le \mu_g. \qquad (5)$$
This paper is organized as follows. In Section 2, we estimate the maximum feasible step size for both the predictor and corrector steps and describe our algorithmic scheme for CQO. In Section 3, the polynomial complexity of our algorithm is derived. Computational results are presented in Section 4. For self-completeness, we list four technical lemmas in Appendix A.

For simplicity, in the rest of this paper we use the following notation:
$$I = \{1, \ldots, n\}, \quad I_+ = \{i \in I \mid \Delta x^a_i \Delta s^a_i \ge 0\}, \quad I_- = \{i \in I \mid \Delta x^a_i \Delta s^a_i < 0\},$$
$$F = \{(x, y, s) \in R^n \times R^m \times R^n \mid Ax = b, \; A^T y - Qx + s = c, \; x \ge 0, \; s \ge 0\},$$
$$\Delta x^c = \Delta x^a + \Delta x, \quad \Delta y^c = \Delta y^a + \Delta y, \quad \Delta s^c = \Delta s^a + \Delta s, \quad \mu_g = \frac{x^T s}{n}, \quad X = \mathrm{diag}(x).$$

2 The Safeguarded Algorithm
In this section we present the estimation of the maximum feasible step size for both the predictor and corrector steps, and outline our second-order modified algorithm for CQO. Our algorithm works in the negative infinity norm neighborhood defined by
$$N^-_\infty(\gamma) := \{(x, y, s) \in F^0 : x_i s_i \ge \gamma \mu_g \quad \forall i \in I\},$$
where $\gamma \in (0, 1)$ is a constant independent of $n$.

Theorem 1. (Theorem 3.1 in [3]) Suppose that the current iterate $(x, y, s) \in N^-_\infty(\gamma)$ and let $(\Delta x^a, \Delta y^a, \Delta s^a)$ be the solution of (1). Then the maximum feasible step size $\alpha_a \in (0, 1]$ such that $(x(\alpha_a), s(\alpha_a)) \ge 0$ satisfies
$$\alpha_a \ge \frac{\gamma}{n}. \qquad (6)$$

To estimate the bound of the corrector step, the following lemmas are necessary.
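In an implementation, the step size $\alpha_a$ of Theorem 1 is found by the standard ratio test over the negative components of the direction; a minimal sketch, assuming NumPy arrays (`max_feasible_step` is a hypothetical helper name):

```python
import numpy as np

def max_feasible_step(x, s, dx, ds):
    """Largest alpha in (0, 1] with x + alpha*dx >= 0 and s + alpha*ds >= 0."""
    alpha = 1.0
    for v, dv in ((x, dx), (s, ds)):
        neg = dv < 0                      # only shrinking components bind
        if neg.any():
            alpha = min(alpha, float(np.min(-v[neg] / dv[neg])))
    return alpha
```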
Lemma 1. (Lemma 4.1 in [3]) If $(\Delta x^a, \Delta y^a, \Delta s^a)$ is the solution of (1), then
$$-\Delta x_i^a \Delta s_i^a \le \frac{1}{\alpha_a}\left(\frac{1}{\alpha_a} - 1\right) x_i s_i, \quad \forall i \in I_-. \qquad (7)$$

Lemma 2. Suppose that the current iterate $(x, y, s) \in N^-_\infty(\gamma)$, $(\Delta x^a, \Delta y^a, \Delta s^a)$ is the solution of (1), and $(\Delta x, \Delta y, \Delta s)$ is the solution of (2) with $\mu \ge 0$. Then
$$\|\Delta x^a \Delta s\| \le \left(\frac{1}{\gamma}\left(\frac{\mu}{\mu_g}\right)^2 + \frac{\alpha_a}{2\gamma}\frac{\mu}{\mu_g} + \frac{\alpha_a^2 - 4\alpha_a + 4}{16}\right)^{1/2} n\mu_g,$$
$$\|\Delta x \Delta s^a\| \le \left(\frac{1}{\gamma}\left(\frac{\mu}{\mu_g}\right)^2 + \frac{\alpha_a}{2\gamma}\frac{\mu}{\mu_g} + \frac{\alpha_a^2 - 4\alpha_a + 4}{16}\right)^{1/2} n\mu_g.$$

Proof. Analogous to the proof of Lemma 4.3 in [4], one can easily prove the lemma.

Lemma 3. Suppose that the current iterate $(x, y, s) \in N^-_\infty(\gamma)$ and $(\Delta x, \Delta y, \Delta s)$ is the solution of (2) with $\mu \ge 0$. Then
$$\|\Delta x \Delta s\| \le 2^{-3/2}\left(\frac{1}{\gamma}\left(\frac{\mu}{\mu_g}\right)^2 + \frac{\alpha_a}{2\gamma}\frac{\mu}{\mu_g} + \frac{\alpha_a^2 - 4\alpha_a + 4}{16}\right) n\mu_g.$$

Proof. Multiplying the third equation of (2) by $(XS)^{-1/2}$ and applying Lemmas A.1 and A.4, the proof is straightforward.

In what follows we give a bound on $\alpha_a$, the violation of which might imply a very small step size in the corrector step.

Theorem 2. Suppose that the current iterate $(x, y, s) \in N^-_\infty(\gamma)$, $(\Delta x^a, \Delta y^a, \Delta s^a)$ is the solution of (1), and $(\Delta x, \Delta y, \Delta s)$ is the solution of (2) with $\mu$ as defined by (3). Then, for $\alpha_a \in (0, 1]$ satisfying
$$\alpha_a < 1 - \left(\frac{\gamma\left(t + \frac{1}{4}\right)}{1-\gamma}\right)^{1/3}, \qquad (8)$$
the maximum step size is strictly positive in the corrector step of the algorithm, where
$$t = \max_{i \in I_+} \frac{\Delta x_i^a \Delta s_i^a}{x_i s_i}. \qquad (9)$$

Proof. Our purpose is to find the maximum step size $\alpha \in (0, 1]$ such that
$$x_i(\alpha) s_i(\alpha) \ge \gamma \mu_g(\alpha), \qquad (10)$$
i.e.,
$$x_i(\alpha)s_i(\alpha) \ge \gamma(1 - \alpha\sqrt{\alpha_a})\mu_g + \gamma\alpha^2\mu + \frac{\gamma\alpha^3\sqrt{\alpha_a}}{n}\left((\Delta x^a)^T\Delta s + \Delta x^T\Delta s^a\right) + \frac{\gamma\alpha^4}{n}\,\Delta x^T\Delta s.$$
Thus, the worst case for this inequality happens when $x_i s_i = \gamma\mu_g$, $\Delta x_i^a\Delta s_i + \Delta x_i\Delta s_i^a < 0$ and $\Delta x_i\Delta s_i < 0$; then, for all $i \in I_+$, one gets
$$x_i(\alpha)s_i(\alpha) = (1 - \alpha\sqrt{\alpha_a})x_i s_i + \alpha^2\mu + \alpha^3\sqrt{\alpha_a}\,(\Delta x_i^a\Delta s_i + \Delta x_i\Delta s_i^a) + \alpha^4\Delta x_i\Delta s_i$$
$$\ge (1 - \alpha\sqrt{\alpha_a})x_i s_i + \alpha^2\mu - \alpha^2\Delta x_i^a\Delta s_i^a + \alpha^3\Delta x_i^c\Delta s_i^c.$$
In addition, according to the fact that $\frac{\mu}{\mu_g} < 1$ by (5), we can get
$$(\Delta x^a)^T\Delta s + \Delta x^T\Delta s^a \le \|D\Delta x^a\|\,\|D^{-1}\Delta s\| + \|D\Delta x\|\,\|D^{-1}\Delta s^a\| \le 2\sqrt{\frac{n\mu_g}{\gamma}}\left(\frac{1}{2\gamma} + \frac{\alpha_a(\alpha_a - 2)^2}{16}\right)^{1/2}\sqrt{n\mu_g} \le \frac{7}{\gamma^{1/2}}\, n\mu_g.$$
Similarly, based on Lemmas A.1 and A.4, one can prove that
$$\Delta x^T\Delta s \le \left(\frac{1}{\gamma}\left(\frac{\mu}{\mu_g}\right)^2 + \frac{\alpha_a}{2\gamma}\frac{\mu}{\mu_g} + \frac{\alpha_a^2 - 4\alpha_a + 4}{16}\right) n\mu_g \le \frac{7}{16}\, n\gamma^{-1}\mu_g.$$
To sum up, for all $i \in I_+$, (10) holds whenever
$$(1 - \alpha\sqrt{\alpha_a})x_i s_i + \alpha^2\mu - \alpha^2\Delta x_i^a\Delta s_i^a + \alpha^3\Delta x_i^c\Delta s_i^c \ge \gamma(1 - \alpha\sqrt{\alpha_a})\mu_g + \gamma\alpha^2\mu + 7\alpha^3\sqrt{\alpha_a}\,\gamma^{1/2}\mu_g + \frac{7}{16}\alpha^4\mu_g.$$
By the definition of $t$ given in (9), one has $\Delta x_i^a\Delta s_i^a \le t x_i s_i$. It is sufficient to have $\alpha \ge 0$ for which
$$(1 - \alpha\sqrt{\alpha_a} - \alpha^2 t)x_i s_i + \alpha^2\mu + \alpha^3\Delta x_i^c\Delta s_i^c \ge \gamma(1 - \alpha\sqrt{\alpha_a})\mu_g + \gamma\alpha^2\mu + 7\alpha^3\sqrt{\alpha_a}\,\gamma^{1/2}\mu_g + \frac{7}{16}\alpha^4\mu_g. \qquad (13)$$
Using the fact that $x_i s_i \ge \gamma\mu_g$ and $t \le \frac{1}{4}$, (13) holds for
(16)
Therefore, the previous inequality certainly holds whenever αa < 1−
γ(t+ 18 ) 1−γ
13
.
The corresponding inequalities in (10) for all $i \in I_-$ also hold for these values of $\alpha_a$, which completes the proof.

Note that $t \in [0, \frac{1}{4}]$ by Lemma A.2. To get an explicit lower bound for the maximum step size in the corrector, we let $\alpha_a = 1 - \left(\frac{\gamma}{2(1-\gamma)}\right)^{1/3}$ and $\mu = (1 - \alpha_a)^3\mu_g = \frac{\gamma}{2(1-\gamma)}\mu_g$.

Corollary 1. Let $\mu = \frac{\gamma}{2(1-\gamma)}\mu_g$ and $\gamma \in (0, \frac{1}{3})$. Then
$$\|\Delta x^a\Delta s\| \le \frac{\sqrt{7}\,n}{2\sqrt{2\gamma}}\,\mu_g \quad \text{and} \quad \|\Delta x\Delta s^a\| \le \frac{\sqrt{7}\,n}{2\sqrt{2\gamma}}\,\mu_g.$$

Corollary 2. Let $\mu = \frac{\gamma}{2(1-\gamma)}\mu_g$ and $\gamma \in (0, \frac{1}{3})$. Then
$$\|\Delta x\Delta s\| \le \frac{7n\mu_g}{16\sqrt{2\gamma}}.$$
Theorem 3. Suppose that the current iterate $(x, y, s) \in N^-_\infty(\gamma)$, $(\Delta x^a, \Delta y^a, \Delta s^a)$ is the solution of (1), and $(\Delta x, \Delta y, \Delta s)$ is the solution of (2) with $\mu = \frac{\gamma}{2(1-\gamma)}\mu_g$. Then
$$\alpha \ge \frac{\gamma^{3/2}}{2\sqrt{14\alpha_a}\,n}.$$

Proof. Our goal is to find the maximum feasible step size $\alpha \in (0, 1]$ such that (10) holds. By an analysis similar to that of Theorem 2, it is sufficient to have
$$(1-\gamma)\mu + \alpha\sqrt{\alpha_a}\,(\Delta x_i^a\Delta s_i + \Delta x_i\Delta s_i^a) + \alpha^2\Delta x_i\Delta s_i \ge \frac{1}{8}\gamma\mu_g \qquad (17)$$
for $\alpha \le \min\left\{\frac{\gamma}{16\sqrt{7}}, \left(\frac{\alpha_a\gamma}{7}\right)^{1/2}\right\}$. By Corollaries 1 and 2 and the fact that $\mu = \frac{\gamma}{2(1-\gamma)}\mu_g$, for (17) it is in turn sufficient that
$$\frac{3}{8}\gamma\mu_g - \frac{\sqrt{7\alpha_a}\,\alpha n\mu_g}{\sqrt{2\gamma}} - \frac{7\alpha^2 n\mu_g}{16\sqrt{2\gamma}} \ge 0.$$
Thus, when
$$\alpha^* = \frac{3\gamma^{3/2}}{2\sqrt{14\alpha_a}\,n\left(1 + \sqrt{1 + \dfrac{3\gamma^{3/2}}{16\sqrt{2}\,\alpha_a n}}\right)} \ge \frac{\gamma^{3/2}}{2\sqrt{14\alpha_a}\,n},$$
(17) certainly holds. This completes the proof.

Eventually, we describe the new algorithm as follows.

Algorithm 1
Step 1. Input an accuracy parameter $\varepsilon > 0$ and a proximity parameter $\gamma \in (0, \frac{1}{3})$. Choose a strictly feasible pair $(x^0, y^0, s^0) \in N^-_\infty(\gamma)$.
Step 2. If $x^T s < \varepsilon$, then stop; $(x, y, s)$ is an optimal solution. Otherwise, go to Step 3.
Step 3. Solve (1) and compute the maximum step size $\alpha_a$ such that $(x(\alpha_a), y(\alpha_a), s(\alpha_a)) \in F$; go to Step 4.
Step 4. If $\alpha_a \ge 0.4$, then solve (2) with $\mu = \left(\frac{g_a}{g}\right)^2 \frac{g_a}{n}$ and compute the maximum step size $\alpha$ such that $(x(\alpha), y(\alpha), s(\alpha))$ given by (4) belongs to $N^-_\infty(\gamma)$; if $\alpha < \frac{\gamma^{3/2}}{2\sqrt{14\alpha_a}\,n}$, then solve (2) with $\mu = \frac{\gamma}{2(1-\gamma)}\mu_g$ and compute the maximum step size $\alpha$ such that $(x(\alpha), y(\alpha), s(\alpha)) \in N^-_\infty(\gamma)$. Else, solve (2) with $\mu = \frac{\gamma}{2(1-\gamma)}\mu_g$ and compute the maximum step size $\alpha$ such that $(x(\alpha), y(\alpha), s(\alpha)) \in N^-_\infty(\gamma)$. Then go to Step 5.
Step 5. Set $(x, y, s) = (x(\alpha), y(\alpha), s(\alpha))$ and go back to Step 2.
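Step 4's safeguarded choice of the barrier parameter can be sketched as follows. This is only an illustration of the branching logic as reconstructed above; `barrier_parameter` is a hypothetical name, not from the paper:

```python
import math

def barrier_parameter(g, g_a, n, alpha, alpha_a, gamma):
    """Mehrotra's heuristic mu = (g_a/g)^2 * g_a/n when both the predictor
    step alpha_a and the corrector step alpha are large enough; otherwise
    fall back to the safeguard value gamma/(2(1-gamma)) * mu_g of Theorem 3."""
    mu_g = g / n
    safeguard = gamma / (2.0 * (1.0 - gamma)) * mu_g
    threshold = gamma ** 1.5 / (2.0 * math.sqrt(14.0 * alpha_a) * n)
    if alpha_a >= 0.4 and alpha >= threshold:
        return (g_a / g) ** 2 * (g_a / n)
    return safeguard
```

The safeguard bounds the corrector step away from zero at the cost of a more conservative centering target.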
3 The Complexity Analysis

Theorem 4. Algorithm 1 stops after at most $O\!\left(n \log \frac{(x^0)^T s^0}{\varepsilon}\right)$ iterations with a solution for which $x^T s \le \varepsilon$.

Proof. If $\alpha_a \ge 0.4$ and $\alpha \ge \frac{\gamma^{3/2}}{2\sqrt{14\alpha_a}\,n}$, then $\mu_g(\alpha) \le$
4 Computational Results

Consider the following well-known CQO problem [6] under the MATLAB 7.0 environment:
$$\min \; 2x_1^2 + 3x_2^2 + 5x_3^2 + x_1 + 10x_2 - 3x_3$$
$$\text{s.t.} \quad x_1 + x_2 = 5, \quad x_2 + x_3 = 10, \quad x_1, x_2, x_3 \ge 0.$$
The optimal solution of the problem is $x^* = (0, 5, 5)^T$, and the optimum is $z^* = 235$. We start the algorithm from the following feasible point in the neighborhood $N^-_\infty(0.20)$: $x^0 = (0.5, 4.5, 5.5)^T$, $s^0 = (0, 5, 5)^T$, $y^0 = (-15, 50)^T$, with $\varepsilon = 10^{-6}$. Computational results show that our algorithm stops after at most 24 iterations with a solution for which $x^T s \le \varepsilon$, namely
$$x = (0.000000025484320, 4.999999974515683, 5.000000025484317)^T,$$
$$s = (8.000000505578116, 0.000000034751142, 0.000000038859425)^T.$$
The duality gap is $x^T s = 5.719274071599888 \times 10^{-7}$, and the approximate optimum is $z = 2.350000002038746 \times 10^{2}$.

Acknowledgment. Supported by the Natural Science Foundation of Hubei Province of China (No. 2008CDZ047).
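The reported optimum can be double-checked independently: eliminating the two equality constraints reduces the test problem to a single variable, a quick sketch:

```python
import numpy as np

def objective(x1):
    # x2 = 5 - x1 and x3 = 5 + x1 follow from the equality constraints.
    x2, x3 = 5.0 - x1, 5.0 + x1
    return 2 * x1**2 + 3 * x2**2 + 5 * x3**2 + x1 + 10 * x2 - 3 * x3

grid = np.linspace(0.0, 5.0, 50001)      # x1 in [0, 5] keeps x2, x3 >= 0
best = grid[np.argmin(objective(grid))]
print(best, objective(best))             # minimizer x1 = 0, optimum z* = 235
```

The reduced objective is strictly increasing on $[0, 5]$, confirming $x^* = (0, 5, 5)^T$ and $z^* = 235$.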
References

1. Karmarkar, N.K.: A new polynomial-time algorithm for linear programming. Combinatorica 4, 373–395 (1984)
2. Zhang, Y.: Solving large-scale linear programs by interior point methods under the Matlab environment. Optimization Methods and Software 10, 1–31 (1999)
3. Salahi, M., Peng, J., Terlaky, T.: On Mehrotra-type predictor-corrector algorithms. Technical Report 2005/4, Advanced Optimization Lab, Department of Computing and Software, McMaster University; SIAM Journal on Optimization (2005)
4. Salahi, M., Mahdavi-Amiri, N.: Polynomial time second-order Mehrotra-type predictor-corrector algorithms. Applied Mathematics and Computation 183, 646–658 (2006)
5. Peng, J., Roos, C., Terlaky, T.: Theory and Algorithms for Linear Optimization: An Interior-Point Approach. John Wiley and Sons, Chichester (1997)
6. Chen, F., Zhang, H., Wu, Z.: Preliminary numerical tests in primal-dual interior-point methods for convex quadratic programming. Science Technology and Engineering (1), 97–99 (2009)
Appendix A

In this section, we provide the following four technical lemmas, quoted from [4]. Let $(\Delta x^a, \Delta y^a, \Delta s^a)$ be the solution of (1) and $(\Delta x, \Delta y, \Delta s)$ be the solution of (2).

Lemma A.1. $(\Delta x^a)^T \Delta s^a \ge 0$ and $(\Delta x)^T \Delta s \ge 0$.

Lemma A.2. $\Delta x_i^a \Delta s_i^a \le \frac{x_i s_i}{4}$, $\forall i \in I_+$.

Lemma A.3. $\sum_{i \in I_-} |\Delta x_i^a \Delta s_i^a| \le \sum_{i \in I_+} \Delta x_i^a \Delta s_i^a \le \frac{x^T s}{4}$.

Lemma A.4. Let $p, q \in R^n$ with $p + q = r$ and $p^T q \ge 0$. Then
$$\|Pq\| \le \frac{\sqrt{2}}{4}\|r\|^2 \quad \text{and} \quad p^T q \le \frac{1}{4}\|r\|^2.$$
An Optimization Algorithm of Spare Capacity Allocation by Dynamic Survivable Routing

Zuxi Wang, Li Li*, Gang Sun, and Hanping Hu

Institute for Pattern Recognition and Artificial Intelligence, Huazhong University of Science and Technology; National Key Laboratory of Science & Technology on Multi-spectral Information Processing, Wuhan, China
[email protected]
Abstract. The survivability of MPLS networks has received considerable attention in recent years. One of the key tasks is to route backup paths and allocate spare capacity in the network so as to provide QoS-guaranteed communication services under a set of failure scenarios. This is a complex multi-constraint optimization problem, called spare capacity allocation (SCA). In this paper, a dynamic survivable routing (DSR) algorithm using chaos optimization theory is proposed to solve it. Compared with a traditional SCA algorithm, numerical results show that DSR has satisfying QoS performance.

Keywords: Network Survivability, Spare Capacity Allocation, Survivable Routing, Chaos Optimization, MPLS.
1 Introduction

With the rapid development of the Internet, networks are used to transmit all kinds of real-time services. To respond quickly to network faults and to keep router caches from accumulating large amounts of data, which degrades network performance and QoS, network survivability has become an important research issue. MPLS (Multi-Protocol Label Switching), proposed by the IETF, has gradually become one of the core technologies of IP backbone networks, and its network survivability has attracted broad attention.

Network survivability in MPLS includes two components: survivable network design and restoration schemes [11]. During the design stage, a survivability strategy is integrated into the network design. In the failure recovery stage, re-routing schemes and recovery path switching schemes are mostly adopted. A re-routing recovery scheme without a pre-established recovery path can establish a backup path on demand to restore traffic transmission after the detection of a failure. Since the calculation of new routes and the resource reservation of the new path are time-consuming, it is considerably slower than recovery path switching mechanisms. In the case of recovery path switching, traffic is switched to the pre-established recovery path when the failure occurs, so the recovery is very fast; but the pre-established recovery path reserves a part of the network resources.
On a given two-connected mesh network, deciding how much spare capacity should be reserved on links and how to route backup paths to protect a given working path against a set of failure scenarios is usually treated as a key issue, called the spare capacity allocation problem. There are several existing schemes [1, 2, 4, 5, 6, 7, 9, 10] for this problem. Previous research on spare capacity allocation in mesh-type networks uses either mathematical programming techniques or heuristics to determine the spare capacity allocation as well as the backup paths for all traffic demands. Related methods such as Branch and Bound (BB) [6], Genetic Algorithms (GA) [10], Simulated Annealing (SA) [2] and the Spare Link Placement Algorithm (SLPA) [5] adopt this way of problem solving. All of the above methods remain in the pre-planning phase and can only be implemented centrally. Hence a distributed scheme called Resource Aggregation for Fault Tolerance (RAFT) was proposed by Dovrolis [4] for IntServ services using the Resource Reservation Protocol (RSVP) [1]. Since the RAFT scheme does not consider the chance of spare capacity sharing, two dynamic routing schemes, called Sharing with Partial routing Information (SPI) and Sharing with Complete routing Information (SCI), were presented in [7]. But the redundancy of SPI is not very close to the optimal solutions, and per-flow-based information is necessary for SCI. Then Yu Liu and David Tipper [9] proposed a Successive Survivable Routing (SSR) algorithm which unraveled the SCA problem structure using a matrix-based model. However, they did not give an effective search strategy in the state space of the SCA problem, so the convergence time of the algorithm is unstable and the algorithm is not optimal.

To solve this problem, combining the matrix-based model from SSR with the chaos optimization method, we propose a dynamic survivable routing algorithm that can dynamically restore failures, based on the given working path and on bandwidth and delay constraints, so as to maximize the restoration speed and provide QoS-guaranteed communication services.
2 Proposed Dynamic Survivable Routing Scheme

In this section, we describe the dynamic survivable routing scheme, which uses the matrix-based model from SSR together with the chaos optimization method. The matrix-based model describes the SCA problem structure, while the chaos optimization method is applied to the optimized computation of backup paths and spare capacity allocation; they cooperate to achieve seamless services upon failures. Since the algorithm not only provides survivable services but also dynamically minimizes the total cost of spare capacity in the backup path selection process, we call it dynamic survivable routing (DSR).

2.1 Notations and Definitions

To describe the SCA problem, a network is represented by an undirected graph with N nodes, L links and R flows. A set of matrix-based definitions and the optimization model are given as follows.

N, L, R, K: numbers of nodes, links, flows and failure scenarios;
$A_{N \times L} = [a_{nl}]$: node link incidence matrix;
An Optimization Algorithm of Spare Capacity Allocation
441
$B_{R \times N} = [b_{rn}]$: flow node incidence matrix;
$T_{K \times L} = [t_{kl}]$: failure link incidence matrix, $t_{kl} = 1$ if link $l$ fails in failure $k$;
$U_{R \times K} = [u_{rk}]$: failure flow incidence matrix, $u_{rk} = 1$ if failure $k$ affects flow $r$'s working path;
$C_{R \times L} = [c_{rl}]$: working path link incidence matrix, $c_{rl} = 1$ if link $l$ is used on flow $r$'s working path;
$D_{R \times L} = [d_{rl}]$: backup path link incidence matrix, $d_{rl} = 1$ if link $l$ is used on flow $r$'s backup path;
$S_{L \times K} = [s_{lk}]$: spare provision matrix, $s_{lk}$ is the spare capacity on link $l$ for failure $k$;
$W = \mathrm{Diag}(\{w_r\})$: diagonal matrix of demand bandwidths $w_r$ of the flows $r$;
$M_{R \times L} = [m_{rl}]$: flow tabu-link matrix, $m_{rl} = 1$ if link $l$ should not be used on flow $r$'s backup path;
$h_{L \times 1}$: vector of link spare capacity;
$v_r = \{v_{rl}\}$: vector of cost of additional link spare capacity for flow $r$.

Given the above notation and definitions, and based on the matrix-based model of SSR, the spare capacity allocation problem can be formulated as follows. Objective function:
$$\min_{D,\, h} \; e^T h \qquad (1)$$
Constraints:
$$h \ge S \qquad (2)$$
$$M + D \le 1 \qquad (3)$$
$$DA^T = B \;(\mathrm{mod}\; 2) \qquad (4)$$
$$S_r = w_r (d_r^T u_r) \qquad (5)$$
$$S = \sum_{r=1}^{R} S_r \qquad (6)$$
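The quantities in (2), (5) and (6) are directly computable from the backup path and failure incidence matrices; a small NumPy sketch (`spare_capacity` is a hypothetical helper name):

```python
import numpy as np

def spare_capacity(D, U, w):
    """S = sum_r w_r * outer(d_r, u_r) per (5)-(6); h_l = max_k s_lk is the
    smallest link spare capacity vector satisfying h >= S in (2)."""
    L, K = D.shape[1], U.shape[1]
    S = np.zeros((L, K))
    for r in range(D.shape[0]):
        S += w[r] * np.outer(D[r], U[r])
    return S, S.max(axis=1)
```

Because `h` takes a maximum over failure scenarios rather than a sum, backup paths of flows that never fail together can share the same reserved capacity, which is the source of the savings the model exploits.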
The objective function (1) minimizes the total spare capacity through backup path selection and spare capacity allocation. Constraints (2), (5) and (6) calculate $h$ and $S$. Constraint (3) guarantees that each backup path is link-disjoint from its working path. Constraint (4) guarantees that the given backup paths are feasible.

2.2 Proposed Algorithm

As a universal phenomenon in nonlinear systems, chaos has stochastic, ergodic and regular properties; its ergodicity can be used as an optimization mechanism to effectively prevent the search from being trapped in a local optimum [3]. In the proposed algorithm, the chaos optimization method is applied to the optimized computation of backup paths and spare capacity allocation. The adopted logistic map [8] can be formulated as equation (7), where $u$ is a control parameter; when $u = 4$ and $0 \le x_0 \le 1$, the logistic map exhibits chaotic behavior:
$$x_{n+1} = u x_n (1 - x_n). \qquad (7)$$
Under the above definition, DSR solves the original multi-commodity flow problem by partitioning it into a sequence of single-flow problems. Using the logistic map, the DSR algorithm traverses the state space formed by the flows of the network. Because of the stochastic, ergodic and regular properties, each flow's backup path
442
Z. Wang et al.
can be optimized dynamically. The implementation flow of the DSR algorithm is given as follows. First of all, based on the network topology and QoS requirements, we calculate the working paths and backup paths for each pair of nodes to provide the heuristic information.

Step 1: number the flows of the network 1 to R.
Step 2: initialize the logistic map with m different values, and then obtain m result values by n iterations.
Step 3: quantize the interval (0, 1) uniformly into R levels and map each level to a flow; e.g., a value falling in the interval (0, 1/R) after iteration is mapped to the flow with number 1, and the other values are mapped to flows according to their iteration values.
Step 4: according to the rule of Step 3, map the m iteration values generated by the m logistic maps in one iterative operation to m flows, and push the numbers of the m flows onto the stack ST.
Step 5: pop ST; if ST is empty, turn to Step 4, otherwise get the number of a flow r:
(1) according to the number, get the working path c_r, then calculate u_r and m_r;
(2) collect the current network state information and update the spare provision matrix S;
(3) calculate S_{-r} by the formula S_{-r} = S - S_r and constraint (5), then get h_{-r} = max S_{-r};
(4) let d_r* = e - m_r denote the alternative backup path for flow r and S_r* = w_r (d_r*^T u_r), then calculate h* = max(S_{-r} + S_r*);
(5) calculate v_r by the formula v_r = {v_{rl}} = φ(h*(e - m_r)) - φ(h_{-r}), where φ is a function formulating the bandwidth cost of each link.
Step 6: first exclude all the tabu links marked in the binary flow tabu-link vector m_r of flow r, then use the shortest path algorithm with link weights v_r to find the updated backup path d_r^{new}.
Step 7: replace the original backup path d_r when it has a higher path cost than the updated backup path d_r^{new}; the spare provision matrix S and the link spare capacity vector h are then updated accordingly. If the optimization criterion is satisfied, export the solution and exit; otherwise turn to Step 6.
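The chaotic driver of Steps 2-4 is just the logistic map (7) plus a uniform quantization onto flow numbers; a minimal sketch (helper names are illustrative, not from the paper):

```python
def logistic_sequence(x0, n, u=4.0):
    """Iterate x_{k+1} = u * x_k * (1 - x_k) of (7); with u = 4 and
    x0 in (0, 1) the orbit is chaotic and ergodic on (0, 1)."""
    xs = [x0]
    for _ in range(n):
        xs.append(u * xs[-1] * (1.0 - xs[-1]))
    return xs

def flow_index(x, R):
    """Quantize an iterate x in (0, 1) onto a flow number in 1..R (Step 3)."""
    return min(int(x * R) + 1, R)
```

The ergodicity of the orbit means every flow keeps being revisited for re-optimization, which is what lets DSR escape the locally optimal backup path assignments that trap a purely greedy successive scheme.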
3 Performance Discussion

In this paper, the objective is not only to route an eligible backup path, but also to minimize the total cost of resource reservation and provide survivable services. To evaluate this, we repeatedly ran experiments on a medium-sized network topology. In the literature [9], SSR presents better performance than other classic algorithms, so we only compare the proposed algorithm with SSR in the experiments. In the figures of the optimization procedure, each sampling point represents the spare capacity cost of all the backup paths, sampled at an interval of 5 iterations. All flows have a one-unit bandwidth demand. The termination condition of the algorithms was that the reserved capacity did not change for 300 iterations.
An Optimization Algorithm of Spare Capacity Allocation
443
As shown in Fig. 1, there are 17 nodes and 31 links in the topology of the experiment network. Assuming that traffic exists between every pair of nodes, R = 136 can be calculated.

Fig. 1. Topology of experiment network
For the experiment network, we carried out 10 groups of experiments. In each group the algorithms were repeated 10 times, and the iteration count, convergence time and reserved capacity were used to represent algorithm performance. One optimization procedure of SSR and of DSR is shown in Figs. 2 and 3, respectively.

Fig. 2. Optimization procedure of SSR

Fig. 3. Optimization procedure of DSR
For lack of space, the iteration times and convergence values of reserved capacity of the first four optimization procedures of the two algorithms are shown in Table 1. The statistics of all 10 groups of experiments are shown in Table 2. (Note: IT = iteration times, CVRC = convergence value of reserved capacity, CT = convergence time.)

Table 1. IT and CVRC of the two algorithms for the first four optimization procedures

Optimization   SSR (random search)    DSR (chaos optimization)
procedure      IT      CVRC           IT      CVRC
1              115     146            140     148
2              239     147            164     149
3              201     149            166     148
4              192     152            149     145
444  Z. Wang et al.

Table 2. Statistics for SSR and DSR

         | SSR (random search) IT | CVRC  | CT   | DSR (chaos optimization) IT | CVRC  | CT
Maximum  | 239                    | 152   | 13.6 | 166                         | 149   | 10.4
Minimum  | 115                    | 146   | 7.8  | 140                         | 145   | 9.7
Average  | 186.75                 | 148.5 | 11.2 | 154.75                      | 147.5 | 10.1
According to the experimental data, compared with SSR the average iteration count of DSR is reduced by 20.96 percent, and the average convergence time of DSR is reduced to 90.18 percent of SSR's. The reserved capacity of the two algorithms is very close, so DSR performs better than SSR. Since the minimum always appears within the first five of the 10 optimization processes, we can balance reserved capacity against convergence time and choose the best among them.
4 Conclusion

In this paper, we study the backup path choice, which is the key problem of failure recovery. After analyzing the SCA problem, a backup path calculation method called the DSR algorithm, based on chaos optimization, is proposed. The proposed algorithm provides backup paths to guarantee network survivability, and simultaneously minimizes the network bandwidth cost of resource reservation to improve the utilization of network resources. The experimental results indicate that the DSR algorithm performs well on resource reservation and convergence time, and converges quickly to an optimal value under stable network conditions. Compared with traditional SCA algorithms, the proposed DSR algorithm can maximize the restoration speed and provide QoS-guaranteed communication services.

Acknowledgments. This work was supported by grants from the National Natural Science Foundation of China (No. 60773192), the Natural Science Foundation of Hubei Province (2007ABA015), the Ph.D. Programs Foundation of the Ministry of Education of China (No. 20050487046), and the Beijing Key Laboratory of Advanced Information Science & Network Technology and Railway Key Laboratory of Information Science & Engineering (No. XDXX1008).
References

1. Braden, R., Zhang, L., Berson, S., Herzog, S., Jamin, S.: Resource ReSerVation Protocol (RSVP) – Version 1 Functional Specification. IETF RFC 2205 (1997)
2. Van Caenegem, B., Van Parys, W., De Turck, F., Demeester, P.M.: Dimensioning of survivable WDM networks. IEEE J. Select. Areas Commun. 16(7), 1146–1157 (1998)
3. Cui, C., Zhao, Q.: Research on improved chaos optimization algorithm. Science Technology and Engineering 3 (2007)
4. Dovrolis, C., Ramanathan, P.: Resource aggregation for fault tolerance in integrated service networks. ACM Comput. Commun. Rev. 28(2), 39–53 (1998)
5. Grover, W.D., Iraschko, R.R., Zheng, Y.: Comparative methods and issues in design of mesh-restorable STM and ATM networks. In: Soriano, P., Sanso, B. (eds.) Telecommunication Network Planning, pp. 169–200. Kluwer, Boston (1999)
6. Iraschko, R.R., Mac Gregor, M.H., Grover, W.D.: Optimal capacity placement for path restoration in STM or ATM mesh survivable networks. IEEE/ACM Trans. Networking 6(3), 325–336 (1998)
7. Kodialam, M., Lakshman, T.V.: Dynamic routing of bandwidth guaranteed tunnels with restoration. In: Proc. IEEE INFOCOM (March 2000)
8. Li, J., Han, Y., Sun, Z.: Performance analysis of chaos optimization algorithm. Mini-micro Systems 8(26), 140–143 (2005)
9. Liu, Y., Tipper, D.: Approximating optimal spare capacity allocation by successive survivable routing. IEEE/ACM Transactions on Networking 13(1) (February 2005)
10. Medhi, D., Tipper, D.: Some approaches to solving a multi-hour broadband network capacity design problem with single-path routing. Telecommun. Syst. 13(2), 269–291 (2000)
11. Nederlof, L., Struyue, K., Shea, C., Misser, H., Du, Y., Tamayo, B.: End-to-end survivable broadband networks. IEEE Commun. Mag. 9, 63–70 (1995)
Numerical Approximation and Optimum Method of Production Monitoring System of the Textile Enterprise

Jingfeng Shao, Zhanyi Zhao, Liping Yang, and Peng Song

Xi'an Polytechnic University, Xi'an 710048, China
[email protected]
Abstract. To advance the informatization of the textile enterprise and to address the current situation in which production information cannot be integrated and shared over the local area network (LAN), we propose a system structure based on the C/S mode and develop a real-time monitoring and control system over the LAN. First, the weaknesses of existing computer monitoring systems are analyzed, and the structural model of the system is designed using database, communication, network and TCP/IP techniques. Second, in the system development process the key technical problem — how to ensure the correctness and accuracy of production data — is studied: we analyze numerical filtering technology from theory to practical application, and propose a B-spline function fitting method to optimize the collected front roller speed pulse signal. Third, according to the actual requirements, corresponding solutions are derived using numerical approximation and information control theory, ensuring the accuracy of production data. As verified in practice, the system runs stably and collects production data accurately. The core functions of the system meet the requirements of production control and information development. Compared with traditional systems, it has clear advantages: comprehensive management and production information integration are achieved, and production data sharing and scientific management are realized.

Keywords: monitoring system; textile enterprise; function fitting; optimum.
workshop; that is, they can neither realize network management of the overall production process nor complete comprehensive enterprise-wide management and production information integration, and they cannot even feed back abnormal status during production in a timely manner. These systems are therefore not conducive to the information construction of the textile enterprise. After studying practical applications and analyzing the actual requirements, we develop a production monitoring system for the textile enterprise based on the C/S mode over the LAN. It realizes sharing of production monitoring data and integration of management information, achieves network management of the production process, and completes distributed collection and centralized management of production data. Furthermore, the system can objectively and truly reflect the real-time operating status of each machine, effectively provide an information management platform for production managers, and speed up the information technology construction of the textile enterprise. At present, some workshops of textile enterprises use various production information systems that have improved work efficiency and reduced manual labor; however, these systems cannot achieve overall management of real-time production information collection, collation, storage, monitoring, control and analysis, nor can they manage all kinds of real-time data. In order to raise the quality of maintenance and equipment management, lower the maintenance cost, and reduce the maintenance workload, we develop a real-time network monitoring system for the textile enterprise. It reflects real-time production data and operating status, enables the enterprise or workshop to realize network management of production information, and provides a comprehensive data analysis and information management platform for enterprise production and decision-making.
2 System Structure

The monitoring and control system consists of a monitoring server, multiple monitors and many client computers. The entire workflow is as follows. First, the monitors are installed on each machine and collect production data in a timely manner, such as operating status (YZZT), standing status (TJZT), front roller pulse (QLLMC), pulse length (CDMC), spindle wing pulse (DYMC), leather roller standing times (PGTJCS) and broken-end standing times (DTTJCS); they then preliminarily process and store these data. Second, all production data are conveyed to the remote monitoring center server via the bus; the server checks, processes, calculates and displays the resulting data on the terminal. Finally, the server saves all data into the temporary table "Tempyield" of the monitor and control information database (MCIDB). The clients are connected with the control center server through the LAN; they retrieve production data from Tempyield and realize online monitoring of each machine's production data and operating status. At the same time, the client system supports production parameter input, production report printing and so on. The whole architecture is shown in Figure 1.
Fig. 1. System architecture
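The data path just described — monitors write samples to the server, the server stores them in Tempyield, clients poll over the LAN — can be sketched as follows. This is a minimal illustration, not the production schema: the table name and field names (YZZT, TJZT, QLLMC, CDMC) come from the text, while the machine_id column, the SQLite back end and the sample values are assumptions.

```python
import sqlite3

# Stand-in for the MCIDB "Tempyield" exchange: the server writes each
# monitor's preliminary-processed sample, and a client polls it over the LAN.
conn = sqlite3.connect(":memory:")  # in-memory DB replaces the real server DB
conn.execute("""CREATE TABLE Tempyield (
    machine_id INTEGER,
    YZZT INTEGER,   -- operating status
    TJZT INTEGER,   -- standing status
    QLLMC INTEGER,  -- front roller pulse
    CDMC INTEGER    -- pulse length
)""")

# Server side: store one processed sample for machine 1 (values invented).
conn.execute("INSERT INTO Tempyield VALUES (1, 1, 0, 1520, 430)")
conn.commit()

def poll_machine(conn, machine_id):
    """Client side: retrieve the latest production record for one machine."""
    row = conn.execute(
        "SELECT YZZT, TJZT, QLLMC, CDMC FROM Tempyield WHERE machine_id = ?",
        (machine_id,),
    ).fetchone()
    return dict(zip(("YZZT", "TJZT", "QLLMC", "CDMC"), row))

status = poll_machine(conn, 1)
```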
3 Digital Filtering Technology

Owing to signal interference in the strong-power environment, production data may frequently be affected. To ensure the correctness of production data, effective measures must be taken against this interference. Hence, in the design of the entire system we first take a number of anti-jamming measures in the monitors: filtering useless signals, reducing the proportion of noise interference in the useful signal, and extracting the useful signal from the noise. However, in the actual production process there are too many interference factors, which are sometimes unpredictable and cannot be ruled out completely. Hardware signal processing alone cannot fully meet the production needs, and the production data then fail to reflect the real production site. Secondly, we therefore use software algorithms to perform digital filtering and numerical approximation, smooth the monitored pulse signal as necessary, and ensure the normal operation of the monitoring system [2]. The goal of digital filtering and numerical approximation is to process the pulse signal in software [3] and guarantee the correctness of production data. In this system we mainly use two mathematical algorithms: arithmetic average filtering and weighted arithmetic average filtering. We use a fitting function to match the real yield of each machine, because all real-time data — operating status (YZZT), standing status (TJZT), front roller pulse (QLLMC), pulse length (CDMC), spindle wing pulse (DYMC), leather roller standing times (PGTJCS) and broken-end standing times (DTTJCS) — depend on the front roller speed, and the efficiency and yield computations are likewise related to it. In general, the correctness of the front roller speed is the core of computing the yield value.
For the drawing and roving frame, the calculation formula of valid work efficiency (VWE) is:

VWE = (Σ work time (min) − Σ standing time (min)) / Σ work time (min) × 100%   (1)

The theoretical yield per hour (TYPH) of each eye (spindle) is:

TYPH (kg) = (n × D × π × E × 60 × tex) / (1000 × 1000 × 1000)   (2)

and the real yield per hour (RYPH) of each eye (spindle) is:

RYPH (kg) = TYPH (kg) × VWE   (3)

Here n is the front roller speed (r/min), D is the front roller diameter (mm), E is the draft ratio between the pressed axis and the front roller (or the accident draft ratio between the pressed axis and the spindle wing), and tex is the cotton or roving yarn count. As can be seen, D, π and tex are constant values; only the front roller speed n is variable, and E is related to n. Therefore n is not only an important control index for calculating the system yield, but also an evaluation index for assessing machine running efficiency, and the whole running process of the system is related to the front roller speed n.

3.1 Theory Analysis
Arithmetic Average Filtering. In the course of data collection, the monitoring and control system first continuously takes N samples of the front roller speed per eye (spindle) and sums them, then takes the arithmetic mean as the sample value:

y = (1/N) Σ_{i=1}^{N} X_i

where X_i is the ith sample of the roller speed, N is the sample count, and y is the arithmetic mean of the N samples. This approach depends largely on the sample count N: the larger N is, the smoother the curve in the monitoring interface and the stronger the suppression of interfering signals [4]. However, this method reduces the efficiency of system data collection and the flexibility of data processing, so we improve on it as follows.

Weighted Average Filtering. In the course of data collection, the system again continuously takes N samples of the front roller speed per eye (spindle), multiplies them by different weighting coefficients C_i satisfying Σ_{i=0}^{N−1} C_i = 1, and sums them as the sample result:

y = Σ_{i=0}^{N−1} C_i X_i

where X_i is the ith sample of the front roller speed, N is the sample count, and y is the weighted sample value [5]. By choosing the weighting coefficients C_i according to the actual requirements, this method emphasizes the prominent signal and effectively suppresses other interfering signals; it brings the measured front roller speed close to the actual machine speed, making the yield data correct and accurate. Because C_i is determined by the specific conditions, system management becomes flexible and convenient, and the system functions meet the basic demands of the workshop. Compared with the arithmetic average, the weighted average filtering method makes better use of system resources and is beneficial overall.

3.2 Comparison and Selection

If the user does not terminate the data collection process, it runs in the main data collection thread, executing production data collection, processing, storage and display, and feeding the latest machine status and production data back to the user. When the system monitoring interface is switched to the curve monitoring view, the data collection module immediately starts a thread and sends commands to the machine monitors, which return production data to the server according to the communication command. After the data acquisition module continuously takes N front roller speed pulses, it temporarily stores them in a temporary array A[j] (j = 1, 2, ..., N) and sorts them in ascending order. Finally, we construct a fitting function y_i = f(x_i, a) relating the front roller speed pulse signal input and output data, the aim being to achieve numerical approximation. In the fitting function y_i = f(x_i, a), a is a parameter vector, and a ∈
(a_1, a_2, a_3, ..., a_n)^T. By obtaining the value of a at the points x_i, we use the function values f_i (i = 1, 2, ..., n) and the roller speed values y_i to form the mean squared error

Z = min (1/n) Σ_{i=1}^{n} (f_i − y_i)^2

which is to be minimized [6]. In the actual production process, the arithmetic average filtering method may introduce calculation errors and yield a poor approximation, so we need to design a better mathematical algorithm to fit the front roller pulse. According to the filtering role of each datum, we assign a weighting coefficient C_i (C_i ≥ 0) with Σ_{i=0}^{N−1} C_i = 1, so that Z is turned into

Z = min Σ_{i=1}^{n} C_i (f_i − y_i)^2.
At this point we obtain a good approximation result, reduce the calculation error, and establish the basis of front roller speed data acquisition. However, as verified in practice, although the weighted average filtering method can effectively suppress interfering signal components, it cannot thoroughly eliminate the basic random error and guarantee correct production data. Hence, we propose an approximation method based on B-spline functions.
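The two filters of Section 3.1 and the yield formulas (1)-(3) can be sketched together: filter the roller speed samples, then compute the yield. The sample values, weights and machine parameters below are invented for illustration.

```python
import math

# Arithmetic and weighted average filters plus formulas (1)-(3).
def arithmetic_average(samples):
    """y = (1/N) * sum(X_i): plain arithmetic average filter."""
    return sum(samples) / len(samples)

def weighted_average(samples, weights):
    """y = sum(C_i * X_i) with sum(C_i) = 1: weighted average filter."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(c * x for c, x in zip(weights, samples))

def vwe(work_min, standing_min):
    """Formula (1): valid work efficiency, returned as a fraction."""
    return (work_min - standing_min) / work_min

def typh(n, D, E, tex):
    """Formula (2): theoretical yield per hour (kg) of one eye (spindle)."""
    return n * D * math.pi * E * 60 * tex / (1000 * 1000 * 1000)

def ryph(n, D, E, tex, work_min, standing_min):
    """Formula (3): real yield per hour = TYPH * VWE."""
    return typh(n, D, E, tex) * vwe(work_min, standing_min)

speeds = [198, 202, 200, 205, 195]              # roller speed samples (r/min)
n = weighted_average(speeds, [0.1, 0.2, 0.4, 0.2, 0.1])
yield_kg = ryph(n, D=30.0, E=1.0, tex=20.0, work_min=480, standing_min=48)
```

The weights here emphasize the central samples, which is one simple way to realize the "prominent signal" behaviour described above.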
4 Optimum Method

The spline function given by a linear combination of B-spline basis functions B_i(x) is a good way to solve the numerical approximation problem. Furthermore, as an approximating function, the spline function has excellent properties and is easily realized on a computer [7].

4.1 B-spline Construction

According to the actual requirements, the B-spline construction mainly involves five steps, as follows.

Step 1: Suppose the collected front roller speed pulse signal lies in a limited range [u, v], where v ≥ u > 0. We divide the range into several zones — the smaller the zones, the smoother the curve — obtaining the partition u = x_0 < x_1 < x_2 < ... < x_{n+1} = v, denoted △. In order to obtain the best approximating function, we also add new nodes and extend the partition to x_{−n} < ... < x_{−1} < u = x_0 < x_1 < ... < x_{n+1} = v < x_{n+2} < ... < x_{n+m+1}, where m > 0. However, m should not be too large, since that would increase the computational burden and lower the retrieval efficiency of the system database.

Step 2: By the definition and theory of spline functions, for the truncated power function θ_n(x, t) = (t − x)_+^n, the data collection module first substitutes for t the nodes x_{−n}, ..., x_{−1}, x_0, x_1, ..., x_{n+2}, ..., x_{n+m+1} in turn, generating the sequence θ_n(x, x_{−n}), ..., θ_n(x, x_{−1}), θ_n(x, x_0), θ_n(x, x_1), ..., θ_n(x, x_{n+1}), θ_n(x, x_{n+2}), ..., θ_n(x, x_{n+m+1}), each of which is a spline function of degree n [8].
From this sequence, the data collection module deletes superfluous elements so that the remaining elements form a basis of S_n(△) on the interval [u, v]; through an appropriate linear combination of the N + 2n + 2 degree-n spline functions we can construct a locally supported, strictly positive basis of S_n(△).

Step 3: At the points x_i, x_{i+1}, x_{i+2}, ..., x_{i+n+1}, the (n+1)th-order divided difference f(x_i, x_{i+1}, x_{i+2}, ..., x_{i+n+1}) of a function f(t) can be obtained and written as the linear combination

f(x_i, x_{i+1}, x_{i+2}, ..., x_{i+n+1}) = Σ_{k=i}^{i+n+1} f(x_k) / ω'_{n+1,i}(x_k),

where ω_{n+1,i}(t) = Π_{j=i}^{i+n+1} (t − x_j) and ω'_{n+1,i}(x_k) = Π_{j=i, j≠k}^{i+n+1} (x_k − x_j).
Step 4: Taking the (n+1)th-order divided difference in t of the function (x_{i+n+1} − x_i) θ_n(x, t) over the nodes t = x_i, ..., x_{i+n+1}, the system obtains B_{i,n}(x) = (x_{i+n+1} − x_i) θ_n(x; x_i, ..., x_{i+n+1}), where i = −n, ..., N; B_{i,n}(x) is called the ith normalized B-spline function of degree n, or simply the degree-n B-spline function [9].

Step 5: Finally, combining the divided-difference formula of Step 3 with the expression of Step 4, we construct the B-spline function B_{i,n}(x):

B_{i,n}(x) = (x_{i+n+1} − x_i) Σ_{k=i}^{i+n+1} (x_k − x)_+^n / ω'_{n+1,i}(x_k)   (4)
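Formula (4) can be checked numerically: building B_{i,3} from truncated powers (x_k − x)_+^3 and the ω' products of Step 3 yields normalized B-splines that sum to one inside the original interval. The uniform integer knots below are chosen purely for illustration.

```python
from math import prod

def omega_prime(knots, i, n, k):
    # omega'_{n+1,i}(x_k) = product over j in [i, i+n+1], j != k, of (x_k - x_j)
    return prod(knots[k] - knots[j] for j in range(i, i + n + 2) if j != k)

def bspline(knots, i, n, x):
    """B_{i,n}(x) of formula (4), via the truncated power (x_k - x)_+^n."""
    s = sum(max(knots[k] - x, 0.0) ** n / omega_prime(knots, i, n, k)
            for k in range(i, i + n + 2))
    return (knots[i + n + 1] - knots[i]) * s

knots = list(range(9))   # extended uniform knot sequence 0, 1, ..., 8
x = 3.5                  # a point inside the interval
# Sum the cubic splines whose supports can contain x.
total = sum(bspline(knots, i, 3, x) for i in range(5))
```

With these knots, the cubic B-splines covering x = 3.5 sum to 1 up to rounding, which is the partition-of-unity property of the normalized B-spline basis.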
4.2 B-spline Function Fitting

To facilitate computer programming, the degree-n B-spline curve is converted into the expression

P_n(t) = Σ_{i=0}^{n} F_{i,n}(t) P_i,  t ∈ (0, 1),

where P_i is a vertex of the characteristic polygon of the degree-n B-spline curve and F_{i,n}(t) is its basis function. From B_{i,n}(x), for n = 3 the cubic B-spline basis functions are:

F_{0,3}(t) = (−t^3 + 3t^2 − 3t + 1) / 6
F_{1,3}(t) = (3t^3 − 6t^2 + 4) / 6
F_{2,3}(t) = (−3t^3 + 3t^2 + 3t + 1) / 6
F_{3,3}(t) = t^3 / 6

P(t) = F_{0,3}(t)P_0 + F_{1,3}(t)P_1 + F_{2,3}(t)P_2 + F_{3,3}(t)P_3   (5)

P(t) = [(−t^3 + 3t^2 − 3t + 1)P_0 + (3t^3 − 6t^2 + 4)P_1 + (−3t^3 + 3t^2 + 3t + 1)P_2 + t^3 P_3] / 6,  t ∈ (0, 1).

When the data collection module draws the curve, the parametric expression P(t) is written componentwise, because the curve is represented by its horizontal x(t) and vertical y(t) coordinates:

x(t) = F_{0,3}(t)x_0 + F_{1,3}(t)x_1 + F_{2,3}(t)x_2 + F_{3,3}(t)x_3
y(t) = F_{0,3}(t)y_0 + F_{1,3}(t)y_1 + F_{2,3}(t)y_2 + F_{3,3}(t)y_3   (6)
For the front roller speed, we adopt the cubic B-spline function to draw the pulse signal curve (x_i, y_i), i ∈ (1, n). As certified by the application, the resulting curve is good and smooth, and the yield data are accurate.
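A direct transcription of formulas (5) and (6) evaluates one cubic curve segment; the control points in the example are arbitrary.

```python
def cubic_basis(t):
    """The four cubic B-spline basis functions F_{i,3}(t) of formula (5)."""
    return ((-t**3 + 3*t**2 - 3*t + 1) / 6,
            (3*t**3 - 6*t**2 + 4) / 6,
            (-3*t**3 + 3*t**2 + 3*t + 1) / 6,
            t**3 / 6)

def segment_point(px, py, t):
    """x(t), y(t) of formula (6) for one group of four control points."""
    F = cubic_basis(t)
    x = sum(f * xi for f, xi in zip(F, px))
    y = sum(f * yi for f, yi in zip(F, py))
    return x, y

# The basis sums to 1, so a constant control polygon reproduces itself.
x, y = segment_point([5.0, 5.0, 5.0, 5.0], [2.0, 2.0, 2.0, 2.0], 0.3)
```

Sliding a window of four consecutive control points along the sorted pulse samples, and letting t run over (0, 1) in each window, produces the smooth curve described above.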
5 Conclusion

The computer monitoring and control system is based on the LAN of the textile enterprise; multiple clients and the server constitute a C/S system structure that realizes remote control and remote office work. The system has been put into operation in a textile enterprise in China. Application results have proven that the system is stable, the data are accurate, the real situation is reflected in a timely manner, manual operation is reduced to a large extent, and scientific management of production data is achieved. Furthermore, the client management software can dynamically monitor and control the production site and implement queries, statistics, report printing and other management functions. Thus, the management efficiency and scientific decision-making of the textile enterprise are greatly improved.
References

1. Mei, Z.Q.: The present situation and its future development of yarn combing technology at home and abroad. Shanghai Textile Science & Technology 36(1), 1–2 (2008)
2. Tian, J., Bai, J., Yan, X.P., Bao, S., Li, Y., Liang, W., Yang, X.: Multimodality molecular imaging. IEEE Eng. Med. Biol. Mag. 27(5), 48–57 (2008)
3. Feng, J., Jia, K., Yan, G., Zhu, S., Qin, C., Lv, Y., Tian, J.: An optimal permissible source region strategy for multispectral bioluminescence tomography. Opt. Exp. 16(20), 15640–15654 (2008)
4. Schulz, R.B., Ripoll, J., Ntziachristos, V.: Noncontact optical tomography of turbid media. Opt. Lett. 28, 1701–1703 (2003)
5. Schulz, R.B., Ripoll, J., Ntziachristos, V.: Experimental fluorescence tomography of tissues with noncontact measurements. IEEE Trans. Med. Imaging 23, 492–500 (2004)
6. Deliolanis, N., Lasser, T., Hyde, D., Soubert, A., Ripoll, J., Ntziachristos, V.: Free-space fluorescence molecular tomography utilizing 360° geometry projections. Opt. Lett. 32, 382–384 (2007)
7. Ntziachristos, V., Graves, E.E., Schultz, R.B., Ripoll, J.: Fluorescence molecular tomography: New detection schemes for acquiring high information content measurements. In: IEEE International Symposium on Biomedical Imaging (ISBI 2004), vol. 2, pp. 1475–1478 (2004)
8. Schultz, R.B., Peter, J., Semmler, W.: Comparison of noncontact and fiber-based fluorescence-mediated tomography. Opt. Lett. 31, 769–771 (2006)
9. Guven, M., Yazici, B., Intes, X., Chance, B.: Diffuse optical tomography with a priori anatomical information. Phys. Med. Biol. 50(12), 2837–2858 (2005)
Design and Simulation of Simulated Annealing Algorithm with Harmony Search

Hua Jiang, Yanxiu Liu, and Liping Zheng*

College of Computer Science, Liaocheng University, Shandong 252059, P.R. China
[email protected]
Abstract. Harmony search is a new heuristic optimization algorithm. Compared with other algorithms, it is very robust and easy to operate. Combining the features of harmony search, an improved simulated annealing algorithm is proposed in this paper; it improves the speed of annealing. The initial state of simulated annealing and the new solutions are generated by harmony search, giving the method high quality and efficiency. The simulation results show that this new algorithm has faster convergence speed and better optimization quality than the traditional simulated annealing algorithm and other algorithms.

Keywords: Harmony search, Simulated annealing algorithm, Convergence speed.
1 Guidelines

Harmony search (HS) [1], proposed in recent years, is one of the swarm optimization algorithms. Z. W. Geem et al. proposed the algorithm based on the creative process of making music. Instrument i (i = 1, 2, ..., n) is regarded as design variable i, and the harmony R_j (j = 1, 2, ..., M) of the instruments corresponds to solution vector j. The algorithm first produces M initial solutions and places them in the harmony memory (HM). Then, a new solution is searched for within the HM with the harmony memory considering rate (HMCR), or within the whole variable range with probability 1 − HMCR. The newly generated solution is locally disturbed according to the pitch adjusting rate (PAR). Finally, the objective function value determines whether to update the HM. Global harmony search (GHS) [2] was proposed by Mahamed G.H. Omran and Mehrdad Mahdavi; compared with HS, GHS has better search efficiency, as it modifies the step that produces the new solution. The simulated annealing algorithm (SA) [3] is a kind of heuristic random search algorithm based on the Monte Carlo iterative solution strategy. Its main advantage is that it not only accepts solutions better than the current state, but can also jump out of local minima. SA is a commonly used method for combinatorial optimization problems and has been successfully applied to a variety of complex problems
* This work is supported by the Liaocheng University Research Projects Foundation (No. X0810015).
such as the TSP [4]. However, it has several insufficiencies, such as high initial temperature, slow annealing speed and a large number of iterations. At present, many improved methods have been proposed. Li Shuang [5] proposed an improved SA for data mining. In [6][7], Jin Jiangang and Lv Pin respectively proposed different improvements for different problems. The improved algorithms proposed by Qu Zhiyi and Zheng Shilian in [8][9] combine GA with SA for different problems. Therefore, this paper presents an improved simulated annealing algorithm based on GHS. The algorithm is not very sensitive to the choice of initial temperature, so it overcomes the shortcomings of high initial temperature and slow convergence speed in simulated annealing.
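The HS process described above can be sketched as follows. This is a generic illustration, not the authors' code: the sphere objective, the bounds, and the bandwidth parameter bw used for pitch adjustment are all assumptions.

```python
import random

random.seed(0)  # deterministic run for the example

def harmony_search(f, dim, lo, hi, M=10, HMCR=0.9, PAR=0.3, bw=0.05, iters=2000):
    """Minimize f: keep M harmonies in memory, draw new components from memory
    with rate HMCR, pitch-adjust with rate PAR, replace the worst when improved."""
    hm = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(M)]
    for _ in range(iters):
        new = []
        for j in range(dim):
            if random.random() < HMCR:
                v = random.choice(hm)[j]            # memory consideration
                if random.random() < PAR:
                    v += random.uniform(-bw, bw)    # pitch adjustment
            else:
                v = random.uniform(lo, hi)          # random from whole range
            new.append(min(max(v, lo), hi))
        worst = max(hm, key=f)
        if f(new) < f(worst):                       # update harmony memory
            hm[hm.index(worst)] = new
    return min(hm, key=f)

best = harmony_search(lambda x: sum(v * v for v in x), dim=2, lo=-5.0, hi=5.0)
```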
2 The Hybrid of SA Algorithm and GHS Algorithm (SAGHS)

In the traditional annealing algorithm the initial solution is randomly generated, so the quality of the solutions is uneven and irregular, which affects the effectiveness of the algorithm. This defect can be avoided by using harmony search to create the initial solutions. In order to enhance search efficiency, new solutions are generated by the GHS search mechanism and chosen by the Metropolis criterion; the worst solution is then replaced by the new solution. The advantage of this search process lies in retaining the intermediate optimal solution and updating it in time. Finally, the annealing process is executed once more starting from the final optimal solution.

2.1 The Parameters Design of SAGHS

1. Function of state generation. In the simulated annealing algorithm, the current solution is disturbed to generate a new solution. Following the GHS method, the state generating function produces a candidate solution as

x_new^j = x_best^k   (1)

where x_new^j is the jth component of the new harmony vector, x_best is the best harmony in memory, and k is a randomly chosen component index of the best harmony. As GHS is a kind of heuristic global search, the designed algorithm can search the entire solution space and retain the intermediate optimal solution.

2. Function of state acceptance. SAGHS adopts min{1, exp(−ΔE/t)} as the state acceptance function.

3. Annealing function. To ensure convergence to the global optimal point, this algorithm uses an exponential temperature-reduction approach as the temperature update function: t_{k+1} = λ · t_k, where λ can be a random number from 0 to 1. To ensure that the optimal solution can be found reliably, SAGHS sets λ = 0.96, a relatively slow cooling rate.
2.2 SAGHS Design Steps

According to the above analysis and design, this article integrates GHS into the simulated annealing algorithm to form SAGHS. The main design flow is shown in Figure 1.

Fig. 1. SAGHS design flow
The main design ideas and steps are as follows:

Step 1: Initialize the annealing temperature T. SAGHS uses the harmony search method to create the initial solution library MS. The formula for generating an initial solution is x = LB + r × (UB − LB), where UB is the maximum value of the solution, LB is the minimum value, and r is a uniformly distributed random number in (0, 1). The number of iterations at each temperature T is L.

Step 2: Disturbing and accepting processes are repeatedly executed L times at temperature T to determine whether the sample stability criterion is reached. If the criterion is not satisfied, Steps 3-4 are executed; otherwise Step 5 lowers the temperature using the exponential function.

Step 3: A new solution is generated by the GHS search mechanism; that is, a new solution is searched for with probability HMCR within MS and with probability 1 − HMCR in the entire variable range, and the optimal solution is then sought with probability PAR. Part of the pseudo-code for generating a new solution is as follows (helper names such as rand01 and rand_int are placeholders):

for (j = 0; j < dim; j++)
    x_new[j] = (rand01() < HMCR) ? MS_best[rand_int(dim)]
                                 : LB + rand01() * (UB - LB);
// then pitch-adjust with probability PAR to find the optimal solution

Step 4: The difference ΔE between the objective function value E' of the new solution and the smallest objective value E is calculated. If ΔE < 0 the new solution is accepted; otherwise the new solution is accepted with probability min{1, exp(−ΔE/t)}. MS is then updated and the worst solution in the library is replaced.

Step 5: Annealing: T_{k+1} = 0.96 × T_k, k = k + 1. If the convergence criterion is satisfied, the annealing process is executed once more based on the final solution and the terminal result is output; otherwise go to Step 2.
3 Simulation Experiments

In order to verify the performance of SAGHS, five evaluation functions are used in the simulation experiments. We compare SAGHS with the traditional SA and with GHS.

(1) Ackley function:

f(x) = −20 exp(−0.2 √((1/30) Σ_{i=1}^{Nd} x_i²)) − exp((1/30) Σ_{i=1}^{Nd} cos(2πx_i)) + 20 + e   (2)

where x* = 0, f(x*) = 0, −600 ≤ x_i ≤ 600.

(2) Rosenbrock function:

f(x) = Σ_{i=1}^{N−1} (100(x_i − x_{i−1}²)² + (x_{i−1} − 1)²)   (3)

where x* = (1, 1, ..., 1), f(x*) = 0, −30 ≤ x_i ≤ 30.

(3) Rotated function:

f(x) = Σ_{i=1}^{Nd} (Σ_{j=1}^{i} x_j)²   (4)

where x* = 0, f(x*) = 0, −100 ≤ x_i ≤ 100.
(4) Griewank function:

f(x) = (1/4000) Σ_{i=1}^{Nd} x_i² − Π_{i=1}^{Nd} cos(x_i/√i) + 1   (5)

where x* = 0, f(x*) = 0, −600 ≤ x_i ≤ 600.

(5) Rastrigin function:

f(x) = Σ_{i=1}^{Nd} (x_i² − 10 cos(2πx_i) + 10)   (6)

where x* = 0, f(x*) = 0, −5.12 ≤ x_i ≤ 5.12.
To investigate the effect of the improved simulated annealing algorithm based on harmony search, the algorithm was coded in VC++ 6.0. The parameters are HMCR = 0.95 and PAR = 0.35. Each test function was simulated 30 times, and every simulation ran for 50000 iterations. Because the temperature does not obviously influence the improved algorithm, the experiments set the temperature to 7000. Figures 2-6 compare the convergence speed of the different methods on the same objective functions.

Fig. 2. Ackley function convergence effect with three kinds of algorithms
Fig. 3. Rosenbrock function convergence effect with three kinds of algorithms
Design and Simulation of Simulated Annealing Algorithm with Harmony Search
Fig. 4. Rotated function convergence effect with three kinds of algorithms
Fig. 5. Griewank function convergence effect with three kinds of algorithms
Fig. 6. Rastrigin function convergence with three kinds of algorithms
From Figures 2-6 we can see that:
1. Ackley, Griewank and Rastrigin are multimodal functions. On these, SAGHS gives the best experimental result of the three algorithms, and its convergence speed is also the fastest.
2. Rosenbrock and Rotated are unimodal functions. On Rosenbrock, SAGHS achieves both a good result and a good convergence speed. On Rotated, the result is not very good, but the convergence speed is quite fast.
From these comparisons, SAGHS has an obvious advantage for multimodal functions: both the convergence speed and the search quality reach a relatively high level. It also performs well on unimodal functions. Overall, the convergence rate of SAGHS is the fastest and that of SA is the slowest.
4 Conclusions
The simulated annealing algorithm has been further improved with a harmony search mechanism: this paper proposed a new method based on the GHS search mechanism. SAGHS has a faster convergence rate and retains the optimal solution. This paper used benchmark test functions to evaluate the performance of the new algorithm, and the simulation results show that SAGHS has a clear advantage for function optimization. In future work, the selection of parameters needs to be studied to further enhance the performance of the SAGHS algorithm, and applying the algorithm to specific optimization problems also needs to be studied in depth.
Sudoku Using Parallel Simulated Annealing Zahra Karimi-Dehkordi, Kamran Zamanifar, Ahmad Baraani-Dastjerdi, and Nasser Ghasem-Aghaee Department of Computer Engineering, University of Isfahan (UI), P.O. Code 81746-73441, Hezar Jerib Avenue, Isfahan, Iran {zahra.karimi,zamanifar,ahmadb,aghaee}@eng.ui.ac.ir
Abstract. Parallel simulated annealing was applied to solving the Sudoku puzzle. Simulated annealing (SA) is a stochastic search strategy best known for not getting trapped in local optima. Although SA suffers from low efficiency, it has been recognized as one of the successful approaches to Sudoku. The Sudoku puzzles in this study were formulated as an optimization problem in a multi-agent environment, which variants of parallel SA can successfully solve. In this paper we implemented 3 different parallel SA variants in JADE and compared them. The results show that parallel search with periodic jumps achieves better efficiency and success rate.
Keywords: Sudoku, Parallel Simulated Annealing, Multi-Agent System.
2 Sudoku Using Simulated Annealing
Simulated annealing (SA) is an optimization technique that finds the optimal point by running a series of moves under different thermodynamic conditions. During each move, a neighboring state is determined by randomly changing the current state of the individual. The new state is evaluated via a cost function; if a state with a lower cost is found, the individual moves to that state. Otherwise, if the neighboring state has a higher cost, the individual moves to it only if an acceptance probability condition is met; if it is not met, the individual remains at the current state. The acceptance probability depends on a temperature, which is reduced during the run; a high temperature increases the chance of accepting a worse move. A variant of SA called homogeneous SA groups moves into Markov chains, where each Markov chain is generated at a fixed value of t, and t is then altered between subsequent chains [2]. A snapshot of this algorithm is shown in Fig. 2. We applied SA to solving Sudoku. Our approach is based on the algorithm of Lewis [3], who first applied SA to this puzzle. The algorithm uses a direct representation of candidate solutions: each solution is encoded as a matrix. To form the initial candidate solution, each non-fixed cell gets a random value in such a way that every block contains the values 1 to 9 exactly once. During the run, the neighborhood operator repeatedly chooses two different non-fixed cells in the same block and swaps them. This way of producing the initial candidate solution and the neighborhood ensures the third rule is always met. The cost function looks at each row and column individually and counts the values from 1 to 9 that are not present; the total cost is simply the sum of these 18 partial costs. Re-evaluating a solution after each move is optimized by recalculating at most 2 rows and 2 columns.
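The cost function and neighbourhood operator described above can be sketched as follows (our own minimal Python illustration, not Lewis's code; `grid` is a 9x9 list of lists and `fixed` is a set of (row, col) positions of the given cells):

```python
import random

def cost(grid):
    """Count the values 1..9 missing from each row and each column:
    18 partial costs in total, summed, as in Lewis's formulation."""
    full = set(range(1, 10))
    c = 0
    for i in range(9):
        row = {grid[i][j] for j in range(9)}
        col = {grid[j][i] for j in range(9)}
        c += len(full - row) + len(full - col)
    return c

def swap_in_block(grid, fixed, block):
    """Neighbourhood operator: swap two non-fixed cells in one 3x3 block
    (block 0..8, numbered row-major)."""
    br, bc = 3 * (block // 3), 3 * (block % 3)
    cells = [(br + r, bc + c) for r in range(3) for c in range(3)
             if (br + r, bc + c) not in fixed]
    (r1, c1), (r2, c2) = random.sample(cells, 2)
    grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]
```

A solved grid has cost 0, and because `swap_in_block` only permutes values inside one block, the block constraint stays satisfied throughout the run.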
Their algorithm uses the homogeneous variant of SA with Markov chains whose length equals the square of the number of non-fixed cells, so at most 81². Perez [1] uses Quantum Simulated Annealing (QSA) for solving Sudoku. The major difference between QSA and SA is that in SA the temperature determines the probability of moving from one state to the next, while the neighborhood remains
constant. In QSA, the neighborhood radius, i.e. the distance between the current state and the neighboring state, decreases as the temperature gets low. In the Sudoku context, quantum simulated annealing is like simulated annealing with Markov chains, but with Markov chains of different lengths. Although QSA sounds better than SA with Markov chains, Perez [1] does not compare his results with those of Lewis [3].
Fig. 2. Homogeneous simulated annealing algorithm
3 Review of Parallel Simulated Annealing Algorithms
Chen [4] reviewed five strategies for parallelizing the SA algorithm. The first approach uses divide and conquer to decompose the search space into smaller ones: unpromising sub-spaces are excluded from further search, and promising ones are recursively decomposed and evaluated. The second approach runs several solutions simultaneously; they start from the same random point and synchronize to continue from the best one. The third approach further improves the second by changing a single-start solution to multi-start. The fourth and fifth approaches combine SA with a genetic algorithm: in both, search agents select an initial configuration randomly and evolve it with genetic operators; a synchronizer then selects the best solution and broadcasts it, and all agents continue searching in a simulated annealing manner. In the fourth approach the genetic algorithm is applied only at the initial step, whereas in the fifth it is applied before every synchronization step to select another best solution. Minerals [5] classifies parallel SA differently, into 3 categories: low-level parallelization, domain decomposition and parallel search. The first is about parallelizing parts of the algorithm, for example the evaluation step. Domain decomposition corresponds to the second approach in Chen's classification, and search parallelization is a general category covering all the others in Chen [4]. So parallel simulated annealing means searching all or different parts of the search space from a single or multiple, improved or produced start points. Our approach uses a combination of these approaches, as explained in the next section.
464
Z. Karimi-Dehkordi et al.
Fig. 3. Domain decomposition parallel simulated annealing: the top part presents phase I, and the bottom part shows phase II
4 Sudoku Using Parallel Simulated Annealing
We applied 3 variants of parallel simulated annealing: domain decomposition, parallel search with jumps, and independent parallel search. All variants use a single random start configuration.
Domain decomposition: this 2-phase approach applies a variation of domain decomposition in the first phase and then runs parallel search (the second approach in [4]) during the second phase. Sudoku as a search problem is a permutation problem, and decomposing this large permutation space (9^81) sounds impossible. To manage this difficulty, we use 3 parallel search agents, each responsible for 3 rows (or blocks) of the Sudoku puzzle: search agent 1 moves through the first three rows, search agent 2 explores rows 4, 5 and 6, and the last search agent works on the remaining rows. Figure 3 shows the activity diagram of the presented approach.
Fig. 4. Moving snapshot of the proposed approach: phase I consists of the move-jump of the 3 search agents, and phase II is the parallel search of the individual agents
While a search agent is searching through its own part, it does not see the others' moves. This mechanism is a kind of partitioning, but it does not cover the whole space. We therefore introduce a jump at the end of each Markov chain to increase exploration; the jump method is explained in the next section. Fig. 4 shows a snapshot of the 3 agents moving. This move-jump continues until a local optimum is reached. Then the second phase starts: the local optimum is propagated to all agents, and each agent starts searching the whole space independently.
Parallel search with jumps: in this variant the search agents search the whole puzzle, but jumps are used in the search space as before.
Independent parallel search: in this variant the search agents are used without any jumps.
Fig. 5. Architecture of the multi-agent system: SA agents interact with each other indirectly through the Manager Agent
5 Experimentation
We have implemented this approach in JADE [6], a multi-agent framework that makes it easy to implement interacting agents. There are 4 agents: one manager and 3 simulated annealing agents (Fig. 5). The manager agent creates and starts the SA agents, manages jumps, propagates the local optimum found, and maintains the last received search point. Each SA agent evaluates, moves, and interacts with the manager. Other implementation points:
- How the jump method is implemented: the manager maintains a point (as a matrix) that is initially filled by producing a random configuration. When it receives a new configuration from one of the SA agents, it constructs 4 different configurations by swapping their partitions, evaluates them, and selects one of them using roulette wheel selection.
- Simulated annealing parameters, based on Lewis [3]: cooling factor α = 0.75; number of Markov chains n = 25; length of a Markov chain m = (number of unfixed cells)²; initial temperature T = 2.5.
- Local optimum for beginning phase II: cost = 4.
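The jump step can be sketched as follows (a hedged Python illustration of our reading of the description above; the paper does not spell out which four candidates are built, so here they are the received point plus the three single-partition swaps, and all names are ours):

```python
import random

def roulette_select(configs, fitness):
    """Roulette wheel: pick one configuration with probability
    proportional to its fitness value."""
    weights = [fitness(c) for c in configs]
    r = random.uniform(0, sum(weights))
    acc = 0.0
    for c, w in zip(configs, weights):
        acc += w
        if acc >= r:
            return c
    return configs[-1]

def jump(kept, received, fitness):
    """Manager's jump: build 4 candidate configurations by swapping the three
    3-row partitions of the kept point with those of a received point,
    then select one of them by roulette wheel."""
    candidates = [[row[:] for row in received]]
    for part in range(3):  # swap rows 0-2, 3-5 or 6-8
        cand = [row[:] for row in kept]
        cand[3 * part: 3 * part + 3] = [row[:] for row in received[3 * part: 3 * part + 3]]
        candidates.append(cand)
    return roulette_select(candidates, fitness)
```

The roulette wheel biases the jump toward fitter recombinations without committing to the single best one, which preserves some exploration.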
We applied these 3 variants to the Sudoku puzzle of Fig. 6; each variant was run 30 times. The results are summarized in Table 1. They show that parallel search with jumps has the best efficiency: its mean, standard deviation and max are the best. The last column, Success, gives the number of successful runs, and parallel search with jumps is again the best.

Table 1. Summary of results of running the algorithms (Time)

Variant                      | Mean     | Std. Dev. | Min | Max | Success
Domain decomposition         | 23.79167 | 136.6093  | 12  | 58  | 24
Parallel search with jumps   | 16       | 13.6161   | 10  | 24  | 28
Independent parallel search  | 17.57692 | 72.0801   | 9   | 51  | 26

Fig. 6. Snapshot of the running program (black numbers are the fixed ones)
6 Conclusion
In this paper, we have presented a parallel version of the simulated annealing Sudoku solver [3] to improve its efficiency. Although there are many parallel Sudoku solvers using genetic algorithms [1], all of them use parallel search through the whole search space. We compared 3 variants of parallel simulated annealing. Domain decomposition makes the search space smaller, which obviously results in shorter Markov chains, but it could not solve Sudoku efficiently. The best variant is parallel search through the whole space with jumps. The implementation is also, to our knowledge, the first multi-agent implementation of a Sudoku solver, and it provides the synchronization capabilities needed by our approach. We plan to evaluate the results with more Sudoku samples and with multi-start variants of parallel simulated annealing.
References
1. Perez, M., Marwala, T.: Stochastic Optimization Approaches for Solving Sudoku. Arxiv preprint arXiv:0805.0697 (2008)
2. Laarhoven, P., Aarts, E.: Simulated Annealing: Theory and Applications. Springer, Heidelberg (1987)
3. Lewis, R.: Metaheuristics Can Solve Sudoku Puzzles. Journal of Heuristics 13, 387–401 (2007)
4. Chen, D., et al.: Parallelizing Simulated Annealing Algorithms Based on High-Performance Computer. Journal of Global Optimization 39, 261–289 (2007)
5. Minerals, D., Arabia, S.: Evaluating Parallel Simulated Evolution Strategies for VLSI Cell Placement
6. Bellifemine, F., et al.: Jade Administrator's Guide. TILab (February 2006)
A Novel Spatial Obstructed Distance by Dynamic Piecewise Linear Chaotic Map and Dynamic Nonlinear PSO
Xueping Zhang1,2, Yawei Liu1, Jiayao Wang1,3, and Haohua Du4
1 School of Information Science and Engineering, Henan University of Technology, Zhengzhou 450052, China
2 Key Laboratory of Spatial Data Mining & Information Sharing of Ministry of Education, Fuzhou University, Fuzhou 350002, China
3 School of Surveying and Mapping, PLA Information Engineering University, Zhengzhou 450052, China
4 School of Computer Science and Engineering, Beihang University, Beijing 100191, China
[email protected]
Abstract. Spatial Clustering with Obstacles Constraints (SCOC) is a new topic in Spatial Data Mining (SDM), and Spatial Obstructed Distance (SOD) is the key to SCOC. The obstacles constraint is generally ignored when computing the distance between two points, which leads to clustering results of no value, so the obstructed distance has a great effect on the clustering result. In this paper, we propose a novel spatial obstructed distance using Dynamic Piecewise Linear Chaotic Map and Dynamic Nonlinear Particle Swarm Optimization (PNPSO) based on a grid model, named PNPGSOD. It is not only simple and easy to implement but also converges rapidly; experimental results are provided to verify its effectiveness and practicability.
Keywords: Spatial Obstructed Distance, Particle Swarm Optimization, Dynamic Piecewise Linear Chaotic Map, Dynamic Nonlinear, Grid Model.
greater than the Euclidean distance when we compute the distance between two points with obstacles; sometimes it is even infinite, and the obstructed distance is the key factor influencing algorithm efficiency in the clustering process. So the obstructed distance can be considered a special case of an individual object with constraints, and it is also a significant research topic in SCOC. To the best of our knowledge, the existing methods of computing the obstructed distance for SCOC are the Visual Graph algorithm [3], the Dijkstra algorithm [4] and the A* algorithm [5], etc. These algorithms have their own advantages but also some limitations, and choosing different algorithms according to the environment features and the required performance is an effective way to proceed. The methods of computing the obstructed distance therefore need to be enriched further by continuously taking in new algorithms. Particle Swarm Optimization (PSO) is a relatively new addition to the class of population-based search techniques for solving numerical optimization problems. PSO requires no gradient information of the function to be optimized, is very easy to implement in any standard programming language, and uses only primitive mathematical operators. PSO has undergone a plethora of changes since its development in 1995. Recently, paper [6] proposed PNPSO (Dynamic Piecewise Linear Chaotic Map and Dynamic Nonlinear PSO); experiments and comparisons demonstrated that PNPSO outperformed several other well-known improved PSO algorithms on many famous benchmark problems in all cases. In this paper, we propose a novel method for the spatial obstructed distance using PNPSO and design an algorithm to compute the obstructed distance, named PNPGSOD. It not only has a stronger global searching capability but also finds a global best solution rapidly.
Furthermore, it fully takes the obstacles into account in the distance between two points, so the final result has practical significance. The remainder of the paper is organized as follows. Section 2 introduces the PNPSO algorithm. PNPGSOD is developed in Section 3. The performance of PNPGSOD is shown in Section 4, and Section 5 concludes the paper.
2 Dynamic Piecewise Linear Chaotic Map and Dynamic Nonlinear PSO
2.1 Classical PSO
Particle Swarm Optimization (PSO) is a population-based optimization method first proposed by Kennedy and Eberhart [7]. In order to find an optimal or near-optimal solution to the problem, PSO updates the current generation of particles (each particle is a candidate solution) using the information about the best solution obtained by each particle and by the entire population. In the context of PSO, a swarm refers to a number of potential solutions to the optimization problem, where each potential solution is referred to as a particle. The aim of PSO is to find the particle position that results in the best evaluation of a given fitness (objective) function. Each particle has a set of attributes: its current velocity, its current position, the best position it has discovered so far, and the best position discovered so far by the particle and its neighbors. The user can define the size of the neighborhood.
470
X. Zhang et al.
The mathematical description of PSO is as follows. Suppose the dimension of the search space is D and the number of particles is n. The vector X_i = (x_i1, x_i2, ..., x_iD) represents the position of the i-th particle, pbest_i = (p_i1, p_i2, ..., p_iD) is the best position it has found so far, and the whole swarm's best position is represented as gbest = (g_1, g_2, ..., g_D). The vector V_i = (v_i1, v_i2, ..., v_iD) is the position change rate (velocity) of the i-th particle. Each particle updates its position according to the following formulas:

v_id(t+1) = w·v_id(t) + c_1·rand()·[p_id(t) − x_id(t)] + c_2·rand()·[g_d(t) − x_id(t)]    (1)

x_id(t+1) = x_id(t) + v_id(t+1),  1 ≤ i ≤ n, 1 ≤ d ≤ D    (2)
where c_1 and c_2 are positive constant parameters, rand() is a random function with range [0, 1], and w is the inertia weight. Equation (1) is used to calculate the particle's new velocity, and the particle then flies toward a new position according to equation (2). The performance of each particle is measured according to a predefined fitness function, which is usually proportional to the cost function associated with the problem. This process is repeated until user-defined stopping criteria are satisfied. A disadvantage of the global PSO is that it tends to be trapped in a local optimum under some initialization conditions [8].
2.2 Coordinate Particle Swarm Optimization with Dynamic Piecewise-Mapped and Nonlinear Inertia Weights
2.2.1 Dynamic Piecewise Linear Chaotic Map
The well-known piecewise linear chaotic map is employed in PNPSO [6] for constructing a new dynamic chaotic map. The piecewise linear chaotic map is defined as follows [13]:
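The map's formula did not survive extraction here; the standard piecewise linear chaotic map from the chaos literature, which we assume is the form intended, can be sketched as:

```python
def pwlcm(x, pc):
    """Standard piecewise linear chaotic map on (0, 1) with control parameter
    pc in (0, 0.5); values in [0.5, 1) are folded symmetrically, F(x) = F(1 - x)."""
    if x >= 0.5:
        x = 1.0 - x  # symmetric branch for x in [0.5, 1)
    if x < pc:
        return x / pc
    return (x - pc) / (0.5 - pc)
```

Iterating `pwlcm` from almost any seed in (0, 1) produces the chaotic sequence P_map used below in equation (5).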
where pc is the control parameter and X is a variable. Although the above map is deterministic, it exhibits chaotic dynamics in (0, 1) when pc ∈ (0, 0.5) ∪ (0.5, 1). That is, it exhibits sensitive dependence on initial conditions, which is the basic characteristic of chaos: a minute difference in the initial variable value results in a considerable difference in its long-term behavior. The track of the chaotic variable can travel ergodically over the whole search space. In general, the above chaotic map enjoys certainty, ergodicity, pseudo-randomness, irregularity and stochastic properties. To enrich the searching behavior and to avoid being trapped in local optima, the newly introduced dynamic piecewise linear chaotic map is incorporated into the PSO inertia weight, as described in equations (4) and (5) [6].
w = α + (1 − α)·P_map    (5)
where α is the dynamic chaotic inertia weight adjustment factor, αmax and αmin are the maximum and minimum values of α, iter is the current iteration number, itermax is the maximum iteration number, and P_map is the result of the piecewise linear chaotic map.
2.2.2 Dynamic Nonlinear Equations
To achieve a trade-off between exploration and exploitation, two types of dynamic nonlinear inertia weights were introduced in [6]. In this paper, the first type, given by equations (6) and (7), is used, where dnl is the dynamic nonlinear factor, w the inertia weight, wmax and wmin the maximum and minimum values of w, dnlmax and dnlmin the maximum and minimum values of dnl, iter the current iteration number, and itermax the maximum iteration number.
2.2.3 Parallel Inertia Weight Adjustment
To avoid premature convergence and to balance global exploration against local exploitation, the dynamic piecewise linear chaotic map and the dynamic nonlinear equations are used in parallel to dynamically adjust the PSO inertia weight w, as follows [6]:

Initialize all the parameters.
repeat
    Evaluate the fitness values of all the particles.
    if f_i > f_avg then
        employ equations (4), (5), (1) and (2)
    else if f_i ≤ f_avg then
        employ equations (6), (7), (1) and (2)
    end if
until (a termination criterion is met)

where f_i is the fitness value of particle i and f_avg is the average fitness value of the whole population. These newly introduced methods help particles search for the global minima (global optima) dynamically according to the different fitness values of the two dynamic sub-swarms. When a particle's fitness is worse than the average, in order to help particles trapped in local optima break away from the bad conditions, equations (4) and
(5) are employed to adjust the inertia weight chaotically, which makes particles search for global minima dynamically in very complex environments and avoids premature convergence. On the contrary, when a particle's fitness is better than or equal to the average, in order to retain the favorable conditions, equations (6) and (7) are employed to adjust the inertia weight dynamically in a continuous convex area, which lets particles achieve a good balance between global exploration and local exploitation and ensures step-by-step convergence on the global optimum.
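Putting equations (1)-(2) together with the parallel weight-selection rule gives the following sketch (generic Python of our own, not the authors' code; `chaotic_weight` and `nonlinear_weight` are placeholders for equations (4)-(5) and (6)-(7), whose exact forms are given in [6], and c1 = c2 = 2.0 are common defaults, not values taken from this paper):

```python
import random

def pso_step(x, v, pbest, gbest, w, c1=2.0, c2=2.0):
    """One particle's velocity/position update per equations (1)-(2)."""
    new_v = [w * v[d]
             + c1 * random.random() * (pbest[d] - x[d])
             + c2 * random.random() * (gbest[d] - x[d])
             for d in range(len(x))]
    new_x = [x[d] + new_v[d] for d in range(len(x))]
    return new_x, new_v

def select_weight(f_i, f_avg, chaotic_weight, nonlinear_weight):
    """Parallel adjustment: worse-than-average particles (minimization)
    get the chaotic weight for exploration, the rest the nonlinear weight."""
    return chaotic_weight() if f_i > f_avg else nonlinear_weight()
```

Each iteration, a particle first picks its inertia weight with `select_weight` and then moves with `pso_step`, so the two sub-swarms follow different weight dynamics in the same loop.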
3 Spatial Obstructed Distance Using PNPSO Based on the Grid Model
To derive a more efficient algorithm for SCOC, the following definition is first introduced.
Definition 1 (Obstructed distance). Given points p and q, the obstructed distance d_o(p, q) is defined as the length of the shortest Euclidean path between p and q that does not cut through any obstacle.
Computing the spatial obstructed distance with PNPSO is divided into two stages: first establish the environment model based on the grid model, then adopt PNPSO to get the shortest obstructed path.
3.1 Environment Modeling Based on the Grid Model
The basic idea is that the space is divided into many small areas of the same size, each considered a grid. If an obstructed area has an irregular shape, we fill up the obstacles to their verges. A grid is a free grid if it contains no obstacle, and an obstructed grid otherwise. Every grid has a corresponding coordinate and a sequence number, which identifies the coordinate uniquely. The grid model divides the space into two-valued grids: 0 represents a free grid and 1 an obstructed grid. An example 20×20 grid model is shown in Fig. 1, where the shaded areas indicate obstructed grids and the number in each grid is its sequence number. The relationship between the coordinate (x_p, y_p) and the sequence number p is defined as follows:

x_p = [(p − 1) mod m] + 1,  y_p = int[(p − 1)/m] + 1    (8)
where m is the number of grids in every line. Our task is to search for a route from point S to point E that avoids the obstacles. The objective function can be written as:

L = Σ_{i=2}^{n_p} √( (x_i − x_{i−1})² + (y_i − y_{i−1})² )    (9)
where (x_i, y_i) is the information of route point i, and n_p is the number of route points.
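Equations (8) and (9) translate directly into code (a sketch with our own helper names; sequence numbers and coordinates are 1-based and `m` is the number of grids per line):

```python
import math

def grid_to_xy(p, m):
    """Equation (8): sequence number p -> grid coordinate (x_p, y_p)."""
    return (p - 1) % m + 1, (p - 1) // m + 1

def path_length(points):
    """Equation (9): total Euclidean length of the polyline through the route points."""
    return sum(math.hypot(points[i][0] - points[i - 1][0],
                          points[i][1] - points[i - 1][1])
               for i in range(1, len(points)))
```

In the 20×20 model of Fig. 1, for example, grid 1 maps to (1, 1), grid 21 to (1, 2) and grid 400 to (20, 20).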
Fig. 1. Environment modeling
3.2 Obstructed Distance by PNPSO
In the PSO system, each particle represents a route from the starting point to the target point, for example x_i = (x_i1, x_i2, ..., x_iD), where D is the dimension of the particle. Each dimension of a particle represents a grid: the first dimension is the sequence number of the starting grid and the last one is the sequence number of the target grid, and a route is formed by connecting the sequence numbers in ascending order. The route from sequence 1 to sequence 400 is, for example: 1→21→147→148→190→191→212→232→274→275→317→318→339→379→400, with the particle coded as x_i = (1, 21, 147, 148, 190, 191, 212, 232, 274, 275, 317, 318, 339, 379, 400). The size of the particle dimension is an important issue. Through analysis, the description of the route can be made more detailed by increasing the dimension of the particles, yielding a more ideal result; but as the dimension increases, the search space becomes more complicated and performance decreases. If the dimension n is too small, the capability of avoiding obstacles is weak and the obtained distance departs from the real distance. So it is important to choose a suitable dimension in the coding process. The fitness function is defined as follows:

f = 1 / [ (1 + 1/(n − 1))·L ]    (10)
where n is the number of route grids passed and L is the total distance between consecutive sequence numbers, calculated according to equation (9). The PNPGSOD algorithm is developed as follows:
1. Establish the environment model by the grid model;
2. Initialize each particle so that every particle's position is a free grid;
3. For t = 1 to t_max do {
4. Evaluate the fitness of each particle according to equation (10);
5. if f_i > f_avg, update particles using equations (4), (5), (1) and (2);
6. else if f_i ≤ f_avg, update particles using equations (6), (7), (1) and (2);
7. Update pbest;
8. Update gbest;
9. If ||v|| ≤ ε, terminate; }
10. Output the obstructed distance.
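Step 4 rests on decoding a particle into a route and scoring it with equation (10); a self-contained sketch (our own naming, with the grid mapping of equation (8) inlined):

```python
import math

def route_length(route, m):
    """Decode a particle (its sequence numbers taken in ascending order) into
    grid coordinates via equation (8) and sum the segment lengths (equation (9))."""
    pts = [((p - 1) % m + 1, (p - 1) // m + 1) for p in sorted(route)]
    return sum(math.hypot(pts[i][0] - pts[i - 1][0],
                          pts[i][1] - pts[i - 1][1])
               for i in range(1, len(pts)))

def fitness(route, m):
    """Equation (10): f = 1 / ((1 + 1/(n - 1)) * L); shorter routes score higher."""
    n = len(route)
    length = route_length(route, m)
    return 1.0 / ((1.0 + 1.0 / (n - 1)) * length)
```

For instance, in a 20-wide grid the route (1, 2, 3) stays on the first line, so L = 2 and f = 1/((1 + 1/2)·2) = 1/3.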
4 Experimental Results and Performance of PNPGSOD
In order to verify the effectiveness of the new algorithm, we ran the experiment several times. In our experiments, the PNPSO algorithm uses the following parameter values: population size n = 50, maximum number of iterations t_max = 100. Figure 2 shows the best global path found by the PNPGSOD algorithm in environment 1. Fig. 3 shows the fitness value varying with the iteration number; it is clear that the new method converges in about 12 generations.
Fig. 2. The simulation result in environment 1
Fig. 3. Variation of fitness value
In the experiments, we ran the algorithm about 30 times while keeping the scale of particles and the number of iterations fixed; the average execution time until convergence is 9.5 s. The hardware environment is a PC (Pentium(R) D 3.0 CPU, 1 GB main memory); the software environment is Windows XP and Visual C++. The complexity of PNPGSOD, the algorithm proposed in this paper, is O(N·M·D); it depends on the number of particles N, the maximum number of iterations M, and the dimension D at convergence. The complexity of the Visual Graph algorithm is O(N log N + N²), that of the Dijkstra algorithm is O(N³), and that of the A* algorithm is O(N² log N); we can easily see that the complexity of the traditional algorithms rises with the number of obstacles.
5 Conclusion
The obstructed distance is a significant research field in spatial clustering. Traditional algorithms generally ignore the fact that many constraints exist in the real world and cluster using the Euclidean distance directly, which obviously affects the effectiveness of the clustering result. This paper proposes a novel spatial obstructed distance using PNPSO based on the grid model (PNPGSOD) to obtain the obstructed distance. The experiments show that PNPGSOD is effective: it achieves both a higher local convergence speed and a stronger global optimum search.
Acknowledgments. This work is partially supported by the Program for New Century Excellent Talents in University (NCET-08-0660), the Supporting Plan of Science and Technology Innovation Talent of Colleges in Henan Province (No. 2008HASTIT012), and the Opening Research Fund of the Key Laboratory of Spatial Data Mining & Information Sharing of Ministry of Education (No. 200807).
References
1. Li, D.R., Wang, S.L., Li, D.Y.: Spatial Data Mining Theory and Applications. Science Publishing House (2006)
2. Tung, A.K.H., Han, J., Lakshmanan, L.V.S., Ng, R.T.: Constraint-based Clustering in Large Databases. In: Van den Bussche, J., Vianu, V. (eds.) ICDT 2001. LNCS, vol. 1973, pp. 405–419. Springer, Heidelberg (2000)
3. Lozano-Perez, T.: Automatic Planning of Manipulator Transfer Movements. IEEE Trans. on Systems, Man and Cybernetics, 681–698 (1981)
4. Dijkstra, E.W.: A Note on Two Problems in Connection with Graphs. Numerische Mathematik, 269–271 (1959)
5. Chabini, I., Lan, S.: Adaptations of the A* Algorithm for the Computation of Fastest Paths in Deterministic Discrete-time Dynamic Networks. IEEE Transactions on Intelligent Transportation Systems, 60–74 (2002)
6. Liu, H.L., Su, R.J., Gao, Y.: Coordinate Particle Swarm Optimization with Dynamic Piecewise-mapped and Nonlinear Inertia Weights. In: Proceedings of the 2009 International Conference on Artificial Intelligence and Computational Intelligence (AICI 2009), Shanghai, China, pp. 124–128 (2009)
7. Kennedy, J., Eberhart, R.C.: Particle Swarm Optimization. In: Proceedings of IEEE International Conference on Neural Networks, Perth, Australia, vol. IV, pp. 1942–1948 (1995)
8. Frans, V.D.B.: An Analysis of Particle Swarm Optimizers. Ph.D. thesis, University of Pretoria (2001)
A Novel Spatial Clustering with Obstacles Constraints Based on PNPSO and K-Medoids
Xueping Zhang1, Haohua Du2, Tengfei Yang1, and Guangcai Zhao1
1 School of Information Science and Engineering, Henan University of Technology, Zhengzhou 450052, China
2 School of Computer Science and Engineering, Beihang University, Beijing 100191, China
[email protected]
Abstract. In this paper, we propose a novel Spatial Clustering with Obstacles Constraints (SCOC) based on Dynamic Piecewise Linear Chaotic Map and Dynamic Nonlinear Particle Swarm Optimization (PNPSO) and K-Medoids, which is called PNPKSCOC. The contrastive experiments show that PNPKSCOC is effective and practical: it performs better than PSO K-Medoids SCOC in terms of quantization error and converges faster than Genetic K-Medoids SCOC. Keywords: Spatial Clustering, Obstacles Constraints, Particle Swarm Optimization, Piecewise Linear Chaotic Map, Nonlinear Inertia Weights, K-Medoids.
shortcoming of the DBSCAN algorithm, which cannot run on large high-dimensional data sets, etc. We proposed GKSCOC based on Genetic Algorithms (GAs) and Improved K-Medoids SCOC (IKSCOC) in the literature [9]. The experiments show that GKSCOC is effective, but its drawback is a comparatively slower clustering speed. PKSCOC based on Particle Swarm Optimization (PSO) and IKSCOC was presented by us in [10]. However, the performance of simple PSO depends on its parameters; it often gets trapped in a local optimum and fails to converge to the global optimum. Recently, paper [11] proposed PNPSO (Dynamic Piecewise Linear Chaotic Map and Dynamic Nonlinear PSO). Experiments and comparisons demonstrated that PNPSO outperformed several other well-known improved PSO algorithms on many famous benchmark problems in all cases. This article develops the PNPKSCOC algorithm based on PNPSO and K-Medoids to cluster spatial data with obstacles constraints. The contrastive experiments show that PNPKSCOC is better than PKSCOC in terms of quantization error and converges faster than GKSCOC. The remainder of the paper is organized as follows. Section 2 introduces the PNPSO algorithm. Section 3 presents PNPKSCOC. The performance of PNPKSCOC is shown in Section 4, and Section 5 concludes the paper.
2 Dynamic Piecewise Linear Chaotic Map and Nonlinear PSO 2.1 Classical PSO PSO is a population-based optimization method first proposed by Kennedy and Eberhart [12]. The mathematical description of PSO is as follows. Suppose the dimension of the search space is D and the number of particles is n. Vector X_i = (x_i1, x_i2, ..., x_iD) represents the position of the i-th particle, pbest_i = (p_i1, p_i2, ..., p_iD) is its best position found so far, and the whole swarm's best position is represented as gbest = (g_1, g_2, ..., g_D). Vector V_i = (v_i1, v_i2, ..., v_iD) is the position change rate (velocity) of the i-th particle. Each particle updates its position according to the following formulas:

v_id(t+1) = w·v_id(t) + c_1·rand()·[p_id(t) − x_id(t)] + c_2·rand()·[g_d(t) − x_id(t)]    (1)

x_id(t+1) = x_id(t) + v_id(t+1),  1 ≤ i ≤ n, 1 ≤ d ≤ D    (2)

where c_1 and c_2 are positive constant parameters, rand() is a random function with range [0, 1], and w is the inertia weight. Equation (1) calculates the particle's new velocity; the particle then flies toward a new position according to equation (2). A disadvantage of the global PSO is that it tends to be trapped in a local optimum under some initialization conditions [13].
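As a concrete illustration of equations (1) and (2), a minimal gbest PSO can be sketched as follows. The function name, search bounds, and parameter values (standard constriction-style settings rather than anything from this paper) are illustrative assumptions:

```python
import random

def pso_minimize(f, dim, n_particles=20, iters=100,
                 w=0.729, c1=1.49445, c2=1.49445):
    """Minimize f over [-5, 5]^dim with the gbest PSO of equations (1)-(2)."""
    X = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pbest_val = [f(x) for x in X]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                # equation (1): inertia + cognitive + social terms
                V[i][d] = (w * V[i][d]
                           + c1 * random.random() * (pbest[i][d] - X[i][d])
                           + c2 * random.random() * (gbest[d] - X[i][d]))
                # equation (2): move the particle
                X[i][d] += V[i][d]
            val = f(X[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = X[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = X[i][:], val
    return gbest, gbest_val

# Sphere function: global minimum 0 at the origin
best, best_val = pso_minimize(lambda x: sum(v * v for v in x), dim=2)
```

With w = 0.729 and c1 = c2 = 1.49445 the swarm satisfies the usual stability condition c1 + c2 < 2(1 + w), which is exactly the kind of parameter sensitivity the paper attributes to simple PSO.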
2.2 Coordinate Particle Swarm Optimization with Dynamic Piecewise-Mapped and Nonlinear Inertia Weights
The piecewise linear chaotic map is defined as follows [14]:

x_{t+1} = x_t / p,               x_t ∈ (0, p)
x_{t+1} = (1 − x_t) / (1 − p),   x_t ∈ (p, 1)    (3)
where p is the control parameter and X is a variable. Although the above equation is deterministic, it exhibits chaotic dynamics in (0, 1) when p ∈ (0,0.5) ∪ (0.5,1) . To enrich searching behaviors and to avoid being trapped into local optima, the newly introduced dynamic Piecewise linear chaotic map is incorporated into the PSO inertia weight which is described in equations (4) and (5) [11].
where α is the dynamic chaotic inertia weight adjustment factor, α_max and α_min represent the maximum and minimum values of α respectively, iter is the current iteration number, iter_max is the maximum iteration number, and P_map is the result of the piecewise linear chaotic map. To achieve a trade-off between exploration and exploitation, two types of dynamic nonlinear inertia weights are introduced [11]. In this paper, the first type is used, given in equations (6) and (7):

dnl = dnl_min + (dnl_max − dnl_min) · (iter / iter_max)    (6)

w = w_min + (w_max − w_min) · ((iter_max − iter) / iter_max)^dnl    (7)

where dnl represents the dynamic nonlinear factor, w represents the inertia weight, w_max and w_min represent the maximum and minimum values of w respectively, dnl_max and dnl_min represent the maximum and minimum values of dnl respectively, iter represents the current iteration number, and iter_max represents the maximum iteration number. To avoid premature convergence and to balance global exploration and local exploitation, the dynamic piecewise linear chaotic map and the dynamic nonlinear equations are used in parallel to dynamically adjust the PSO inertia weight w, as follows [11]:

Initialize all the parameters.
repeat
  Evaluate the fitness values of all the particles.
  if f_i > f_avg
    Equations (4), (5), (1) and (2) are employed.
  elseif f_i ≤ f_avg
    Equations (6), (7), (1) and (2) are employed.
  endif
until (a termination criterion is met)

where f_i is the fitness value of particle i and f_avg is the average fitness value of the whole population.
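Equations (3), (6) and (7), together with the fitness-based switching rule, can be sketched as follows. Note one assumption: the chaotic branch below simply iterates the map of equation (3) and scales the result into a weight range, standing in for equations (4) and (5), which are not reproduced in this excerpt; all parameter defaults are illustrative:

```python
def piecewise_map(x, p=0.4):
    """Piecewise linear chaotic map, equation (3); chaotic for p in (0,1)\\{0.5}."""
    return x / p if 0 < x < p else (1 - x) / (1 - p)

def nonlinear_weight(it, it_max, w_min=0.4, w_max=0.9,
                     dnl_min=0.2, dnl_max=7.0):
    """Dynamic nonlinear inertia weight, equations (6)-(7)."""
    dnl = dnl_min + (dnl_max - dnl_min) * (it / it_max)               # equation (6)
    return w_min + (w_max - w_min) * ((it_max - it) / it_max) ** dnl  # equation (7)

def choose_weight(f_i, f_avg, chaos_x, it, it_max):
    """Switching rule: chaotic weight for below-average particles
    (f_i > f_avg under minimization), nonlinear weight otherwise.
    The chaotic branch is a stand-in for equations (4)-(5), which this
    excerpt does not reproduce; the [0.4, 0.9] scaling is assumed."""
    if f_i > f_avg:
        chaos_x = piecewise_map(chaos_x)
        return 0.4 + 0.5 * chaos_x, chaos_x
    return nonlinear_weight(it, it_max), chaos_x
```

The nonlinear weight starts near w_max and decays toward w_min as iterations advance, while the chaotic branch keeps poorly performing particles exploring.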
3 Spatial Clustering with Obstacles Constraints by PNPSO and K-Medoids 3.1 Obstructed Distance
Definition 1 (Obstructed distance). Given points p and q, the obstructed distance d_o(p, q) is defined as the length of the shortest Euclidean path between p and q that does not cut through any obstacles. To compute the obstructed distance, we use the particle swarm optimization based on a grid model presented in [15]. 3.2 IKSCOC Based on K-Medoids
The K-Medoids algorithm is adopted for SCOC to avoid cluster centers falling on obstacles. The square-error function is adopted to estimate the clustering quality and is defined as:

E = Σ_{j=1}^{N_c} Σ_{p ∈ C_j} (d(p, m_j))^2    (8)

where N_c is the number of clusters, m_j is the centre of cluster C_j, and d(p, q) is the direct Euclidean distance between the two points p and q. To handle obstacle constraints, the criterion function for estimating the quality of spatial clustering with obstacles constraints is accordingly revised as:

E_o = Σ_{j=1}^{N_c} Σ_{p ∈ C_j} (d_o(p, m_j))^2    (9)

where d_o(p, q) is the obstructed distance between point p and point q. The method of IKSCOC is adopted as follows [5].
1. Select N_c objects to be cluster centers at random;
2. Assign the remaining objects to the nearest cluster center;
3. Calculate E_o according to equation (9);
4. While (E_o changes) do { Let current E = E_o;
5.   Select a non-center point to replace a cluster center m_j at random;
6.   Assign objects to the nearest center;
7.   Calculate E according to equation (8);
8.   If E > current E, go to 5;
9.   Calculate E_o;
10.  If E_o < current E, form new cluster centers}.
IKSCOC still has two shortcomings: first, selecting initial values at random may lead to different clustering results or even no solution; second, it only attends to local convergence and is sensitive to outliers. 3.3 PNPKSCOC Based on PNPSO and K-Medoids
In the context of clustering, a single particle represents the N_c cluster centroids. That is, each particle X_i is constructed as follows:

X_i = (m_i1, ..., m_ij, ..., m_iN_c)    (10)

where m_ij refers to the j-th cluster centroid of the i-th particle in cluster C_ij. Here, the objective function is defined as follows:

f(x_i) = (1 / N_c) Σ_{j=1}^{N_c} Σ_{p ∈ C_ij} d_o(p, m_j)    (11)

The PNPKSCOC is developed as follows.
1. Execute the IKSCOC algorithm to initialize one particle to contain N_c selected cluster centroids;
2. Initialize the other particles of the swarm to contain N_c selected cluster centroids at random;
3. For t = 1 to t_max do {
4.   For i = 1 to no_of_particles do {
5.     For each object p do {
6.       Calculate d_o(p, m_ij);
7.       Assign object p to cluster C_ij such that d_o(p, m_ij) = min_{c = 1, ..., N_c} {d_o(p, m_ic)}; }
8.     Evaluate the fitness of the particle according to equation (11);
9.     If f_i > f_avg, update the particle using equations (4), (5), (1) and (2);
10.    Elseif f_i ≤ f_avg, update the particle using equations (6), (7), (1) and (2);
11.    Update Pbest;
12.    Update Pgbest;
13.    If ||v|| ≤ ε, terminate; }
14.   Select two other particles j and k (i ≠ j ≠ k) at random;
15.   Optimize new individuals using IKSCOC; }
16. Output.
where t_max is the maximum number of iterations for PNPSO. Step 15 improves the local convergence speed of PNPSO.
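The assignment and fitness evaluation in the loop above (nearest-centroid assignment and the objective of equation (11)) can be sketched as follows. Plain Euclidean distance stands in for the obstructed distance d_o, which in the paper is computed with the grid model of [15]; the function names are illustrative:

```python
import math

def euclid(p, q):
    return math.dist(p, q)

def assign_and_fitness(points, centroids, dist=euclid):
    """Assign each point to its nearest centroid and return the fitness of
    equation (11).  `dist` stands in for the obstructed distance d_o; plain
    Euclidean distance is used here as a placeholder."""
    n_c = len(centroids)
    clusters = [[] for _ in range(n_c)]
    for p in points:
        # nearest-centroid assignment
        j = min(range(n_c), key=lambda c: dist(p, centroids[c]))
        clusters[j].append(p)
    # equation (11): averaged sum of point-to-centroid distances
    fitness = sum(dist(p, centroids[j])
                  for j in range(n_c) for p in clusters[j]) / n_c
    return clusters, fitness
```

Swapping `dist` for a grid-model obstructed distance is the only change needed to make this sketch obstacle-aware.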
A Novel Spatial Clustering with Obstacles Constraints Based on PNPSO
481
4 Results and Discussion We have made experiments separately with K-Medoids, IKSCOC, GKSCOC, PKSCOC, and PNPKSCOC, with n = 50, c_1 = c_2 = 2, t_max = 100. Fig.1 shows the results on Dataset1. Fig.1(a) shows the original data with simple obstacles. Fig.1(b) shows the 4 clusters found by K-Medoids without considering obstacles constraints. Fig.1(c), (d), (e), and (f) show the results found by IKSCOC, GKSCOC, PKSCOC, and PNPKSCOC respectively. Obviously, the results in Fig.1(c), (d), (e), and (f) are more practical than the one in Fig.1(b), and the one in Fig.1(f) is superior to the one in Fig.1(e).
Fig. 1. Clustering Dataset1: (a) original data with obstacles; (b) K-Medoids; (c) IKSCOC; (d) GKSCOC; (e) PKSCOC; (f) PNPKSCOC
Fig.2 shows the results on Dataset2, residential spatial data points with river and railway obstacles, for facility location of city parks. Fig.2(a) shows the original data with river and railway obstacles. Fig.2(b) is the clustering result of K-Medoids without considering obstacles constraints. The result in Fig.2(c), found by PNPKSCOC, is obviously effective. So it can be concluded that PNPKSCOC is effective and more practical. Fig.3 shows the value of J in every experiment on Dataset1 for IKSCOC, PKSCOC, and PNPKSCOC respectively. It shows that IKSCOC is sensitive to initial values and converges to different, extremely local optimum points when started from different initial values, while PNPKSCOC converges to nearly the same optimum each time; moreover, PNPKSCOC is better than PKSCOC. Fig.4 shows the convergence speed in one experiment on Dataset1: PNPKSCOC converges in about 12 generations while GKSCOC converges in nearly 25 generations. So it can be concluded that PNPKSCOC is effective and converges faster than GKSCOC.
Fig. 2. Clustering Dataset2: (a) original data with river and railway obstacles; (b) K-Medoids; (c) PNPKSCOC
Fig. 3. PNPKSCOC vs. IKSCOC, PKSCOC
Fig. 4. PNPKSCOC vs. GKSCOC
Therefore, we can draw the conclusion that PNPKSCOC has stronger global convergence ability than PKSCOC and higher convergence speed than GKSCOC.
5 Conclusions In this paper, we propose a novel PNPKSCOC based on PNPSO and K-Medoids. The proposed method is compared with several other algorithms to demonstrate its efficiency, and the experimental results are satisfactory. Acknowledgments. This work is partially supported by the Program for New Century Excellent Talents in University (NCET-08-0660) and the Supporting Plan of Science and Technology Innovation Talent of Colleges in Henan Province (No. 2008HASTIT012).
References 1. Tung, A.K.H., Han, J., Lakshmanan, L.V.S., Ng, R.T.: Constraint-Based Clustering in Large Databases. In: Van den Bussche, J., Vianu, V. (eds.) ICDT 2001. LNCS, vol. 1973, pp. 405–419. Springer, Heidelberg (2000)
A Novel Spatial Clustering with Obstacles Constraints Based on PNPSO
483
2. Tung, A.K.H., Ng, R.T., Lakshmanan, L.V.S., Han, J.: Geo-spatial Clustering with User-Specified Constraints. In: Proceedings of the International Workshop on Multimedia Data Mining (MDM/KDD 2000), Boston, USA, pp. 1–7 (2000) 3. Tung, A.K.H., Hou, J., Han, J.: Spatial Clustering in the Presence of Obstacles. In: Proceedings of the International Conference on Data Engineering (ICDE 2001), Heidelberg, Germany, pp. 359–367 (2001) 4. Estivill-Castro, V., Lee, I.J.: AUTOCLUST+: Automatic Clustering of Point-Data Sets in the Presence of Obstacles. In: Proceedings of the International Workshop on Temporal, Spatial and Spatial-Temporal Data Mining, Lyon, France, pp. 133–146 (2000) 5. Zaïane, O.R., Lee, C.H.: Clustering Spatial Data When Facing Physical Constraints. In: Proceedings of the IEEE International Conference on Data Mining (ICDM 2002), Maebashi City, Japan, pp. 737–740 (2002) 6. Wang, X., Hamilton, H.J.: DBRS: A Density-Based Spatial Clustering Method with Random Sampling. In: Proceedings of the 7th PAKDD, Seoul, Korea, pp. 563–575 (2003) 7. Wang, X., Rostoker, C., Hamilton, H.J.: DBRS+: Density-Based Spatial Clustering in the Presence of Obstacles and Facilitators (2004), http://Ftp.cs.uregina.ca/Research/Techreports/2004-09.pdf 8. Wang, X., Hamilton, H.J.: Gen and SynGeoDataGen Data Generators for Obstacle Facilitator Constrained Clustering (2004), http://Ftp.cs.uregina.ca/Research/Techreports/2004-08.pdf 9. Zhang, X., Wang, J., Wu, F., Fan, Z., Li, X.: A Novel Spatial Clustering with Obstacles Constraints Based on Genetic Algorithms and K-Medoids. In: Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications (ISDA 2006), Jinan, Shandong, China, pp. 605–610 (2006) 10. Zhang, X., Wang, J.: A Novel Spatial Clustering with Obstacles Constraints Based on Particle Swarm Optimization and K-Medoids. In: Proceedings of the 11th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2007), Nanjing, China, pp.
1105–1113 (2007) 11. Liu, H., Su, R., Gao, Y.: Coordinate Particle Swarm Optimization with Dynamic Piecewise-mapped and Nonlinear Inertia Weights. In: Proceedings of the 2009 International Conference on Artificial Intelligence and Computational Intelligence (AICI 2009), Shanghai, China, pp. 124–128 (2009) 12. Kennedy, J., Eberhart, R.C.: Particle Swarm Optimization. In: Proceedings of IEEE International Conference on Neural Networks, Perth, Australia, vol. IV, pp. 1942–1948 (1995) 13. van den Bergh, F.: An Analysis of Particle Swarm Optimizers. Ph.D. thesis, University of Pretoria (2001) 14. Baranovsky, A., Daems, D.: Design of One-Dimensional Chaotic Maps with Prescribed Statistical Properties. International Journal of Bifurcation and Chaos 5(6), 1585–1598 (1995) 15. Zhang, X., Deng, G., Liu, Y., Wang, J.: Spatial Obstructed Distance Based the Combination of Ant Colony Optimization and Particle Swarm Optimization. In: Proceedings of 2009 4th IEEE Conference on Industrial Electronics and Applications (ICIEA 2009), Wuhan, China, pp. 106–111 (2009)
The Optimization of Procedure Chain of Three Activities with a Relax Quantum
Shisen Lv, Jianxun Qi, and Xiuhua Zhao
School of Business Administration, North China Electric Power University, Beijing, China
[email protected]
Abstract. In order to solve the problem of adjusting four parallel activities into a procedure chain of three activities plus one parallel activity in deterministic activity-on-arc networks of the CPM type, a new theory and a comparison method for special procedure chains of three activities are proposed. Based on the theory and method, a branch-and-bound algorithm on a decision tree is described. Keywords: Branch-and-bound, Decision Tree, Procedure Chain.
which can only be applied in small networks; the heuristic methods used in references [5,6] have great limitations. Through intensive study of this problem, we put forward a new theory and give a comparison method for special procedure chains of three activities. Based on this theory and method, and on the branch-and-bound ideas of references [7,8], we describe a branch-and-bound algorithm on a decision tree.
2 The Latest Start Time Theory

The latest start time theory: If LS_C ≥ LS_D, then [ABC] ≤ [ABD].
Proof: Consider LS_C ≥ LS_D.
⑴ If both (ABC) and (ABD) are standard chains, then
[ABC] = EF_A + T_B − LS_C,  [ABD] = EF_A + T_B − LS_D.
Since LS_C ≥ LS_D,
[ABC] − [ABD] = (EF_A + T_B − LS_C) − (EF_A + T_B − LS_D) = LS_D − LS_C ≤ 0
⇒ [ABC] ≤ [ABD].
⑵ If both (ABC) and (ABD) are front separated chains, then
[ABC] = EF_B − LS_C,  [ABD] = EF_B − LS_D,
and as in ⑴, LS_C ≥ LS_D gives [ABC] ≤ [ABD].
⑶ If both (ABC) and (ABD) are behind separated chains, then [ABC] = EF_A − LS_B = [ABD], hence [ABC] ≤ [ABD].
⑷ If (ABC) is a standard chain, then (ABD) must also be a standard chain. If not, (ABD) would be either a front separated chain or a behind separated chain. Assume (ABD) is a front separated chain; then EF_A < ES_B, but (ABC) is a standard chain, which is contradictory. Assume (ABD) is a behind separated chain; then LF_B < LS_D. However, since (ABC) is a standard chain, LF_B ≥ LS_C, and since LS_C ≥ LS_D, LF_B ≥ LS_D, which is contradictory. Therefore (ABD) is a standard chain, and by ⑴, [ABC] ≤ [ABD].
⑸ If (ABC) is a front separated chain, then EF_A < ES_B, so (ABD) is also a front separated chain and hence not a standard chain.
If (ABD) is only a front separated chain, the result follows from ⑵. If (ABD) is also a behind separated chain, then [ABD] = 0. Since (ABC) is a front separated chain, EF_A < ES_B; since (ABD) is a behind separated chain, LF_B < LS_D; and since LS_C ≥ LS_D, LF_B < LS_C. Then (ABC) is also a behind separated chain, so [ABC] = 0, and therefore [ABC] ≤ [ABD].
⑹ If (ABC) is both a behind separated chain and a front separated chain, then (ABD) must be a front separated chain, and according to ⑵ we can prove that [ABC] ≤ [ABD]. If (ABC) is only a behind separated chain, then (ABD) is either a behind separated chain or a standard chain. If (ABD) is a behind separated chain, we can prove [ABC] ≤ [ABD] according to ⑶. If (ABD) is a standard chain, then LF_B ≥ LS_D. Since [ABC] = EF_A − LS_B and [ABD] = EF_A + T_B − LS_D,
[ABC] − [ABD] = (EF_A − LS_B) − (EF_A + T_B − LS_D) = LS_D − LF_B ≤ 0 ⇒ [ABC] ≤ [ABD].
According to ⑴–⑹, if LS_C ≥ LS_D, then [ABC] ≤ [ABD].
3 Comparison Method of Special Procedure Chain of Three Activities

3.1 Comparison Method of (ABX) and (BAX)

⑴ If at least one of (ABX) and (BAX) is a standard chain, the two procedure chains can be compared according to the following ① and ②. Assume (ABX) is a standard chain.
① If ES_A ≤ ES_B, then (ABX) is better than (BAX); otherwise see ②.
② If (BAX) is a standard chain, then (BAX) is better than (ABX); if (BAX) is a front separated chain, then (BAX) is better than (ABX); if (BAX) is a behind separated chain and C_A > C_BX − T_X, then (BAX) is better than (ABX); and if C_A < C_BX − T_X, then (ABX) is better than (BAX). Similarly, we can get the comparison method when (BAX) is the standard chain.
Proof: Suppose (ABX) is a standard chain.
① If ES_A ≤ ES_B, then by the front symmetrical theorem, [ABX] ≤ [BAX].
② If ES_A > ES_B:
1) If (BAX) is a standard chain, then by the front symmetrical theorem, [BAX] ≤ [ABX].
2) If (BAX) is a front separated chain, then [BAX] = EF_A − LS_X. Since (ABX) is a standard chain, [ABX] = EF_A + T_B − LS_X. Therefore
[ABX] − [BAX] = (EF_A + T_B − LS_X) − (EF_A − LS_X) = T_B ≥ 0 ⇒ [BAX] ≤ [ABX].
3) If (BAX) is a behind separated chain, then [BAX] = EF_B − LS_A. Since (ABX) is a standard chain, [ABX] = EF_A + T_B − LS_X. Therefore
[ABX] − [BAX] = (EF_A + T_B − LS_X) − (EF_B − LS_A) = EF_A + T_B − LS_X − EF_B + LS_A = (ES_A + LF_A) − (ES_B + LS_X) = C_A − (C_BX − T_X).
When C_A > C_BX − T_X, [BAX] ≤ [ABX], so (BAX) is better than (ABX). When C_A < C_BX − T_X, [ABX] ≤ [BAX], so (ABX) is better than (BAX).
⑵ If both (ABX) and (BAX) are behind separated chains, we only need to compare the centers of gravity of activities A and B. If C_A < C_B, then (ABX) is better than (BAX); if C_A > C_B, then (BAX) is better than (ABX); and if C_A = C_B, then (BAX) is the same as (ABX).
Proof: Since both (ABX) and (BAX) are behind separated chains,
[ABX] = EF_A − LS_B,  [BAX] = EF_B − LS_A.
Therefore
[ABX] − [BAX] = (EF_A − LS_B) − (EF_B − LS_A) = (EF_A + LS_A) − (EF_B + LS_B) = C_A − C_B.
If C_A < C_B, then [ABX] < [BAX] and (ABX) is better than (BAX). If C_A > C_B, then [ABX] > [BAX] and (BAX) is better than (ABX). If C_A = C_B, then [ABX] = [BAX] and the two chains are the same.
⑶ If one of (ABX) and (BAX) is a front separated chain and the other is a behind separated chain, then the tardiness of the front separated chain is smaller.
Proof: Suppose both (ABX) and (BAX) are front separated chains; then EF_A < ES_B and EF_B < ES_A. Since ES_A ≤ EF_A < ES_B ≤ EF_B and EF_B < ES_A are contradictory, (ABX) and (BAX) cannot both be front separated chains.
Suppose (ABX) is a front separated chain and (BAX) is a behind separated chain. Since (ABX) is front separated, EF_A < ES_B and [ABX] = EF_B − LS_X. Since (BAX) is behind separated, [BAX] = EF_B − LS_A. Therefore
[ABX] − [BAX] = (EF_B − LS_X) − (EF_B − LS_A) = LS_A − LS_X < 0 ⇒ [ABX] < [BAX].
Therefore the tardiness of the front separated procedure chain is smaller.
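The chain types and tardiness formulas used in the proofs above (standard: EF_A + T_B − LS_C; front separated: EF_B − LS_C; behind separated: EF_A − LS_B; both: 0) can be collected into one small helper. This is our reading of the proofs; the formal definitions come from reference [2] and are not reproduced in this excerpt:

```python
# Each activity is a dict with duration T and earliest/latest start and
# finish times ES, EF, LS, LF.
def tardiness(a, b, c):
    """Tardiness [ABC] of the procedure chain (A, B, C).  Classification
    implied by the proofs: front separated if EF_A < ES_B, behind
    separated if LF_B < LS_C, standard otherwise (illustrative reading;
    the formal definitions are in reference [2])."""
    front = a["EF"] < b["ES"]
    behind = b["LF"] < c["LS"]
    if front and behind:
        return 0                           # fully separated chain
    if front:
        return b["EF"] - c["LS"]           # front separated chain
    if behind:
        return a["EF"] - b["LS"]           # behind separated chain
    return a["EF"] + b["T"] - c["LS"]      # standard chain
```

Such a helper makes the case analysis of Sections 2 and 3 mechanically checkable on concrete activity data.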
3.2 Comparison Method of (ABX) and (BAY)
(1) If X = Y, then compare using the method of Section 3.1; otherwise use (2). (2) Compare the tardiness of (ABX) and (BAY) directly.
4 Branch and Bound Problem In a network of CPM type, A, B, C, D are four parallel activities. If three of them should be adjusted into a procedure chain, which three should be chosen, and how can the minimum total tardiness be obtained by adjusting them? Algorithm Step 1: Build the decision tree; see Fig.1.
Fig. 1. Decision tree
Step 2: Use the latest start time theory to make decisions at the decision nodes. Step 3: Use Methods 3.1 and 3.2 to make decisions. Step 4: Use the basic theory in reference [2] to compare procedure chains composed of the same three activities and find the one with the minimum total tardiness. Step 5: Use the tardiness of the first procedure chain from Step 4 as the upper bound and compare it with the tardiness of each remaining procedure chain; if the other chain's tardiness is bigger or the same, keep the upper bound, otherwise take that tardiness as the new upper bound, until there is no new procedure chain of three activities.
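As a cross-check on the branch-and-bound, the four-activity problem is small enough to solve by brute force: enumerate all 24 ordered triples and keep the minimum tardiness. The sketch below uses our reading of the chain-type rules from Section 2 (front separated if EF_A < ES_B, behind separated if LF_B < LS_C, standard otherwise); it is only a baseline, not the paper's pruned decision-tree method:

```python
from itertools import permutations

def chain_tardiness(a, b, c):
    # Type rules as read from Section 2's proofs (illustrative).
    front, behind = a["EF"] < b["ES"], b["LF"] < c["LS"]
    if front and behind:
        return 0
    if front:
        return b["EF"] - c["LS"]
    if behind:
        return a["EF"] - b["LS"]
    return a["EF"] + b["T"] - c["LS"]      # standard chain

def best_chain(activities):
    """Brute-force baseline: try every ordered triple of the four parallel
    activities and keep the minimum tardiness.  The decision tree pruned
    with the latest start time theory avoids visiting most of these
    candidates."""
    return min(((names, chain_tardiness(*(activities[n] for n in names)))
                for names in permutations(activities, 3)),
               key=lambda pair: pair[1])
```

On the data of the application example in Section 5, this baseline reproduces the chain (Z1, Z2, Z4) with tardiness 3 found by the branch-and-bound.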
5 Application Example: Four different parts Z1, Z2, Z3, Z4 are processed on four identical lathes (see Fig.2). Now two lathes cannot be used, and the four parts must be processed on the
two remaining lathes. However, one of the two remaining lathes performs poorly and can process only one part. In order to minimize the tardiness of the project completion time, which three parts should be processed on the other lathe, and in what sequence?
Fig. 2. Part processing network
Solution:

Table 1. Time parameters of Z1, Z2, Z3, Z4

Activity name            Z1    Z2    Z3    Z4
T                         4     4     9     1
ES                        7     8     5    12
EF                       11    12    14    13
LS                        8     8     7    12
LF                       12    12    16    13
Center of gravity C      19    20    21    25
Take the following steps: Step 1: Draw the decision tree; see Fig.3. Step 2: Use the latest start time theory to make the decision at every decision node (see Fig.3). For example, at decision node Z2 of decision tree (1), since LS_Z4 > LS_Z3, we choose Z4 according to the latest start time theory. Similarly, we can get the decisions at the other decision nodes of decision tree (1) and of decision trees (2), (3), (4).
Fig. 3. Branch and bound method decision figure
Step 3: Use Methods 3.1 and 3.2 to find the procedure chain of three activities with the minimum tardiness of completion time among chains of the forms (ABX) or (BAY) (see Fig.3). For example, compare (Z1 Z2 Z4) and (Z2 Z1 Z4): since EF_Z1 > ES_Z2 and LF_Z2 ≥ LS_Z4, (Z1 Z2 Z4) is a standard chain. Since ES_Z1 < ES_Z2, by Method 3.1, (Z1 Z2 Z4) is better than (Z2 Z1 Z4). Similarly, we can make the other decisions. Step 4: Use the basic theorem to compare procedure chains composed of the same three activities. Compare (Z1 Z2 Z4), (Z1 Z4 Z2) and (Z2 Z4 Z1). First, compare (Z1 Z2 Z4) with (Z1 Z4 Z2): since (Z1 Z2 Z4) is a standard chain and LF_Z4 > LF_Z2, by the behind symmetrical theorem, (Z1 Z2 Z4) is better than (Z1 Z4 Z2). Similarly, we can make the other decisions: (Z1 Z2 Z4) is better than (Z1 Z4 Z2) and (Z2 Z4 Z1); (Z3 Z2 Z4) is better than (Z3 Z4 Z1). From the above analysis we get three candidate procedure chains of three activities: (Z1 Z2 Z4), (Z3 Z4 Z1) and (Z3 Z2 Z4); see Fig.3.
Therefore, 3 are taken as the upper bound of tardiness of completion time, marked tardiness ΔT , upper bound of tardiness of completion time as ΔT , then ΔT = 3 .
[Z3 Z4 Z1] = 7, and 7 > 3, so ΔT stays 3. Similarly, [Z3 Z2 Z4] = 6, and 6 > 3, so ΔT stays 3. Hence the procedure chain of three activities with the minimum tardiness of completion time is (Z1 Z2 Z4), and the tardiness of completion time is 3 days; see Fig.3. The result is that Z1, Z2 and Z4 form one group with the sequence Z1, Z2, Z4, and Z3 forms the other group. After this adjustment, the tardiness of the completion time is three days.
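The three tardiness values in Step 5 can be checked directly from Table 1 with the standard-chain formula [ABC] = EF_A + T_B − LS_C (all three surviving chains are standard chains); the variable names below are illustrative:

```python
# Time parameters copied from Table 1
Z = {
    "Z1": dict(T=4, ES=7,  EF=11, LS=8,  LF=12),
    "Z2": dict(T=4, ES=8,  EF=12, LS=8,  LF=12),
    "Z3": dict(T=9, ES=5,  EF=14, LS=7,  LF=16),
    "Z4": dict(T=1, ES=12, EF=13, LS=12, LF=13),
}

def standard_tardiness(a, b, c):
    """[ABC] = EF_A + T_B - LS_C for a standard chain (A, B, C)."""
    return Z[a]["EF"] + Z[b]["T"] - Z[c]["LS"]

print(standard_tardiness("Z1", "Z2", "Z4"))  # 11 + 4 - 12 = 3
print(standard_tardiness("Z3", "Z4", "Z1"))  # 14 + 1 - 8  = 7
print(standard_tardiness("Z3", "Z2", "Z4"))  # 14 + 4 - 12 = 6
```

The minimum of the three is indeed 3, achieved by (Z1, Z2, Z4), matching the branch-and-bound result.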
6 Conclusion This paper studies the optimization of a procedure chain of three activities with a relax quantum, and gives a new theory and a comparison method for special procedure chains of three activities. Based on these, it presents a branch-and-bound method to solve the problem, which greatly lowers the computational complexity. To our knowledge, this is the first branch-and-bound method for this problem. For further research, a computer implementation can be developed, and the optimization of procedure chains of three activities with multiple relax quanta can be considered.
Acknowledgment. This work is supported by the Natural Science Foundation of China (70671040) and the Beijing Municipal Commission of Education (X90017).
References 1. Elmaghraby, S.E.: Activity nets: A guided tour through some recent developments. European Journal of Operational Research 82, 383–408 (1995) 2. Qi, J.X.: The New Theory of Project Planning and Control and Technical Economical Decision. Scientific Press, Beijing (1997) 3. Wang, Z.T., Zhang, J.: The Network Planning and Control. Liaoning Press, Shenyang (1984) 4. Whitehouse, G.E.: Systems Analysis and Design Using Network Techniques. Prentice-Hall, New Jersey (1973) 5. Elmaghraby, S.E.: On criticality and sensitivity in activity networks. European Journal of Operational Research 127, 220–238 (2000) 6. Kelley, J.E., Walker, M.R.: Critical path planning and scheduling. In: Proceedings of the Eastern Joint Computational Conference, vol. 16, pp. 160–172 (1959) 7. Demeulemeester, E.L., Herroelen, W.S., Elmaghraby, S.E.: Optimal procedures for the discrete time/cost trade-off problem in project networks. European Journal of Operational Research 88, 50–68 (1996) 8. Montemanni, R., Gambardella, L.M.: A branch and bound algorithm for the robust spanning tree problem with interval data. European Journal of Operational Research 161, 771–779 (2005)
Invalidity Analysis of Eco-compensation Projects Based on Two-Stage Game
Xianjia Wang1,2, Nan Xu1, and Binbin Huang1
1 Institute of System Engineering, Wuhan University, Wuhan 430072, China
2 School of Economics and Management, Wuhan University, Wuhan 430072, China
[email protected]
Abstract. Due to economic benefits, pollution control costs, and the compensated side's strategy of seeking more compensation, many eco-compensation projects cannot achieve the desired outcome. In this paper, we assume that the pollution control strength of the compensated side is private information and establish a two-stage dynamic game model with incomplete information to analyze the strategies of the two sides and how incomplete information affects the compensation amount and the pollution level. Keywords: Eco-compensation, Bayesian game, incomplete information.
In this paper, we establish dynamic game models for two-area eco-compensation and theoretically explore why eco-compensation can be invalid. We assume that the pollution control strength of the compensated side is private information, so that the compensating side cannot observe the real pollution control type of the contaminated area during the game. On this basis we build a two-stage dynamic game model with incomplete information to analyze the strategy selection of the two areas and how incomplete information affects the compensation amount and the pollution level.
2 Basic Model of Two-Stage Game Environmental quality can be seen as a common commodity between the compensating and compensated areas. Let the welfare function of the compensating side be U_1(Y_1, C(A), q_1), where Y_1 is the income after paying the compensation to the contaminated area, Y_1 = y_1 − A, A is the compensation amount, and y_1 is the income before paying the compensation; C(A) is the utility obtained from a compensation of A, with ∂C/∂A > 0 and ∂U_1/∂C > 0; q_1(E_2) is the air quality of the compensating side, which depends on the pollution level E_2 of the contaminated area. Since paying the compensation reduces the income of the compensating side, ∂²U_1/∂q_1∂A < 0; since the air quality of the compensating side declines as the pollution amount of the contaminated area increases, ∂²U_1/∂E_2∂A > 0. For convenience, we replace the welfare function with the utility function V_1(A, E_2); the objective of the compensating side is to choose a compensation amount A that maximizes V_1(A, E_2). If V_1(A, E_2) is strictly concave in A, then for any E_2 the maximization has a unique solution, and r_1(E_2) is the optimal response function of the compensating side. The welfare function of the compensated side is U_2(Y_2, q_2), where Y_2 = y_2 + A is the income after being compensated, y_2 is the income before being compensated, and q_2(E_2) is the air quality of the compensated side, which depends on the pollution amount E_2, with ∂q_2/∂E_2 < 0, ∂U_2/∂q_2 < 0, and ∂²U_2/∂q_2∂Y_2 > 0. Similarly, we replace the welfare function of the compensated side with its utility function, whose maximization is its objective. The determination of the compensated side can be one of two types: "tough" and "weak".
Due to the asymmetry of information, the compensating side does not know the type of the compensated side, but it knows that the probability of the compensated side being weak is α and being tough is 1 − α. Let V_2^T(A, E_2) and V_2^W(A, E_2) denote the utility functions of the "tough" and "weak" types respectively. The marginal utility of the "weak" type is greater than that of the "tough" type, so ∂V_2^W(A, E_2)/∂E_2 > ∂V_2^T(A, E_2)/∂E_2. Whichever type the compensated side is, it chooses a pollution level E_2 that maximizes its utility function, so every compensation amount A corresponds to optimal response functions r_2^T and r_2^W of the compensated side. Since the marginal utility of the "weak" type is greater than that of the "tough" type, r_2^W(A) > r_2^T(A), as shown in Fig. 1.
494
X. Wang, N. Xu, and B. Huang
Now we establish a two-stage dynamic game model with imperfect information. In stage 1, the pollution control type of the compensated side is private information; the compensating side can only observe the behavior of the compensated side to estimate that type. Assume the probability of the compensated side being “weak” is α, and use (A1, ET1, EW1) to denote the strategies selected by the two sides in stage 1. In stage 2, the behavior of each side in stage 1 becomes public information, and the two sides reselect their strategies, denoted (A2, ET2, EW2). We analyze two kinds of perfect Bayesian equilibria: the separating equilibrium and the pooling equilibrium. In a pooling equilibrium, different types of compensated side choose the same action in stage 1, so their action provides no useful information to the compensating side. Since the compensating side does not learn the true pollution control type of the compensated side, the static Bayesian equilibrium of stage 2 is (A2, ET2, EW2) = (A*(α), ET*(α), EW*(α)). In a separating equilibrium, different types of compensated side choose different pollution levels, and the compensating side can infer the type of the compensated side from the observed information; stage 2 then becomes a static game with complete information. The equilibrium is (A2, ET2) = (AT*, ET*) if the compensated side is “tough” and (A2, EW2) = (AW*, EW*) if it is “weak”. First we analyze the pooling equilibrium. In stage 1, the compensated side uses an imitative strategy and chooses the same pollution level regardless of type, so in stage 2 the compensating side cannot tell the types apart. Using the imitative strategy in stage 1 costs the contaminated area some utility, because the imitated pollution level is not the level that maximizes its utility.
That is, creating a false impression through the imitative strategy brings a loss to the compensated side itself. Uncertainty benefits the “tough” compensated side, so a “tough” compensated side will imitate a “weak” one to increase the compensating
side’s uncertainty about its type. So, the imitative strategy of the pooling equilibrium will increase the pollution. In stage 1, if the compensated side is “weak” its strategy is EW1 = EW*(α); a “tough” side will imitate it, so in stage 1 ET1 = EW1 = EW*(α), and the optimal response of the compensating side in stage 1 is r(EW*(α)). Since the compensating side learns nothing about the type in stage 1, in stage 2 the probability of the compensated side being “weak” is still α. If the compensated side is “tough”, the equilibrium of stage 2 is (A*(α), ET*(α)); if it is “weak”, the equilibrium of stage 2 is (A*(α), EW*(α)). Suppose the discount rate is δ. If the compensated side is “tough”, its gross profit in the game is V2T(r(EW*(α)), EW*(α)) + δV2T(A*(α), ET*(α)); if it does not adopt the imitative strategy, its gross profit is V2T(A*(α), ET*(α)) + δV2T(AT*, ET*). So the “tough” compensated side will adopt the imitative strategy if the following condition holds:

V2T(r(EW*(α)), EW*(α)) + δV2T(A*(α), ET*(α)) ≥ V2T(A*(α), ET*(α)) + δV2T(AT*, ET*)

That is:
δ[V2T(A*(α), ET*(α)) − V2T(AT*, ET*)] ≥ V2T(A*(α), ET*(α)) − V2T(r(EW*(α)), EW*(α))  (1)

So we can draw the following conclusion:

Conclusion 1: A “tough” compensated side adopts the imitative strategy only when the discounted value of its increased profit in stage 2 is greater than its decreased profit in stage 1 under the imitative strategy.

In stage 2 of the game, the compensating side amends the prior probability distribution over the type of the compensated side using the information observed in stage 1. If the pollution level observed by the compensating side in stage 1 is E, the posterior probability u(E) is:

u(E) = α if E = EW*(α); u(E) = 0 if E ≠ EW*(α)  (2)

Theorem 1: When formula (1) holds, the compensated side selects the following strategy: (A*(α), ET*(α), EW*(α)) in stage 1, and (A*(α), ET*(α)) in stage 2 when the compensated side is “tough” or (A*(α), EW*(α)) when it is “weak”. Together with the belief (2), this is a perfect Bayesian pooling equilibrium.

Now we analyze the separating equilibrium. Here, compensated sides of different pollution control types choose different pollution levels. The strategy set is (A1, ET1, EW1) = (A*(α), ET*(α), EW*(α)) in stage 1, and in stage 2 it is (A2, ET2) = (AT*, ET*) if the compensated side is “tough” or (A2, EW2) = (AW*, EW*) if it is “weak”. Similarly, the compensating side amends the prior probability distribution in stage 2 according to the information observed in stage 1:
u(E) = 1 if E = EW*(α); u(E) = 0 if E ≠ EW*(α)  (3)
That is, the compensated side reveals that it is “weak” by choosing pollution level EW*(α); if any other pollution level is chosen, the compensated side is taken to be “tough”. The gross profit of the “tough” compensated side is V2T(A*(α), ET*(α)) + δV2T(AT*, ET*). However, if a “tough” compensated side chooses pollution level EW*(α), then in stage 2 the compensating side pays it a compensation of AW*, the optimal response of the “tough” compensated side is r2T(AW*), and its gross profit is V2T(A*(α), EW*(α)) + δV2T(AW*, r2T(AW*)). Under the belief u(E) = 0 for E ≠ EW*(α), the “tough” compensated side will not adopt EW*(α) in stage 1 when the discounted value of the extra stage-2 profit from imitation is less than the stage-1 profit given up, i.e. when:

V2T(A*(α), ET*(α)) + δV2T(AT*, ET*) ≥ V2T(A*(α), EW*(α)) + δV2T(AW*, r2T(AW*))

That is:

V2T(A*(α), ET*(α)) − V2T(A*(α), EW*(α)) ≥ δ[V2T(AW*, r2T(AW*)) − V2T(AT*, ET*)]  (4)
Theorem 2: When formula (4) is satisfied, a “tough” compensated side will not adopt EW*(α) in stage 1, and the strategy will be (A*(α), ET*(α), EW*(α)) in stage 1, and (AT*, ET*) in stage 2 when the compensated side is “tough” or (AW*, EW*) when it is “weak”. Together with the belief (3), this is a perfect Bayesian equilibrium.

To reach a separating equilibrium, a “weak” compensated side may therefore choose a higher pollution level, EW1 > EW*(α), to emphasize its pollution control type and prevent the pooling equilibrium; this increases the cost for a “tough” compensated side to imitate it. Since the pollution level chosen by the “weak” compensated side is then not the optimal one, its own utility is reduced by choosing the higher pollution level. In stage 1, the “weak” compensated side chooses a pollution level EW1 > EW*(α), and the optimal responses of the compensating side and the compensated side are A1 = A*(α; EW1) and ET1 = ET*(α; EW1) respectively; in stage 2 the strategy set is (A2, ET2) = (AT*, ET*) or (A2, EW2) = (AW*, EW*) when the pollution control type is “tough” or “weak” respectively. Similarly, suppose the posterior probability function of the compensating side is u(E):

u(E) = 1 if E = EW1; u(E) = 0 if E ≠ EW1  (5)

If a “tough” compensated side is not to adopt the imitative strategy, then:
The cost of imitating a “weak” compensated side must exceed the discounted value of the profit in stage 2. And for the “weak” compensated side to adopt EW1, the following must be satisfied:
That is, if the discounted value of the stage-2 profit for a “weak” compensated side is greater than the cost of choosing a higher pollution level in stage 1, then the “weak” compensated side will adopt EW1 to reach a separating equilibrium.

Theorem 3: If formula (4) is not satisfied and both formulas (6) and (7) are satisfied, the strategy of the compensated side will be (A*(α; EW1*), ET*(α; EW1*), EW1*) in stage 1, and (AT*, ET*) in stage 2 when the compensated side is “tough” or (AW*, EW*) when it is “weak”. Together with the belief (5), a perfect Bayesian separating equilibrium is reached.

From Theorem 3, we know that a “weak” compensated side will choose a higher pollution level to emphasize its pollution control type, raising the cost of imitating it, and thus reach a separating equilibrium in which the compensating side pays it more compensation. But this profit-maximizing behavior of a rational player causes more pollution, and the compensation project does not achieve the expected result.
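As a numerical illustration of condition (1), the sketch below plugs hypothetical utility values into the imitation condition for a “tough” compensated side; every number is invented for illustration, since the paper specifies no concrete utilities.

```python
# Hypothetical utility values for a "tough" compensated side. The variable
# names mirror the terms of condition (1); all numbers are illustrative only.
delta = 0.9                  # discount rate between stage 1 and stage 2

V_pool_stage2 = 10.0         # V2T(A*(a), ET*(a)): stage-2 utility under pooling
V_sep_stage2 = 8.0           # V2T(AT*, ET*): stage-2 utility once revealed as "tough"
V_imitate_stage1 = 9.0       # V2T(r(EW*(a)), EW*(a)): stage-1 utility when imitating
V_honest_stage1 = 10.0       # V2T(A*(a), ET*(a)): stage-1 utility without imitating

# Condition (1): imitate iff the discounted stage-2 gain outweighs the stage-1 loss.
stage2_gain = delta * (V_pool_stage2 - V_sep_stage2)
stage1_loss = V_honest_stage1 - V_imitate_stage1
tough_imitates = stage2_gain >= stage1_loss
print("tough side imitates:", tough_imitates)
```

With these numbers the discounted stage-2 gain (1.8) exceeds the stage-1 loss (1.0), so the “tough” side pools; shrinking δ below 0.5 reverses the decision, matching Conclusion 1.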
3 Conclusions

Undeveloped areas adopt an extensive mode of production, which harms the environment and the welfare of the whole society. Eco-compensation is supposed to reduce this negative effect; however, many eco-compensation projects cannot achieve the expected result. The work in this article reveals one reason why many eco-compensation projects are invalid. The determination of the compensated area's type is a key factor affecting an eco-compensation project: if the compensating side decides the compensation value from the amount of pollution discharged by the compensated side, it is easily misled by the strategic choices of the compensated side, which leads to more pollution. From the analysis in this article, we conclude that uncertainty about the pollution control type of the compensated side causes deterioration of the environment. The research in [8, 9] shows that the instability of policies and personnel causes deterioration of the environment, which is consistent with our conclusion, since changes of policy and personnel result in uncertainty about pollution control strength. So future eco-compensation projects should take into account not only the amount and method of compensation, but also changes of policy and personnel.
References
1. MA (Millennium Ecosystem Assessment): Ecosystems and Human Well-being: A Framework for Assessment. Island Press, Washington (2003)
2. MA (Millennium Ecosystem Assessment): Ecosystems and Human Well-being: Synthesis. Island Press, Washington (2005)
3. Landell-Mills, N., Porras, I.: Silver Bullet or Fools' Gold? A Global Review of Markets for Forest Environmental Services and their Impact on the Poor. International Institute for Environment and Development (IIED), London (2002)
4. FONAFIFO: El Desarrollo del Sistema de Pago de Servicios Ambientales en Costa Rica. Fondo Nacional de Financiamiento Forestal (FONAFIFO), San José, Costa Rica (2000)
5. Pagiola, S.: Paying for water services in Central America: learning from Costa Rica. In: Pagiola, S., Bishop, J., Landell-Mills, N. (eds.) Selling Forest Environmental Services: Market-based Mechanisms for Conservation and Development. Earthscan, London (2002)
6. Bulas, J.M.: Implementing Cost Recovery for Environmental Services in Mexico. Paper presented at World Bank Water Week, Washington, DC, February 24-26 (2004)
7. Echevarría, M.: Water User Associations in the Cauca Valley: a Voluntary Mechanism to Promote Upstream-Downstream Cooperation in the Protection of Rural Watersheds. Land-Water Linkages in Rural Watersheds Case Study Series. Food and Agriculture Organization (FAO), Rome (2002)
8. Bohn, H., Deacon, R.T.: Ownership risk, investment, and the use of natural resources. American Economic Review 90, 526-549 (2000)
9. Deacon, R.T.: Deforestation and the rule of law in a cross-section of countries. Land Economics 70, 414-430 (1994)
Botnet Traffic Discriminatory Analysis Using Particle Swarm Optimization Yan Zhang, Shuguang Huang, Yongyi Wang, and Min Zhang Electronic Engineering Institute, Hefei 230037, China [email protected]
Abstract. Particle Swarm Optimization (PSO) is an inherently distributed algorithm in which the solution to a problem emerges from the interactions between many simple individual agents called particles. This article proposes the use of PSO as a new tool for botnet traffic discriminatory analysis. Through this approach, we classify the C&C session, which serves as the distinguishing characteristic of bots, out of complicated background traffic data so as to identify compromised computers. Experimental results show that the proposed approach achieves high accuracy in the identification of C&C sessions.

Keywords: PSO, botnet, traffic analysis.
In this research, we explore an efficient approach to detecting computers compromised by bots. Since C&C is a common activity of bots, it is taken into account, and the ability of PSO to detect botnet traffic is examined. To our knowledge, no other botnet detection approach has applied the PSO technique yet.

1.2 Related Work

Several techniques have been developed to automatically identify or classify communication streams [3-6]. Dewes et al. [3] propose a scheme for identifying chat traffic using a combination of discriminating criteria, including service port number, packet size distribution, and packet content. Sen et al. [4] propose an approach that relies on identifying particular characteristics in the syntax of packet payloads exchanged as part of the operation of particular P2P applications. The recent trend toward using non-standard ports and encryption may reduce the effectiveness of, or even prevent the use of, these techniques. Others [5, 6] have proposed approaches using statistical techniques to characterize and classify traffic streams. Roughan et al. [5] use a multitude of traffic statistics to classify flows, pertaining to packet, flow, connection, intra-flow, intra-connection, or multi-flow characteristics. They also investigate the effectiveness of using average packet size, RMS packet size, and average flow duration to discriminate among flows. Given these characteristics, simple classification schemes produced very accurate traffic flow classification. In a similar approach, Moore and Zuev [6] apply variants of the naive Bayesian classification scheme to classify flows into 10 distinct application groups. They also search through the various traffic characteristics to identify those that are most effective at discriminating among the various traffic flow classes.
By identifying highly correlated traffic flow characteristics, this search is also effective in pruning the number of traffic flow characteristics used for classification.
2 Preliminary

In this section, we first explain the collection of malware used to analyze bot traffic. We then determine the flow characteristics of the traffic, which vary in their discriminating power.

2.1 Data Preparation

We used a collection of malware captured by a honeypot system. The honeypot system uses nepenthes [7] to collect bots from the Internet; it is operated by the bot analysis team of the Institute of Information Security. In this research, we used 1784 unique binary files captured by this system. The malware collection was scanned with the ClamAV [8] antivirus tool (version 0.88.2, signature file number 2416). The antivirus program identified 1521 files as bots; the remaining 263 files were unknown. We captured all IP packets during the execution of each malware sample in a sandbox environment for traffic analysis. We executed each sample under Windows XP (with no service pack applied) in VMware [9] for 10 minutes. All packets from/to the sandbox environment were captured and stored as files in tcpdump format.
By hand-analysis, we identified 726 active bots and 1566 C&C server sessions. Those bots accessed 81 unique servers. We used these 1566 sessions as bot C&C sessions for classification. More details of their analysis and an examination of the classification are given in the following sections.

2.2 Flow Characteristics

We characterize flows using attributes based on TCP and IP packet headers, which can be interpreted even if the encapsulated payload is encrypted. The characteristics were collected for each of the flows in the traffic traces used in our work. They include the cumulative application payload size, the IP protocol type (TCP), the IP source and destination addresses, the source and destination ports, and TCP flags. Moreover, we record flow start and end times, packet counts, byte counts, variance statistics, the client/server role for the connection (as indicated by the initial three-way handshake of TCP), and a histogram of application payload sizes. For experimental purposes, we also recorded the packet counts associated with TCP push and the maximum window size.
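A per-flow record along these lines can be sketched as follows; the field set mirrors the attributes listed above, while the 5-tuple key and the update logic are assumptions for illustration, since the paper does not give an implementation.

```python
from dataclasses import dataclass, field

@dataclass
class FlowRecord:
    """Per-flow attributes derived from TCP/IP headers (payload may be encrypted)."""
    key: tuple                     # (src_ip, src_port, dst_ip, dst_port, proto)
    start: float = float("inf")    # flow start time
    end: float = 0.0               # flow end time
    packets: int = 0               # packet count
    bytes: int = 0                 # cumulative application payload size
    tcp_flags: int = 0             # OR of observed TCP flags
    push_packets: int = 0          # packets with the TCP PSH flag set
    max_window: int = 0            # maximum advertised TCP window
    payload_hist: dict = field(default_factory=dict)  # payload size -> count

    def update(self, ts, payload_len, flags, window):
        self.start = min(self.start, ts)
        self.end = max(self.end, ts)
        self.packets += 1
        self.bytes += payload_len
        self.tcp_flags |= flags
        if flags & 0x08:           # PSH bit
            self.push_packets += 1
        self.max_window = max(self.max_window, window)
        self.payload_hist[payload_len] = self.payload_hist.get(payload_len, 0) + 1

# Usage: feed two hypothetical packets of one flow.
f = FlowRecord(key=("10.0.0.1", 1042, "10.0.0.2", 6667, "tcp"))
f.update(ts=0.0, payload_len=40, flags=0x18, window=5840)   # PSH|ACK
f.update(ts=1.5, payload_len=0, flags=0x10, window=5840)    # ACK
```

From such records, derived quantities like flow duration (`f.end - f.start`) and the payload-size histogram fall out directly.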
3 Proposed Approach

3.1 Particle Swarm Optimization

PSO is a heuristic technique based on a swarm of n individuals called particles, each representing a solution to a problem with N dimensions. A particle's genotype consists of 2N parameters: the first N represent the coordinates of the particle's position, and the latter N its velocity components in the N-dimensional problem space. The velocity vi(t+1) of the i-th particle at step t+1 is a linear combination of the current velocity vi(t) of the i-th particle at time t, of the difference between the position bi(t) of the best solution found up to this time by the i-th particle and its current position pi(t), and of the difference between the best position ever found in the population bg(t) and pi(t):
vi(t+1) = w · vi(t) + c1 · U(0,1) ⊗ (bi(t) − pi(t)) + c2 · U(0,1) ⊗ (bg(t) − pi(t))  (1)
where ⊗ denotes point-wise vector multiplication, U(0,1) is a function that returns a vector whose components are randomly generated by a uniform distribution in [0, 1], c1 is the cognitive parameter, c2 is the social parameter, and w is the inertia factor, whose range is [0.0, 1.0]. Velocity values must lie within a range defined by two parameters vmin and vmax. An improvement on the original PSO is that w is not kept constant during execution; rather, starting from a maximal value wmax, it is linearly decremented as the number of iterations increases, down to a minimal value wmin, as follows [10]:
w(t) = wmax − (wmax − wmin) · t / Tmax  (2)
where t and Tmax are the current and the maximum allowed number of iterations respectively. The position of each particle at the next step is then evaluated as the sum of its current position and of the velocity obtained by Eq. (1):

pi(t+1) = pi(t) + vi(t+1)  (3)

These operations are repeated for a predefined number of iterations Tmax or until some other stopping criterion is verified. The pseudocode of PSO is as follows:

for each particle do
    initialize particle position and velocity
end for
while stopping criteria are not fulfilled do
    for each particle do
        calculate fitness value
        if fitness value is better than best fitness value bi(t) in particle history then
            take current particle position as new bi(t)
        end if
    end for
    choose as bg(t) the particle with the best fitness value among all particles in the current iteration
    for each particle do
        calculate particle velocity based on Eq. (1)
        update particle position based on Eq. (3)
    end for
    update the inertia factor based on Eq. (2)
end while

3.2 Applying PSO to Botnet Traffic Detection
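Before applying PSO to traffic detection, the update rules of Eqs. (1)-(3), with the linearly decreasing inertia of Eq. (2), can be sketched compactly in Python; the swarm size, iteration count, velocity clamp, and sphere test function below are illustrative choices, not values from the paper.

```python
import random

def pso(fitness, dim, n_particles=20, t_max=200, c1=2.0, c2=2.0,
        w_max=0.9, w_min=0.4, x_range=(-5.0, 5.0), v_max=1.0):
    """Minimize `fitness` with inertia-weight PSO (Eqs. (1)-(3))."""
    lo, hi = x_range
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[random.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(n_particles)]
    best = [p[:] for p in pos]                       # b_i: personal best positions
    best_f = [fitness(p) for p in pos]
    g_i = min(range(n_particles), key=lambda i: best_f[i])
    g, g_f = best[g_i][:], best_f[g_i]               # b_g: global best
    for t in range(t_max):
        w = w_max - (w_max - w_min) * t / t_max      # Eq. (2): linear decrement
        for i in range(n_particles):
            for d in range(dim):
                # Eq. (1): inertia + cognitive + social terms
                v = (w * vel[i][d]
                     + c1 * random.random() * (best[i][d] - pos[i][d])
                     + c2 * random.random() * (g[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))   # clamp velocity range
                pos[i][d] += vel[i][d]               # Eq. (3): position update
            f = fitness(pos[i])
            if f < best_f[i]:                        # update personal best b_i
                best[i], best_f[i] = pos[i][:], f
                if f < g_f:                          # update global best b_g
                    g, g_f = pos[i][:], f
    return g, g_f

# Usage: minimize the 3-dimensional sphere function.
random.seed(0)                                       # reproducible demo run
sol, val = pso(lambda x: sum(c * c for c in x), dim=3)
```

The stopping criterion here is simply the iteration budget Tmax; any other criterion from the pseudocode could replace the `for t` loop condition.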
Actually, the problem of botnet traffic detection can be seen as a kind of classification. Given a database with C classes and N parameters, the problem can be translated into that of finding the optimal positions of C centroids in an N-dimensional space, i.e., of determining for each centroid its N coordinates. With these premises, the i-th individual of the population is encoded as follows:
(pi1, …, piC, vi1, …, viC)  (4)
where the position of the j-th centroid is constituted by N real numbers representing its N coordinates in the problem space:

pij = {p1,ij, …, pN,ij}  (5)
and similarly the velocity of the j-th centroid is made up of N real numbers representing its N velocity components in the problem space:

vij = {v1,ij, …, vN,ij}  (6)
Then, any individual in the population consists of 2 · C · N components, each represented by a real value. To evaluate the quality of solutions, two fitness functions have been taken into account. Starting from the positions of the C centroids, any training set instance is assigned to the class whose centroid is closest in the N-dimensional space. The fitness function ψ1 is calculated as the percentage of incorrectly assigned instances in the training set, i.e., it counts all the cases in which the class CL(xj) assigned to instance xj differs from its class CLKnown(xj) as known from the database. Formally, the fitness of the i-th individual is:

ψ1(i) = (1 / DTrain) · Σ_{j=1..DTrain} δ(xj)  (7)

where DTrain is the number of instances which compose the training set and

δ(xj) = 1 if CL(xj) ≠ CLKnown(xj); δ(xj) = 0 otherwise.
The fitness function ψ2 is computed as the sum over all training set instances of the Euclidean distance in the N-dimensional space between the generic instance xj and the centroid of the class it belongs to according to the database, pi^CLKnown(xj). This sum is normalized with respect to DTrain. In symbols, the fitness of the i-th individual is given by:

ψ2(i) = (1 / DTrain) · Σ_{j=1..DTrain} d(xj, pi^CLKnown(xj))  (8)
When computing a distance, each of its components in the N-dimensional space is normalized with respect to the maximal range in that dimension, and the sum of distance components is divided by N. With this choice, any distance ranges within [0.0, 1.0], and so does ψ2. The rationale behind this fitness is that ψ1 can only vary in steps of 1/DTrain, whereas ψ2 is expected to vary with greater continuity: its value changes even for small variations in centroid positions, while under ψ1 small changes in centroid positions might not change the class of any instance, so the percentage of incorrectly classified instances would not vary at all. We compare the two fitness functions by examining the results on the traffic database according to the incorrect classification percentages on the testing set. The two versions of PSO with different fitness functions are each executed for 20 runs with a different starting seed provided as input to the random number generator. As a
result, PSO-ψ2 achieved better values in 17 out of the 20 runs. According to these results, we consider fitness ψ2 to be on average better than fitness ψ1. Therefore, when we mention PSO performance, we refer to that of PSO-ψ2.
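A minimal sketch of the centroid classifier and the two fitness functions follows; the toy two-dimensional instances and centroid positions are invented for illustration, and the normalization inside `psi2` is one reading of the description above (component divided by its dimension's range, component sum divided by N).

```python
import math

def classify(x, centroids):
    """Assign instance x to the class whose centroid is closest (Euclidean)."""
    return min(centroids, key=lambda c: math.dist(x, centroids[c]))

def psi1(data, centroids):
    """Fitness (7): fraction of training instances assigned to the wrong class."""
    wrong = sum(1 for x, cls in data if classify(x, centroids) != cls)
    return wrong / len(data)

def psi2(data, centroids, ranges):
    """Fitness (8): mean normalized distance from each instance to the centroid
    of its known class; with each component divided by its dimension's range
    and the component sum by N, every distance (and psi2) lies in [0, 1]."""
    n = len(ranges)
    total = 0.0
    for x, cls in data:
        c = centroids[cls]
        total += sum(abs(a - b) / r for a, b, r in zip(x, c, ranges)) / n
    return total / len(data)

# Toy training set: two flow classes described by two normalized attributes.
data = [((0.1, 0.2), "irc"), ((0.2, 0.1), "irc"),
        ((0.9, 0.8), "botnet"), ((0.8, 0.9), "botnet")]
centroids = {"irc": (0.15, 0.15), "botnet": (0.85, 0.85)}
f1 = psi1(data, centroids)             # all instances correctly assigned here
f2 = psi2(data, centroids, (1.0, 1.0))
```

In a full run, `psi2` would serve as the `fitness` argument of the PSO loop, with each particle encoding all C · N centroid coordinates as in (4)-(6).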
4 Experiments and Results

In this section, we present our work on using PSO-based classifiers to identify IRC-based botnet C&C flows. We first classify flows into IRC and non-IRC flows; then, among the flows identified as IRC flows, we distinguish between authentic IRC and botnet flows. The False Negative Rate (FNR) and the False Positive Rate (FPR) are used to evaluate the performance of the classifiers considered.

4.1 Stage I: Identifying IRC from Background Traffic
We explore the effectiveness of PSO-based classification in identifying IRC traffic by comparing three distinct classification techniques: J48, PSO, and Bayesian networks. J48 is the WEKA [11] implementation of C4.5 decision trees [12]. The Bayesian networks technique uses a directed acyclic graph to capture the dependence among sample features. Figure 1 depicts the FNR vs. FPR scatter plot for several runs of J48, PSO, and Bayesian networks on the labeled trace. Each data point corresponds to a different subset of the initial flow attribute set. Figure 1 reveals clustering in the performance of each of the three classification techniques: PSO tends to have a low FNR but a higher FPR; the Bayesian networks technique tends to have a low FPR but a higher FNR; J48 strikes a balance between FNR and FPR.
Fig. 1. FNR and FPR of J48, PSO, and Bayesian Net Classification Schemes for IRC/non-IRC Flows of the Trace
Only the PSO classifiers were successful in achieving a low FNR. Notably, the PSO classifiers accurately classified 35 out of the 38 background flows, thus achieving an FNR of 7.89%. In contrast, the J48 and Bayesian networks classifiers, possibly tuned too tightly to the training set, performed very poorly. Since the PSO classifier is the only one that showed potential in accurately classifying IRC flows, it is preferable to the J48 and Bayesian networks classifiers.

4.2 Stage II: Identifying Botnet from IRC Traffic
In this section, we investigate which attribute sets provide the most differential benefit when classifying botnet C&C traffic with PSO. We first define three kinds of vectors for session classification, and then examine the results of C&C session classification by PSO using each vector definition. The three vector definitions are the session information vector, the packet sequence vector, and the packet histogram vector. The session information vector consists of the total number of received packets, total received data size, total number of sent packets, total sent data size, and session time. The packet sequence vector consists of the packet sizes and packet interval times of the first 16 packets after the session is established. The packet histogram vector is the histogram over packet payload size and packet interval time in the session. Figure 2 shows the result of C&C session classification for the different attribute vectors. According to the session information vector, the detection rate on the training
Fig. 2. Comparison of C&C Session Classification Results on Training and Testing set using Different Attribute Vectors
dataset is 82.68% and 80.85% on the testing dataset. That is a good classification of the bot C&C session using simple vector data to represent the session characteristics. However, the FPR is higher (9.8%) for IRC chat sessions: normal IRC chat sessions were misclassified as C&C sessions. For the packet sequence vector, all of the C&C sessions in the training dataset were correctly identified, but there is an 82.55% FNR for classification of the C&C sessions in the testing dataset. The packet histogram vector was better than the other two vector definitions: it classified the C&C sessions well in both the training and the testing dataset. The FPR is 0.09% on the training dataset, and the other data had no false positives. The FNR is 3.15% on the training dataset and 5.25% on the testing dataset.
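The three vector definitions can be sketched as follows; the `(timestamp, direction, size)` packet representation and the histogram bin edges are assumptions for illustration, since the paper does not specify them.

```python
import bisect

def session_vectors(packets, n_first=16,
                    size_bins=(64, 256, 1024), time_bins=(0.1, 1.0, 10.0)):
    """Build the three session vectors from a list of (timestamp, direction, size)
    tuples, where direction is 'send' or 'recv'. Bin edges are illustrative."""
    times = [t for t, _, _ in packets]
    info = [
        sum(1 for _, d, _ in packets if d == 'recv'),   # recv packet count
        sum(s for _, d, s in packets if d == 'recv'),   # recv bytes
        sum(1 for _, d, _ in packets if d == 'send'),   # send packet count
        sum(s for _, d, s in packets if d == 'send'),   # send bytes
        max(times) - min(times),                        # session time
    ]
    gaps = [b - a for a, b in zip(times, times[1:])]    # packet interval times
    # Packet sequence vector: sizes and intervals of the first n_first packets.
    seq = [s for _, _, s in packets[:n_first]] + gaps[:n_first - 1]
    # Packet histogram vector: counts per payload-size bin and per interval bin.
    size_hist = [0] * (len(size_bins) + 1)
    time_hist = [0] * (len(time_bins) + 1)
    for _, _, s in packets:
        size_hist[bisect.bisect(size_bins, s)] += 1
    for g in gaps:
        time_hist[bisect.bisect(time_bins, g)] += 1
    return info, seq, size_hist + time_hist

# Usage on a three-packet toy session.
packets = [(0.0, 'send', 120), (0.3, 'recv', 400), (2.0, 'send', 80)]
info, seq, hist = session_vectors(packets)
```

Any of the three returned vectors can then serve as the instance attributes fed to the PSO-based classifier of Section 3.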
5 Conclusions

In this paper, we use PSO techniques to identify the C&C traffic of IRC-based botnets. We split this task into two stages: (I) distinguishing between IRC and non-IRC traffic, and (II) distinguishing between botnet and real IRC traffic. In Stage I, only the PSO classifiers succeeded in achieving a low FNR when classifying the IRC flows. In Stage II, the packet histogram vector showed its superiority over the other two attribute vectors in identifying bot C&C sessions, achieving not only a high detection rate but also a low FPR.

Acknowledgment. We express our deep gratitude to the research team of the Tanaka Laboratory in the Institute of Information Security for the collection of malware.
References
1. Eberhart, R.C., Kennedy, J.: A new optimizer using particle swarm theory. In: Proceedings of the Sixth International Symposium on Micro Machine and Human Science, Nagoya, Japan, pp. 39-43. IEEE Press, Piscataway (1995)
2. Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: Proceedings of the IEEE International Conference on Neural Networks IV, pp. 1942-1948. IEEE Press, Piscataway (1995)
3. Dewes, C., Wichmann, A., Feldmann, A.: An analysis of internet chat systems. In: IMC 2003: Proceedings of the 3rd ACM SIGCOMM Conference on Internet Measurement, pp. 51-64. ACM Press, New York (2003)
4. Sen, S., Spatscheck, O., Wang, D.: Accurate, scalable in-network identification of p2p traffic using application signatures. In: WWW 2004: Proceedings of the 13th International Conference on World Wide Web, pp. 512-521. ACM Press, New York (2004)
5. Roughan, M., Spatscheck, O., Sen, S., Duffield, N.: Class-of-service mapping for QoS: a statistical signature-based approach to IP traffic classification. In: IMC 2004: Proceedings of the 4th ACM SIGCOMM Conference on Internet Measurement, pp. 135-148. ACM Press, New York (2004)
6. Moore, A.W., Zuev, D.: Internet traffic classification using Bayesian analysis techniques. In: SIGMETRICS 2005: Proceedings of the 2005 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pp. 50-60. ACM Press, New York (2005)
7. Nepenthes Development Team: Nepenthes - Finest Collection, http://nepenthes.mwcollect.org/
8. ClamAV project: ClamAV, http://www.clamav.net/
9. VMware Inc.: VMware Workstation, http://www.vmware.com/
10. Shi, Y., Eberhart, R.C.: A modified particle swarm optimizer. In: Proceedings of the IEEE International Conference on Evolutionary Computation, pp. 69-73. IEEE Press, Piscataway (1998)
11. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
12. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. John Wiley & Sons, Chichester (2001)
Design and Implement of a Scheduling Strategy Based on PSO Algorithm Suqin Liu, Jing Wang, Xingsheng Li, Jun Shuo, and Huihui Liu College of Computer Communication Engineering in China University of Petroleum, Qingdao 266555, China [email protected], [email protected]
Abstract. Job scheduling technology is an effective way to achieve resource sharing and improve computational efficiency. The scheduling problem has been proved NP-complete, and the Particle Swarm Optimization (PSO) algorithm has demonstrated outstanding performance in solving such problems. In cognizance of the characteristics of the cluster scheduling problem, a scheduling strategy based on PSO was designed and implemented. Compared with the backfilling algorithm, the PSO algorithm improves the fairness of jobs and avoids the problem that bigger jobs cannot be executed promptly. The speed and accuracy of strategy generation are improved significantly. The experimental results show that the scheduling strategy based on the PSO algorithm can increase CPU utilization and reduce average response time significantly.

Keywords: scheduling strategy, cluster, PSO algorithm, design, implement.
2 Principle of Cluster Scheduling

In the model of cluster job scheduling, (M1, M2, …, Mn) is a set of modules or sub-tasks to be processed, and (P1, P2, …, Pm) are the processors or node machines of the system, which communicate with each other through the network. The scheduling mechanism S assigns the n modules or sub-tasks to the m processors through a certain strategy. It determines the order of the tasks each processor handles and, under the constraint that the execution order of the sub-tasks satisfies the dependencies between tasks, minimizes the completion time of the system over the whole task set. The job scheduling problem with the objective of minimizing the maximum completion time has been proved NP-complete [2]. Backfilling [3] has proved to be one of the best-performing algorithms currently, but it is not well suited to the characteristics of the scheduling problem, such as NP-completeness and the heterogeneity of resources [4]. Genetic algorithms have also been applied to scheduling strategies in recent years, but they find it difficult to meet, at the same time, the requirements of high scheduling efficiency, the dynamics of the computational grid environment, and load balance [5]. The PSO algorithm is superior to the genetic algorithm in optimization results and convergence speed [6]. With the PSO algorithm applied to the job scheduling problem, the maximum processing time over all processors is minimized by searching for an appropriate mapping between problem solutions and algorithm particles, making PSO an effective choice for solving the job allocation problem.
Fig. 1. Scheduling model of a cluster: the scheduling mechanism S assigns modules M1, M2, …, Mn to the CPU queues of processors P1, P2, …, Pm
3 The Design of the Scheduling Strategy Based on the PSO Algorithm
The basic operation objects of the PSO algorithm are particles, or individuals; each particle represents a possible solution of the problem to be solved. Because the simulated cluster environment is homogeneous, the scheduling mechanism maintains a job-execution list which includes each job's expected execution time.
510
S. Liu et al.
3.1 Particle Encoding Method
The key step in applying PSO to a problem is to map the solution of the problem from the solution space to a representation space with spatial structure, i.e., to decide what kind of particle representation expresses the solution space of the problem. According to the characteristics of the cluster scheduling problem, this paper adopts a location-based representation with a rounding operation. Define a two-dimensional particle: the first dimension uses the natural numbers 1, 2, …, n to denote the n jobs, and the second dimension holds the particle position, i.e., the selected resource node; the length of the particle equals the number of jobs. The j-th component of the i-th particle of the population can be expressed as:

[ j, x_ij ]^T,  j = 1, 2, …, n;  x_ij ∈ [1, m+1)

where x_ij is a random real number and m is the number of resource nodes; the different resource nodes are indicated by the natural numbers 1, 2, …, m.

3.2 Determine the Fitness Function
The objective of job scheduling is to minimize the maximum scheduling completion time (makespan). For the n-job, m-node cluster resource scheduling problem, let Tj denote the execution time of job j, Wj the waiting time of job j during execution, and Ej the finish time of job j; f is the maximum completion time of all jobs. If F denotes the optimization objective of the scheduling problem, namely minimizing the maximum completion time, then:
F = min f = min{ max(E_j, j = 1, 2, …, n) } = min{ max(W_j + T_j, j = 1, 2, …, n) }    (1)
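The makespan objective in (1) can be sketched as follows (a minimal illustration; the function name and data are hypothetical — each job's completion on a node is the accumulated load on that node):

```python
def makespan(assignment, exec_times, m):
    """Makespan F: the largest total execution time over the m nodes.

    assignment[j] is the node (1..m) that job j runs on;
    exec_times[j] is job j's expected execution time T_j.
    """
    node_load = [0.0] * (m + 1)          # index 0 unused; nodes are 1..m
    for job, node in enumerate(assignment):
        node_load[node] += exec_times[job]
    return max(node_load[1:])

# Example: 4 jobs on 2 nodes
print(makespan([1, 2, 1, 2], [3.0, 5.0, 2.0, 1.0], 2))  # → 6.0
```

Here the per-node sum equals W_j + T_j of the last job on the node when jobs run sequentially, which matches the makespan computation described in Section 3.5.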
3.3 Determine the Termination Criteria of the Algorithm
The most commonly used termination criteria of the PSO algorithm are to preset a maximum number of iterations, or to terminate when the fitness value of the solutions shows no significant change over multiple iterations of the search. The latter approach is: terminate the algorithm when |F(x)_i − F(x)_{i−1}| < ε, where ε is a pre-given, sufficiently small real number.

3.4 Select the Particle Swarm Model
The speed-location model with inertia weight enhances the global and local search ability, thereby reducing the number of iterations needed to find the optimal solution and improving the convergence speed and efficiency of the algorithm; this paper uses this model. In each iteration step, the particles update their velocity and location according to the following formulas:
v_{i+1} = ω·v_i + c1·random()·(p_i − x_i) + c2·random()·(g − x_i)    (2)
x_{i+1} = x_i + v_{i+1}    (3)
Here, ω is the inertia weight, random() is a random number between 0 and 1, c1 and c2 are the learning factors, p_i is the individual best position of the particle, and the vector g is the global best particle. In each dimension, a particle's velocity cannot exceed the maximum speed Vmax set by the algorithm.

3.5 Particle Decoding Method
The above two-dimensional particles need to be decoded before the scheduling program is generated. A rounding operation is taken on the second dimension of the particle, i.e., the particle position x_ij; the result is denoted INT(x_ij). Since x_ij ∈ [1, m+1), INT(x_ij) is the index of the resource node that job j is assigned to. In this way all jobs are allocated to the different resources, which yields the operating sequence of each resource, i.e., the scheduling program. Iterating with the speed-location model of the PSO algorithm then generates a new scheduling program; the process is shown in Figure 2:
Fig. 2. The mapping between particle location and scheduling solution: particle location x_ij → rounding operation INT(x_ij) → node allocation S_i → scheduling program
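The decoding step can be sketched as follows (a minimal illustration, assuming truncation as the rounding operation; names hypothetical):

```python
import math
import random

def decode(positions, m):
    """INT(x_ij): truncate each continuous position in [1, m+1) to a node 1..m.

    min(..., m) guards the rare edge case where a position equals m+1 exactly.
    """
    return [min(int(math.floor(x)), m) for x in positions]

random.seed(0)
xs = [random.uniform(1, 3) for _ in range(5)]   # 5 jobs, m = 2 nodes
nodes = decode(xs, 2)
print(nodes)   # each entry is a node index, 1 or 2
```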
The maximum completion time of all jobs is calculated as follows: first, compute the total completion time of the jobs allocated to each resource node; then take the largest of these per-node totals as the maximum completion time of all jobs, i.e., the makespan.

3.6 Description of the Scheduling Strategy Based on PSO
(1) Randomly generate the initial particle swarm X = {x1, x2, …, xn} and flight speeds vi in the solution space, and initialize the size of the particle swarm n, the inertia weight ω = 1.15, the random factor random() = 0.75, the learning factors c1 = c2 = 2, the maximum flight speed vmax and the termination condition of the iteration.
(2) Maintain a job-execution list with the jobs' expected execution times.
(3) Decode the particles and generate the scheduling program. By querying the job-execution list, the fitness value of each particle can be computed according to the fitness function defined above, namely the maximum job completion time. If the
scheduling scheme generated by a certain particle is infeasible, the fitness value of the particle (namely the maximum completion time) is set to a large value; then the individual best particle pbest and the global best particle gbest are determined.
(4) Iterate the location and speed of the particles in the population according to formulas (2) and (3). If the location of an updated particle exceeds the limited range [1, m+1), a value is selected randomly in [1, m].
(5) Judge whether the iteration is finished. If so, output the global best particle and the makespan, and generate the optimal schedule according to the global best particle. Otherwise go to step (3).
During the experiments, we repeatedly adjusted the parameter values based on the results in order to achieve the best effect within a relatively short time.

3.7 Generating the Scheduling Program

Fig. 3. The flowchart of the cluster scheduling problem using the PSO algorithm: initialize → search the job-execution list → ascertain the fitness value and determine the individual best particle pi and the global best particle g → update the particles through the position-velocity model → if the iteration stops, decode the optimal particle and output the optimal scheduling program; otherwise iterate again
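Steps (1)–(5) can be sketched as a toy implementation (ω = 1.15 and c1 = c2 = 2 are taken from the text; the fixed random factor 0.75 mentioned in step (1) is replaced here by per-draw random numbers as in formula (2); all names are hypothetical):

```python
import random

def pso_schedule(exec_times, m, swarm=20, iters=100, w=1.15, c1=2.0, c2=2.0):
    """Toy PSO scheduler: assign len(exec_times) jobs to m nodes, minimizing makespan."""
    n = len(exec_times)
    rnd = random.Random(42)

    def fitness(x):
        # decode by truncation, then makespan = largest per-node load
        load = [0.0] * (m + 1)
        for j, xj in enumerate(x):
            load[min(int(xj), m)] += exec_times[j]
        return max(load)

    # step (1): random positions in [1, m+1) and zero velocities
    X = [[rnd.uniform(1, m + 1) for _ in range(n)] for _ in range(swarm)]
    V = [[0.0] * n for _ in range(swarm)]
    pbest = [x[:] for x in X]
    pfit = [fitness(x) for x in X]
    gi = min(range(swarm), key=lambda i: pfit[i])
    gbest, gfit = pbest[gi][:], pfit[gi]
    for _ in range(iters):
        for i in range(swarm):
            for d in range(n):
                # step (4): formulas (2) and (3)
                V[i][d] = (w * V[i][d]
                           + c1 * rnd.random() * (pbest[i][d] - X[i][d])
                           + c2 * rnd.random() * (gbest[d] - X[i][d]))
                X[i][d] += V[i][d]
                if not 1 <= X[i][d] < m + 1:     # out of range: redraw in [1, m]
                    X[i][d] = rnd.uniform(1, m)
            f = fitness(X[i])
            if f < pfit[i]:
                pfit[i], pbest[i] = f, X[i][:]
                if f < gfit:
                    gfit, gbest = f, X[i][:]
    return gbest, gfit

best, mk = pso_schedule([5, 3, 8, 2, 7, 4], m=2)
print(mk)   # best achievable makespan here is 15 (total work 29 on 2 nodes)
```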
The PSO algorithm shows good properties in optimization efficiency and computing time. When used in the scheduling strategy, it makes the system converge quickly to the optimal schedule, so the scheduling strategy based on the PSO algorithm improves the speed and precision of generating the scheduling program.
4 Experiment Results and Analysis
Simulation experiments in a cluster environment were conducted to verify the effectiveness of the algorithm. We built a homogeneous cluster platform of eight resource nodes, each with 2 GB of memory and Red Hat Linux 4.0 as the OS, connected by 100 Mbps standard Ethernet. As for evaluation indicators, users are usually concerned about the response time of jobs, while the utilization ratio of the system is also very important to the operating system; therefore, we use the average response time and the system utilization ratio as the evaluation indicators in this paper. In the experiment, 500 jobs were submitted to the cluster platform to test the backfill strategy and the PSO-based scheduling strategy, and the average response time of jobs and the CPU utilization ratio of each node were recorded. The experiment results are shown in Figures 4 and 5:
Fig. 4. The comparison chart of the response time of each node (R0–R7) under backfill and PSO

Fig. 5. The comparison of the CPU utilization of each node (R0–R7) under backfill and PSO
We can draw the following conclusions from the experiment results. In the scheduling strategy based on the PSO algorithm, the average response time of each node is less than under backfill and the total execution time is obviously reduced; the time fluctuation across nodes also decreases, which shows that optimizing each job with time as its fitness value works well. The CPU utilization ratio is higher than under backfill and the fluctuation of the CPU utilization ratio across nodes decreases, which shows that the load of the nodes is more balanced.
5 Conclusions
A scheduling strategy based on the PSO algorithm was proposed and implemented in this paper; thanks to the excellent performance of the PSO algorithm, the scheduling strategy is more outstanding in speed and precision. The experiment results show that the scheduling strategy can obviously reduce the average response time of the system, optimize each job with respect to time, improve the CPU utilization ratio and the throughput of the system, and balance the load of the system with high efficiency. In future work we will continue to address the load balance of the scheduling strategy and the study of the dynamic computing environment of clusters using the PSO algorithm.
References
[1] Liu, Z.-x., Wang, S.-m.: Research on parallel machines scheduling problem based on particle swarm optimization algorithm. Computer Integrated Manufacturing Systems 12(2), 183–185, 296 (2006)
[2] Wu, Q.-d., Lei, W.: Research and Application of Intelligence Particle Swarm Optimization. Jiangsu Education Publishing House, Nan Jing (2005)
[3] Zhang, L.-x., Yuan, L.-q., Xu, W.-m.: A Kind of Scheduling Strategy Based on the Type of the Job. Computer Engineering 30(13), 63–64, 115 (2004)
[4] Yong, Y., Cai, Z.-x., Ying, F.: An Adaptive Grid Job Scheduling Method Based on Genetic Algorithm. Computer Engineering and Applications 1, 48–50, 167 (2005)
[5] Hao, T.: Research on the Strategy of Grids Resource Management Scheduling Based on Genetic Algorithm. Journal of Wuhan University of Technology (Information & Management Engineering) 28(11), 16–19 (2006)
[6] Liu, Z.-x.: Research and Application of Particle Swarm Optimization in Scheduling Problem. PhD thesis, Wuhan University of Technology, 46–64 (2005)
[7] Feng, G., Chen, H.-p., Lu, B.-y.: Particle Swarm Optimization For Flexible Job Shop Scheduling. Systems Engineering 23(9), 20–23 (2005)
Optimal Design for 2-DOF PID Regulator Based on PSO Algorithm
Haiwen Wang(1,2), Jinggang Zhang(2), Yuewei Dai(1), and Junhai Qu(3)
(1) Automation School, Nanjing University of Science and Technology, 210094 Nanjing, China
(2) Automation Department of Taiyuan University of Science and Technology, 030024 Taiyuan, China
(3) China North Industries Group 207 Research Institute, 030024 Taiyuan, China
[email protected]
Abstract. The particle swarm optimization (PSO) algorithm is a stochastic global optimization technique that is simple and easy to implement. Through the interaction between particles, the algorithm finds the optimal area in a complicated search space. In this paper, a method of optimizing the parameters of a two-degree-of-freedom (2-DOF) PID regulator by the PSO algorithm is proposed, and the optimization result is compared with the IMC-based 2-DOF design. The comparative results show that with the PSO algorithm the system achieves better command tracking and disturbance rejection characteristics simultaneously, which verifies the effectiveness of the PSO algorithm. Keywords: 2-DOF control, Particle Swarm Algorithm, Robustness.
1 Introduction
Adjusting the parameters of a 2-DOF PID controller is a very difficult problem and a current hot spot of 2-DOF PID regulator research. Although traditional tuning has obtained some satisfactory effects in practical applications, these are not the best possible results. Therefore, it is of great theoretical and practical significance to research the optimal design of tuning the parameters of the 2-DOF PID regulator, to propose simple, practical and robust intelligent optimization algorithms, and to design novel 2-DOF PID regulators. For example: an improved genetic algorithm was presented and successfully applied to the design of a 2-DOF PID regulator, and the limit (0 < α, β, γ < 1) was broken [1]. A neural network was combined with 2-DOF control: on the basis of the 2-DOF system, a recognition mechanism identified the parameters of the servo system as the input of the neural network, the outputs of the neural network were the adjusting parameters of the 2-DOF PID controller, and the error between the estimated parameters and the expected parameters was minimized, so that the parameters of the 2-DOF PID controller gained self-adaptive ability and higher robustness [2]. A full-neuron realization method of the two-degree-of-freedom PID controller was proposed, in which auto-adjusting of the parameters was solved by weight auto-tuning; by selecting proper neuron weight coefficients, the disturbance rejection and set-value tracking ability can be improved at the same time [3]. Mind Evolutionary Computation is a new optimization method, and it was applied to optimize the parameters of the 2-DOF PID controller of the steam pressure control system in a power plant; simulations proved that the parameters of the 2-DOF PID controller optimized by MEC yield good command tracking and disturbance rejection characteristics [4]. A design procedure for the parameters of a 2-DOF PID regulator for typical industrial processes was proposed; the designed controller has only two adjustable parameters, which are directly related to system performance, so its parameters can be adjusted easily, and it has the advantages of IMC and much better robustness, but the two parameters were only given qualitatively [5]. At the end of this paper, the proposed PSO algorithm is applied to the design of the optimal parameters of a 2-DOF PID regulator for a plant to demonstrate its effectiveness, which offsets the deficiency in the literature [5].
2 Particle Swarm Optimization
Particle swarm optimization is a rather new technique introduced by Kennedy and Eberhart in 1995. PSO was developed through simulation of bird flocking in multidimensional space. Unlike traditional optimization methods, PSO is a flexible, robust, population-based stochastic search/optimization algorithm that works with non-differentiable objective functions, and it is less susceptible to getting trapped in local optima than GA, SA, etc. The main idea is to interpret each particle as a solution to the problem being investigated and to let these particles explore the search space. In the standard particle swarm optimizer, each particle i has a position (xi) in the search space, a velocity vector (vi), the position (pi) and fitness of the best point encountered by the particle, and the index (g) of the best particle in the swarm. The particle's position represents the current solution to the numerical problem. The particle's next position (xi′) is determined by the velocity vector:

xi′ = xi + vi′    (1)
The velocity vector is updated according to the current velocity, the particle's own best position, and the overall best position in the swarm:

vi′ = ω·vi + c1·rand()·(pi − xi) + c2·Rand()·(pg − xi)    (2)
where ω is the inertia weight and pg is the position of the best particle in the swarm; c1 and c2 are positive constants, and rand() and Rand() are uniformly distributed random numbers in [0,1]. The inertia weight is used to balance the global and local search ability: a large inertia weight facilitates a global search, while a small one facilitates a local search. By changing the inertia weight dynamically, the search ability is dynamically adjusted. Since the search process of PSO is non-linear and very complicated, it is hard to model it mathematically so as to adjust the inertia weight dynamically. Instead of a fixed inertia weight, a linearly decreasing inertia
weight is deployed. By linearly decreasing inertia weight from a relatively large value to a small value through the course of the PSO run, PSO tends to have more global search ability at the beginning of the run while having more local search ability near the end of the run. ω in this paper is from 1.1 to 0.3.
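The linearly decreasing inertia weight just described can be sketched as follows (the bounds 1.1 and 0.3 are taken from the text; the function name is hypothetical):

```python
def inertia(iter_no, max_iter, w_max=1.1, w_min=0.3):
    """Linearly decrease the inertia weight over the run:
    large w early (more global search), small w late (more local search)."""
    return w_max - (w_max - w_min) * iter_no / max_iter

print(inertia(0, 50))    # ≈ 1.1 at the start of the run
print(inertia(50, 50))   # ≈ 0.3 at the end of the run
```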
3 Design for 2-DOF PID Based on PSO
3.1 2-DOF PID Regulator
The conventional (one-degree-of-freedom) PID regulator has only one set of PID parameters to be set. If they are tuned for the disturbance rejection characteristic, the command tracking characteristic becomes worse; conversely, if they are tuned for the command tracking characteristic, the disturbance rejection characteristic becomes worse. So the parameters of such regulators are usually tuned by trade-off, which cannot yield a satisfactory control effect. To resolve this contradiction, 2-DOF control was introduced. It tunes two sets of PID parameters separately, so very good dynamic response in both the command tracking and the disturbance rejection characteristics can be achieved simultaneously. The structure of the 2-DOF PID regulator takes various forms.
Fig. 1. The set-point filter type structure, where C1(s) is the set-point filter, C2(s) is the regulator, P(s) is the plant, and r, y, d are the input, output and disturbance of the control system, respectively
When P(s) = K·e^(−θs) / (Ts + 1), the following approximation can be derived via the 1/1 Pade expansion:

P(s) = K·(1 − (θ/2)s) / [(Ts + 1)(1 + (θ/2)s)]    (3)
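As a quick numerical sanity check (not from the paper), the 1/1 Pade term (1 − (θ/2)s)/(1 + (θ/2)s) tracks the dead time e^(−θs) closely at low frequencies and degrades as frequency grows:

```python
import cmath

def pade_1_1(s, theta):
    """1/1 Pade approximation of the dead time exp(-theta*s)."""
    return (1 - theta * s / 2) / (1 + theta * s / 2)

theta = 3.0
for w in (0.01, 0.1, 0.5):                 # frequencies in rad/s
    s = 1j * w
    err = abs(cmath.exp(-theta * s) - pade_1_1(s, theta))
    print(w, err)                           # error grows with w
```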
The closed-loop transfer function is given by:

Y(s) = [C1(s)C2(s) / (1 + C2(s)P(s))]·R(s) + [1 / (1 + C2(s)P(s))]·D(s)    (4)
518
H. Wang et al.
Here:

C1(s) = (λ2·s + 1) / (λ1·s + 1)    (5)

C2(s) = (Ts + 1)(1 + (θ/2)s) / { K·s·[(θ/2)·λ2·s + (λ2 + θ)] }    (6)
From (3) to (6), the next equation can be derived:

Y(s) = { [(θ/2)·λ2·s + (λ2 + θ)]·s } / { [(θ/2)·λ2·s + (λ2 + θ)]·s + (1 + (θ/2)s)·e^(−θs) } · D(s)
     + { (Ts + 1)(λ2·s + 1)(1 + (θ/2)s) } / { K·s·(λ1·s + 1)·[(θ/2)·λ2·s + (λ2 + θ)] + K·(1 + (θ/2)s)(λ1·s + 1) } · R(s)    (7)
We can see from formula (7) that λ1 is tuned to achieve the best command tracking characteristic, while the best disturbance rejection characteristic can be achieved by regulating λ2.

3.2 2-DOF Regulator Based on PSO
It is difficult to choose an objective function for optimizing the parameters of the 2-DOF PID regulator based on PSO. As a general form of the performance index [6], consider the functional:

J[λ, p, H(s)] = ∫0^∞ λ(ω) · { d^p H(s) / ds^p |_(s=jω) }^2 dω    (8)
Here, H(s) is a function, such as Gyd(s)/s or Ger(s)/s, which gives the response of the error e to a step input in the Laplace domain. A distinctive feature of (8) is the introduction of the frequency weight λ(ω). By using a λ(ω) that has larger values in the high-frequency domain, we can suppress the feedback gain in the high-frequency range and, in most PID control applications, prevent the system from becoming oscillatory. By applying the above type of performance index with various λ(ω) and p to representative test batches, it was found [7] that:
λ(ω) = ω^(1/4),  p = 2    (9)
makes conventional PID control systems "optimal" in the classical sense, which implies that the overshoot is less than 20% and the settling time is almost the same as, or less than, that of the "optimal" system tuned by the CHR method. We use the performance index (8) with λ(ω) and p given by (9). The steps of tuning the 2-DOF PID control system based on PSO are as follows:
Step 1: The parameters required for the PSO are the following: maximum number of iteration cycles Nm = 50, population np = 30, c1 = 2, c2 = 2; J is given by Eq. (8); the weight is updated by the following Eq. (10):
ω(iter) = ωmax − ((ωmax − ωmin) / Nm) · iter    (10)
Step 2: Initialization. Generate random initial velocities and positions for all particles; set the individual best positions and the global best position.
Step 3: Velocity updating. Using the global best position and the individual best positions, the velocity of each component of the j-th (j = 1, 2, …, np) particle is updated according to Eq. (2).
Step 4: Position updating. Based on the updated velocities, each particle updates its position according to Eq. (1).
Step 5: Individual best position updating. Each of the np particles is evaluated as Ji(iter) according to its updated position. If the current J is better, the individual best position is updated.
Step 6: Global best position updating. Search for the minimum Jmin(iter) among all Jj*(iter), j = 1, 2, …, np; if Jmin(iter) is better, update the global best position.
Step 7: Stopping criterion. If the number of iterations reaches the maximum Nm, stop; otherwise go to Step 3.
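The Step 1–7 loop can be sketched as follows. This is a toy sketch: the cost function below is a hypothetical stand-in with a known minimum, since evaluating the real J of Eq. (8) requires computing the closed-loop response; all names are illustrative.

```python
import random

def pso_tune(cost, lo, hi, np_=30, Nm=50, w_max=1.1, w_min=0.3, c1=2.0, c2=2.0):
    """Step 1-7 skeleton: PSO over two parameters (lambda1, lambda2) in [lo, hi]^2."""
    rnd = random.Random(1)
    # Step 2: random initial positions and velocities, best positions
    X = [[rnd.uniform(lo, hi), rnd.uniform(lo, hi)] for _ in range(np_)]
    V = [[0.0, 0.0] for _ in range(np_)]
    P = [x[:] for x in X]
    Pj = [cost(x) for x in X]
    gi = min(range(np_), key=lambda i: Pj[i])
    gbest, gj = P[gi][:], Pj[gi]
    for it in range(Nm):                               # Step 7 bounds the loop
        w = w_max - (w_max - w_min) / Nm * it          # Eq. (10)
        for i in range(np_):
            for d in range(2):                         # Steps 3-4: Eq. (2), then Eq. (1)
                V[i][d] = (w * V[i][d]
                           + c1 * rnd.random() * (P[i][d] - X[i][d])
                           + c2 * rnd.random() * (gbest[d] - X[i][d]))
                X[i][d] = min(max(X[i][d] + V[i][d], lo), hi)
            j = cost(X[i])
            if j < Pj[i]:                              # Steps 5-6: best updates
                Pj[i], P[i] = j, X[i][:]
                if j < gj:
                    gj, gbest = j, X[i][:]
    return gbest, gj

# Stand-in cost with a known minimum at (2.4, 2.2)
best, j = pso_tune(lambda x: (x[0] - 2.4) ** 2 + (x[1] - 2.2) ** 2, 0.0, 5.0)
print(best, j)   # best is close to (2.4, 2.2)
```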
4 Simulation and Discussions on the Results
Consider tuning the optimal parameters of the 2-DOF controller for the following plant using the PSO method based on J given by Eq. (8).

Plant: P(s) = (1 / (10s + 1)) · e^(−3s)
Ultimately, the tuning results obtained by PSO are: λ1 = 2.3759, λ2 = 2.1987.
Fig. 2. The given parameters in literature [5] are λ1 = 2, λ2 = 1 (dotted line); the parameters based on PSO are λ1 = 2.3759, λ2 = 2.1987 (solid line)

Fig. 3. The given parameters in literature [5] are λ1 = 3, λ2 = 1 (dotted line); the parameters based on PSO are λ1 = 2.3759, λ2 = 2.1987 (solid line)

Fig. 4. The given parameters in literature [5] are λ1 = 3, λ2 = 2 (dotted line); the parameters based on PSO are λ1 = 2.3759, λ2 = 2.1987 (solid line)

Fig. 5. When the parameter of the plant changes 15%, the dotted line is the IMC method (λ1 = 3, λ2 = 2); the solid line is the method in this paper (λ1 = 2.3759, λ2 = 2.1987)
From Figs. 2–4 we can see that the response curve of the IMC-based method has larger overshoot, longer settling time and much worse disturbance rejection compared with that of PSO. The robustness of the proposed PSO-based method is apparent from Fig. 5.
5 Conclusions
In this paper the PSO algorithm is presented and applied to the design of the optimal parameters of 2-DOF controllers. The simulation results show that the performance obtained with the PSO algorithm is much better than that of traditional methods.
References
1. Xu, H., Xu, M., Zhang, F.: Two-degree-of-freedom PID Regulator Design Using an Improved Genetic Algorithm. Journal of System Simulation 11(2), 59–64 (1999)
2. Kung, Y.S., Liaw, C.M., Ouyang, M.S.: Adaptive Speed Control for Induction Motor Drives Using Neural Networks. IEEE Trans. Ind. Electron. 42(1), 25–32 (1995)
3. Qiu, G., Lin, R.: Full Neuron Realization of Parameters Auto-Adjusting Two-degree-of-freedom PID. Journal of System Simulation 14(10), 1293–1295 (2003)
4. Zhang, J., Li, L., Chen, Z.: IMC Tuning of Two-degree-of-freedom PID Regulator. Chinese Journal of Scientific Instrument 23(1), 28–30 (2002)
5. Zhang, J., Li, L., Chen, Z.: IMC Tuning of Two-degree-of-freedom PID Regulator. Chinese Journal of Scientific Instrument 23(1), 28–30 (2002)
6. Araki, M., Taguchi, H.: Two-Degree-of-Freedom PID Controllers. International Journal of Control, Automation, and Systems 1(4) (December 2003)
7. Taguchi, H., Doi, M., Araki, M.: Optimal Parameters of Two-Degree-of-Freedom PID Control Systems. Trans. SICE 23, 889–895 (1987)
An Examination on Emergence from Social Behavior: A Case in Information Retrieval Daren Li, Muyun Yang, Sheng Li, and Tiejun Zhao MOE-MS Key Laboratory of NLP and Speech, School of Computer Science and Technology Harbin Institute of Technology, Harbin, China, 150001 {drli,ymy,tjzhao}@mtlab.hit.edu.cn, [email protected]
Abstract. Swarm intelligence has been applied to enhancing web search, but few studies have investigated the emergence arising from the behaviors of users foraging for information through a web search engine. In this paper we study the emergence in users' click behaviors in the AOL log and examine its reliability as the key to queries. We introduce the kappa statistic to characterize emergence through the consistency of users' clicks on the same query. By analyzing the kappa distribution, we reveal that emergence only occurs for queries issued by a large number of users; for queries issued by fewer users, the clicks are not reliable enough to constitute emergence. We further infer that the occurrence of emergence in users' click behaviors is related to the scale of users, so it may be unreliable to apply swarm intelligence techniques to enhancing web search for all queries by treating all users as agents. Keywords: Emergence, Information Retrieval, Swarm Intelligence, Query Log, Kappa.
brain cells [2]. In brief, current studies consider emergence important evidence of the existence of intelligence. There are also studies that try to enhance the performance of information retrieval through swarm intelligence. Wu and Aberer compared the process of users searching for information with the process of ants hunting for food [3]. They found that the two processes are very analogous, and they enhanced the ordinary web server/surfer model through an ant algorithm. Daniel G. Avello went on to create a model adapting ant colony optimization methods to swarms of users employing a search engine [4]. Both of the above methods attempted to treat a user of a search engine as an agent in order to apply swarm intelligence techniques. However, it is still an open issue whether the emergence phenomenon among the behaviors of the users is reliable enough. To the best of our knowledge, there are few studies on emergent behaviors in web search. The most similar work we can find is the recent research on collaborative tagging [17], in which Robu et al. considered the consensus of users' tagging behaviors in collaborative tagging systems as a kind of emergence phenomenon. Similarly, in this paper we follow this thread and further consider the consensus of the behaviors of different users who issue the same query to the search engine as a kind of emergent behavior. We ask to what degree this emergence is reliable enough for a search engine to find related documents through the users' clicks recorded in the query log. The kappa statistic is introduced to characterize the agreement of this emergence in information retrieval through the analysis of the AOL log, i.e., to measure the consistency of the behaviors (clicks) of different users when they issue the same query.
We then observe the emergent behaviors of different users who issue the same query through the distribution of kappa values and reveal the nature of emergence in the interactions between users and search engines. The remainder of this paper is organized as follows. In Section 2, we provide an overview of definitions of and research on emergence. Section 3 introduces the kappa statistic to characterize emergence in query logs. The analysis results of the AOL log are presented in Section 4. We conclude our work and outline possible future work in Section 5.
2 Related Work
Emergence is considered evidence of the existence of intelligence, but its definition is far from settled; many researchers have tried to define emergence from different points of view. Much previous work on swarm intelligence considers emergence a complex and unpredictable phenomenon that arises from the collaboration and interaction among individuals in a community. Holland first associated the "much from little" idea with emergence, viewing emergence as the production of complex behaviors through the interaction of multiple simple entities or rules [9]. In software agents, emergent behavior is also defined as behavior that is not attributed to any individual agent but is a global outcome of agent coordination [2]. In this sense, emergent behavior is not an individual behavior but a global behavior. Similarly, Wolf and Holvoet define emergence as macro-level behavior or properties that dynamically arise from the interactions between the parts at the micro-level [10].
Kennedy and Eberhart summarized the previous definitions of emergence and indicated that "Whichever way you choose to consider emergence, the fact remains that very complex systems are able to maintain something like equilibrium, stability, or regularity without any invisible hand or central control" [1], which is consistent with our assumption about the definition of emergence mentioned above. As for deeper research on the emergence phenomenon, many studies have tried to observe and model it in communities of different types of agents since the 1970s. The emergence phenomenon was observed by zoologists in the foraging strategies of some social insects [7]. Emergent behavior in animal societies has since become a source of inspiration for scientists and engineers working in other areas. Kephart et al. conducted research on information economies; they observed two emergent phenomena through the variation of price with time in large multi-agent economies: cyclical price/niche wars and spontaneous specialization [13]. Valverde et al. conducted a weighted network analysis of open source communities which revealed common emergent behaviors in communities of software developers, by considering software developers as agents interacting in a complex mathematical network [5]. Cucker and Smale established a model to simulate the velocity of the birds in a flock and explain how emergence appears according to the parameters of the model [6]. These models successfully demonstrate that emergent behavior can be modeled and employed. However, we find that the occurrence conditions and reliability of emergence are less addressed. In this paper, we focus on this problem by examining the behaviors of users interacting with a web search engine in a large-scale query log, facilitating better adaptation of concepts from swarm intelligence to information retrieval.
3 Characterizing Emergence in Query Logs
As mentioned above, many studies have been conducted to observe emergence [3, 4, 11], but most of them only investigated small samples, and few works have focused on emergence in large-scale data sets. With the wide use of search engines, query logs have recorded and accumulated a large number of users' behaviors in their interactions with search engines, and many studies in information retrieval have analyzed query logs to enhance the performance of web search. In this paper, we consider the query log a proper large-scale data set in which to investigate emergence. To characterize emergence in the query log, we introduce the kappa statistic. The kappa statistic is designed to measure the degree of agreement among different raters who each rate a sample of subjects, and it is widely used in the social and biomedical sciences. It was originally introduced by Cohen [14] and developed by Fleiss [15]. Cohen's kappa can only measure the inter-rater agreement of two raters, while Fleiss's kappa generalizes Cohen's kappa to measure agreement among more than two raters. Recent work in information retrieval uses Fleiss's kappa to measure the consistency of different raters' explicit relevance judgments [16]. In this paper, we also use Fleiss's kappa to measure the consistency of the behaviors of users who issue the same query.
526
D. Li et al.
Fleiss’s kappa is calculated as follows:
κ = (Pa − Pe) / (1 − Pe)

where Pa is the observed probability of agreement among different raters, and Pe is
the expected agreement when the raters are rating independently and randomly. Kappa is thus the proportion of the excess of agreement above the expected agreement. The maximum value of kappa is 1.0, which means the raters are in perfect agreement. If kappa value is lower than 0, we consider there is no agreement among the raters. The advantage of kappa statistic is that we can directly interpret the meaning from its value. Landis and Koch [12] have given the Table 1 to interpret kappa value. They suggest that values of kappa above 0.60 show good to excellent agreement among different raters and values of 0.40 or less show fair to poor agreement. Table 1. Interpretation of Kappa Value
Interpretation No agreement Slight agreement Fair agreement Moderate agreement Substantial agreement Almost perfect agreement
In this paper, we consider each user who issues a query as a rater and each user's clicks as a kind of rating. We can then infer the consistency level of the click behaviors of the different users who issue the same query from the kappa value of the query's click sets. Queries with high kappa values are taken to represent emergence in the process of different users foraging for the same, or the same kind of, information.
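As a concrete illustration, Fleiss's kappa can be computed from a subjects × categories matrix of rating counts. This is a generic sketch (NumPy assumed; the function name is ours, not from the paper) of the formula above; in our setting, each query's users are the raters and the clicked results play the role of rating categories:

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss's kappa for an (N subjects x k categories) matrix of rating counts.

    counts[i, j] = number of raters who assigned subject i to category j.
    Assumes every subject is rated by the same number of raters n (n >= 2).
    """
    counts = np.asarray(counts, dtype=float)
    N, k = counts.shape
    n = counts.sum(axis=1)[0]                       # raters per subject
    # Per-subject agreement, then the observed agreement Pa
    P_i = np.sum(counts * (counts - 1), axis=1) / (n * (n - 1))
    Pa = P_i.mean()
    # Expected chance agreement Pe from overall category proportions
    p_j = counts.sum(axis=0) / (N * n)
    Pe = np.sum(p_j ** 2)
    return (Pa - Pe) / (1 - Pe)
```

Perfect agreement yields kappa = 1.0, while agreement below the chance level yields a negative value, matching the interpretation in Table 1.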
4 Emergence in the Query Log

In Section 3, we introduced the kappa statistic, which can be used to characterize emergence in a web search query log. In this section, we discuss our exploration of observing emergence in the query log. First, we examine how many queries have each kappa value and depict the distribution of kappa values. Second, we study how the kappa curve develops with the number of users in the group who issue the same query, and try to indicate the conditions required for emergence to occur.

4.1 Dataset

In this paper, we use a subset of the AOL query log from March 1, 2006 to May 31, 2006, which is distributed for non-commercial research use. The query log contains
An Examination on Emergence from Social Behavior
527
AnonID (an anonymous user ID number), Query (the query issued by the user), QueryTime (the time at which the query was submitted for search), ItemRank (the rank of the result on which the user clicked), and ClickURL (the domain portion of the URL in the clicked result) for each data record. Notice that there are many queries for which users do not click on any results. In the following statistics, we exclude queries without any clicks and queries issued by only one user, because their kappa values cannot be calculated. Table 2 summarizes the basic statistics of the dataset used in the following analysis.

Table 2. Basic statistics of dataset
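Before kappa can be computed, the raw log records have to be grouped per query. A minimal sketch (plain Python; the record layout follows the fields listed above, and the helper name is ours) that applies the two exclusions just described:

```python
from collections import defaultdict

def group_clicks_by_query(records):
    """Group (AnonID, Query, ClickURL) records into per-query click sets.

    Records without a click are skipped, and queries issued (with clicks)
    by fewer than two users are dropped, since their kappa value cannot
    be calculated.
    """
    per_query = defaultdict(lambda: defaultdict(set))
    for user, query, url in records:
        if url:                              # exclude no-click records
            per_query[query][user].add(url)
    # keep only queries with clicks from at least two distinct users
    return {q: dict(users) for q, users in per_query.items() if len(users) >= 2}
```

Each retained query then yields a users-by-URLs structure from which a rating-count matrix for the kappa computation can be assembled.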
4.2 Kappa Distribution over Queries

This analysis gives an overview of the consistency level of the commercial query log and presents the proportion of queries for which reliable emergence occurs. In Figure 1, we can see that the numbers of distinct queries and query instances are evenly distributed when the kappa value is bigger than 0. About 45.4% of queries have a kappa value above 0.6; on these queries, users' click behaviors are almost perfectly consistent. Because we do not take into account the number of people who issue each query, we cannot eliminate the random factor in the consensus. Therefore, we cannot affirm that the users' consistent click behaviors are emergent behaviors. But it is reasonable to consider that the emergence may be reliable enough to apply the techniques of swarm intelligence to finding related documents for this kind of query.
Fig. 1. The number of distinct queries and query instances as a function of kappa value
Furthermore, the kappa values of 54.6% of queries fall below 0.6, which means users' click behaviors on those queries are somewhat inconsistent. Limited by the scale of the query log, we can only conclude that the users who issued the same one of those queries have not come to a reliable consensus yet. In our dataset, the emergence of clicks as a reliable answer to a given query is thus not very significant.

4.3 Kappa Distribution over Users

To further study for what kind of queries the emergence phenomenon among users' click behaviors is reliable enough for the application of swarm intelligence techniques, we investigate the relationship between the kappa value of a query and the number of users who issue the query. The kappa curve for this relationship is depicted in Figure 2, combined with the distributions of the numbers of queries and query instances as a function of the number of users in a group who issue the same query. We find that the number of queries decreases sharply when the number of users in a group grows over 10, so we use a variable step length for clarity. In Figure 2, we find that as the number of users for a given query grows, the average kappa value of the query gradually increases. For queries with a small number of users, the click behaviors of the users differ considerably from each other. Intuitively, one might expect the clicks for queries with more users to be even more inconsistent; however, the high kappa values of those queries refute this hypothesis. The kappa curve shows that the more often a query is issued, the more consistent the users' click behaviors on that query are. When the number of users in a group is bigger than 10, the kappa values of the queries are over 0.6, which indicates that the click behaviors of the users who issue the same one of those queries reach an approximate consensus on the query. In contrast, the kappa values of queries issued by fewer than 10 users are all below 0.6.
Users’ behaviors on those queries are somewhat disordered. Furthermore, kappa values of the queries issued by a large number of users
Fig. 2. Average kappa value of queries as a function of number of people in a group who issue the same query (dashed line), the number of queries as a function of number of people in a group (solid line), the total number of instances of the same frequent queries (columns)
are quite large, and we consider it hard for a large number of users to click on almost the same results for one query purely by chance. So we simply assume that the phenomenon is inevitable rather than accidental. Therefore, the emergence phenomenon among the click behaviors on queries issued by a large number of users in the AOL query log is significant. In this sense, the emergence of consensus clicks is not observable for the less frequent queries. Hence, we suppose that it is more reasonable and convincing to apply the techniques of swarm intelligence to queries issued by a large number of users rather than to those issued by a minority of users.
5 Conclusion

In this paper, we investigate emergence in the behaviors of users who search for information through a web search engine. We survey the definitions of emergence in the previous literature and clarify the definition used in this paper. The kappa statistic is introduced to characterize emergence through the consistency of users' click behaviors on a query. We calculate the kappa values of queries in the AOL log and find that there are many queries on which users' behaviors are almost perfectly consistent. Further analyzing the relationship between the number of users in a group who issue the same query and the kappa value of the query, we discover that the more users a query is issued by, the more consistent the users' click behaviors on that query are. From our analysis, we can infer that the occurrence of reliable emergence in users' click behaviors is related to the scale of users. The techniques of swarm intelligence cannot be indiscriminately applied to all kinds of queries issued to a search engine to improve the quality of the results; it is more reasonable and feasible to apply them to queries issued by a large number of users rather than to less frequent ones. It should be pointed out that there exists no clear definition of emergence in web search, since this issue has received little attention in IR research. This paper only studies the applicability of emergent user clicks in one simple aspect, the consensus of click behaviors of different users on one query, and the statistical standards also deserve further examination. We deem that the emergence phenomenon in web search is many-sided and needs further exploration and verification.
References

1. Kennedy, J., Eberhart, R.C.: Swarm Intelligence. Morgan Kaufmann, San Francisco (2001)
2. Li, Z., Sim, C., Low, M.Y.H.: A survey of emergent behavior and its impacts in agent-based systems. In: IEEE International Conference on Industrial Informatics, pp. 1295–1300 (2006)
3. Wu, J., Aberer, K.: Swarm Intelligent Surfing in the Web. In: Cueva Lovelle, J.M., Rodríguez, B.M.G., Gayo, J.E.L., Ruiz, M.d.P.P., Aguilar, L.J. (eds.) ICWE 2003. LNCS, vol. 2722, pp. 431–440. Springer, Heidelberg (2003)
4. Avello, D.G.: Making the road by searching – a search engine based on swarm information foraging (2009)
5. Valverde, S., Theraulaz, G., Gautrais, J., Fourcassie, V., Sole, R.V.: Emergent behavior in agent networks: Self-Organization in Wasp and Open Source Communities. IEEE Intelligent Systems Special Issue on Self-Management through Self-Organization 21(2), 36–40 (2006)
6. Cucker, F., Smale, S.: Emergent Behavior in Flocks. IEEE Transactions on Automatic Control 52(5), 852–862 (2007)
7. Oster, G., Wilson, E.O.: Caste and Ecology in the Social Insects. Princeton University Press, Princeton (1978)
8. Hillis, W.D.: Intelligence as an emergent behavior; or The Songs of Eden. Daedalus 117(1), 175–189 (Winter 1988)
9. Holland, J.H.: Emergence: From Chaos to Order. Addison-Wesley, Reading (1998)
10. Wolf, T.D., Holvoet, T.: Emergence Versus Self-Organisation: Different Concepts but Promising When Combined. In: Engineering Self-Organising Systems: Methodologies and Applications, pp. 1–15 (2005)
11. Schockaert, S., Cock, M.D., Cornelis, C., Kerre, E.E.: Clustering Web Search Results Using Fuzzy Ants. International Journal of Intelligent Systems 22(5), 455–474 (2007)
12. Landis, J.R., Koch, G.G.: The Measurement of Observer Agreement for Categorical Data. Biometrics 33, 159–174 (1977)
13. Kephart, J.O., Hanson, J.E., Levine, D.W., Grosof, B.N., Sairamesh, J., Segal, R.B., White, S.R.: Emergent Behavior in Information Economies. In: International Conference on Multi-Agent Systems (1998)
14. Cohen, J.: A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20, 37–46 (1960)
15. Fleiss, J.L.: Measuring nominal scale agreement among many raters. Psychological Bulletin 76(5), 378–382 (1971)
16. Teevan, J., Dumais, S.T., Liebling, D.J.: To personalize or not to personalize: modeling queries with variation in user intent. In: Proceedings of SIGIR 2008, pp. 163–170 (2008)
17. Robu, V., Shepherd, H.: Emergence of Consensus and Shared Vocabularies in Collaborative Tagging Systems. ACM Transactions on the Web 3, 1–34 (2009)
A Novel Fault Diagnosis Method Based-on Modified Neural Networks for Photovoltaic Systems Kuei-Hsiang Chao1, Chao-Ting Chen1, Meng-Hui Wang1, and Chun-Fu Wu2 1 Department of Electrical Engineering, National Chin-Yi University of Technology 35, Lane 215, Sec. 1 Chung-Shan Rd., Taiping City, Taichung County 411 Taiwan, R. O. C. {chaokh,wangmh}@ncut.edu.tw, [email protected] 2 Photovoltaics Technology Center, Industrial Technology Research Institute 195, Sec. 4, Chung Hsing Rd. Chutung, Hsinchu, Taiwan 310, R.O.C. [email protected]
Abstract. The main purpose of this paper is to propose an intelligent fault diagnosis method for photovoltaic (PV) systems. First, the Solar Pro software package was used to simulate a photovoltaic system and gather power generation data of photovoltaic modules during normal operation and malfunctions. Then, the collected power generation data were used to construct matter-element models, based on extension theory, for PV systems. The matter-element model is combined with neural networks to form an intelligent fault diagnosis system for PV systems. The proposed fault diagnosis method was adopted to identify the fault types of a 3.15 kW PV system. The simulation results indicate that the proposed method can detect the malfunction types of a PV system rapidly and accurately, with little time and memory consumption.

Keywords: Extension theory, Fault diagnosis, Photovoltaic (PV) system, Matter-element model, Neural networks.
analyzers and other additional measurement equipment. The technologies applied in both methods require detailed calculations of the internal wire and cable lengths of PV modules. Consequently, maintenance engineers must possess sufficient PV power generator-related expert knowledge. Furthermore, if this fault diagnosis technology is applied to large-scale PV power generation systems, cable lengths must be calculated accurately or misjudgment of module fault regions will easily occur. Therefore, this diagnosis technology does not facilitate convenient system repair. Although extension theory has been employed in PV system fault diagnosis, its identification rate still cannot reach 100% [5]. To improve the fault diagnosis accuracy rate, this paper proposes an intelligent fault diagnosis method based on extension theory with neural networks. The proposed method has the advantages of less learning time, higher accuracy, and less memory consumption.
2 Capture of PV Power Generation System Fault Data

In this paper, a 3.15 kW PV power generation system consisting of NT-R5E3E PV modules connected in a 9×2 series–parallel configuration is constructed in the Solar Pro software package for testing [6]. The results simulated by Solar Pro are completely consistent with the figures in the datasheet (not shown here) [7]. The voltage at the maximum power point Vmp, the current at the maximum power point Imp, the power at the maximum power point Pm, and the open-circuit voltage Voc were selected as the fault diagnosis characteristics by which faults and fault types of PV power generation systems are diagnosed in this paper.
3 The Proposed Modified Neural Networks

3.1 The Structure of the Proposed Modified Neural Network

The proposed modified neural network combines the concept of extension theory with that of a neural network, and calculates the relations among various data through the extension distance (ED) [8,9]. Figure 1 shows the architecture of the proposed modified neural network. It has input-layer and output-layer neurons; the input data are first classified and then read into the network, and the output layer stores the calculated extension distances. The connection between the input layer and the output layer is the weighting factor, which includes the upper limit of the weighting factor, the weighting center, and the lower limit of the weighting factor. Finally, the minimum ED value among the output-layer types is determined, and the type of the data is judged [10,11].

3.2 Learning Process of the Proposed Modified Neural Network

The proposed modified neural network uses supervised learning: a characteristic sample is input first, and if the sample does not match the preset target value, the weighting factors are modified; the accuracy rate of the identification system can be effectively improved by adjusting the weighting factors.
A Novel Fault Diagnosis Method Based-on Modified Neural Networks
533
Fig. 1. The architecture diagram of the proposed modified neural network
Before learning, the parameters are defined. First, the learning sample set is X = {X_1, X_2, X_3, ..., X_Np}, where Np is the total number of learning samples. Each sample contains the characteristics and type of the data, X_i^p = {x_i1^p, x_i2^p, x_i3^p, ..., x_in^p}, where i = 1, 2, 3, ..., Np, n is the number of characteristics, and p is the type. If Nm is the total number of errors, then the total error ratio Eτ can be defined as follows:

Eτ = Nm / Np .   (1)
The learning rule of the proposed modified neural network is described as follows [9,10]:

Step 1: Set up the extension matter-element model, and set the weighting factors between the input and output layers. The kth matter-element model can be expressed as

R_k = [ N_k, c_1, V_k1 ;
             c_2, V_k2 ;
             ...       ;
             c_j, V_kj ] ,   j = 1, 2, ..., n; k = 1, 2, ..., nc.   (2)
where N_k is the name of the kth type; c_j is the jth characteristic of N_k; V_kj is the classical region <w_kj^L, w_kj^U>; and nc is the total number of classification clusters at the output end.
534
K.-H. Chao et al.
The classical region bounds w_kj^U and w_kj^L of the proposed modified neural network are:

w_kj^L = min_{i∈Np} {x_ij^p} .   (3)

w_kj^U = max_{i∈Np} {x_ij^p} .   (4)
where x_ij^p represents the input learning data of the proposed modified neural network.

Step 2: Find the weighting center z_kj of each cluster:

Z_k = {z_k1, z_k2, ..., z_kn} .   (5)

z_kj = (w_kj^U + w_kj^L) / 2 .   (6)

where j = 1, 2, ..., n; k = 1, 2, ..., nc.

Step 3: Read in the ith learning sample of type p, expressed as

X_i^p = {x_i1^p, x_i2^p, ..., x_in^p}, p ∈ {1, ..., nc} .   (7)
Step 4: Use (8), the extension distance equation, to calculate the distance between the learning sample and the various clusters [8,9]:

ED_ik = Σ_{j=1}^{n} [ (|x_ij^p − z_kj| − (w_kj^U − w_kj^L)/2) / |(w_kj^U − w_kj^L)/2| + 1 ] .   (8)

where x_ij^p is the jth characteristic of the ith learning sample of type p, and z_kj is the weighting-factor center between the jth input node and the kth output node.

Step 5: The new cluster k* is obtained after the calculation of the extension distance, with ED_ik* = min_k {ED_ik}. If k* = p, skip to Step 7 directly; otherwise, execute Step 6.

Step 6: Adjust and update the weighting factors of the p-th and k*-th clusters as follows:

(1) Update the weighting center values of the p-th and k*-th clusters.
z_pj^new = z_pj^old + η(x_ij^p − z_pj^old) .   (9)

z_{k*j}^new = z_{k*j}^old − η(x_ij^p − z_{k*j}^old) .   (10)

(2) Update the weighting factors of the p-th and k*-th clusters.

w_pj^L(new) = w_pj^L(old) + η(x_ij^p − z_pj^old)
w_pj^U(new) = w_pj^U(old) + η(x_ij^p − z_pj^old) .   (11)

w_{k*j}^L(new) = w_{k*j}^L(old) − η(x_ij^p − z_{k*j}^old)
w_{k*j}^U(new) = w_{k*j}^U(old) − η(x_ij^p − z_{k*j}^old) .   (12)
η: the learning rate of the extension neural network.
z_pj^new, z_pj^old: the new and old weighting center values of characteristic j of type p after learning.
z_{k*j}^new, z_{k*j}^old: the new and old weighting center values of characteristic j of cluster k* after learning.
w_pj^L(new), w_pj^L(old): the new and old lower limits of the weighting of characteristic j of type p.
w_pj^U(new), w_pj^U(old): the new and old upper limits of the weighting of characteristic j of type p.
w_{k*j}^L(new), w_{k*j}^L(old): the new and old lower limits of the weighting of characteristic j of type k*.
w_{k*j}^U(new), w_{k*j}^U(old): the new and old upper limits of the weighting of characteristic j of type k*.
Step 7: This round of learning is finished when all the samples have been classified; otherwise, repeat the calculation procedure from Step 3 to Step 6.

Step 8: When the total error rate Eτ meets the expected target value, stop the calculation procedure; otherwise, return to Step 3 and continue.

Figure 2 shows the adjustment process of the weighting factors described in Step 6. The learning sample x_ij in Fig. 2(a) belongs to cluster B. However, the result of the extension distance equation is ED_A < ED_B; thus, the sample is classified into cluster A. As shown in Fig. 2(b), after the adjustment of the weighting factors, the new extension distances satisfy ED'_A > ED'_B; hence, the learning sample x_ij can be classified into cluster B.
Fig. 2. Adjustment process of cluster weighting: (a) before learning; (b) after learning
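The learning procedure above can be condensed into a short sketch. This is a minimal illustration, not the authors' implementation: NumPy is assumed, the extension distance of (8) is assumed to take the standard form used in extension neural networks [10], and the function names are our own.

```python
import numpy as np

def extension_distance(x, wL, wU):
    # Assumed standard ENN form of Eq. (8):
    # ED_k = sum_j (|x_j - z_kj| - (wU_kj - wL_kj)/2) / |(wU_kj - wL_kj)/2| + 1
    z = (wU + wL) / 2.0                      # weighting centers, Eq. (6)
    half = (wU - wL) / 2.0                   # half-widths (must be nonzero)
    return np.sum((np.abs(x - z) - half) / np.abs(half) + 1.0, axis=-1)

def init_weights(X, y, nc):
    # Classical regions from per-cluster minima/maxima, Eqs. (3) and (4)
    wL = np.stack([X[y == k].min(axis=0) for k in range(nc)])
    wU = np.stack([X[y == k].max(axis=0) for k in range(nc)])
    return wL, wU

def train_epoch(X, y, wL, wU, eta=0.1):
    """One pass of the supervised learning rule (Steps 3-7); returns Eq. (1)."""
    errors = 0
    for x, p in zip(X, y):
        k_star = int(np.argmin(extension_distance(x, wL, wU)))  # Steps 4-5
        if k_star == p:
            continue
        errors += 1
        z = (wU + wL) / 2.0
        # pull the correct cluster p toward the sample, Eqs. (9) and (11)
        wL[p] += eta * (x - z[p]); wU[p] += eta * (x - z[p])
        # push the wrongly matched cluster k* away, Eqs. (10) and (12)
        wL[k_star] -= eta * (x - z[k_star]); wU[k_star] -= eta * (x - z[k_star])
    return errors / len(X)                   # total error ratio E_tau, Eq. (1)
```

Epochs are repeated (Step 8) until the error ratio meets the target; classifying a new sample is the same argmin over extension distances, as in the diagnosis procedure of Section 4.3.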
4 The Proposed Fault Diagnosis Method for PV Systems

4.1 Division of Operation Regions
As irradiation and temperature change over time, the module temperature and irradiation ranges that can appear throughout a day are categorized into 21 categories. Between 300 W/m² and 1,000 W/m², every 100 W/m² is designated as one interval. Each interval is then divided into three sub-intervals by every 10°C between 31°C and 60°C. All categories are shown in Table 1.

4.2 Matter-Element Model for Fault Types
Under identical irradiation and module temperature conditions, this paper divides fault categories of PV power generation systems into 10 different types, as shown in
Table 2. In addition, the upper and lower limits of the classical domain, for irradiation between 301 W/m² and 1,000 W/m² and module temperature between 31°C and 40°C, for the 21 regions under the 10 different types can be obtained from the simulation results.

4.3 Fault Diagnosis Procedure of the Proposed Modified Neural Network
Once the learning process is completed, identification or classification can be conducted. The calculation procedure is:

Step 1: Read in the weighting matrix of the proposed modified neural network.
Step 2: Read the sample to be identified.
Step 3: Use (8) to determine the extension distance between the identification sample and the clusters after learning.
Step 4: Determine the minimum extension distance to judge which cluster type the identification sample belongs to.
Step 5: Check whether all samples have been tested; stop the calculation if the identification is finished, otherwise return to Step 2 and read the next sample to be identified.

Table 1. 21 regions divided by temperature and irradiation intervals
Fault Condition
1. Normal operation.
2. One module fault in either of the two branches.
3. Two modules fault in one branch.
4. Three modules fault in one branch.
5. One module fault in each branch.
6. Two modules fault in each branch.
7. Three modules fault in each branch.
8. One module fault in one branch and two modules fault in another branch.
9. One module fault in one branch and three modules fault in another branch.
10. Two modules fault in one branch and three modules fault in another branch.
5 Testing Results and Discussions

In order to test the effectiveness of the proposed PV system fault diagnosis method, the 21 categories of irradiation, module surface temperature, and selected faulty module numbers were entered into the Solar Pro simulation software to obtain the maximum power (c1), voltage at the maximum power point (c2), current at the maximum power point (c3), and open-circuit voltage (c4) during system operation. These data amount to 3,990 records, which carry the most important characteristics of the PV system for fault diagnosis. Half of these data act as the learning data of the proposed modified neural network in the 21 category regions to establish the fault diagnosis system for PV systems. After learning, 10 data records of different fault types under irradiation from 300 W/m² to 1,000 W/m² and module temperature from 31°C to 60°C are selected and listed in Table 3 for testing.

Table 3. Characteristics for different fault types under irradiation from 300 W/m² to 1,000 W/m² and module temperature from 31°C to 60°C (Test# 1–10)

Table 4. Extension distance and fault diagnosis results for test data belonging to different fault types under irradiation from 300 W/m² to 1,000 W/m² and module temperature from 31°C to 60°C

Table 5. Accuracy comparison between the proposed method and MLP under irradiation from 300 W/m² to 1,000 W/m² and module temperature from 31°C to 60°C

Method               Total learning epochs   Learning accuracy   Diagnosis accuracy
The proposed method  22                      100%                100%
MLP(4-7-10)          8,507                   90.84%              93.33%
MLP(4-8-10)          11,089                  85.65%              90%
MLP(4-9-10)          8,597                   96.64%              93.33%

Table 4 shows the results of fault diagnosis. It can be seen from Table 4 that the diagnosis results of the PV system fault diagnosis method developed in this paper are completely consistent with the selected fault types already known; the fault diagnosis accuracy is extremely high. To further demonstrate the superiority of the proposed modified neural network in fault diagnosis accuracy, Table 5 shows the fault diagnosis results for different neural networks, in which 1,995 samples obtained from simulation are used for learning and 1,995 samples for testing. It indicates that the proposed modified neural network has a shorter learning time and higher learning and recognition accuracy than the multilayer perceptron (MLP) method with different network structures.
6 Conclusions An intelligent fault diagnosis method of a PV system was developed in this paper. First, the Solar Pro software package was used to complete simulations for a 3.15kW PV power generation system operating normally and in fault operations in order to capture the characteristics under different fault types. Through the classical domains and neighborhood domains of these characteristics, fault diagnosis for the PV power generation system was performed using a fault diagnosis method based on a modified neural network with extension distance. The proposed fault diagnosis method needs less learning data and is capable of rapid learning and identification. Consequently, this method is able to quickly and accurately identify the fault types of PV power generation systems. Acknowledgments. This work was supported by National Science Council, Taiwan, Republic of China, under the Grant NSC 97-2622-E-167-012-CC3.
References

[1] Betcke, J., Dijk, V.V., Reise, C., Wiemken, E., Toggweiler, P., Hoyer, C., Heineman, D., Wiezer, H.D.F., Beyer, H.G.: PVSAT: Remote Performance Check for Grid Connected PV Systems Using Satellite Data, Evaluation of One Year Field-Testing. In: Proceedings of the 17th European Photovoltaic Solar Energy Conference (2001)
[2] Lorenz, E., Betcke, J., Drews, A., de Keizer, C., Stettler, S., Scheider, M., Bofinger, S., Beyer, H.G., Heydenreich, W., Wiemken, E., van Sark, W., Feige, S., Toggweiler, P., Heilscher, G., Heinemann, D.: PVSAT-2: Intelligent Performance Check of PV System Operation Based on Satellite Data. In: Proceedings of the 19th European Photovoltaic Solar Energy Conference (2004)
[3] Schirone, L., Califano, F.P.: Fault Finding in a 1MW Photovoltaic Plant by Reflectometry. In: Proceedings of the 1st World Conference on Photovoltaic Energy Conversion, pp. 846–849 (1994)
[4] Takashima, T., Otanil, K., Sakuta, K.: Electrical Detection and Specification of Failed Modules in PV Array. In: Proceedings of the 3rd World Conference on Photovoltaic Energy Conversion, pp. 2276–2279 (2003)
[5] Chao, K.H., Ho, S.H., Wang, M.H.: Modeling and Fault Diagnosis of a Photovoltaic System. Electric Power Systems Research 78, 97–105 (2008)
[6] Solar Pro Brochure, Laplace System Co., http://www.lapsys.co.jp/english/e_products/e_pamphlet/sp_e.pdf
[7] Sharp NT-R5E3E PV Module User Manual, Sharp Corporation of Australia, http://www.sharp.net.au/catalogue/brochures/NTR5E3E_1127_brochure.pdf
[8] Cai, W.: The Extension Set and Incompatibility Problem. J. Scientific Exploration 1, 81–93 (1983)
[9] Chao, K.H., Lee, R.H., Yen, K.L.: An Intelligent Traffic Light Control Method Based on Extension Theory for Crossroads. In: Proceedings of the Seventh International Conference on Machine Learning and Cybernetics, pp. 1882–1887 (2008)
[10] Wang, M.H.: Extension Neural Network for Power Transformer Incipient Fault Diagnosis. IET Generation, Transmission and Distribution 150, 679–685 (2003)
[11] Wang, M.H.: Partial Discharge Pattern Recognition of Current Transformers Using an ENN. IEEE Transactions on Power Delivery 20, 1984–1990 (2005)
Wavelet Packet and Generalized Gaussian Density Based Textile Pattern Classification Using BP Neural Network Yean Yin, Liang Zhang, Miao Jin, and Sunyi Xie College of Computer Science, Wuhan University of Science and Engineering, Wuhan 430073, China [email protected]
Abstract. This paper presents a combined approach to classifying textile patterns based on wavelet packet decomposition and a BP neural network classifier. Building on accurate modeling of the marginal distribution of wavelet packet coefficients using the generalized Gaussian density (GGD), two parameters are calculated for every wavelet packet sub-band at each level by moment matching estimation (MME) or by maximum likelihood estimation (MLE). The parameter vectors are then taken as the pattern matrix for a BP neural network for recognition. The proposed method was verified by experiments using 16 classes of textile patterns, in which the correct recognition rate is as high as 95.3%.

Keywords: Wavelet packet; BP neural network; Generalized Gaussian density; Moment matching estimation; Maximum likelihood estimation.
Wavelet Packet and GGD Based Textile Pattern Classification
541
multi-resolution analysis maximizes the simultaneous localization of energy in both the spatial and frequency domains for human and mammalian vision [9–10]; examples include Gabor filters and the wavelet transform. Gabor filters, with different scale and orientation tuning, are powerful tools for texture discrimination and segmentation. An inherent demerit of Gabor filters, however, is that they are computationally intensive when extracting the features that best represent a texture. In addition, the outputs of the Gabor-filtered images are not mutually orthogonal, which may cause significant correlation between texture features. Most of these problems can be avoided if the wavelet transform is applied, providing a powerful analytical tool at different resolutions. The wavelet transform down-samples the signal, so the length of the decomposed signal is reduced. Chang and Kuo [2] found that texture features are concentrated in the intermediate frequency band. Laine and Fan [8] achieved successful results in texture classification using wavelet packet signatures. Because the mean of a sub-band's wavelet coefficients is zero, the sub-band energy exactly equals its variance, which amounts to fitting the sub-band histogram with a Gaussian function [11]. However, the fitting error is relatively large. Mallat (1989) [9] proposed fitting the sub-band histogram with a generalized Gaussian density (GGD) and characterizing the sub-band using the GGD parameters. Mallat's GGD parameters are far fewer than the elements in the wavelet sub-bands, which makes a practical recognition system possible. Recently, neural network based pattern recognition frameworks have been successfully applied to various classification problems. In this paper, a combined approach is proposed to classify textile patterns. First, the wavelet packet transform is employed to decompose the images into sub-bands at various levels, capturing the dominant texture features in the high and intermediate frequency bands.
Then, based on the GGD model, two parameters are estimated by maximum likelihood from the data in the sub-bands at every level; the vectors assembled from every level's GGD parameters are taken as the input for recognition. Finally, a BP network is used to classify the textile patterns.
2 Wavelet Packet Coefficients and Generalized Gaussian Density (GGD)

Wavelets have been an effective tool for analyzing texture information, as they provide a natural partition of the image spectrum into multi-scale and oriented sub-bands via efficient transforms [2-4]. Wavelet-based methods calculate the energy at the output of the sub-band filters as extracted features for texture discrimination. The principle behind this is the assumption that the energy distribution in the frequency domain identifies a texture. Evidence also shows that these approaches are partly supported by physiological research on the visual cortex [12]. The only problem is that this produces a number of elements so large as to be impractical in a recognition scheme. Meanwhile, statistical modeling is much easier if some preprocessing is carried out on an image. Typical preprocessing transforms image pixel values into a mapped space where simple models with a small number of parameters can describe the data. On the other hand, statistical approaches treat texture analysis as a probability inference problem [13]. A natural extension of the energy method is to model a texture by the marginal densities of its wavelet sub-band coefficients. This is
Y. Yin et al.
justified by recent psychological research on human texture perception, which suggests that two homogeneous textures are often difficult to discriminate if they produce similar marginal distributions of responses from a bank of filters [12]. In this paper, images’ texture is simply characterized via marginal distributions of their wavelet packet sub-band coefficients, which is more precise than the ones that use wavelet sub-band energies alone. Mallat (1989) [9] noticed that, for a variety of images, the distributions of sub-band wavelet coefficients appear similar. Typically, the distributions were symmetric about zero and had a sharp peak at zero. Mallat proposed modeling “typical wavelet coefficients” using the GGD as follows:
p(x;\alpha,\beta) = \frac{\beta}{2\alpha\,\Gamma(1/\beta)}\, e^{-(|x|/\alpha)^{\beta}}    (1)

where

\Gamma(z) = \int_{0}^{\infty} e^{-t}\, t^{z-1}\, dt, \quad z > 0
In formula (1), α represents the width of the PDF peak (standard deviation) and β is inversely proportional to the decreasing rate of the peak. Sometimes α is referred to as the scale parameter while β is called the shape parameter. As β decreases, the peak of the GGD becomes sharper and its decay rate increases. As special cases, the GGD model contains the Gaussian and Laplacian PDFs, for β = 2 and β = 1 respectively. Algorithms for estimating the GGD parameters are described next. The accuracy of GGDs in modeling wavelet coefficients from texture images has been shown in reference [14] by fitting the estimated PDF curve to the actual histogram of the coefficients. Experiments show that a good PDF approximation for the marginal density of coefficients in a particular sub-band, produced by various types of wavelet transforms, may be achieved by adaptively varying the two parameters of the generalized Gaussian density (GGD) [15-16].
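As a quick numerical check (an illustrative sketch, not code from the paper), formula (1) can be evaluated directly and the two special cases verified: β = 2 yields a Gaussian with variance α²/2, and β = 1 yields a Laplacian with scale α.

```python
import math

def ggd_pdf(x, alpha, beta):
    """Generalized Gaussian density of formula (1)."""
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-(abs(x) / alpha) ** beta)

alpha, x = 1.3, 0.7

# beta = 2: Gaussian with variance alpha^2 / 2
gauss = math.exp(-x * x / alpha ** 2) / (alpha * math.sqrt(math.pi))
assert abs(ggd_pdf(x, alpha, 2.0) - gauss) < 1e-12

# beta = 1: Laplacian with scale alpha
laplace = math.exp(-abs(x) / alpha) / (2.0 * alpha)
assert abs(ggd_pdf(x, alpha, 1.0) - laplace) < 1e-12
```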
3 Algorithms of Estimation for Generalized Gaussian Density

3.1 Moment Matching Estimation (ME) of α and β
Moment estimation is a kind of point estimation in which the moments of the sample set are equated with the moments of the probability density function (PDF). The first-order absolute moment and the (normalized) second-order moment of the sample set X = {x_1, x_2, ..., x_L} are defined as:
m_1 = \frac{1}{L}\sum_{j=1}^{L} |x_j|    (2)

m_2 = \left(\frac{1}{L}\sum_{j=1}^{L} x_j^{2}\right)^{1/2}    (3)
Wavelet Packet and GGD Based Textile Pattern Classification
So, according to formula (1), the corresponding moments of the GGD can be derived as:

M_1 = \int_{-\infty}^{\infty} |x|\, p(x;\alpha,\beta)\, dx = \alpha\, \frac{\Gamma(2/\beta)}{\Gamma(1/\beta)}    (4)

M_2 = \int_{-\infty}^{\infty} x^{2}\, p(x;\alpha,\beta)\, dx = \alpha^{2}\, \frac{\Gamma(3/\beta)}{\Gamma(1/\beta)}    (5)
Let M_1 = m_1 and M_2 = m_2^2; then it can be obtained that:

\hat{\beta} = F^{-1}\!\left(\frac{m_1^2}{m_2^2}\right) \quad \text{and} \quad \hat{\alpha} = m_1\, \frac{\Gamma(1/\hat{\beta})}{\Gamma(2/\hat{\beta})}    (6)

where

F(x) = \frac{\Gamma^{2}(2/x)}{\Gamma(1/x)\,\Gamma(3/x)}    (7)
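A minimal implementation of this moment matching estimator might look as follows (an illustrative sketch using only the Python standard library; the bisection search bounds for the shape parameter are our assumption, not from the paper):

```python
import math
import random

def F(x):
    """F(x) of formula (7)."""
    return math.gamma(2.0 / x) ** 2 / (math.gamma(1.0 / x) * math.gamma(3.0 / x))

def mm_estimate(samples):
    """Moment matching estimates of (alpha, beta), formulae (2)-(6)."""
    L = len(samples)
    m1 = sum(abs(v) for v in samples) / L
    m2 = math.sqrt(sum(v * v for v in samples) / L)
    ratio = m1 * m1 / (m2 * m2)
    # F is increasing in its argument, so F^{-1} can be found by bisection
    lo, hi = 0.05, 20.0  # assumed search range for the shape parameter
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if F(mid) < ratio:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    alpha = m1 * math.gamma(1.0 / beta) / math.gamma(2.0 / beta)
    return alpha, beta

# Sanity check on Gaussian data (true beta = 2, alpha = sigma * sqrt(2))
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(20000)]
alpha_hat, beta_hat = mm_estimate(xs)
```

For Gaussian samples the estimator should return a shape estimate near 2 and a scale estimate near √2.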
By evaluating formulae (6) and (7), the parameters α and β can be obtained. Compared to the more accurate maximum likelihood estimation proposed by Do and Vetterli (2002) [4], however, this method has relatively large errors.

3.2 Maximum Likelihood Estimation (MLE) of α and β
Assuming the components of X are independently and identically distributed, the likelihood function of the sample set X is defined as

G(X;\alpha,\beta) = \log \prod_{i=1}^{L} p(x_i;\alpha,\beta) = \sum_{i=1}^{L} \log p(x_i;\alpha,\beta)    (8)
This function can be maximized by setting the first partial derivatives of G with respect to α and β to zero:
\frac{\partial G(X;\alpha,\beta)}{\partial \alpha} = -\frac{L}{\alpha} + \sum_{i=1}^{L} \frac{\beta\, |x_i|^{\beta}}{\alpha^{\beta+1}} = 0    (9)

\frac{\partial G(X;\alpha,\beta)}{\partial \beta} = \frac{L}{\beta} + \frac{L\, \Psi(1/\beta)}{\beta^{2}} - \sum_{i=1}^{L} \left(\frac{|x_i|}{\alpha}\right)^{\beta} \log\!\left(\frac{|x_i|}{\alpha}\right) = 0    (10)
where Ψ(·) is the digamma function:

\Psi(z) = \Gamma'(z)/\Gamma(z)    (11)
Using the fact that β > 0 and solving equation (9), we obtain

\hat{\alpha} = \left(\frac{\beta}{L} \sum_{i=1}^{L} |x_i|^{\beta}\right)^{1/\beta}    (12)
Substituting (12) into (10), the shape parameter β̂ can be calculated from the following transcendental equation:

g(\hat{\beta}) = 1 + \frac{\Psi(1/\hat{\beta})}{\hat{\beta}} - \frac{\sum_{i=1}^{L} |x_i|^{\hat{\beta}} \log |x_i|}{\sum_{i=1}^{L} |x_i|^{\hat{\beta}}} + \frac{1}{\hat{\beta}} \log\!\left(\frac{\hat{\beta}}{L} \sum_{i=1}^{L} |x_i|^{\hat{\beta}}\right) = 0    (13)
This equation can be solved by the Newton-Raphson iteration algorithm with a suitable initial value to obtain the parameter β̂. A reasonable initial guess is the result calculated by the moment matching algorithm.
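The whole ML procedure can be sketched as follows. This is an illustrative reconstruction, not the authors' code: bisection replaces Newton-Raphson for robustness, and the digamma function, absent from the Python standard library, is approximated by numerically differentiating lgamma.

```python
import math
import random

def digamma(z, h=1e-6):
    """Psi(z) = Gamma'(z)/Gamma(z), formula (11), via numerical differentiation."""
    return (math.lgamma(z + h) - math.lgamma(z - h)) / (2.0 * h)

def g(beta, xs):
    """Left-hand side of the transcendental equation (13)."""
    L = len(xs)
    s = sum(abs(x) ** beta for x in xs)
    slog = sum(abs(x) ** beta * math.log(abs(x)) for x in xs)
    return 1.0 + digamma(1.0 / beta) / beta - slog / s + math.log(beta * s / L) / beta

def ml_estimate(xs, lo=0.2, hi=6.0):
    """Solve g(beta) = 0 by bisection, then get alpha from formula (12)."""
    g_lo = g(lo, xs)
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid, xs)
        if g_lo * g_mid <= 0.0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    beta = 0.5 * (lo + hi)
    L = len(xs)
    alpha = (beta / L * sum(abs(x) ** beta for x in xs)) ** (1.0 / beta)
    return alpha, beta

# Sanity check on Laplacian data (true beta = 1, alpha = 1), drawn via inverse CDF
random.seed(1)
xs = []
while len(xs) < 5000:
    u = random.random() - 0.5
    if 1e-9 < abs(u) < 0.5 - 1e-9:  # avoid log(0) edge cases
        xs.append(math.copysign(-math.log(1.0 - 2.0 * abs(u)), u))
alpha_hat, beta_hat = ml_estimate(xs)
```

On this synthetic Laplacian sample the routine should recover a shape estimate near 1 and a scale estimate near 1, consistent with the β = 1 special case of the GGD.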
4 Pattern Recognition Scheme by BP Neural Network
Once the parameters α and β are at hand, the task is to classify the textile patterns. There are numerous methods to do this; a BP neural network is our choice. A BP network is multi-layer, fully connected and feed-forward, and has been widely used to recognize patterns. In our case, the parameters α and β of every sub-band are taken as input vectors, and the corresponding target vectors are used to train the BP network to develop internal relationships between nodes so as to organize the training data into classes of patterns. This same internal representation can be applied to inputs that were not used during training: the trained BP network tends to give reasonable answers when presented with inputs it has never seen. This generalization property makes it possible to train a network on a representative set of input/target pairs and get good results without training the network on all possible input/output pairs. In this study, a three-layer BP neural network was designed; its architecture is illustrated in Fig. 1. The network is formulated as a two-layer tangent-sigmoid/logistic-sigmoid network, in which the logistic sigmoid transfer function is employed since its output range suits the binary output values, i.e. 0 and 1. There are 128 (64*2) input variables for wavelet packet decomposition at level 3, 32 (16*2) variables at level 2 and 8 (4*2) variables at level 1. The number of nodes of the input layer is 128, corresponding to the 128 parameters in the sub-bands. The number of nodes of the output layer is 4, outputting 0000 to 1111, corresponding to the first to sixteenth classes. The number of hidden-layer neurons is double the number of inputs, which was confirmed by testing. The
Fig. 1. The architecture of the BP neural network (input layer: 8, 32 or 128 nodes; hidden layer: 16, 64 or 256 nodes; output layer: 4 nodes)
training function of the BP neural network is a gradient descent function with momentum and an adaptive learning rate. The learning algorithm for the connection weights and threshold values is a momentum learning algorithm based on gradient descent.
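The 4-node output coding described above (0000 to 1111 for the 16 classes) can be made concrete as follows; this is an illustrative sketch, not the authors' code:

```python
def class_to_target(k):
    """Encode class index k (0..15) as a 4-bit target vector for the 4 output nodes."""
    assert 0 <= k <= 15
    return [(k >> b) & 1 for b in (3, 2, 1, 0)]

def target_to_class(bits):
    """Decode 4 thresholded output activations back to a class index."""
    return sum(bit << b for bit, b in zip(bits, (3, 2, 1, 0)))

assert class_to_target(0) == [0, 0, 0, 0]
assert class_to_target(15) == [1, 1, 1, 1]
assert all(target_to_class(class_to_target(k)) == k for k in range(16))
```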
5 Experimental Results
A total of 16 images (512 by 512 pixels), shown in Fig. 2, were taken to form the textile database. Each of the 512 × 512 images was divided into 40 sub-images (each of 128 × 128 pixels), thus creating a test database of 640 texture images. Only the gray levels of the images were used in the experiments. Furthermore, to eliminate the effect of the common gray-level range of sub-images from the same original image, and to make the classification task less biased, each sub-image was individually normalized to zero mean and unit variance before processing. For each type of textile pattern, half of the sub-images were used to train the BP network and the other half for testing. In the experiment, wavelet packet decomposition with at most three levels was applied with Daubechies' maximally flat orthogonal filters of length 8. From each sub-image in the database (of size 128 × 128), two GGD parameters were calculated for each wavelet packet sub-band using the moment matching or maximum likelihood estimator. This was done based on our hypothesis that those model parameters could capture important texture-specific features and have discrimination power among texture classes. Table 1 shows the average recognition rates for the 16 classes over the 320 test images. It can be seen that the highest rates are achieved with wavelet packet decomposition at level 3. Level 4 decomposition was not carried out due to the small sub-band size. It is also shown that maximum likelihood estimation is better than moment matching estimation.
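The per-sub-image normalization to zero mean and unit variance mentioned above amounts to the following (a minimal sketch; the patch is represented as a flat list of gray levels):

```python
import math

def normalize_patch(pixels):
    """Normalize a gray-level patch to zero mean and unit variance."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(var) if var > 0 else 1.0  # guard against flat patches
    return [(p - mean) / std for p in pixels]

patch = [10, 12, 14, 16, 18, 20]
out = normalize_patch(patch)
# after normalization the patch has (numerically) zero mean and unit variance
```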
Fig. 2. 16 classes of textile patterns

Table 1. Average recognition correct rates
6 Conclusion
A novel approach to textile pattern classification is studied, based on wavelet packet decomposition and generalized Gaussian density modeling, applying a BP neural network as the classifier. Experiments show that the proposed method achieves high recognition rates. It can be applied to textile product classification and defect detection.
Acknowledgments. This work has been supported by the funds of the Key Lab of Textile Equipments of Hubei Province (project number: DTL200601), the Educational Department of Hubei Province (project number: D200717004) and the Science and Technology Department of Hubei Province (project number: 2009CDB114).
References
1. Aiazzi, B., Alparone, L., Baronti, S.: Estimation based on entropy matching for generalized Gaussian PDF modeling. IEEE Signal Process. Lett. 6(6), 138–140 (1999)
2. Chang, T., Kuo, C.C.J.: Texture analysis and classification using tree-structured wavelet transform. IEEE Trans. Image Process. 2(4), 429–441 (1993)
3. Crouse, M., Nowak, R.D., Baraniuk, R.G.: Wavelet-based statistical signal processing using hidden Markov models. IEEE Trans. Signal Process. 46(4), 886–902 (1998)
4. Do, M.N., Vetterli, M.: Wavelet-based texture retrieval using generalized Gaussian density and Kullback–Leibler distance. IEEE Trans. Image Process. 11(2), 146–158 (2002)
5. Fan, G., Xia, X.-G.: Improved hidden Markov models in wavelet domain. IEEE Trans. Signal Process. 49(1), 115–120 (2001)
6. Heeger, D.J., Bergen, J.R.: Pyramid-based texture analysis/synthesis. Proc. ACM SIGGRAPH 3, 23–26 (1995)
7. Kokkinakis, K., Nandi, A.K.: Exponent parameter estimation for generalized Gaussian probability density functions with application to speech modeling. Signal Process. 85(9), 1852–1858 (2005)
8. Laine, A., Fan, J.: Texture classification by wavelet packet signature. IEEE Trans. Pattern Recognit. Machine Intell. 15, 1186–1193 (1993)
9. Mallat, S.: A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. Pattern Recognit. Machine Intell. 11(7), 674–693 (1989)
10. Moulin, P., Liu, J.: Analysis of multiresolution image denoising schemes using generalized Gaussian and complexity priors. IEEE Trans. Inform. Theory 45, 909–919 (1999)
11. Unser, M.: Texture classification and segmentation using wavelet frames. IEEE Trans. Image Process. 4(11), 1549–1560 (1995)
12. Bergen, J.R., Adelson, E.H.: Theories of visual texture perception. In: Regan, D. (ed.) Spatial Vision. CRC, Boca Raton (1991)
13. Zhu, S.C., Wu, Y.N., Mumford, D.: FRAME: Filters, random field and maximum entropy—Toward a unified theory for texture modeling. Int. J. Comput. Vis. 27(2), 1–20 (1998)
14. Wouwer, G.V., Scheunders, P., Dyck, D.V.: Statistical texture characterization from discrete wavelet representations. IEEE Trans. Image Process. 8, 592–598 (1999)
15. Sharifi, K., Leon-Garcia, A.: Estimation of shape parameter for generalized Gaussian distributions in subband decompositions of video. IEEE Trans. Circuits Syst. Video Technol. 5, 52–56 (1995)
16. Moulin, P., Liu, J.: Analysis of multiresolution image denoising schemes using generalized Gaussian and complexity priors. IEEE Trans. Inform. Theory 45, 909–919 (1999)
Air Quality Prediction in Yinchuan by Using Neural Networks Fengjun Li School of Mathematics and Computer Science, Ningxia University, 750021 Yinchuan, People’s Republic of China [email protected]
Abstract. A field study was carried out in Yinchuan to gather and evaluate information about the real environment. O3 (ozone), PM10 (particulate matter 10 um in diameter and smaller) and SO2 (sulphur dioxide) constitute the major concern for the air quality of Yinchuan. This paper addresses the prediction of these three pollutants using ANNs. ANNs are non-linear mapping structures based on the function of the human brain; they have been shown to be universal and highly flexible function approximators for any data. This makes them powerful modeling tools, especially when the underlying data relationship is unknown. Keywords: Artificial neural networks, Air quality prediction, Yinchuan.
1 Introduction
More and more people today are paying attention to environmental quality. The prediction of air quality has become an important task due to its impact on human health, vegetation and the environment. Until now, numerous environmental studies have been carried out in various countries; it is virtually impossible to give a complete list here. The common techniques applied in the literature to predict air pollutant concentrations are Box-Jenkins methods, transfer function models, regression techniques and ANNs. These techniques are either univariate or multivariate in nature. If data on input variables are not available, univariate techniques are preferable. Although Yinchuan is not a metropolis, its environmental quality, including PM10 exposure levels, ozone conditions and SO2 concentration, is a key area of concern. As a result, the environmental air quality of Yinchuan, in particular PM10 and O3, should become an important issue of public interest. According to the Ningxia Province Environment Report, the yearly averages of pollutants such as SO2, NOx, CO and TSP (total suspended particulate) have decreased by about 30%, 20%, 15% and 40%, respectively, during the past two years [1]. A severe health issue is, on the other hand, constituted by high levels of PM10, O3 and SO2; these pollutants, associated in the epidemiological literature with
Y. Tan, Y. Shi, and K.C. Tan (Eds.): ICSI 2010, Part II, LNCS 6146, pp. 548–557, 2010. © Springer-Verlag Berlin Heidelberg 2010
an increase in mortality and cardiorespiratory hospitalizations, constitute the major concern regarding the air quality of the city. Yinchuan is a major city of western China. It is the capital of the Ningxia Hui Autonomous Region and lies on the upper reaches of the Yellow River, in the middle of the Ningxia plain. Region scope is in north latitude, east longitude. At present, the Yinchuan urban district has a population of 1.5 million and covers 1,482 square kilometers, including three districts (Xingqing, Jinfeng and Xixia). It is also the most industrialized and populated district of Ningxia. Well known as "a land of rice and fish" in northwest China, the region enjoys favorable physical conditions, with a diversity of natural resources and suitability for growing various crops [2]. The yearly average of PM10 has increased substantially (by about 3 ug/m3) since the beginning of monitoring in 2001 in the Yinchuan area. Suspended PM10 is mainly produced by sand storms (about 30%), wind and dryness (about 26%), particular chemical and physical mechanisms (about 23%) and vehicular traffic (11%); a further significant part of PM10 is produced in the atmosphere by residential heating (especially in winter) [3]. The latest regional law of Ningxia does not introduce any attention or alarm threshold for PM10; rather, it preventively decrees traffic blockages in some periods during the winter. Nevertheless, the previous Chinese law fixed the attention and alarm thresholds at 65 ug/m3 and 80 ug/m3, respectively; the "attention state" was declared if the attention threshold was exceeded for 5 consecutive days. On average, in the past three years, PM10 exceeded the attention threshold for about 70 days/year, and about 20 "attention state days" have been declared yearly. Ozone is a secondary pollutant, produced in the atmosphere in the presence of high solar radiation and primary pollutants (NOx and VOC).
The regional law sets the attention and alarm levels at 190 ug/m3 and 365 ug/m3, respectively, for the maximum hourly concentration, while the health target on the 8-h moving average is fixed at 120 ug/m3. Ozone began to rise in the Yinchuan area in the past three years, partly as a consequence of the reduced SO2 and CO concentrations, which cause a more oxidant atmosphere. Since ozone levels strongly depend on meteorology, great variations are observed among different years. A system able to predict O3, PM10 and SO2 concentrations with sufficient anticipation can give public authorities the time required to manage an emergency, by planning an increase in public transport in the case of an incoming traffic block, or by issuing early warnings. Particulate matter is a complex mixture of extremely small particles and liquid droplets in the air, and is often considered an important factor for health effects. Scientific studies have found an association between exposure to particulate matter and significant health problems such as aggravated asthma, chronic bronchitis, reduced lung function, and so forth [4]. Particulate matter standards differ among countries. According to the Chinese Code for Design of Metro, the PM10 level should be no more than 0.25 mg/m3 in metro stations [5]. PM10 concentration standards are tighter in the USA, at 0.15 mg/m3 under the US Environmental Protection Agency (EPA) PM standards [6].
Both PM10 and PM2.5 are known to be major components of wind-, sand- and dryness-related air pollution. An increasing number of studies today focus on fine particulate matter [7], as these fine particles are small in size and can lodge deeply in the lungs. The USA EPA first established air quality standards for fine particles in 1997 [6]. Due to a lack of evidence linking health problems to long-term exposure to coarse pollution, more attention has been focused on PM2.5, which is believed to pose the largest health risks (EPA, 2007b). The USA EPA revised the air quality standards for particle pollution in 2006 and significantly strengthened the previous daily fine particle standard from 65 to 35 ug/m3 [8]. Unfortunately, there is no standard regarding fine particulate matter in China.
2 Model Description
ANNs are a branch of artificial intelligence developed in the 1950s, aiming to imitate the biological brain architecture. They are parallel distributed systems made of many interconnected non-linear processing elements, called neurons. Interest in them has grown exponentially in the last decade, mainly because of the availability of suitable hardware (e.g. parallel computers and analogue/digital neural cards for personal computers) that has made them convenient for fast data analysis and information processing. Neural networks have found many applications in time series prediction [10]. Although their behaviour has been related to non-linear statistical regression, the big difference is that neural networks seem naturally suited for problems with a large dimensionality of data, such as the identification of systems with a large number of state variables. Many ANN based models have been developed for very different environmental purposes. Boger and Guterman [11] used network analysis to evaluate the carbon flow model built for the northern Benguela upwelling ecosystem in Namibia. Antonic et al. [1] estimated the forest survival after the building of the hydro-electric power plant on the Drava River, Croatia, by means of a GIS-constructed database and a neural network. A three-layer Levenberg-Marquardt feedforward neural network was used by Bishop [12] to model the eutrophication process in three water bodies in Turkey. Other examples in the environmental field were provided by Giustolisi and Mastrorilli and by Carosone et al., who adopted the perhaps simplest and most widely used neural network, the perceptron with the error backpropagation (BP) algorithm. The ANN works on a matrix containing many patterns; the patterns are the rows while the variables are the columns. This data set is a sample, i.e. a subset of the population representing the phenomenon studied.
To be more precise, the forecasting model is created by giving the ANN three subsets of the available sample: the training set, the test set, and the validation set. Definitions of these sets are crucial and often confused in the neural network field. In this paper, these definitions are as follows:
Training set: the group of data by which we train the network, i.e. by which the network, taking the patterns randomly, adjusts its parameters (thresholds and
weights), according to the gradient descent algorithm for the error function, in order to best fit the non-linear function representing the phenomenon.
Test set: the group of data, given to the network while still in the learning phase, by which the error evaluation is verified in order to effectively update the best thresholds and weights.
Validation set: a set of new data (given in the generalization phase, i.e. with the new parameters fixed) used to evaluate ANN generalization, i.e. to evaluate whether the model has effectively approximated the general function representing the phenomenon, instead of merely learning the patterns.
A brief BP algorithm for ANNs is as follows:
Step 1. Initialize the number of hidden neurons.
Step 2. Initialize the maximum number of iterations and the learning rate (η); set all weights and thresholds to small random numbers. Thresholds are weights (parameters) whose corresponding inputs are always equal to 1.
Step 3. For each training vector (input X_p = (x_1, x_2, ..., x_n), output Y), repeat steps 4-7.
Step 4. Present the input vector to the input layer.
Step 5. Calculate the input to the hidden neurons: a_j^h = \sum_{i=1}^{n} w_{ij}^h x_i; calculate the output from the hidden neurons: x_j^h = f(a_j^h); calculate the input to the output neurons: a_k = \sum_{j=1}^{l} w_{jk} x_j^h and the corresponding outputs: \hat{Y}_k = f(a_k). Note that k = 1 and \hat{Y}_k = \hat{Y}; l is the number of hidden neurons.
Step 6. Calculate the error term for the output neurons: \delta_k = (Y - \hat{Y}) f'(a_k), and for the hidden neurons: \delta_j^h = f'(a_j^h) \sum_k \delta_k w_{jk}.
Step 7. Update the weights of the output layer: w_{jk}(t+1) = w_{jk}(t) + \eta \delta_k x_j^h, and of the hidden layer: w_{ij}(t+1) = w_{ij}(t) + \eta \delta_j^h x_i.
As long as the network error is larger than the predefined threshold, or the number of iterations is smaller than the envisaged maximum, repeat steps 4-7 [12].
3 Applications
From the point of view of applications, these constitute the most interesting forecasts. In order to foresee a pollutant trend for the next day, or the next week, under a particular traffic trend or climatic conditions, weather and traffic variables were exclusively used as inputs to the neural net. Thus, it is possible to evaluate the response of the air pollutant concentrations under hypothetical or forecasted climatic circumstances. The application of an ANN to the urban context of Yinchuan, particularly the area of the Ningdong Energy and Chemistry Industry Base near the Yellow River, is presented. The experimental data were obtained from the monitoring units of the Yinchuan environmental center and the Ningdong environmental center since 2005 [13]. The variables monitored were: sulphur dioxide, nitrogen oxides (NO, NO2, NOx), total suspended particulate and PM10, benzene, carbon monoxide, ozone, horizontal wind speed, moisture, pressure, temperature, total sun radiation (tsr), rain, and traffic [14, 15]. The neural
network was trained, tested and validated for many specific input configurations to forecast the concentration of a single pollutant by varying the available information. Consequently, many models suitable for different circumstances, each valid exclusively for a single air pollutant, have been implemented. The elaborations performed in Yinchuan concerned the choice and experimentation of: (1) different ANN architectures; (2) methodologies to scale and elaborate the available information, previously controlled and validated, for the ANN; (3) testing and validation techniques for the realized models. Among the many alternatives, based on the experience in Yinchuan, the following ANN characteristics were never modified: (1) number of layers: the perceptron used is always constituted by a single hidden layer; (2) number of output neurons: one, always corresponding to the forecast pollutant concentration value; (3) learning rule: always standard backpropagation. On the other hand, the modified ANN characteristics have mainly been: (1) the number of input and hidden neurons; (2) data scaling; (3) training and test set choice; (4) validation methodology (simple or crossed); (5) learning rate (constant or variable); (6) learning procedure (batch or incremental); (7) activation function choice (logistic or hyperbolic).
4 Results
Ozone concentrations are more difficult to foresee because of the complex mechanisms which regulate the dynamics of this pollutant, classified as secondary, in the atmosphere. Therefore, it is very problematic to forecast ozone levels without information about its precursors. The following simulation, among the many performed, can be considered good (see Figs. 1 and 2). It used a training set of 3500 patterns (including 288 patterns of the 'mean day'), a test set of 3500 patterns and two validation sets (500 and 48 patterns). The number of neurons was 13 (seven as inputs, five in the hidden layer and one as output). About 10,000 epochs were performed at a constant learning rate of 0.3. The two validations reported a relative MSE (mean square error) of 0.126 (48 h) and 0.19 (500 h). The available dataset comprises 4 years of data (2005-2008), for a total of about 1400 time steps; observations refer to the same monitoring station used for the ozone study. The PM10 time series shows a significant periodic behavior: concentrations are about twice as high during winter as during summer (Fig. 3), because of both the higher anthropic emissions (for instance, building heating emissions) and the unfavorable dispersion conditions (i.e., lower mixing
layer). Concentrations are, however, not negligible during summer, when some exceedances of the 50 ug/m3 threshold can be recorded, though without causing the declaration of the attention state. A significant periodicity is also detected at the weekly scale (Fig. 4): concentrations are, in fact, 25-30% lower on Sunday than on the remaining days of the week.
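The relative MSE figures quoted for the validations can be computed, under the common definition that normalizes the MSE by the variance of the observed series (an assumption, since the paper does not state its normalization), as:

```python
def relative_mse(observed, predicted):
    """MSE of the forecast divided by the variance of the observations."""
    n = len(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    mean = sum(observed) / n
    var = sum((o - mean) ** 2 for o in observed) / n
    return mse / var

obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
r = relative_mse(obs, pred)  # 0.025 / 1.25 = 0.02
```

A value well below 1 means the forecast explains most of the variability of the series; a trivial constant forecast at the mean would score 1.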
Fig. 1. Ozone: real data and ANN forecast (48 h)
Fig. 2. Ozone: real data and ANN forecast (500 data)
Fig. 3. Average yearly profiles of P M10 time series
Fig. 4. Average weekly profiles of P M10 time series
Fig. 5. SO2 : real data and ANN forecast (48 h)
Fig. 6. SO2 : real data and ANN forecast (300 data)
With reference to the input selection, we use the exhaustive input/output correlation analysis carried out in [12] on the same dataset, but aimed at developing a traditional linear ARX model. The analysis grouped all the candidate input variables over all the possible time windows between 0 a.m. of day t-1 and 9 a.m. of day t, evaluating the cross-correlation with the output PM10 time series accordingly.
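A candidate-input screening of this kind can be sketched as a lagged cross-correlation between each input series and the PM10 output series (an illustrative reconstruction, not the code of [12]):

```python
import math

def corr(a, b):
    """Pearson correlation between two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def best_lag(inp, out, max_lag):
    """Lag (in time steps) at which the input best correlates with the output."""
    scores = {}
    for lag in range(max_lag + 1):
        x = inp[:len(inp) - lag] if lag else inp
        y = out[lag:]
        scores[lag] = corr(x, y)
    return max(scores, key=lambda k: abs(scores[k])), scores

# Synthetic check: output is the input delayed by 2 steps
inp = [0.0, 1.0, 0.5, 2.0, 1.5, 3.0, 2.5, 4.0, 3.0, 5.0]
out = [0.0, 0.0] + inp[:-2]
lag, scores = best_lag(inp, out, 4)
```

Inputs whose best absolute correlation with the output is high (at some lag within the candidate time window) would be retained as ANN inputs.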
We can obtain the real data and the ANN forecast for SO2 (see Figs. 5 and 6) in the same way as above; the details are omitted here.
5 Conclusions
Because people are exposed to the natural environment for a long time, this exposure can have harmful or dissatisfactory effects on their health, or cause discomfort, under poor conditions. A field study was carried out to investigate the air quality environment in Yinchuan. About 92.1% of subjects voted that the environment was acceptable. However, the air quality of Yinchuan was not very good, as the particulate levels and SO2 concentration were quite high. More steps should be taken to improve thermal comfort and air quality in Yinchuan, such as properly banning over-exploitation, adequate planting, artificial lakes, ecological recovery and quality improvement, and so forth. The perceptron model with the backpropagation algorithm has shown very good forecasting performance. It is necessary to mention that in order to use the model for forecasting aims (both short- and middle/long-term forecasts), single-pollutant ANNs have to be built. For the middle- (24 h) and long-term forecasts, ANNs can be used by introducing hypotheses about the values of the meteorological and traffic parameters. In this case, although the ANN forecasts appear to be worse than the 1 h ones, their results, in terms of MSEs, are better than those of the usual deterministic models and, furthermore, they are more rapid in the forecasting phase. For local administrations and health and environmental protection institutions, which are usually more interested in catching the future pollutant trend rather than the precise concentration value, this methodology appears to be very useful. The ANN has given good results in the middle- and long-term forecasting of almost all the pollutants. A shrewd preliminary analysis of the available data is fundamental, because it can provide additional input for the learning phase and good indications about the time series. Acknowledgments. This work was supported by Ningxia natural project under contract No. NZ0907 and Ningxia gaoxiao project (2009).
References
1. Sun, Y.Q., Hui, Q., Wu, X.H.: Hydrogeochemical characteristics of groundwater depression cones in Yinchuan City, Northwest China. Chinese Journal of Geochemistry 26, 350–355 (2007)
2. Sun, Y.C., Miao, O.L., Li, Y.C.: Prediction result analysis of air quality dynamic prediction system in Yinchuan city. Arid Meteorology 24, 89–94 (2006)
3. Wang, W., Li, X.H., Wang, X.F.: Levels and chiral signatures of organochlorine pesticides in urban soils of Yinchuan, China. Bull. Environ. Contam. Toxicol. 82, 505–509 (2009) (in Chinese with English abstract)
4. Antonic, O., Hatic, D., Krian, J., Bukocev, D.: Modelling groundwater regime acceptable for the forest survival after the building of the hydro-electric power plant. Ecol. Model. 138, 277–288 (2001)
5. NSCGPRC: Code for Design of Metro (GB50157-2003). China Planning, Beijing (2003)
6. EPA: National ambient air quality standards for particulate matter, final rule. EPA-HQ-OAR-2001-0017, FRL-8225-3, 40 CFR Part 50, Research Triangle Park, NC (October 2006)
7. Han, Y.W., Gao, J.X., Li, H.: Ecology suitability analysis on the industry overall arrangement plan of Ningdong Energy Sources and Chemical Industry Base. Environmental Science and Management 32, 143–147 (2007)
8. EPA: PM standards (2007b), http://www.epa.gov/air/particlepollution/standards.html
9. EPA: Final clean air fine particle implementation rule for implementation of 1997 PM2.5 standards: Fact sheet (2007a), http://www.epa.gov/pmdesignations/documents/Mar07/factsheet.htm
10. Boger, Z., Guterman, H.: Knowledge extraction from artificial neural network models. In: IEEE Systems, Man and Cybernetics Conference, Orlando, FL (1997)
11. Bishop, A.: Neural Networks for Pattern Recognition. Oxford University Press, Oxford (1995)
12. Wang, Y.M., Zhao, Y.F.: Issues of environmental protection and ecological construction of Ningdong Energy and Chemistry Industry Base. Ningxia Engineering Technology 7, 190–193 (2008) (in Chinese with English abstract)
13. Darken, C., Moody, J.: Note on learning rate schedules for stochastic optimization. In: Lippman, R.P., Moody, J.E., Touretzky, D.S. (eds.), pp. 832–838 (1991)
14. Giustolisi, O., Mastrorilli, M.: Realizzazione di un modello connessionista applicato a un problema di idrologia urbana. In: XXIV Conv. di Idraulica e Costruz. Idrauliche, Italy (1994)
15. Chu, G., Mi, W.B.: An analysis to the ecological carrying capacity of Yinchuan city. Urban Problems 10, 39–42 (2008) (in Chinese with English abstract)
Application of Artificial Neural Network in Composite Research∗
Peixian Zhu∗∗, Shenggang Zhou, Jie Zhen, and Yuhui Li
Kunming University of Science and Technology, Kunming 650051, China
[email protected]
Abstract. The artificial neural network is a technique with a flexible mathematical structure and characteristics such as parallel distributed processing and nonlinear processing, which has made it a common modeling method for complex problems in materials science research. This article uses BP and RBF neural networks to study how the components of a composite material and its processing conditions affect the material properties. We establish a relational model among the third-element composition, the hot-dip temperature, and the shear stress, which reflects the joint-face strength of Pb-Al composite materials, and verify the model with experimental data. The results show that the neural network model can be used to predict the shear stress when the third-element composition and the hot-dip temperature change.
Keywords: BP neural network; RBF neural network; process parameters; composite material.
1 Introduction
Compared with single-component materials, metal composite materials can combine the separate characteristics of their constituent materials, achieving an optimal composite and resource allocation of each constituent while saving rare materials. Composite materials typically offer high strength, long fatigue life, and structural properties that can be designed. However, optimizing the structural design, preparation, or processing technology of composite materials often requires a model of the relationship between these factors and the material properties or other objects of concern. Building such a relationship model among structure, process, and properties with traditional mathematical modeling requires a large amount of experimental data; the model-building process is complex and often fails to meet engineering requirements. Artificial neural networks are well suited to handling complex, multivariate nonlinear problems.
Given limited input and output data about the research question or object, a neural network can learn, through self-training, a mapping relationship between the input and output data and use it to make further predictions about the question or object [1-2]. A neural network can thus be used to establish the relationship model among the components of a composite material, the process conditions, and the material properties, and thereby optimize the process conditions. This reduces blindness in experimentation, the cost of experiments, and the materials-development cycle, and deepens the understanding of how different components and processing conditions affect the nature of the material. The artificial neural network has therefore become an important method for predicting the properties of composite materials [3-7]. This article uses BP and RBF neural networks to establish the relationship model among the composition of Pb-Al composite materials, the process conditions, and the material properties.
2 Neural Network Introduction
2.1 Network Model
The BP network is a multi-layer feedforward neural network that usually has one or more hidden layers. The hidden-layer neurons use sigmoid transfer functions, and the output-layer neurons use pure linear transfer functions. The network is trained with a set of input-output samples: it learns by adjusting the weights and thresholds so that the network realizes the relation between input and output. The network can be regarded as a nonlinear mapping from input to output, able to approximate complicated functions by composing simple functions several times.
The RBF network is a feedforward network with three layers: input, hidden, and output. Each of the n input-layer nodes connects directly to one component xi of the input vector and simply passes the input data on to the next layer. Each hidden-layer node is an RBF node representing a single radial basis function characterized by its center location and expansion constant; the radial basis function serves as the transfer function applied to the input data.
2.2 Characteristics of Neural Network Modeling
A neural network can generalize from given experimental data, search out the inherent laws within them, and reveal the inherent nature of the research question. Even when a full understanding of the problem is lacking, a neural network model can achieve sufficient accuracy to meet engineering needs. When a problem cannot be expressed by an explicit mathematical model, neural networks are therefore often used to resolve it, for example in fault diagnosis, feature extraction and prediction, and adaptive control of nonlinear systems.
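The three-layer RBF mapping described above can be sketched in a few lines of Python with numpy; the function and variable names here are illustrative, not from the paper:

```python
import numpy as np

def rbf_forward(X, centers, widths, weights):
    """Forward pass of a three-layer RBF network.

    X:       (N, n) input samples
    centers: (m, n) hidden-node centers c_j
    widths:  (m,)   radial basis widths sigma_j
    weights: (m,)   hidden-to-output weights (single linear output)
    """
    # Distance from every sample to every center: shape (N, m)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    # Gaussian radial basis activations of the hidden layer
    H = np.exp(-(d ** 2) / (2.0 * widths[None, :] ** 2))
    # Pure linear output layer
    return H @ weights
```

A BP network differs in that the hidden activations are sigmoids of weighted sums rather than radial basis responses to distances from centers.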
Neural networks also have limitations. When too few data sets are collected, the resulting errors may fail to reflect the true characteristics and variation law of the problem itself. When the sampling points are distributed unevenly in the sample space, they tend to distort the results of the neural network prediction.
3 Data Acquisition for the Model
Producing composite materials requires a relationship model between the production conditions and the material properties (or other objects of attention) that can be used to optimize the process conditions. In Pb-Al composite production, we need to study the bond strength of the composite; the shear stress is one index that reflects this combination strength. We therefore use BP and RBF neural networks to establish network models and use them to predict the shear stress, avoiding a large number of experiments and saving manpower and material resources.
3.1 Experimental Methods
To measure the shear stress of the Pb-Al composite, a surface-treated pure Al plate is hot-dipped, at different temperatures, into third-element solutions of different compositions. The plate is then placed in the mechanical-properties test mould (mould temperature 230 ± 5 °C). Finally, the test samples are ready.
3.2 Sample Space Construction
The experimental data collected comprise the hot-dip temperature, the third-element composition, and the shear stress of the composite. For a given hot-dip temperature, we choose the respective percentages of Sn and Bi constituting the third element, together with the hot-dip temperature, as the input parameters of the BP network, and the shear stress as its output parameter. This yields a sample space with 3 inputs and 1 output. Because the three network input variables differ greatly in order of magnitude, the hot-dip temperature is normalized in order to reduce the network training time and the network output error.
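The normalization step can be as simple as min-max scaling; a minimal sketch (the temperature values below are made up for illustration):

```python
import numpy as np

def minmax_normalize(x):
    """Scale a feature to [0, 1]; used here for the hot-dip temperature
    so its magnitude matches the Sn/Bi percentage inputs."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

temps = np.array([280.0, 300.0, 320.0, 340.0])  # hypothetical temperatures
scaled = minmax_normalize(temps)                # values now lie in [0, 1]
```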
3.3 Data Grouping
We analysed and collated the 35 groups of data collected in the experiment, randomly chose 25 of them as learning samples (shown in Table 1), and used the remaining 10 groups to validate the network model (shown in Table 2).
4 Neural Network Model Prediction Results and Analysis
4.1 Model Structure
We use the Matlab neural network toolbox to establish the BP neural network. Through repeated experiments and comparison tests, we select a BP network with
a three-layer structure: input layer, hidden layer, and output layer. With 14 hidden-layer neurons and a sigmoid activation function, the network training error reaches its minimum, 8.29487e-029. The input-layer nodes correspond to the dimension of the input vector, and the output is the shear stress. Under the same input-output conditions, we use the newrbe(P, T, SPREAD) function in Matlab to build an exact RBF network, where P is the input vector, T is the target vector, and SPREAD is the distribution density of the radial basis functions. When building an RBF network, this function selects the number of hidden-layer nodes automatically to reach the minimal error; training was repeated with different SPREAD values, and the network with the highest prediction accuracy was obtained at SPREAD = 0.5.
4.2 Model Test Results and Analysis
The maximum absolute error of the BP neural network model at the check points is 0.27051, a relative error of 8%; that of the RBF neural network model is 0.20441, a relative error of 7%. Comparing the two networks' results in Figure 1, the BP network's prediction shows large jumps in the error between predicted and actual values, whereas the RBF network has universal-approximation and best-approximation capability: it avoids the long, tedious computation of the BP network, learns fast, and has no local-optimum problem, so it reflects the actual behaviour of the system better [8]. BP and RBF neural network models built from the limited sets of experimental data can predict the measured shear stress well, with the error controlled within a certain range; the RBF model is better than the BP model and can be used to guide further experiments.
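MATLAB's newrbe builds an exact-design RBF network: one hidden node per training sample, with the output weights solved from a linear system. A rough Python analog might look like the following (it assumes MATLAB's radbas convention, where the basis response falls to 0.5 at a distance of SPREAD; all names are illustrative):

```python
import numpy as np

def newrbe_like(P, T, spread):
    """Exact-design RBF network in the spirit of MATLAB's newrbe:
    one hidden node per training sample, output weights solved exactly."""
    P = np.atleast_2d(np.asarray(P, float))
    T = np.asarray(T, float)
    # Pairwise distances between training samples
    d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    # radbas bias: response is 0.5 when distance equals `spread`
    b = 0.8326 / spread
    G = np.exp(-(b * d) ** 2)
    w = np.linalg.solve(G, T)          # exact interpolation weights
    def predict(X):
        X = np.atleast_2d(np.asarray(X, float))
        dx = np.linalg.norm(X[:, None, :] - P[None, :, :], axis=2)
        return np.exp(-(b * dx) ** 2) @ w
    return predict
```

By construction the network reproduces the training targets exactly; the spread controls how smoothly it interpolates between them.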
Fig. 1. Comparison of prediction results
5 Conclusion
This article introduced the basic structure of BP and RBF neural networks and used them to establish prediction models for the properties of Pb-Al composites. Verification of the models shows that the networks can basically reproduce the variation trend of the shear stress, the physical property index, under different third-element compositions and hot-dip temperatures. Comparing the training speed and the prediction results of the two networks, we find that the RBF network not only trains faster than the BP network but also avoids the BP network's local-minimum problem. The RBF network outperforms the BP network in property prediction, and its validation results show a better variation trend of the predicted index; it can be widely used for research on the properties of Pb-Al composites in the future.
Acknowledgments. The authors thank the National Natural Science Foundation of China (No. 50664005) and the National High Technology Research and Development Program of China (2009AA03Z512) for financial support, and thank the authors of the literature used in this article and the colleagues who supplied the experimental data.
References
1. Ying, Y., Bingshan, L., Cong, H., Chuanxian, C.: Neural network predicting of stiffness and strength in plane of stitched composites. Acta Materiae Compositae Sinica 21(6), 179–183 (2004)
2. Taylor, K., Darsey, J.A.: Prediction of the electronic properties of polymers using artificial neural networks. Polymer Preprints 41(1), 331–332 (2000)
3. Jiang, Z., Zhang, Z., Friedrich, K.: Prediction on wear properties of polymer composites with artificial neural networks. Composites Science and Technology 67, 168–176 (2007)
4. Bezerra, E.M., Bento, M.S., Rocco, J.A.F.F., Iha, K., Louren, V.L., Pardini, L.C.: Artificial neural network (ANN) prediction of kinetic parameters of (CRFC) composites. Computational Materials Science (44), 656–663 (2008)
5. Cherian, R.P., Smith, I.N., Midha, P.S.: A neural network approach for selection of powder metallurgy materials and process parameters. Artificial Intelligence in Engineering 80(14), 39–44 (2000)
6. Hafizpour, H.R., Sanjari, M., Simchi, A.: Analysis of the effect of reinforcement particles on the compressibility of Al–SiC composite powders using a neural network model. Materials and Design (2008)
7. Hongxing, H., Xiaolin, C., Yinming, S.: Composite plate stiffness identification using neural networks. Acta Materiae Compositae Sinica 17(1), 108–110 (2000)
Application of Short-Term Load Forecasting Based on Improved Gray-Markov Residuals Amending of BP Neural Network
Dongxiao Niu, Cong Xu, Jianqing Li, and Yanan Wei
School of Economics and Management, North China Electric Power University, Beijing, China
[email protected], [email protected], [email protected]
Abstract. For the characteristics of short-term load forecasting, we establish a load forecasting model based on a BP neural network, combine the advantages of gray prediction and Markov forecasting, and amend the prediction residual, which greatly improves the precision of prediction. The research shows that the BP neural network with gray-Markov residual error correction is worth popularizing and applying.
Keywords: load forecasting, Gray-Markov residuals amending, BP neural network.
2 BP Neural Network Prediction Model
2.1 Principle
The BP neural network is currently the most widely used artificial network for prediction. Its learning algorithm consists of forward propagation and back propagation. In forward propagation, a message entered at the input layer passes through all the layers; after processing, the network produces an output, and the error between this output and the desired output is computed. Back propagation then proceeds from the output layer to the input layer, using this error to adjust the connection weights layer by layer so that the network output approaches the desired output [3], as shown in Figure 1:
Fig. 1. BP neural network structure
2.2 Modeling Steps
1) Determine the network structure, i.e., the number of layers, the number of neurons per layer, and the activation function. The activation function is the S-type function $f(x) = 1/(1 + e^{-x})$. Suppose there are k sample vectors, the input layer has n neurons, the hidden layer has p neurons, and the output layer has m neurons; the network input vector is $P_k = (x_1, x_2, \ldots, x_n)$, the output vector is $Y_k = (y_1, y_2, \ldots, y_m)$, and the desired output vector is $T_k = (t_1, t_2, \ldots, t_m)$.
2) Normalize the sample vectors so the data lie between 0 and 1, and initialize the connection weights $w_{ij}$, $w_{jk}$ and the neuron thresholds $\theta_j$ randomly in (-1, 1).
3) Set the minimum error E and the learning rate $\eta$, and set the maximum number of training iterations as n.
4) For each input sample, calculate the input and output of the hidden layer and the output layer. The hidden layer:
$o_j = \sum_i w_{ij} x_i + \theta_j$, $h_j = f(o_j)$.
The output layer:
$y_k = \sum_j w_{jk} h_j - \theta_k$, $s_k = f(y_k)$.
5) Based on the network output, calculate the output-layer and hidden-layer errors:
$\delta_k = (s_k - t_k)\, s_k (1 - s_k)$,  (1)
$\eta_j = \Big[\sum_{k=1}^{m} \delta_k w_{jk}\Big] h_j (1 - h_j)$.  (2)
6) Calculate the total error between the actual output and the expected output and update the weights and thresholds:
$w_{jk} = w_{jk} + \alpha \delta_k h_j$, $w_{ij} = w_{ij} + \alpha \eta_j x_i$, $\theta_k = \theta_k + \alpha \delta_k$, $\theta_j = \theta_j + \alpha \eta_j$, $E = \frac{1}{N}\sum_{u=1}^{n} e_u^2$.
If the requirements are met, end the training; otherwise continue from step 5).
7) Repeatedly adjust the connection weights of each neuron, repeating steps 3 to 5 until E falls within the error range [4].
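The modeling steps above can be condensed into a minimal batch-mode sketch in numpy; the sign convention is chosen so the updates descend the error, and all names are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_bp(X, T, p=8, eta=0.5, epochs=2000, rng=None):
    """Minimal batch BP training for one hidden layer.

    X: (k, n) normalized inputs in [0, 1]; T: (k, m) desired outputs.
    Weights and thresholds start in (-1, 1); eta is the learning rate.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    k, n = X.shape
    m = T.shape[1]
    Wij = rng.uniform(-1, 1, (n, p)); th_j = rng.uniform(-1, 1, p)
    Wjk = rng.uniform(-1, 1, (p, m)); th_k = rng.uniform(-1, 1, m)
    for _ in range(epochs):
        H = sigmoid(X @ Wij + th_j)        # hidden outputs h_j
        S = sigmoid(H @ Wjk + th_k)        # network outputs s_k
        dk = (T - S) * S * (1 - S)         # output-layer delta
        dj = (dk @ Wjk.T) * H * (1 - H)    # hidden-layer delta
        Wjk += eta * H.T @ dk; th_k += eta * dk.sum(axis=0)
        Wij += eta * X.T @ dj; th_j += eta * dj.sum(axis=0)
    return Wij, th_j, Wjk, th_k

def bp_predict(X, Wij, th_j, Wjk, th_k):
    return sigmoid(sigmoid(X @ Wij + th_j) @ Wjk + th_k)
```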
3 Gray-Markov Residual Error Correction Model
Gray system prediction applies to problems with short time series, little data, and volatility, whose geometry is a monotonically increasing or decreasing smooth curve. The prediction object of a Markov chain is a dynamic system with random change: an n-state Markov chain is determined by the state set (E1, E2, ..., En) and a set of transition probabilities $p_{ij}$ (i, j = 1, 2, ..., n). The process can be in only one state at any time; if it is in state $E_i$ at time k, then at time k+1 it will be in state $E_j$ with probability $p_{ij}$. This feature means that Markov-chain projections use the transition probabilities between states to infer future development and change [5].
3.1 The Establishment of the GM(1,1) Model
According to gray system theory, the GM(1,1) model curve equation is
$Y'(k) = \hat{x}^{(0)}(k+1)$.
3.2 State Division
Taking the curve $Y'(k)$ as the benchmark and according to the specific circumstances of each prediction object, the curve is divided into a number of parallel regions, each constituting a state bar. Each state interval $Q_i$ can be expressed as
$Q_i = [Q_{1i}, Q_{2i}]$, $i = 1, 2, \ldots, n$.
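The GM(1,1) model of Section 3.1 is conventionally fitted by least squares on the accumulated series; the sketch below follows that standard formulation (it is not taken verbatim from the paper, and all names are illustrative):

```python
import numpy as np

def gm11(x0, horizon=0):
    """Fit a GM(1,1) model to the series x0 and return the fitted values
    plus `horizon` forecast steps of the restored series x̂(0)."""
    x0 = np.asarray(x0, float)
    n = len(x0)
    x1 = np.cumsum(x0)                           # accumulated series
    z1 = 0.5 * (x1[1:] + x1[:-1])                # background values
    # Least-squares fit of x0(k) = -a*z1(k) + b
    B = np.column_stack([-z1, np.ones(n - 1)])
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]
    k = np.arange(n + horizon)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
    # Restore x̂(0) by first-order differencing of x̂(1)
    return np.concatenate([[x0[0]], np.diff(x1_hat)])
```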
3.3 State Transition Probability Matrix
The probability that the Markov chain moves from state $E_i$ to state $E_j$ in k steps is
$p_{ij}^{(k)} = m_{ij}^{(k)} / M_i$,
and the Chapman-Kolmogorov equation is applied repeatedly.
3.4 Data Forecast
If the initial state vector of a variable in state $E_i$ is $P(0)$, then after k steps the state vector is
$P_k = P(0) \times P^{(k)}$.
After the change interval $[Q_{1d}, Q_{2d}]$ of the forecast is identified, the forecast takes the midpoint of the range. The result is
$Y(k) = Y'(k) + \frac{A_i + B_i}{2}$ [6].
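Sections 3.3-3.4 amount to estimating the transition matrix by counting observed residual-state transitions, then adding the midpoint of the most probable next state's interval to the gray forecast. A sketch under those assumptions (the one-step-ahead choice and all names are illustrative):

```python
import numpy as np

def markov_correct(residual_states, n_states, last_state, state_mid):
    """One-step Markov correction: estimate the transition matrix from the
    observed sequence of residual states, then return the midpoint of the
    most probable next state's interval (to be added to the gray forecast)."""
    P = np.zeros((n_states, n_states))
    for i, j in zip(residual_states[:-1], residual_states[1:]):
        P[i, j] += 1                                   # count transitions
    row = P.sum(axis=1, keepdims=True)
    P = np.divide(P, row, out=np.zeros_like(P), where=row > 0)
    nxt = int(np.argmax(P[last_state]))                # most probable next state
    return state_mid[nxt], P
```

The corrected forecast is then `Y_prime + mid`, where `Y_prime` is the gray prediction and `mid` the returned interval midpoint.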
4 Case Study
This paper takes the 24-hour daily load of a city in the second quarter as the sample for network training, predicts the 24-hour load of the next two days, and uses the data of the first 10 hours to validate the effectiveness of the method. The raw data are shown in Figure 2.
Fig. 2. The raw data
1) Data processing. Normalize the sample vectors so the data lie between 0 and 1.
2) Establish the BP neural network model. The network has input layer i = 3, hidden layer j = 2, and output layer k = 24; the 90 days of the second quarter serve as samples, the number of training iterations is 1000, and the error is 0.01. The network is trained as shown in Figure 3; after training, the network error meets the requirements.
Fig. 3. Training Results
3) BP neural network prediction. Use the trained network above to predict the load of the next 2 days and calculate the relative error.
4) Division of the state. According to the original data sequence and the prediction curve, use the gray-Markov model to determine the state division and the state transition matrix.
5) Amendment. Anti-normalize using the forecast error and the predictive value of the BP network; the formula is
$X^{(0)}(t) = \frac{X'^{(0)}(t)}{1 - q}$,
where $X'^{(0)}(t)$ is the predictive value of the BP neural network, $X^{(0)}(t)$ is the revised value, and q is the corresponding error value.
Table 1. The comparative table of BP neural network prediction and the modified values

No. | Actual data | Predictive value of BP | Prediction error of BP (%) | Revised value of Gray-Markov | Prediction error of Gray-Markov (%)
1 | 142.7 | 146.56 | -0.0270 | 143.8340 | -0.0079
2 | 133.2 | 138.18 | -0.0374 | 135.6099 | -0.0181
3 | 117.7 | 124.87 | -0.0609 | 120.4871 | -0.0237
4 | 98.6 | 110.92 | -0.1249 | 107.0267 | -0.0855
5 | 91.7 | 101.3 | -0.1047 | 97.7444 | -0.0659
From the results in Table 1, the prediction error of the BP neural network ranges from -0.1249 to -0.0270 (%); after applying the gray-Markov residual error correction model, the prediction error ranges from -0.0855 to -0.0079 (%). Thus, the gray-Markov residual error correction model improves the prediction accuracy effectively.
5 Conclusion
In this paper, a BP neural network and the gray-Markov correction model are combined to establish a short-term load forecasting model. Using the gray-Markov amendment on the forecast results of the BP neural network reflects the state of the model prediction better than the BP prediction model alone, and it also predicts the load more accurately. The method not only has high accuracy but is also simple, practical, and workable. This model, based on improved gray-Markov residual amending of a BP neural network, is therefore applicable to short-term electric load forecasting, avoids the volatility of a single-series forecast, and is suitable for general use. As the BP network still has the problem of uncertainty in the number of hidden-layer nodes, the next step could be to study how to improve the smart algorithm.
References
1. Blumenthal, R.M., Getoor, R.K.: Markov Processes and Potential Theory. Academic Press, New York (1986)
2. Chen, S., Billings, S.A.: Neural networks for nonlinear dynamic system modelling and identification. Int. J. Control 56, 319–346 (1992)
3. Dongxiao, N., Zhihong, G.: Research on neural networks based on culture particle swarm optimization and its application in power load forecasting. In: Proceedings of the Third International Conference on Natural Computation, ICNC, pp. 270–274 (2007)
4. Habiballah, I.O., Ghosh-Roy, R., Irving, M.R.: Markov chains for multipartitioning large power system state estimation networks. Electric Power Systems Research 3, 135–140 (1998)
5. Youxin, L., Longting, Z., Huixin, G.: Grey system judgement on reliability of mechanical equipment. International J. of Plant Eng. and Management 21(3), 156–164 (2001)
6. Jones, D.I., Lorenz, M.H.: An application of a Markov chain noise model to wind generator simulation. Mathematics and Computers in Simulation 28, 391–402 (1986)
The RBFNN’s Application in Nonlinear System Model Based on Improved APC-III Algorithm
Xinping Liu, Xiwen Xue, and Mingwen Zheng
Computer and Communication Engineering, China University of Petroleum, Dongying, Shandong, China
[email protected]
Abstract. This paper proposes an improved APC-III algorithm to determine the hidden-layer structure of an RBFNN. The improved algorithm obtains different radial basis widths and fewer hidden-layer nodes. An RBFNN is constructed with the hidden-layer structure obtained by the improved algorithm, and model simulations of two nonlinear systems are carried out with it. The simulation results show that the RBFNN structure produced by the improved algorithm is simpler, and its generalization performance better, than that of the original algorithm.
Keywords: Radial Basis Function Neural Network, Hidden-layer structure, APC-III algorithm.
The APC-III algorithm is an extension of APC-I (Adaptive Pattern Classifier) [8]. It was first used in handwriting recognition because it can obtain the centers of the basis functions in a single pass through the training patterns. Hwang and Bang [9][10] from South Korea used this method to determine the centers of the basis functions, but their algorithm determines only the number of basis functions and the center vectors, while the width of the radial basis remains fixed, and the two steps are run separately. For nonlinear systems, however, the distribution of samples is very uneven, which leads to heterogeneous distances between the selected data centers; selecting a fixed width is inadequate under such uneven distribution. We can therefore combine APC-III with the solving of the radial basis widths to obtain the data centers and the width of each cluster simultaneously. This is one of the improvements proposed in this paper.
2 RBFNN Learning Algorithm Based on Improved APC-III
2.1 Algorithm Description
The implementation of the APC-III algorithm only needs the cluster radius R0, which is the only pre-determined parameter of the algorithm. Hwang and Bang calculate R0 by Equation (1):
$R_0 = \alpha \frac{1}{P} \sum_{i=1}^{P} \min_{j \neq i}\left(\|x_i - x_j\|\right)$  (1)
where P is the number of samples and α is an adjustment factor. The calculation of Equation (1) is very time-consuming if P is large, so a subset of the P samples may be selected to approximate R0, at the cost of some network accuracy. In practice we can select several different subsets and take the arithmetic mean of their R0 values to reduce the accuracy loss. The APC-III (Algorithm 1) and improved APC-III (Algorithm 2) are as follows:

Algorithm 1
Input parameter: X = {X1, X2, ..., XP}; Output: the center of each cluster.
Variable definition: m is the number of clusters, cj the jth data center, nj the number of samples in the jth cluster, and dij the distance between the current sample Xi and the jth data center (j = 1, 2, ..., m).
Step 1. Initialize parameters: determine α, set the initial number of clusters m = 1, the initial data center c1 = X1, and the sample count of the first cluster n1 = 1.
Step 2. Calculate R0 by Equation (1).
Step 3. Input a sample Xi and calculate dij, the distance between Xi and each existing cluster center, then compare dij and R0: if dij ≤ R0, Xi belongs to the jth cluster; update its data center by cj = (cj nj + Xi) / (nj + 1) and add one to its sample count, nj = nj + 1. Otherwise, if dij > R0, create a new cluster: the number of clusters is increased by one, m = m + 1, the initial data center of the mth cluster is cm = Xi, and its sample count is nm = 1.
Step 4. Repeat Steps 2 and 3 until all samples are trained. The algorithm ends.
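Equation (1) can be computed directly; a minimal sketch (illustrative names):

```python
import numpy as np

def cluster_radius(X, alpha=1.0):
    """R0 from Equation (1): alpha times the mean nearest-neighbour distance."""
    X = np.atleast_2d(np.asarray(X, float))
    # Pairwise distance matrix between all samples
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # exclude j == i from the minimum
    return alpha * d.min(axis=1).mean()
```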
Algorithm 2
Input parameter: X = {X1, X2, ..., XP}; Output parameters: the center of each cluster and its width.
Variable definition: m is the number of clusters, cj the jth data center, nj the number of samples in the jth cluster, dij the distance between the current sample Xi and the jth data center (j = 1, 2, ..., m), σj the jth radial basis width (j = 1, 2, ..., m), and count the threshold on the number of samples per cluster.
Step 1. Initialize parameters: determine α, set m = 1, c1 = X1, n1 = 1, and initialize the value of count.
Step 2. Calculate R0 by Equation (1).
Step 3. Input a sample Xi and calculate dij, the distance between Xi and each existing cluster center, then compare dij and R0: if dij ≤ R0, Xi belongs to the jth cluster; update its data center by cj = (cj nj + Xi) / (nj + 1), add one to its sample count nj = nj + 1, and update the corresponding radial basis width $\sigma_j = \max_{X_i \in j}\{d_{ij}\}$. Otherwise, if dij > R0, create a new cluster: m = m + 1, cm = Xi, and nm = 1.
Step 4. Repeat Steps 2 and 3 until all samples are trained.
Step 5. For all m clusters, compare nj with count; if nj < count, delete that center. The algorithm ends.
2.2 Algorithm Analysis of the Improved APC-III
Compared with the original algorithm, the structure of the improved APC-III is unchanged: clustering is completed in a single scan of all samples. The complexity of both algorithms is O(P(Nm + mn)), where P is the number of samples, n the number of output-layer nodes, and m the number of hidden-layer nodes. The radial basis width is fixed in the original algorithm, but experimental results show that the approximation error of an RBFNN with fixed widths increases when the distribution of the data centers is uneven. The improved algorithm determines a different width for each cluster center, which improves the training accuracy and also benefits the modeling of nonlinear systems with uneven sample distributions. In the original algorithm, all cluster centers are treated as hidden-layer nodes of the RBFNN, which may generate excessive hidden-layer nodes and thus decrease the generalization ability of the network. The improved algorithm instead uses the threshold count to decide whether a cluster is deleted, which avoids rapid changes in the RBFNN's mapping curve caused by excessive hidden-layer nodes. The proposed algorithm can therefore improve the generalization ability of the RBFNN.
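Algorithm 2 can be sketched as a single scan over the samples. The pruning test below keeps clusters with nj > count, matching the stated intent for count = 1 (drop single-sample clusters), although Step 5 literally says nj < count; all names are illustrative:

```python
import numpy as np

def apc3_improved(X, R0, count=1):
    """Single-scan improved APC-III: returns cluster centers, per-cluster
    radial basis widths sigma_j, and sample counts, pruning clusters whose
    sample count does not exceed `count`."""
    X = np.atleast_2d(np.asarray(X, float))
    centers, widths, sizes = [X[0].copy()], [0.0], [1]
    for x in X[1:]:
        d = np.array([np.linalg.norm(x - c) for c in centers])
        j = int(d.argmin())
        if d[j] <= R0:                       # merge into the nearest cluster
            centers[j] = (centers[j] * sizes[j] + x) / (sizes[j] + 1)
            sizes[j] += 1
            widths[j] = max(widths[j], d[j])  # sigma_j = largest distance seen
        else:                                # open a new cluster at x
            centers.append(x.copy()); widths.append(0.0); sizes.append(1)
    keep = [i for i in range(len(centers)) if sizes[i] > count]
    return (np.array([centers[i] for i in keep]),
            np.array([widths[i] for i in keep]),
            np.array([sizes[i] for i in keep]))
```

Each kept (center, width) pair then becomes one hidden node of the RBFNN.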
3 The Application of the Improved RBFNN Algorithm in Nonlinear System Modeling
Because an RBFNN can approximate any nonlinear mapping with any accuracy, it is often applied to complex nonlinear systems. We can thus model a nonlinear system
by RBFNN and use the network as the model of the complex object. In normal circumstances, a nonlinear system identified by a neural network takes one of two model structures: serial-parallel (Equation (2)) or parallel (Equation (3)):
$\hat{y}(k) = f_{NN}(W, \hat{y}(k-1), \ldots, \hat{y}(k-n), u(k-1), \ldots, u(k-m))$  (2)
$\hat{y}(k) = f_{NN}(W, y(k-1), \ldots, y(k-n), u(k-1), \ldots, u(k-m))$  (3)
where $f_{NN}(\cdot)$ denotes the mapping of the neural network and W is the weight space of the network. The following two nonlinear experiments demonstrate the efficiency of the improved algorithm proposed in this paper.
3.1 Simulation 1
The modeling and simulation experiment uses the nonlinear system of Equation (4):
$y(k) = 0.725 \sin\!\left(\frac{16u(k-1) + 8y(k-1)}{3 + 4u(k-1)^2 + 4y(k-1)^2}\right) + 0.2u(k-1) + 0.2y(k-1)$  (4)
In Equation (4), y(0) = 0 and the input signals u(k) take random values in the interval [-1, 1]. The order of the nonlinear system is set to five, i.e., the network input is X = [u(k-1), y(k-1), u(k-2), y(k-2), y(k-3)]. 400 samples are generated according to Equation (4); 300 are used to train the network and the remaining 100 to test it. We select α = 0.5, thus R0 = 0.95, and set count = 1, i.e., clusters containing only one sample are pruned; then an RBFNN is constructed to simulate Equation (4) according to Algorithm 1 and Algorithm 2. The final results are shown in Table 1:

Table 1. Comparison between simulation results

Algorithm   | Number of hidden-layer nodes | Training accuracy | Generalization accuracy
Algorithm 1 | 47 | 0.01585 | 2.0141
Algorithm 2 | 38 | 0.01561 | 1.7681
Remark 1: In this simulation, too large or too small a value of α makes R0 correspondingly too large or too small, and the network performance is poor. The best results were obtained with α between 0.25 and 0.5.
Remark 2: The sum of squared errors (SSE) is used as the accuracy measure.
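The training and test samples for Equation (4) can be generated as, for example (function name and seed are illustrative):

```python
import numpy as np

def gen_samples(n=400, seed=0):
    """Generate n samples of the nonlinear system in Equation (4),
    driven by u(k) drawn uniformly from [-1, 1], with y(0) = 0."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-1.0, 1.0, n + 1)
    y = np.zeros(n + 1)
    for k in range(1, n + 1):
        num = 16.0 * u[k - 1] + 8.0 * y[k - 1]
        den = 3.0 + 4.0 * u[k - 1] ** 2 + 4.0 * y[k - 1] ** 2
        y[k] = 0.725 * np.sin(num / den) + 0.2 * u[k - 1] + 0.2 * y[k - 1]
    return u, y
```

Regressor vectors [u(k-1), y(k-1), u(k-2), y(k-2), y(k-3)] can then be stacked from `u` and `y` and split 300/100 for training and testing.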
3.2 Simulation 2
This simulation addresses the continuous stirred-tank reactor (CSTR) found in chemical plants. The CSTR system is difficult to model and control because of its nonlinear nature; Xudong [6] modeled it with an RBFNN. In this paper, we simulate the CSTR system using the proposed algorithm. The general CSTR system can be written as Equation (5):
$\dot{C}_A = \frac{q}{V}(C_{Af} - C_A) - k_0 C_A \exp\!\left(-\frac{E}{RT}\right)$
$\dot{T} = \frac{q}{V}(T_f - T) - \frac{\Delta H\, k_0 C_A}{\rho C_P} \exp\!\left(-\frac{E}{RT}\right) + \frac{\rho_C C_{PC}\, q_C}{\rho C_P V}\left[1 - \exp\!\left(-\frac{hA}{q_C \rho_C C_{PC}}\right)\right](T_C - T)$  (5)
In Equation (5), q is the feed flow rate, qC the coolant flow rate, and TC the coolant temperature; these three parameters are the input variables of the CSTR. The variables $\dot{C}_A$, the concentration of product A, and $\dot{T}$, the reactor temperature, are the output variables of the CSTR. The meanings of the remaining parameters are explained in the literature [7]. In this simulation, we sample the CSTR system 1200 times in order, record the input values, set R0 = 1.5, and identify $\dot{C}_A$ and $\dot{T}$ by RBFNN. The final results are shown in Table 2:

Table 2. Comparison between simulation results

Algorithm   | Output | Number of hidden-layer nodes | Training accuracy | Generalization accuracy
Algorithm 1 | Ċ_A | 161 | 0.0162 | 3.3685
Algorithm 1 | Ṫ   | 315 | 0.0153 | 3.1369
Algorithm 2 | Ċ_A | 112 | 0.0095 | 2.2165
Algorithm 2 | Ṫ   | 168 | 0.0103 | 2.1178
Remark: The sum of squared errors (SSE) is used as the accuracy measure.
The results of the two simulations show that the proposed Algorithm 2 is more efficient than Algorithm 1: although the training accuracies are nearly the same, the number of hidden-layer nodes is decreased and the generalization accuracy is greatly increased.
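For readers who want to reproduce data like that of Simulation 2, Equation (5) can be integrated with a simple explicit Euler step. The parameter values below are illustrative placeholders of standard CSTR-benchmark magnitudes, not the paper's values; the values actually used are given in the paper's reference [7]:

```python
import numpy as np

# Illustrative parameter values (standard CSTR benchmark magnitudes);
# the values actually used in the paper appear in its reference [7].
q, V, k0 = 100.0, 100.0, 7.2e10        # feed rate, volume, rate constant
E_R, dH = 9.95e3, -2.0e5               # activation energy over R, reaction heat
Caf, Tf, Tc = 1.0, 350.0, 300.0        # feed conc., feed temp., coolant temp.
rho, Cp = 1.0e3, 1.0                   # reactor fluid density, heat capacity
rhoc, Cpc, hA = 1.0e3, 1.0, 7.0e5      # coolant density, heat capacity, hA

def cstr_step(Ca, T, qc, dt=0.1):
    """One explicit-Euler step of the two state equations in Equation (5)."""
    k = k0 * np.exp(-E_R / T)
    dCa = q / V * (Caf - Ca) - k * Ca
    dT = (q / V * (Tf - T)
          - dH * k * Ca / (rho * Cp)
          + rhoc * Cpc * qc / (rho * Cp * V)
          * (1.0 - np.exp(-hA / (qc * rhoc * Cpc))) * (Tc - T))
    return Ca + dt * dCa, T + dt * dT
```

Iterating `cstr_step` while varying q, qc, and Tc yields input-output records analogous to the 1200 ordered samples used in the simulation.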
4 Conclusions and Prospect

An improved APC-III algorithm has been proposed in this paper. With this algorithm we can obtain the number of hidden-layer nodes, the data centers and the radial widths, whereas the original algorithm only determines the data centers and the number of hidden-layer nodes, with all widths set equal. The simulation
The RBFNN ’s Application in Nonlinear System Model
575
result produced by the improved APC-III algorithm has fewer hidden-layer nodes and better generalization ability than that of the original algorithm. In the improved algorithm the value of R0 is still chosen by experiment, the same as in the original algorithm, which decreases the efficiency of the algorithm to some extent. If R0 could be obtained adaptively, the appropriate value of R0 would be reached faster.
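The improved APC-III described above can be illustrated with a minimal one-pass clustering sketch. The width rule used here (mean member-to-center distance, with R0 as a fallback for single-member clusters) is one plausible assumption, since the paper does not give the exact formula:

```python
import math

def apc3(patterns, R0):
    """One-pass APC-III-style clustering: returns centers and per-cluster widths.
    A pattern joins the nearest cluster if it lies within R0 of its center,
    otherwise it starts a new cluster (one hidden node per cluster)."""
    centers, members = [], []
    for x in patterns:
        dists = [math.dist(x, c) for c in centers]
        if dists and min(dists) < R0:
            k = dists.index(min(dists))
            members[k].append(x)
            n = len(members[k])
            # recompute the data center as the mean of its members
            centers[k] = [sum(p[j] for p in members[k]) / n for j in range(len(x))]
        else:
            centers.append(list(x))
            members.append([x])
    # radial width of each hidden node: mean member-to-center distance
    # (assumed rule; falls back to R0 for a single-member cluster)
    widths = [sum(math.dist(p, c) for p in ms) / len(ms) or R0
              for c, ms in zip(centers, members)]
    return centers, widths
```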
References

1. Yu, D.L., Gomm, J.B., Williams, D.: A Recursive Orthogonal Least Squares Algorithm for Training RBF Networks. Neural Processing Letters 5, 167–176 (1997)
2. Feng, D., Zhang, Y.: Predictive Control Based on Neuro-fuzzy Model for CSTR System. Microcomputer Information Journal 24(6-1), 5–6 (2008)
3. Lai, G., Wang, Y.: Neural Network Algorithm for Rapid Global Convergence in Nonlinear Modeling. Xiamen Institute of Technology Journal 16(1), 23–28 (2008)
4. Sarimveis, H., Doganis, P., Alexandridis, A.: A Classification Technique Based on Radial Basis Function Neural Networks. Advances in Engineering Software 37, 218–221 (2006)
5. Che, M., Chen, X., et al.: Nonlinear System Modeling Based on Self-organizing Neural Networks. Computer Simulation Journal 24(5), 142–144 (2007)
6. Wang, X., Shao, H.: The Application of RBFNN in Nonlinear System Modeling. Control Theory and Applications Journal 14(1), 59–66 (1997)
7. Zhu, X.: The Nonlinear Prediction of CSTR. South China University of Technology Journal 23(6), 7–16 (1995)
8. Park, Y.H., Bang, S.Y.: A New Neural Network Model Based on Nearest Neighbor Classifier. In: Proc. IJCNN, vol. 3, pp. 2386–2389 (1991)
9. Hwang, Y.-S., Bang, S.-Y.: A Neural Network Model APC-III and Its Application to Unconstrained Handwritten Digit Recognition. In: Proceedings of the International Conference on Neural Information Processing, pp. 1500–1505 (1994)
10. Hwang, Y.-S., Bang, S.-Y.: An Efficient Method to Construct a Radial Basis Function Neural Network. Neural Networks 10(8), 1495–1503 (1997)
An Improved Harmony Search Algorithm with Dynamic Adaptation for Location of Critical Slip Surface

Shibao Lu1, Weijuan Meng3, and Liang Li2

1 China University of Mining and Technology, Beijing, P.R. China
2 School of Civil Engineering, Qingdao Technological University, Qingdao, P.R. China
3 Tianjin University of Science and Technology, Tianjin, P.R. China
{Shibaolu,Weijuanmeng,liangli14}@yahoo.com.cn
Abstract. Although the original harmony search algorithm has been widely used in many fields such as numerical optimization and automatic control, there are no theories or formulae for determining the values of its parameters; different values are assigned based on different researchers' experience, or recommendations are simply adopted from published references. In this study, the parameter values are dynamically determined in consideration of the convergence degree of the current individuals used in the harmony search algorithm. Two parameters defining the convergence degree are adopted. A comparison of the results obtained by the original algorithm and by the algorithm proposed in this study shows that the improved harmony search algorithm is more efficient for the location of the critical slip surface than the original one.

Keywords: harmony search algorithm; critical slip surface; dynamic adaptation; factor of safety.
which transformed the various constraints and the requirement of a kinematically acceptable failure mechanism into the evaluation of upper and lower bounds of the control variables and employed a simulated annealing algorithm to determine the critical slip surface. Zolfaghari [7] adopted a genetic algorithm while Bolton [8] used the leap-frog optimization technique to evaluate the minimum factor of safety. In recent years, many intelligent algorithms, such as the artificial fish algorithm [9] and the ant colony algorithm [10], have been adopted for the location of the critical slip surface. Further improvement of the existing algorithms should be encouraged, and this paper is inspired by that goal. The procedure for the generation of potential slip surfaces presented by Cheng [6] and the unbalanced thrust force method for the calculation of the factor of safety are employed in this study.
2 Original Harmony Search Algorithm

Geem (2001) developed the harmony search (HS) meta-heuristic algorithm, conceptualized from the musical process of searching for a perfect state of harmony. The harmony in music is analogous to the optimization solution vector, and the musician's improvisations are analogous to the local and global search schemes of optimization techniques. The HS algorithm performs a stochastic random search governed by two parameters, the harmony memory considering rate HR and the pitch adjusting rate PR. Harmony search is a population-based search method. A harmony memory HM of size N is used to generate a new harmony which is probably better than the optimum in the current harmony memory. The harmony memory consists of N harmonies (slip surfaces), i.e. HM = {h1, h2, ..., hN}, each of which represents one slip surface. The generation of a new harmony hN+1 is the kernel of the harmony search algorithm and is described as follows. Take the j-th element xij in hi = (xi1, xi2, ..., xim) for instance, where m is the number of optimized variables for generating a slip surface, and let lj and uj be its lower and upper bounds respectively.
A random number r in the range 0 to 1 is generated. If r ≤ HR, xN+1,j is randomly chosen from HM, i.e., xN+1,j ∈ {x1j, x2j, ..., xNj}, and PR is then used to decide whether to pitch-adjust xN+1,j; if r > HR, xN+1,j is randomly generated between its lower and upper bounds. The same procedure is applied to the other elements, thereby obtaining a new harmony hN+1.
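The improvisation step above can be sketched as follows; the concrete HR and PR values and the pitch-adjustment bandwidth bw are illustrative assumptions (the improvement proposed in this paper makes HR and PR dynamic):

```python
import random

def new_harmony(HM, lower, upper, HR=0.9, PR=0.3, bw=0.05):
    """Generate one new harmony from the harmony memory HM (a list of vectors)."""
    m = len(lower)
    h_new = []
    for j in range(m):
        if random.random() <= HR:
            # memory consideration: pick the j-th element of a random stored harmony
            x = random.choice(HM)[j]
            if random.random() <= PR:
                # pitch adjustment within a small bandwidth (bw is an assumption)
                x += random.uniform(-bw, bw) * (upper[j] - lower[j])
        else:
            # random selection from the j-th variable's bounds
            x = random.uniform(lower[j], upper[j])
        h_new.append(min(max(x, lower[j]), upper[j]))
    return h_new

HM = [[random.uniform(0, 1) for _ in range(5)] for _ in range(10)]
h = new_harmony(HM, [0.0] * 5, [1.0] * 5)
```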
3 Improved Harmony Search Algorithm

The above procedure for the generation of a new harmony implies that a larger value of HR enables the algorithm to exploit the existing harmonies, while a smaller value of HR leads the algorithm to explore the search space. Constant values of the parameters HR and PR cannot reach an equilibrium between exploitation and exploration, because the harmonies will converge to one point if a large constant value of HR is used; at that point the value of HR should be decreased to explore the search
space, and then, when the harmonies are scattered in the search space, a large value of HR is again used to exploit the existing solutions. In order to dynamically alter the values of HR and PR, two parameters describing the convergence degree of the present harmonies, called η1 and η2, are introduced. The convergence degree is represented by the variable Cd, which is calculated by equations (1), (2) and (3):

Cd = ∑_{i=1}^{L} Di   (1)

Di = √( ∑_{j=1}^{m} (hij − Cj)² )   (2)

Cj = ( ∑_{i=1}^{L} hij ) / L   (3)
The maximum value of Cd, named Cmax, is determined by equation (4) from the lower and upper bounds lj and uj, where j varies from 1 to m:

Cmax = (N/2) · √( ∑_{j=1}^{m} (uj − lj)² )   (4)
Parameter η1 defines a threshold value C1 = η1·Cmax: when Cd is higher than C1, HR is set to 1.0 with PR = 0, leading the algorithm to exploit the existing solutions. Parameter η2 defines another threshold value C2 = η2·Cmax: when Cd is lower than C2, HR and PR are both set to 0.0, making the algorithm merely explore the search space. When Cd lies between C2 and C1, HR is set to 1.0 and the value of PR is dynamically altered using equation (5):

PR = (C1 − Cd) / (C1 − C2)   (5)
The dynamic equilibrium between exploitation and exploration in the improved harmony search algorithm can easily be achieved by altering the values of the two introduced parameters η1 and η2. If a smaller value of η1 is prescribed, the improved algorithm focuses mainly on exploitation, while a large value of η2 will make the algorithm mainly explore the search space, neglecting exploitation. In general, a small value of η2 and a medium value of η1 are assigned. The effects of different values of these two parameters on the results are studied in the following case studies. In addition, the value of η2 must be lower than that of η1.
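Equations (1)-(5) translate directly into a small routine. The sketch below assumes the harmonies are stored as plain m-dimensional vectors:

```python
import math

def convergence_degree(HM):
    """Cd of equations (1)-(3): total distance of the harmonies to their centroid."""
    L, m = len(HM), len(HM[0])
    C = [sum(h[j] for h in HM) / L for j in range(m)]                  # equation (3)
    return sum(math.sqrt(sum((h[j] - C[j]) ** 2 for j in range(m)))   # equations (1), (2)
               for h in HM)

def adapt_parameters(HM, lower, upper, eta1=0.5, eta2=0.01):
    """Dynamic HR and PR from the convergence degree, per equations (4)-(5)."""
    N = len(HM)
    Cmax = N / 2.0 * math.sqrt(sum((u - l) ** 2 for u, l in zip(upper, lower)))  # (4)
    C1, C2 = eta1 * Cmax, eta2 * Cmax
    Cd = convergence_degree(HM)
    if Cd > C1:
        return 1.0, 0.0                      # harmonies scattered: exploit
    if Cd < C2:
        return 0.0, 0.0                      # harmonies converged: explore
    return 1.0, (C1 - Cd) / (C1 - C2)        # equation (5)
```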
4 Case Studies

The number of control variables is equal to 25, i.e. m = 25. The maximum number of iterations is 10000 for both the original and the improved harmony search algorithm. Twenty values of HR from 0.05 to 1.0 with an interval of 0.05 are used in the original harmony search algorithm, while twenty values of η1 from 0.05 to 1.0 with an interval of 0.05 are adopted in the improved harmony search algorithm. In the original harmony search algorithm PR is constant and equal to 0.1, while in the improved harmony search algorithm η2 is constant and equal to 0.01. The example is a slope in layered soil for which Zolfaghari (2005) used a genetic algorithm with the Morgenstern and Price method. The geometric layout of the slope is shown in Fig. 1, while Table 1 gives the geotechnical properties of soil layers 1 to 4.
Fig. 1. Cross section of example slope (slope height/m versus slope width/m, layers 1 to 4)

Table 1. Geotechnical parameters for example

Layer   γ (kN/m3)   c (kPa)   φ (degree)
1       19.0        15.0      20.0
2       19.0        17.0      21.0
3       19.0        5.00      10.0
4       19.0        35.0      28.0
It is clearly seen from Fig. 2 that the value of HR plays an important role in the application of the original harmony search algorithm, where a maximum factor of safety of 2.30 is obtained, while the improved algorithm provides almost identical results lower than 1.12. This comparison indicates that dynamically altering the values of the parameters HR and PR based on the convergence degree Cd works well in the harmony search algorithm, and that the results are insensitive to the introduced parameters. Zolfaghari (2005) presented a factor of safety of 1.24, whose corresponding critical slip surface is shown in Fig. 1, and more portions of the result obtained by this improved algorithm lie within the soft layer 3.
Fig. 2. Results obtained by the original and improved harmony search algorithms (factor of safety versus different values of HR and η1)
Fig. 3. Comparison of critical slip surfaces (Zolfaghari versus the combinatorial search method; slope cross section with layers 1 to 4)
Acknowledgments. The authors would like to thank the National Natural Science Foundation of China for support under Projects 50874064, 50804026 and 50904039, and the Natural Science Foundation of Shandong Province for support under Key Project Z2007F10.
References

1. Arai, K., Tagyo, K.: Determination of noncircular slip surfaces giving the minimum factor of safety in slope stability analysis. 21, 43–51 (1985)
2. Baker, R.: Determination of the critical slip surface in slope stability computations. International Journal of Numerical and Analytical Methods in Geomechanics, 333–359 (1980)
3. Yamagami, T., Jiang, J.C.: A Search for the Critical Slip Surface in Three-Dimensional Slope Stability Analysis. Soils and Foundations 37, 1–6 (1997)
4. Greco, V.R.: Efficient Monte Carlo technique for locating critical slip surface. Journal of Geotechnical Engineering 122, 517–525 (1996)
5. Malkawi Abdallah, I.H., Hassan, W.F., Sarma, S.K.: Global search method for locating general slip surface using Monte Carlo techniques. Journal of Geotechnical and Geoenvironmental Engineering 127, 688–698 (2001)
6. Cheng, Y.M.: Locations of Critical Failure Surface and some Further Studies on Slope Stability Analysis. Computers and Geotechnics 30, 255–267 (2003)
7. Zolfaghari, A.R., Heath, A.C., McCombie, P.F.: Simple genetic algorithm search for critical non-circular failure surface in slope stability analysis. Computers and Geotechnics 32, 139–152 (2005)
8. Bolton Hermanus, P.J., Heymann, G., Groenwold, A.: Global search for critical failure surface in slope stability analysis. Engineering Optimization 35, 51–65 (2003)
9. Cheng, Y.M., Liang, L., Chi, S.C.: Determination of the critical slip surface using artificial fish swarms algorithm. Journal of Geotechnical and Geoenvironmental Engineering 134, 244–251 (2008)
10. Li, L., Chi, S.C., Lin, G.: The complex method based on ant colony algorithm and its application on the slope stability analysis. Chinese Journal of Geotechnical Engineering 26, 691–696 (2004) (in Chinese)
Verifying Election Campaign Optimization Algorithm by Several Benchmarking Functions∗

Wenge Lv, Qinghua Xie, Zhiyong Liu, Deyuan Li, Siyuan Cheng, Shaoming Luo, and Xiangwei Zhang

Faculty of Electro-mechanics Engineering, Guangdong University of Technology, 510006 Guangzhou, China
[email protected]
Abstract. The Election Campaign Optimization (ECO) algorithm is a new heuristic algorithm; it simulates the behaviour of election candidates pursuing the highest support in a campaign. The candidates can influence the voters around them; the higher the prestige a candidate possesses, the larger his influence. Voters allot their support proportionally according to the influence exerted by the candidates. Global and local sample surveys of voters are performed to investigate the support for the candidates. The proportion of the support a candidate receives from one voter to the total support the candidate receives from all voters is the contribution of that voter to the candidate. The sum of the location coordinates of all voters weighted by their contributions gives a new location, which is the next position of the candidate. This cycle is repeated until a candidate finds the position of the highest support. In this paper, several benchmark functions are used to verify the ECO algorithm.

Keywords: election campaign; optimization; algorithm; benchmarking functions.
1 Introduction

The no-free-lunch (NFL) theorem proves that the performance of all optimization algorithms is equivalent when averaged over the entire problem field [1]. If algorithm A outperforms algorithm B on one problem set, there must exist another problem set on which algorithm B outperforms algorithm A; any two optimization algorithms are equivalent when their performance is averaged across all possible problems. Therefore the evaluation of an algorithm depends greatly on the problem to be solved. The NFL theorem indicates that seeking a universal algorithm for all optimization problems is impossible, and it also means that an optimization algorithm will be better suited to some optimization problems than to others. Therefore, continually developing new optimization algorithms is always necessary and significant.

Election is an important political activity in human society, and sample surveys are widely employed to forecast election results. A sample survey is a statistical
∗ The work was supported by the National Science Foundation of China (50676022); Provincial Science Foundation of Guangdong (07001748).
method that gathers information from a number of individuals, a sample, in order to draw conclusions about the whole population. The U.S. presidential election is one of the largest elections in the world: in 2004 there were about 250 million valid voters in the U.S., of whom about 120 million voted in the end. Based on a sample of a number of individuals, the election result can be forecast within some margin of sampling error by means of a sample survey. Campaigning is an important behaviour in an election; candidates always pursue the maximum support from voters by means of various election actions. Candidates find out their support levels from the samples of voters and decide what to do next. In order to obtain more and more support from voters, candidates tend to move towards the voters possessing higher influence on others. An election is a process in which candidates seek the maximum support from voters, and optimization is a technique that finds the best solution in a problem field; considerable similarity can be found between election and optimization. It can be imagined that there must be an optimization mechanism in the election process that can be learned from to develop a new optimization algorithm. In this paper, an optimization algorithm simulating the election process is introduced, named the Election Campaign Optimization algorithm [2-4].
Fig. 1. Distribution of candidates and investigated voters in the ECO algorithm (global and local investigated voters around each candidate)
2 ECO Algorithm

Assume that there is a sort of election mechanism in which a few candidates are allowed to campaign among the voters. Candidates can affect the voters around themselves within a certain range; their effect on a voter decreases gradually as the distance between candidate and voter increases. The higher the prestige a candidate owns, the greater the extent of his influence, but his effect on a voter decreases to zero once the distance between them exceeds a limit. Suppose that the social structure is not in equilibrium and the prestige of the electorate is unequal, so the supports given to the candidates are different. Every voter is affected by several candidates, and voters distribute their support to candidates according to the effects received from them. A sample survey of the electorate is carried out to investigate the support intensity of every candidate. In order to ensure that the sample survey is precise and general, the local
survey samples are generated around the candidate's position, taken as the mean, with probability determined by a normal distribution, while the global survey samples are generated over the whole field with probability determined by a uniform distribution. Then, by computing the effects of the candidates on this limited number of voters and the supports given to the candidates by the voters, the support ratios can be found approximately. A candidate can affect several voters simultaneously, so he wins support from all of those voters. The proportion of the support given to a candidate by one spot-checked voter to the total support given to the same candidate by all spot-checked voters is the contribution of that spot-checked voter to the candidate. The sum of the location coordinates of all spot-checked voters weighted by their contributions is the location of the support barycenter of the candidate, which is the position of higher prestige for the candidate. If the candidates are compared to ships and the voters to waters in the above election mechanism, the ships continually adjust their positions according to the spot-check results in order to reach higher water. The effect of a candidate on spot-checked voters at various distances is different, the support that spot-checked voters with various prestige can distribute is different, and the support given by one spot-checked voter to each candidate is also different. Therefore the contributions of the same batch of spot-checked voters to the various candidates are different, generating a different support barycenter for each candidate, which leads each candidate to the most suitable position for himself. Such an election tends to lead candidates towards the nearer spot-checked voters of higher prestige, so as to eventually arrive at a position of higher prestige. In the ECO algorithm, the solution space is imagined as the voters and the current solutions are imagined as the candidates.
The function value of a feasible solution is called the prestige of a voter, and the function value of a current solution is called the prestige of a candidate. The support barycenter of a candidate is obtained by sample surveying, and depends mainly on those sample-surveyed voters whose distances to the candidate are relatively near and whose prestiges are relatively high. The next election location of the candidate is his support barycenter, where the candidate will have higher support. This is done repeatedly until the highest support is found. In order to jump out of local optima and increase the search rate, the prestiges of the candidates are compared with those of the sample-surveyed voters; if the prestige of a sample-surveyed voter is higher than that of a candidate, that voter substitutes for the candidate and the candidate of lower prestige is eliminated from the election.
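A minimal single-candidate sketch of the barycenter update described above; the Gaussian influence kernel and its width sigma are illustrative assumptions, since the text does not fix the exact decay law of a candidate's effect with distance:

```python
import math

def eco_update(candidate, voters, prestige, sigma=1.0):
    """Move a candidate to the support barycenter of the sampled voters.
    voters: list of position vectors; prestige: the voters' fitness values."""
    m = len(candidate)

    # influence of the candidate on a voter decays with distance (assumed Gaussian)
    def influence(v):
        d2 = sum((v[j] - candidate[j]) ** 2 for j in range(m))
        return math.exp(-d2 / (2 * sigma ** 2))

    # each voter's support is taken proportional to influence and voter prestige
    support = [influence(v) * p for v, p in zip(voters, prestige)]
    total = sum(support)
    contrib = [s / total for s in support]

    # barycenter: contribution-weighted sum of the voters' coordinates
    return [sum(c * v[j] for c, v in zip(contrib, voters)) for j in range(m)]
```

The candidate is pulled towards nearby voters of high prestige, exactly the tendency described in the text.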
3 ECO Algorithm MATLAB Toolbox

In order to test, apply and improve the ECO algorithm easily, an ECO algorithm MATLAB toolbox was implemented in MATLAB. The toolbox includes the ECO algorithm main program, test function programs, result processing and output programs, and help files. The main program runs the ECO algorithm on an optimization function from the test function programs; the result processing and output programs export the computational results to an appointed folder in the form of figure files, data files and computation record files. 89 test functions are collected, comprising 23 benchmark functions, 26 unconstrained optimization functions, 15 constrained optimization functions, 6 min-max optimization functions, 5 multi-objective optimization functions, 3 multi-peak optimization functions, and 14 nonlinear equations and systems of equations. The help files consist of the toolbox manual and a version description.
4 Benchmark Functions

Benchmark functions are employed to examine the performance of the ECO algorithm [5]. The five benchmark functions selected in this paper are listed below.

(1) F1 (De Jong's Sphere function): min f1(x) = ∑_{i=1}^{n} xi², where xi ∈ [−5.12, 5.12], i = 1, 2, ..., n. The minimal value is f(x*) = 0 at x* = (0, 0, ..., 0). De Jong's Sphere function is the most basic problem for optimization algorithms. It contains no local optima and provides a smooth gradient towards the global optimum.

(2) F2 (Rosenbrock function 1): min f(x) = 100(x1² − x2)² + (x1 − 1)², where xi ∈ [−2.048, 2.048], i = 1, 2. The minimal value is f(x*) = 0 at x* = (1, 1). Rosenbrock function 1 has a very narrow ridge; the global optimum lies inside a long, narrow, parabolic-shaped flat valley.

(3) F3 (De Jong's Step function): min f(x) = ∑_{i=1}^{n} integer(xi), where xi ∈ [−5.12, 5.12], i = 1, 2, ..., n. The minimal value is f(x*) = −25 at x* = (−5, −5, ..., −5). De Jong's Step function contains multiple plateaux; individuals on the same plateau have equal fitness values, which can significantly slow the convergence of an optimization algorithm.

(4) F4 (function with Gaussian noise): min f(x) = ∑_{i=1}^{n} i·xi⁴ + Gauss(0, 1), where xi ∈ [−1.28, 1.28], i = 1, 2, ..., n. The minimal value is f(x*) = Gauss(0, 1) at x* = (0, 0, ..., 0). F4 is a simple unimodal function with Gaussian noise.

(5) F5 (Shekel's foxholes function): max f(x) = 0.002 + ∑_{j=1}^{25} 1 / ( j + ∑_{i=1}^{2} (xi − aij)⁶ ), where xi ∈ [−65.536, 65.536], i = 1, 2, with a1j the sequence {−32, −16, 0, 16, 32} repeated five times and a2j each value of {−32, −16, 0, 16, 32} repeated five times in blocks. The maximal value is f(x*) = 1 at x* = (−32, −32). Shekel's foxholes function is a two-dimensional multimodal function with 25 local optima.
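For reference, the deterministic functions F1-F3 and F5 can be written down directly from the definitions above; integer(·) is read here as truncation toward zero, which matches the stated minimum of −25 at (−5, ..., −5):

```python
def f1(x):
    """F1, De Jong's Sphere function."""
    return sum(xi ** 2 for xi in x)

def f2(x):
    """F2, Rosenbrock function 1 (two variables)."""
    return 100.0 * (x[0] ** 2 - x[1]) ** 2 + (x[0] - 1.0) ** 2

def f3(x):
    """F3, De Jong's Step function; int() truncates toward zero."""
    return sum(int(xi) for xi in x)

def f5(x):
    """F5, Shekel's foxholes function (to be maximized)."""
    # 5x5 grid of foxholes: a1j cycles fastest, a2j changes in blocks of five
    A = [(a, b) for b in (-32, -16, 0, 16, 32) for a in (-32, -16, 0, 16, 32)]
    return 0.002 + sum(1.0 / (j + (x[0] - a) ** 6 + (x[1] - b) ** 6)
                       for j, (a, b) in enumerate(A, start=1))
```

At (−32, −32) the j = 1 term of F5 dominates and the value is approximately 1, as stated above.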
Fig. 2. The x1 (a) and f (b) of benchmark function F1 convergence procedure (against computational cycles)
Fig. 3. The x1 (a) and f (b) of benchmark function F2 convergence procedure (against computational cycles)
Fig. 4. The x1 (a) and f (b) of benchmark function F3 convergence procedure (against computational cycles)
The computational results are shown in Fig. 2 to Fig. 6. Only variable x1 and the objective value f are shown in the figures to limit paper space. It can be seen that for benchmark functions F1 to F5 the variables and objective values converge after 5 to 30 computational cycles, which is consistent with the results in the related references. This means that the ECO algorithm can find the global solution within a finite number of computational cycles, so the ECO algorithm is valid for these benchmark functions. According to the NFL theorem, there may be some optimization problems best suited to the ECO algorithm; the task is to find them.
Fig. 5. The x1 (a) and f (b) of benchmark function F4 convergence procedure (against computational cycles)
Fig. 6. The x1 (a) and f (b) of benchmark function F5 convergence procedure (against computational cycles)
References

1. Wolpert, D.H., Macready, W.G.: No Free Lunch Theorems for Optimization. IEEE Transactions on Evolutionary Computation 1, 67–82 (1997)
2. Li, J., Lv, W.G., Hou, M.H.: Path Planning for Mobile Robot Based on Election Algorithm. Machine Tool & Hydraulics 37, 30–31, 68 (2009)
3. Lv, W.G., Du, J.H., Li, J., et al.: Optimization of Double Universal Coupling Using Competitive Algorithms. Journal of Guangdong Non-Ferrous Metals 1, 221–223 (2007)
4. Zheng, L.L., Lv, W.G.: The Optimization Design of Machine-Tool Spindle Structure Based on Competitive Algorithm. Machinery Design & Manufacture 8, 35–37 (2006)
5. Liu, Y., Tang, L.S., Chen, Y.P.: Non-numerical Parallel Algorithm: Genetic Algorithm. Science Press, Beijing (1995)
An Algorithm of Alternately Mining Frequent Neighboring Class Set Gang Fang College of Math and Computer Science, Chongqing Three Gorges University Chongqing 404000, P.R. China [email protected]
Abstract. Since existing frequent neighboring class set mining algorithms involve repeated computation and redundant neighboring class sets, this paper proposes an algorithm for alternately mining frequent neighboring class sets, which is suitable for mining the frequent neighboring class sets of objects in large spatial data. The algorithm uses a regression method to create the database of neighboring class sets and an alternative method to generate candidate frequent neighboring class sets: it uses an increasing sequence to generate candidates on the one hand and a decreasing sequence on the other, and it needs to scan the database only once to extract the frequent neighboring class sets. The algorithm improves mining efficiency through the alternative method, since generating candidates with a numerical variable is simple and computing support with logic operations is also very simple. Experimental results indicate that the algorithm is faster and more efficient than existing algorithms when mining frequent neighboring class sets in large spatial data.

Keywords: neighboring class set; regression method; alternative method; increasing sequence; decreasing sequence.
to each other. MFNCS [2] uses a method similar to Apriori to search for frequent neighboring class sets, gaining the right instances of a (k+1)-neighboring class set only by joining right instances of k-neighboring class sets; the algorithm therefore involves repeated computation and superfluous neighboring class sets, and its efficiency is low. Hence, this paper proposes an algorithm for alternately mining frequent neighboring class sets, denoted AMFNCS, which efficiently reduces repeated computation and the number of superfluous neighboring class sets.
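The "logic operation" for support counting mentioned in the abstract can be sketched as follows; encoding each neighboring class set as a bit mask is an illustrative assumption about the data layout:

```python
# Each neighboring class set in the database is encoded as a bit mask:
# bit i set means class i occurs in the set.
def encode(classes):
    mask = 0
    for c in classes:
        mask |= 1 << c
    return mask

def support(candidate_mask, db_masks):
    """A candidate is contained in a record iff AND leaves the candidate unchanged."""
    return sum(1 for m in db_masks if m & candidate_mask == candidate_mask)

db = [encode(s) for s in ([0, 1, 2], [0, 2], [1, 2, 3], [0, 1, 2, 3])]
print(support(encode([0, 2]), db))   # records containing classes {0, 2}: prints 3
```

One bitwise AND plus one comparison per record replaces a set-containment test, which is why support counting with logic operations is cheap.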
2 Problem Description

Every object in the spatial domain composes the spatial data set, which is expressed as a data structure, denoted by